# LEIP Optimize

LEIP Optimize is a state-of-the-art model optimizer that applies post-training quantization algorithms to a model and produces a binary representation for the target specified by the user. The binary takes the form of a shared object file that is loaded into a small runtime for execution.

Internally it consists of two phases:

## LEIP Compress

Deep Neural Networks (DNNs) use a large number of parameters to learn. As a result, they have a large memory and compute footprint during runtime inference. This footprint limits their deployment on edge devices, which have limited memory, size, and power budgets. LEIP Compress provides developers with state-of-the-art quantization optimization to facilitate deployment of edge AI solutions.

### Quantization Algorithms

With quantization, LEIP Compress transforms the numerical representation of the DNN parameters from floating point to integers. This results in a lower memory footprint and faster computation. LEIP supports the following post-training quantization (PTQ) techniques:

#### Symmetric

Symmetric quantization first selects the maximum absolute value of the floating point inputs `x_f`, `M = max(|x_f|)`. The floating point range that is effectively quantized, `[-M, M]`, is symmetric with respect to zero, as is the quantized integer range.
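The idea can be illustrated with a minimal sketch in plain Python. It is not LEIP's implementation: the function names, the signed 8-bit range of -127 to 127, and the sample values are assumptions made for this example.

```python
def symmetric_quantize(values, num_bits=8):
    """Map floats to signed integers using a zero-centered range [-M, M]."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits (assumed range)
    m = max(abs(v) for v in values)           # M = max(|x_f|)
    scale = m / qmax if m > 0 else 1.0        # float width of one integer step
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def symmetric_dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [-0.5, 0.0, 0.25, 1.0]              # illustrative values only
q, scale = symmetric_quantize(weights)
restored = symmetric_dequantize(q, scale)
```

Because the range is centered on zero, the float value 0.0 always maps to the integer 0 and no offset needs to be stored, only the scale.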

#### Asymmetric

In asymmetric quantization, the min/max of the float range is mapped to the min/max of the integer range. This is performed by using a zero-point (also called quantization bias, or offset) in addition to the scale factor.
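As a rough sketch of how the scale and zero-point interact, the following plain Python example maps a float range onto unsigned 8-bit integers. It is an illustration under assumptions, not LEIP's implementation: the function names, the 0 to 255 target range, and the choice to widen the float range so that it always contains zero are all made up for this example.

```python
def asymmetric_quantize(values, num_bits=8):
    """Map the float range [lo, hi] onto the integer range [0, 2^bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)                # assumption: range includes zero
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    # The zero-point is the integer that represents the float value 0.0.
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def asymmetric_dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integers."""
    return [(q - zero_point) * scale for q in quantized]


activations = [-1.0, 0.0, 0.5, 3.0]           # illustrative values only
q, scale, zero_point = asymmetric_quantize(activations)
restored = asymmetric_dequantize(q, scale, zero_point)
```

Here the float minimum lands on integer 0 and the float maximum on integer 255, so the full integer range is used even though the float range is not centered on zero.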

#### Per-Channel

This quantization is used when the standard symmetric, asymmetric, and powers-of-two algorithms fail to achieve the required level of accuracy. This can happen when the resolution of 256 values offered by 8-bit integers is not sufficient to encode the behavior of the network. Convolutional and dense layers consist of a significant number of channels. Instead of quantizing all of them in bulk with a single set of quantization parameters, per-channel quantization quantizes each channel separately, which can improve accuracy.
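The benefit is easiest to see when channels differ widely in magnitude. The sketch below compares one shared scale against one scale per channel using symmetric 8-bit quantization. It is only an illustration: the helper names and the weight values are hypothetical, and LEIP's actual per-channel algorithm may differ.

```python
def max_abs_scale(values, qmax=127):
    """Symmetric scale derived from the largest absolute value."""
    m = max(abs(v) for v in values)
    return m / qmax if m > 0 else 1.0


def max_error(values, scale, qmax=127):
    """Largest round-trip error when quantizing `values` with `scale`."""
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return max(abs(v - q * scale) for v, q in zip(values, quantized))


# Two output channels with very different magnitudes (illustrative values).
channels = [
    [-8.0, 2.5, 6.0],         # large-magnitude channel
    [0.011, -0.02, 0.005],    # small-magnitude channel
]

# Per-tensor: a single scale shared by every channel.
tensor_scale = max_abs_scale([v for ch in channels for v in ch])
per_tensor_errors = [max_error(ch, tensor_scale) for ch in channels]

# Per-channel: each channel gets its own scale.
channel_scales = [max_abs_scale(ch) for ch in channels]
per_channel_errors = [
    max_error(ch, s) for ch, s in zip(channels, channel_scales)
]
```

With the shared scale, every value in the small channel rounds to zero, so the channel's information is lost. With its own scale, the same channel uses the full integer range and its round-trip error drops sharply, while the large channel is unaffected.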

The efficacy of the quantization techniques depends heavily on the model and training dataset, and LEIP Compress makes it possible to explore them, from the simplest to the most complex. The core idea behind these quantization algorithms is to analyze the distribution of floating point values and map them to integer values while minimizing the loss in overall accuracy.

### Optimizations

In addition to the quantization algorithms offered by LEIP Compress, two other optimization techniques can be used in conjunction with the cast to integer. Whether these optimizations improve the accuracy after quantization depends on the type of model and the specific distributions of its parameter values. In some cases they can bring the accuracy to within a close percentage of the original baseline.

#### Tensor Splitting

LEIP SDK supports a quantization technique called Tensor Splitting. Tensors are decomposed into sub-tensors to allow for a separate and more optimal compression ratio. The algorithm provides a flow to automatically determine the layers whose tensors should be split using a predefined heuristic.

To try out this optimization, simply add `--compress_optimization tensor_splitting` to the `leip optimize` command, as shown in the example in the Using LEIP document. Depending on the size of the model and its layers, the Tensor Splitting optimization pass could take several minutes.

#### Bias Correction

LEIP SDK supports a quantization technique called Bias Correction. Generally, quantization introduces a biased error in the output activations. Bias Correction will calibrate the model and adjust the biases to reduce this error. In some cases, this optimization will significantly improve the model's performance.

To try out Bias Correction, simply add `--compress_optimization bias_correction` as shown in the example in the LEIP Introduction document. Depending on the size of the model and its layers, the Bias Correction optimization pass could take several minutes.
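To make the idea of a biased output error concrete, here is a minimal sketch of one common form of bias correction for a single dense layer. It is not LEIP's implementation. The helper names, the coarse 4-bit-style quantization (`qmax=7`, chosen so the error is visible), and all weights and calibration inputs are assumptions for illustration.

```python
def dense(weights, bias, x):
    """Dense layer output: y = W x + b."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def fake_quantize(row, qmax=127):
    """Quantize then dequantize a weight row (symmetric, per row)."""
    m = max(abs(w) for w in row)
    scale = m / qmax if m > 0 else 1.0
    return [round(w / scale) * scale for w in row]


def mean_output(weights, bias, inputs):
    """Average layer output over a set of calibration inputs."""
    outputs = [dense(weights, bias, x) for x in inputs]
    return [sum(col) / len(outputs) for col in zip(*outputs)]


def correct_bias(weights, q_weights, bias, calibration_inputs):
    """Shift each bias by the mean output error caused by quantization."""
    float_mean = mean_output(weights, bias, calibration_inputs)
    quant_mean = mean_output(q_weights, bias, calibration_inputs)
    return [b - (qm - fm) for b, qm, fm in zip(bias, quant_mean, float_mean)]


# Illustrative layer and calibration data (hypothetical values).
weights = [[0.9, -0.31, 0.12], [0.05, 0.44, -0.77]]
bias = [0.1, -0.2]
calibration = [[1.0, 2.0, 0.5], [0.8, 1.5, 0.7], [1.2, 1.8, 0.4]]

q_weights = [fake_quantize(row, qmax=7) for row in weights]
new_bias = correct_bias(weights, q_weights, bias, calibration)

float_mean = mean_output(weights, bias, calibration)
quant_mean_before = mean_output(q_weights, bias, calibration)
quant_mean_after = mean_output(q_weights, new_bias, calibration)
```

After the adjustment, the quantized layer's average output on the calibration inputs matches the floating point layer's average. The per-sample errors remain, but they are no longer shifted in one direction.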

The `tensor_splitting` and `bias_correction` optimizations can be cascaded together by specifying `--optimization tensor_splitting,bias_correction`.

### Target Data Type

LEIP Optimize provides two types of outputs based on the `--quantizer` argument:

- When `--quantizer=none`, the model is only compiled, leaving weights in their original type.
- When `--quantizer=asymmetric`, `--quantizer=symmetric`, or `--quantizer=symmetricpc`, both weights and operations are quantized, and the output model has *(u)int8* weights.

## LEIP Compile

This phase also has an independent command, `leip compile`, and is described in LEIP Compile.

## CLI Usage

The basic command is:

```
leip optimize --input_path inputModel/ \
--output_path optimizedModel/ \
--rep_dataset rep_dataset.txt
```

For a detailed explanation of each option, refer to the CLI Reference for LEIP Optimize.