The LEIP SDK automatically generates and saves a compression report when you run LEIP Optimize. To view the report, open the HTML file compression_report.html in the folder you specified as --output_path when running the command.
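If you prefer to open the report from a script rather than a file browser, a minimal sketch along these lines may help. It relies only on the Python standard library and on the file name given above. The helper names report_path and open_report are hypothetical and are not part of the LEIP SDK.

```python
from pathlib import Path
import webbrowser

# File name as stated in the documentation above.
REPORT_NAME = "compression_report.html"


def report_path(output_path: str) -> Path:
    """Return the expected location of the report inside the output folder."""
    return Path(output_path) / REPORT_NAME


def open_report(output_path: str) -> bool:
    """Open the report in the default browser.

    Returns False if the report file does not exist, otherwise the
    result of webbrowser.open().
    """
    path = report_path(output_path)
    if not path.is_file():
        return False
    # webbrowser expects a URL, so convert the absolute path to file://...
    return webbrowser.open(path.resolve().as_uri())
```

For example, open_report("my_output") would look for my_output/compression_report.html, where "my_output" stands in for whatever folder you passed as --output_path.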

The compression report shows the following information for each layer in your network:

  • Pre-Quantization vs. Post-Quantization Weights - Histograms of the tensor weights before and after quantization. The color of the post-quantization histogram indicates the quantization error relative to other tensors: red is relatively worse and green is relatively better.

  • Number - The order in which the layer appears within the network, with 1 being the first layer.

  • Layer Name - The internal name of the layer/tensor.

  • Original Weights Range - The minimum and maximum weight values found in the original tensor.

  • Compressed Weights Range - The minimum and maximum weight values found in the compressed tensor.

  • Quantization Error - A measurement of how much the tensor weights have changed.

  • Number of Elements - The number of individual scalar values found in the tensor, obtained by multiplying the numbers in the shape together.
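The per-tensor columns above can be illustrated with a short sketch in plain Python. The shape and weight values are made up for illustration. The error function shown here (mean absolute difference) is only an assumed stand-in, since this page does not state which formula the report uses for Quantization Error.

```python
import math


def number_of_elements(shape):
    """Multiply the dimensions of a tensor shape together."""
    return math.prod(shape)


def weights_range(weights):
    """Return (minimum, maximum) for a flat list of weight values."""
    return min(weights), max(weights)


def mean_abs_change(original, compressed):
    """Mean absolute difference between two equally sized weight lists.

    ASSUMPTION: an illustrative measure of how much the weights changed,
    not necessarily the metric used in the compression report.
    """
    return sum(abs(o - c) for o, c in zip(original, compressed)) / len(original)


# Hypothetical convolution weight tensor with shape (16, 3, 3, 3).
shape = (16, 3, 3, 3)
print(number_of_elements(shape))  # 16 * 3 * 3 * 3 = 432

# Hypothetical weights before and after compression.
original = [-0.5, 0.25, 0.75, 1.0]
compressed = [-0.5, 0.25, 0.5, 1.0]

print(weights_range(original))    # (-0.5, 1.0)
print(weights_range(compressed))  # (-0.5, 1.0)
print(mean_abs_change(original, compressed))  # 0.0625
```

Note that the two ranges can be identical even when individual weights have moved, which is why the report lists an error measurement alongside the ranges.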

Note that the compression report shows only the weight tensor inputs to convolution-like ops. When the --use_legacy_quantizer parameter is used on TensorFlow models, the bias input tensor to a convolution-like op is also included.

Tensors used as temporary input and output buffers of convolution-like ops are not shown. Tensors for other ops are also not shown.

  • Models using symmetric per-channel quantizers.

  • PyTorch models optimized using the --use_legacy_quantizer parameter.

  • Models optimized using the CUDA Int8 path.

Select a histogram pair to see more details in a model: