The LEIP SDK automatically generates and saves a compression report when you run LEIP Optimize. To view the report after it is generated, open the HTML file compression_report.html located in the folder you specified with --output_path when running the command.
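Once the run finishes, the report can also be located and opened programmatically. A minimal sketch, in which the directory name is a placeholder for whatever you passed as --output_path:

```python
import os
import webbrowser

# Placeholder: substitute the directory you passed as --output_path.
output_path = "optimized_model"

# The report is written as compression_report.html inside that folder.
report = os.path.join(output_path, "compression_report.html")

if os.path.isfile(report):
    # Open the report in the system's default browser.
    webbrowser.open("file://" + os.path.abspath(report))
else:
    print(f"No compression report found at {report}")
```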
The compression report presents the following information for each layer in your network:
Pre-Quantization vs. Post-Quantization Weights - Histograms of the tensor weights before and after quantization. The color of the post-quantization histogram indicates the relative quantization error compared to other tensors: red is relatively worse, green relatively better.
Number - The order in which the layers appear within the network, with 1 being the first layer.
Layer Name - The internal name of the layer/tensor.
Original Weights Range - The minimum and maximum weight values found in the original tensor.
Compressed Weights Range - The minimum and maximum weight values found in the compressed tensor.
Quantization Error - A measure of how much the tensor weights have changed during quantization.
Number of Elements - The number of individual scalar values in the tensor, obtained by multiplying the dimensions of the tensor's shape together.
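The quantities above can be illustrated with a small sketch. Assuming simple uniform affine int8 quantization (LEIP's actual quantizer and error metric may differ, so treat this only as an approximation of what the report measures), the weight ranges, a mean-squared quantization error, and the element count for a weight tensor could be computed as:

```python
import numpy as np

def quantize_stats(weights, num_bits=8):
    """Simulate uniform affine quantization of a weight tensor and
    compute the per-layer quantities shown in the compression report.
    NOTE: illustrative sketch only; LEIP's actual quantization scheme
    and error metric may differ."""
    w_min, w_max = float(weights.min()), float(weights.max())
    levels = 2 ** num_bits - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    # Map to integer levels, then back to floats ("fake quantization").
    q = np.round((weights - w_min) / scale)
    dequantized = q * scale + w_min
    return {
        "original_range": (w_min, w_max),
        "compressed_range": (float(dequantized.min()), float(dequantized.max())),
        # One possible error measure: mean squared difference.
        "quantization_error": float(np.mean((weights - dequantized) ** 2)),
        # Number of elements = product of the shape dimensions.
        "num_elements": int(np.prod(weights.shape)),
    }

rng = np.random.default_rng(0)
stats = quantize_stats(rng.normal(size=(3, 3, 16, 32)).astype(np.float32))
print(stats["num_elements"])  # 3 * 3 * 16 * 32 = 4608
```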
Note that only the weight tensor inputs to convolution-like ops are shown in the compression report. When --use_legacy_quantizer is used on TensorFlow models, the bias input tensor to a convolution-like op is also included. Tensors used as temporary input and output buffers of convolution-like ops are not shown, nor are tensors for other ops.
Models using symmetric per-channel quantizers.
PyTorch models optimized using the
Models optimized using the CUDA Int8 path