
Quantization Guided Training (QGT)

This page documents the use of the Latent AI SDK for performing Quantization Guided Training (QGT) with a model of your choice. What follows is example usage of the classes provided by the Latent AI SDK for quantized training within the TensorFlow 2.x Keras framework. Note that only the Keras .h5 format is currently supported. This example constructs a LeNet model, trains it on MNIST, and quantizes the resulting model to 4 bits.


For information on the PyTorch version of QGT, see LEIP Quantization Guided Training for PyTorch.

Note: if you would prefer to use the leip train command, which simplifies most of the steps of integrating with the Latent AI SDK for Quantization Guided Training, see the LEIP Train page.

Example

In this example we will import the following modules:

CODE
import tensorflow as tf
import leip.core.train.utilities as utils
from leip.core.train.regularizers.quantization_guided_regularizer import QuantizationGuidedRegularizer
from leip.core.train.constraints.quantization_guided_constraint import QuantizationGuidedConstraint
from leip.core.train.quantization_guided_callback import QGMonitoringCallback

Load Data

Now let’s load the MNIST training and test data and normalize the images to floating point:

CODE
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
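Note that dividing a NumPy uint8 array by 255.0 produces float64 values. As an optional refinement (a minimal sketch, not part of the original recipe), you can cast the images to float32, Keras' default floating-point dtype:

CODE
# Optional: cast to float32, the default Keras floating-point dtype
train_images = train_images.astype("float32")
test_images = test_images.astype("float32")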

Build Model

Next we will construct a LeNet model using the following Keras API calls:

CODE
tf.keras.backend.clear_session()
model = tf.keras.Sequential(name="LeNet-5")
model.add(tf.keras.layers.InputLayer(input_shape=(28, 28)))
model.add(tf.keras.layers.Reshape(target_shape=(28, 28, 1)))
model.add(tf.keras.layers.Conv2D(filters=6, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.AveragePooling2D())
model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.AveragePooling2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=120, activation='relu'))
model.add(tf.keras.layers.Dense(units=84, activation='relu'))
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
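Before moving on, you can verify the architecture by printing a layer summary:

CODE
model.summary()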

Batchnorm Folding

This step is optional. Folding batchnorm layers can significantly improve a model’s inference latency. To accurately simulate the quantization effects in the folded model, we need to apply batchnorm folding and transform the graph for training as well. (The LeNet model built above contains no batchnorm layers, so this step is shown here for reference; a sketch of the kind of block where folding applies follows the code below.) To enable this optional step, we would do the following:

CODE
# fold_weights and strip_batchnorms are Latent AI SDK utilities
# (they are not among the imports shown at the top of this example)
folded_model = fold_weights(model)
model = strip_batchnorms(model, folded_model)
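For reference, here is a minimal, illustrative sketch of a block where folding would apply: a convolution followed by a batchnorm layer (the filter counts are arbitrary):

CODE
# Illustrative only: batchnorm folding merges the BatchNormalization
# parameters into the weights of the preceding Conv2D layer.
block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])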

Attach Regularizers

To quantize the model during training, we simply need to attach regularizer declarations and their parameter values to the model. During training, these regularizers and quantization parameters will guide the model towards weights that can be represented more compactly.

The leip.core.train.regularizers.quantization_guided_regularizer.QuantizationGuidedRegularizer class encapsulates a training regularizer used for Quantization Guided Training of the model. This class also provides quantization during the training process. We will use the asymmetric quantization algorithm and quantize to 4 bits. Here let’s create the JSON from a newly constructed QuantizationGuidedRegularizer:

CODE
regularizer_json = QuantizationGuidedRegularizer(
    num_bits=4,
    lambda_1=1.0,  # not used by QuantizationGuidedRegularizer
    lambda_2=1.0,  # regularizes quantization error
    lambda_3=0.0,  # L2 regularization
    lambda_4=1.0,  # not used by QuantizationGuidedRegularizer
    lambda_5=1.0,  # not used by QuantizationGuidedRegularizer
    quantizer_name="asymmetric").get_serialized()

The snippet of JSON created by the above call is shown below:

CODE
{
    "class_name": "QuantizationGuidedRegularizer",
    "config": {
        "num_bits": 4,
        "lambda_1": 1.0,
        "lambda_2": 1.0,
        "lambda_3": 0.0,
        "lambda_4": 1.0,
        "lambda_5": 1.0,
        "quantizer_name": "asymmetric"
    }
}
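Because the configuration is serialized as JSON, it can also be adjusted programmatically before attachment. Here is a sketch using only the standard json module, assuming get_serialized() returns a JSON string (if it returns a dict, skip the loads/dumps round trip):

CODE
import json

# Sketch: change the bit width in the serialized config, e.g. 4 -> 8 bits.
config = json.loads(regularizer_json)
config["config"]["num_bits"] = 8
regularizer_json = json.dumps(config)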

There are a few ways to attach regularizers to the model layers. Here, we will use the following list_tf_keras_model call to generate a regularizer attachment scheme JSON which specifies attaching the same regularizer JSON we created above to every layer in the model:

CODE
regularizer_attachment_scheme = utils.list_tf_keras_model(
    model,
    return_json=True,
    attach_regularizer_to_all=regularizer_json
)

The resulting regularizer_attachment_scheme lists each layer of the model together with the regularizer configuration to attach to it.
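To inspect the scheme, you can pretty-print it. This sketch handles either a JSON string or an already-parsed structure, since we do not assume which form list_tf_keras_model returns:

CODE
import json

scheme = regularizer_attachment_scheme
if isinstance(scheme, str):  # parse if the scheme was returned as a JSON string
    scheme = json.loads(scheme)
print(json.dumps(scheme, indent=2))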

Next we use that regularizer attachment scheme to attach the regularizers to the model:

CODE
model = utils.attach_regularizers(
    model,
    regularizer_attachment_scheme,
    target_keras_h5_file=None,
    backend_session_reset=True)

This next step is also optional: we set up a quantization monitoring callback so that the training process posts updates about the quantization loss:

CODE
quantization_guided_callback = QGMonitoringCallback(epoch_begin=True, epoch_end=True)

Train

Next we compile and train the model:

CODE
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy'])

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir="./log_dir",
    histogram_freq=0,
    write_graph=True,
    write_images=False,
    update_freq="epoch",
    profile_batch=0,
    embeddings_freq=0,
    embeddings_metadata=None,
)

model.fit(
    train_images,
    train_labels,
    batch_size=256,
    epochs=10,
    verbose=1,
    validation_split=0.1,
    callbacks=[tensorboard_callback, quantization_guided_callback],
)
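The TensorBoard logs written to ./log_dir can then be viewed with the standard TensorBoard CLI:

CODE
tensorboard --logdir ./log_dir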

Quantize

The next step is to apply the learned quantization parameters to the model’s weights. This is done by calling the apply_quantization utility function:

CODE
utils.apply_quantization(model)

Evaluate

Now let’s take a look at the accuracy of the resulting quantized model!

CODE
eval_results = model.evaluate(
    test_images,
    test_labels,
    batch_size=1,
    verbose=1)
print("loss", round(float(eval_results[0]), 5))
print("accuracy", round(float(eval_results[1]), 5))

Save Model

To save the model to disk, we can just use the Keras model.save() method:

CODE
model.save(
    "trained_quantized_lenet.h5",
    overwrite=True,
    include_optimizer=False,
    save_format="h5",
)

But note that when loading the model from disk, we’ll need to supply the following classes in custom_objects so that Keras can deserialize them from disk:

CODE
tf.keras.backend.clear_session()
model = tf.keras.models.load_model(
    "trained_quantized_lenet.h5",
    custom_objects={
        "QuantizationGuidedRegularizer": QuantizationGuidedRegularizer,
        "QuantizationGuidedConstraint": QuantizationGuidedConstraint
    }, compile=False)
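As a quick sanity check (a sketch), you can run a prediction with the reloaded model:

CODE
import numpy as np

# Predict the class of the first test image with the reloaded model
preds = model.predict(test_images[:1])
print(np.argmax(preds, axis=-1))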

As an alternative, we could have called utils.strip_model_for_inference() to strip the model of the custom objects that are not needed for inference. Note, however, that further Quantization Guided Training of such a stripped model would require re-attaching the regularizer objects, training, and calling utils.apply_quantization() again.

CODE
model = utils.strip_model_for_inference(model)
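After stripping, the model no longer carries the custom objects, so (as a sketch, with a hypothetical filename) it can be saved and later reloaded without supplying custom_objects:

CODE
# "stripped_quantized_lenet.h5" is a hypothetical filename for this sketch
model.save("stripped_quantized_lenet.h5", include_optimizer=False, save_format="h5")
model = tf.keras.models.load_model("stripped_quantized_lenet.h5", compile=False)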

