Quantization Guided Training (QGT)
This tutorial documents the use of the Latent AI SDK for performing Quantization Guided Training (QGT) on a model of your choice. What follows is example usage of the classes provided by the Latent AI SDK for quantized training within the TensorFlow 2.x Keras framework. Note that only the Keras .h5 format is currently supported. This example constructs a LeNet model, trains it on MNIST, and quantizes the resulting model to 4 bits.
For information on using the version of QGT for PyTorch, see LEIP Quantization Guided Training for PyTorch.
Note: if you would prefer to use the leip train command to simplify most of the steps of integrating with the Latent AI SDK for Quantization Guided Training, see the LEIP Train page.
Example
In this example we will import the following modules:
import tensorflow as tf
import leip.core.train.utilities as utils
from leip.core.train.regularizers.quantization_guided_regularizer import QuantizationGuidedRegularizer
from leip.core.train.constraints.quantization_guided_constraint import QuantizationGuidedConstraint
from leip.core.train.quantization_guided_callback import QGMonitoringCallback
Load Data
Now let’s load the MNIST training and test data and normalize the images to floating point:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
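As a quick optional sanity check, we can confirm the expected MNIST shapes and the normalized value range:
print(train_images.shape, test_images.shape)   # (60000, 28, 28) (10000, 28, 28)
print(train_images.min(), train_images.max())  # 0.0 1.0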
Build Model
Next we will construct a LeNet model using the following Keras API calls:
tf.keras.backend.clear_session()
model = tf.keras.Sequential(name="LeNet-5",)
model.add(tf.keras.layers.InputLayer(input_shape=(28, 28)))
model.add(tf.keras.layers.Reshape(target_shape=(28, 28, 1)))
model.add(tf.keras.layers.Conv2D(filters=6, kernel_size=(3, 3), activation='relu',))
model.add(tf.keras.layers.AveragePooling2D())
model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), activation='relu',))
model.add(tf.keras.layers.AveragePooling2D())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=120, activation='relu',))
model.add(tf.keras.layers.Dense(units=84, activation='relu',))
model.add(tf.keras.layers.Dense(units=10, activation='softmax',))
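At this point it can be useful to print a model summary, both to sanity-check the architecture and to see the layer names that later steps will operate on:
model.summary()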
Batchnorm Folding
This step is optional. Folding batchnorm layers can significantly improve a model's inference latency. To accurately simulate the quantization effects in the folded model, we need to apply batchnorm folding and transform the graph for training as well. (The LeNet model above contains no batchnorm layers, so folding would be a no-op here; the step is shown for models that do.) To enable this optional step we would do the following:
# fold_weights and strip_batchnorms are helpers provided by the Latent AI SDK;
# their import is not shown in this example.
folded_model = fold_weights(model)
model = strip_batchnorms(model, folded_model)
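For intuition, folding merges a batchnorm layer's learned scale and shift, together with its running statistics, into the weights and bias of the preceding convolution. Below is a minimal numpy sketch of the arithmetic, for illustration only; fold_weights and strip_batchnorms do this work for you:
import numpy as np

def fold_conv_bn(kernel, bias, gamma, beta, moving_mean, moving_var, eps=1e-3):
    # Per-output-channel rescaling derived from the batchnorm statistics.
    scale = gamma / np.sqrt(moving_var + eps)
    folded_kernel = kernel * scale                     # broadcasts over the output-channel axis
    folded_bias = (bias - moving_mean) * scale + beta
    return folded_kernel, folded_bias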
Attach Regularizers
To quantize the model during training, we simply need to attach regularizer declarations and their parameter values to the model. During training, these regularizers and quantization parameters guide the model towards weights that can be represented more compactly.
The leip.core.train.regularizers.quantization_guided_regularizer.QuantizationGuidedRegularizer class encapsulates a training regularizer used for Quantization Guided Training of the model. This class also provides quantization during the training process. We will use the asymmetric quantization algorithm and quantize to 4 bits. Here let's create the JSON from a newly constructed QuantizationGuidedRegularizer:
regularizer_json = QuantizationGuidedRegularizer(
num_bits=4,
lambda_1=1.0, # not used by QuantizationGuidedRegularizer
lambda_2=1.0, # regularizes quantization error
lambda_3=0.0, # L2 regularization
lambda_4=1.0, # not used by QuantizationGuidedRegularizer
lambda_5=1.0, # not used by QuantizationGuidedRegularizer
quantizer_name="asymmetric").get_serialized()
The snippet of JSON created by the above call is shown below:
{
"class_name": "QuantizationGuidedRegularizer",
"config": {
"num_bits": 4,
"lambda_1": 1.0,
"lambda_2": 1.0,
"lambda_3": 0.0,
"lambda_4": 1.0,
"lambda_5": 1.0,
"quantizer_name": "asymmetric"
}
}
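For intuition about what the asymmetric quantizer does during training, here is a minimal numpy sketch of 4-bit asymmetric (fake) quantization. This illustrates the general technique only; the SDK's internal quantizer implementation may differ in detail:
import numpy as np

def asymmetric_fake_quantize(w, num_bits=4):
    # Map the observed [min, max] range of w onto the integer grid
    # [0, 2**num_bits - 1], then map back to float ("fake quantization").
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard against constant tensors
    zero_point = round(qmin - w_min / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

# With 4 bits there are at most 16 representable values per tensor:
w = np.random.randn(5, 5).astype(np.float32)
print(np.unique(asymmetric_fake_quantize(w)).size)  # <= 16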
There are a few ways to attach regularizers to the model layers. Here, we will use the following list_tf_keras_model call to generate a regularizer attachment scheme JSON which specifies attaching the same regularizer JSON we created above to every layer in the model:
regularizer_attachment_scheme = utils.list_tf_keras_model(
model,
return_json=True,
attach_regularizer_to_all=regularizer_json
)
To take a peek at what the regularizer_attachment_scheme JSON looks like, you can pretty-print it (a minimal sketch; the exact return type may vary by SDK version):
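import json

# The scheme may be returned as a JSON string or an already-parsed
# structure depending on the SDK version (assumption); handle both.
scheme = regularizer_attachment_scheme
if isinstance(scheme, str):
    scheme = json.loads(scheme)
print(json.dumps(scheme, indent=2))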
Next we use that regularizer attachment scheme to attach the regularizers to the model:
model = utils.attach_regularizers(
model,
regularizer_attachment_scheme,
target_keras_h5_file=None,
backend_session_reset=True)
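Optionally, we can spot-check that the regularizers were attached. This assumes attach_regularizers places the regularizer on each weighted layer's standard Keras kernel_regularizer attribute, which may not hold in every SDK version:
for layer in model.layers:
    reg = getattr(layer, "kernel_regularizer", None)
    if reg is not None:
        print(layer.name, type(reg).__name__)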
Although this next step is also optional, here we will set up a quantization monitoring callback so that the training process reports updates about the quantization loss:
quantization_guided_callback = QGMonitoringCallback(epoch_begin=True, epoch_end=True)
Train
Next we compile and train the model:
model.compile(
optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['accuracy'])
tensorboard_callback = tf.keras.callbacks.TensorBoard(
log_dir="./log_dir",
histogram_freq=0,
write_graph=True,
write_images=False,
update_freq="epoch",
profile_batch=0,
embeddings_freq=0,
embeddings_metadata=None,)
model.fit(
train_images,
train_labels,
batch_size=256,
epochs=10,
verbose=1,
validation_split=0.1,
callbacks=[tensorboard_callback, quantization_guided_callback],
)
Quantize
Finally, the next step is to apply the learned quantization parameters to the model. This is done by calling the apply_quantization utility function:
utils.apply_quantization(model)
Evaluate
Now let’s take a look at the accuracy of the resulting quantized model!
eval_results = model.evaluate(
test_images,
test_labels,
batch_size=1,
verbose=1)
print("loss", round(float(eval_results[0]), 5))
print("accuracy", round(float(eval_results[1]), 5))
Save Model
To save the model to disk, we can just use the Keras model.save() method:
model.save(
"trained_quantized_lenet.h5",
overwrite=True,
include_optimizer=False,
save_format="h5",)
But note that when loading the model from disk, we'll need to supply the following classes in custom_objects so that Keras can deserialize them:
tf.keras.backend.clear_session()
model = tf.keras.models.load_model(
"trained_quantized_lenet.h5",
custom_objects={
"QuantizationGuidedRegularizer": QuantizationGuidedRegularizer,
"QuantizationGuidedConstraint": QuantizationGuidedConstraint
}, compile=False)
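After loading, a quick sanity check confirms the quantized model still makes sensible predictions (model.predict works without compiling):
predictions = model.predict(test_images[:5])
print(predictions.argmax(axis=1))  # predicted digit classes
print(test_labels[:5])             # ground-truth labels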
As an alternative, we could have called utils.strip_model_for_inference() to strip the model of the custom objects, which are not needed for inference. Note that in that case, further Quantization Guided Training of this model would require re-attaching the regularizer objects, training, and calling utils.apply_quantization() again.
model = utils.strip_model_for_inference(model)
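Assuming the stripped model no longer contains any SDK-specific classes, it can then be saved and later reloaded without supplying custom_objects:
# "trained_quantized_lenet_inference.h5" is a hypothetical filename for the
# inference-only copy of the model.
model.save(
    "trained_quantized_lenet_inference.h5",
    overwrite=True,
    include_optimizer=False,
    save_format="h5",)
model = tf.keras.models.load_model(
    "trained_quantized_lenet_inference.h5",
    compile=False)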