Optimize a Quantized Model for an Android Target¶
Mobile phones are ubiquitous edge devices with AI capabilities, and many of them run the Android operating system. Quantized models are popular on mobile devices because they deliver faster inference and lower energy consumption. A modest accuracy trade-off is generally acceptable on these devices in exchange for real-time inference, and their hardware commonly provides optimized integer execution. This tutorial provides step-by-step instructions for importing a quantized mobile model from Kaggle into Forge, applying optimizations, and creating an artifact ready for deployment.
Environment setup¶
We will be using Kaggle to download a quantized mobile device object detection model.
! pip install kagglehub
! pip install tflite
Download a model and compile it for Android¶
We will download the model from Kaggle and load it with TensorFlow Lite (tflite).
import kagglehub
# Download latest version
path = kagglehub.model_download("iree/ssd-mobilenet-v1/tfLite/100-320-uint8-nms")
print("Path to model files:", path)
from pathlib import Path
import tflite
tflite_model_file = Path(path) / "1.tflite"
# Read the raw flatbuffer and parse it with the tflite bindings.
with open(tflite_model_file, "rb") as f:
    tflite_model_buf = f.read()
tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)
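As a quick sanity check, we can introspect the parsed flatbuffer. The accessors below come from the FlatBuffers-generated Python API that the tflite package exposes; the exact method set can vary by package version.
# Sanity-check the parsed flatbuffer using the generated accessors.
print("TFLite schema version:", tflite_model.Version())
print("Subgraphs:", tflite_model.SubgraphsLength())
print("Operators in main subgraph:", tflite_model.Subgraphs(0).OperatorsLength())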
Then we will load it into Forge for optimization.
import forge
ir = forge.RelayModule.from_tflite(tflite_model)
You can use the forge.IRModule class to inspect the model.
We can see that this model expects uint8 inputs.
ir.input_dtypes
We can also verify that the model is quantized.
ir.is_quantized
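If your build pipeline should only ever compile quantized checkpoints, the flag above also lends itself to a simple guard; this is just a sketch using the attribute shown in this tutorial.
# Fail fast if the downloaded model is not quantized as expected.
assert ir.is_quantized, "expected a quantized model"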
We will set the target and create a directory to save the compiled output.
target = "android/cpu"
import os
optimized_model_dir = "detector_quantized"
if not os.path.exists(optimized_model_dir):
    os.makedirs(optimized_model_dir)
Now you can compile the model.
ir.compile(target=target, output_path=optimized_model_dir, force_overwrite=True, export_metadata=True)
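To confirm that the compiled artifact and its metadata were written, you can list the output directory; the exact file names depend on your Forge version and target.
print(sorted(os.listdir(optimized_model_dir)))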
The compiled model is all you need to run inference with the Android LRE.
If you are creating your own Android application using our SDK, this compiled model library is the only artifact you need.
Note that our sample application does not support this model out of the box: you will need to modify the application to pass integer input data and to expect integer outputs. The rest of the application behaves the same as in the non-quantized model tutorial.
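For example, the preprocessing step must hand the model raw uint8 pixels rather than normalized floats. The sketch below is a minimal Python illustration; the 320x320 input size and NHWC batch layout are assumptions, so check the exported metadata for your model's exact input specification.
import numpy as np
from PIL import Image

def load_uint8_input(image_path, size=(320, 320)):
    # Quantized models consume raw 0-255 pixel values directly;
    # skip the float normalization used for float32 models.
    image = Image.open(image_path).convert("RGB").resize(size)
    return np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)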