Quick Start Guide

Basic Workflow

The typical workflow with Forge consists of four main steps:

1. Select your Target

Choose the appropriate backend based on your deployment target:

  • NVIDIA GPU → Use the ONNXRuntime backend (forge.ONNXModule) for efficient GPU execution.
  • Android or CPU → Use the TVM backend (forge.RelayModule) for hardware-aware optimizations and better CPU performance.

2. Load your Model

  • Ingest your trained model into the TVM or ONNX backend chosen in step 1 (see the sketch after this list).
  • Convert to Forge's internal representation (IR).
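
As a minimal sketch of this step, loading looks like this with either backend. forge.ONNXModule, forge.RelayModule.from_onnx, and forge.RelayModule.from_tflite are the same entry points used in the full examples below; the file path and model variables are placeholders you supply:

import forge

# ONNX backend: load an ONNX model directly from a file path
ir_onnx = forge.ONNXModule("path/to/your/model.onnx")

# TVM backend: build a RelayModule from an already-loaded ONNX or TFLite model
ir_relay = forge.RelayModule.from_onnx(onnx_model)             # onnx_model: your trained ONNX model
ir_relay_tflite = forge.RelayModule.from_tflite(tflite_model)  # tflite_model: your trained TFLite model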

3. Quantize (Optional)

  • Apply quantization to reduce model size.
  • Calibrate using representative data.
  • Choose between static and dynamic quantization (see the sketch after this list).
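
The code examples further below all use static quantization, which requires a calibration pass. As a hedged sketch, dynamic quantization with the same quantize API might look like the following; quant_type="dynamic" is an assumed value inferred from the static/dynamic choice above, so check your Forge version for the exact spelling:

import forge

ir = forge.ONNXModule("path/to/your/model.onnx")

# Dynamic quantization: activation ranges are computed at runtime,
# so no calibrate() pass is needed. quant_type="dynamic" is an assumption.
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="dynamic")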

4. Compile/Export

  • Compile the model for your target hardware (TVM backend) or export the optimized model (ONNX backend), as shown in the code examples below.

Code Examples for Basic Optimization

TensorRT

import forge

# 1. Load ONNX model
ir = forge.ONNXModule("path/to/your/model.onnx")

# 2. Optimize with quantization
ir.calibrate(calibration_dataset)  # calibration_dataset: representative input samples you provide
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="static")

# 3. Export
ir.export("optimized_model.onnx", is_tensorrt=True)
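
Assuming the exported file is a standard ONNX model (as its .onnx extension suggests), one way to smoke-test it on an NVIDIA GPU is through ONNX Runtime with the TensorRT execution provider. The input shape handling and float32 dtype below are illustrative assumptions:

import numpy as np
import onnxruntime as ort

# Prefer TensorRT, then CUDA, then CPU if the earlier providers are unavailable
sess = ort.InferenceSession(
    "optimized_model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Build a dummy input from the model's reported shape (dynamic dims replaced with 1)
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.zeros(shape, dtype=np.float32)  # float32 input is an assumption

outputs = sess.run(None, {inp.name: dummy})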

Android

import forge

# 1. Load model
ir = forge.RelayModule.from_tflite(tflite_model)  # tflite_model: your trained TFLite model

# 2. Compile for Android
target = "android/cpu"
ir.compile(target=target, output_path="android_optimized")
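
Quantization is not shown here, but if you want it on Android as well, the same calibrate/quantize calls used in the Generic CPU example below should apply before compiling. This is a sketch that combines the two documented snippets rather than a separate API:

import forge

# Load, quantize, then compile for Android in one pass
ir = forge.RelayModule.from_tflite(tflite_model)  # tflite_model: your trained TFLite model
ir.calibrate(calibration_dataset)                 # calibration_dataset: representative inputs
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="static")
ir.compile(target="android/cpu", output_path="android_quantized")  # output_path is a placeholder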

Generic CPU

import forge

# 1. Load your model
ir = forge.RelayModule.from_onnx(onnx_model)  # onnx_model: your trained ONNX model

# 2. Optimize with quantization (optional)
ir.calibrate(calibration_dataset)
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="static")

# 3. Compile
target = "llvm"  # or specific CPU target like "llvm -mcpu=cascadelake"
ir.compile(target=target, output_path="cpu_optimized")

Next Steps