Quick Start Guide
Basic Workflow
The typical workflow with Forge consists of these main steps:
1. Select your Target
Choose the appropriate backend based on your deployment target:
- NVIDIA GPU → use the ONNX Runtime backend, forge.ONNXModule, for efficient GPU execution.
- Android or CPU → use the TVM backend, forge.RelayModule, for hardware-aware optimizations and better CPU performance.
2. Load your Model
- Ingest your trained model into the TVM or ONNX backend.
- Convert to Forge's internal representation (IR).
3. Quantize (Optional)
- Apply quantization to reduce model size.
- Calibrate using representative data.
- Choose between static and dynamic quantization; a dynamic-quantization sketch follows this list.
4. Compile/Export
- For TVM, select your target and compile your model.
- For ONNX, choose whether to target TensorRT and export your model.
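If you do not have calibration data, dynamic quantization is an alternative. The snippet below is a minimal sketch; it assumes that quant_type also accepts "dynamic" and that dynamic quantization skips the calibrate() step, so check your Forge version's API for the exact behavior.
import forge
# Load the ONNX model as in the examples below
ir = forge.ONNXModule("path/to/your/model.onnx")
# Dynamic quantization: weights are quantized ahead of time and activations at
# runtime, so no calibration dataset is needed (assumption about this API)
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="dynamic")
# Export without TensorRT-specific handling
ir.export("dynamic_quantized_model.onnx", is_tensorrt=False)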
Code Examples for Basic Optimization
TensorRT
import forge
# 1. Load ONNX model
ir = forge.ONNXModule("path/to/your/model.onnx")
# 2. Optimize with quantization
ir.calibrate(calibration_dataset)
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="static")
# 3. Export
ir.export("optimized_model.onnx", is_tensorrt=True)
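In the example above, calibration_dataset is a placeholder for representative inputs. Its exact format depends on Forge's calibrate() API; the sketch below simply assumes an iterable of NumPy batches with a hypothetical 1x3x224x224 image input shape. In practice, draw the samples from your real validation data rather than random values.
import numpy as np
# Hypothetical calibration set: a list of input batches matching the model's
# input shape (here assumed to be 1x3x224x224)
calibration_dataset = [
    np.random.rand(1, 3, 224, 224).astype("float32") for _ in range(100)
]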
Android
import forge
# 1. Load model
ir = forge.RelayModule.from_tflite(tflite_model)
# 2. Compile for Android
target = "android/cpu"
ir.compile(target=target, output_path="android_optimized")
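Here tflite_model is a placeholder for an already-parsed TFLite model. Assuming from_tflite expects a parsed TFLite model object, as TVM's Relay frontend does, it could be prepared like this:
import tflite
# Read the TFLite flatbuffer and parse it into a model object (assumption:
# from_tflite accepts a tflite.Model, mirroring TVM's Relay frontend)
with open("path/to/your/model.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)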
Generic CPU
import forge
# 1. Load your model
ir = forge.RelayModule.from_onnx(onnx_model)
# 2. Optimize with quantization (optional)
ir.calibrate(calibration_dataset)
ir.quantize(activation_dtype="uint8", kernel_dtype="uint8", quant_type="static")
# 3. Compile
target = "llvm" # or specific CPU target like "llvm -mcpu=cascadelake"
ir.compile(target=target, output_path="cpu_optimized")
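As above, onnx_model is a placeholder. Assuming from_onnx accepts an ONNX ModelProto, as TVM's Relay frontend does, it can be loaded with the onnx package:
import onnx
# Load the serialized ONNX graph into a ModelProto (assumption: from_onnx takes
# a ModelProto, mirroring TVM's Relay frontend)
onnx_model = onnx.load("path/to/your/model.onnx")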
Next Steps
- Explore detailed tutorials.
- Consult the API documentation.
- Learn about deployment options.