Guide to Compilation with Forge¶
This guide shows how to compile a Forge RelayModule for a range of targets.
Load a RelayModule¶
import forge
import onnx
onnx_model = onnx.load("path/to/model.onnx")
ir = forge.RelayModule.from_onnx(onnx_model)
Compiling¶
A Forge RelayModule is compiled with its compile() method. The main argument a user needs to pass is the target designation (detailed below). The method's type signature and docstring are reproduced below for reference.
RelayModule Compile Method Docstring
forge.RelayModule.compile¶
compile(
target="llvm",
host=None,
output_path="./compile_output",
opt_level=3,
set_float16=False,
set_channel_layout=None,
export_relay=False,
export_metadata=False,
force_overwrite=False,
uuid=None,
encrypt_password=None,
)
Compiles the model for a specified target with various configuration options.
This method compiles the model for a given target, which can be a string or a dictionary specifying the target attributes. The compilation can be customized through various parameters, including optimization level and data type settings.
Parameters:

Name | Type | Description | Default
---|---|---|---
target | Union[str, Dict[str, Any]] | Can be one of a literal target string, a target tag (pre-defined target alias), a JSON string describing a configuration, or a dictionary of configuration options. When using a dictionary or JSON string to configure the target, the possible keys are listed below this table. | 'llvm'
host | Optional[Union[str, Dict[str, Any]]] | Similar to target but for the target host. Can be one of a literal target host string, a target tag (pre-defined target alias), a JSON string describing a configuration, or a dictionary of configuration options. When using a dictionary or JSON string, the possible keys are the same as for target. | None
output_path | Optional[Union[str, Path]] | The path to save the compiled output. | './compile_output'
opt_level | int | Optimization level, ranging from 0 to 4. Larger numbers correspond to more aggressive compilation optimizations. Default is 3. | 3
set_float16 | bool | If True, enables the Float16 data type for all operators that permit it. Default is False. | False
set_channel_layout | Optional[str] | Optional specification of the channel layout ("first" or "last"); if None, the layout is left unchanged. | None
export_relay | bool | If True, exports the Relay text representation of the model. Default is False. | False
export_metadata | bool | If True, exports the metadata JSON of the model as a text file. Default is False. | False
force_overwrite | bool | If True, the method will overwrite the output path if it already exists. A ValueError will be thrown if False and the output path already exists. Default is False. | False
uuid | Optional[str] | Optional user-supplied UUID for when the model needs a unique identifier set by the user; when not provided, a randomly generated UUID is used. | None
encrypt_password | Optional[str] | Optional password if it is desirable to have the model encrypted. The output will contain both the model file and the key. | None

Target configuration keys (when passing a dictionary or JSON string):

kind : str (required). Which codegen path to use, for example "llvm" or "cuda".
keys : List of str (optional). A set of strategies that can be dispatched to. When using kind="opencl", for example, one could set keys to ["mali", "opencl", "gpu"].
device : str (optional). A single key that corresponds to the actual device being run on. This will effectively be appended to keys.
libs : List of str (optional). The set of external libraries to use, for example ["cblas", "mkl"].
system-lib : bool (optional). If True, build a module that contains self-registered functions. Useful for environments where dynamic loading such as dlopen is banned.
mcpu : str (optional). The specific CPU being run on. Serves only as an annotation.
model : str (optional). An annotation indicating what model a workload came from.
runtime : str (optional). An annotation indicating which runtime to use with a workload.
mtriple : str (optional). The LLVM triple describing the target, for example "arm64-linux-android".
mattr : List of str (optional). The LLVM features to compile with, for example ["+avx512f", "+mmx"].
mfloat-abi : str (optional). An LLVM setting that is one of "hard" or "soft", indicating whether to use hardware or software floating-point operations.
mabi : str (optional). An LLVM setting. Generate code for the specified ABI, for example "lp64d".
host : Union[str, Dict[str, Any]] (optional). Description of the target host. Can be recursive. Similar to target.
Returns:

Name | Type | Description
---|---|---
None | None | The method operates in place.
Compilation Arguments
Compilation is dictated by the passed options; below are details on the corresponding arguments and their effects on compilation.
target
: A string or dictionary that denotes the targeted hardware for compilation; see the compile() docstring above for the full set of configuration options. A simpler method is to leverage the pre-defined hardware "tags" (aliases). A dictionary-form sketch is shown at the end of this argument list.
Target Tags
There are pre-defined hardware tags (aliases) that can greatly simplify passing the target to the compiler. For example, one only needs to pass "raspberry-pi/5" instead of its detailed target description, "llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -mattr=+neon -num-cores=4". Tags are broken out into four categories: CUDA, x86, ARM, and Android. See the lists of tags with the APIs provided below; a short example of listing tags follows the API reference.
CUDA Tags
forge.list_cuda_tags¶
list_cuda_tags(verbose=False)
List all tags (pre-defined aliases) of CUDA targets.

Parameters:

Name | Type | Description | Default
---|---|---|---
verbose | bool | A flag to return all the tags with their corresponding target string literals for the TVM compiler if True. Default is False. | False

Returns:

Type | Description
---|---
Union[List[str], List[Tuple[str, TargetHost]]] | A list of tags
x86 Tags
forge.list_x86_tags¶
list_x86_tags(verbose=False)
List all tags (pre-defined aliases) of x86 targets.

Parameters:

Name | Type | Description | Default
---|---|---|---
verbose | bool | A flag to return all the tags with their corresponding target string literals for the TVM compiler if True. Default is False. | False

Returns:

Type | Description
---|---
Union[List[str], List[Tuple[str, TargetHost]]] | A list of tags
ARM Tags
forge.list_arm_tags¶
list_arm_tags(verbose=False)
List all tags (pre-defined aliases) of ARM targets.

Parameters:

Name | Type | Description | Default
---|---|---|---
verbose | bool | A flag to return all the tags with their corresponding target string literals for the TVM compiler if True. Default is False. | False

Returns:

Type | Description
---|---
Union[List[str], List[Tuple[str, TargetHost]]] | A list of tags
Android Tags
forge.list_android_tags¶
list_android_tags(verbose=False)
List all tags (pre-defined aliases) for Android targets.

Parameters:

Name | Type | Description | Default
---|---|---|---
verbose | bool | A flag to return all the tags with their corresponding target string literals for the TVM compiler if True. Default is False. | False

Returns:

Type | Description
---|---
Union[List[str], List[Tuple[str, TargetHost]]] | A list of tags
All Tags
forge.list_target_tags¶
list_target_tags(verbose=False)
List all tags (pre-defined aliases) of all targets.

Parameters:

Name | Type | Description | Default
---|---|---|---
verbose | bool | A flag to return all the tags with their corresponding target string literals for the TVM compiler if True. Default is False. | False

Returns:

Type | Description
---|---
Union[List[str], List[Tuple[str, TargetHost]]] | A list of tags
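For instance, the tag lists can be inspected before choosing a target. A minimal sketch using the APIs documented above (assuming the verbose form yields (tag, target) tuples, as the return type indicates):
# Print every pre-defined target tag (names only)
print(forge.list_target_tags())

# Print CUDA tags together with their underlying TVM target descriptions
for tag, target in forge.list_cuda_tags(verbose=True):
    print(tag, "->", target)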
host
: An optional string or dictionary that denotes the host hardware containing the targeted hardware. This is relevant for multi-target compilation, e.g. GPU + CPU.
output_path
: A directory to write the compiled artifact to. If the directory does not exist, it will be created. If the directory already exists, a new, numbered directory will be created.
opt_level
: An optimization level (0 to 4) leveraged by the compiler, where the highest level, 4, corresponds to the most aggressive optimizations.
set_float16
: An option that will convert any float32 nodes into float16 nodes (operator-permitting). This option is ignored for TensorRT compilation.
set_channel_layout
: The data and kernel layout of the model can have a major impact on the final inference latency. There are two channel-layout options: "first" or "last". For quantized models, it is generally recommended that one compile with a channel-last layout. If set, this option will convert the model's layouts to maximize either channel-first or channel-last compute. This option is ignored for TensorRT compilation, which defaults to channel-first.
export_relay
: This flag will save a text file of the Relay text representation of the model to the designated output_path.
export_metadata
: This flag will save a JSON text file of the metadata leveraged by the Latent Runtime Engine.
force_overwrite
: This flag, when set to True, will force an overwrite of the output path if it already exists. Otherwise, it will raise a ValueError.
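As a sketch of the dictionary form of target, the configuration below reuses the keys documented in the compile() docstring; the specific kind, mtriple, mcpu, and mattr values are illustrative (borrowed from the raspberry-pi/5 tag above), not prescriptive.
# Illustrative dictionary-form target; adapt the values to your hardware.
target_config = {
    "kind": "llvm",                  # codegen path
    "mtriple": "aarch64-linux-gnu",  # LLVM target triple
    "mcpu": "cortex-a76",            # CPU annotation
    "mattr": ["+neon"],              # LLVM feature flags
}
ir.compile(target=target_config, force_overwrite=True)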
Example Code¶
# compile for CPU
ir.compile()
# compile for CPU, set channel, and export Relay as a text file
ir.compile(set_channel_layout="first", export_relay=True, force_overwrite=True)
# compile for GPU (targets host GPU)
ir.compile(target="cuda", force_overwrite=True)
# compile for GPU and/or CPU with explicit target strings (gives control to target a specific CPU or GPU)
ir.compile(target="cuda -arch=sm_86", host="llvm -mcpu=skylake", force_overwrite=True)
# compile for GPU with host CPU details (provides CPU acceleration for model sections mapped to CPU)
ir.compile(target="cuda", host="llvm -mcpu=skylake", force_overwrite=True)
# compile for Raspberry Pi using a hardware tag
ir.compile(target="raspberry-pi/5", force_overwrite=True)
# compile for Android SoC using a hardware tag
ir.compile(target="android/cpu", force_overwrite=True)
# compile for CPU with float16
ir.compile(set_float16=True, force_overwrite=True)
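# compile for CPU at the most aggressive optimization level and export the runtime metadata JSON
# (illustrative addition; opt_level and export_metadata are described in the arguments above)
ir.compile(opt_level=4, export_metadata=True, force_overwrite=True)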
Using TensorRT
Use forge.ONNXModule to compile with TensorRT.
Compiling with Custom UUID¶
All model artifacts contain metadata. By default, each model is assigned a randomly generated UUID, but the user can define and assign a custom UUID to the compiled model artifact using the API:
ir.compile(
target="cuda",
uuid="123e4567-e89b-12d3-a456-426614174000",
)
Compilation Artifacts¶
The compiled object will be a .so file placed in the designated output_path directory, along with optional Relay-text and runtime-metadata text/JSON files.
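As an illustration, the contents of the default output directory can be listed with standard Python tooling (a minimal sketch; the exact file names depend on the options passed to compile()):
from pathlib import Path

# List the artifacts produced by compile(): the compiled .so, plus any optional
# Relay text or metadata JSON files that were requested.
for artifact in Path("./compile_output").iterdir():
    print(artifact.name)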
Compiling and Encrypting the Artifact¶
The compile API can also generate an encrypted compiled object. An example of the API usage is shown below:
ir.compile(
target="cuda",
output_path=output_path,
force_overwrite=True,
encrypt_password="test_password",
)
Encryption Time
Be aware that encrypting the model will result in a longer compilation time.
Encryption will generate an additional file, such that the output_path will contain not only the .so file but also a .bin key, which needs to be provided at runtime.
from pylre import LatentRuntimeEngine
lre = LatentRuntimeEngine(f"{output_path}/model_library.so", f"{output_path}/modelKey.bin", "test_password")
If the key file is missing or the password is wrong, the runtime will report errors such as:
[05:04:55] /app/src/runtime/latentai/lre.cpp:166: Model looks encrypted but no key provided
[05:04:55] /app/src/runtime/latentai/lre_cryption_service.cpp:66: Corrupted chunk encountered while decrypting the key, wrong password or corrupted key file