Guide to Compilation with Forge

This guide will show you how to compile the Forge RelayModule for a range of targets.

Load a RelayModule

import forge
import onnx

onnx_model = onnx.load("path/to/model.onnx")
ir = forge.RelayModule.from_onnx(onnx_model)


Compiling

A Forge RelayModule can be compiled with its compile() method. The main argument a user needs to pass is the target designation (detailed below). The method's type signature and docstring are reproduced below for reference.

RelayModule Compile Method Docstring

forge.RelayModule.compile

compile(
    target="llvm",
    host=None,
    output_path="./compile_output",
    opt_level=3,
    set_float16=False,
    set_channel_layout=None,
    export_relay=False,
    export_metadata=False,
    force_overwrite=False,
    uuid=None,
    encrypt_password=None,
)

Compiles the model for a specified target with various configuration options.

This method compiles the model for a given target, which can be a string or a dictionary specifying the target attributes. The compilation can be customized through various parameters, including optimization level and data type settings.

Parameters:

- target (Union[str, Dict[str, Any]], default "llvm"): Can be a literal target string, a target tag (pre-defined target alias), a JSON string describing a configuration, or a dictionary of configuration options. When using a dictionary or JSON string to configure the target, the possible values are:
  - kind (str, required): Which codegen path to use, for example "llvm" or "cuda".
  - keys (List of str, optional): A set of strategies that can be dispatched to. When using kind="opencl", for example, one could set keys to ["mali", "opencl", "gpu"].
  - device (str, optional): A single key that corresponds to the actual device being run on. This is effectively appended to keys.
  - libs (List of str, optional): The set of external libraries to use, for example ["cblas", "mkl"].
  - system-lib (bool, optional): If True, build a module that contains self-registered functions. Useful for environments where dynamic loading such as dlopen is banned.
  - mcpu (str, optional): The specific CPU being run on. Serves only as an annotation.
  - model (str, optional): An annotation indicating what model a workload came from.
  - runtime (str, optional): An annotation indicating which runtime to use with a workload.
  - mtriple (str, optional): The LLVM triple describing the target, for example "arm64-linux-android".
  - mattr (List of str, optional): The LLVM features to compile with, for example ["+avx512f", "+mmx"].
  - mfloat-abi (str, optional): An LLVM setting, one of "hard" or "soft", indicating whether to use hardware or software floating-point operations.
  - mabi (str, optional): An LLVM setting. Generate code for the specified ABI, for example "lp64d".
  - host (Union[str, Dict[str, Any]], optional): Description of the target host. Can be recursive; similar to target.
- host (Optional[Union[str, Dict[str, Any]]], default None): Similar to target but for the target host. Can be a literal target host string, a target tag (pre-defined target alias), a JSON string describing a configuration, or a dictionary of configuration options. When using a dictionary or JSON string, the possible values are the same as for target.
- output_path (Optional[Union[str, Path]], default "./compile_output"): The path to save the compiled output.
- opt_level (int, default 3): Optimization level, ranging from 0 to 4. Larger numbers correspond to more aggressive compilation optimizations.
- set_float16 (bool, default False): If True, enables the Float16 data type for all operators that permit it.
- set_channel_layout (Optional[str], default None): Optional specification of the channel layout ("first" or "last"). If None, the layout is left unchanged.
- export_relay (bool, default False): If True, exports the Relay text representation of the model.
- export_metadata (bool, default False): If True, exports the metadata JSON of the model as a text file.
- force_overwrite (bool, default False): If True, the method will overwrite the provided output path if it already exists. If False and the output path already exists, a ValueError is raised.
- uuid (Optional[str], default None): Optional user-specified UUID for when the model needs a unique identifier set by the user. When not set, a randomly generated UUID is used.
- encrypt_password (Optional[str], default None): Optional password if it is desirable to have the model encrypted. The output will then contain both the model file and the key.

Returns:

- None: The method operates in place.
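To make the dictionary form of target concrete, here is a hedged sketch. The dictionary mirrors the Raspberry Pi 5 target string shown later in this guide ("llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -mattr=+neon"); the exact set of accepted keys should be verified against your Forge version, and the compile() call is commented out because it requires a loaded forge.RelayModule.

```python
# Sketch: a dictionary target equivalent to the string form
# "llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -mattr=+neon".
# Keys follow the docstring above.
target = {
    "kind": "llvm",                  # required: which codegen path to use
    "mtriple": "aarch64-linux-gnu",  # LLVM triple describing the device
    "mcpu": "cortex-a76",            # CPU annotation
    "mattr": ["+neon"],              # LLVM features to compile with
}
# ir.compile(target=target, force_overwrite=True)
print(sorted(target))  # ['kind', 'mattr', 'mcpu', 'mtriple']
```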

Compilation Arguments

The compilation is dictated by the passed options; below are details on the corresponding arguments and their effects on compilation.

target: A string or dictionary that denotes the targeted hardware for compilation. See the compile() docstring above for the full set of configuration options. A simpler method is to leverage the pre-defined hardware "tags" (aliases).

Target Tags

There are pre-defined hardware tags (aliases) that can greatly simplify the passing of the target to the compiler. For example, one only needs to pass "raspberry-pi/5" instead of its detailed target description, "llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -mattr=+neon -num-cores=4". Tags are broken out into four categories: CUDA, x86, ARM, and Android. See lists of tags with the provided APIs.

CUDA Tags

forge.list_cuda_tags

list_cuda_tags(verbose=False)

List all tags (pre-defined aliases) of CUDA targets.

Parameters:

- verbose (bool, default False): If True, returns each tag with its corresponding target string literal for the TVM compiler.

Returns:

- Union[List[str], List[Tuple[str, TargetHost]]]: A list of tags.

x86 Tags

forge.list_x86_tags

list_x86_tags(verbose=False)

List all tags (pre-defined aliases) of x86 targets.

Parameters:

- verbose (bool, default False): If True, returns each tag with its corresponding target string literal for the TVM compiler.

Returns:

- Union[List[str], List[Tuple[str, TargetHost]]]: A list of tags.

ARM Tags

forge.list_arm_tags

list_arm_tags(verbose=False)

List all tags (pre-defined aliases) of ARM targets.

Parameters:

- verbose (bool, default False): If True, returns each tag with its corresponding target string literal for the TVM compiler.

Returns:

- Union[List[str], List[Tuple[str, TargetHost]]]: A list of tags.

Android Tags

forge.list_android_tags

list_android_tags(verbose=False)

List all tags (pre-defined aliases) for Android targets.

Parameters:

- verbose (bool, default False): If True, returns each tag with its corresponding target string literal for the TVM compiler.

Returns:

- Union[List[str], List[Tuple[str, TargetHost]]]: A list of tags.

All Tags

forge.list_target_tags

list_target_tags(verbose=False)

List all tags (pre-defined aliases) of all targets.

Parameters:

- verbose (bool, default False): If True, returns each tag with its corresponding target string literal for the TVM compiler.

Returns:

- Union[List[str], List[Tuple[str, TargetHost]]]: A list of tags.
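To make the tag-to-target relationship concrete, here is a purely illustrative stand-in: a one-entry alias table using the single expansion documented above (raspberry-pi/5). The real mapping lives inside Forge and is queried with the list_*_tags APIs; the helper function below is hypothetical.

```python
# Illustrative only: a tiny stand-in for the alias table inside Forge.
# "raspberry-pi/5" is the one expansion documented in this guide.
TAG_ALIASES = {
    "raspberry-pi/5": (
        "llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 "
        "-mattr=+neon -num-cores=4"
    ),
}

def resolve_target(target: str) -> str:
    """Expand a pre-defined tag to its full target string, if known;
    otherwise pass the string through unchanged."""
    return TAG_ALIASES.get(target, target)

print(resolve_target("raspberry-pi/5"))
print(resolve_target("cuda"))  # unknown strings pass through unchanged
```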

host: An optional string or dictionary that denotes the host hardware containing the targeted hardware. This is relevant for multi-target compilation, e.g. GPU + CPU.

output_path: A directory to write the compiled artifact to. If the directory does not exist, it will be made. If the directory already exists, a directory with a number will be created.
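One way the directory behavior described above could be implemented is sketched below. This is an assumption about the naming scheme (e.g. compile_output_1), not Forge's actual code; the exact suffix Forge uses may differ.

```python
from pathlib import Path

def prepare_output_dir(path: str) -> Path:
    """Sketch of the described output-path behavior: create the directory
    if missing; if it already exists, fall back to a numbered sibling.
    (The naming scheme used by Forge itself may differ.)"""
    base = Path(path)
    candidate, n = base, 0
    while candidate.exists():
        n += 1
        candidate = base.with_name(f"{base.name}_{n}")
    candidate.mkdir(parents=True)
    return candidate
```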

opt_level: An optimization flag leveraged by the compiler, where the highest level of 4 corresponds to the most aggressive optimizations.

set_float16: An option that will convert any float32 nodes into float16 nodes (operator-permitting). This option is ignored for TensorRT compilation.

set_channel_layout: The data and kernel layout of the model can have a major impact on the final inference latency. There are two channel-layout options: "first" or "last". For quantized models, it is generally recommended that one compile with a channel-last layout. If set, this option will convert the model's layouts to maximize either channel-first or channel-last compute. This option is ignored for TensorRT compilation, which defaults to channel-first.
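For intuition, channel-first (NCHW) and channel-last (NHWC) differ only in where the channel dimension sits. The shapes below are illustrative for a typical 224x224 RGB input; they are not produced by Forge itself.

```python
# Channel-first (NCHW): batch, channels, height, width
nchw = (1, 3, 224, 224)

# Channel-last (NHWC): move the channel dimension to the last axis
perm = (0, 2, 3, 1)
nhwc = tuple(nchw[i] for i in perm)
print(nhwc)  # (1, 224, 224, 3)
```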

export_relay: This flag will save a text file of the Relay text representation of the model to the designated output_path.

export_metadata: This flag will save a JSON text file of the metadata leveraged by the Latent Runtime Engine.

force_overwrite: This flag, when set to True, will force an overwrite of the output path if it already exists. Otherwise, it will raise a ValueError.


Example Code

# compile for CPU
ir.compile()

# compile for CPU, set channel, and export Relay as a text file
ir.compile(set_channel_layout="first", export_relay=True, force_overwrite=True)

# compile for GPU (targets host GPU)
ir.compile(target="cuda", force_overwrite=True)

# compile for GPU and/or CPU with explicit target strings (gives control to target a specific CPU or GPU)
ir.compile(target="cuda -arch=sm_86", host="llvm -mcpu=skylake", force_overwrite=True)

# compile for GPU with host CPU details (provides CPU acceleration for model sections mapped to CPU)
ir.compile(target="cuda", host="llvm -mcpu=skylake", force_overwrite=True)

# compile for Raspberry Pi using a hardware tag
ir.compile(target="raspberry-pi/5", force_overwrite=True)

# compile for Android SoC using a hardware tag
ir.compile(target="android/cpu", force_overwrite=True)

# compile for CPU with float16
ir.compile(set_float16=True, force_overwrite=True)

Using TensorRT

To compile with TensorRT, use forge.ONNXModule instead.

Compiling with Custom UUID

All the model artifacts contain metadata. By default each model is assigned a unique UUID, but the user has the ability to define and assign a custom UUID to the compiled model artifact using the API:

ir.compile(
    target="cuda",
    uuid="123e4567-e89b-12d3-a456-426614174000",
)

This is useful when parts of the UUID encode a host identifier, hardware, or any other information defined by the user.
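If you need a custom but well-formed identifier, Python's standard uuid module can generate or validate one before passing it to compile(). The compile() call itself is omitted here since it requires a loaded model.

```python
import uuid

# Generate a fresh random UUID string suitable for the `uuid` argument
custom_uuid = str(uuid.uuid4())

# Validate a user-supplied identifier before compiling;
# uuid.UUID raises ValueError if the string is malformed
supplied = "123e4567-e89b-12d3-a456-426614174000"
parsed = uuid.UUID(supplied)
print(str(parsed) == supplied)  # True
```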

Compilation Artifacts

The compiled object will be a .so file placed in the designated output_path directory, along with optional Relay-text and runtime-metadata text/JSON files.

Compiling and Encrypting the Artifact

The compile API can also generate an encrypted compiled object. An example of the API usage is shown below:

ir.compile(
    target="cuda",
    output_path=output_path,
    force_overwrite=True,
    encrypt_password="test_password",
)

Encryption Time

Be aware that encrypting the model will result in slower compilation time.

Encryption will generate an additional file, such that the output_path will contain not only the .so file, but also a .bin key which needs to be provided at runtime.

from pylre import LatentRuntimeEngine

lre = LatentRuntimeEngine(
    f"{output_path}/model_library.so",
    f"{output_path}/modelKey.bin",
    "test_password",
)

If you attempt to run an encrypted model without providing the key and password to the LRE, the model will not run, and an error will be logged:

[05:04:55] /app/src/runtime/latentai/lre.cpp:166: Model looks encrypted but no key provided

or

[05:04:55] /app/src/runtime/latentai/lre_cryption_service.cpp:66: Corrupted chunk encountered while decrypting the key, wrong password or corrupted key file