ONNXModule Class

forge.ONNXModule
ONNXModule in Forge, an extension of ONNX's ModelProto, facilitates manipulation and optimization of machine learning models.
This class provides a user-friendly API to easily quantize and export ONNX models for inference with the LEIP LatentRuntimeEngine (LRE).
mod: ModelProto (property)
The current state of the ONNX model.

ir_version: int (property)
ONNX model's IR version.

input_count: int (property)
ONNX model's number of expected inputs.

input_shapes: List[Tuple[Union[int, str], ...]] (property)
List of ONNX model's input shapes.

input_dtypes: List[str] (property)
List of ONNX model's input data types.

input_names: List (property)
List of ONNX model's input names.

output_count: int (property)
ONNX model's number of expected outputs.

output_shapes: List[Tuple[Union[int, str], ...]] (property)
List of ONNX model's output shapes.

output_dtypes: List[str] (property)
List of ONNX model's output data types.

output_names: List (property)
List of ONNX model's output names.

is_calibrated: bool (property)
Flag to check whether or not the module is calibrated (for quantization).

is_quantized: bool (property)
Flag to check whether or not the module is quantized (non-TensorRT).

is_quantized_for_tensorrt: bool (property)
Flag to check whether or not the module is 'quantized' for TensorRT.
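
The sketch below reads a few of these properties from an existing module. The variable name `module`, the shapes, and the dtypes are illustrative assumptions; constructing an ONNXModule is not covered in this section.

```python
# Illustrative sketch: `module` is assumed to be an existing forge.ONNXModule.
print(module.input_names)     # e.g. ['images']
print(module.input_shapes)    # e.g. [(1, 3, 224, 224)]
print(module.input_dtypes)    # e.g. ['float32']
print(module.output_names)
print(module.is_calibrated)   # False until calibrate() has been run
print(module.is_quantized)    # False until quantize() has been run
```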
copy() -> ONNXModule
Returns a deep copy of the instance.
get_inference_function(providers: Optional[Union[str, List[str]]] = None, opt_level: Union[int, GraphOptimizationLevel] = ort.GraphOptimizationLevel.ORT_DISABLE_ALL) -> Callable
Creates an ONNX Runtime inference function from the given model.
This function loads the current state of the ONNX model and returns a callable inference function that can be used to run predictions. The returned function automatically handles input and output names and shapes, and executes inference using the specified execution providers and graph optimization level.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `providers` | `Optional[Union[str, List[str]]]` | The execution providers to use for inference. Can be a string, e.g. "CUDAExecutionProvider", or a list of provider strings to try in priority order. If not provided, defaults to "CPUExecutionProvider". | `None` |
| `opt_level` | `Union[int, GraphOptimizationLevel]` | The level of graph optimization to apply during model loading. Defaults to `ORT_DISABLE_ALL`. | `ORT_DISABLE_ALL` |
Returns:

| Name | Type | Description |
|---|---|---|
| `Callable` | `Callable` | A callable function that takes input data as positional arguments and returns a dictionary mapping output names to their corresponding NumPy arrays. The function has additional metadata attributes such as `output_names`. |
Inference Function Output
The inference function will return a dictionary mapping output node names to the respective node activations collected during inference.
Example

```python
inference_fn = ir.get_inference_function()
output = inference_fn(input_data)
output_name = inference_fn.output_names[0]
print(output[output_name])
```
Raises:

| Type | Description |
|---|---|
| `ValueError` | If the model, providers, or optimization level are invalid. |
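
The sketch below combines the `providers` and `opt_level` arguments. It assumes `module` is an ONNXModule instance, `input_data` is a pre-processed NumPy array, and the onnxruntime package is installed with CUDA support available.

```python
import onnxruntime as ort

# Sketch: prefer CUDA, fall back to CPU, and enable basic graph optimizations.
infer = module.get_inference_function(
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    opt_level=ort.GraphOptimizationLevel.ORT_ENABLE_BASIC,
)
outputs = infer(input_data)              # dict: output name -> NumPy array
first = outputs[infer.output_names[0]]   # metadata attribute on the returned function
```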
calibrate(calib_data: Iterable[Any], use_cuda: bool = True, reset: bool = True) -> None
Calibrates the model by tracking intermediate layer statistics.
This method collects statistics from intermediate layers of the model using the provided calibration dataset. These statistics are used for deriving quantization parameters in a subsequent quantization process. It's essential that the calibration data is representative of the model's expected real-world inputs.
Note: Ensure that the calibration data is in the form of numpy arrays and has undergone the necessary pre-processing steps required for the model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `calib_data` | `Iterable[Any]` | An iterable of data samples for calibration. The samples should be in a format compatible with the model's input requirements; inspect the `onnx_ir.input_shapes` property to confirm the expected format. | required |
| `reset` | `bool` | If True, any previous calibration data is cleared before new data is processed. Defaults to True. | `True` |
| `use_cuda` | `bool` | If True, Forge will utilize CUDA devices for calibration if GPUs can be found. Operation will fall back to CPU if GPUs are not found. Default is True. | `True` |
Returns:

| Name | Type | Description |
|---|---|---|
| `None` | `None` | This method operates in place. |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If |
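
A minimal calibration sketch follows. The sample count, batch shape, and dtype are illustrative assumptions; in practice, use real samples pre-processed exactly as the model expects (check the `input_shapes` and `input_dtypes` properties).

```python
import numpy as np

# Sketch: 32 random samples stand in for real, pre-processed calibration data.
calib_data = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(32)]
module.calibrate(calib_data, use_cuda=True)  # falls back to CPU if no GPU is found
assert module.is_calibrated
```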
quantize(activation_dtype: str = 'int8', kernel_dtype: Optional[str] = None, per_channel: bool = False, calib_method: str = 'entropy', quant_type: str = 'any') -> None
Applies quantization to the model with specified parameters.
This method quantizes the model's activations and kernels to the specified data types. If kernel_dtype is None, it defaults to the activation_dtype. The quantization can be "static" (requiring prior calibration), "dynamic" (no calibration needed), or "any" (prioritizing static if possible, i.e. the module 'is_calibrated').

This method performs two processes during quantization:

1) Quantize the current model state (non-TensorRT).
2) Compute 'quantization' for TensorRT. This step only accounts for the 'calib_method' argument; all other arguments have no effect on the 'quantization' for TensorRT.

Note: When using "static" quant_type, ensure calibration is performed beforehand or provide calib_data for calibration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `activation_dtype` | `str` | Data type for activations ("int8", "uint8"), default is "int8". | `'int8'` |
| `kernel_dtype` | `Optional[str]` | Data type for kernels ("int8", "uint8"), defaults to `activation_dtype`. | `None` |
| `per_channel` | `bool` | If True, performs per-channel quantization on kernels. Default is False. | `False` |
| `calib_method` | `str` | Method for calibration ("average", "entropy", "minmax", "percentile"), default is "entropy". Overview of calibration methods: "average" computes the average of the min-max extrema across calibration data; "entropy" performs distribution-based maximization of the entropy of quantized values; "minmax" takes the absolute most extreme min-max values across calibration data; "percentile" computes 99th-percentile cut-offs across calibration data. | `'entropy'` |
| `quant_type` | `str` | Type of quantization ("static", "dynamic", "any"), default is "any". | `'any'` |
Returns:

| Name | Type | Description |
|---|---|---|
| `None` | `None` | This method operates in place. |
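
A static int8 quantization pass after calibration might look like the sketch below; the argument values are illustrative.

```python
# Sketch: with quant_type="any", Forge prioritizes static quantization because
# the module has already been calibrated.
module.quantize(
    activation_dtype="int8",
    per_channel=True,
    calib_method="minmax",
    quant_type="any",
)
assert module.is_quantized
print(module.is_quantized_for_tensorrt)  # TensorRT 'quantization' is computed as well
```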
export(f: Union[str, os.PathLike] = './model.onnx', force_overwrite: bool = False, is_tensorrt: bool = False, uuid: Optional[str] = None, encrypt_password: Optional[str] = None) -> None
Exports the current state of the ONNX model to the specified output with metadata for inference with the LEIP LatentRuntimeEngine (LRE).
This method manages output path validation and enforces the '.onnx' file extension. If the model is quantized, related metadata will be included in the export. If exporting for downstream use with TensorRT, set 'is_tensorrt=True' (applicable to quantized models only); the process will export an unquantized version of the model along with 'quantization' parameters in its metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `f` | `Union[str, os.PathLike]` | A string containing a file name or a path-like object. Defaults to "./model.onnx". | `'./model.onnx'` |
| `force_overwrite` | `bool` | If True, overwrites the output path if it already exists. Defaults to False. | `False` |
| `is_tensorrt` | `bool` | If True, exports the model only in its unquantized state, but also exports the current state of collected calibration data needed to run the model with TensorRT's 8-bit quantization. See the module's `calibrate()` and `quantize()` methods, both necessary steps before any calibration data gets exported. | `False` |
| `uuid` | `Optional[str]` | A custom UUID for the export. If not provided, a new UUID4 is generated. | `None` |
| `encrypt_password` | `Optional[str]` | Optional password used to encrypt the exported model. When provided, the export produces both the model file and the key. | `None` |
Returns:

| Name | Type | Description |
|---|---|---|
| `None` | `None` | This method operates in place. |
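
A closing export sketch; the file names are placeholders, and the TensorRT variant assumes `calibrate()` and `quantize()` have already been run as described above.

```python
# Sketch: standard quantized export for the LEIP LatentRuntimeEngine (LRE).
module.export("./model.onnx", force_overwrite=True)

# Sketch: export for downstream TensorRT use; writes the unquantized model plus
# the collected 'quantization' parameters in its metadata.
module.export("./model_trt.onnx", is_tensorrt=True, force_overwrite=True)
```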