Guide to Introspection with Forge

This guide shows how to introspect the properties of a model through Forge's intermediate representation, the forge.RelayModule. The RelayModule exposes a number of properties that are useful to the engineer, scientist, or developer.

Load a Model

Following the guide on loading, let's load a Forge RelayModule.

import forge
import onnx

# load the ONNX model from disk, then convert it to Forge's IR
onnx_model = onnx.load("path/to/model.onnx")
ir = forge.RelayModule.from_onnx(onnx_model)

What is a RelayModule?

The forge.RelayModule is Forge's intermediate representation module. It is a framework-agnostic representation of the model that provides the compiler with a generalized, standardized abstraction capturing the model's algorithm. It describes what the algorithm is, not how a device ought to execute it. Because Forge is built atop the open-source TVM project, it adopts TVM's intermediate representation language, Relay. Forge extends TVM by providing a graph backend and a refined API.

Distinction Between Forge and Relay

Note that there is a distinction between the 'Forge RelayModule' and the 'Relay IRModule'. The Forge RelayModule is an object that wraps the Relay IRModule, TVM's native intermediate representation, and aims to provide a one-to-one parallel to it.

Properties of a RelayModule

The following sections cover the readable properties of a forge.RelayModule.

See the Intermediate Representation

The underlying Relay IRModule can be accessed through the class's mod and typed_mod properties. In a notebook cell, these expressions will display the Relay graph as text; when not in a notebook, use print statements to write the graph to the console. The Relay output should look familiar as a representation of your model. Don't be concerned with understanding all the details of the output for now.

ir.mod  # Relay IRModule
ir.typed_mod  # Relay IRModule w/ static typing
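
For example, outside a notebook the same graphs can be written to the console (a minimal sketch using only the properties above):

print(ir.mod)  # write the Relay graph to the console
print(ir.typed_mod)  # write the statically-typed Relay graph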

See the Operators

It's simple to get a count of each distinct operator within a model.

ir.operators  # Dict[str, int]
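
Because operators is a plain dictionary mapping operator names to occurrence counts, it can be inspected like any other dict. A minimal sketch:

# print each distinct operator and how many times it appears
for op_name, count in sorted(ir.operators.items()):
    print(f"{op_name}: {count}")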

Get Input and Output Information

There are a handful of properties that provide quick access to the inputs and outputs of a model.

# input properties
ir.input_count  # int
ir.input_shapes  # List[Tuple[int, ...]]
ir.input_dtypes  # List[str]

# output properties
ir.output_count  # int
ir.output_shapes  # List[Tuple[int, ...]]
ir.output_dtypes  # List[str]
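
Together, these make it easy to summarize a model's interface. A minimal sketch using only the properties above:

# print a short summary of the model's inputs and outputs
print(f"{ir.input_count} input(s), {ir.output_count} output(s)")
for shape, dtype in zip(ir.input_shapes, ir.input_dtypes):
    print(f"  input: shape={shape}, dtype={dtype}")
for shape, dtype in zip(ir.output_shapes, ir.output_dtypes):
    print(f"  output: shape={shape}, dtype={dtype}")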

Identify Your Model

Models can be tricky to identify. Sometimes two files may be duplicates, but how can you be sure? In Forge, there are two ways to establish a model's identity.

ir.fingerprint  # str
The fingerprint property is a deterministic hash of a model's Relay structure and weights. Two Forge RelayModules with matching fingerprints can be considered completely identical.
hash(ir)  # int
Hashing a Forge RelayModule with the hash() function produces a deterministic hash of the model's Relay structure, excluding weights; two models with identical structures, trained on different data sets, will yield matching hashes (but different fingerprints).
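
For example, given a second model loaded the same way (ir2 is assumed here), the two notions of identity can be compared directly:

same_structure = hash(ir) == hash(ir2)  # structure only, weights ignored
exact_duplicate = ir.fingerprint == ir2.fingerprint  # structure and weights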

Hashing Uniqueness

Both the fingerprint and hash features rely on hashing. Hashing does not guarantee uniqueness, but it is highly improbable that two different models will produce matching hashes.

Inference Debugging

You may want to quickly obtain a Python callable that emulates inference of the underlying model, especially when manipulating the underlying graph. The inference function expects NumPy arrays as positional arguments. Note that the inference function is not an optimized compilation of the model; it should only be used as a tool for debugging and validating accuracy.

func = ir.get_inference_function()
func(input_data)  # func(input0, input1, ..., inputN) for multiple inputs
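
Combining the inference function with the input properties above gives a quick smoke test. A sketch, assuming NumPy is available; random inputs only exercise the graph and say nothing about accuracy:

import numpy as np

# build one random array per declared model input, matching shape and dtype
inputs = [
    np.random.rand(*shape).astype(dtype)
    for shape, dtype in zip(ir.input_shapes, ir.input_dtypes)
]
outputs = func(*inputs)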

Partitioning the RelayModule

Partitioning a Relay graph for compilation with different compiler backends or hardware leverages the strengths of each environment for optimal performance. Essentially, it involves:

  1. Dividing the Graph: Breaking down the computational graph of a model into segments or partitions.

  2. Targeted Execution: Assigning these partitions to different compiler backends or hardware units (like CPUs, GPUs, TPUs) that are best suited for executing them.

  3. Performance Optimization: This approach optimizes the overall performance by ensuring that each part of the model runs on the most efficient platform for its specific type of computation.

In essence, it's about matching different parts of the model with the most effective resources available for their execution.
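
Forge's own partitioning API is not covered in this guide, but because Forge builds atop TVM, the mechanics can be illustrated with TVM's standard Relay passes. A sketch, assuming ir.mod exposes the underlying Relay IRModule and that a BYOC backend (here "dnnl", purely as an example) is registered in your TVM build:

from tvm import relay

mod = ir.mod  # the underlying Relay IRModule

# annotate operators the backend supports, merge adjacent supported
# regions, then split those regions into separate partitioned functions
mod = relay.transform.AnnotateTarget("dnnl")(mod)
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)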