
NeuroML Model Inference

This guide explores how NeuroML lets you run any ML or AI model directly from your smart contract.

NeuroML is tightly integrated with our Model Hub, ensuring that every model uploaded there is instantly accessible through NeuroML. We've also made the process of uploading and using your model incredibly fast, allowing you to get to work in seconds.

OpenGradient is designed to be flexible, supporting a range of inference verification techniques, such as ZKML and TEE inference. This empowers developers to choose the most suitable methods for their use cases and requirements. To learn more about the security options we offer, go to Inference Verification.

Model Inference Precompile

NeuroML inference is provided through a Solidity interface (called NeuroML) that any smart contract can use. The inference is implemented by a custom precompile on the OpenGradient network.

TIP

The NeuroML precompile is accessible at 0x00000000000000000000000000000000000000F4.

The two high-level functions exposed by the NeuroML interface are runModel and runLLM. runLLM is designed specifically for running large language models, whereas runModel is a generic method that can execute any AI or ML model. This page focuses only on runModel; check out LLM Inference for large language models.

The runModel function is defined as follows:

solidity
interface NeuroML {

    // security modes offered for ML inference
    enum ModelInferenceMode { VANILLA, ZKML, TEE }

    // runs any ML model
    function runModel(
        ModelInferenceMode mode, 
        ModelInferenceRequest memory request
    ) external returns (ModelOutput memory);
}

Calling this function from your smart contract will atomically execute the requested model with the given input and return the result synchronously.
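
To issue that call, your contract needs the NeuroML interface bound to the precompile address from the tip above. A minimal sketch of such a binding, placed at file level or inside your contract (the constant name NEURO_ML mirrors the example further down; the OGInference.sol import used there is assumed to already provide an equivalent binding):

solidity
// Minimal sketch: bind the NeuroML interface to the precompile at 0x...F4.
// NEURO_ML is the name assumed by the example contract below.
NeuroML constant NEURO_ML = NeuroML(address(uint160(0xF4)));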

Model Input and Output

runModel takes two arguments:

  • ModelInferenceMode: the security mode OpenGradient should use to verify the inference (VANILLA, ZKML, or TEE)
  • ModelInferenceRequest: contains two fields:
    • modelId: the unique ID of the model on the Model Hub to run the inference against
    • ModelInput input: the input passed to the model during inference. Its format must match the ONNX model's expected input names and types.

Like the input, the value returned by runModel is a generic ModelOutput.

To support a wide range of models, both the input and the output can take various shapes (such as multidimensional arrays of numbers or strings). These generic types are made up of a list of number tensors and/or string tensors. For example:

solidity
struct ModelInput {
    NumberTensor[] numbers;
    StringTensor[] strings;
}

Each tensor has a unique name that must match the corresponding input name and type in the ONNX model's metadata.

TIP

Inspect your ONNX model metadata to find the input and output types the model expects.

Similarly, the ModelOutput is represented by number and string tensors:

solidity
struct ModelOutput {
    NumberTensor[] numbers;
    StringTensor[] strings;
    
    bool is_simulation_result; // true if the output was produced during transaction simulation rather than real execution
}

To read more about is_simulation_result, please see Simulation Results.

NOTE

We use a fixed-point number representation in the model input and output; see the Number type below for more details. For example, 1.52 is represented as Number{value = 152, decimals = 2}.
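
When combining a model output with other on-chain values, you typically rescale the Number to a fixed number of decimals first. A minimal sketch of such a conversion (toFixed18 is a hypothetical helper, not part of the NeuroML interface; place it in your own contract or library):

solidity
// Hypothetical helper: rescales a fixed-point Number to 18 decimals,
// e.g. Number(152, 2) == 1.52 becomes 1_520_000_000_000_000_000.
function toFixed18(Number memory n) internal pure returns (int256) {
    int256 value = int256(n.value);
    int256 d = int256(n.decimals);
    if (d <= 18) {
        return value * int256(uint256(10) ** uint256(18 - d));
    }
    return value / int256(uint256(10) ** uint256(d - 18));
}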

The full definition of input and output types is as follows:

solidity
/**
 * Can be used to represent a floating-point number or integer.
 *
 * e.g. 10 can be represented as Number(10, 0),
 * and 1.5 can be represented as Number(15, 1)
 */
struct Number {
    int128 value;
    int128 decimals;
}

/**
 * Represents a model tensor input filled with numbers.
 */
struct NumberTensor {
    string name;
    Number[] values;
}

/**
 * Represents a model tensor input filled with strings.
 */
struct StringTensor {
    string name;
    string[] values;
}

/**
 * Model input, made up of various tensors of numbers and/or strings.
 */
struct ModelInput {
    NumberTensor[] numbers;
    StringTensor[] strings;
}

/**
 * Model inference request.
 */
struct ModelInferenceRequest {
    string modelId;
    ModelInput input;
}

/**
 * Model output, made up of tensors of either numbers or strings, ordered
 * as defined by the model. 
 *
 * For example, if a model's output is: [number_tensor_1, string_tensor_1, number_tensor_2],
 * you could access them like this:
 *
 * number_tensor_1 = output.numbers[0];
 * string_tensor_1 = output.strings[0];
 * number_tensor_2 = output.numbers[1];
 *
 */
struct ModelOutput {
    NumberTensor[] numbers;
    StringTensor[] strings;
    
    bool is_simulation_result; // true if the output was produced during transaction simulation rather than real execution
}
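
Since each tensor carries a name, you can also look up an output tensor by name instead of relying on its position. A small sketch of such a lookup (findNumberTensor is a hypothetical helper, not provided by NeuroML, meant to live in your own contract or library):

solidity
// Hypothetical helper: returns the number tensor with the given name,
// reverting if the model output does not contain it.
function findNumberTensor(ModelOutput memory output, string memory name)
    internal
    pure
    returns (NumberTensor memory)
{
    for (uint256 i = 0; i < output.numbers.length; i++) {
        // compare strings by hashing, since Solidity has no string equality operator
        if (keccak256(bytes(output.numbers[i].name)) == keccak256(bytes(name))) {
            return output.numbers[i];
        }
    }
    revert("tensor not found");
}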

Example Smart Contract

The following smart contract uses NeuroML to natively run a model from the Model Hub and persist its output.

solidity
pragma solidity ^0.8.0;

// the NeuroML types and the NEURO_ML precompile binding used below are
// assumed to come from this import
import "opengradient-neuroml/src/OGInference.sol";

contract MlExample {

    // stores the latest inference result
    Number public resultNumber;

    // Execute an ML model from OpenGradient's model storage, secured by ZKML
    function runZkmlModel() public {
        // model takes 1 number tensor as input
        ModelInput memory modelInput = ModelInput(
            new NumberTensor[](1),
            new StringTensor[](0));

        // populate tensor
        Number[] memory numbers = new Number[](2);
        numbers[0] = Number(7286679744720459, 17); // 0.07286679744720459
        numbers[1] = Number(4486280083656311, 16); // 0.4486280083656311

        // set expected tensor name
        modelInput.numbers[0] = NumberTensor("input", numbers);

        // execute inference
        ModelOutput memory output = NEURO_ML.runModel(
            NeuroML.ModelInferenceMode.ZKML,
            ModelInferenceRequest(
                "QmbbzDwqSxZSgkz1EbsNHp2mb67rYeUYHYWJ4wECE24S7A",
                modelInput
        ));

        // handle result
        if (output.is_simulation_result == false) {
            resultNumber = output.numbers[0].values[0];
        } else {
            resultNumber = Number(0, 0);
        }
    }
}
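
The verification mode is just the first argument of runModel, so the same request can be re-run under a different mode. A sketch, assuming the model above is also available for TEE inference:

solidity
// Same request as above, but executed inside a TEE instead of being proven with ZKML.
ModelOutput memory teeOutput = NEURO_ML.runModel(
    NeuroML.ModelInferenceMode.TEE,
    ModelInferenceRequest(
        "QmbbzDwqSxZSgkz1EbsNHp2mb67rYeUYHYWJ4wECE24S7A",
        modelInput
    )
);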
