SolidML Onchain Inference

Overview

This guide walks you through how SolidML lets you run any ML or AI model directly from your smart contract through the OGInference precompile.

SolidML is tightly integrated with our Model Hub, making every model uploaded to the Hub instantly usable through SolidML. We've also made the process of uploading and using your model incredibly fast, allowing you to build AI-enabled apps without worrying about any of the underlying infrastructure.

Every inference executed through SolidML is natively secured by the OpenGradient blockchain validator nodes, allowing developers to focus on their application rather than complex verification techniques. Behind the scenes, OpenGradient relies on a range of inference verification techniques, such as ZKML, Trusted Execution Environments and cryptoeconomic security. This empowers developers to choose the most suitable methods for their use cases and requirements. To learn more about the security options we offer, go to Inference Verification.

Inference Precompile

SolidML inference is provided through a Solidity interface (called OGInference) that any smart contract can call. The inference is implemented by a custom precompile on the OpenGradient network.

TIP

The SolidML inference precompile is accessible at 0x00000000000000000000000000000000000000F4.

There are 3 high-level functions exposed by OGInference:

  • runModelInference: allows running inference of any generic ONNX model from the Model Hub.
  • runLLMCompletion: allows running LLM completions, similar to OpenAI completions endpoint.
  • runLLMChat: allows running LLM chat completions (series of messages), similar to OpenAI chat endpoint.

As a general rule of thumb, if you upload your own custom model or reuse an existing tailor-made model from the Model Hub, you should use runModelInference, whereas if you want to use LLMs, you should use either the completion or chat function.
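
Since OGInference is a regular Solidity interface, your contract needs a handle to it at the precompile address before it can call any of these functions. The sketch below shows one possible wiring; the contract and variable names are illustrative, and only the import path and precompile address come from this guide.

solidity
import "opengradient-SolidML/src/OGInference.sol";

contract UsesOGInference {
    // Handle to the SolidML precompile (0x...00F4, see the tip above),
    // wrapped in the OGInference interface so it can be called like a
    // normal contract.
    OGInference internal ogInference = OGInference(address(uint160(0xF4)));
}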

ML Model Inference

Function Definition

The runModelInference function is defined as follows:

solidity
interface OGInference {

    // security modes offered for ML inference
    enum ModelInferenceMode { VANILLA, ZKML, TEE }

    function runModelInference(
        ModelInferenceRequest memory request
    ) external returns (ModelOutput memory);
}

Calling this function from your smart contract will atomically execute the requested model with the given input and return the result synchronously.

Model Input and Output

The model input is defined as follows:

solidity
struct ModelInferenceRequest {
    ModelInferenceMode mode;
    string modelCID;
    ModelInput input;
}

Fields:

  • mode: defines the inference execution and verification mode (ZKML, TEE, or VANILLA)
  • modelCID: the CID of the model to run inference on; it can be retrieved from the Model Hub
  • input: generic container for defining the model input

ModelInput and ModelOutput:

The input and output formats are implemented as generic tensor containers, providing flexibility to handle arrays of various shapes and dimensions for model inference. Each tensor has a unique name that must match the expected input name and type in the ONNX model's metadata. We currently support number and string input and output tensors.

TIP

Inspect your ONNX model metadata to find the model's input and output schema.

The definitions of the input and output are as follows:

solidity
struct ModelInput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
}

struct ModelOutput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
    TensorLib.JsonScalar[] jsons;
    bool is_simulation_result; // indicates whether the result is part of simulation
}

Both the input and output consist of lists of number and string tensors. Number tensors can be multidimensional. The output also supports explicit JSON return types.

To read more about is_simulation_result in the output, please see Simulation Results.

NOTE

We use a fixed-point representation for numbers in the model input and output; see the TensorLib.Number type for more details. For example, 1.52 is represented as TensorLib.Number(152, 2), i.e., value = 152 and decimals = 2.
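
As a quick illustration, here is a hypothetical helper (not part of TensorLib or OGInference.sol) that rescales a Number to a fixed number of decimal places, which is often useful before comparing or aggregating model outputs on-chain:

solidity
import "opengradient-SolidML/src/OGInference.sol";

// Hypothetical helper, not part of TensorLib: rescale a Number to
// `targetDecimals` decimal places. For example, Number(152, 2) represents
// 1.52, so rescaling it to 4 decimals returns 15200.
function toDecimals(TensorLib.Number memory n, uint8 targetDecimals)
    pure
    returns (int256)
{
    int256 diff = int256(uint256(targetDecimals)) - int256(n.decimals);
    if (diff >= 0) {
        return int256(n.value) * int256(uint256(10) ** uint256(diff));
    }
    return int256(n.value) / int256(uint256(10) ** uint256(-diff));
}

With this, toDecimals(TensorLib.Number(152, 2), 4) returns 15200, the same value you would get from TensorLib.Number(1520, 3).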

Full Definition:

solidity
enum ModelInferenceMode { VANILLA, ZKML, TEE }

/**
 * Model inference request.
 */
struct ModelInferenceRequest {
    ModelInferenceMode mode;
    string modelCID;
    ModelInput input;
}

/**
 * Model input, made up of various tensors of either numbers or strings.
 */
struct ModelInput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
}

/**
 * Model output, made up of tensors of either numbers or strings, ordered
 * as defined by the model. 
 *
 * For example, if a model's output is: [number_tensor_1, string_tensor_1, number_tensor_2],
 * you could access them like this:
 * number_tensor_1 = output.numbers[0];
 * string_tensor_1 = output.strings[0];
 * number_tensor_2 = output.numbers[1];
 */
struct ModelOutput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
    TensorLib.JsonScalar[] jsons;
    bool is_simulation_result;
}

TensorLib Reference

solidity
library TensorLib {

    /**
    * Can be used to represent a floating-point number or integer.
    *
    * eg 10 can be represented as Number(10, 0),
    * and 1.5 can be represented as Number(15, 1)
    */
    struct Number {
        int128 value;
        int128 decimals;
    }

    struct MultiDimensionalNumberTensor {
        string name;
        Number[] values;
        uint32[] shape;
    }

    struct StringTensor {
        string name;
        string[] values;
    }

    struct JsonScalar {
        string name;
        string value;
    }
}

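To illustrate how the shape field relates to the flat values array, the sketch below builds a 2x3 number tensor. It assumes row-major flattening (the usual ONNX convention); the contract and function names are purely illustrative.

solidity
import "opengradient-SolidML/src/OGInference.sol";

contract TensorShapeExample {
    // Build a 2x3 number tensor named "input". The six values are listed
    // in row-major order and the shape array records the dimensions.
    function buildTensor()
        internal
        pure
        returns (TensorLib.MultiDimensionalNumberTensor memory)
    {
        TensorLib.Number[] memory values = new TensorLib.Number[](6);
        values[0] = TensorLib.Number(11, 1); // 1.1
        values[1] = TensorLib.Number(12, 1); // 1.2
        values[2] = TensorLib.Number(13, 1); // 1.3
        values[3] = TensorLib.Number(21, 1); // 2.1
        values[4] = TensorLib.Number(22, 1); // 2.2
        values[5] = TensorLib.Number(23, 1); // 2.3

        uint32[] memory shape = new uint32[](2);
        shape[0] = 2; // rows
        shape[1] = 3; // columns

        return TensorLib.MultiDimensionalNumberTensor("input", values, shape);
    }
}
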
Example ML Usage

The following smart contract uses SolidML to natively run a model from the Model Hub and persist its output.

solidity
import "opengradient-SolidML/src/OGInference.sol";

contract MlExample {

    // Execute an ML model from OpenGradient's model storage, secured by ZKML
    function runZkmlModel() public {
        // model takes 1 number tensor as input
        ModelInput memory modelInput = ModelInput(
            new NumberTensor[](1),
            new StringTensor[](0));

        // populate tensor
        Number[] memory numbers = new Number[](2);
        numbers[0] = Number(7286679744720459, 17); // 0.07286679744720459
        numbers[1] = Number(4486280083656311, 16); // 0.4486280083656311

        // set expected tensor name
        modelInput.numbers[0] = NumberTensor("input", numbers);

        // execute inference
        ModelOutput memory output = OGInference.runModelInference(
            ModelInferenceRequest(
                ModelInferenceMode.ZKML,
                "QmbbzDwqSxZSgkz1EbsNHp2mb67rYeUYHYWJ4wECE24S7A",
                modelInput
        ));

        // handle result
        if (output.is_simulation_result == false) {
            resultNumber = output.numbers[0].values[0];
        } else {
            resultNumber = Number(0, 0);
        }
    }
}

LLM Inference

Function Definition

The runLLMCompletion and runLLMChat functions are defined as follows:

solidity
interface OGInference {

    function runLLMCompletion(
        LLMCompletionRequest memory request
    ) external returns (LLMCompletionResponse memory);

    function runLLMChat(
        LLMChatRequest memory request
    ) external returns (LLMChatResponse memory);
}

Calling runLLMCompletion or runLLMChat from your smart contract will atomically execute the requested LLM with the given input and return the result synchronously.

Completion Input and Output

Both the chat and completion request and response types are similar to the OpenAI API format; however, there are some simplifications for ease of use.

solidity
enum LLMInferenceMode { VANILLA, TEE }

struct LLMCompletionRequest {
    LLMInferenceMode mode;
    string modelCID;
    string prompt;
    uint32 max_tokens;
    string[] stop_sequence;
    uint32 temperature; // 0-100
}

struct LLMCompletionResponse {
    string answer;
    bool is_simulation_result;
}

Input Fields:

  • mode: execution and security mode for the LLM inference
  • modelCID: name of the LLM to use. The list of LLMs you can use can be found here.
  • prompt: LLM prompt
  • max_tokens: maximum number of tokens to generate
  • stop_sequence: stop generating further tokens after any of these are generated
  • temperature: LLM temperature

NOTE

Running LLMs can naturally take a longer time, so adjusting these parameters, as well as the length of the input and output can have a significant impact on transaction speed and costs.

Output Fields:

  • answer: LLM's generated answer
  • is_simulation_result: whether the result was actually generated by the LLM or is a dummy result used when simulating the transaction. To read more, please see Simulation Results.

Chat Input and Output

The chat endpoint supports function calling as well as system, user and assistant messages.

solidity
enum LLMInferenceMode { VANILLA, TEE }

struct LLMChatRequest {
    LLMInferenceMode mode;
    string modelCID;
    ChatMessage[] messages;
    ToolDefinition[] tools; 
    string tool_choice;
    uint32 max_tokens;
    string[] stop_sequence;
    uint32 temperature; // 0-100
}

struct LLMChatResponse {
    string finish_reason;
    ChatMessage message;
}

struct ToolDefinition {
    string description;
    string name;
    string parameters; // This must be a JSON string
}

struct ChatMessage {
  string role;
  string content;
  string name;
  string tool_call_id; // only used for tool response
  ToolCall[] tool_calls;
}

struct ToolCall {
  string id;
  string name;
  string arguments; // formatted as json
}

Input Fields:

  • mode: execution and security mode for the LLM inference
  • modelCID: name of the LLM to use. The list of LLMs you can use can be found here.
  • messages: input messages (see ChatMessage)
  • tools: list of available tools to use for the LLM (can be empty)
  • tool_choice: one of auto, none, or required. Forces the LLM to use/not use a tool in its response.
  • max_tokens: maximum number of tokens to generate
  • stop_sequence: stop generating further tokens after any of these are generated
  • temperature: LLM temperature

Both the input and output types mirror the OpenAI chat and tool-calling API types. Please refer to their documentation for more info.
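
The example below uses the completion endpoint, so here is a minimal chat sketch with a single user message and no tools. Only the struct layouts above and the precompile address from the earlier tip are taken from this guide; the contract name, prompt, and parameter values are illustrative, and the model CID is reused from the completion example.

solidity
import "opengradient-SolidML/src/OGInference.sol";

contract ChatExample {

    // handle to the SolidML precompile (address from the tip above)
    OGInference internal ogInference = OGInference(address(uint160(0xF4)));

    string public reply;

    // Run a single-turn chat with no tools
    function runChat() public {
        ChatMessage[] memory messages = new ChatMessage[](1);
        messages[0] = ChatMessage({
            role: "user",
            content: "Summarize what SolidML does in one sentence.",
            name: "",
            tool_call_id: "",
            tool_calls: new ToolCall[](0)
        });

        LLMChatResponse memory result = ogInference.runLLMChat(
            LLMChatRequest(
                LLMInferenceMode.TEE,
                "meta-llama/Meta-Llama-3-8B-Instruct",
                messages,
                new ToolDefinition[](0), // no tools
                "none",                  // tool_choice: do not force tool use
                200,                     // max_tokens
                new string[](0),         // stop_sequence
                70                       // temperature (0-100)
        ));

        reply = result.message.content;
    }
}

To use tool calling, populate the tools array with ToolDefinition entries and inspect result.message.tool_calls in the response; the semantics mirror the OpenAI tool-calling API, as noted above.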

Example LLM Usage

The following smart contract uses Llama 3 8B to run a short prompt and stores the answer in contract storage.

solidity
import "opengradient-SolidML/src/OGInference.sol";

contract LlmExample {

    string answer;

    // Execute a Large Language Model directly in your smart contract
    function runLlm() public {
        // define stop-sequence
        string[] memory stopSequence = new string[](1);
        stopSequence[0] = "<end>";

        // run Llama 3 model
        LlmResponse memory llmResult = solid_ml.runLlm(
            SolidML.LlmInferenceMode.VANILLA,
            LlmInferenceRequest(
                "meta-llama/Meta-Llama-3-8B-Instruct",
                "Hello sir, who are you?\n<start>",
                1000,
                stopSequence,
                0
        ));

        // handle result
        if (llmResuklt.is_simulation_result) {
            answer = "empty";
        } else {
            answer = llmResult.answer;
        }
    }
}
