SolidML Onchain Inference
Overview
This guide walks you through how SolidML lets you run any ML or AI model directly from your smart contract through the OGInference precompile.
SolidML is tightly integrated with our Model Hub, making every model uploaded to the Hub instantly usable through SolidML. We've also made the process of uploading and using your model incredibly fast, allowing you to build AI-enabled apps without worrying about any of the underlying infrastructure.
Every inference executed through SolidML is natively secured by the OpenGradient blockchain validator nodes, allowing developers to focus on their application rather than complex verification techniques. Behind the scenes, OpenGradient relies on a range of inference verification techniques, such as ZKML, Trusted Execution Environments and cryptoeconomic security. This empowers developers to choose the most suitable methods for their use cases and requirements. To learn more about the security options we offer, go to Inference Verification.
Inference Precompile
SolidML inference is provided through a Solidity interface (called OGInference) that any smart contract can call. The inference is implemented by a custom precompile on the OpenGradient network.
TIP
The SolidML inference precompile is accessible at 0x00000000000000000000000000000000000000F4.
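Since the precompile lives at a fixed address, a contract can bind the OGInference interface to that address and call it like any other deployed contract. A minimal sketch (the contract, constant, and function names are illustrative):
import "opengradient-SolidML/src/OGInference.sol";

contract InferenceCaller {
    // Bind the interface to the precompile address (0x00...00F4 from the tip above);
    // address(0xF4) is the same address written as an integer literal.
    OGInference constant OG_INFERENCE = OGInference(address(0xF4));

    function infer(ModelInferenceRequest memory request)
        external
        returns (ModelOutput memory)
    {
        return OG_INFERENCE.runModelInference(request);
    }
}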
There are 3 high-level functions exposed by OGInference:
runModelInference: allows running inference of any generic ONNX model from the Model Hub.
runLLMCompletion: allows running LLM completions, similar to the OpenAI completions endpoint.
runLLMChat: allows running LLM chat completions (a series of messages), similar to the OpenAI chat endpoint.
As a general rule of thumb, if you upload your own custom model or reuse an existing tailor-made model from the Model Hub, you should use runModelInference, whereas if you want to use LLMs, you should use either the completion or chat function.
ML Model Inference
Function Definition
The runModelInference function is defined as follows:
interface OGInference {
    // security modes offered for ML inference
    enum ModelInferenceMode { VANILLA, ZK, TEE }

    function runModelInference(
        ModelInferenceRequest memory request
    ) external returns (ModelOutput memory);
}
Calling this function from your smart contract will atomically execute the requested model with the given input and return the result synchronously.
Model Input and Output
The model input is defined as follows:
struct ModelInferenceRequest {
    ModelInferenceMode mode;
    string modelCID;
    ModelInput input;
}
Fields:
mode: defines the inference execution and verification mode (ZK, TEE, or VANILLA)
modelCID: the CID of the model to run inference on; it can be retrieved from the Model Hub
input: generic container for defining the model input
ModelInput and ModelOutput:
The input and output formats are implemented as generic tensor containers, providing flexibility to handle arrays of various shapes and dimensions for model inference. Each tensor has a unique name and type that must match the expected ONNX input metadata. We currently support number and string input and output tensors.
TIP
Inspect your ONNX model metadata to find the model's input and output schema.
The input and output are defined as follows:
struct ModelInput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
}

struct ModelOutput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
    TensorLib.JsonScalar[] jsons;
    bool is_simulation_result; // indicates whether the result is part of simulation
}
Both the input and output consist of a list of number and string tensors. Number tensors can be multidimensional. The output also supports explicit JSON return types.
To read more about is_simulation_result in the output, please see Simulation Results.
NOTE
We use fixed-point number representation in the model input and output; see the TensorLib.Number type for more details. For example, 1.52 is represented as TensorLib.Number(152, 2), i.e. value = 152 and decimals = 2.
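When a result needs to be combined with other onchain values, it is often convenient to rescale a Number to a common number of decimals first. Below is a minimal, illustrative helper (not part of TensorLib) that rescales to 18 decimals, assuming decimals is non-negative and that TensorLib is exposed by the same OGInference.sol import used in the examples below:
import "opengradient-SolidML/src/OGInference.sol";

// Illustrative helper: rescale a fixed-point Number to 18 decimals so it can be
// combined with other 18-decimal values. Assumes n.decimals is non-negative.
function toWad(TensorLib.Number memory n) pure returns (int256) {
    uint256 decimals = uint256(int256(n.decimals));
    if (decimals <= 18) {
        return int256(n.value) * int256(10 ** (18 - decimals));
    }
    return int256(n.value) / int256(10 ** (decimals - 18));
}
With this sketch, TensorLib.Number(152, 2) (i.e. 1.52) becomes 1520000000000000000.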
Full Definition:
enum ModelInferenceMode { VANILLA, ZK, TEE }

/**
 * Model inference request.
 */
struct ModelInferenceRequest {
    ModelInferenceMode mode;
    string modelCID;
    ModelInput input;
}

/**
 * Model input, made up of various tensors of either numbers or strings.
 */
struct ModelInput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
}

/**
 * Model output, made up of tensors of either numbers or strings, ordered
 * as defined by the model.
 *
 * For example, if a model's output is: [number_tensor_1, string_tensor_1, number_tensor_2],
 * you could access them like this:
 *   number_tensor_1 = output.numbers[0];
 *   string_tensor_1 = output.strings[0];
 *   number_tensor_2 = output.numbers[1];
 */
struct ModelOutput {
    TensorLib.MultiDimensionalNumberTensor[] numbers;
    TensorLib.StringTensor[] strings;
    TensorLib.JsonScalar[] jsons;
    bool is_simulation_result;
}
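Because tensor names must match the model's ONNX metadata, it can be convenient to look up an output tensor by name rather than by position. The helper below is a minimal sketch (not part of the library), assuming the structs above and TensorLib are available from the same OGInference.sol import used in the examples:
import "opengradient-SolidML/src/OGInference.sol";

// Illustrative helper: find a number tensor in a ModelOutput by its name.
// Reverts if no number tensor with that name exists.
function findNumberTensor(ModelOutput memory output, string memory name)
    pure
    returns (TensorLib.MultiDimensionalNumberTensor memory)
{
    bytes32 wanted = keccak256(bytes(name));
    for (uint256 i = 0; i < output.numbers.length; i++) {
        if (keccak256(bytes(output.numbers[i].name)) == wanted) {
            return output.numbers[i];
        }
    }
    revert("tensor not found");
}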
TensorLib Reference
library TensorLib {
    /**
     * Can be used to represent a floating-point number or integer.
     *
     * e.g. 10 can be represented as Number(10, 0),
     * and 1.5 can be represented as Number(15, 1)
     */
    struct Number {
        int128 value;
        int128 decimals;
    }

    struct MultiDimensionalNumberTensor {
        string name;
        Number[] values;
        uint32[] shape;
    }

    struct StringTensor {
        string name;
        string[] values;
    }

    struct JsonScalar {
        string name;
        string value;
    }
}
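The shape field encodes a number tensor's dimensions, while values holds the flattened entries. As an illustration, a 2x3 matrix input could be built as in the sketch below; the row-major ordering of values is an assumption that should be checked against the target model's ONNX metadata:
import "opengradient-SolidML/src/OGInference.sol";

// Illustrative construction of a 2x3 number tensor named "input"
// (values assumed to be flattened in row-major order).
function buildMatrixTensor()
    pure
    returns (TensorLib.MultiDimensionalNumberTensor memory)
{
    TensorLib.Number[] memory values = new TensorLib.Number[](6);
    for (uint256 i = 0; i < 6; i++) {
        values[i] = TensorLib.Number(int128(int256(i + 1)), 0); // 1, 2, ..., 6
    }

    uint32[] memory shape = new uint32[](2);
    shape[0] = 2; // rows
    shape[1] = 3; // columns

    return TensorLib.MultiDimensionalNumberTensor("input", values, shape);
}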
Example ML Usage
The following smart contract uses SolidML to natively run a model from the Model Hub and persist its output.
import "opengradient-SolidML/src/OGInference.sol";
contract MlExample {
    // Persisted inference result (fixed-point number)
    TensorLib.Number resultNumber;

    // The SolidML inference precompile (0x00...00F4, see the tip above)
    OGInference constant OG_INFERENCE = OGInference(address(0xF4));

    // Execute an ML model from OpenGradient's model storage, secured by ZKML
    function runZkmlModel() public {
        // model takes 1 number tensor as input
        ModelInput memory modelInput = ModelInput(
            new TensorLib.MultiDimensionalNumberTensor[](1),
            new TensorLib.StringTensor[](0));

        // populate tensor values
        TensorLib.Number[] memory numbers = new TensorLib.Number[](2);
        numbers[0] = TensorLib.Number(7286679744720459, 17); // 0.07286679744720459
        numbers[1] = TensorLib.Number(4486280083656311, 16); // 0.4486280083656311

        // one-dimensional tensor of length 2
        uint32[] memory shape = new uint32[](1);
        shape[0] = 2;

        // set the tensor name expected by the model's ONNX metadata
        modelInput.numbers[0] =
            TensorLib.MultiDimensionalNumberTensor("input", numbers, shape);

        // execute inference
        ModelOutput memory output = OG_INFERENCE.runModelInference(
            ModelInferenceRequest(
                ModelInferenceMode.ZK,
                "QmbbzDwqSxZSgkz1EbsNHp2mb67rYeUYHYWJ4wECE24S7A",
                modelInput
            ));

        // handle result
        if (!output.is_simulation_result) {
            resultNumber = output.numbers[0].values[0];
        } else {
            resultNumber = TensorLib.Number(0, 0);
        }
    }
}
LLM Inference
Function Definition
The runLLMCompletion and runLLMChat functions are defined as follows:
interface OGInference {
    function runLLMCompletion(
        LLMCompletionRequest memory request
    ) external returns (LLMCompletionResponse memory);

    function runLLMChat(
        LLMChatRequest memory request
    ) external returns (LLMChatResponse memory);
}
Calling runLLMCompletion or runLLMChat from your smart contract will atomically execute the requested LLM with the given input and return the result synchronously.
Completion Input and Output
Both the chat and completion request and response types are similar to the OpenAI API format; however, there are some simplifications for ease of use.
enum LLMInferenceMode { VANILLA, TEE }

struct LLMCompletionRequest {
    LLMInferenceMode mode;
    string modelCID;
    string prompt;
    uint32 max_tokens;
    string[] stop_sequence;
    uint32 temperature; // 0-100
}

struct LLMCompletionResponse {
    string answer;
    bool is_simulation_result;
}
Input Fields:
mode: execution and security mode for the LLM inference
modelCID: name of the LLM to use. The list of LLMs you can use can be found here.
prompt: LLM prompt
max_tokens: maximum number of tokens to generate
stop_sequence: stop generating further tokens after any of these are generated
temperature: LLM temperature (0-100)
NOTE
Running LLMs can naturally take longer, so these parameters, as well as the length of the input and output, can have a significant impact on transaction speed and cost.
Output Fields:
answer: the LLM's generated answer
is_simulation_result: whether the result was actually generated by the LLM or is just a dummy result used for simulating the transaction. To read more, please see Simulation Results.
Chat Input and Output
The chat endpoint supports function calling as well as system, user and assistant messages.
enum LLMInferenceMode { VANILLA, TEE }

struct LLMChatRequest {
    LLMInferenceMode mode;
    string modelCID;
    ChatMessage[] messages;
    ToolDefinition[] tools;
    string tool_choice;
    uint32 max_tokens;
    string[] stop_sequence;
    uint32 temperature; // 0-100
}

struct LLMChatResponse {
    string finish_reason;
    ChatMessage message;
}

struct ToolDefinition {
    string description;
    string name;
    string parameters; // must be valid JSON
}

struct ChatMessage {
    string role;
    string content;
    string name;
    string tool_call_id; // only used for tool response
    ToolCall[] tool_calls;
}

struct ToolCall {
    string id;
    string name;
    string arguments; // formatted as JSON
}
Input Fields:
mode: execution and security mode for the LLM inference
modelCID: name of the LLM to use. The list of LLMs you can use can be found here.
messages: input messages (see ChatMessage)
tools: list of available tools the LLM may use (can be empty)
tool_choice: one of auto, none, or required. Forces the LLM to use or not use a tool in its response.
max_tokens: maximum number of tokens to generate
stop_sequence: stop generating further tokens after any of these are generated
temperature: LLM temperature (0-100)
Both the input and output types mirror the OpenAI chat and tool-calling API types. Please refer to their documentation for more info.
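Since the example below only demonstrates the completion endpoint, here is a hedged sketch of a runLLMChat call that also defines a tool. The tool name, its JSON parameter schema, the prompt, and the contract name are illustrative placeholders; the model CID, precompile address, and struct layouts come from the definitions above:
import "opengradient-SolidML/src/OGInference.sol";

contract ChatToolExample {
    // The SolidML inference precompile (0x00...00F4, see the tip above)
    OGInference constant OG_INFERENCE = OGInference(address(0xF4));

    // Ask the LLM a question while offering it one (illustrative) tool to call
    function runChatWithTool() external returns (LLMChatResponse memory response) {
        ChatMessage[] memory messages = new ChatMessage[](1);
        messages[0] = ChatMessage({
            role: "user",
            content: "What is the current ETH price?",
            name: "",
            tool_call_id: "",
            tool_calls: new ToolCall[](0)
        });

        ToolDefinition[] memory tools = new ToolDefinition[](1);
        tools[0] = ToolDefinition({
            description: "Returns the latest ETH/USD price",
            name: "get_eth_price",
            parameters: '{"type":"object","properties":{}}' // must be valid JSON
        });

        response = OG_INFERENCE.runLLMChat(
            LLMChatRequest({
                mode: LLMInferenceMode.VANILLA,
                modelCID: "meta-llama/Meta-Llama-3-8B-Instruct",
                messages: messages,
                tools: tools,
                tool_choice: "auto",
                max_tokens: 500,
                stop_sequence: new string[](0),
                temperature: 0
            }));

        // If the model chose to call the tool, its name and JSON arguments are in
        // response.message.tool_calls; otherwise response.message.content holds the answer.
    }
}
Mirroring the OpenAI flow, the conversation can be continued by appending the assistant message and a tool-role message carrying the tool's result (with the matching tool_call_id) and calling runLLMChat again.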
Example LLM Usage
The following smart contract uses Llama 3 8B to run a short prompt and stores its answer in the contract.
import "opengradient-SolidML/src/OGInference.sol";
contract LlmExample {
    // The SolidML inference precompile (0x00...00F4, see the tip above)
    OGInference constant OG_INFERENCE = OGInference(address(0xF4));

    string answer;

    // Execute a Large Language Model directly in your smart contract
    function runLlm() public {
        // define stop-sequence
        string[] memory stopSequence = new string[](1);
        stopSequence[0] = "<end>";

        // run the Llama 3 model through the completion endpoint
        LLMCompletionResponse memory llmResult = OG_INFERENCE.runLLMCompletion(
            LLMCompletionRequest(
                LLMInferenceMode.VANILLA,
                "meta-llama/Meta-Llama-3-8B-Instruct",
                "Hello sir, who are you?\n<start>",
                1000,
                stopSequence,
                0
            ));

        // handle result
        if (llmResult.is_simulation_result) {
            answer = "empty";
        } else {
            answer = llmResult.answer;
        }
    }
}