
SDK Inference Overview

You can use the Python SDK to access our verified and decentralized inference infrastructure directly from traditional off-chain applications. This lets you build end-user applications powered by decentralized AI inference that is end-to-end verified and secured by the OpenGradient network.

Using our SDK, you can ensure the full integrity and security of AI models and inferences at a more competitive price. Behind the scenes, our inference methods trigger an on-chain inference transaction on OpenGradient, meaning every inference is fully verified, traceable, and secured by the entire value of our network.

You can find the complete function declarations and documentation in our API Reference.

SDK Initialization

In order to use the SDK library, you first need to initialize it by calling og.init with the private key of your blockchain account and the email and password of the Model Hub account you created during credentials setup.

python
import opengradient as og

og.init(private_key="<private_key>", email="<email>", password="<password>")

TIP

You can use opengradient config show to get these values from your local config file.
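For example:

bash
opengradient config show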

Refer to the API Reference for more information.

Model Inference

Model inference is exposed through the infer method:

python
def infer(model_cid, inference_mode, model_input)

NOTE

ZKML places restrictions on the types of models that are supported. Please check here for details on these restrictions.

API Reference

The complete function definition and documentation can be found here.

Example

python
import opengradient as og
import numpy as np

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# run inference
tx_hash, model_output = og.infer(
    model_cid='QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ',
    model_input={
        "num_input1": [1.0, 2.0, 3.0],
        "num_input2": 10,
        "str_input1": np.array(["hello", "ONNX"]),
        "str_input2": " world"
    },
    inference_mode=og.InferenceMode.VANILLA
)

# print output
print(model_output)

Inference CLI Usage

The infer command takes the following arguments: the model CID, the inference mode (VANILLA, TEE, or ZKML), and the model input, provided either inline as a JSON string or as a JSON file:

Using inline input:

bash
opengradient infer -m QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ \
--mode VANILLA \
--input '{"num_input1":[1.0, 2.0, 3.0], "num_input2":10, "str_input1":["hello", "ONNX"], "str_input2":" world"}'

Using file input:

bash
opengradient infer -m QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ --mode VANILLA --input-file input.json
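
For reference, an input.json file equivalent to the inline input shown above would contain:

json
{
    "num_input1": [1.0, 2.0, 3.0],
    "num_input2": 10,
    "str_input1": ["hello", "ONNX"],
    "str_input2": " world"
}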

TIP

Remember to initialize your configuration using opengradient config init if you haven't already.

To get more information on how to make inferences using the CLI, you can run:

bash
opengradient infer --help

LLM Inference

The SDK currently supports two types of LLM inference:

  1. llm_completion for simple LLM completions
  2. llm_chat for more advanced LLM chat completions (including tool-usage)

Along with these two types of LLM inference, we also support two flavors of execution:

  1. og.LlmInferenceMode.VANILLA: (default) normal inference with no security backing
  2. og.LlmInferenceMode.TEE: text inference run within a trusted execution environment

More information on TEE LLMs can be found here.

Both of these functions closely mirror the OpenAI APIs; however, there are some minor differences.

python
def llm_completion(
  model_cid, 
  prompt, 
  max_tokens=100, 
  temperature=0.0, 
  stop_sequence=None)

def llm_chat(
  model_cid, 
  messages, 
  max_tokens=100, 
  temperature=0.0, 
  stop_sequence=None, 
  tools=[], 
  tool_choice=None)

LLM API Reference

For full definitions and documentation on these methods, please check the API Reference.

Completion Example

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# run LLM inference
tx_hash, response = og.llm_completion(
    model_cid='meta-llama/Meta-Llama-3-8B-Instruct',
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)

# print output
print("Transaction Hash:", tx_hash)
print("LLM Output:", response)

Chat Example

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# create messages history
messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant.",
        "name": "HAL"
    },
    {
        "role": "user",
        "content": "Hello! How are you doing? Can you repeat my name?",
    }]

# run LLM inference
tx_hash, finish_reason, message = og.llm_chat(
  model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
  messages=messages
)

# print output
print("Transaction Hash:", tx_hash)
print("Finish Reason:", finish_reason)
print("LLM Output:", message)

Chat Example with Tools

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description": "The two-letter abbreviation for the state that the city is in, e.g. 'CA' for 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        }
    }
}]

# Message conversation
messages = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps the user with tasks. Use tools if necessary.",
    },
    {
        "role": "user",
        "content": "Hi! How are you doing today?"
    },
    {
        "role": "assistant",
        "content": "I'm doing well! How can I help you?",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?"
    }]

tx_hash, finish_reason, message = og.llm_chat(
    model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
    messages=messages,
    tools=tools
)

# print output
print("Transaction Hash:", tx_hash)
print("Finish Reason:", finish_reason)
print("LLM Output:", message)

LLM CLI Usage

We also have explicit support for using LLMs through the completion and chat commands in the CLI.

For example, you can run a completion inference with Llama-3 using the following command:

bash
opengradient completion --model "meta-llama/Meta-Llama-3-8B-Instruct" --prompt "hello who are you?" --max-tokens 50

Or you can use files instead of inline text to simplify your command:

bash
opengradient chat --model "mistralai/Mistral-7B-Instruct-v0.3" --messages-file messages.json --tools-file tools.json --max-tokens 200
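
For example, a messages.json file might contain the same conversation structure used in the SDK chat example above; tools.json would likewise hold the tools array shown earlier:

json
[
    {
        "role": "system",
        "content": "You are a helpful AI assistant."
    },
    {
        "role": "user",
        "content": "Hello! How are you doing?"
    }
]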

The list of models we support can be found in the Model Hub.

To get more information on how to run LLMs using the CLI, you can run:

bash
opengradient completion --help
opengradient chat --help

TEE LLMs

OpenGradient now supports LLM inference within trusted execution environments (TEEs). To deliver useful LLMs, we've enabled this technology using Intel TDX and NVIDIA H100 GPUs with confidential compute. This means you can now make both chat and completion inference requests to models in a fully hardware-attested environment. To use it, pick one of the supported models below and set the appropriate flag (see the sketch after this list):

  • inference_mode=og.LlmInferenceMode.TEE for the python SDK
  • --mode TEE for the CLI.
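
For example, a TEE completion request might look like the following. This is a minimal sketch: it assumes llm_completion accepts the inference_mode parameter described above, and that the model shown appears on the supported models list.

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# run a completion inside a trusted execution environment
# (assumes this model CID is TEE-enabled; check the supported models list)
tx_hash, response = og.llm_completion(
    model_cid='meta-llama/Meta-Llama-3-8B-Instruct',
    inference_mode=og.LlmInferenceMode.TEE,
    prompt="Hello! Who are you?",
    max_tokens=50
)

print("Transaction Hash:", tx_hash)
print("LLM Output:", response)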

Supported Models:

NOTE

This technology is new, so access may be periodically restricted due to usage limitations.

SDK API Reference

Please refer to our API Reference for any additional details around the SDK methods.
