SDK Inference Overview
You can use the Python SDK to access our verified and decentralized inference infrastructure directly from traditional off-chain applications. This lets you build end-user applications powered by decentralized AI inference that is end-to-end verified and secured by the OpenGradient network.
Using our SDK, you can ensure the full integrity and security of AI models and inferences at a more competitive price. Behind the scenes, our inference methods trigger an on-chain inference transaction on OpenGradient, which means it's fully verified, traceable, and secured by the entire value of our network.
You can find the complete function declarations and documentation in our API Reference.
SDK Initialization
To use the SDK, you first need to initialize it by calling og.init with the private key of your blockchain account and the email and password of the Model Hub account you created during credentials setup.
import opengradient as og
og.init(private_key="<private_key>", email="<email>", password="<password>")
TIP
You can use opengradient config show to get these values from your local config file.
Refer to the API Reference for more information.
Model Inference
Model inference is exposed through the infer method:
def infer(model_cid, inference_mode, model_input)
NOTE
ZKML has some restrictions on what types of models are supported. Please check here for details on what these restrictions are.
API Reference
The complete function definition and documentation can be found here.
Example
import opengradient as og
import numpy as np
# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")
# run inference
tx_hash, model_output = og.infer(
    model_cid='QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ',
    model_input={
        "num_input1": [1.0, 2.0, 3.0],
        "num_input2": 10,
        "str_input1": np.array(["hello", "ONNX"]),
        "str_input2": " world"
    },
    inference_mode=og.InferenceMode.VANILLA
)
# print output
print(model_output)
Inference CLI Usage
The infer command takes the following arguments: model CID, inference mode (VANILLA, TEE, or ZKML), and model input, provided either as a JSON string or as a JSON file:
Using inline input:
opengradient infer -m QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ \
--mode VANILLA \
--input '{"num_input1":[1.0, 2.0, 3.0], "num_input2":10, "str_input1":["hello", "ONNX"], "str_input2":" world"}'
Using file input:
opengradient infer -m QmbUqS93oc4JTLMHwpVxsE39mhNxy6hpf6Py3r9oANr8aZ --mode VANILLA --input-file input.json
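For reference, an input.json file matching the inline example above would contain:

{
  "num_input1": [1.0, 2.0, 3.0],
  "num_input2": 10,
  "str_input1": ["hello", "ONNX"],
  "str_input2": " world"
}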
TIP
Remember to initialize your configuration using opengradient config init if you haven't already.
To get more information on how to make inferences using the CLI, you can run:
opengradient infer --help
LLM Inference
The SDK currently supports two types of LLM inference:
- llm_completion for simple LLM completions
- llm_chat for more advanced LLM chat completions (including tool usage)
Along with these two types of LLM inference, we also support two flavors of execution:
- og.LlmInferenceMode.VANILLA: (default) normal inference with no security backing
- og.LlmInferenceMode.TEE: text inference run within a trusted execution environment
More information on TEE LLMs can be found here.
Both of these functions mostly mirror the OpenAI APIs; however, there are some minor differences.
def llm_completion(
    model_cid,
    prompt,
    max_tokens=100,
    temperature=0.0,
    stop_sequence=None)

def llm_chat(
    model_cid,
    messages,
    max_tokens=100,
    temperature=0.0,
    stop_sequence=None,
    tools=[],
    tool_choice=None)
LLM API Reference
For full definitions and documentation on these methods, please check the API Reference.
Completion Example
import opengradient as og
# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")
# run LLM inference
tx_hash, response = og.llm_completion(
    model_cid='meta-llama/Meta-Llama-3-8B-Instruct',
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)
# print output
print("Transaction Hash:", tx_hash)
print("LLM Output:", response)
Chat Example
import opengradient as og
# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")
# create messages history
messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant.",
        "name": "HAL"
    },
    {
        "role": "user",
        "content": "Hello! How are you doing? Can you repeat my name?",
    }]
# run LLM inference
tx_hash, finish_reason, message = og.llm_chat(
    model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
    messages=messages
)
# print output
print("Transaction Hash:", tx_hash)
print("Finish Reason:", finish_reason)
print("LLM Output:", message)
Chat Example with Tools
# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description": "The two-letter abbreviation for the state that the city is in, e.g. 'CA' which would mean 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        },
    }
}]
# Message conversation
messages = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps the user with tasks. Use tools if necessary.",
    },
    {
        "role": "user",
        "content": "Hi! How are you doing today?"
    },
    {
        "role": "assistant",
        "content": "I'm doing well! How can I help you?",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?"
    }]
tx_hash, finish_reason, message = og.llm_chat(model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3, messages=messages, tools=tools)
# print output
print("Transaction Hash:", tx_hash)
print("Finish Reason:", finish_reason)
print("LLM Output:", message)
LLM CLI Usage
We also have explicit support for using LLMs through the completion and chat commands in the CLI.
For example, you can run a completion inference with Llama-3 using the following command:
opengradient completion --model "meta-llama/Meta-Llama-3-8B-Instruct" --prompt "hello who are you?" --max-tokens 50
Or you can use files instead of text input in order to simplify your command:
opengradient chat --model "mistralai/Mistral-7B-Instruct-v0.3" --messages-file messages.json --tools-file tools.json --max-tokens 200
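As an illustration, a messages.json for the command above could mirror the message structure from the Python chat example, and tools.json would hold the tools array shown earlier (the file names and contents here are examples, not fixed formats):

[
  {"role": "system", "content": "You are a helpful AI assistant."},
  {"role": "user", "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?"}
]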
The list of models we support can be found in the Model Hub.
To get more information on how to run LLMs using the CLI, you can run:
opengradient completion --help
opengradient chat --help
TEE LLMs
OpenGradient now supports LLM inference within trusted execution environments (TEEs). To deliver useful LLMs, we've enabled this technology using Intel TDX and NVIDIA H100 GPUs with confidential compute. This means you can now make both chat and completion inference requests to models in a fully hardware-attested environment. To use it, pick one of the supported models below and set:
- inference_mode=og.LlmInferenceMode.TEE for the Python SDK
- --mode TEE for the CLI
A short SDK sketch follows the supported model list below.
Supported Models:
- meta-llama/Llama-3.1-70B-Instruct
- More models to come!
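As a minimal sketch (assuming llm_completion accepts the inference_mode flag described above; check the API Reference for the exact signature), a TEE completion request with the Python SDK would look like:

import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# run LLM completion inside a TEE (inference_mode parameter assumed per the flags above)
tx_hash, response = og.llm_completion(
    model_cid='meta-llama/Llama-3.1-70B-Instruct',
    prompt="Summarize the benefits of confidential computing in one sentence.",
    max_tokens=50,
    inference_mode=og.LlmInferenceMode.TEE
)

print("Transaction Hash:", tx_hash)
print("LLM Output:", response)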
NOTE
Because this technology is so new, access may be periodically restricted due to usage limitations.
SDK API Reference
Please refer to our API Reference for any additional details around the SDK methods.