LLM Inference

The SDK currently supports two types of LLM inference:

  1. llm_completion for simple LLM completions
  2. llm_chat for more advanced LLM chat completions (including tool usage)

Both inference types support two execution modes:

  1. og.LlmInferenceMode.VANILLA: standard inference execution on OpenGradient's decentralized network, providing verifiable on-chain results without hardware attestation
  2. og.LlmInferenceMode.TEE: verified and private inference through TEE nodes that route to third-party LLM APIs (OpenAI, Gemini, Anthropic, etc.)

TEE execution provides cryptographic verification of prompts for mission-critical applications (DeFi, financial services, healthcare, etc.) and ensures privacy of personal data through hardware-attested code auditing. More information can be found in the TEE LLMs section below.
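In both cases, the execution mode is selected with the inference_mode argument (or the --mode flag in the CLI, described below). A minimal sketch, assuming the SDK has already been initialized with og.init as in the examples that follow:

python
import opengradient as og

# assumes og.init(...) has been called, as in the examples below

# standard on-chain inference on the decentralized network
tx_hash, response = og.llm_completion(
    model_cid='meta-llama/Meta-Llama-3-8B-Instruct',
    prompt="Hello, who are you?",
    inference_mode=og.LlmInferenceMode.VANILLA
)

# verified, private inference routed through a TEE node
tx_hash, response = og.llm_completion(
    model_cid='gpt-4',
    prompt="Hello, who are you?",
    inference_mode=og.LlmInferenceMode.TEE
)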

Both of these functions largely mirror the OpenAI APIs; however, there are some minor differences.

python
def llm_completion(
  model_cid,
  prompt,
  inference_mode=og.LlmInferenceMode.VANILLA,
  max_tokens=100,
  temperature=0.0,
  stop_sequence=None)

def llm_chat(
  model_cid,
  messages,
  inference_mode=og.LlmInferenceMode.VANILLA,
  max_tokens=100,
  temperature=0.0,
  stop_sequence=None,
  tools=[],
  tool_choice=None)

LLM API Reference

For full definitions and documentation of these methods, see the API reference.

Completion Example

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# run LLM inference
tx_hash, response = og.llm_completion(
    model_cid='meta-llama/Meta-Llama-3-8B-Instruct',
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)

# print output
print("Transaction Hash:", tx_hash)
print("LLM Output:", response)

Chat Example

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# create messages history
messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant.",
        "name": "HAL"
    },
    {
        "role": "user",
        "content": "Hello! How are you doing? Can you repeat my name?",
    }]

# run LLM inference
tx_hash, finish_reason, message = og.llm_chat(
  model_cid="openai/gpt-4.1",
  messages=messages
)

# print output
print("Transaction Hash:", tx_hash)
print("Finish Reason:", finish_reason)
print("LLM Output:", message)

Chat Example with Tools

python
# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description": "The two-letter abbreviation for the state that the city is in, e.g. 'CA' for 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        },
    }
}]

# Message conversation
messages = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps the user with tasks. Use tools if necessary.",
    },
    {
        "role": "user",
        "content": "Hi! How are you doing today?"
    },
    {
        "role": "assistant",
        "content": "I'm doing well! How can I help you?",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?"
    }]

tx_hash, finish_reason, message = og.llm_chat(
    model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
    messages=messages,
    tools=tools
)

# print output
print("Transaction Hash:", tx_hash)
print("Finish Reason:", finish_reason)
print("LLM Output:", message)

LLM CLI Usage

We also have explicit support for using LLMs through the completion and chat commands in the CLI.

For example, you can run a completion inference with Llama-3 using the following command:

bash
opengradient completion --model "meta-llama/Meta-Llama-3-8B-Instruct" --prompt "hello who are you?" --max-tokens 50

Or you can use files instead of text input in order to simplify your command:

bash
opengradient chat --model "mistralai/Mistral-7B-Instruct-v0.3" --messages-file messages.json --tools-file tools.json --max-tokens 200
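
The files contain the same JSON structures the SDK accepts, so a messages.json for the command above might look like this (tools.json would follow the tool schema shown in the chat example above):

json
[
  {
    "role": "system",
    "content": "You are a helpful AI assistant."
  },
  {
    "role": "user",
    "content": "What will the temperature be in Dallas, in fahrenheit?"
  }
]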

The list of models we support can be found in the Model Hub.

For more information on running LLMs through the CLI, run:

bash
opengradient completion --help
opengradient chat --help

TEE LLMs

OpenGradient supports LLM inference within trusted execution environments (TEEs), enabling verified and private access to both proprietary and open-source models. TEE nodes route requests to third-party LLM APIs (such as OpenAI, Gemini, Anthropic, and others) while providing critical security and verification guarantees.

Key Benefits

Verification for Mission-Critical Applications: TEE nodes enable prompt verification, making them ideal for mission-critical applications like DeFi protocols, financial services, healthcare systems, and other sensitive systems where you need cryptographic proof of what prompts were sent to the LLM.

Privacy Protection: Personal data and sensitive information remain private. TEE nodes audit and verify code execution, ensuring that your data is processed securely without exposure to unauthorized parties.

Hardware Attestation: Built on Intel TDX with confidential compute, TEE nodes provide hardware-level attestation of code execution, giving you cryptographic guarantees that the routing and verification code runs as expected before forwarding requests to third-party LLM APIs.

Usage

To utilize TEE LLM inference, use the following flags:

  • inference_mode=og.LlmInferenceMode.TEE for the Python SDK
  • --mode TEE for the CLI

TEE Examples

DeFi Smart Contract Analysis

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# Analyze a smart contract with verified prompts for audit trail
contract_code = """
function transfer(address to, uint256 amount) public {
    require(balanceOf[msg.sender] >= amount, "Insufficient balance");
    balanceOf[msg.sender] -= amount;
    balanceOf[to] += amount;
}
"""

tx_hash, response = og.llm_completion(
    model_cid='gpt-4',
    prompt=f"Analyze this Solidity smart contract function for security vulnerabilities:\n\n{contract_code}",
    max_tokens=500,
    temperature=0.0,
    inference_mode=og.LlmInferenceMode.TEE  # Verified prompt for audit compliance
)

print("Transaction Hash (verifiable on-chain):", tx_hash)
print("Security Analysis:", response)

Privacy-Sensitive Healthcare Chat

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# Chat with patient data - TEE ensures privacy and code verification
messages = [
    {
        "role": "system",
        "content": "You are a medical assistant. Analyze patient symptoms and provide preliminary guidance."
    },
    {
        "role": "user",
        "content": "Patient: 45-year-old male, presenting with chest pain and shortness of breath. Blood pressure: 140/90. What are the potential causes?"
    }
]

tx_hash, finish_reason, message = og.llm_chat(
    model_cid='claude-3-opus',
    messages=messages,
    max_tokens=300,
    temperature=0.1,
    inference_mode=og.LlmInferenceMode.TEE  # Patient data remains private
)

print("Transaction Hash:", tx_hash)
print("Medical Guidance:", message)

Financial Risk Assessment

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# Assess loan application with verified audit trail
loan_data = {
    "applicant_income": 75000,
    "credit_score": 720,
    "debt_to_income": 0.35,
    "loan_amount": 250000
}

prompt = f"""Assess this loan application for approval:
Income: ${loan_data['applicant_income']}
Credit Score: {loan_data['credit_score']}
Debt-to-Income Ratio: {loan_data['debt_to_income']}
Requested Loan: ${loan_data['loan_amount']}

Provide risk assessment and recommendation."""

tx_hash, response = og.llm_completion(
    model_cid='gpt-4',
    prompt=prompt,
    max_tokens=250,
    temperature=0.0,
    inference_mode=og.LlmInferenceMode.TEE  # Cryptographic proof of assessment criteria
)

print("Audit Trail Hash:", tx_hash)
print("Risk Assessment:", response)

CLI Usage with TEE

You can also use TEE mode from the command line:

bash
# DeFi contract analysis with verification
opengradient completion \
  --model "gpt-4" \
  --prompt "Analyze this smart contract for reentrancy vulnerabilities..." \
  --max-tokens 500 \
  --mode TEE

# Chat with privacy guarantees
opengradient chat \
  --model "claude-3-opus" \
  --messages-file patient_query.json \
  --max-tokens 300 \
  --mode TEE

The TEE node routes your request to the third-party API while providing cryptographic verification of the prompt and ensuring your data remains private through hardware-attested code execution.

Supported Models

TEE LLMs support routing to various third-party providers including:

  • OpenAI models (GPT-4, GPT-3.5, etc.)
  • Google Gemini models
  • Anthropic Claude models
  • Open-source models like meta-llama/Llama-3.1-70B-Instruct
  • More models and providers coming soon!

NOTE

This technology is cutting-edge, so access may be periodically restricted due to usage limitations.

SDK API Reference

Please refer to our API Reference for any additional details around the SDK methods.