LLM Inference

Overview

The OpenGradient Python SDK provides a simple interface for running LLM inference on OpenGradient's decentralized network. Under the hood, the SDK uses x402, a payment protocol that extends HTTP with payment requirements via the 402 Payment Required status code. This enables secure, cryptographically verified payments before inference execution, making it ideal for pay-per-use LLM APIs.

The OpenGradient network and SDK handle all the complexity of payment signing, verification, and settlement automatically, so you can focus on building your application. LLM inferences are executed on OpenGradient's decentralized network and can be verified using Trusted Execution Environments (TEEs), providing cryptographic guarantees that your prompts and results are handled correctly.
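
Conceptually, each inference request follows the x402 challenge-response flow sketched below. This is an illustrative sketch only: the endpoint, payload shape, and sign_payment callback are hypothetical placeholders, and the header and payment formats are simplified from the x402 spec. The SDK performs all of these steps for you.

python
import requests  # illustrative; the SDK handles this flow internally

def pay_per_use_inference(url, payload, sign_payment):
    # 1. Send the request without any payment attached.
    response = requests.post(url, json=payload)

    if response.status_code == 402:
        # 2. The server replies 402 Payment Required, describing what it
        #    will accept (amount, asset, recipient, etc.).
        requirements = response.json()

        # 3. Sign a payment authorization with your private key.
        #    (sign_payment stands in for the SDK's internal signer.)
        payment_header = sign_payment(requirements)

        # 4. Retry with the signed payment attached; the server verifies
        #    and settles it before running inference.
        response = requests.post(
            url, json=payload, headers={"X-PAYMENT": payment_header}
        )

    return response.json()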

Methods

The SDK currently supports two types of LLM inference:

  1. llm_completion for simple LLM completions
  2. llm_chat for more advanced LLM chat completions (including tool usage)

Both inference types support two execution modes:

  1. og.LlmInferenceMode.VANILLA: standard inference execution on OpenGradient's decentralized network, providing verifiable on-chain results without hardware attestation
  2. og.LlmInferenceMode.TEE: verified and private inference through TEE nodes that route to third-party LLM APIs (OpenAI, Gemini, Anthropic, etc.)

TEE execution provides cryptographic verification of prompts for mission-critical applications (DeFi, financial services, healthcare, etc.) and ensures privacy of personal data through hardware-attested code auditing.
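
For example, the same completion call can run in either mode just by switching the inference_mode argument. A minimal sketch, assuming a client constructed as in the examples below; model availability may differ between modes:

python
# Standard execution on the decentralized network (default mode)
result = client.llm_completion(
    model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
    prompt="Explain zero-knowledge proofs in one sentence.",
    inference_mode=og.LlmInferenceMode.VANILLA,
)

# Hardware-attested execution routed through a TEE node
result = client.llm_completion(
    model_cid="openai/gpt-4o",
    prompt="Explain zero-knowledge proofs in one sentence.",
    inference_mode=og.LlmInferenceMode.TEE,
)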

Both of these functions largely mirror the OpenAI APIs; however, there are some minor differences.

TIP

To learn more about how LLM execution works on OpenGradient, including the x402 payment-gated API and TEE verification, see LLM Execution.

python
def llm_completion(
  model_cid,                                   # model identifier, e.g. 'openai/gpt-4o'
  prompt,                                      # input text to complete
  inference_mode=og.LlmInferenceMode.VANILLA,  # VANILLA or TEE
  max_tokens=100,                              # maximum number of tokens to generate
  temperature=0.0,                             # sampling temperature
  stop_sequence=None)                          # optional stop sequence(s)

def llm_chat(
  model_cid,                                   # model identifier, e.g. 'openai/gpt-4.1'
  messages,                                    # OpenAI-style list of chat messages
  inference_mode=og.LlmInferenceMode.VANILLA,  # VANILLA or TEE
  max_tokens=100,                              # maximum number of tokens to generate
  temperature=0.0,                             # sampling temperature
  stop_sequence=None,                          # optional stop sequence(s)
  tools=[],                                    # OpenAI-style tool definitions
  tool_choice=None,                            # optional tool selection strategy
  settlement_mode=og.x402SettlementMode.SETTLE)  # on-chain settlement mode (see Settlement Modes)

Completion Example

python
import opengradient as og
import os

client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

result = client.llm_completion(
    model_cid='openai/gpt-4o',
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)

print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")

Chat Example

python
import opengradient as og
import os

client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant.",
        "name": "HAL"
    },
    {
        "role": "user",
        "content": "Hello! How are you doing? Can you repeat my name?",
    }
]

result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages
)

print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")

Chat Example with Tools

python
import opengradient as og
import os

client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description": "The two-letter abbreviation for the state the city is in, e.g. 'CA' for California"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        }
    }
}]

# Message conversation
messages = [
    {
        "role": "system",
        "content": "You are a AI assistant that helps the user with tasks. Use tools if necessary.",
    },
    {
        "role": "user",
        "content": "Hi! How are you doing today?"
    }, 
    {
        "role": "assistant",
        "content": "I'm doing well! How can I help you?",
    }, 
    {
        "role": "user",
        "content": "Can you tell me what the temperate will be in Dallas, in fahrenheit?"
    }
]

result = client.llm_chat(
    model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
    messages=messages,
    tools=tools
)

print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")

TEE LLMs

OpenGradient supports LLM inference within trusted execution environments (TEEs), enabling verified and private access to both proprietary and open-source models. TEE nodes route requests to third-party LLM APIs (such as OpenAI, Gemini, Anthropic, and others) while providing critical security and verification guarantees.

Key Benefits

Verification for Mission-Critical Applications: TEE nodes enable prompt verification, making them ideal for mission-critical applications like DeFi protocols, financial services, healthcare systems, and other sensitive systems where you need cryptographic proof of what prompts were sent to the LLM.

Privacy Protection: Personal data and sensitive information remain private. TEE nodes audit and verify code execution, ensuring that your data is processed securely without exposure to unauthorized parties.

Hardware Attestation: Built on Intel TDX with confidential compute, TEE nodes provide hardware-level attestation of code execution, giving you cryptographic guarantees that the routing and verification code runs as expected before forwarding requests to third-party LLM APIs.

TIP

You can verify inference transactions on the OpenGradient block explorer.

Usage

To use TEE LLM inference, pass the following flags:

  • inference_mode=og.LlmInferenceMode.TEE for the Python SDK
  • --mode TEE for the CLI

TEE Examples

DeFi Smart Contract Analysis

python
import opengradient as og
import os

client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Analyze a smart contract with verified prompts for audit trail
contract_code = """
function transfer(address to, uint256 amount) public {
    require(balanceOf[msg.sender] >= amount, "Insufficient balance");
    balanceOf[msg.sender] -= amount;
    balanceOf[to] += amount;
}
"""

result = client.llm_completion(
    model_cid='openai/gpt-4o',
    prompt=f"Analyze this Solidity smart contract function for security vulnerabilities:\n\n{contract_code}",
    max_tokens=500,
    temperature=0.0,
    inference_mode=og.LlmInferenceMode.TEE  # Verified prompt for audit compliance
)

print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")

Privacy-Sensitive Healthcare Chat

python
import opengradient as og
import os

client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Chat with patient data - TEE ensures privacy and code verification
messages = [
    {
        "role": "system",
        "content": "You are a medical assistant. Analyze patient symptoms and provide preliminary guidance."
    },
    {
        "role": "user",
        "content": "Patient: 45-year-old male, presenting with chest pain and shortness of breath. Blood pressure: 140/90. What are the potential causes?"
    }
]

result = client.llm_chat(
    model_cid='anthropic/claude-4.0-sonnet',
    messages=messages,
    max_tokens=300,
    temperature=0.1,
    inference_mode=og.LlmInferenceMode.TEE  # Patient data remains private
)

print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")

Financial Risk Assessment

python
import opengradient as og
import os

client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Assess loan application with verified audit trail
loan_data = {
    "applicant_income": 75000,
    "credit_score": 720,
    "debt_to_income": 0.35,
    "loan_amount": 250000
}

prompt = f"""Assess this loan application for approval:
Income: ${loan_data['applicant_income']}
Credit Score: {loan_data['credit_score']}
Debt-to-Income Ratio: {loan_data['debt_to_income']}
Requested Loan: ${loan_data['loan_amount']}

Provide risk assessment and recommendation."""

result = client.llm_completion(
    model_cid='openai/gpt-4.1',
    prompt=prompt,
    max_tokens=250,
    temperature=0.0,
    inference_mode=og.LlmInferenceMode.TEE  # Cryptographic proof of assessment criteria
)

print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")

The TEE node routes your request to the third-party API while providing cryptographic verification of the prompt and ensuring your data remains private through hardware-attested code execution.

Supported Models

TEE LLMs support the following models:

  • openai/gpt-4.1
  • openai/gpt-4o
  • anthropic/claude-4.0-sonnet
  • anthropic/claude-3.5-haiku
  • x-ai/grok-3-beta
  • x-ai/grok-3-mini-beta
  • x-ai/grok-4-1-fast-non-reasoning
  • google/gemini-2.5-flash-preview
  • google/gemini-2.5-pro-preview

NOTE

This technology is cutting-edge, so access may be periodically restricted due to usage limitations.

Settlement Modes

The SDK provides three settlement modes via og.x402SettlementMode that control how inference data is recorded on-chain for payment settlement and auditability. Each mode offers different trade-offs between data completeness, privacy, and transaction costs.

Available Modes

Mode             Description                                                             Use Case
SETTLE           Records only cryptographic hashes of input/output                       Privacy-focused applications needing proof of execution
SETTLE_METADATA  Records full model info, complete input/output data, and all metadata   Maximum transparency and auditability
SETTLE_BATCH     Aggregates multiple inferences into a single settlement                 High-volume applications optimizing for cost

SETTLE (Default)

Individual settlement with input/output hashes only. This is the most privacy-preserving option: actual data is not stored on-chain, only cryptographic hashes that can be used to verify execution.

python
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages,
    settlement_mode=og.x402SettlementMode.SETTLE  # Default
)

SETTLE_METADATA

Individual settlement with full metadata. Records complete model information, full input and output data, and all inference metadata on-chain. Provides maximum transparency and auditability but has higher gas costs due to larger data storage.

python
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages,
    settlement_mode=og.x402SettlementMode.SETTLE_METADATA
)

Best for:

  • Proof of Agent Actions: Cryptographically prove that a specific agent used prompt X to take action Y
  • Resolution Verification: Verify that decisions used the correct prompt with accurate data inputs
  • Agent Auditing: Applications requiring complete transparency, such as DeFi agents making trading decisions

SETTLE_BATCH

Batch settlement for multiple inferences. Aggregates multiple inference requests into a single settlement transaction using batch hashes. This is the most cost-efficient option for high-volume applications, since it reduces per-inference transaction overhead.

python
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages,
    settlement_mode=og.x402SettlementMode.SETTLE_BATCH
)
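
Because settlement costs are amortized across requests, SETTLE_BATCH pairs naturally with high-volume workloads. A minimal sketch: the batching and on-chain finalization are handled by the network, and the sentiment-classification task here is purely illustrative.

python
reviews = [
    "Great product, arrived on time.",
    "Broke after two days, very disappointed.",
    "Does what it says, would buy again.",
]

# Each call settles as part of a batch rather than individually
for review in reviews:
    result = client.llm_chat(
        model_cid="openai/gpt-4.1",
        messages=[
            {"role": "system", "content": "Classify the sentiment of the review as positive or negative."},
            {"role": "user", "content": review},
        ],
        settlement_mode=og.x402SettlementMode.SETTLE_BATCH,
    )
    print(f"{review!r} -> {result.chat_output['content']}")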

TIP

To learn more about how settlement works in the x402 payment protocol, see LLM Execution - Settlement Modes.

SDK API Reference

Please refer to our API Reference for additional details on the SDK methods.

More Examples

You can find more examples using the SDK here.