Verifiable LLM Inference via x402

Overview

The OpenGradient Python SDK provides a simple interface for running LLM inference on OpenGradient's decentralized network. Under the hood, the SDK uses x402, a payment-gated HTTP standard that extends HTTP with payment requirements using the 402 Payment Required status code. Inference is paid for using $OPG testnet tokens on Base Sepolia, while execution and verification are handled by the OpenGradient network.

The SDK handles all the complexity of payment signing, verification, and settlement automatically, so you can focus on building your application. All LLM inferences are executed on OpenGradient's decentralized network and verified using Trusted Execution Environments (TEEs), providing cryptographic guarantees that your prompts and results are handled correctly.

Methods

The SDK currently supports two types of LLM inference:

  1. client.llm.completion() for simple LLM completions
  2. client.llm.chat() for more advanced LLM chat completions (including tool usage)

Additionally, client.llm.ensure_opg_approval() is used to ensure your wallet has approved sufficient $OPG tokens for Permit2 spending before making requests (see Permit2 Approval).

All LLM inference runs in a Trusted Execution Environment (TEE), providing cryptographic verification of prompts for mission-critical applications (DeFi, financial services, healthcare, etc.) and ensuring privacy of personal data through hardware-attested code auditing.

Both of these functions largely mirror the OpenAI APIs; however, there are some minor differences.

TIP

To learn more about how LLM execution works on OpenGradient, including the x402 payment-gated API and TEE verification, see LLM Execution.

python
def client.llm.completion(
  model: og.TEE_LLM,
  prompt: str,
  max_tokens: int = 100,
  temperature: float = 0.0,
  stop_sequence: Optional[List[str]] = None,
  x402_settlement_mode: Optional[og.x402SettlementMode] = og.x402SettlementMode.SETTLE_BATCH
) -> og.TextGenerationOutput

def client.llm.chat(
  model: og.TEE_LLM,
  messages: List[Dict],
  max_tokens: int = 100,
  temperature: float = 0.0,
  stop_sequence: Optional[List[str]] = None,
  tools: Optional[List[Dict]] = None,
  tool_choice: Optional[str] = None,
  x402_settlement_mode: Optional[og.x402SettlementMode] = og.x402SettlementMode.SETTLE_BATCH,
  stream: bool = False
) -> Union[og.TextGenerationOutput, og.TextGenerationStream]

x402 Payment

LLM inference is paid for using $OPG testnet tokens on Base Sepolia. All other operations — TEE node registration, inference execution, proof settlement, and verification — happen on the OpenGradient network. The SDK handles all payment signing and settlement automatically.

Property          Value
Payment Network   Base Sepolia
Token             $OPG (0x240b09731D96979f50B2C649C9CE10FcF9C7987F)
Chain ID          84532
Proof Settlement  OpenGradient Network

python
client = og.Client(private_key="0x...")

NOTE

Make sure your wallet has $OPG testnet tokens on Base Sepolia. You can get tokens from our faucet.

Permit2 Approval

Before making LLM requests, your wallet must have approved sufficient $OPG tokens for Permit2 spending. Call client.llm.ensure_opg_approval() to set the allowance — it only sends an on-chain transaction when the current allowance is below the requested amount.

python
# Approve at least 5 OPG for Permit2 spending (only transacts if needed)
approval = client.llm.ensure_opg_approval(opg_amount=5.0)

print(f"Allowance before: {approval.allowance_before}")
print(f"Allowance after: {approval.allowance_after}")
print(f"Tx hash: {approval.tx_hash}")  # None if no approval was needed

TIP

You only need to call ensure_opg_approval once (or when your allowance runs low). If the current Permit2 allowance is already sufficient, it returns immediately without sending a transaction.

Completion Example

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Ensure Permit2 allowance for OPG payments
client.llm.ensure_opg_approval(opg_amount=5.0)

result = client.llm.completion(
    model=og.TEE_LLM.GPT_4O,
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)

print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")

Chat Example

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Ensure Permit2 allowance for OPG payments
client.llm.ensure_opg_approval(opg_amount=5.0)

messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant.",
        "name": "HAL"
    },
    {
        "role": "user",
        "content": "Hello! How are you doing? Can you repeat my name?",
    }
]

result = client.llm.chat(
    model=og.TEE_LLM.GPT_4_1_2025_04_14,
    messages=messages
)

print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")

Streaming Chat Example

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

client.llm.ensure_opg_approval(opg_amount=5.0)

messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language."},
    {"role": "user", "content": "What makes it good for beginners?"},
]

stream = client.llm.chat(
    model=og.TEE_LLM.GPT_4_1_2025_04_14,
    messages=messages,
    stream=True,
    max_tokens=300,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
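If you also need the complete response text after streaming finishes, you can accumulate the deltas while printing them. Below is a minimal sketch of just the accumulation step, written against plain strings so it is independent of the SDK's chunk objects (in the real loop you would pass each chunk.choices[0].delta.content):

```python
from typing import Iterable, Optional

def collect_stream(deltas: Iterable[Optional[str]]) -> str:
    """Join non-empty content deltas into the full response text."""
    parts = []
    for content in deltas:
        if content:  # skip None/empty deltas (e.g. role-only or final chunks)
            parts.append(content)
    return "".join(parts)

full_text = collect_stream(["Python ", None, "is ", "beginner-friendly."])
print(full_text)  # Python is beginner-friendly.
```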

Chat Example with Tools

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Ensure Permit2 allowance for OPG payments
client.llm.ensure_opg_approval(opg_amount=5.0)

# Define your tools
tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to find the weather for, e.g. 'San Francisco'"
                    },
                    "state": {
                        "type": "string",
                        "description": "The two-letter abbreviation for the state that the city is in, e.g. 'CA' for 'California'"
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit to fetch the temperature in",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["city", "state", "unit"]
            },
        }
}]

# Message conversation
messages = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps the user with tasks. Use tools if necessary.",
    },
    {
        "role": "user",
        "content": "Hi! How are you doing today?"
    }, 
    {
        "role": "assistant",
        "content": "I'm doing well! How can I help you?",
    }, 
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?"
    }
]

result = client.llm.chat(
    model=og.TEE_LLM.GPT_4O,
    messages=messages,
    tools=tools
)

print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")
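When the model decides to call a tool, the response carries the tool's name and JSON-encoded arguments rather than a final answer, and your application is responsible for executing the call. The exact response shape is SDK-specific, so the sketch below shows only the dispatch step, using a hypothetical stand-in implementation of get_current_weather that matches the schema above:

```python
import json

def get_current_weather(city: str, state: str, unit: str) -> str:
    # Stand-in implementation; a real app would query a weather API.
    return f"75 degrees {unit} in {city}, {state}"

# Map tool names from the schema to local callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Decode the model-provided arguments and invoke the matching tool."""
    args = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**args)

print(dispatch_tool_call(
    "get_current_weather",
    '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}',
))  # 75 degrees fahrenheit in Dallas, TX
```

In the usual OpenAI-style flow, the returned string is then appended to messages as a tool-role message and the conversation is sent back through client.llm.chat() so the model can produce its final answer.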

TEE Verification

All LLM inference on OpenGradient runs within trusted execution environments (TEEs), enabling verified and private access to both proprietary and open-source models. TEE nodes route requests to third-party LLM APIs (such as OpenAI, Gemini, Anthropic, and others) while providing critical security and verification guarantees.

Key Benefits

Verification for Mission-Critical Applications: TEE nodes enable prompt verification, making them ideal for mission-critical applications like DeFi protocols, financial services, healthcare systems, and other sensitive systems where you need cryptographic proof of what prompts were sent to the LLM.

Privacy Protection: Personal data and sensitive information remain private. TEE nodes audit and verify code execution, ensuring that your data is processed securely without exposure to unauthorized parties.

Hardware Attestation: Built on Intel TDX with confidential compute, TEE nodes provide hardware-level attestation of code execution, giving you cryptographic guarantees that the routing and verification code runs as expected before forwarding requests to third-party LLM APIs.

TIP

You can verify inference transactions on the OpenGradient block explorer.

Examples

DeFi Smart Contract Analysis

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Ensure Permit2 allowance for OPG payments
client.llm.ensure_opg_approval(opg_amount=5.0)

# Analyze a smart contract with verified prompts for audit trail
contract_code = """
function transfer(address to, uint256 amount) public {
    require(balanceOf[msg.sender] >= amount, "Insufficient balance");
    balanceOf[msg.sender] -= amount;
    balanceOf[to] += amount;
}
"""

result = client.llm.completion(
    model=og.TEE_LLM.GPT_4O,
    prompt=f"Analyze this Solidity smart contract function for security vulnerabilities:\n\n{contract_code}",
    max_tokens=500,
    temperature=0.0
)

print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")

Privacy-Sensitive Healthcare Chat

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Ensure Permit2 allowance for OPG payments
client.llm.ensure_opg_approval(opg_amount=5.0)

# Chat with patient data - TEE ensures privacy and code verification
messages = [
    {
        "role": "system",
        "content": "You are a medical assistant. Analyze patient symptoms and provide preliminary guidance."
    },
    {
        "role": "user",
        "content": "Patient: 45-year-old male, presenting with chest pain and shortness of breath. Blood pressure: 140/90. What are the potential causes?"
    }
]

result = client.llm.chat(
    model=og.TEE_LLM.CLAUDE_4_0_SONNET,
    messages=messages,
    max_tokens=300,
    temperature=0.1
)

print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")

Financial Risk Assessment

python
import opengradient as og
import os

client = og.Client(
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)

# Ensure Permit2 allowance for OPG payments
client.llm.ensure_opg_approval(opg_amount=5.0)

# Assess loan application with verified audit trail
loan_data = {
    "applicant_income": 75000,
    "credit_score": 720,
    "debt_to_income": 0.35,
    "loan_amount": 250000
}

prompt = f"""Assess this loan application for approval:
Income: ${loan_data['applicant_income']}
Credit Score: {loan_data['credit_score']}
Debt-to-Income Ratio: {loan_data['debt_to_income']}
Requested Loan: ${loan_data['loan_amount']}

Provide risk assessment and recommendation."""

result = client.llm.completion(
    model=og.TEE_LLM.GPT_4O,
    prompt=prompt,
    max_tokens=250,
    temperature=0.0
)

print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")

The TEE node routes your request to the third-party API while providing cryptographic verification of the prompt and ensuring your data remains private through hardware-attested code execution.

Supported Models

TEE LLM inference supports the following models:

  • openai/gpt-4.1-2025-04-14
  • openai/gpt-4o
  • openai/o4-mini
  • anthropic/claude-4.0-sonnet
  • anthropic/claude-3.7-sonnet
  • anthropic/claude-3.5-haiku
  • google/gemini-2.5-flash
  • google/gemini-2.5-pro
  • google/gemini-2.5-flash-lite
  • google/gemini-2.0-flash
  • x-ai/grok-3-beta
  • x-ai/grok-3-mini-beta
  • x-ai/grok-4.1-fast
  • x-ai/grok-4-1-fast-non-reasoning
  • x-ai/grok-2-1212
  • x-ai/grok-2-vision-latest

NOTE

This technology is cutting-edge, so access may be periodically restricted due to usage limitations.

Settlement Modes

The SDK provides three settlement modes via og.x402SettlementMode that control how inference data is recorded on the OpenGradient blockchain for proof settlement and auditability. Each mode offers different trade-offs between data completeness, privacy, and transaction costs.

Available Modes

  • SETTLE — No input or output hashes are posted to the chain; your inference data remains completely off-chain. Best for privacy-focused applications where no on-chain data is needed.
  • SETTLE_METADATA — Records full model info, complete input/output data, and all metadata. Best for maximum transparency and auditability.
  • SETTLE_BATCH — Aggregates multiple inferences into a single settlement. Best for high-volume applications optimizing for cost.

SETTLE

Individual settlement with no on-chain data. This is the most privacy-preserving option — no input or output hashes are posted to the chain, and your inference data remains completely off-chain.

python
result = client.llm.chat(
    model=og.TEE_LLM.GPT_4_1_2025_04_14,
    messages=messages,
    x402_settlement_mode=og.x402SettlementMode.SETTLE
)

SETTLE_METADATA

Individual settlement with full metadata. Records complete model information, full input and output data, and all inference metadata on-chain. Provides maximum transparency and auditability but has higher gas costs due to larger data storage.

python
result = client.llm.chat(
    model=og.TEE_LLM.GPT_4_1_2025_04_14,
    messages=messages,
    x402_settlement_mode=og.x402SettlementMode.SETTLE_METADATA
)

Best for:

  • Proof of Agent Actions: Cryptographically prove that a specific agent used prompt X to take action Y
  • Resolution Verification: Verify that decisions used the correct prompt with accurate data inputs
  • Agent Auditing: Applications requiring complete transparency, such as DeFi agents making trading decisions

SETTLE_BATCH (Default)

Batch settlement for multiple inferences. Aggregates multiple inference requests into a single settlement transaction using batch hashes. Most cost-efficient for high-volume applications with reduced per-inference transaction overhead.

python
result = client.llm.chat(
    model=og.TEE_LLM.GPT_4_1_2025_04_14,
    messages=messages,
    x402_settlement_mode=og.x402SettlementMode.SETTLE_BATCH
)
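Which mode to pass is an application-level decision. One illustrative way to encode the trade-offs described above is a small helper that returns the name of the mode to use; this is a sketch only — the mode names match the real og.x402SettlementMode members, but the selection policy is an assumption:

```python
def choose_settlement_mode(need_audit_trail: bool, high_volume: bool) -> str:
    """Pick an x402 settlement mode name based on the trade-offs above.

    Returns the attribute name to look up on og.x402SettlementMode.
    """
    if need_audit_trail:
        return "SETTLE_METADATA"  # full on-chain input/output for auditability
    if high_volume:
        return "SETTLE_BATCH"     # aggregate settlements to cut per-inference cost
    return "SETTLE"               # no on-chain hashes; maximum privacy

# e.g. mode = getattr(og.x402SettlementMode, choose_settlement_mode(False, True))
print(choose_settlement_mode(need_audit_trail=False, high_volume=True))  # SETTLE_BATCH
```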

TIP

To learn more about how settlement works in the x402 payment protocol, see LLM Execution - Settlement Modes.

SDK API Reference

Please refer to our API Reference for additional details on the SDK methods.

More Examples

You can find more examples using the SDK here.