LLM Inference
Overview
The OpenGradient Python SDK provides a simple interface for running LLM inference on OpenGradient's decentralized network. Under the hood, the SDK uses x402, an open payment standard that extends HTTP with payment requirements signalled through the 402 Payment Required status code. This enables secure, cryptographically verified payments before inference execution, making it ideal for pay-per-use LLM APIs.
The OpenGradient network and SDK handle all the complexity of payment signing, verification, and settlement automatically, so you can focus on building your application. All LLM inferences are executed on OpenGradient's decentralized network and verified using Trusted Execution Environments (TEEs), providing cryptographic guarantees that your prompts and results are handled correctly.
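At the HTTP level, the payment gate works roughly like the sketch below. This is heavily simplified: the endpoint URL and the sign_payment helper are hypothetical stand-ins, the header and payload shapes only loosely follow the x402 spec, and the SDK performs this entire handshake for you.
import requests

INFERENCE_URL = "https://example.opengradient.network/inference"  # hypothetical endpoint

def sign_payment(requirements: dict) -> str:
    # Stand-in for the SDK's payment signing, which builds a signed payment
    # payload from your private key; the real logic lives inside the SDK.
    raise NotImplementedError

# 1. Call the payment-gated endpoint with no payment attached.
first = requests.post(INFERENCE_URL, json={"prompt": "Hello"})
assert first.status_code == 402  # Payment Required

# 2. The 402 body describes the payment the server will accept
#    (asset, amount, recipient, network).
requirements = first.json()

# 3. Retry the same request with the signed payment in a header.
retry = requests.post(
    INFERENCE_URL,
    json={"prompt": "Hello"},
    headers={"X-PAYMENT": sign_payment(requirements)},
)

# 4. The server verifies and settles the payment, then runs inference.
assert retry.status_code == 200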
Methods
The SDK currently supports two types of LLM inference:
- `llm_completion` for simple LLM completions
- `llm_chat` for more advanced LLM chat completions (including tool usage)
Both inference types support two execution modes:
- `og.LlmInferenceMode.VANILLA`: standard inference execution on OpenGradient's decentralized network, providing verifiable on-chain results without hardware attestation
- `og.LlmInferenceMode.TEE`: verified and private inference through TEE nodes that route to third-party LLM APIs (OpenAI, Gemini, Anthropic, etc.)
TEE execution provides cryptographic verification of prompts for mission-critical applications (DeFi, financial services, healthcare, etc.) and ensures privacy of personal data through hardware-attested code auditing.
Both of these functions closely mirror the OpenAI APIs; however, there are some minor differences.
TIP
To learn more about how LLM execution works on OpenGradient, including the x402 payment-gated API and TEE verification, see LLM Execution.
def llm_completion(
    model_cid,
    prompt,
    inference_mode=og.LlmInferenceMode.VANILLA,
    max_tokens=100,
    temperature=0.0,
    stop_sequence=None)
def llm_chat(
    model_cid,
    messages,
    inference_mode=og.LlmInferenceMode.VANILLA,
    max_tokens=100,
    temperature=0.0,
    stop_sequence=None,
    tools=[],
    tool_choice=None)
Completion Example
import opengradient as og
import os
client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)
result = client.llm_completion(
    model_cid='openai/gpt-4o',
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)
print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")Chat Example
import opengradient as og
import os
client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)
messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant.",
        "name": "HAL"
    },
    {
        "role": "user",
        "content": "Hello! How are you doing? Can you repeat my name?",
    }
]
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages
)
print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")Chat Example with Tools
Chat Example with Tools
import opengradient as og
import os
client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)
# Define your tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description": "The two-letter abbreviation for the state the city is in, e.g. 'CA' for 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        },
    }
}]
# Message conversation
messages = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps the user with tasks. Use tools if necessary.",
    },
    {
        "role": "user",
        "content": "Hi! How are you doing today?"
    },
    {
        "role": "assistant",
        "content": "I'm doing well! How can I help you?",
    },
    {
        "role": "user",
        "content": "Can you tell me what the temperature will be in Dallas, in fahrenheit?"
    }
]
result = client.llm_chat(
    model_cid=og.LLM.MISTRAL_7B_INSTRUCT_V3,
    messages=messages,
    tools=tools
)
print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")TEE LLMs
TEE LLMs
OpenGradient supports LLM inference within trusted execution environments (TEEs), enabling verified and private access to both proprietary and open-source models. TEE nodes route requests to third-party LLM APIs (such as OpenAI, Gemini, Anthropic, and others) while providing critical security and verification guarantees.
Key Benefits
Verification for Mission-Critical Applications: TEE nodes enable prompt verification, making them ideal for mission-critical applications like DeFi protocols, financial services, healthcare systems, and other sensitive systems where you need cryptographic proof of what prompts were sent to the LLM.
Privacy Protection: Personal data and sensitive information remain private. TEE nodes audit and verify code execution, ensuring that your data is processed securely without exposure to unauthorized parties.
Hardware Attestation: Built on Intel TDX with confidential compute, TEE nodes provide hardware-level attestation of code execution, giving you cryptographic guarantees that the routing and verification code runs as expected before forwarding requests to third-party LLM APIs.
TIP
You can verify inference transactions on the OpenGradient block explorer.
Usage
To utilize TEE LLM inference, use the following flags:
- `inference_mode=og.LlmInferenceMode.TEE` for the Python SDK
- `--mode TEE` for the CLI
TEE Examples
DeFi Smart Contract Analysis
import opengradient as og
import os
client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)
# Analyze a smart contract with verified prompts for audit trail
contract_code = """
function transfer(address to, uint256 amount) public {
require(balanceOf[msg.sender] >= amount, "Insufficient balance");
balanceOf[msg.sender] -= amount;
balanceOf[to] += amount;
}
"""
result = client.llm_completion(
    model_cid='openai/gpt-4o',
    prompt=f"Analyze this Solidity smart contract function for security vulnerabilities:\n\n{contract_code}",
    max_tokens=500,
    temperature=0.0,
    inference_mode=og.LlmInferenceMode.TEE  # Verified prompt for audit compliance
)
print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")Privacy-Sensitive Healthcare Chat
import opengradient as og
import os
client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)
# Chat with patient data - TEE ensures privacy and code verification
messages = [
    {
        "role": "system",
        "content": "You are a medical assistant. Analyze patient symptoms and provide preliminary guidance."
    },
    {
        "role": "user",
        "content": "Patient: 45-year-old male, presenting with chest pain and shortness of breath. Blood pressure: 140/90. What are the potential causes?"
    }
]
result = client.llm_chat(
    model_cid='anthropic/claude-4.0-sonnet',
    messages=messages,
    max_tokens=300,
    temperature=0.1,
    inference_mode=og.LlmInferenceMode.TEE  # Patient data remains private
)
print(f"Response: {result.chat_output['content']}")
print(f"Payment hash: {result.payment_hash}")Financial Risk Assessment
import opengradient as og
import os
client = og.new_client(
    email=None,
    password=None,
    private_key=os.environ.get("OG_PRIVATE_KEY"),
)
# Assess loan application with verified audit trail
loan_data = {
    "applicant_income": 75000,
    "credit_score": 720,
    "debt_to_income": 0.35,
    "loan_amount": 250000
}
prompt = f"""Assess this loan application for approval:
Income: ${loan_data['applicant_income']}
Credit Score: {loan_data['credit_score']}
Debt-to-Income Ratio: {loan_data['debt_to_income']}
Requested Loan: ${loan_data['loan_amount']}
Provide risk assessment and recommendation."""
result = client.llm_completion(
    model_cid='openai/gpt-4o',
    prompt=prompt,
    max_tokens=250,
    temperature=0.0,
    inference_mode=og.LlmInferenceMode.TEE  # Cryptographic proof of assessment criteria
)
print(f"Response: {result.completion_output}")
print(f"Payment hash: {result.payment_hash}")The TEE node routes your request to the third-party API while providing cryptographic verification of the prompt and ensuring your data remains private through hardware-attested code execution.
Supported Models
TEE LLMs support the following models:
- openai/gpt-4.1
- openai/gpt-4o
- anthropic/claude-4.0-sonnet
- anthropic/claude-3.5-haiku
- x-ai/grok-3-beta
- x-ai/grok-3-mini-beta
- x-ai/grok-4-1-fast-non-reasoning
- google/gemini-2.5-flash-preview
- google/gemini-2.5-pro-preview
NOTE
This technology is cutting-edge, so access may be periodically restricted due to usage limitations.
Settlement Modes
The SDK provides three settlement modes via og.x402SettlementMode that control how inference data is recorded on-chain for payment settlement and auditability. Each mode offers different trade-offs between data completeness, privacy, and transaction costs.
Available Modes
| Mode | Description | Use Case |
|---|---|---|
| SETTLE | Records only cryptographic hashes of input/output | Privacy-focused applications needing proof of execution |
| SETTLE_METADATA | Records full model info, complete input/output data, and all metadata | Maximum transparency and auditability |
| SETTLE_BATCH | Aggregates multiple inferences into a single settlement | High-volume applications optimizing for cost |
SETTLE (Default)
Individual settlement with input/output hashes only. This is the most privacy-preserving option: the actual data is not stored on-chain, only cryptographic hashes that can be used to verify execution.
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages,
    settlement_mode=og.x402SettlementMode.SETTLE  # Default
)
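Since only digests reach the chain in this mode, anyone holding the original request and response can recompute them and compare against the settled record. The sketch below is purely illustrative: SHA-256 over UTF-8 text is an assumed scheme, and the canonical serialization and hash function are defined by the protocol, not by this snippet.
import hashlib
import json

# Recompute digests locally (illustrative scheme only) and compare them
# against the hashes recorded on-chain for this inference.
input_digest = hashlib.sha256(json.dumps(messages).encode("utf-8")).hexdigest()
output_digest = hashlib.sha256(result.chat_output['content'].encode("utf-8")).hexdigest()
print(f"Input digest: {input_digest}")
print(f"Output digest: {output_digest}")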
SETTLE_METADATA
Individual settlement with full metadata. Records complete model information, full input and output data, and all inference metadata on-chain. Provides maximum transparency and auditability but has higher gas costs due to larger data storage.
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages,
    settlement_mode=og.x402SettlementMode.SETTLE_METADATA
)
Best for:
- Proof of Agent Actions: Cryptographically prove that a specific agent used prompt X to take action Y
- Resolution Verification: Verify that decisions used the correct prompt with accurate data inputs
- Agent Auditing: Applications requiring complete transparency, such as DeFi agents making trading decisions
SETTLE_BATCH
Batch settlement for multiple inferences. Aggregates multiple inference requests into a single settlement transaction using batch hashes. This is the most cost-efficient mode for high-volume applications, since it reduces per-inference transaction overhead.
result = client.llm_chat(
    model_cid="openai/gpt-4.1",
    messages=messages,
    settlement_mode=og.x402SettlementMode.SETTLE_BATCH
)
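Batch settlement pays off when many inferences arrive in quick succession, because they share one settlement transaction. A small sketch of such a loop, with illustrative prompts:
# A burst of inferences settled together in a single batch transaction.
reviews = ["Great product!", "Arrived broken.", "Works as described."]
for review in reviews:
    result = client.llm_chat(
        model_cid="openai/gpt-4.1",
        messages=[{"role": "user", "content": f"Classify this review as positive or negative: {review}"}],
        settlement_mode=og.x402SettlementMode.SETTLE_BATCH
    )
    print(result.chat_output['content'])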
TIP
To learn more about how settlement works in the x402 payment protocol, see LLM Execution - Settlement Modes.
SDK API Reference
Please refer to our API Reference for additional details on the SDK methods.
More Examples
You can find more examples using the SDK here.
