
LLM Execution

OpenGradient provides x402-based LLM inference: a payment-gated HTTP API for running large language model inference on OpenGradient's decentralized network. This execution mode uses Trusted Execution Environments (TEEs) for verification and is ideal for web applications, LLM-as-a-Service offerings, and applications that need flexible payment options without requiring synchronous on-chain transaction execution.

TIP

For ML model execution using PIPE with ZKML, TEE, and Vanilla verification, see ML Execution.

Key Features

  • Payment-Gated Access: Secure, cryptographically-verified payment before inference execution
  • Standard HTTP/REST: Familiar API patterns for web developers
  • Flexible Payment Options: Support for stablecoins, crypto, and traditional payment methods
  • TEE Verification: All LLM inferences are verified using Trusted Execution Environments
  • Provable Prompt Usage: Cryptographically prove which prompts were used for any inference across all settlement modes, enabling transparent verification of agent actions and decision-making processes
  • Optional Facilitators: Use facilitator services to simplify payment verification and settlement
  • Low Latency: Off-chain execution with on-chain payment settlement

How It Works

The x402 LLM inference flow follows a secure payment-gated pattern with TEE verification:

TIP

You can read more on the x402 standard here.

1. Initial Request

The client makes an HTTP request to the LLM inference endpoint:

http
POST /v1/chat/completions HTTP/1.1
Host: llm.opengradient.ai
Content-Type: application/json

{
  "model": "openai/gpt-4o",
  "prompt": "Explain quantum computing in simple terms",
  "max_tokens": 200,
  "temperature": 0.7
}

2. Payment Requirement Response

The server responds with a 402 Payment Required status and includes the payment details in the PAYMENT-REQUIRED header:

http
HTTP/1.1 402 Payment Required
PAYMENT-REQUIRED: {
  "amount": "0.001",
  "currency": "OUSDC",
  "chain_id": 10744,
  "payment_id": "0x1234...",
  "expires_at": "2024-01-15T10:30:00Z"
}

NOTE

The HTTP response format shown in these examples is illustrative. Actual responses from the API may differ in structure or field names. Always refer to the actual API responses when implementing integrations.
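As a minimal client-side sketch, the PAYMENT-REQUIRED header can be parsed and sanity-checked before signing. The field names follow the illustrative example above and may differ in real responses:

```python
import json

def parse_payment_required(header_value: str) -> dict:
    """Parse the PAYMENT-REQUIRED header into payment details.

    Field names follow the illustrative example above; the real API
    may use a different structure.
    """
    details = json.loads(header_value)
    for field in ("amount", "currency", "chain_id", "payment_id", "expires_at"):
        if field not in details:
            raise ValueError(f"missing payment field: {field}")
    return details
```

Validating the details up front lets the client fail fast on malformed payment requirements instead of signing an incomplete payload.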

3. Payment Creation

The client creates a payment payload and cryptographically signs it:

javascript
// Example payment payload creation (field names follow the
// PAYMENT-REQUIRED details returned by the server)
const paymentPayload = {
  payment_id: "0x1234...",
  amount: "0.001",
  currency: "OUSDC",
  chain_id: 10744,
  timestamp: Date.now(),
  nonce: crypto.randomUUID() // unique nonce to prevent duplicate payments
};

// Sign the payment payload with the client's wallet
const signature = await wallet.signMessage(
  JSON.stringify(paymentPayload)
);

4. Payment Submission

The client resubmits the request with the payment signature:

http
POST /v1/chat/completions HTTP/1.1
Host: llm.opengradient.ai
Content-Type: application/json
PAYMENT-SIGNATURE: {
  "payload": {...},
  "signature": "0xabcd...",
  "address": "0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb"
}

{
  "model": "openai/gpt-4o",
  "prompt": "Explain quantum computing in simple terms",
  "max_tokens": 200,
  "temperature": 0.7
}

5. Payment Verification

The server (or optional Facilitator) verifies the payment signature. The LLM server at llm.opengradient.ai handles payment verification internally, which may involve interaction with the facilitator contract at address 0x339c7de83d1a62edafbaac186382ee76584d294f.

6. Inference Execution with TEE Verification

Once payment is verified, the server executes the LLM inference on OpenGradient's decentralized network using TEE nodes. The inference is routed through TEE nodes to third-party LLM APIs, and the results are returned with TEE attestation:

http
HTTP/1.1 200 OK
Content-Type: application/json
PAYMENT-RESPONSE: {
  "payment_id": "0x1234...",
  "tx_hash": "0x5678...",
  "settled": true
}

{
  "model": "openai/gpt-4o",
  "completion": "Quantum computing is a revolutionary computing paradigm...",
  "finish_reason": "stop",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 187,
    "total_tokens": 199
  },
  "verification": {
    "method": "TEE",
    "proof": "0x9abc...",
    "verified_by": "opengradient-network"
  }
}

NOTE

The HTTP response format shown in these examples is illustrative. Actual responses from the API may differ in structure or field names. Always refer to the actual API responses when implementing integrations.
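Before trusting a result, a client can sanity-check that the response carries a TEE verification record. A minimal sketch, assuming the illustrative field names from the response above (`verification.method`, `verification.proof`):

```python
def has_tee_verification(result: dict) -> bool:
    # Field names are taken from the illustrative response above and
    # may differ in real API responses.
    verification = result.get("verification", {})
    return verification.get("method") == "TEE" and bool(verification.get("proof"))
```

A check like this guards against consuming completions whose attestation record is missing or incomplete.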

7. Payment Settlement

After inference execution, the payment is settled on-chain (optionally via Facilitator). The LLM server at llm.opengradient.ai handles payment settlement internally, which may involve interaction with the facilitator contract at address 0x339c7de83d1a62edafbaac186382ee76584d294f to submit the transaction to the blockchain.

8. LLM Settlement (Proof Verification)

After the inference is executed and the TEE attestation is generated, the proof of TEE inference is posted and verified on the blockchain. This LLM settlement process ensures that:

  • Proof Posting: The TEE attestation proof is posted to the blockchain as part of the settlement transaction
  • On-Chain Verification: The proof is verified on-chain by validators to ensure the inference was executed correctly
  • Immutable Record: The proof and inference results are permanently recorded on-chain for auditability and transparency

The settlement transaction includes:

  • The TEE attestation proof
  • Inference data (varies by settlement mode - see Settlement Modes below)
  • Payment settlement information
  • Timestamp and block information

The specific inference data included depends on the settlement mode chosen by the client:

  • SETTLE_INDIVIDUAL: Input/output hashes only
  • SETTLE_BATCH: Batch hashes for multiple inferences
  • SETTLE_INDIVIDUAL_WITH_METADATA: Full model information, complete input and output data, and all inference metadata

Once the proof is posted and verified on-chain, the inference execution is considered fully settled and verified. This provides cryptographic guarantees that the LLM inference was executed correctly within the trusted execution environment.

Settlement Modes

Clients can choose from three modes of settlement, each offering different levels of on-chain data visibility:

  • SETTLE_INDIVIDUAL: Includes only input/output hashes for individual inference. This is the most gas-efficient option, storing minimal data on-chain while still providing cryptographic proof of execution. Input hashes enable verification of which prompts were used, allowing you to verify the full prompt data against the hash when needed.

  • SETTLE_BATCH: Batch hashes for multiple inferences. Useful for applications that need to settle multiple inferences in a single transaction, reducing gas costs per inference. Like SETTLE_INDIVIDUAL, batch hashes provide cryptographic proof of prompt usage for all inferences in the batch, enabling verification of which prompts were used for each inference.

  • SETTLE_INDIVIDUAL_WITH_METADATA: Includes full model information, complete input and output data, and all inference metadata. This mode provides full visibility on block explorers and is particularly useful for:

    • Proof of Agent Actions: Cryptographically prove that a specific agent used prompt X to take action Y, enabling full transparency for autonomous agent decisions and their reasoning
    • Resolution Verification: Verify that resolutions or decisions used the correct prompt with accurate data inputs, ensuring fair and transparent outcomes
    • Agent Auditing: Applications that require complete transparency and auditability of inference execution, such as DeFi agents making trading or portfolio management decisions
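For the hash-based modes, verifying prompt usage amounts to recomputing the input hash locally and comparing it to the on-chain value. A hedged sketch, assuming SHA-256 over the raw UTF-8 prompt bytes; the actual hashing scheme (e.g. keccak-256 over a canonical encoding) is defined by the settlement contract and may differ:

```python
import hashlib

def compute_input_hash(prompt: str) -> str:
    # Assumption: SHA-256 over UTF-8 prompt bytes. The actual scheme
    # is defined by the settlement contract and may differ.
    return "0x" + hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def prompt_matches_onchain_hash(prompt: str, onchain_hash: str) -> bool:
    # Compare the locally computed hash with the value recorded on-chain
    return compute_input_hash(prompt) == onchain_hash.lower()
```

This is the mechanism that lets SETTLE_INDIVIDUAL and SETTLE_BATCH stay gas-efficient while still supporting after-the-fact prompt verification: only the hash lives on-chain, and anyone holding the full prompt can reproduce it.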

TEE Verification

x402 LLM inference uses Trusted Execution Environments (TEEs) for verification. TEE is the standard and only verification method for LLM execution.

Characteristics:

  • Security: Hardware attestation verifies execution near-instantly
  • Overhead: Negligible compared to other verification methods
  • Model Compatibility: LLMs and other large models
  • Best For: LLMs routed through TEE nodes to third-party providers

TEE provides strong security guarantees with minimal performance impact, making it ideal for LLM inference. TEE nodes route LLM requests to third-party LLM APIs (like OpenAI, Anthropic, etc.) while providing cryptographic attestation that the inference was executed correctly.

How TEE Works for LLMs:

  • LLM requests are routed through TEE nodes to third-party LLM providers
  • TEE nodes provide hardware-attested verification of the inference execution
  • The attestation proves that the inference was executed correctly within the trusted environment
  • All inferences are verified by OpenGradient's decentralized network using TEE attestation
  • Provable Prompt Usage: TEE attestation provides cryptographic proof of which prompt was used for a given inference, enabling:
    • Agent Action Verification: Prove that a specific agent used prompt X to take a particular action, providing full transparency and auditability for autonomous agent decisions
    • Resolution Verification: Verify that resolutions or decisions used the correct prompt with accurate data inputs, ensuring fair and transparent outcomes
    • On-Chain Auditability: All settlement modes provide cryptographic proof of prompt usage through input hashes. With SETTLE_INDIVIDUAL_WITH_METADATA, the complete prompt and inference data are recorded on-chain, creating an immutable record. With SETTLE_INDIVIDUAL and SETTLE_BATCH, input hashes enable verification of which prompts were used, with the ability to verify the full prompt data against the hash when needed

NOTE

TEE verification is the standard and only verification method for x402 LLM inference. For ML execution with multiple verification options (ZKML, TEE, Vanilla), see ML Execution.

Facilitators

Facilitators are optional services that handle payment verification and settlement complexity. OpenGradient provides a facilitator service and endpoint (though others can run facilitator services too). Facilitators provide:

  • Payment Verification: Cryptographic verification of payment signatures
  • Settlement Management: On-chain transaction submission and confirmation
  • Payment Method Abstraction: Support for multiple payment methods (stablecoins, crypto, fiat)
  • Rate Limiting & Quotas: Usage tracking and rate limiting
  • Receipt Generation: Transaction receipts and audit trails

OpenGradient Facilitator:

  • Endpoint: llm.opengradient.ai (LLM server endpoint that handles facilitator interactions)
  • Facilitator Address: 0x339c7de83d1a62edafbaac186382ee76584d294f

When you send requests to llm.opengradient.ai, the LLM server handles payment verification and settlement internally, interacting with the facilitator contract as needed.

NOTE

Facilitators are optional. Servers can handle payment verification and settlement directly when accepting stablecoins or crypto payments. OpenGradient provides a facilitator contract at address 0x339c7de83d1a62edafbaac186382ee76584d294f, but others can also deploy and use their own facilitator contracts.

Supported Models

x402 LLM inference supports the following models:

  • openai/gpt-4.1
  • openai/gpt-4o
  • anthropic/claude-4.0-sonnet
  • anthropic/claude-3.5-haiku
  • x-ai/grok-3-beta
  • x-ai/grok-3-mini-beta
  • x-ai/grok-4-1-fast-non-reasoning
  • google/gemini-2.5-flash-preview
  • google/gemini-2.5-pro-preview

These models are routed through TEE nodes to third-party LLM APIs. For more information on TEE LLMs, see TEE LLMs.

Integration Examples

Python SDK

The OpenGradient Python SDK provides high-level abstractions for x402 LLM inference:

python
import opengradient as og

# Initialize SDK
og.init(
    private_key="<private_key>",
    email="<email>",
    password="<password>"
)

# Run LLM inference via x402
response = og.llm_completion(
    model_cid='openai/gpt-4o',
    prompt="Explain quantum computing in simple terms",
    max_tokens=200,
    temperature=0.7,
    execution_mode=og.ExecutionMode.X402  # Use x402 execution
)

print("Completion:", response)
print("Payment TX:", response.payment_tx_hash)

TIP

For more details on using the Python SDK for LLM inference, see the LLM SDK Guide.

Direct HTTP Integration

NOTE

We recommend using the Python SDK when possible rather than direct HTTP integration, as it handles payment signing, verification, and error handling automatically.

For direct HTTP integration without the SDK:

python
import json

import requests
from eth_account import Account
from eth_account.messages import encode_defunct

# Step 1: Initial request (the endpoint expects POST)
response = requests.post(
    "https://llm.opengradient.ai/v1/chat/completions",
    json={
        "model": "openai/gpt-4o",
        "prompt": "Explain quantum computing",
        "max_tokens": 200
    }
)

if response.status_code == 402:
    # Step 2: Get payment requirement from the response header
    payment_required = response.headers.get("PAYMENT-REQUIRED")

    # Step 3: Create and sign payment
    # create_payment_payload is an application-defined helper that builds
    # the payment payload from the PAYMENT-REQUIRED details
    account = Account.from_key("your_private_key")
    payment_payload = create_payment_payload(payment_required)
    signature = account.sign_message(
        encode_defunct(text=json.dumps(payment_payload))
    )

    # Step 4: Resubmit with payment signature
    headers = {
        "PAYMENT-SIGNATURE": json.dumps({
            "payload": payment_payload,
            "signature": signature.signature.hex(),
            "address": account.address
        })
    }

    response = requests.post(
        "https://llm.opengradient.ai/v1/chat/completions",
        headers=headers,
        json={
            "model": "openai/gpt-4o",
            "prompt": "Explain quantum computing",
            "max_tokens": 200
        }
    )

    # Step 5: Get inference results
    result = response.json()
    print("Completion:", result["completion"])

JavaScript/TypeScript Integration

typescript
import { ethers } from "ethers";

async function runX402LLMInference(
  prompt: string,
  model: string
): Promise<string> {
  const wallet = new ethers.Wallet(process.env.PRIVATE_KEY!);
  
  // Step 1: Initial request
  let response = await fetch("https://llm.opengradient.ai/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, max_tokens: 200 })
  });
  
  if (response.status === 402) {
    // Step 2: Get payment requirement
    const paymentRequired = JSON.parse(
      response.headers.get("PAYMENT-REQUIRED")!
    );
    
    // Step 3: Create and sign payment
    const paymentPayload = {
      ...paymentRequired,
      timestamp: Date.now(),
      nonce: crypto.randomUUID() // unique nonce to prevent duplicate payments
    };
    
    const signature = await wallet.signMessage(
      JSON.stringify(paymentPayload)
    );
    
    // Step 4: Resubmit with payment
    response = await fetch("https://llm.opengradient.ai/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "PAYMENT-SIGNATURE": JSON.stringify({
          payload: paymentPayload,
          signature,
          address: wallet.address
        })
      },
      body: JSON.stringify({ model, prompt, max_tokens: 200 })
    });
  }
  
  // Step 5: Get results
  const result = await response.json();
  return result.completion;
}

Use Cases

x402 LLM inference is ideal for:

  • LLM-as-a-Service: Building private and verifiable LLM inference services
  • Web Applications: Integrating AI capabilities into web apps via REST APIs
  • Microservices: Adding AI inference to existing microservice architectures
  • Content Generation: Building content generation tools and applications
  • Chat Applications: Creating chat interfaces with verified LLM backends
  • API Gateways: Providing AI inference through API gateways and proxies
  • AI Agents with Provable Actions: Building autonomous agents where you can cryptographically prove which prompt was used to take a specific action, enabling full transparency and auditability for agent decisions
  • Resolution and Decision Verification: Verifying that resolutions or decisions used the correct prompt with accurate data inputs, ensuring fair and transparent outcomes with on-chain proof of the logic used

Security Considerations

  • Payment Verification: All payments are cryptographically verified before inference execution
  • Signature Validation: Payment signatures are validated to ensure authenticity
  • Expiration Handling: Payment requirements include expiration times to prevent replay attacks
  • Nonce Management: Payment payloads include nonces to prevent duplicate payments
  • TEE Verification: All inferences are verified by OpenGradient's decentralized network using TEE attestation
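Clients can enforce the expiration and nonce points above before signing. A minimal sketch, assuming ISO-8601 `expires_at` values as in the PAYMENT-REQUIRED example; the nonce format expected by the server is an assumption:

```python
import secrets
from datetime import datetime, timezone

def payment_requirement_is_fresh(expires_at: str) -> bool:
    # expires_at is ISO-8601 with a trailing "Z", as in the
    # PAYMENT-REQUIRED example above
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) < expiry

def generate_nonce() -> str:
    # 128 bits of randomness; the exact nonce format expected by the
    # server is an assumption -- check the API documentation
    return secrets.token_hex(16)
```

Checking freshness before signing avoids submitting payments the server will reject as expired, and a cryptographically random nonce prevents accidental duplicate payments from retried requests.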

TIP

You can read more on the x402 standard here.

Next Steps