Inference Node

Inference nodes are stateless worker nodes that provide AI-related resources to the OpenGradient network. They supply GPUs for local model inference or provide secure access to external model providers like Anthropic or OpenAI.

Models are cached locally on inference nodes or downloaded as needed. After inference completes, proofs and attestations are settled and verified on the network asynchronously.

These nodes use TEE attestations or cryptographic proofs such as ZKML to ensure privacy, security, and verifiability. Full nodes verify these proofs during settlement, so users can trust that their inference was executed correctly and securely.
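
As a rough illustration of this lifecycle, the sketch below models an attested result that a node might return and the later settlement check; the AttestedResult type, its fields, and the settle function are hypothetical stand-ins, not the network's actual wire format or verification logic.

```python
from dataclasses import dataclass

# Hypothetical shapes, for illustration only; not OpenGradient's actual wire format.

@dataclass
class AttestedResult:
    output: str     # inference output, returned to the caller immediately
    proof: bytes    # TEE attestation or ZKML proof for this inference
    node_id: str    # identity of the inference node that served the request

def settle(result: AttestedResult) -> bool:
    """Asynchronous settlement: full nodes verify the proof after the fact.

    Stand-in check only; real verification depends on the proof system used.
    """
    return len(result.proof) > 0 and bool(result.node_id)

print(settle(AttestedResult(output="42", proof=b"\x01", node_id="node-7")))
```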

Node Types

Depending on the model requested, inference requests are routed to different types of nodes:

1. LLM Proxy Nodes

LLM proxy nodes provide anonymous, private, and verifiable access to third-party LLM providers like Anthropic and OpenAI. These nodes run inside Trusted Execution Environments (TEEs) and act as secure intermediaries between users and external LLM APIs.

Key characteristics:

  • Privacy: User prompts and responses are processed inside the TEE—the node operator cannot see or log request data.
  • Anonymity: Requests are distributed across different nodes and cannot easily be tied back to your identity by the underlying LLM provider.
  • Verifiability: TEE attestations and cryptographic signing ensure that inference results are authentic and untampered.
  • Traceability: Applications like AI agents can prove their reasoning logic—every inference is cryptographically signed and traceable on-chain.
  • Provider Access: Route requests to OpenAI, Anthropic, and other LLM providers through secure, attested connections.

LLM proxy nodes are ideal for applications that need verifiable AI reasoning, such as autonomous agents where you need to prove which prompts led to specific actions.
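
As a loose sketch of how an agent might consume a proxy node, consider the following; the LLMProxyClient class, its methods, and the endpoint are hypothetical placeholders rather than the actual OpenGradient SDK, and a hash stands in for the node's cryptographic signature.

```python
import hashlib

class LLMProxyClient:
    """Hypothetical proxy-node client; the real SDK interface may differ."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint  # an attested TEE proxy node

    def chat(self, model: str, prompt: str) -> dict:
        # Inside a real node, the prompt is only visible within the TEE.
        response = f"[{model} response]"  # placeholder for the provider call
        # A hash stands in here for the node's signature over the
        # (prompt, response) pair, which lets an agent later prove
        # which prompt led to which action.
        digest = hashlib.sha256((prompt + response).encode()).hexdigest()
        return {"text": response, "signature": digest}

client = LLMProxyClient("https://proxy.example")
result = client.chat(model="claude-sonnet", prompt="Summarize the risks.")
print(result["signature"])  # traceable record of the inference
```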

Read more in LLM Execution.

2. Local Inference Nodes

Local inference nodes run models from the Model Hub directly on GPUs, providing high-performance inference for open-source and custom models. These nodes can provide verification through multiple methods, such as ZKML proofs or TEE attestations, depending on the use case.

Key characteristics:

  • Local Execution: Models run directly on the node's GPU hardware.
  • Model Caching: Models are cached locally or downloaded from the Model Hub as needed.
  • Flexible Verification: Support for TEE attestations, ZKML proofs, or Vanilla verification depending on security requirements.
  • Open Models: Run Llama, Mistral, and other open-source models from the Model Hub.

Local inference nodes are ideal for ML model inference, custom fine-tuned models, and use cases where you want to run open-source models with cryptographic verification.
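
The following is a minimal sketch of that flow, assuming a hypothetical infer helper: the VerificationMode enum, the in-memory cache, and the function names are illustrative, not the node's real implementation.

```python
from enum import Enum

class VerificationMode(Enum):
    VANILLA = "vanilla"   # plain execution, no cryptographic proof
    TEE = "tee"           # hardware attestation from a TEE
    ZKML = "zkml"         # zero-knowledge proof of the inference

MODEL_CACHE: dict = {}    # models cached locally on the node

def load_model(model_id: str) -> bytes:
    """Return the cached model, downloading from the Model Hub on a miss."""
    if model_id not in MODEL_CACHE:
        MODEL_CACHE[model_id] = b"<model weights>"  # placeholder download
    return MODEL_CACHE[model_id]

def infer(model_id: str, inputs: list,
          mode: VerificationMode = VerificationMode.TEE) -> dict:
    load_model(model_id)   # cache hit after the first request
    output = sum(inputs)   # placeholder for GPU execution
    return {"output": output, "verification": mode.value}

print(infer("mistral-7b", [0.1, 0.2], VerificationMode.ZKML))
```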

Read more in ML Execution.

Scalable and Verifiable Architecture

OpenGradient's architecture separates the fast path (inference) from the verification path (settlement), delivering web2 speed with decentralized security.

Inference nodes register with the network and are verified before serving requests. For TEE nodes, registration confirms they run correct, untampered software—guaranteeing no logging or manipulation of user requests. Models are cached locally for fast access. Because inference doesn't go through the ledger, there's no blockchain bottleneck: you get the throughput and latency you'd expect from centralized infrastructure, with cryptographic verification handled asynchronously.

This separation of concerns enables cost-effective, scalable inference while maintaining full verification and security.
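
A minimal sketch of this separation, assuming an in-process queue as a stand-in for the network's settlement layer; all names here are hypothetical.

```python
import queue
import threading

settlement_queue: queue.Queue = queue.Queue()

def handle_request(prompt: str) -> str:
    """Fast path: serve the inference immediately; the ledger is not in the loop."""
    result = f"[output for: {prompt}]"      # placeholder inference
    settlement_queue.put((prompt, result))  # proof is settled later
    return result                           # returned at web2 latency

def settlement_worker() -> None:
    """Verification path: proofs are checked asynchronously during settlement."""
    while True:
        prompt, result = settlement_queue.get()
        # placeholder: verify the TEE attestation / ZKML proof and record it
        settlement_queue.task_done()

threading.Thread(target=settlement_worker, daemon=True).start()
print(handle_request("hello"))
settlement_queue.join()  # in practice, callers never wait on settlement
```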

Private Inference Nodes

In the future, OpenGradient will support the ability to run private models. To run a private model, a user spins up a dedicated private inference node. Private nodes must register on the network with a shared model ID, declaring their intent to run that particular private model.
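
Since this feature is still planned, the sketch below only guesses at what such a registration intent might contain; the PrivateNodeRegistration type and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PrivateNodeRegistration:
    """Guess at the fields a private-node registration intent might carry."""
    node_address: str   # on-chain identity of the private node
    model_id: str       # shared ID of the private model it intends to run

registration = PrivateNodeRegistration(
    node_address="0x1234abcd",          # placeholder address
    model_id="private-model-example",   # placeholder model ID
)
print(registration)
```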