
Design Principles

This page explores the reasoning behind OpenGradient's HACA architecture — the constraints that shaped it, the trade-offs that were made, and why the system works the way it does.

Why Blockchains Struggle with AI

The first question to answer is why you cannot simply deploy AI inference on an existing blockchain. The short answer: the execution model does not fit.

The Re-Execution Problem

Traditional blockchains achieve consensus through re-execution. Every validator independently runs every transaction and checks that they all arrive at the same state. This works when transactions are cheap, deterministic, and fast — a token transfer takes microseconds and every node will compute the same result.

AI inference breaks all three assumptions:

  • It is expensive. Running a model requires GPUs. Asking 100 validators to each run a 70B-parameter LLM for every request means 100x the compute cost for no additional value. The result is the same — you are just paying for it 100 times.
  • It is non-deterministic. LLMs with temperature > 0, models with dropout, and floating-point arithmetic across different hardware all produce slightly different outputs. Validators cannot simply compare outputs to check correctness.
  • It is slow. A single LLM inference can take seconds. If every validator must complete the inference before the block can advance, block times become impractical for any real application.
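The non-determinism point can be seen without any model at all: floating-point addition is not associative, so the same arithmetic performed in a different reduction order (as parallel GPU kernels do) yields bit-different results. The snippet below is a minimal illustration of why validators cannot byte-compare outputs to agree on "the" correct answer.

```python
# Floating-point addition is not associative under IEEE-754 rounding.
# A sequential CPU reduction and a tree-shaped GPU reduction group the
# same terms differently and produce bit-different sums.

left = (0.1 + 0.2) + 0.3   # groups like a sequential reduction
right = 0.1 + (0.2 + 0.3)  # groups like a parallel tree reduction

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

Two validators running identical code on different hardware can hit exactly this: both are "correct," yet their outputs fail a byte-for-byte comparison.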

The Oracle Problem, Inverted

Some projects treat AI inference as an oracle — an external input that gets injected into the chain. But this just moves the trust problem. Who runs the oracle? How do you know the oracle ran the correct model with the correct input? You end up trusting a single party, which defeats the purpose.

OpenGradient takes the opposite approach: instead of fitting AI into an existing blockchain model, it builds a blockchain model around AI's actual requirements.

Separation of Execution and Verification

The core architectural principle of HACA is that execution and verification are independent operations that happen on separate timelines.

This is a stronger claim than just "we use off-chain compute." It means the verification layer is specifically designed to validate proofs without ever needing to see — let alone re-run — the original computation.

How This Works in Practice

Consider an LLM inference request:

  1. The user sends a prompt to an inference node.
  2. The inference node processes the request inside a TEE enclave, calls the underlying LLM provider, and returns the response to the user.
  3. The TEE generates an attestation — a hardware-backed certificate proving the enclave was running approved code, the prompt was forwarded unmodified, and the response was returned untampered.
  4. This attestation is submitted to full nodes, which verify it against the on-chain TEE registry (checking that the signing key matches a registered enclave with valid PCR values).
  5. The verified attestation is recorded on the ledger.

At no point does a full node need to know what the prompt was, what model was used, or what the response contained. It only needs to verify that the attestation is valid — a purely cryptographic operation.
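A minimal sketch of that registry check follows. All names here (`Attestation`, the registry layout, the PCR fields) are hypothetical stand-ins, not OpenGradient's actual API; the point is that the full node's check touches only keys and measurements, never the prompt, model, or response.

```python
# Illustrative sketch only -- types and field names are hypothetical.
# A full node validates an attestation against the on-chain TEE
# registry without ever seeing the underlying computation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Attestation:
    signing_key: str     # public key of the enclave that signed
    pcr_values: tuple    # measurements of the code the enclave ran
    payload_digest: str  # hash of (prompt, response) -- opaque to the node

# Stand-in for the on-chain registry: registered keys -> approved PCRs.
REGISTRY = {
    "enclave-key-1": ("pcr0-abc", "pcr1-def"),
}

def verify_attestation(att: Attestation) -> bool:
    """Purely cryptographic check: key is registered, PCRs match."""
    approved = REGISTRY.get(att.signing_key)
    return approved is not None and approved == att.pcr_values

good = Attestation("enclave-key-1", ("pcr0-abc", "pcr1-def"), "0xd1")
bad = Attestation("enclave-key-1", ("pcr0-TAMPERED", "pcr1-def"), "0xd1")
print(verify_attestation(good), verify_attestation(bad))  # True False
```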

For ZKML, the principle is the same but the mechanism differs. The inference node runs the model and produces a zero-knowledge proof. Full nodes verify the proof mathematically. They never see the model weights or the input data; they only confirm that the proof is valid.

Why This Matters

This separation has several implications:

Scalability. Adding more inference nodes increases throughput linearly without adding any load to the verification layer. Full nodes verify a proof in milliseconds regardless of whether the underlying inference took 50ms or 5 seconds.

Hardware heterogeneity. Inference nodes need GPUs. Full nodes do not. This means the validator set can be large and diverse (improving decentralization) without requiring every validator to own expensive hardware.

Latency. Users get responses immediately. Settlement happens in the background. The user experience is identical to calling a centralized API.

Privacy. Since full nodes only see proofs (not the actual prompts or responses), the verification layer is inherently privacy-preserving. For TEE inference, even the inference node operator cannot access the data.

Responsibilities by Node Type

A useful way to understand HACA is to look at what each node type is responsible for — and what it explicitly is not:

| Responsibility          | Full Nodes   | Inference Nodes | Data Nodes   | Storage (Walrus) |
|-------------------------|--------------|-----------------|--------------|------------------|
| Run consensus           | Yes          | No              | No           | No               |
| Execute models          | No           | Yes             | No           | No               |
| Verify proofs           | Yes          | No              | No           | No               |
| Settle payments         | Yes          | No              | No           | No               |
| Register nodes          | Yes (verify) | Yes (submit)    | Yes (submit) | No               |
| Serve inference results | No           | Yes             | No           | No               |
| Fetch external data     | No           | No              | Yes          | No               |
| Store models/proofs     | No           | No              | No           | Yes              |
| Maintain ledger         | Yes          | No              | No           | No               |

The diagonal pattern — each node type having a clear, non-overlapping responsibility — is intentional. It means:

  • An inference node going offline does not affect consensus or the ledger.
  • A full node does not need GPUs and can run on commodity hardware.
  • Data nodes are isolated from both inference and consensus, so a compromised data source does not affect model execution.
  • Storage is decoupled entirely; models can be updated, added, or removed without any impact on the live inference or verification paths.

The Verification Spectrum

Rather than mandating a single verification method, HACA provides a spectrum:

No verification ◄──────────────────────────────────► Maximum verification

      Vanilla                  TEE                      ZKML
  (signature only)    (hardware attestation)    (mathematical proof)

This is a pragmatic choice, not a compromise. Different applications have different risk profiles:

When TEE Makes Sense

TEE verification works well when:

  • The workload involves large models (LLMs) where ZKML overhead is prohibitive
  • Privacy is important — TEE enclaves protect both input and output from the operator
  • The trust model accepts hardware-based attestation (relying on the CPU manufacturer's security guarantees)

TEE is the default for all LLM inference on OpenGradient. It provides strong guarantees with negligible overhead.

When ZKML Makes Sense

ZKML verification is appropriate when:

  • The model is small enough that proof generation overhead (1000-10000x) is acceptable
  • The use case demands cryptographic certainty — no reliance on hardware trust assumptions
  • The inference result directly affects high-value outcomes (DeFi liquidations, financial scoring, on-chain governance)

ZKML provides the strongest possible guarantee: a mathematical proof that anyone can verify independently.

When Vanilla Makes Sense

Vanilla verification (signature only) is useful when:

  • The inference is exploratory or non-critical
  • You are prototyping and do not need verification overhead
  • The cost of incorrect results is low

Vanilla still records the inference on-chain with a signature, providing a basic audit trail without cryptographic proof of correct execution.
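A sketch of what that signature-only record might look like is below. HMAC stands in for the node's real asymmetric signature, and the field names are illustrative; the takeaway is that verification here confirms *who* signed the record, not that the inference itself was executed correctly.

```python
# Sketch of a vanilla (signature-only) inference record. HMAC is a
# stand-in for a real asymmetric signature; fields are hypothetical.
import hashlib
import hmac
import json

NODE_KEY = b"inference-node-secret"  # hypothetical node signing key

def sign_record(model_id: str, input_hash: str, output_hash: str) -> dict:
    record = {"model": model_id, "input": input_hash, "output": output_hash}
    body = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(NODE_KEY, body, hashlib.sha256).hexdigest()
    return record

def check_record(record: dict) -> bool:
    """Audit-trail check: the record is intact and signed by the node."""
    body = json.dumps({k: v for k, v in record.items() if k != "sig"},
                      sort_keys=True).encode()
    expected = hmac.new(NODE_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["sig"], expected)

rec = sign_record("model-x", "0xaaa", "0xbbb")
print(check_record(rec))  # True -- but no proof of *correct* execution
```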

Mixing Methods

A single transaction on OpenGradient can use different verification methods for different model calls. For example, a DeFi application might use:

  • ZKML for the risk scoring model that determines whether to approve a loan
  • TEE for the LLM that generates a human-readable explanation of the decision
  • Vanilla for a logging model that summarizes the transaction for analytics

This per-inference flexibility means developers do not have to over-verify low-risk computations or under-verify critical ones.
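The DeFi example can be sketched as a per-call verification plan. The enum values mirror the spectrum described earlier, but the API shape is illustrative, not OpenGradient's SDK.

```python
# Hypothetical sketch: one transaction, three model calls, three
# verification levels chosen per call.
from enum import Enum

class Verification(Enum):
    VANILLA = "vanilla"  # signature only
    TEE = "tee"          # hardware attestation
    ZKML = "zkml"        # zero-knowledge proof

def plan_calls():
    # Match the verification strength to each call's risk profile.
    return [
        ("risk-score-v2", Verification.ZKML),     # decides the loan
        ("explain-llm",   Verification.TEE),      # user-facing explanation
        ("analytics-sum", Verification.VANILLA),  # low-stakes logging
    ]

for model, mode in plan_calls():
    print(f"{model}: verify via {mode.value}")
```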

Trust Model

HACA constructs trust from three independent pillars:

1. Hardware Attestation (TEE)

TEE nodes generate attestations from the CPU's secure enclave. These attestations are verified against known-good measurements (PCR values) stored on-chain. The chain of trust goes:

CPU hardware → Enclave attestation → On-chain registry → Client verification

The on-chain registry means trust verification is not centralized. Every validator independently checks attestations, and the registry is a smart contract that anyone can audit.

2. Mathematical Proof (ZKML)

Zero-knowledge proofs provide trust through mathematics alone — no hardware assumptions, no trusted parties. The chain of trust is:

Model execution → ZK proof generation → On-chain proof verification

Anyone with the proof can verify it independently. This is the strongest form of trust but comes with the highest computational overhead.

3. Consensus (BFT)

The underlying CometBFT consensus provides Byzantine fault tolerance. As long as fewer than 1/3 of validators are compromised, the network correctly verifies and records all proofs. This provides the foundation on which the other trust mechanisms are built.
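The 1/3 bound can be made concrete with a little arithmetic: CometBFT-style BFT tolerates f Byzantine validators when n ≥ 3f + 1, and a block needs votes from strictly more than 2/3 of the voting power.

```python
# Worked arithmetic for the BFT bound: n validators tolerate f faults
# where n >= 3f + 1, and commits require > 2/3 of votes.

def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3

def quorum(n: int) -> int:
    """Smallest vote count strictly greater than 2n/3."""
    return (2 * n) // 3 + 1

for n in (4, 7, 100):
    print(n, max_faulty(n), quorum(n))
# 4 validators tolerate 1 fault (quorum 3);
# 100 validators tolerate 33 faults (quorum 67).
```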

These three pillars are complementary. TEE and ZKML provide trust in individual inferences. Consensus provides trust in the verification layer itself.

Trade-Offs

No architecture is without trade-offs. Here are the ones HACA makes explicitly:

TEE relies on hardware trust. TEE attestations are only as trustworthy as the CPU manufacturer's implementation. If a fundamental TEE vulnerability is discovered, the security model degrades. OpenGradient mitigates this by supporting multiple verification methods and allowing applications to require ZKML for critical operations.

ZKML is slow. 1000-10000x overhead means ZKML is impractical for large models or high-throughput workloads. This is a current limitation of ZK technology, not a design choice. As ZK proof systems improve, this overhead will decrease.

Asynchronous settlement means temporary trust gaps. Between the moment an inference result is returned and the moment its proof settles on-chain, the result is not yet verified. Applications that need immediate on-chain verification (like atomic DeFi operations) should use PIPE, which settles the proof within the same transaction at the cost of higher latency.

Specialization adds coordination complexity. Having multiple node types means the network must handle registration, routing, and coordination between them. This is more complex than a homogeneous validator set. The trade-off is worthwhile because it enables capabilities (low-latency AI inference, GPU-heavy computation) that are impossible with a uniform design.

Further Reading