Inference Execution
This section describes PIPE (Parallelized Inference Pre-Execution Engine), our novel on-chain inference execution method that lets smart contracts use models natively. PIPE is one of the key technologies that allows OpenGradient to provide seamless and highly scalable inference to all developers.
PIPE
Using PIPE, smart contracts and applications on our network can natively use and execute AI models without introducing any overhead or congestion inside the EVM. Inferences are executed in parallel, which allows us to compete with centralized infrastructure providers while also providing full decentralization and, importantly, verifiable execution, ensuring the reliability of the system.
Concretely, the steps involved in executing a transaction on OpenGradient are as follows:
- The user submits an EVM transaction.
- The transaction is first placed in the inference mempool.
- The inference mempool simulates all transactions and extracts the inference requests triggered by each smart contract transaction.
- The inference requests are sent to the inference network for parallel execution.
- Once all inferences are completed and their results are available, the transaction is retrieved from the mempool for further execution.
- The EVM immediately executes the transaction with the pre-computed inference results.
- The transaction is included in the next block.
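The flow above can be illustrated with a small, self-contained Python sketch. It is not OpenGradient's implementation: `Transaction`, `InferenceRequest`, `simulate`, `run_inference`, `execute`, and `process_mempool` are all hypothetical names, the "model" is a toy function that doubles its input, and a local thread pool stands in for the inference network. The only point it makes is the ordering: requests are extracted first, run in parallel, and execution then consumes results that already exist.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InferenceRequest:
    model_id: str       # hypothetical model identifier
    model_input: float  # hypothetical single-number input


@dataclass
class Transaction:
    sender: str
    # Inference calls the contract would make. A real system would discover
    # these by simulating the EVM call; here they are declared up front.
    calls: list = field(default_factory=list)


def simulate(tx: Transaction) -> list:
    """Stand-in for simulation: return the inference requests the tx triggers."""
    return list(tx.calls)


def run_inference(req: InferenceRequest) -> float:
    """Stand-in for the inference network: a toy model that doubles its input."""
    return req.model_input * 2.0


def execute(requests: list, results: dict) -> float:
    """Stand-in for EVM execution: only looks up pre-computed results."""
    return sum(results[req] for req in requests)


def process_mempool(pending: list) -> list:
    # Simulate every pending transaction and extract its inference requests.
    requests_by_tx = [simulate(tx) for tx in pending]
    all_requests = [req for reqs in requests_by_tx for req in reqs]

    # Run all extracted requests in parallel (thread pool as a stand-in).
    with ThreadPoolExecutor(max_workers=8) as pool:
        outputs = list(pool.map(run_inference, all_requests))
    results = dict(zip(all_requests, outputs))

    # Execute each transaction against the results that are now available.
    return [execute(reqs, results) for reqs in requests_by_tx]


pending = [
    Transaction("alice", [InferenceRequest("m", 1.0), InferenceRequest("m", 2.0)]),
    Transaction("bob"),  # no inference calls at all
]
outcomes = process_mempool(pending)
```

In this sketch `execute` never calls `run_inference`, which mirrors the idea that the EVM itself does no model execution and only reads results computed ahead of time.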
PIPE, powered by the VCS architecture, is what makes this design scale. With PIPE, we can run inferences for hundreds or thousands of pending transactions in parallel, which increases network throughput and reduces latency for all users. Together, these technologies let us scale our inference network and execution horizontally as demand for AI model execution grows.
In addition, smart contracts that directly embed AI and ML models get low-latency, non-blocking transaction execution at scale. Because all inferences complete while the transaction is still in the inference mempool, actual block building remains extremely fast. No single transaction, for example one that triggers an expensive and slow LLM completion, can slow down the entire blockchain.
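The non-blocking property can be sketched as a mempool that releases a transaction only once every inference it waits on has returned. This is a minimal illustration under assumed names (`PendingTx`, `InferenceMempool`, `add`, `on_result`, and `take_ready` are hypothetical), and it does not claim to reflect how the network actually orders or holds transactions.

```python
from dataclasses import dataclass, field


@dataclass
class PendingTx:
    tx_id: str
    # IDs of inferences whose results have not arrived yet (hypothetical).
    awaiting: set = field(default_factory=set)


class InferenceMempool:
    def __init__(self) -> None:
        # Plain dicts preserve insertion order, so arrival order is kept.
        self._pending: dict = {}

    def add(self, tx_id: str, inference_ids: set) -> None:
        self._pending[tx_id] = PendingTx(tx_id, set(inference_ids))

    def on_result(self, inference_id: str) -> None:
        # Mark this inference as finished for every transaction waiting on it.
        for tx in self._pending.values():
            tx.awaiting.discard(inference_id)

    def take_ready(self) -> list:
        # Release only transactions with no outstanding inferences.
        ready = [tx_id for tx_id, tx in self._pending.items() if not tx.awaiting]
        for tx_id in ready:
            del self._pending[tx_id]
        return ready


mempool = InferenceMempool()
mempool.add("chat", {"llm-completion"})  # slow inference, submitted first
mempool.add("swap", {"price-model"})     # fast inference, submitted second

mempool.on_result("price-model")
block_1 = mempool.take_ready()           # "chat" is still waiting

mempool.on_result("llm-completion")
block_2 = mempool.take_ready()
```

Here the block builder only ever sees transactions whose results are already available, so the slow LLM completion delays its own transaction and nothing else.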