
LLM Inference in Python SDK

This guide describes how to use OpenGradient's native LLM inference capabilities from the Python SDK. For the list of supported LLMs, see Supported Models.

The OpenGradient network supports a variety of security techniques for inference verification. You can choose the methods most suitable for your use case and requirements. To learn more about our security options, visit Inference Verification.

LLM inference is exposed through the infer_llm method:

python
def infer_llm(model_cid, prompt, max_tokens=100, temperature=0.0, stop_sequence=None)

Arguments

  • model_cid: the CID of the LLM you want to use.
  • prompt: the input text prompt for the LLM.
  • max_tokens: (optional) the maximum number of tokens to generate. Default is 100.
  • temperature: (optional) controls randomness in generation; higher values produce more varied output. Default is 0.0.
  • stop_sequence: (optional) a string that stops generation as soon as it appears in the output.

Returns

  • A tuple containing:
    1. The transaction hash of the inference request.
    2. A string containing the generated text response from the LLM.

Note: The actual output length may be limited by the model's capabilities or other constraints.

Example

python
import opengradient as og

# initialize SDK
og.init(private_key="<private_key>", email="<email>", password="<password>")

# run LLM inference
tx_hash, response = og.infer_llm(
    model_cid='meta-llama/Meta-Llama-3-8B-Instruct',
    prompt="Translate the following English text to French: 'Hello, how are you?'",
    max_tokens=50,
    temperature=0.0
)

# print output
print("Transaction Hash:", tx_hash)
print("LLM Output:", response)

CLI LLM Inference

The CLI also provides explicit support for LLMs through the llm command.

For example, you can run Llama-3 using the following command:

bash
opengradient llm --model "meta-llama/Meta-Llama-3-8B-Instruct" --prompt "hello who are you?" --max-tokens 50
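
If you want to use the response in a script, you can capture the command's output; this assumes the CLI prints the generated text to standard output:

bash
# Capture the generated response (assumes it is written to stdout)
RESPONSE=$(opengradient llm --model "meta-llama/Meta-Llama-3-8B-Instruct" --prompt "hello who are you?" --max-tokens 50)
echo "$RESPONSE"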

The list of models we support can be found in the Model Hub.

To get more information on how to run LLMs using the CLI, run:

bash
opengradient llm --help
