- Call any LLM with the same API. TensorZero unifies every major LLM API (e.g. OpenAI) and inference server (e.g. Ollama).
- Get started with a few lines of code. Later, you can optionally add observability, automatic fallbacks, A/B testing, and much more.
- Use any programming language. You can use TensorZero with its Python SDK, any OpenAI SDK (Python, Node, Go, etc.), or its HTTP API.
You can find a complete runnable example of this guide on GitHub.
- Python
- Python (OpenAI SDK)
- Node (OpenAI SDK)
- HTTP
The TensorZero Python SDK provides a unified API for calling any LLM.
1
Set up the credentials for your LLM provider
For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.
export OPENAI_API_KEY="sk-..."
See the Integrations page to learn how to set up credentials for other LLM providers.
2
Install the TensorZero Python SDK
You can install the TensorZero SDK with a Python package manager like pip.
pip install tensorzero
3
Initialize the TensorZero Gateway
Let’s initialize the TensorZero Gateway.
For simplicity, we’ll use an embedded gateway without observability or custom configuration.
from tensorzero import TensorZeroGateway
t0 = TensorZeroGateway.build_embedded()
The TensorZero Python SDK includes a synchronous TensorZeroGateway client and an asynchronous AsyncTensorZeroGateway client. Both options support running the gateway embedded in your application with build_embedded or connecting to a standalone gateway with build_http, as sketched below. See Clients for more details.
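For example, here is a minimal sketch of connecting to a standalone gateway with build_http instead (it assumes a gateway is already running at http://localhost:3000, as in the HTTP steps later in this guide, and that the URL is passed via the gateway_url keyword; see Clients for the exact signature):

from tensorzero import TensorZeroGateway

# Connect to a standalone gateway over HTTP instead of embedding one in-process.
# Assumes a gateway is already listening at http://localhost:3000.
t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

response = t0.inference(
    model_name="openai::gpt-5-mini",
    input={"messages": [{"role": "user", "content": "Tell me a fun fact."}]},
)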
4
Call the LLM
response = t0.inference(
    model_name="openai::gpt-5-mini",
    # or: model_name="anthropic::claude-sonnet-4-20250514"
    # or: Google, AWS, Azure, xAI, vLLM, Ollama, and many more
    input={
        "messages": [
            {
                "role": "user",
                "content": "Tell me a fun fact.",
            }
        ]
    },
)
Sample Response
ChatInferenceResponse(
    inference_id=UUID('0198d339-be77-74e0-b522-e08ec12d3831'),
    episode_id=UUID('0198d339-be77-74e0-b522-e09f578f34d0'),
    variant_name='openai::gpt-5-mini',
    content=[
        Text(
            text='Fun fact: Botanically, bananas are berries but strawberries are not. \n\nA true berry develops from a single ovary and has seeds embedded in the flesh—bananas fit that definition. Strawberries are "aggregate accessory fruits": the tiny seeds on the outside are each from a separate ovary.',
            arguments=None,
            type='text'
        )
    ],
    usage=Usage(input_tokens=12, output_tokens=261),
    finish_reason=FinishReason.STOP,
    original_response=None
)
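To pull just the generated text out of the response, index into its content blocks; the field names below mirror the ChatInferenceResponse sample above:

# `content` is a list of content blocks; text blocks expose the model output via `.text`.
print(response.content[0].text)

# Token usage is reported alongside the content.
print(response.usage.input_tokens, response.usage.output_tokens)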
See the Inference API Reference for more details on the request and response formats.
The TensorZero Python SDK integrates with the OpenAI Python SDK to provide a unified API for calling any LLM.
1
Set up the credentials for your LLM provider
For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.
export OPENAI_API_KEY="sk-..."
See the Integrations page to learn how to set up credentials for other LLM providers.
2
Install the OpenAI and TensorZero Python SDKs
You can install the OpenAI and TensorZero SDKs with a Python package manager like pip.
pip install openai tensorzero
3
Initialize the OpenAI client
Let’s initialize the TensorZero Gateway and patch the OpenAI client to use it.
For simplicity, we’ll use an embedded gateway without observability or custom configuration.
from openai import OpenAI
from tensorzero import patch_openai_client
client = OpenAI()
patch_openai_client(client, async_setup=False)
The TensorZero Python SDK supports both the synchronous OpenAI client and the asynchronous AsyncOpenAI client. Both options support running the gateway embedded in your application with patch_openai_client or connecting to a standalone gateway with base_url, as sketched below. The embedded gateway supports synchronous initialization with async_setup=False or asynchronous initialization with async_setup=True. See Clients for more details.
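For example, here is a minimal sketch of pointing the stock OpenAI client at a standalone gateway with base_url rather than patching an embedded one (it assumes a gateway is already running at http://localhost:3000, as in the Node and HTTP steps later in this guide; /openai/v1 is the gateway’s OpenAI-compatible route shown there):

from openai import OpenAI

# Point the OpenAI client at a standalone TensorZero Gateway instead of
# patching it to use an embedded gateway.
# Assumes a gateway is already listening at http://localhost:3000.
client = OpenAI(base_url="http://localhost:3000/openai/v1")

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact."}],
)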
4
Call the LLM
response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    # or: model="tensorzero::model_name::anthropic::claude-sonnet-4-20250514"
    # or: Google, AWS, Azure, xAI, vLLM, Ollama, and many more
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact.",
        }
    ],
)
Sample Response
ChatCompletion(
    id='0198d33f-24f6-7cc3-9dd0-62ba627b27db',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Sure! Did you know that octopuses have three hearts? Two pump blood to the gills, while the third pumps it to the rest of the body. And, when an octopus swims, the heart that delivers blood to the body actually **stops beating**—which is why they prefer to crawl rather than swim!',
                refusal=None,
                role='assistant',
                annotations=None,
                audio=None,
                function_call=None,
                tool_calls=[]
            )
        )
    ],
    created=1755890789,
    model='tensorzero::model_name::openai::gpt-5-mini',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=67,
        prompt_tokens=13,
        total_tokens=80,
        completion_tokens_details=None,
        prompt_tokens_details=None
    ),
    episode_id='0198d33f-24f6-7cc3-9dd0-62cd7028c3d7'
)
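Because the response follows the standard OpenAI ChatCompletion shape shown above, you can read the generated text the same way you would with the OpenAI SDK alone:

# The generated text lives in the first choice's message.
print(response.choices[0].message.content)

# Standard OpenAI usage accounting is included as well.
print(response.usage.total_tokens)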
See the Inference (OpenAI) API Reference for more details on the request and response formats.
You can point the OpenAI Node SDK to a TensorZero Gateway to call any LLM with a unified API.
1
Set up the credentials for your LLM provider
For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.
export OPENAI_API_KEY="sk-..."
See the Integrations page to learn how to set up credentials for other LLM providers.
2
Install the OpenAI Node SDK
You can install the OpenAI SDK with a package manager like npm.
npm i openai
3
Deploy a standalone (HTTP) TensorZero Gateway
Let’s deploy a standalone TensorZero Gateway using Docker.
For simplicity, we’ll use the gateway without observability or custom configuration.
docker run \
  -e OPENAI_API_KEY \
  -p 3000:3000 \
  tensorzero/gateway \
  --default-config
See the TensorZero Gateway Deployment page for more details.
4
Initialize the OpenAI client
Let’s initialize the OpenAI SDK and point it to the gateway we just launched.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
});
5
Call the LLM
const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-5-mini",
  // or: model: "tensorzero::model_name::anthropic::claude-sonnet-4-20250514",
  // or: Google, AWS, Azure, xAI, vLLM, Ollama, and many more
  messages: [
    {
      role: "user",
      content: "Tell me a fun fact.",
    },
  ],
});
Sample Response
{
  id: '0198d345-4bd5-79a2-a235-ebaea8c16d91',
  episode_id: '0198d345-4bd5-79a2-a235-ebbf6eb49cb8',
  choices: [
    {
      index: 0,
      finish_reason: 'stop',
      message: {
        content: 'Sure! Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old—and still perfectly edible!',
        tool_calls: [],
        role: 'assistant'
      }
    }
  ],
  created: 1755891192,
  model: 'tensorzero::model_name::openai::gpt-5-mini',
  system_fingerprint: '',
  service_tier: null,
  object: 'chat.completion',
  usage: { prompt_tokens: 13, completion_tokens: 37, total_tokens: 50 }
}
See the Inference (OpenAI) API Reference for more details on the request and response formats.
You can call the TensorZero Gateway directly over HTTP to access any LLM with a unified API.
1
Set up the credentials for your LLM provider
For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.
export OPENAI_API_KEY="sk-..."
See the Integrations page to learn how to set up credentials for other LLM providers.
2
Deploy a standalone (HTTP) TensorZero Gateway
Let’s deploy a standalone TensorZero Gateway using Docker.
For simplicity, we’ll use the gateway without observability or custom configuration.
docker run \
  -e OPENAI_API_KEY \
  -p 3000:3000 \
  tensorzero/gateway \
  --default-config
See the TensorZero Gateway Deployment page for more details.
3
Call the LLM
You can call the LLM by sending a POST request to the /inference endpoint of the TensorZero Gateway.
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-5-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Tell me a fun fact."
        }
      ]
    }
  }'
Sample Response
{
  "inference_id": "0198d351-b250-70d1-a24a-a255d148d7a6",
  "episode_id": "0198d351-b250-70d1-a24a-a2690343bcf0",
  "variant_name": "openai::gpt-5-mini",
  "content": [
    {
      "type": "text",
      "text": "Fun fact: botanically, bananas are berries but strawberries are not. \n\nIn botanical terms a \"berry\" develops from a single ovary and has seeds embedded in the flesh—bananas fit that definition, while strawberries are aggregate accessory fruits (the little \"seeds\" on the outside are actually separate ovaries). Want another fun fact?"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 334
  },
  "finish_reason": "stop"
}
See the Inference API Reference for more details on the request and response formats.
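Any HTTP client works the same way. As an illustrative sketch, here is the equivalent request from Python using the requests library (assumed to be installed); the endpoint and payload match the curl example above:

import requests

# Same request as the curl example above.
response = requests.post(
    "http://localhost:3000/inference",
    json={
        "model_name": "openai::gpt-5-mini",
        "input": {
            "messages": [{"role": "user", "content": "Tell me a fun fact."}]
        },
    },
)
response.raise_for_status()

data = response.json()
# `content` is a list of content blocks; text blocks carry the generated text.
print(data["content"][0]["text"])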