Learn how to set up TensorZero API keys to authenticate your inference requests and manage access control for your workflows.
You can create TensorZero API keys to authenticate your requests to the TensorZero Gateway.
This way, your clients don’t need access to model provider credentials, making it easier to manage access and security.

This page shows how to:
- Create API keys for the TensorZero Gateway
- Require clients to use these API keys for requests
- Manage and disable API keys
TensorZero supports authentication for the gateway.
Authentication for the UI is coming soon.
In the meantime, we recommend pairing the UI with complementary products like Nginx, OAuth2 Proxy, or Tailscale.
You can create API keys using the TensorZero UI.
If you’re running a standard local deployment, visit http://localhost:4000/api-keys to create a key.

Alternatively, you can create API keys programmatically in the CLI using the gateway binary with the --create-api-key flag.
For example:
```bash
docker compose run --rm gateway --create-api-key
```
The API key is a secret and should be kept secure.
Once you’ve created an API key, set the TENSORZERO_API_KEY environment variable.
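For example, in your shell (the key value below is a placeholder following the format described later on this page):

```bash
# Placeholder value; use the key you created in the previous step
export TENSORZERO_API_KEY="sk-t0-xxxxxxxxxxxx-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
```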
Make an authenticated inference request
Python (TensorZero SDK)
You can make authenticated requests by setting the api_key parameter in your TensorZero client:
tensorzero_sdk.py
```python
import os

from tensorzero import TensorZeroGateway

t0 = TensorZeroGateway.build_http(
    api_key=os.environ["TENSORZERO_API_KEY"],
    gateway_url="http://localhost:3000",
)

response = t0.inference(
    model_name="openai::gpt-5-mini",
    input={
        "messages": [
            {
                "role": "user",
                "content": "Tell me a fun fact.",
            }
        ]
    },
)

print(response)
```
The client will automatically read the TENSORZERO_API_KEY environment variable if you don’t set api_key.
Authentication is not supported in the embedded (in-memory) gateway in Python.
Please use the HTTP client with a standalone gateway to make authenticated requests.
Python (OpenAI SDK)

You can make authenticated requests by setting the api_key parameter in your OpenAI client:
openai_sdk.py
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TENSORZERO_API_KEY"],
    base_url="http://localhost:3000/openai/v1",
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact.",
        }
    ],
)

print(response)
```
Node (OpenAI SDK)

You can make authenticated requests by setting the apiKey parameter in your OpenAI client:
openai_sdk.ts
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.TENSORZERO_API_KEY,
  baseURL: "http://localhost:3000/openai/v1",
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-5-mini",
  messages: [
    {
      role: "user",
      content: "Tell me a fun fact.",
    },
  ],
});
```
HTTP

You can make authenticated requests by setting the Authorization HTTP header to Bearer <API_KEY>:
curl.sh
```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${TENSORZERO_API_KEY}" \
  -d '{
    "model": "tensorzero::model_name::openai::gpt-5-mini",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a fun fact."
      }
    ]
  }'
```
Manage API keys
You can manage and disable API keys in the TensorZero UI.
If you’re running a standard local deployment, visit http://localhost:4000/api-keys to manage your keys.

Alternatively, you can disable API keys programmatically in the CLI using the gateway binary with the --disable-api-key flag.
Pass the public ID of the key you want to disable (the 12-character portion after sk-t0-).
For example:
```bash
docker compose run --rm gateway --disable-api-key xxxxxxxxxxxx
```
Once you have authentication enabled, you can apply rate limits on a per-API-key basis using the api_key_public_id scope in your rate limiting rules.
This allows you to enforce different usage limits for different API keys, which is useful for implementing tiered access or preventing individual keys from consuming too many resources.
TensorZero API keys have the following format:

`sk-t0-xxxxxxxxxxxx-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy`

The xxxxxxxxxxxx portion is the 12-character public ID that you can use in rate limiting rules.
The remaining portion of the key is secret and should be kept secure.
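If you have a key stored in TENSORZERO_API_KEY and want to recover its public ID, a minimal shell sketch (assuming the format above, where the public ID is the third dash-delimited field) is:

```bash
# Prints the 12-character public ID for use in rate limiting rules;
# assumes the sk-t0-<public_id>-<secret> format shown above
echo "$TENSORZERO_API_KEY" | cut -d '-' -f 3
```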
For example, you can limit each API key to 100 model inferences per hour, but allow a specific API key to make 1000 inferences:
```toml
# Each API key can make up to 100 model inferences per hour
[[rate_limiting.rules]]
priority = 0
model_inferences_per_hour = 100
scope = [
  { api_key_public_id = "tensorzero::each" }
]

# But override the limit for a specific API key
[[rate_limiting.rules]]
priority = 1
model_inferences_per_hour = 1000
scope = [
  { api_key_public_id = "xxxxxxxxxxxx" }
]
```
Centralize auth across multiple TensorZero deployments
If you have multiple TensorZero deployments (e.g. one per team), you can centralize auth using gateway relay.

With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider.
This enables you to enforce organization-wide controls without restricting how teams build their LLM features.

See Centralize auth, rate limits, and more for details.