- Set up a ClickHouse database
- Optional: Create a configuration file
- Run the gateway
See the TensorZero UI Deployment Guide for more details on how to deploy the TensorZero UI.
ClickHouse
The TensorZero Gateway stores inference and feedback data in a ClickHouse database. This data is later used for model observability, experimentation, and optimization.
Development
For development purposes, you can run a single-node ClickHouse instance locally (e.g. using Homebrew or Docker) or a cheap Development-tier cluster on ClickHouse Cloud.
See the ClickHouse documentation for more details on configuring your ClickHouse deployment.
Production
Managed Services
For production deployments, the easiest setup is to use a managed service like ClickHouse Cloud. ClickHouse Cloud is also available through the AWS Marketplace, GCP Marketplace, and Azure Marketplace. Other options for managed ClickHouse deployments include Tinybird (serverless) and Altinity (hands-on support).
Self-Managed Deployment
You can alternatively run your own self-managed ClickHouse instance or cluster. The TensorZero Gateway does not automatically enable data replication for ClickHouse tables. If you are using a self-managed distributed ClickHouse deployment, you must set up replication yourself. See the ClickHouse replication documentation for more details.
ClickHouse Cloud automatically sets up data replication, so this step is not necessary if you’re using the managed service.
Configuration
After setting up your database, you need to configure the `TENSORZERO_CLICKHOUSE_URL` environment variable with the connection details. The variable takes a standard URL format.
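For illustration, a minimal sketch of a `.env` entry (the credentials, host, port, and database name are placeholders; substitute your own):

```bash
# .env — hypothetical connection details for a local ClickHouse instance
TENSORZERO_CLICKHOUSE_URL="http://chuser:chpassword@localhost:8123/tensorzero"
```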
See `[gateway]` in the Configuration Reference for the relevant configuration (e.g. customizing the port).
Disabling Observability (Not Recommended)
You can disable observability features if you’re not interested in storing any data for experimentation and optimization.
In this case, you won’t need to set up ClickHouse, and the TensorZero Gateway will act as a simple model gateway.
To disable observability, set the following configuration in the `tensorzero.toml` file:
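(A sketch: the exact key below is an assumption; confirm it in the Configuration Reference.)

```toml
# tensorzero.toml
# Assumed key for disabling observability; verify against the Configuration Reference.
[gateway]
observability.enabled = false
```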
If you only need to disable observability temporarily, you can pass a `dryrun: true` parameter to the inference and feedback API endpoints.
Disabling Pseudonymous Usage Analytics
TensorZero collects pseudonymous usage analytics to help our team improve the product. The collected data includes aggregated metrics about TensorZero itself, but does NOT include your application’s data.
To be explicit: TensorZero does NOT share any inference input or output.
TensorZero also does NOT share the name of any function, variant, metric, or similar application-specific identifiers. See `howdy.rs` in the GitHub repository to see exactly what usage data is collected and shared with TensorZero.
To disable usage analytics, set the following configuration in the `tensorzero.toml` file:
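(A sketch: the key name below is an assumption; confirm it in the Configuration Reference.)

```toml
# tensorzero.toml
# Assumed key for disabling pseudonymous usage analytics.
[gateway]
disable_pseudonymous_usage_analytics = true
```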
Alternatively, you can also set the environment variable `TENSORZERO_DISABLE_PSEUDONYMOUS_USAGE_ANALYTICS=1`.
TensorZero Gateway
The TensorZero Python client includes a built-in embedded gateway, so you don’t need to run a separate service for it.
The gateway is only needed if you want to use the OpenAI Python client or interact with TensorZero via its HTTP API (for other programming languages).
The TensorZero UI also requires the gateway service.
Development
Running with Docker (Recommended)
You can easily run the TensorZero Gateway locally using Docker. You need to provide it with a path to a folder containing your `tensorzero.toml` file and its dependencies (e.g. schemas and templates), as well as the environment variables discussed above.
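For example, a minimal sketch (the `./config` path, `.env` file, and `tensorzero/gateway` image tag are assumptions; adjust for your setup):

```bash
# A sketch: mount the config folder and pass credentials via an env file.
docker run \
  --name tensorzero-gateway \
  -v "./config:/app/config" \
  --env-file .env \
  -p 3000:3000 \
  tensorzero/gateway \
  --config-file /app/config/tensorzero.toml
```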
Building from source
Alternatively, you can build the TensorZero Gateway from source and run it directly on your host machine using Cargo:
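(A sketch: the binary target name `gateway` is an assumption; run from the repository root.)

```bash
# Build and run the gateway in debug mode with a custom configuration file.
cargo run --bin gateway -- --config-file /path/to/tensorzero.toml
```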
Production
You can deploy the TensorZero Gateway alongside your application (e.g. as a sidecar container) or as a standalone service. A single gateway instance can handle over 1k QPS/core with sub-millisecond latency (see Benchmarks), so a simple deployment should suffice for the vast majority of applications. If you deploy it as an independent service, we recommend deploying at least two instances behind a load balancer for high availability. The gateway is stateless, so you can easily scale horizontally and don’t need to worry about persistence.
Running with Docker (Recommended)
The recommended way to run the TensorZero Gateway in production is to use Docker. There are many ways to run Docker containers in production. A simple solution is to use Docker Compose. We provide an example `docker-compose.yml` for reference.
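For orientation, a minimal sketch of such a file (service layout, paths, and port are assumptions; see the repository's example for the authoritative version):

```yaml
# docker-compose.yml — a sketch, not the official example.
services:
  gateway:
    image: tensorzero/gateway
    command: --config-file /app/config/tensorzero.toml
    volumes:
      - ./config:/app/config:ro   # your tensorzero.toml and its dependencies
    env_file:
      - .env                      # TENSORZERO_CLICKHOUSE_URL, provider credentials
    ports:
      - "3000:3000"
```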
Running with Kubernetes (k8s) and Helm
We provide a reference Helm chart contributed by the community in our GitHub repository.
You can use it to run TensorZero in Kubernetes.
Building from source
Alternatively, you can build the TensorZero Gateway from source and run it directly on your host machine using Cargo.
For production deployments, we recommend enabling performance optimizations:
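(A sketch: the standard release profile stands in for whatever optimized profile the repository defines; the binary target name is an assumption.)

```bash
# Build and run the gateway with compiler optimizations enabled.
cargo run --release --bin gateway -- --config-file /path/to/tensorzero.toml
```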
Command Line Arguments
The TensorZero Gateway requires either `--config-file` to specify a custom configuration file (e.g. `--config-file /path/to/tensorzero.toml`) or `--default-config` to use default settings (i.e. no custom functions, metrics, etc.). You can also use `--log-format` to set the logging format to either `pretty` (default) or `json`.
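For example, a hypothetical invocation combining these flags (image name and port carried over as assumptions from the Docker examples above):

```bash
# Run with default settings and JSON-formatted logs.
docker run -p 3000:3000 --env-file .env tensorzero/gateway \
  --default-config --log-format json
```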
Clients
See the Clients page for more details on how to interact with the TensorZero Gateway.
Configuration
To run the TensorZero Gateway, you first need to create a `tensorzero.toml` configuration file. Read more about the configuration file here.
Model Provider Credentials
In addition to the `TENSORZERO_CLICKHOUSE_URL` environment variable discussed above, the TensorZero Gateway accepts the following environment variables for provider credentials.
Unless you specify an alternative credential location in your configuration file, these environment variables are required for the providers that are used in a variant with positive weight.
If required credentials are missing, the gateway will fail on startup.
| Provider | Environment Variable(s) |
|---|---|
| Anthropic | `ANTHROPIC_API_KEY` |
| AWS Bedrock | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` (optional) |
| AWS SageMaker | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` (optional) |
| Azure OpenAI | `AZURE_OPENAI_API_KEY` |
| Fireworks | `FIREWORKS_API_KEY` |
| GCP Vertex AI Anthropic | `GCP_VERTEX_CREDENTIALS_PATH` (see below for details) |
| GCP Vertex AI Gemini | `GCP_VERTEX_CREDENTIALS_PATH` (see below for details) |
| Google AI Studio Gemini | `GOOGLE_AI_STUDIO_GEMINI_API_KEY` |
| Groq | `GROQ_API_KEY` |
| Hyperbolic | `HYPERBOLIC_API_KEY` |
| Mistral | `MISTRAL_API_KEY` |
| OpenAI | `OPENAI_API_KEY` |
| OpenRouter | `OPENROUTER_API_KEY` |
| Together | `TOGETHER_API_KEY` |
| xAI | `XAI_API_KEY` |
- AWS Bedrock supports many authentication methods, including environment variables, IAM roles, and more. See the AWS documentation for more details.
- If you’re using the GCP Vertex provider, you also need to mount the credentials for a service account in JWT form (described here) to `/app/gcp-credentials.json` using an additional `-v` flag, as sketched below.
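For instance, a sketch extending the earlier `docker run` command (local paths are placeholders):

```bash
# Mount the service account credentials into the container and point
# the provider at them; the local file name is a placeholder.
docker run \
  -v "./config:/app/config" \
  -v "./gcp-credentials.json:/app/gcp-credentials.json" \
  -e GCP_VERTEX_CREDENTIALS_PATH=/app/gcp-credentials.json \
  --env-file .env \
  -p 3000:3000 \
  tensorzero/gateway \
  --config-file /app/config/tensorzero.toml
```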
See `.env.example` for a complete example with every supported environment variable.