

TensorZero Docs home page


404

Page Not Found

We couldn't find the page. Maybe you were looking for one of these pages below?

Inference-Time Optimizations
Overview
Inference Caching
