Overview

TensorZero is an open-source stack for industrial-grade LLM applications:

Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency overhead)
Observability: store inferences and feedback in your database, available programmatically or in the UI
Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
Evaluations: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Take what you need, adopt incrementally, and complement with other tools.

How It Works

The TensorZero Gateway is a high-performance model gateway written in Rust 🦀 that provides a unified API for all major LLM providers, so you can integrate across providers and fall back between them.
It handles structured, schema-based inference with <1ms p99 latency overhead (see Benchmarks) and provides built-in observability, experimentation, and inference-time optimizations.
It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics.
Over time, TensorZero Recipes leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
Finally, the gateway’s experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, whether you run a single LLM or thousands.
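
To make the flow above concrete, here is a minimal conceptual sketch in Python. It is not the TensorZero API or its implementation: `ToyGateway`, `ProviderError`, the provider functions, and the in-memory lists standing in for ClickHouse tables are all hypothetical names invented for illustration. The sketch only shows the general idea of one entry point that samples a variant (experimentation), falls back across providers, records each inference, and attaches feedback to it later.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple
from uuid import UUID, uuid4


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call that fails."""


@dataclass
class ToyGateway:
    # Provider name -> callable taking a prompt and returning text.
    # These stand in for real provider APIs (assumption for illustration).
    providers: Dict[str, Callable[[str], str]]
    # Variant name -> (sampling weight, ordered list of providers to try).
    variants: Dict[str, Tuple[float, List[str]]]
    # In-memory stand-ins for the inference and feedback tables.
    inferences: List[dict] = field(default_factory=list)
    feedback_rows: List[dict] = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def inference(self, prompt: str) -> dict:
        # Experimentation: sample a variant in proportion to its weight.
        names = list(self.variants)
        weights = [self.variants[name][0] for name in names]
        variant = self.rng.choices(names, weights=weights, k=1)[0]

        # Fallbacks: try each provider in order until one succeeds.
        errors = []
        for provider in self.variants[variant][1]:
            try:
                output = self.providers[provider](prompt)
            except ProviderError as exc:
                errors.append(f"{provider}: {exc}")
                continue
            # Observability: record the inference before returning it.
            row = {
                "inference_id": uuid4(),
                "variant": variant,
                "provider": provider,
                "input": prompt,
                "output": output,
            }
            self.inferences.append(row)
            return row
        raise ProviderError("all providers failed: " + "; ".join(errors))

    def feedback(self, inference_id: UUID, metric_name: str, value) -> None:
        # Feedback is only accepted for an inference that was recorded.
        if not any(r["inference_id"] == inference_id for r in self.inferences):
            raise KeyError(f"unknown inference_id: {inference_id}")
        self.feedback_rows.append(
            {"inference_id": inference_id, "metric_name": metric_name, "value": value}
        )


def flaky_provider(prompt: str) -> str:
    # Always fails, to exercise the fallback path.
    raise ProviderError("rate limited")


def stable_provider(prompt: str) -> str:
    # Trivial stand-in for a model response.
    return prompt.upper()


gateway = ToyGateway(
    providers={"flaky": flaky_provider, "stable": stable_provider},
    variants={
        "baseline": (0.9, ["flaky", "stable"]),
        "candidate": (0.1, ["stable"]),
    },
    rng=random.Random(0),
)
result = gateway.inference("hello")
gateway.feedback(result["inference_id"], "thumbs_up", True)
```

In this toy setup both variants end up served by the `stable` provider, because `flaky` always fails and the gateway falls back. Joining the recorded rows on `inference_id` yields the kind of structured dataset that later optimization steps could consume.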

Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: AI systems that learn from real-world experience. Read more about our Vision & Roadmap.