Skip to content

Overview

TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.

It’s fully open source.

  1. Integrate our model gateway
  2. Send metrics or feedback
  3. Optimize prompts, models, and inference strategies
  4. Watch your LLMs improve over time

It provides a data & learning flywheel for LLMs by unifying:

  • Inference: one API for all LLMs, with <1ms P99 overhead
  • Observability: inference & feedback → your database
  • Optimization: from prompts to fine-tuning and RL (& even 🍓? )
  • Experimentation: built-in A/B testing, routing, fallbacks

How It Works

TensorZero Flywheel
  1. The TensorZero Gateway is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks.
  2. It handles structured schema-based inference with <1ms P99 latency overhead (see Benchmarks) and built-in observability, experimentation, and inference-time optimizations.
  3. It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
  4. Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics.
  5. Over time, TensorZero Recipes leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
  6. Finally, the gateway’s experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.

Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: AI systems that learn from real-world experience. Read more about our Vision & Roadmap.