- One API for All LLMs. The gateway provides a unified interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks. TensorZero natively supports Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI Service, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, Together, vLLM, and xAI. Need something else? Your provider is most likely supported, because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama). Still not supported? Open an issue on GitHub and we’ll integrate it! Learn more in our How to call any LLM guide, or see the cross-provider sketch after this list.
- Blazing Fast. The gateway (written in Rust 🦀) achieves <1ms P99 latency overhead under extreme load. In benchmarks, LiteLLM @ 100 QPS adds 25-100x+ more latency than our gateway @ 10,000 QPS.
- Structured Inferences. The gateway enforces schemas for inputs and outputs, ensuring robustness for your application. Structured inference data is later used for powerful optimization recipes (e.g. swapping historical prompts before fine-tuning). Learn more about prompt templates & schemas, or see the schema-backed sketch after this list.
- Multi-Step LLM Workflows. The gateway provides first-class support for complex multi-step LLM workflows by associating multiple inferences with an episode. Feedback can be assigned at the inference or episode level, allowing for end-to-end optimization of compound LLM systems. Learn more about episodes, or see the episode sketch after this list.
- Built-in Observability. The gateway collects structured inference traces along with associated downstream metrics and natural-language feedback. Everything is stored in a ClickHouse database for real-time, scalable, and developer-friendly analytics. TensorZero Recipes leverage this dataset to optimize your LLMs. A sample analytics query follows this list.
- Built-in Experimentation. The gateway automatically routes traffic between variants to enable A/B tests. It ensures consistent variants within an episode in multi-step workflows. More advanced experimentation techniques (e.g. asynchronous multi-armed bandits) are coming soon. A variant-sampling sketch follows this list.
- Built-in Fallbacks. When an inference fails, the gateway automatically falls back to different inference providers, or even completely different variants, ensuring that misconfiguration, provider downtime, and other edge cases don’t affect your availability.
- GitOps Orchestration. Orchestrate prompts, models, parameters, tools, experiments, and more with GitOps-friendly configuration. Manage a few LLMs manually with human-readable configuration files, or thousands of prompts and LLMs entirely programmatically.
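
To make the unified-interface point concrete, here is a minimal sketch of a cross-provider call over plain HTTP. It assumes a gateway running locally on its default port (3000); the model names are just examples, and the response parsing follows the chat response shape in the API reference.

```python
# Minimal sketch: one request shape for any provider. Assumes a TensorZero
# Gateway running locally on its default port (3000).
import requests

def ask(model_name: str, question: str) -> str:
    response = requests.post(
        "http://localhost:3000/inference",
        json={
            "model_name": model_name,  # "provider::model" shorthand
            "input": {"messages": [{"role": "user", "content": question}]},
        },
    )
    response.raise_for_status()
    # Chat responses return a list of content blocks; grab the first text block.
    return response.json()["content"][0]["text"]

# The same call shape works across providers; switching is a one-string change.
print(ask("openai::gpt-4o-mini", "What is an LLM gateway?"))
print(ask("anthropic::claude-3-5-haiku-20241022", "What is an LLM gateway?"))
```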
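
For structured inferences, a hedged sketch of what a schema-backed call might look like. The function name (`draft_email`) and its argument fields are hypothetical; they would be declared in your configuration alongside a prompt template and input schema, as described in the prompt templates & schemas docs.

```python
# Hedged sketch: calling a schema-backed function. "draft_email" and its
# argument fields are hypothetical; they would be declared in your
# configuration alongside a prompt template and input schema.
import requests

response = requests.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "draft_email",
        "input": {
            "messages": [
                {
                    "role": "user",
                    # Structured arguments are validated against the function's
                    # input schema instead of passing a hand-rolled prompt string.
                    "content": [
                        {
                            "type": "text",
                            "arguments": {
                                "recipient_name": "Gabriel",
                                "topic": "Q3 roadmap",
                            },
                        }
                    ],
                }
            ]
        },
    },
)
response.raise_for_status()
print(response.json()["content"][0]["text"])
```

Because the gateway stores these arguments rather than a rendered prompt string, optimization recipes can later re-render the same history under a new template, e.g. before fine-tuning.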
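
For multi-step workflows, a sketch of tying two inferences into one episode and then scoring the episode as a whole. The function names and the metric name are illustrative; metrics are defined in your configuration.

```python
# Sketch of a two-step workflow sharing an episode, then scoring the episode.
# The function names ("retrieve_context", "answer_question") and the metric
# ("task_success") are illustrative and would live in your configuration.
import requests

BASE = "http://localhost:3000"

# Step 1: the first inference opens a new episode.
step1 = requests.post(f"{BASE}/inference", json={
    "function_name": "retrieve_context",
    "input": {"messages": [{"role": "user", "content": "Find docs on fallbacks."}]},
}).json()
episode_id = step1["episode_id"]

# Step 2: later inferences join the same episode by passing episode_id.
step2 = requests.post(f"{BASE}/inference", json={
    "function_name": "answer_question",
    "episode_id": episode_id,
    "input": {"messages": [{"role": "user", "content": "Summarize the context."}]},
}).json()

# Feedback can target the episode as a whole (or a single inference_id).
requests.post(f"{BASE}/feedback", json={
    "metric_name": "task_success",
    "episode_id": episode_id,
    "value": True,
})
```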
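
Since observability data lands in ClickHouse, analytics is plain SQL. A hedged sketch using the clickhouse-connect client; the database and table names below are assumptions about the default schema and may differ in your deployment.

```python
# Hedged sketch: querying inference traces directly in ClickHouse.
# Assumes the clickhouse-connect package; the database/table names
# ("tensorzero", "ChatInference") are assumptions about the default schema.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", database="tensorzero")
result = client.query(
    "SELECT function_name, variant_name, count() AS inferences "
    "FROM ChatInference "
    "GROUP BY function_name, variant_name "
    "ORDER BY inferences DESC"
)
for function_name, variant_name, inferences in result.result_rows:
    print(function_name, variant_name, inferences)
```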
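
For experimentation, each response reports which variant the gateway sampled, so A/B analysis amounts to attributing downstream metrics by variant name. A small sketch (the function name is illustrative; variants and their weights live in your configuration):

```python
# Sketch: the gateway samples a variant for each request and reports which
# one it chose, so downstream metrics can be attributed per variant.
# The function name is illustrative; variants are defined in configuration.
import requests

response = requests.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "summarize_ticket",
        "input": {"messages": [{"role": "user", "content": "Summarize this ticket."}]},
    },
).json()

# Within a single episode, the gateway keeps the sampled variant consistent.
print(response["variant_name"])
```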
Next Steps
Quickstart
Make your first TensorZero API call with built-in observability and fine-tuning in under 5 minutes.
Tutorial
Build a simple chatbot, an email copilot, a RAG system, and a data extraction pipeline using TensorZero.
Deployment
Quickly deploy locally, or set up high-availability services for production environments.
Integrations
The TensorZero Gateway integrates with the major LLM providers.
Benchmarks
The TensorZero Gateway achieves sub-millisecond latency overhead under extreme load.
API Reference
The TensorZero Gateway provides a unified interface for making inference and feedback API calls.
Configuration Reference
Easily manage your LLM applications with GitOps orchestration, even complex multi-step systems.