The TensorZero Gateway integrates with the major LLM providers. The table below summarizes feature support for each provider; ⚠️ denotes partial or limited support.
Provider | Chat Functions | JSON Functions | Streaming | Tool Use | Multimodal | Embeddings | Batch |
---|---|---|---|---|---|---|---|
Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
AWS Bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
AWS SageMaker | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
Azure OpenAI Service | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
DeepSeek | ✅ | ✅ | ⚠️ | ❌ | ❌ | ❌ | ❌ |
Fireworks AI | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
GCP Vertex AI Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
GCP Vertex AI Gemini | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
Google AI Studio Gemini | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
Groq | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
Hyperbolic | ✅ | ⚠️ | ✅ | ❌ | ❌ | ❌ | ❌ |
Mistral | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
OpenAI and OpenAI-Compatible | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ | ✅ |
OpenRouter | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
SGLang | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
TGI | ✅ | ✅ | ⚠️ | ❌ | ❌ | ❌ | ❌ |
Together AI | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
vLLM | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
xAI | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
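For example, a provider from this table is wired up in the gateway's TOML configuration and then referenced from a function variant. A minimal sketch, assuming Anthropic credentials are configured (the model and function names here are placeholders):

```toml
# tensorzero.toml

[models.my_claude]
# Providers to try in order (enables fallbacks across providers)
routing = ["anthropic"]

[models.my_claude.providers.anthropic]
type = "anthropic"
model_name = "claude-3-5-sonnet-20241022"

[functions.my_function]
type = "chat"

[functions.my_function.variants.default]
type = "chat_completion"
model = "my_claude"
```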
Some providers don't support `tool_choice: required`; in these cases, the TensorZero Gateway will coerce the request to `tool_choice: auto` under the hood.
Currently, Fireworks AI and OpenAI are the only providers that support `parallel_tool_calls`.
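Both settings live in the function configuration. A hedged sketch (the function name, tool name, and schema path are illustrative):

```toml
[functions.plan_trip]
type = "chat"
tools = ["get_weather"]
tool_choice = "required"     # coerced to "auto" on providers without support
parallel_tool_calls = true   # currently honored only by Fireworks AI and OpenAI

[tools.get_weather]
description = "Fetch the current weather for a given city"
parameters = "tools/get_weather.json"  # JSON Schema for the tool's arguments
```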
Additionally, the TensorZero Gateway only supports `strict` JSON mode (commonly referred to as Structured Outputs, Guided Decoding, or similar names) for Azure, GCP Vertex AI Gemini, Google AI Studio, OpenAI, Together AI, vLLM, and xAI.
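In the gateway's configuration, this corresponds to setting `json_mode = "strict"` on a variant of a JSON function. A minimal sketch (the function name, schema path, and model are placeholders; the `openai::` shorthand assumes an OpenAI API key is configured):

```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
json_mode = "strict"  # only effective on the providers listed above
```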
Below are the known limitations for each supported model provider.
- **Anthropic**: The Anthropic API doesn't support `tool_choice: none`. It also doesn't support `seed`.
- **AWS Bedrock**: The TensorZero Gateway can't capture the exact raw response from the AWS SDK, so it stores a re-serialized equivalent in `ModelInference.raw_response` for AWS Bedrock inference requests. The AWS Bedrock API doesn't support `tool_choice: none` or `seed`.
- **Azure OpenAI Service**: The Azure OpenAI Service API doesn't support `tool_choice: required`.
- **DeepSeek**: The `deepseek-chat` model doesn't support tool use for production use cases. The `deepseek-reasoner` model doesn't support JSON mode or tool use. The TensorZero Gateway doesn't yet surface the model's reasoning as `thought` blocks in the response (coming soon!).
- **GCP Vertex AI Gemini and Google AI Studio Gemini**: The Gemini APIs don't support `seed`. They also don't support `tool_choice: required` for Gemini Flash models.
- **Hyperbolic**: The Hyperbolic API doesn't support tool use or JSON mode, so JSON functions only work with `json_mode = "off"` (not recommended).
- **Mistral**: The Mistral API doesn't support `seed`. It also doesn't support `tool_choice` in many cases.
- **Together AI**: Many models on Together don't support JSON mode natively; for those, use `json_mode = "implicit_tool"` (recommended) or `json_mode = "off"`.
- **xAI**: The xAI API doesn't support `tool_choice: none` (bug report).
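For providers with spotty native JSON mode, the `implicit_tool` workaround mentioned above is set per variant, and the gateway then emulates JSON mode through a tool call. A hedged sketch (the variant and model names are illustrative):

```toml
[functions.extract_entities.variants.together_llama]
type = "chat_completion"
model = "together::meta-llama/Llama-3.3-70B-Instruct-Turbo"
json_mode = "implicit_tool"  # emulate JSON mode via a tool call under the hood
```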