`gateway`, `clickhouse`, `models`, `model_providers`, `functions`, `variants`, `tools`, `metrics`, `rate_limiting`, and `object_storage`.
[gateway]
The `[gateway]` section defines the behavior of the TensorZero Gateway.
base_path
- Type: string
- Required: no (default: `/`)
If `base_path` is set to `/custom/prefix`, the inference endpoint will become `/custom/prefix/inference` instead of `/inference`.
bind_address
- Type: string
- Required: no (default: `[::]:3000`)
The gateway binds to `[::]:3000` by default.
Depending on the operating system, this value binds only to IPv6 (e.g. Windows) or to both IPv4 and IPv6 (e.g. Linux by default).
debug
- Type: boolean
- Required: no (default: `false`)
If `true`, the gateway will log more verbose errors to assist with debugging.
disable_pseudonymous_usage_analytics
- Type: boolean
- Required: no (default: `false`)
If `true`, TensorZero will not collect or share pseudonymous usage analytics.
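The fields above could be combined as in the following sketch. The values are illustrative only: `bind_address` repeats the documented default, and the prefix is the one used in the `base_path` description.

```toml
[gateway]
# Serve every endpoint under a prefix, e.g. /custom/prefix/inference
base_path = "/custom/prefix"
# Same as the documented default; listed only to make it explicit
bind_address = "[::]:3000"
# Log more verbose errors while developing
debug = true
# Opt out of pseudonymous usage analytics
disable_pseudonymous_usage_analytics = true
```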
export.otlp.traces.enabled
- Type: boolean
- Required: no (default: `false`)
Relies on the `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` environment variable. See the above-linked guide for details.
export.otlp.traces.extra_headers
- Type: object (map of string to string)
- Required: no (default: `{}`)
export.otlp.traces.format
- Type: either `"opentelemetry"` or `"openinference"`
- Required: no (default: `"opentelemetry"`)
If set to `"opentelemetry"`, TensorZero will set `gen_ai` attributes based on the OpenTelemetry GenAI semantic conventions.
If set to `"openinference"`, TensorZero will set attributes based on the OpenInference semantic conventions.
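As a rough sketch, the trace export fields might sit together like this. The table path assumes these dotted fields nest under `[gateway]`, and the header name and value are placeholders, not real settings.

```toml
[gateway.export.otlp.traces]
enabled = true
# Switch from the default "opentelemetry" conventions
format = "openinference"
# String-to-string map; this header name and value are hypothetical
extra_headers = { "x-example-tenant" = "my-team" }
```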
fetch_and_encode_input_files_before_inference
- Type: boolean
- Required: no (default: `true`)
If set to `true` (default), the gateway will fetch remote input files and send them as a base64-encoded payload in the prompt.
This is recommended to ensure that TensorZero and the model providers see identical inputs, which is important for observability and reproducibility.
If set to `false`, TensorZero will forward the input file URLs directly to the model provider (when supported) and fetch them for observability in parallel with inference.
This can be more efficient, but may result in different content being observed if the URL content changes between when the provider fetches it and when TensorZero fetches it for observability.
observability.async_writes
- Type: boolean
- Required: no (default: `false`)
You can't enable `async_writes` and `batch_writes` at the same time.
observability.batch_writes
- Type: object
- Required: no (default: disabled)
With `batch_writes`, multiple records are collected and written together in batches to improve efficiency.
The `batch_writes` object supports the following fields:
- `enabled` (boolean): Must be set to `true` to enable batch writes
- `flush_interval_ms` (integer, optional): Maximum time in milliseconds to wait before flushing a batch (default: `100`)
- `max_rows` (integer, optional): Maximum number of rows to collect before flushing a batch (default: `1000`)
You can't enable `async_writes` and `batch_writes` at the same time.
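A minimal sketch of batch writes, assuming the `observability.*` fields nest under `[gateway]`. The two tuning values simply restate the documented defaults.

```toml
[gateway.observability]
# Do not also set async_writes = true; the two options are mutually exclusive
batch_writes = { enabled = true, flush_interval_ms = 100, max_rows = 1000 }
```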
observability.enabled
- Type: boolean
- Required: no (default: `null`)
If `true`, the gateway will throw an error on startup if it fails to validate the ClickHouse connection.
If `null`, the gateway will log a warning but continue if ClickHouse is not available, and it will use ClickHouse if available.
If `false`, the gateway will not use ClickHouse.
observability.disable_automatic_migrations
- Type: boolean
- Required: no (default: `false`)
If `true`, migrations are not applied upon launch and must instead be applied manually by running `docker run --rm -e TENSORZERO_CLICKHOUSE_URL=$TENSORZERO_CLICKHOUSE_URL tensorzero/gateway:{version} --run-clickhouse-migrations` or `docker compose run --rm gateway --run-clickhouse-migrations`.
If `false`, migrations are run automatically upon launch.
template_filesystem_access
- Type: object
- Required: no (default: disabled)
Controls file system access for templates that use the `include` directive.
The object has two fields:
- `enabled` (boolean): Determines whether to enable file system access for templates.
- `base_path` (string, optional): Determines the base path for template file system access.
By default, `base_path` will be the directory containing the configuration file.
If you split your configuration into multiple files, you must specify `gateway.template_filesystem_access.base_path` (see Organize your configuration for details).
The `include` paths must be relative to `base_path`, and can only access files in that directory or its sub-directories.
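For illustration, enabling template file system access with an explicit base path might look like the sketch below. The directory is a hypothetical placeholder.

```toml
[gateway.template_filesystem_access]
enabled = true
# Hypothetical directory; `include` paths are resolved relative to it
base_path = "/app/config"
```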
[models.model_name]
The `[models.model_name]` section defines the behavior of a model.
You can define multiple models by including multiple `[models.model_name]` sections.
A model is provider agnostic, and the relevant providers are defined in the `providers` sub-section (see below).
If your `model_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `llama-3.1-8b-instruct` as `[models."llama-3.1-8b-instruct"]`.
routing
- Type: array of strings
- Required: yes
The providers listed here must be defined in the `providers` sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fall back to subsequent providers in order if the request is not successful.
timeouts
- Type: object
- Required: no
The `timeouts` object allows you to set granular timeouts for requests to this model.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, setting `timeouts.non_streaming.total_ms = 15000` and `timeouts.streaming.ttft_ms = 3000` gives a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT).
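Written out as configuration, the 15-second and 3-second timeouts described above could look like this sketch. `model_name` and `provider_name` are the placeholders used throughout this reference.

```toml
[models.model_name]
routing = ["provider_name"]
# 15 s total for non-streaming requests, 3 s to first token when streaming
timeouts = { non_streaming.total_ms = 15000, streaming.ttft_ms = 3000 }
```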
[models.model_name.providers.provider_name]
The `providers` sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple `[models.model_name.providers.provider_name]` sections.
If your `provider_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `vllm.internal` as `[models.model_name.providers."vllm.internal"]`.
extra_body
- Type: array of objects (see below)
- Required: no
The `extra_body` field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- `pointer`: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - `value`: The value to insert at that location; it can be of any type including nested types
  - `delete = true`: Deletes the field at the specified location, if present.
You can also set `extra_body` for a variant entry.
The model provider `extra_body` entries take priority over variant `extra_body` entries.
Additionally, you can set `extra_body` at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
Example: `extra_body`
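The sketch below shows the three kinds of entries described above on a model provider. The pointers `/custom_flag` and `/custom_options` are hypothetical field names chosen for illustration; they are not real provider parameters.

```toml
[models.model_name.providers.provider_name]
type = "openai"
model_name = "gpt-4o"
extra_body = [
  # Insert or overwrite a top-level field (hypothetical field name)
  { pointer = "/custom_flag", value = true },
  # Values can be nested
  { pointer = "/custom_options", value = { mode = "fast", retries = 2 } },
  # Remove a field from the request body, if present
  { pointer = "/temperature", delete = true },
]
```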
extra_headers
- Type: array of objects (see below)
- Required: no
The `extra_headers` field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- `name` (string): The name of the header to modify (e.g. `anthropic-beta`)
- One of the following:
  - `value` (string): The value of the header (e.g. `token-efficient-tools-2025-02-19`)
  - `delete = true`: Deletes the header from the request, if present
You can also set `extra_headers` for a variant entry.
The model provider `extra_headers` entries take priority over variant `extra_headers` entries.
Example: `extra_headers`
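A sketch of both entry kinds, reusing the header name and value quoted in the field description. The `x-unwanted-header` name is a hypothetical placeholder.

```toml
[models.model_name.providers.provider_name]
type = "anthropic"
model_name = "claude-3-7-sonnet-20250219"
extra_headers = [
  # Set or overwrite a header
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
  # Drop a header from the request, if present (hypothetical header name)
  { name = "x-unwanted-header", delete = true },
]
```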
timeouts
- Type: object
- Required: no
The `timeouts` object allows you to set granular timeouts for individual requests to a model provider.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, setting `timeouts.non_streaming.total_ms = 15000` and `timeouts.streaming.ttft_ms = 3000` gives a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT).
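At the provider level, the same two timeouts might be written as in this sketch. The provider type and model name are illustrative.

```toml
[models.model_name.providers.provider_name]
type = "openai"
model_name = "gpt-4o"
# Applies to each individual request sent to this provider
timeouts = { non_streaming.total_ms = 15000, streaming.ttft_ms = 3000 }
```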
… `timeout` field (or simply killing the request if you’re using a different client).
type
- Type: string
- Required: yes
The supported provider types are `anthropic`, `aws_bedrock`, `aws_sagemaker`, `azure`, `deepseek`, `fireworks`, `gcp_vertex_anthropic`, `gcp_vertex_gemini`, `google_ai_studio_gemini`, `groq`, `hyperbolic`, `mistral`, `openai`, `openrouter`, `sglang`, `tgi`, `together`, `vllm`, and `xai`.
The other fields in the provider sub-section depend on the provider type.
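To show how `routing`, `type`, and the provider-specific fields fit together, here is a sketch of a model that tries OpenAI first and falls back to Azure. The deployment ID and the environment variable name are hypothetical.

```toml
[models.gpt-4o]
# Try `openai` first, then fall back to `azure`
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
type = "openai"
model_name = "gpt-4o"

[models.gpt-4o.providers.azure]
type = "azure"
deployment_id = "my-gpt-4o-deployment"   # hypothetical deployment
endpoint = "env::AZURE_OPENAI_ENDPOINT"  # hypothetical environment variable name
```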
type: "anthropic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::ANTHROPIC_API_KEY` unless set otherwise in `provider_types.anthropic.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "aws_bedrock"
allow_auto_detect_region
- Type: boolean
- Required: no (default: `false`)
Alternatively, specify the `region` field (recommended).
model_id
- Type: string
- Required: yes
The `model_id` may require a special prefix (e.g. the `us.` prefix in `us.anthropic.claude-3-7-sonnet-20250219-v1:0`).
See the AWS documentation on inference profiles.
region
- Type: string
- Required: no (default: based on credentials if set, otherwise `us-east-1`)
type: "aws_sagemaker"
allow_auto_detect_region
- Type: boolean
- Required: no (default: `false`)
Alternatively, specify the `region` field (recommended).
endpoint_name
- Type: string
- Required: yes
hosted_provider
- Type: string
- Required: yes
The `aws_sagemaker` provider is a wrapper on other providers.
Currently, the only supported `hosted_provider` options are:
- `openai` (including any OpenAI-compatible server e.g. Ollama)
- `tgi`
model_name
- Type: string
- Required: yes
region
- Type: string
- Required: no (default: based on credentials if set, otherwise `us-east-1`)
type: "azure"
The gateway handles the API version (`2025-04-01-preview`).
You only need to set the `deployment_id` and `endpoint` fields.
deployment_id
- Type: string
- Required: yes
endpoint
- Type: string
- Required: yes
If the endpoint starts with `env::`, the succeeding value will be treated as an environment variable name and the gateway will attempt to retrieve the value from the environment on startup.
If the endpoint starts with `dynamic::`, the succeeding value will be treated as a dynamic credential name and the gateway will attempt to retrieve the value from the `dynamic_credentials` field on each inference where it is needed.
api_key_location
- Type: string
- Required: no (default: `env::AZURE_OPENAI_API_KEY` unless set otherwise in `provider_types.azure.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "deepseek"
model_name
- Type: string
- Required: yes
Examples: `deepseek-chat` (DeepSeek-v3) and `deepseek-reasoner` (R1).
api_key_location
- Type: string
- Required: no (default: `env::DEEPSEEK_API_KEY` unless set otherwise in `provider_types.deepseek.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "fireworks"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::FIREWORKS_API_KEY` unless set otherwise in `provider_types.fireworks.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "gcp_vertex_anthropic"
endpoint_id
- Type: string
- Required: no (exactly one of `endpoint_id` or `model_id` must be set)
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of `model_id` or `endpoint_id` must be set)
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.
project_id
- Type: string
- Required: yes
credential_location
- Type: string
- Required: no (default: `env::GCP_CREDENTIALS_PATH` unless otherwise set in `provider_types.gcp_vertex_anthropic.defaults.credential_location`)
The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `dynamic::CREDENTIALS_ARGUMENT_NAME` (see the API reference for more details), and `file::PATH_TO_CREDENTIALS_FILE`.
type: "gcp_vertex_gemini"
endpoint_id
- Type: string
- Required: no (exactly one of `endpoint_id` or `model_id` must be set)
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of `model_id` or `endpoint_id` must be set)
project_id
- Type: string
- Required: yes
credential_location
- Type: string
- Required: no (default: `env::GCP_CREDENTIALS_PATH` unless otherwise set in `provider_types.gcp_vertex_gemini.defaults.credential_location`)
The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `dynamic::CREDENTIALS_ARGUMENT_NAME` (see the API reference for more details), and `file::PATH_TO_CREDENTIALS_FILE`.
type: "google_ai_studio_gemini"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::GOOGLE_AI_STUDIO_API_KEY` unless otherwise set in `provider_types.google_ai_studio.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "groq"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::GROQ_API_KEY` unless otherwise set in `provider_types.groq.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "hyperbolic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::HYPERBOLIC_API_KEY` unless otherwise set in `provider_types.hyperbolic.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "mistral"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::MISTRAL_API_KEY` unless otherwise set in `provider_types.mistral.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "openai"
api_base
- Type: string
- Required: no (default: `https://api.openai.com/v1/`)
Set the `api_base` field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
api_key_location
- Type: string
- Required: no (default: `env::OPENAI_API_KEY` unless otherwise set in `provider_types.openai.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).
api_type
- Type: string
- Required: no (default: `chat_completions`)
Set to `chat_completions` for the standard Chat Completions API.
Set to `responses` to use the Responses API, which provides access to built-in tools like web search and reasoning capabilities.
include_encrypted_reasoning
- Type: boolean
- Required: no (default: `false`)
Only available when `api_type = "responses"`.
model_name
- Type: string
- Required: yes
provider_tools
- Type: array of objects
- Required: no (default: `[]`)
Only available when `api_type = "responses"`.
type: "openrouter"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::OPENROUTER_API_KEY` unless otherwise set in `provider_types.openrouter.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "sglang"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `none`)
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).
type: "together"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::TOGETHER_API_KEY` unless otherwise set in `provider_types.together.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "vllm"
api_base
- Type: string
- Required: yes (default: `http://localhost:8000/v1/`)
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::VLLM_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).
type: "xai"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::XAI_API_KEY` unless otherwise set in `provider_types.xai.defaults.api_key_location`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
type: "tgi"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `none`)
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).
[embedding_models.model_name]
The `[embedding_models.model_name]` section defines the behavior of an embedding model.
You can define multiple models by including multiple `[embedding_models.model_name]` sections.
A model is provider agnostic, and the relevant providers are defined in the `providers` sub-section (see below).
If your `model_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `embedding-0.1` as `[embedding_models."embedding-0.1"]`.
routing
- Type: array of strings
- Required: yes
The providers listed here must be defined in the `providers` sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fall back to subsequent providers in order if the request is not successful.
timeout_ms
- Type: integer
- Required: no
[embedding_models.model_name.providers.provider_name]
The `providers` sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple `[embedding_models.model_name.providers.provider_name]` sections.
If your `provider_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `vllm.internal` as `[embedding_models.model_name.providers."vllm.internal"]`.
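Putting the embedding model section and its provider sub-section together, a minimal sketch might look like this. The quoted model key reuses the `embedding-0.1` example above; the provider name, upstream model name, and timeout value are illustrative.

```toml
[embedding_models."embedding-0.1"]
routing = ["openai"]
timeout_ms = 5000  # illustrative value

[embedding_models."embedding-0.1".providers.openai]
type = "openai"
model_name = "text-embedding-3-small"
```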
extra_body
- Type: array of objects (see below)
- Required: no
The `extra_body` field allows you to modify the request body that TensorZero sends to the embedding model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- `pointer`: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - `value`: The value to insert at that location; it can be of any type including nested types
  - `delete = true`: Deletes the field at the specified location, if present.
You can also set `extra_body` at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
timeout_ms
- Type: integer
- Required: no
type
- Type: string
- Required: yes
type: "openai"
api_base
- Type: string
- Required: no (default: `https://api.openai.com/v1/`)
Set the `api_base` field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: `env::OPENAI_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).
[provider_types]
The `provider_types` section of the configuration allows users to specify global settings that are related to the handling of a particular inference provider type (like `"openai"` or `"anthropic"`), such as where to look by default for credentials.
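For instance, changing where every `openai` provider looks for its API key by default could be sketched as follows. The environment variable name is hypothetical.

```toml
[provider_types.openai.defaults]
# Hypothetical variable name; individual providers can still override it
# with their own api_key_location
api_key_location = "env::MY_TEAM_OPENAI_API_KEY"
```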
[provider_types.anthropic]
defaults.api_key_location
- Type: string
- Required: no (default: `env::ANTHROPIC_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.azure]
defaults.api_key_location
- Type: string
- Required: no (default: `env::AZURE_OPENAI_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.deepseek]
defaults.api_key_location
- Type: string
- Required: no (default: `env::DEEPSEEK_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.fireworks]
defaults.api_key_location
- Type: string
- Required: no (default: `env::FIREWORKS_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.gcp_vertex_anthropic]
defaults.credential_location
- Type: string
- Required: no (default: `env::GCP_CREDENTIALS_PATH`)
The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `dynamic::CREDENTIALS_ARGUMENT_NAME` (see the API reference for more details), and `file::PATH_TO_CREDENTIALS_FILE`.
[provider_types.gcp_vertex_gemini]
batch
- Type: object
- Required: no (default: `null`)
The `batch` object allows you to configure batch processing for GCP Vertex models.
Today we support batch inference through GCP Vertex using Google Cloud Storage as documented here.
To do this you must also have `object_storage` (see the `object_storage` section) configured using GCP.
The `batch` object supports the following configuration:
storage_type
- Type: string
- Required: no (default: `"none"`)
The values `"cloud_storage"` and `"none"` are supported.
input_uri_prefix
- Type: string
- Required: yes when `storage_type` is `"cloud_storage"`
output_uri_prefix
- Type: string
- Required: yes when `storage_type` is `"cloud_storage"`
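A sketch of the batch settings with Cloud Storage enabled. The bucket name and prefixes are hypothetical, and the `object_storage` section it depends on is not shown.

```toml
[provider_types.gcp_vertex_gemini.batch]
storage_type = "cloud_storage"
# Both prefixes are required once storage_type is "cloud_storage"
input_uri_prefix = "gs://my-bucket/batch-inputs/"    # hypothetical bucket
output_uri_prefix = "gs://my-bucket/batch-outputs/"  # hypothetical bucket
```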
defaults.credential_location
- Type: string
- Required: no (default: `env::GCP_CREDENTIALS_PATH`)
The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `dynamic::CREDENTIALS_ARGUMENT_NAME` (see the API reference for more details), and `file::PATH_TO_CREDENTIALS_FILE`.
[provider_types.google_ai_studio]
defaults.api_key_location
- Type: string
- Required: no (default: `env::GOOGLE_AI_STUDIO_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.groq]
defaults.api_key_location
- Type: string
- Required: no (default: `env::GROQ_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.hyperbolic]
defaults.api_key_location
- Type: string
- Required: no (default: `env::HYPERBOLIC_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.mistral]
defaults.api_key_location
- Type: string
- Required: no (default: `env::MISTRAL_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.openai]
defaults.api_key_location
- Type: string
- Required: no (default: `env::OPENAI_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.openrouter]
defaults.api_key_location
- Type: string
- Required: no (default: `env::OPENROUTER_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.together]
defaults.api_key_location
- Type: string
- Required: no (default: `env::TOGETHER_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[provider_types.xai]
defaults.api_key_location
- Type: string
- Required: no (default: `env::XAI_API_KEY`)
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).
[functions.function_name]
The `[functions.function_name]` section defines the behavior of a function.
You can define multiple functions by including multiple `[functions.function_name]` sections.
A function can have multiple variants, and each variant is defined in the `variants` sub-section (see below).
A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).
If your `function_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `summarize-2.0` as `[functions."summarize-2.0"]`.
assistant_schema
- Type: string (path)
- Required: no
description
- Type: string
- Required: no
system_schema
- Type: string (path)
- Required: no
type
- Type: string
- Required: yes
The supported function types are `chat` and `json`.
Most other fields in the function section depend on the function type.
type: "chat"
parallel_tool_calls
- Type: boolean
- Required: no
tool_choice
- Type: string
- Required: no (default: `auto`)
The supported values are:
- `none`: The function should not use any tools.
- `auto`: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- `required`: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- `{ specific = "tool_name" }`: The model should use a specific tool. The tool must be defined in the `tools` field (see below).
tools
- Type: array of strings
- Required: no (default: `[]`)
The tools are defined in `[tools.tool_name]` sections (see below).
type: "json"
output_schema
- Type: string (path)
- Required: no (default: `{}`, the empty JSON schema that accepts any valid JSON output)
user_schema
- Type: string (path)
- Required: no
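The two function types could be declared as in the sketch below. The function names, the tool name, and all file paths are hypothetical, and the referenced `[tools.lookup_contact]` section is assumed to exist elsewhere in the configuration.

```toml
# A chat function that may call one tool
[functions.draft_email]
type = "chat"
description = "Drafts an email from a short brief"
system_schema = "functions/draft_email/system_schema.json"
tools = ["lookup_contact"]  # must match a [tools.lookup_contact] section
tool_choice = "auto"
parallel_tool_calls = false

# A JSON function with a fixed output schema
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"
```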
[functions.function_name.variants.variant_name]
The `variants` sub-section defines the behavior of a specific variant of a function.
You can define multiple variants by including multiple `[functions.function_name.variants.variant_name]` sections.
If your `variant_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `llama-3.1-8b-instruct` as `[functions.function_name.variants."llama-3.1-8b-instruct"]`.
type
- Type: string
- Required: yes
Type | Description |
---|---|
chat_completion | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
experimental_best_of_n | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
experimental_chain_of_thought | Encourages the model to reason step by step using a chain-of-thought prompting strategy, which is particularly useful for tasks requiring logical reasoning or multi-step problem-solving. Only available for non-streaming requests to JSON functions. |
experimental_dynamic_in_context_learning | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
experimental_mixture_of_n | Generates multiple response candidates with other variants, and combines the responses using a fuser model. |
type: "chat_completion"
assistant_template
- Type: string (path)
- Required: no
Template variables are defined by the function's `assistant_schema` field.
extra_body
- Type: array of objects (see below)
- Required: no
The `extra_body` field allows you to modify the request body that TensorZero sends to a variant’s model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- `pointer`: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - `value`: The value to insert at that location; it can be of any type including nested types
  - `delete = true`: Deletes the field at the specified location, if present.
You can also set `extra_body` for a model provider entry.
The model provider `extra_body` entries take priority over variant `extra_body` entries.
Additionally, you can set `extra_body` at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
Example: `extra_body`
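At the variant level, an `extra_body` entry might be sketched like this. The pointer `/custom_flag` is a hypothetical field name; a model provider entry with the same pointer would take priority over it.

```toml
[functions.function_name.variants.variant_name]
type = "chat_completion"
model = "openai::gpt-4o"
extra_body = [
  # Hypothetical field; overridden by a model provider entry with the same pointer
  { pointer = "/custom_flag", value = true },
]
```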
extra_headers
- Type: array of objects (see below)
- Required: no
The `extra_headers` field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- `name` (string): The name of the header to modify (e.g. `anthropic-beta`)
- One of the following:
  - `value` (string): The value of the header (e.g. `token-efficient-tools-2025-02-19`)
  - `delete = true`: Deletes the header from the request, if present
You can also set `extra_headers` for a model provider entry.
The model provider `extra_headers` entries take priority over variant `extra_headers` entries.
Example: `extra_headers`
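A variant-level sketch that reuses the header quoted in the field description. The model shorthand is illustrative.

```toml
[functions.function_name.variants.variant_name]
type = "chat_completion"
model = "anthropic::claude-3-7-sonnet-20250219"
extra_headers = [
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
]
```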
frequency_penalty
- Type: float
- Required: no (default: `null`)
json_mode
- Type: string
- Required: yes for `json` functions, forbidden for `chat` functions
The supported modes are:
- `off`: Make a chat completion request without any special JSON handling (not recommended).
- `on`: Make a chat completion request with JSON mode (if supported by the provider).
- `strict`: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- `implicit_tool`: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
max_tokens
- Type: integer
- Required: no (default: `null`)
model
- Type: string
- Required: yes
To call… | Use this format… |
---|---|
A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `model_name="my_model"` |
A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `model_name="{provider_type}::{model_name}"` |

Short-hand model names are supported for the following provider types: `anthropic`, `deepseek`, `fireworks`, `google_ai_studio_gemini`, `gcp_vertex_gemini`, `gcp_vertex_anthropic`, `hyperbolic`, `groq`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.
For example, if your configuration defines a `gpt-4o` model with `openai` and `azure` providers:
- `model = "gpt-4o"` calls the `gpt-4o` model in your configuration, which supports fallback from `openai` to `azure`. See Retries & Fallbacks for details.
- `model = "openai::gpt-4o"` calls the OpenAI API directly for the `gpt-4o` model, ignoring the `gpt-4o` model defined in your configuration.
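The two ways of naming a model can be put side by side in a sketch. The function and variant names are hypothetical, and the first variant assumes a `[models.gpt-4o]` section exists in the same configuration.

```toml
# Uses the gpt-4o model defined under [models.gpt-4o], including its routing and fallbacks
[functions.draft_email.variants.via_config]
type = "chat_completion"
model = "gpt-4o"

# Short-hand form: calls the OpenAI provider directly, bypassing any [models] entry
[functions.draft_email.variants.direct]
type = "chat_completion"
model = "openai::gpt-4o"
```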
presence_penalty
- Type: float
- Required: no (default: `null`)
retries
- Type: object with optional keys `num_retries` and `max_delay_s`
- Required: no (defaults to `num_retries = 0` and `max_delay_s = 10`)
The `num_retries` parameter defines the number of retries (not including the initial request).
The `max_delay_s` parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default: `null`)
system_template
- Type: string (path)
- Required: no
Template variables are defined by the function's `system_schema` field.
temperature
- Type: float
- Required: no (default: `null`)
timeouts
- Type: object
- Required: no
The `timeouts` object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, setting `timeouts.non_streaming.total_ms = 15000` and `timeouts.streaming.ttft_ms = 3000` gives a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT).
top_p
- Type: float, between 0 and 1
- Required: no (default:
null
)
top_p
to use for the variant during nucleus sampling.
Typically at most one of top_p
and temperature
is set.user_template
- Type: string (path)
- Required: no
If the template uses variables, they should be defined in the function’s user_schema field.
weight
- Type: float
- Required: no (default: 0)
Variants are sampled in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
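The weighting rules above can be sketched as a configuration fragment. The function name, variant names, and model choices are hypothetical, and the fragment assumes variants are declared under [functions.<function>.variants.<variant>]; only the weight values illustrate the rule.

```toml
[functions.my_function]
type = "chat"

[functions.my_function.variants.variant_a]
type = "chat_completion"
model = "openai::gpt-4o"
weight = 1.0 # sampled with probability 1.0 / (1.0 + 3.0) = 25%

[functions.my_function.variants.variant_b]
type = "chat_completion"
model = "openai::gpt-4o"
weight = 3.0 # sampled with probability 3.0 / (1.0 + 3.0) = 75%

[functions.my_function.variants.fallback]
type = "chat_completion"
model = "openai::gpt-4o-mini"
weight = 0 # not sampled; used only as a fallback or when requested via variant_name
```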
type: "experimental_best_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set candidates = ["promptA", "promptA", "promptB"] to generate two responses using promptA and one using promptB.
The evaluator would then choose the best response from these three candidates.
evaluator
- Type: object
- Required: yes
The evaluator parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates.
The evaluator is configured similarly to a chat_completion variant for a JSON function, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to an evaluator.
The evaluator can optionally include a json_mode parameter (see the json_mode documentation under chat_completion variants). If not specified, it defaults to strict.
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, setting timeouts.non_streaming.total_ms = 15000 and timeouts.streaming.ttft_ms = 3000 gives a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT).
weight
- Type: float
- Required: no (default: 0)
Variants are sampled in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
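Putting the experimental_best_of_n fields together, a variant of this type might be configured roughly as follows. This is a sketch under assumptions: the function and variant names are hypothetical, prompt templates are omitted, and the evaluator is written as a nested table that takes chat_completion-style fields without type, as described above.

```toml
[functions.my_function]
type = "chat"

# Candidate variants; weight = 0 so they are only used through best-of-n.
[functions.my_function.variants.promptA]
type = "chat_completion"
model = "openai::gpt-4o"
weight = 0

[functions.my_function.variants.promptB]
type = "chat_completion"
model = "openai::gpt-4o"
weight = 0

[functions.my_function.variants.best_of_n]
type = "experimental_best_of_n"
candidates = ["promptA", "promptA", "promptB"] # two responses from promptA, one from promptB
timeout_s = 60                                 # drop candidates slower than 60 seconds
weight = 1.0

# Evaluator: configured like a chat_completion variant, but without the type field.
[functions.my_function.variants.best_of_n.evaluator]
model = "openai::gpt-4o"
json_mode = "strict"
```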
type: "experimental_chain_of_thought"
The experimental_chain_of_thought variant type uses the same configuration as a chat_completion variant.
type: "experimental_mixture_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set candidates = ["promptA", "promptA", "promptB"] to generate two responses using promptA and one using promptB.
The fuser would then combine the three responses.
fuser
- Type: object
- Required: yes for json functions, forbidden for chat functions
The fuser parameter specifies the configuration for the model that will evaluate and combine the elements.
The fuser is configured similarly to a chat_completion variant, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to a fuser.
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, setting timeouts.non_streaming.total_ms = 15000 and timeouts.streaming.ttft_ms = 3000 gives a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT).
weight
- Type: float
- Required: no (default: 0)
Variants are sampled in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_dynamic_in_context_learning"
embedding_model
- Type: string
- Required: yes
| To call… | Use this format… |
| --- | --- |
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model_name="my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model_name="{provider_type}::{model_name}" |
The following provider types support the shorthand format: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
For example:
- embedding_model = "text-embedding-3-small" calls the text-embedding-3-small model in your configuration.
- embedding_model = "openai::text-embedding-3-small" calls the OpenAI API directly for the text-embedding-3-small model, ignoring the text-embedding-3-small model defined in your configuration.
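The first case above assumes that a text-embedding-3-small model is defined in your configuration. A minimal sketch of such a definition follows, assuming embedding models are declared under [embedding_models.<name>] with the same routing and providers layout as [models]. That section is not shown in this part of the reference, so treat the table names as an assumption. The function and variant names are hypothetical.

```toml
# Sketch only: an embedding model served by a single OpenAI provider.
[embedding_models.text-embedding-3-small]
routing = ["openai"]

[embedding_models.text-embedding-3-small.providers.openai]
type = "openai"
model_name = "text-embedding-3-small"

[functions.my_function.variants.dicl]
type = "experimental_dynamic_in_context_learning"
embedding_model = "text-embedding-3-small"            # configured embedding model
# embedding_model = "openai::text-embedding-3-small"  # shorthand, calls OpenAI directly
model = "openai::gpt-4o"
k = 5
```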
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant’s model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
For experimental_dynamic_in_context_learning variants, extra_body only applies to the chat completion request.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present
You can also set extra_body for a model provider entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
Example: `extra_body`
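A minimal sketch of a variant-level extra_body, following the pointer / value / delete structure described above. The function and variant names and the request-body fields (/some_provider_option, /nested/setting, /unwanted_field) are hypothetical and do not correspond to any real provider API.

```toml
[functions.my_function.variants.my_variant]
type = "experimental_dynamic_in_context_learning"
# ... other variant fields ...
extra_body = [
  # Insert a scalar value at the top level of the request body.
  { pointer = "/some_provider_option", value = true },
  # Insert a nested value; values can be of any type.
  { pointer = "/nested/setting", value = { mode = "fast", level = 2 } },
  # Remove a field from the request body, if present.
  { pointer = "/unwanted_field", delete = true },
]
```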
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a model provider entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
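A minimal sketch of a variant-level extra_headers, following the name / value / delete structure described above. The function and variant names and the x-unwanted-header name are hypothetical; the anthropic-beta header and its value are the ones mentioned in the field description.

```toml
[functions.my_function.variants.my_variant]
type = "experimental_dynamic_in_context_learning"
# ... other variant fields ...
extra_headers = [
  # Set (or overwrite) a header on the request sent to the model provider.
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
  # Remove a header from the request, if present.
  { name = "x-unwanted-header", delete = true },
]
```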
json_mode
- Type: string
- Required: yes for json functions, forbidden for chat functions
Supported values:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
k
- Type: non-negative integer
- Required: yes
max_distance
- Type: non-negative float
- Required: no (default: none)
max_tokens
- Type: integer
- Required: no (default: null)
model
- Type: string
- Required: yes
| To call… | Use this format… |
| --- | --- |
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model_name="my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model_name="{provider_type}::{model_name}" |
The following provider types support the shorthand format: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
For example:
- model = "gpt-4o" calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
- model = "openai::gpt-4o" calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined in your configuration.
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default: null)
system_instructions
- Type: string (path)
- Required: no
Unlike system_template, it doesn’t support variables.
This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.
temperature
- Type: float
- Required: no (default: null)
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, setting timeouts.non_streaming.total_ms = 15000 and timeouts.streaming.ttft_ms = 3000 gives a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT).
weight
- Type: float
- Required: no (default: 0)
Variants are sampled in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_chain_of_thought"
Besides the type parameter, this variant has the same configuration options as the chat_completion variant type.
Please refer to that documentation to see what options are available.
[metrics]
The [metrics] section defines the behavior of a metric.
You can define multiple metrics by including multiple [metrics.metric_name] sections.
The metric name can’t be comment or demonstration, as those names are reserved for internal use.
If your metric_name is not a valid TOML bare key, it can be escaped with quotation marks.
For example, periods are not allowed in bare keys, so you can define beats-gpt-4.1 as [metrics."beats-gpt-4.1"].
level
- Type: string
- Required: yes
Supported values: inference and episode.
optimize
- Type: string
- Required: yes
Supported values: max and min.
type
- Type: string
- Required: yes
Supported values: boolean and float.
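The three required fields can be combined as in the sketch below. The metric name task_success is hypothetical; beats-gpt-4.1 is the quoted-name example from the text above, and the values chosen for each metric are illustrative.

```toml
# A boolean metric attached to individual inferences, where higher is better.
[metrics.task_success]
type = "boolean"
optimize = "max"
level = "inference"

# A float metric attached to whole episodes; the name contains a period, so it is quoted.
[metrics."beats-gpt-4.1"]
type = "float"
optimize = "max"
level = "episode"
```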
[tools.tool_name]
The [tools.tool_name] section defines the behavior of a tool.
You can define multiple tools by including multiple [tools.tool_name] sections.
If your tool_name is not a valid TOML bare key, it can be escaped with quotation marks.
For example, periods are not allowed in bare keys, so you can define run-python-3.10 as [tools."run-python-3.10"].
You can enable a tool for a function by adding it to the function’s tools field.
description
- Type: string
- Required: yes
parameters
- Type: string (path)
- Required: yes
strict
- Type: boolean
- Required: no (default: false)
If set to true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters.
This typically improves the quality of responses.
Only a few providers support strict JSON generation.
For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
If the provider does not support strict mode, the TensorZero Gateway ignores this field.
name
- Type: string
- Required: no (defaults to the tool ID)
If you define [tools.my_tool] but don’t specify the name, the name will be my_tool.
This field allows you to specify a different name to be sent.
This field is particularly useful if you want to define multiple tools that share the same name (e.g. for different functions).
At inference time, the gateway ensures that an inference request doesn’t have multiple tools with the same name.
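As an illustration of these fields, the sketch below defines two tools that are sent under the same name and enables each for a different function. All tool IDs, function names, descriptions, and file paths are hypothetical, and the parameters files are assumed to contain the JSON Schema for each tool's arguments.

```toml
[tools.get_temperature_celsius]
description = "Get the current temperature in Celsius for a given location."
parameters = "tools/get_temperature_celsius.json" # hypothetical path
strict = true
name = "get_temperature" # name sent instead of the tool ID

[tools.get_temperature_fahrenheit]
description = "Get the current temperature in Fahrenheit for a given location."
parameters = "tools/get_temperature_fahrenheit.json" # hypothetical path
name = "get_temperature"

# Each function enables only one of the two, so no request carries duplicate names.
[functions.weather_eu]
type = "chat"
tools = ["get_temperature_celsius"]

[functions.weather_us]
type = "chat"
tools = ["get_temperature_fahrenheit"]
```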
[object_storage]
The [object_storage] section defines the behavior of object storage, which is used for storing images used during multimodal inference.
type
- Type: string
- Required: yes
Supported values:
- s3_compatible: Use an S3-compatible object storage service.
- filesystem: Store images in a local directory.
- disabled: Disable object storage.
type: "s3_compatible"
If you set type = "s3_compatible", TensorZero will use an S3-compatible object storage service to store and retrieve images.
The TensorZero Gateway will attempt to retrieve credentials from the following resources in order of priority:
- S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
- Credentials from the AWS SDK (default profile)
If you set type = "s3_compatible", the following fields are available.
endpoint
- Type: string
- Required: no (defaults to AWS S3)
bucket_name
- Type: string
- Required: no
… endpoint field.
region
- Type: string
- Required: no
allow_http
- Type: boolean
- Required: no (defaults to false)
If set to true, the TensorZero Gateway will instead use HTTP to access the object storage service.
This is useful for local development (e.g. a local MinIO deployment), but not recommended for production environments.
In production, disable the allow_http setting and use a secure method of authentication in combination with a production-grade object storage service.
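For local development, the s3_compatible fields above might be combined as in this sketch. The endpoint, bucket name, and region are placeholders for a local MinIO deployment, and credentials are assumed to come from the environment variables listed above.

```toml
[object_storage]
type = "s3_compatible"
endpoint = "http://localhost:9000" # placeholder: local MinIO address
bucket_name = "tensorzero"         # placeholder
region = "us-east-1"               # placeholder
allow_http = true                  # local development only; leave unset (false) in production
```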
type: "filesystem"
path
- Type: string
- Required: yes
type: "disabled"
If you set type = "disabled", the TensorZero Gateway will not store or retrieve images.
There are no additional fields available for this type.
[rate_limiting]
The [rate_limiting] section allows you to configure granular rate limits for your TensorZero Gateway.
Rate limits help you control usage, manage costs, and prevent abuse.
See Enforce Custom Rate Limits for a comprehensive guide on rate limiting.
enabled
- Type: boolean
- Required: no (default: true)
If set to false, rate limiting rules will not be enforced even if they are defined.
[[rate_limiting.rules]]
Rate limiting rules are defined as an array of rule configurations.
Each rule specifies rate limits for specific resources (model inferences, tokens), time windows, scopes, and priorities.
Rate Limit Fields
You can set rate limits for different resources and time windows using the following field formats:
- model_inferences_per_second
- model_inferences_per_minute
- model_inferences_per_hour
- model_inferences_per_day
- model_inferences_per_week
- model_inferences_per_month
- tokens_per_second
- tokens_per_minute
- tokens_per_hour
- tokens_per_day
- tokens_per_week
- tokens_per_month
Each limit can be given either as a single number (simple format) or with capacity and refill_rate fields (bucket format) for fine-grained control over the token bucket algorithm.
The simple format sets capacity and refill_rate to the same value.
The bucket format allows you to configure burst capacity independently from the sustained rate.
priority
- Type: integer
- Required: yes (unless always is set to true)
always
- Type: boolean
- Required: no (mutually exclusive with priority)
If set to true, this rule will always be applied regardless of priority.
This is useful for global fallback limits.
You cannot specify both always and priority in the same rule.
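The rule fields described so far can be sketched as follows. The numeric limits are arbitrary, and writing the bucket format as an inline table with capacity and refill_rate keys is an assumption about the syntax; see Enforce Custom Rate Limits for the authoritative form.

```toml
[rate_limiting]
enabled = true

# Global fallback limit: always applied, regardless of priority.
[[rate_limiting.rules]]
always = true
model_inferences_per_day = 100000 # simple format: capacity and refill_rate are both 100000

# A prioritized rule that uses the bucket format.
[[rate_limiting.rules]]
priority = 1
# Assumed syntax: allow bursts of up to 2000 tokens, refilled at 1000 tokens per minute.
tokens_per_minute = { capacity = 2000, refill_rate = 1000 }
```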
scope
- Type: array of scope objects
- Required: no (default: [])
- Tags:
  - tag_key (string): The tag key to match against.
  - tag_value (string): The tag value to match against. This can be:
    - tensorzero::each: Apply the limit separately to each unique value of the tag.
    - tensorzero::total: Apply the limit to the aggregate of all requests with this tag, regardless of the tag’s value.
    - Any other string: Apply the limit only when the tag has this specific value.