The configuration file is organized into the following sections: gateway, clickhouse, models, model_providers, functions, variants, tools, and metrics.
[gateway]
The [gateway]
section defines the behavior of the TensorZero Gateway.
base_path
- Type: string
- Required: no (default:
/
)
The base_path field defines a prefix for every gateway endpoint. For example, if base_path is set to /custom/prefix, the inference endpoint will become /custom/prefix/inference instead of /inference.
bind_address
- Type: string
- Required: no (default:
[::]:3000
)
The bind_address field defines the socket address the gateway listens on. By default, the gateway binds to [::]:3000.
Depending on the operating system, this value binds only to IPv6 (e.g. Windows) or to both IPv4 and IPv6 (e.g. Linux by default).
debug
- Type: boolean
- Required: no (default:
false
)
If true, the gateway will log more verbose errors to assist with debugging.
disable_pseudonymous_usage_analytics
- Type: boolean
- Required: no (default:
false
)
If true, TensorZero will not collect or share pseudonymous usage analytics.
enable_template_filesystem_access
- Type: boolean
- Required: no (default:
false
)
If true, templates are allowed to access other files on the filesystem (e.g. via the include directive).
Paths must be relative to tensorzero.toml, and can only access files in that directory or its sub-directories.
export.otlp.traces.enabled
- Type: boolean
- Required: no (default:
false
)
If true, the gateway exports OpenTelemetry traces to the endpoint specified by the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable. See the guide linked above for details.
observability.async_writes
- Type: boolean
- Required: no (default:
true
)
observability.enabled
- Type: boolean
- Required: no (default:
null
)
If true, the gateway will throw an error on startup if it fails to validate the ClickHouse connection.
If null, the gateway will log a warning but continue if ClickHouse is not available, and it will use ClickHouse if available.
If false, the gateway will not use ClickHouse.
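Putting a few of these fields together, a [gateway] section might look like this (a sketch; the values are illustrative, not recommendations):

```toml
[gateway]
bind_address = "0.0.0.0:3000"  # listen on all IPv4 interfaces
debug = true                   # verbose errors for local development
observability.enabled = true   # fail on startup if ClickHouse is unreachable
```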
[models.model_name]
The [models.model_name]
section defines the behavior of a model.
You can define multiple models by including multiple [models.model_name]
sections.
A model is provider agnostic, and the relevant providers are defined in the providers
sub-section (see below).
If your model_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct
as [models."llama-3.1-8b-instruct"]
.
routing
- Type: array of strings
- Required: yes
The routing field defines a list of provider names for the model. Every entry must correspond to a provider defined in the providers sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fallback to subsequent providers in order if the request is not successful.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests to this model.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms
corresponds to the total request duration and timeouts.streaming.ttft_ms
corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
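A sketch of such a configuration (the nested field layout is assumed from the field names above):

```toml
[models.model_name.timeouts]
non_streaming.total_ms = 15000  # 15 seconds
streaming.ttft_ms = 3000        # 3 seconds
```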
[models.model_name.providers.provider_name]
The providers
sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple [models.model_name.providers.provider_name]
sections.
If your provider_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define vllm.internal
as [models.model_name.providers."vllm.internal"]
.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present
You can also set extra_body for a variant entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference time.
The values provided at inference time take priority over the values in the configuration file.
Example: `extra_body`
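A sketch of what this looks like in practice (the pointer targets and values below are hypothetical, not provider-specific recommendations):

```toml
[[models.model_name.providers.provider_name.extra_body]]
pointer = "/generationConfig/topK"  # hypothetical provider-specific field
value = 40

[[models.model_name.providers.provider_name.extra_body]]
pointer = "/safetySettings"  # hypothetical field to strip from the request
delete = true
```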
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a variant entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
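A sketch using the header from the example values above:

```toml
[[models.model_name.providers.provider_name.extra_headers]]
name = "anthropic-beta"
value = "token-efficient-tools-2025-02-19"
```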
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for individual requests to a model provider.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms
corresponds to the total request duration and timeouts.streaming.ttft_ms
corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
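A sketch (same assumed field layout as the model-level timeouts example above):

```toml
[models.model_name.providers.provider_name.timeouts]
non_streaming.total_ms = 15000
streaming.ttft_ms = 3000
```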
You can also enforce a timeout on the client side by setting the client's timeout field (or simply killing the request if you're using a different client).
type
- Type: string
- Required: yes
The supported provider types are anthropic, aws_bedrock, aws_sagemaker, azure, deepseek, fireworks, gcp_vertex_anthropic, gcp_vertex_gemini, google_ai_studio_gemini, groq, hyperbolic, mistral, openai, openrouter, sglang, tgi, together, vllm, and xai.
The other fields in the provider sub-section depend on the provider type.
type: "anthropic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::ANTHROPIC_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "aws_bedrock"
allow_auto_detect_region
- Type: boolean
- Required: no (default:
false
)
If true, the gateway is allowed to auto-detect the AWS region. Alternatively, you can specify the region explicitly with the region field (recommended).
model_id
- Type: string
- Required: yes
Note that some models require a model_id with a special prefix (e.g. the us. prefix in us.anthropic.claude-3-7-sonnet-20250219-v1:0).
See the AWS documentation on inference profiles.
region
- Type: string
- Required: no (default: based on credentials if set, otherwise
us-east-1
)
type: "aws_sagemaker"
allow_auto_detect_region
- Type: boolean
- Required: no (default:
false
)
If true, the gateway is allowed to auto-detect the AWS region. Alternatively, you can specify the region explicitly with the region field (recommended).
endpoint_name
- Type: string
- Required: yes
hosted_provider
- Type: string
- Required: yes
The aws_sagemaker provider is a wrapper around other providers.
Currently, the only supported hosted_provider options are openai (including any OpenAI-compatible server, e.g. Ollama) and tgi.
model_name
- Type: string
- Required: yes
region
- Type: string
- Required: no (default: based on credentials if set, otherwise
us-east-1
)
type: "azure"
The Azure OpenAI provider uses a fixed API version (currently 2024-06-01).
You only need to set the deployment_id and endpoint fields.
deployment_id
- Type: string
- Required: yes
endpoint
- Type: string
- Required: yes
If the endpoint starts with env::, the value that follows will be treated as an environment variable name, and the gateway will attempt to retrieve the value from the environment on startup.
If the endpoint starts with dynamic::, the value that follows will be treated as a dynamic credential name, and the gateway will attempt to retrieve the value from the dynamic_credentials field on each inference request where it is needed.
api_key_location
- Type: string
- Required: no (default:
env::AZURE_OPENAI_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "deepseek"
model_name
- Type: string
- Required: yes
The supported model names are deepseek-chat (DeepSeek-V3) and deepseek-reasoner (DeepSeek-R1).
api_key_location
- Type: string
- Required: no (default:
env::DEEPSEEK_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "fireworks"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::FIREWORKS_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "gcp_vertex_anthropic"
endpoint_id
- Type: string
- Required: no (exactly one of endpoint_id or model_id must be set)
Use model_id for off-the-shelf models and endpoint_id for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of model_id or endpoint_id must be set)
Use model_id for off-the-shelf models and endpoint_id for fine-tuned models and custom endpoints.
project_id
- Type: string
- Required: yes
credential_location
- Type: string
- Required: no (default:
env::GCP_CREDENTIALS_PATH
)
The supported values are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME, and file::PATH_TO_CREDENTIALS_FILE (see the API reference for more details).
type: "gcp_vertex_gemini"
endpoint_id
- Type: string
- Required: no (exactly one of endpoint_id or model_id must be set)
Use model_id for off-the-shelf models and endpoint_id for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of model_id or endpoint_id must be set)
project_id
- Type: string
- Required: yes
credential_location
- Type: string
- Required: no (default:
env::GCP_CREDENTIALS_PATH
)
The supported values are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME, and file::PATH_TO_CREDENTIALS_FILE (see the API reference for more details).
type: "google_ai_studio_gemini"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::GOOGLE_AI_STUDIO_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "groq"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::GROQ_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "hyperbolic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::HYPERBOLIC_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "mistral"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::MISTRAL_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "openai"
api_base
- Type: string
- Required: no (default:
https://api.openai.com/v1/
)
You can use the api_base field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::OPENAI_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
type: "openrouter"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::OPENROUTER_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "sglang"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
none
)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
type: "together"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::TOGETHER_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "vllm"
api_base
- Type: string
- Required: no (default: http://localhost:8000/v1/)
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::VLLM_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
type: "xai"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::XAI_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "tgi"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
none
)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
[embedding_models.model_name]
The [embedding_models.model_name]
section defines the behavior of an embedding model.
You can define multiple models by including multiple [embedding_models.model_name]
sections.
A model is provider agnostic, and the relevant providers are defined in the providers
sub-section (see below).
If your model_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define embedding-0.1
as [embedding_models."embedding-0.1"]
.
routing
- Type: array of strings
- Required: yes
The routing field defines a list of provider names for the model. Every entry must correspond to a provider defined in the providers sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fallback to subsequent providers in order if the request is not successful.
[embedding_models.model_name.providers.provider_name]
The providers
sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple [embedding_models.model_name.providers.provider_name]
sections.
If your provider_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define vllm.internal
as [embedding_models.model_name.providers."vllm.internal"]
.
type
- Type: string
- Required: yes
Currently, TensorZero only supports openai as a provider for embedding models.
More integrations are on the way.
The other fields in the provider sub-section depend on the provider type.
type: "openai"
api_base
- Type: string
- Required: no (default:
https://api.openai.com/v1/
)
You can use the api_base field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default:
env::OPENAI_API_KEY
)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
[functions.function_name]
The [functions.function_name]
section defines the behavior of a function.
You can define multiple functions by including multiple [functions.function_name]
sections.
A function can have multiple variants, and each variant is defined in the variants
sub-section (see below).
A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).
If your function_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define summarize-2.0
as [functions."summarize-2.0"]
.
assistant_schema
- Type: string (path)
- Required: no
description
- Type: string
- Required: no
system_schema
- Type: string (path)
- Required: no
type
- Type: string
- Required: yes
The supported function types are chat and json.
Most other fields in the function section depend on the function type.
type: "chat"
parallel_tool_calls
- Type: boolean
- Required: no
tool_choice
- Type: string
- Required: no (default:
auto
)
The supported tool choice strategies are:
- none: The function should not use any tools.
- auto: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- required: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- { specific = "tool_name" }: The model should use a specific tool. The tool must be defined in the tools field (see below).
tools
- Type: array of strings
- Required: no (default:
[]
)
The tools field defines the list of tools available to the function, as shown in the sketch below. Each tool must be defined in its own [tools.tool_name] section (see below).
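For example, a chat function with a tool might look like this (the function and tool names are hypothetical):

```toml
[functions.draft_email]
type = "chat"
tools = ["search_contacts"]  # must be defined as [tools.search_contacts]
tool_choice = "auto"
```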
type: "json"
output_schema
- Type: string (path)
- Required: no (default:
{}
, the empty JSON schema that accepts any valid JSON output)
user_schema
- Type: string (path)
- Required: no
[functions.function_name.variants.variant_name]
The variants
sub-section defines the behavior of a specific variant of a function.
You can define multiple variants by including multiple [functions.function_name.variants.variant_name]
sections.
If your variant_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct
as [functions.function_name.variants."llama-3.1-8b-instruct"]
.
type
- Type: string
- Required: yes
Type | Description |
---|---|
chat_completion | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
experimental_best_of_n | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
experimental_chain_of_thought | Encourages the model to reason step by step using a chain-of-thought prompting strategy, which is particularly useful for tasks requiring logical reasoning or multi-step problem-solving. Only available for non-streaming requests to JSON functions. |
experimental_dynamic_in_context_learning | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
experimental_mixture_of_n | Generates multiple response candidates with other variants, and combines the responses using a fuser model. |
type: "chat_completion"
assistant_template
- Type: string (path)
- Required: no
The path to the template used for assistant messages. If the template uses any variables, you must also define the corresponding assistant_schema field.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant's model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn't implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present
You can also set extra_body for a model provider entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference time.
The values provided at inference time take priority over the values in the configuration file.
Example: `extra_body`
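A sketch at the variant level (the pointer target and value are hypothetical):

```toml
[[functions.function_name.variants.variant_name.extra_body]]
pointer = "/reasoning_effort"  # hypothetical provider-specific field
value = "high"
```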
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn't implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a model provider entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
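A sketch using the example header above:

```toml
[[functions.function_name.variants.variant_name.extra_headers]]
name = "anthropic-beta"
value = "token-efficient-tools-2025-02-19"
```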
frequency_penalty
- Type: float
- Required: no (default:
null
)
json_mode
- Type: string
- Required: no (default:
strict
)
This field is only used for functions with type = "json".
The supported modes are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
max_tokens
- Type: integer
- Required: no (default:
null
)
model
- Type: string
- Required: yes
To call… | Use this format… |
---|---|
A model defined as [models.my_model] in your tensorzero.toml configuration file | model = "my_model" |
A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model = "{provider_type}::{model_name}" |
The shorthand form is supported for the following provider types: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
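For example, suppose your configuration defines a gpt-4o model that falls back from openai to azure (a sketch; the Azure values are hypothetical):

```toml
[models.gpt-4o]
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
type = "openai"
model_name = "gpt-4o"

[models.gpt-4o.providers.azure]
type = "azure"
deployment_id = "gpt-4o"  # hypothetical deployment
endpoint = "https://your-resource.openai.azure.com"  # hypothetical endpoint
```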
Then:
- model = "gpt-4o" calls the gpt-4o model in your configuration, which supports fallback from openai to azure (as in the sketch above). See Retries & Fallbacks for details.
- model = "openai::gpt-4o" calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.
presence_penalty
- Type: float
- Required: no (default:
null
)
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default:
null
)
system_template
- Type: string (path)
- Required: no
The path to the template used for the system message. If the template uses any variables, you must also define the corresponding system_schema field.
temperature
- Type: float
- Required: no (default:
null
)
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
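A sketch (same assumed field layout as the model-level timeouts example above):

```toml
[functions.function_name.variants.variant_name.timeouts]
non_streaming.total_ms = 15000
streaming.ttft_ms = 3000
```

top_p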
- Type: float, between 0 and 1
- Required: no (default:
null
)
The top_p value to use for the variant during nucleus sampling.
Typically at most one of top_p and temperature is set.
user_template
- Type: string (path)
- Required: no
The path to the template used for user messages. If the template uses any variables, you must also define the corresponding user_schema field.
weight
- Type: float
- Required: no (default: 0)
If multiple variants have positive weights, the gateway samples them in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won't be used unless no other variants are available.
type: "experimental_best_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below.
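A sketch of such a configuration (the evaluator model shown here is illustrative):

```toml
[functions.function_name.variants.variant_name]
type = "experimental_best_of_n"
candidates = ["promptA", "promptA", "promptB"]

[functions.function_name.variants.variant_name.evaluator]
model = "openai::gpt-4o-mini"  # hypothetical evaluator model
```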
The evaluator would then choose the best response from these three candidates.
evaluator
- Type: object
- Required: yes
The evaluator parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates.
The evaluator is configured similarly to a chat_completion variant, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to an evaluator.
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
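A sketch (same assumed field layout as the model-level timeouts example above):

```toml
[functions.function_name.variants.variant_name.timeouts]
non_streaming.total_ms = 15000
streaming.ttft_ms = 3000
```

weight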
- Type: float
- Required: no (default: 0)
If multiple variants have positive weights, the gateway samples them in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won't be used unless no other variants are available.
type: "experimental_chain_of_thought"
The experimental_chain_of_thought variant type uses the same configuration as a chat_completion variant.
type: "experimental_mixture_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below.
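A sketch of such a configuration (the fuser model shown here is illustrative):

```toml
[functions.function_name.variants.variant_name]
type = "experimental_mixture_of_n"
candidates = ["promptA", "promptA", "promptB"]

[functions.function_name.variants.variant_name.fuser]
model = "openai::gpt-4o-mini"  # hypothetical fuser model
```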
The fuser would then combine the three responses.
fuser
- Type: object
- Required: yes
The fuser parameter specifies the configuration for the model that will combine the candidate responses.
The fuser is configured similarly to a chat_completion variant, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to a fuser.
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
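A sketch (same assumed field layout as the model-level timeouts example above):

```toml
[functions.function_name.variants.variant_name.timeouts]
non_streaming.total_ms = 15000
streaming.ttft_ms = 3000
```

weight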
- Type: float
- Required: no (default: 0)
If multiple variants have positive weights, the gateway samples them in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won't be used unless no other variants are available.
type: "experimental_dynamic_in_context_learning"
embedding_model
- Type: string
- Required: yes
To call… | Use this format… |
---|---|
A model defined as [embedding_models.my_model] in your tensorzero.toml configuration file | embedding_model = "my_model" |
A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | embedding_model = "{provider_type}::{model_name}" |
The shorthand form is supported for the following provider types: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
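For example, suppose your configuration defines a text-embedding-3-small embedding model (a sketch):

```toml
[embedding_models.text-embedding-3-small]
routing = ["openai"]

[embedding_models.text-embedding-3-small.providers.openai]
type = "openai"
model_name = "text-embedding-3-small"
```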
Then:
- embedding_model = "text-embedding-3-small" calls the text-embedding-3-small model in your configuration (as in the sketch above).
- embedding_model = "openai::text-embedding-3-small" calls the OpenAI API directly for the text-embedding-3-small model, ignoring the text-embedding-3-small model defined above.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant's model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn't implemented yet.
For experimental_dynamic_in_context_learning variants, extra_body only applies to the chat completion request.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present
You can also set extra_body for a model provider entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference time.
The values provided at inference time take priority over the values in the configuration file.
Example: `extra_body`
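A sketch at the variant level (the pointer target and value are hypothetical):

```toml
[[functions.function_name.variants.variant_name.extra_body]]
pointer = "/max_completion_tokens"  # hypothetical provider-specific field
value = 512
```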
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn't implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a model provider entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
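A sketch using the example header above:

```toml
[[functions.function_name.variants.variant_name.extra_headers]]
name = "anthropic-beta"
value = "token-efficient-tools-2025-02-19"
```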
json_mode
- Type: string
- Required: no (default:
strict
)
This field is only used for functions with type = "json".
The supported modes are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
k
- Type: non-negative integer
- Required: yes
The number of similar examples to retrieve and incorporate into the prompt.
max_tokens
- Type: integer
- Required: no (default:
null
)
model
- Type: string
- Required: yes
To call… | Use this format… |
---|---|
A model defined as [models.my_model] in your tensorzero.toml configuration file | model = "my_model" |
A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model = "{provider_type}::{model_name}" |
The shorthand form is supported for the following provider types: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
For example:
- model = "gpt-4o" calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
- model = "openai::gpt-4o" calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default:
null
)
system_instructions
- Type: string (path)
- Required: no
The path to a file containing the system instructions. Unlike system_template, it doesn't support variables.
This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.
temperature
- Type: float
- Required: no (default:
null
)
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
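A sketch (same assumed field layout as the model-level timeouts example above):

```toml
[functions.function_name.variants.variant_name.timeouts]
non_streaming.total_ms = 15000
streaming.ttft_ms = 3000
```

weight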
- Type: float
- Required: no (default: 0)
If multiple variants have positive weights, the gateway samples them in proportion to their weights. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won't be used unless no other variants are available.
[metrics]
The [metrics]
section defines the behavior of a metric.
You can define multiple metrics by including multiple [metrics.metric_name]
sections.
The metric name can’t be comment
or demonstration
, as those names are reserved for internal use.
If your metric_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define beats-gpt-3.5
as [metrics."beats-gpt-3.5"]
.
level
- Type: string
- Required: yes
The supported values are inference and episode.
optimize
- Type: string
- Required: yes
The supported values are max and min.
type
- Type: string
- Required: yes
The supported values are boolean and float.
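For example, a boolean metric for tracking task success might look like this (the metric name is hypothetical):

```toml
[metrics.task_success]
type = "boolean"
optimize = "max"
level = "inference"
```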
[tools.tool_name]
The [tools.tool_name]
section defines the behavior of a tool.
You can define multiple tools by including multiple [tools.tool_name]
sections.
If your tool_name
is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define run-python-3.10
as [tools."run-python-3.10"]
.
You can enable a tool for a function by adding it to the function’s tools
field.
description
- Type: string
- Required: yes
parameters
- Type: string (path)
- Required: yes
The path to a JSON Schema file that defines the parameters of the tool.
strict
- Type: boolean
- Required: no (default:
false
)
If true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters.
This typically improves the quality of responses.
Only a few providers support strict JSON generation.
For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
If the provider does not support strict mode, the TensorZero Gateway ignores this field.
name
- Type: string
- Required: no (defaults to the tool ID)
For example, if you define [tools.my_tool] but don't specify the name, the name will be my_tool.
This field allows you to specify a different name to be sent.
This field is particularly useful if you want to define multiple tools that share the same name (e.g. for different functions).
At inference time, the gateway ensures that an inference request doesn’t have multiple tools with the same name.
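For example, a complete tool definition might look like this (the tool name and schema path are hypothetical):

```toml
[tools.get_temperature]
description = "Fetch the current temperature for a given location"
parameters = "tools/get_temperature.json"  # hypothetical path to a JSON Schema file
strict = true
```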
[object_storage]
The [object_storage]
section defines the behavior of object storage, which is used for storing images used during multimodal inference.
type
- Type: string
- Required: yes
The supported values are:
- s3_compatible: Use an S3-compatible object storage service.
- filesystem: Store images in a local directory.
- disabled: Disable object storage.
type: "s3_compatible"
If type = "s3_compatible", TensorZero will use an S3-compatible object storage service to store and retrieve images.
The TensorZero Gateway will attempt to retrieve credentials from the following resources in order of priority:
1. S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables
2. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
3. Credentials from the AWS SDK (default profile)
If type = "s3_compatible", the following fields are available.
endpoint
- Type: string
- Required: no (defaults to AWS S3)
bucket_name
- Type: string
- Required: no
The name of the bucket. This field may be unnecessary if the bucket is already specified in the endpoint field.
region
- Type: string
- Required: no
allow_http
- Type: boolean
- Required: no (defaults to
false
)
If true, the TensorZero Gateway will instead use HTTP to access the object storage service.
This is useful for local development (e.g. a local MinIO deployment), but not recommended for production environments.
In production, you should avoid the allow_http setting and use a secure method of authentication in combination with a production-grade object storage service.
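For example, a local-development setup with MinIO might look like this (the endpoint and bucket name are hypothetical):

```toml
[object_storage]
type = "s3_compatible"
endpoint = "http://localhost:9000"  # hypothetical local MinIO endpoint
bucket_name = "tensorzero-images"
allow_http = true  # for local development only
```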
type: "filesystem"
path
- Type: string
- Required: yes
The path to the local directory where images will be stored.
type: "disabled"
If type = "disabled", the TensorZero Gateway will not store or retrieve images.
There are no additional fields available for this type.