Learn how to configure the TensorZero Gateway.

The configuration file is organized into the following sections: `gateway`, `clickhouse`, `models`, `model_providers`, `functions`, `variants`, `tools`, and `metrics`.
## [gateway]

The `[gateway]` section defines the behavior of the TensorZero Gateway.

### base_path

Default: `/`

The path prefix for the gateway's endpoints.
For example, if `base_path` is set to `/custom/prefix`, the inference endpoint becomes `/custom/prefix/inference` instead of `/inference`.

### bind_address

Default: `[::]:3000`

The address and port the gateway binds to.
Depending on the operating system, the default value binds only to IPv6 (e.g. Windows) or to both IPv4 and IPv6 (e.g. Linux by default).

### debug

Default: `false`

If `true`, the gateway logs more verbose errors to assist with debugging.

### enable_template_filesystem_access

Default: `false`

If `true`, templates are allowed to access the filesystem (e.g. via the `include` directive).
Paths must be relative to `tensorzero.toml`, and can only access files in that directory or its sub-directories.

### export.otlp.traces.enabled

Default: `false`

If `true`, the gateway exports OpenTelemetry traces to the endpoint specified by the `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` environment variable.
See the guide on exporting OpenTelemetry traces for details.

### observability.async_writes

Default: `true`

If `true`, the gateway writes observability data to ClickHouse asynchronously.

### observability.enabled

Default: `null`

If `true`, the gateway will throw an error on startup if it fails to validate the ClickHouse connection.
If `null`, the gateway will log a warning but continue if ClickHouse is not available, and it will use ClickHouse if available.
If `false`, the gateway will not use ClickHouse.
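Putting these fields together, a `[gateway]` section might look like the following sketch (the values are illustrative, not recommendations):

```toml
[gateway]
bind_address = "0.0.0.0:3000"  # listen on IPv4, port 3000
debug = true                   # verbose errors while developing

[gateway.observability]
enabled = true       # fail on startup if ClickHouse is unreachable
async_writes = true  # write observability data asynchronously
```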
## [models.model_name]

The `[models.model_name]` section defines the behavior of a model.
You can define multiple models by including multiple `[models.model_name]` sections.

A model is provider-agnostic, and the relevant providers are defined in the `providers` sub-section (see below).

If your `model_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `llama-3.1-8b-instruct` as `[models."llama-3.1-8b-instruct"]`.

### routing

A list of provider names from the `providers` sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fall back to subsequent providers in order if the request is not successful.

### timeouts

The `timeouts` object allows you to set granular timeouts for requests to this model.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests and a 3-second TTFT timeout for streaming requests.
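Such a timeout configuration might be sketched as follows (the model name is a placeholder):

```toml
[models.model_name.timeouts]
non_streaming.total_ms = 15000  # 15-second total timeout for non-streaming requests
streaming.ttft_ms = 3000        # 3-second time-to-first-token timeout for streaming requests
```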
## [models.model_name.providers.provider_name]

The `providers` sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple `[models.model_name.providers.provider_name]` sections.

If your `provider_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `vllm.internal` as `[models.model_name.providers."vllm.internal"]`.

### extra_body

The `extra_body` field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.

Each object in the array must have two fields:

- `pointer`: A JSON Pointer string specifying where to modify the request body
- `value`: The value to insert at that location; it can be of any type including nested types

Alternatively, you can set `delete = true` instead of `value` to delete the field at the specified location, if present.

You can also set `extra_body` for a variant entry.
The model provider `extra_body` entries take priority over variant `extra_body` entries.
Additionally, you can set `extra_body` at inference time.
The values provided at inference time take priority over the values in the configuration file.
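For instance, an `extra_body` configuration might be sketched as follows (the model, provider, and field names are hypothetical; each modification is one entry in the array of tables):

```toml
[[models.model_name.providers.provider_name.extra_body]]
pointer = "/reasoning_effort"  # hypothetical provider-specific field
value = "high"

[[models.model_name.providers.provider_name.extra_body]]
pointer = "/stop"              # delete this field from the request, if present
delete = true
```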
### extra_headers

The `extra_headers` field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.

Each object in the array must have two fields:

- `name` (string): The name of the header to modify (e.g. `anthropic-beta`)
- `value` (string): The value of the header (e.g. `token-efficient-tools-2025-02-19`)

Alternatively, you can set `delete = true` instead of `value` to delete the header from the request, if present.

You can also set `extra_headers` for a variant entry.
The model provider `extra_headers` entries take priority over variant `extra_headers` entries.
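For instance, an `extra_headers` configuration might be sketched as follows (using the header from the example above; the model and provider names are placeholders):

```toml
[[models.model_name.providers.provider_name.extra_headers]]
name = "anthropic-beta"
value = "token-efficient-tools-2025-02-19"
```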
### timeouts

The `timeouts` object allows you to set granular timeouts for individual requests to a model provider.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).

You can also enforce a timeout at inference time using the client’s `timeout` field (or simply killing the request if you’re using a different client).

For example, you can set a 15-second timeout for non-streaming requests and a 3-second TTFT timeout for streaming requests.
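Such a provider-level timeout configuration might be sketched as follows (the model and provider names are placeholders):

```toml
[models.model_name.providers.provider_name.timeouts]
non_streaming.total_ms = 15000  # 15-second total timeout for non-streaming requests
streaming.ttft_ms = 3000        # 3-second time-to-first-token timeout for streaming requests
```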
### type

The provider type.
The supported provider types are `anthropic`, `aws_bedrock`, `aws_sagemaker`, `azure`, `deepseek`, `fireworks`, `gcp_vertex_anthropic`, `gcp_vertex_gemini`, `google_ai_studio_gemini`, `groq`, `hyperbolic`, `mistral`, `openai`, `openrouter`, `sglang`, `tgi`, `together`, `vllm`, and `xai`.

The other fields in the provider sub-section depend on the provider type.
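To tie the model and provider sections together, a model with fallback between two providers might be sketched as follows (the deployment ID and endpoint are illustrative):

```toml
[models."gpt-4o"]
routing = ["openai", "azure"]  # try OpenAI first, fall back to Azure

[models."gpt-4o".providers.openai]
type = "openai"
model_name = "gpt-4o"

[models."gpt-4o".providers.azure]
type = "azure"
deployment_id = "gpt-4o"                       # illustrative
endpoint = "https://example.openai.azure.com"  # illustrative
```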
### type: "anthropic"

#### model_name

The name of the model to request from the Anthropic API.

#### api_key_location

Default: `env::ANTHROPIC_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "aws_bedrock"

#### allow_auto_detect_region

Default: `false`

If `true`, the gateway is allowed to detect the AWS region automatically.
Instead, you should generally set the `region` field explicitly (recommended).

#### model_id

The ID of the model to use.
Some models require a special prefix in the `model_id` (e.g. the `us.` prefix in `us.anthropic.claude-3-7-sonnet-20250219-v1:0`).
See the AWS documentation on inference profiles.

#### region

The AWS region to use (e.g. `us-east-1`).

### type: "aws_sagemaker"

#### allow_auto_detect_region

Default: `false`

If `true`, the gateway is allowed to detect the AWS region automatically.
Instead, you should generally set the `region` field explicitly (recommended).

#### endpoint_name

The name of the SageMaker endpoint to call.

#### hosted_provider

The `aws_sagemaker` provider is a wrapper on other providers.
Currently, the only supported `hosted_provider` options are:

- `openai` (including any OpenAI-compatible server, e.g. Ollama)
- `tgi`

#### model_name

The name of the model to request from the hosted provider.

#### region

The AWS region to use (e.g. `us-east-1`).

### type: "azure"

Requests use the `2024-06-01` Azure OpenAI API version.
You only need to set the `deployment_id` and `endpoint` fields.

#### deployment_id

The deployment ID of your Azure OpenAI deployment.

#### endpoint

The endpoint URL of your Azure OpenAI resource.

#### api_key_location

Default: `env::AZURE_OPENAI_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "deepseek"

#### model_name

The name of the model to request. The supported models are `deepseek-chat` (DeepSeek-V3) and `deepseek-reasoner` (R1).

#### api_key_location

Default: `env::DEEPSEEK_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "fireworks"

#### model_name

The name of the model to request from the Fireworks API.

#### api_key_location

Default: `env::FIREWORKS_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "gcp_vertex_anthropic"

#### endpoint_id

The endpoint ID (either `endpoint_id` or `model_id` must be set).
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.

#### location

The GCP location to use.

#### model_id

The model ID (either `model_id` or `endpoint_id` must be set).
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.

#### project_id

The GCP project ID.

#### credential_location

Default: `env::GCP_CREDENTIALS_PATH`

The location of the credentials. The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `dynamic::CREDENTIALS_ARGUMENT_NAME`, and `file::PATH_TO_CREDENTIALS_FILE` (see the API reference for more details).

### type: "gcp_vertex_gemini"

#### endpoint_id

The endpoint ID (either `endpoint_id` or `model_id` must be set).
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.

#### location

The GCP location to use.

#### model_id

The model ID (either `model_id` or `endpoint_id` must be set).

#### project_id

The GCP project ID.

#### credential_location

Default: `env::GCP_CREDENTIALS_PATH`

The location of the credentials. The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `dynamic::CREDENTIALS_ARGUMENT_NAME`, and `file::PATH_TO_CREDENTIALS_FILE` (see the API reference for more details).

### type: "google_ai_studio_gemini"

#### model_name

The name of the model to request from the Google AI Studio API.

#### api_key_location

Default: `env::GOOGLE_AI_STUDIO_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "groq"

#### model_name

The name of the model to request from the Groq API.

#### api_key_location

Default: `env::GROQ_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "hyperbolic"

#### model_name

The name of the model to request from the Hyperbolic API.

#### api_key_location

Default: `env::HYPERBOLIC_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "mistral"

#### model_name

The name of the model to request from the Mistral API.

#### api_key_location

Default: `env::MISTRAL_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "openai"

#### api_base

Default: `https://api.openai.com/v1/`

You can set the `api_base` field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.

#### model_name

The name of the model to request from the API.

#### api_key_location

Default: `env::OPENAI_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).

### type: "openrouter"

#### model_name

The name of the model to request from the OpenRouter API.

#### api_key_location

Default: `env::OPENROUTER_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "sglang"

#### api_base

The base URL of the SGLang server.

#### api_key_location

Default: `none`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).

### type: "together"

#### model_name

The name of the model to request from the Together API.

#### api_key_location

Default: `env::TOGETHER_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "vllm"

#### api_base

Default: `http://localhost:8000/v1/`

The base URL of the vLLM server.

#### model_name

The name of the model to request from the vLLM server.

#### api_key_location

Default: `env::VLLM_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).

### type: "xai"

#### model_name

The name of the model to request from the xAI API.

#### api_key_location

Default: `env::XAI_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference for more details).

### type: "tgi"

#### api_base

The base URL of the TGI server.

#### api_key_location

Default: `none`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).

## [embedding_models.model_name]
The `[embedding_models.model_name]` section defines the behavior of an embedding model.
You can define multiple embedding models by including multiple `[embedding_models.model_name]` sections.

A model is provider-agnostic, and the relevant providers are defined in the `providers` sub-section (see below).

If your `model_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `embedding-0.1` as `[embedding_models."embedding-0.1"]`.

### routing

A list of provider names from the `providers` sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fall back to subsequent providers in order if the request is not successful.
## [embedding_models.model_name.providers.provider_name]

The `providers` sub-section defines the behavior of a specific provider for an embedding model.
You can define multiple providers by including multiple `[embedding_models.model_name.providers.provider_name]` sections.

If your `provider_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `vllm.internal` as `[embedding_models.model_name.providers."vllm.internal"]`.

### type

Currently, TensorZero only supports `openai` as a provider for embedding models.
More integrations are on the way.

The other fields in the provider sub-section depend on the provider type.

### type: "openai"

#### api_base

Default: `https://api.openai.com/v1/`

You can set the `api_base` field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.

#### model_name

The name of the embedding model to request from the API.

#### api_key_location

Default: `env::OPENAI_API_KEY`

The location of the API key. The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference for more details).

## [functions.function_name]
The `[functions.function_name]` section defines the behavior of a function.
You can define multiple functions by including multiple `[functions.function_name]` sections.

A function can have multiple variants, and each variant is defined in the `variants` sub-section (see below).
A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).

If your `function_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `summarize-2.0` as `[functions."summarize-2.0"]`.

### assistant_schema

An optional path to a JSON schema file that validates the arguments for the assistant message template.

### description

An optional description of the function.

### system_schema

An optional path to a JSON schema file that validates the arguments for the system message template.

### type

The type of the function.
The supported types are `chat` and `json`.

Most other fields in the function section depend on the function type.
### type: "chat"

#### parallel_tool_calls

If `true`, the function is allowed to request multiple tool calls in a single conversation turn (if supported by the provider).

#### tool_choice

Default: `auto`

The tool choice strategy. The supported values are:

- `none`: The function should not use any tools.
- `auto`: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- `required`: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- `{ specific = "tool_name" }`: The model should use a specific tool. The tool must be defined in the `tools` field (see below).

#### tools

Default: `[]`

A list of tool names available to the function.
Each tool must be defined in its own `[tools.tool_name]` section (see below).

### type: "json"

#### output_schema

Default: `{}` (the empty JSON schema, which accepts any valid JSON output)

The path to a JSON schema file that defines the structure of the function's output.

#### user_schema

An optional path to a JSON schema file that validates the arguments for the user message template.
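For illustration, a chat function and a JSON function might be sketched as follows (the function names, tool name, and schema paths are hypothetical):

```toml
[functions.draft-email]
type = "chat"
system_schema = "functions/draft-email/system_schema.json"  # illustrative path
tools = ["search-contacts"]                                 # hypothetical tool
tool_choice = "auto"

[functions.extract-data]
type = "json"
output_schema = "functions/extract-data/output_schema.json"  # illustrative path
```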
## [functions.function_name.variants.variant_name]

The `variants` sub-section defines the behavior of a specific variant of a function.
You can define multiple variants by including multiple `[functions.function_name.variants.variant_name]` sections.

If your `variant_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `llama-3.1-8b-instruct` as `[functions.function_name.variants."llama-3.1-8b-instruct"]`.
### type

The type of the variant. The supported types are:

| Type | Description |
| --- | --- |
| `chat_completion` | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
| `experimental_best_of_n` | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
| `experimental_chain_of_thought` | Encourages the model to reason step by step using a chain-of-thought prompting strategy, which is particularly useful for tasks requiring logical reasoning or multi-step problem-solving. Only available for non-streaming requests to JSON functions. |
| `experimental_dynamic_in_context_learning` | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
| `experimental_mixture_of_n` | Generates multiple response candidates with other variants, and combines the responses using a fuser model. |
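A minimal variant definition might be sketched as follows (the function and variant names are hypothetical):

```toml
[functions.draft-email.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o"  # call the OpenAI API directly
temperature = 0.5
weight = 1.0
```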
### type: "chat_completion"

#### assistant_template

The path to the template for the assistant messages.
If the template uses any variables, you must also provide the `assistant_schema` field.

#### extra_body

The `extra_body` field allows you to modify the request body that TensorZero sends to a variant's model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.

Each object in the array must have two fields:

- `pointer`: A JSON Pointer string specifying where to modify the request body
- `value`: The value to insert at that location; it can be of any type including nested types

Alternatively, you can set `delete = true` instead of `value` to delete the field at the specified location, if present.

You can also set `extra_body` for a model provider entry.
The model provider `extra_body` entries take priority over variant `extra_body` entries.
Additionally, you can set `extra_body` at inference time.
The values provided at inference time take priority over the values in the configuration file.

#### extra_headers

The `extra_headers` field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.

Each object in the array must have two fields:

- `name` (string): The name of the header to modify (e.g. `anthropic-beta`)
- `value` (string): The value of the header (e.g. `token-efficient-tools-2025-02-19`)

Alternatively, you can set `delete = true` instead of `value` to delete the header from the request, if present.

You can also set `extra_headers` for a model provider entry.
The model provider `extra_headers` entries take priority over variant `extra_headers` entries.

#### frequency_penalty

Default: `null`

The frequency penalty to use for the variant (if supported by the provider).

#### json_mode

Default: `strict`

Only applicable to functions with `type = "json"`.

The supported modes are:

- `off`: Make a chat completion request without any special JSON handling (not recommended).
- `on`: Make a chat completion request with JSON mode (if supported by the provider).
- `strict`: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- `implicit_tool`: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.

#### max_tokens

Default: `null`

The maximum number of tokens to generate (if supported by the provider).

#### model

The model to call. Use one of the following formats:

| To call… | Use this format… |
| --- | --- |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `model = "my_model"` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `model = "{provider_type}::{model_name}"` |

The short-hand `{provider_type}::{model_name}` format is supported for the following provider types: `anthropic`, `deepseek`, `fireworks`, `google_ai_studio_gemini`, `gcp_vertex_gemini`, `gcp_vertex_anthropic`, `hyperbolic`, `groq`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.

For example, `model = "gpt-4o"` calls the `gpt-4o` model in your configuration, which supports fallback from `openai` to `azure` (see Retries & Fallbacks for details), whereas `model = "openai::gpt-4o"` calls the OpenAI API directly for the `gpt-4o` model, ignoring the `gpt-4o` model defined above.

#### presence_penalty

Default: `null`

The presence penalty to use for the variant (if supported by the provider).

#### retries

Default: `num_retries = 0` and `max_delay_s = 10`

The `retries` object accepts two fields, `num_retries` and `max_delay_s`.
The `num_retries` parameter defines the number of retries (not including the initial request).
The `max_delay_s` parameter defines the maximum delay in seconds between retries.

#### seed

Default: `null`

The seed to use for the variant (if supported by the provider).

#### system_template

The path to the template for the system message.
If the template uses any variables, you must also provide the `system_schema` field.

#### temperature

Default: `null`

The temperature to use for the variant (if supported by the provider).

#### timeouts

The `timeouts` object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests (`total_ms = 15000`) and a 3-second TTFT timeout for streaming requests (`ttft_ms = 3000`).

#### top_p

Default: `null`

The `top_p` to use for the variant during nucleus sampling.
Typically at most one of `top_p` and `temperature` is set.

#### user_template

The path to the template for the user messages.
If the template uses any variables, you must also provide the `user_schema` field.

#### weight

The sampling weight of the variant.
If variant A has a weight of `1.0` and variant B has a weight of `3.0`, variant A will be sampled with probability `1.0 / (1.0 + 3.0) = 25%` and variant B will be sampled with probability `3.0 / (1.0 + 3.0) = 75%`.

You can disable a variant by setting its weight to `0`.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with `variant_name`.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.

### type: "experimental_best_of_n"
#### candidates

The `candidates` parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (`promptA` and `promptB`), you could set up the `candidates` list to generate two responses using `promptA` and one using `promptB` using the snippet below.
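A sketch of such a `candidates` list (variant names from the example above):

```toml
candidates = ["promptA", "promptA", "promptB"]  # two responses from promptA, one from promptB
```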
The evaluator would then choose the best response from these three candidates.

#### evaluator
The `evaluator` parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates.

The evaluator is configured similarly to a `chat_completion` variant, but without the `type` field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to an evaluator.

#### timeout_s

The `timeout_s` parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.

#### timeouts

The `timeouts` object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests (`total_ms = 15000`) and a 3-second TTFT timeout for streaming requests (`ttft_ms = 3000`).

#### weight

The sampling weight of the variant.
If variant A has a weight of `1.0` and variant B has a weight of `3.0`, variant A will be sampled with probability `1.0 / (1.0 + 3.0) = 25%` and variant B will be sampled with probability `3.0 / (1.0 + 3.0) = 75%`.

You can disable a variant by setting its weight to `0`.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with `variant_name`.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.

### type: "experimental_chain_of_thought"

The `experimental_chain_of_thought` variant type uses the same configuration as a `chat_completion` variant.
Please refer to that documentation to see what options are available.

### type: "experimental_mixture_of_n"
#### candidates

The `candidates` parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (`promptA` and `promptB`), you could set up the `candidates` list to generate two responses using `promptA` and one using `promptB` using the snippet below.
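A sketch of such a `candidates` list (variant names from the example above):

```toml
candidates = ["promptA", "promptA", "promptB"]  # two responses from promptA, one from promptB
```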
The fuser would then combine the three responses.

#### fuser
The `fuser` parameter specifies the configuration for the model that will combine the candidate responses.

The fuser is configured similarly to a `chat_completion` variant, but without the `type` field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to a fuser.

#### timeout_s

The `timeout_s` parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.

#### timeouts

The `timeouts` object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests (`total_ms = 15000`) and a 3-second TTFT timeout for streaming requests (`ttft_ms = 3000`).

#### weight

The sampling weight of the variant.
If variant A has a weight of `1.0` and variant B has a weight of `3.0`, variant A will be sampled with probability `1.0 / (1.0 + 3.0) = 25%` and variant B will be sampled with probability `3.0 / (1.0 + 3.0) = 75%`.

You can disable a variant by setting its weight to `0`.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with `variant_name`.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.

### type: "experimental_dynamic_in_context_learning"
#### embedding_model

The embedding model to call. Use one of the following formats:

| To call… | Use this format… |
| --- | --- |
| A model defined as `[embedding_models.my_model]` in your `tensorzero.toml` configuration file | `embedding_model = "my_model"` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `embedding_model = "{provider_type}::{model_name}"` |

The short-hand `{provider_type}::{model_name}` format is supported for the following provider types: `anthropic`, `deepseek`, `fireworks`, `google_ai_studio_gemini`, `gcp_vertex_gemini`, `gcp_vertex_anthropic`, `hyperbolic`, `groq`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.

For example, `embedding_model = "text-embedding-3-small"` calls the `text-embedding-3-small` model in your configuration, whereas `embedding_model = "openai::text-embedding-3-small"` calls the OpenAI API directly for the `text-embedding-3-small` model, ignoring the `text-embedding-3-small` model defined above.

#### extra_body
The `extra_body` field allows you to modify the request body that TensorZero sends to a variant's model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.

For `experimental_dynamic_in_context_learning` variants, `extra_body` only applies to the chat completion request.

Each object in the array must have two fields:

- `pointer`: A JSON Pointer string specifying where to modify the request body
- `value`: The value to insert at that location; it can be of any type including nested types

Alternatively, you can set `delete = true` instead of `value` to delete the field at the specified location, if present.

You can also set `extra_body` for a model provider entry.
The model provider `extra_body` entries take priority over variant `extra_body` entries.
Additionally, you can set `extra_body` at inference time.
The values provided at inference time take priority over the values in the configuration file.

#### extra_headers
The `extra_headers` field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.

Each object in the array must have two fields:

- `name` (string): The name of the header to modify (e.g. `anthropic-beta`)
- `value` (string): The value of the header (e.g. `token-efficient-tools-2025-02-19`)

Alternatively, you can set `delete = true` instead of `value` to delete the header from the request, if present.

You can also set `extra_headers` for a model provider entry.
The model provider `extra_headers` entries take priority over variant `extra_headers` entries.

#### json_mode
Default: `strict`

Only applicable to functions with `type = "json"`.

The supported modes are:

- `off`: Make a chat completion request without any special JSON handling (not recommended).
- `on`: Make a chat completion request with JSON mode (if supported by the provider).
- `strict`: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- `implicit_tool`: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.

#### k

The number of relevant examples to retrieve and incorporate into the prompt.

#### max_tokens

Default: `null`

The maximum number of tokens to generate (if supported by the provider).

#### model
The model to call. Use one of the following formats:

| To call… | Use this format… |
| --- | --- |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `model = "my_model"` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `model = "{provider_type}::{model_name}"` |

The short-hand `{provider_type}::{model_name}` format is supported for the following provider types: `anthropic`, `deepseek`, `fireworks`, `google_ai_studio_gemini`, `gcp_vertex_gemini`, `gcp_vertex_anthropic`, `hyperbolic`, `groq`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.

For example, `model = "gpt-4o"` calls the `gpt-4o` model in your configuration, which supports fallback from `openai` to `azure` (see Retries & Fallbacks for details), whereas `model = "openai::gpt-4o"` calls the OpenAI API directly for the `gpt-4o` model, ignoring the `gpt-4o` model defined above.

#### retries
Default: `num_retries = 0` and `max_delay_s = 10`

The `retries` object accepts two fields, `num_retries` and `max_delay_s`.
The `num_retries` parameter defines the number of retries (not including the initial request).
The `max_delay_s` parameter defines the maximum delay in seconds between retries.

#### seed

Default: `null`

The seed to use for the variant (if supported by the provider).

#### system_instructions
The path to a file containing static system instructions.
Unlike `system_template`, it doesn’t support variables.
This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.

#### temperature

Default: `null`

The temperature to use for the variant (if supported by the provider).

#### timeouts
The `timeouts` object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: `timeouts.non_streaming.total_ms` corresponds to the total request duration and `timeouts.streaming.ttft_ms` corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests (`total_ms = 15000`) and a 3-second TTFT timeout for streaming requests (`ttft_ms = 3000`).

#### weight

The sampling weight of the variant.
If variant A has a weight of `1.0` and variant B has a weight of `3.0`, variant A will be sampled with probability `1.0 / (1.0 + 3.0) = 25%` and variant B will be sampled with probability `3.0 / (1.0 + 3.0) = 75%`.

You can disable a variant by setting its weight to `0`.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with `variant_name`.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
## [metrics]

The `[metrics]` section defines the behavior of a metric.
You can define multiple metrics by including multiple `[metrics.metric_name]` sections.

The metric name can’t be `comment` or `demonstration`, as those names are reserved for internal use.

If your `metric_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `beats-gpt-3.5` as `[metrics."beats-gpt-3.5"]`.

### level

The level at which the metric applies.
The supported values are `inference` and `episode`.

### optimize

The direction to optimize the metric.
The supported values are `max` and `min`.

### type

The type of the metric.
The supported types are `boolean` and `float`.
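For illustration, a metric definition might be sketched as follows (the metric name is hypothetical):

```toml
[metrics.task_success]
level = "inference"  # feedback applies to individual inferences
optimize = "max"     # higher is better
type = "boolean"
```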
## [tools.tool_name]

The `[tools.tool_name]` section defines the behavior of a tool.
You can define multiple tools by including multiple `[tools.tool_name]` sections.

If your `tool_name` is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define `run-python-3.10` as `[tools."run-python-3.10"]`.

You can enable a tool for a function by adding it to the function’s `tools` field.

### description

The description of the tool, which is provided to the model.

### parameters

The path to a JSON schema file that defines the tool's parameters.

### strict

Default: `false`

If `true`, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters.
This typically improves the quality of responses.
Only a few providers support strict JSON generation.
For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
If the provider does not support strict mode, the TensorZero Gateway ignores this field.

### name

By default, the tool name sent to the model is the name in the section header: if you define `[tools.my_tool]` but don’t specify the `name`, the name will be `my_tool`.
This field allows you to specify a different name to be sent.
It is particularly useful if you want to define multiple tools that share the same name (e.g. for different functions).
At inference time, the gateway ensures that an inference request doesn’t have multiple tools with the same name.
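For illustration, a tool definition might be sketched as follows (the tool name, description, and schema path are hypothetical):

```toml
[tools.search-contacts]
description = "Search the user's address book by name."
parameters = "tools/search-contacts.json"  # path to a JSON schema file
strict = true                              # request strict JSON generation if supported
```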
## [object_storage]

The `[object_storage]` section defines the behavior of object storage, which is used for storing images used during multimodal inference.

### type

The type of object storage to use. The supported types are:

- `s3_compatible`: Use an S3-compatible object storage service.
- `filesystem`: Store images in a local directory.
- `disabled`: Disable object storage.

### type: "s3_compatible"

If `type = "s3_compatible"`, TensorZero will use an S3-compatible object storage service to store and retrieve images.

The TensorZero Gateway will attempt to retrieve credentials from the following resources in order of priority:

1. `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` environment variables
2. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables

If `type = "s3_compatible"`, the following fields are available.

#### endpoint

The endpoint URL of the object storage service.

#### bucket_name

The name of the bucket.
Depending on the service, the bucket may instead be specified as part of the `endpoint` field.

#### region

The region of the object storage service.

#### allow_http

Default: `false`

If `true`, the TensorZero Gateway will use HTTP instead of HTTPS to access the object storage service.
This is useful for local development (e.g. a local MinIO deployment), but not recommended for production environments.
In production, you should avoid the `allow_http` setting and use a secure method of authentication in combination with a production-grade object storage service.

### type: "filesystem"

#### path

The path to the local directory where images will be stored.

### type: "disabled"

If `type = "disabled"`, the TensorZero Gateway will not store or retrieve images.
There are no additional fields available for this type.
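For illustration, an object storage configuration for local development might be sketched as follows (the endpoint and bucket name are hypothetical):

```toml
[object_storage]
type = "s3_compatible"
endpoint = "http://localhost:9000"  # e.g. a local MinIO deployment
bucket_name = "tensorzero"
allow_http = true                   # local development only; avoid in production
```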