The configuration file is organized into the following sections: gateway, clickhouse, models, model_providers, functions, variants, tools, metrics, rate_limiting, and object_storage.
[gateway]
The [gateway] section defines the behavior of the TensorZero Gateway.
base_path
- Type: string
- Required: no (default: /)
The base_path field defines a prefix for the gateway's endpoints.
For example, if base_path is set to /custom/prefix, the inference endpoint will become /custom/prefix/inference instead of /inference.
bind_address
- Type: string
- Required: no (default: [::]:3000)
The bind_address field defines the socket address the gateway listens on; the default is [::]:3000.
Depending on the operating system, this value binds only to IPv6 (e.g. Windows) or to both (e.g. Linux by default).
debug
- Type: boolean
- Required: no (default: false)
If set to true, the gateway will log more verbose errors to assist with debugging.
disable_pseudonymous_usage_analytics
- Type: boolean
- Required: no (default: false)
If set to true, TensorZero will not collect or share pseudonymous usage analytics.
export.otlp.traces.enabled
- Type: boolean
- Required: no (default: false)
If set to true, the gateway will export OpenTelemetry traces to the endpoint specified by the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable. See the above-linked guide for details.
export.otlp.traces.extra_headers
- Type: object (map of string to string)
- Required: no (default: {})
Additional headers to include in the OTLP trace export requests.
export.otlp.traces.format
- Type: either “opentelemetry” or “openinference”
- Required: no (default: "opentelemetry")
If set to "opentelemetry", TensorZero will set gen_ai attributes based on the OpenTelemetry GenAI semantic conventions.
If set to "openinference", TensorZero will set attributes based on the OpenInference semantic conventions.
fetch_and_encode_input_files_before_inference
- Type: boolean
- Required: no (default: true)
If set to true (default), the gateway will fetch remote input files and send them as a base64-encoded payload in the prompt.
This is recommended to ensure that TensorZero and the model providers see identical inputs, which is important for observability and reproducibility.
If set to false, TensorZero will forward the input file URLs directly to the model provider (when supported) and fetch them for observability in parallel with inference.
This can be more efficient, but may result in different content being observed if the URL content changes between when the provider fetches it and when TensorZero fetches it for observability.
observability.async_writes
- Type: boolean
- Required: no (default: false)
If set to true, the gateway will write observability data to ClickHouse asynchronously instead of blocking on each write.
You can't enable async_writes and batch_writes at the same time.
observability.batch_writes
- Type: object
- Required: no (default: disabled)
If you enable batch_writes, multiple records are collected and written together in batches to improve efficiency.
The batch_writes object supports the following fields:
- enabled (boolean): Must be set to true to enable batch writes
- flush_interval_ms (integer, optional): Maximum time in milliseconds to wait before flushing a batch (default: 100)
- max_rows (integer, optional): Maximum number of rows to collect before flushing a batch (default: 1000)
You can't enable async_writes and batch_writes at the same time.
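For illustration, a minimal sketch of enabling batch writes using the defaults listed above (the table path follows this section's field names):
```toml
[gateway.observability.batch_writes]
enabled = true           # required to enable batch writes
flush_interval_ms = 100  # flush at least every 100 ms
max_rows = 1000          # or when 1000 rows accumulate
```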
observability.enabled
- Type: boolean
- Required: no (default: null)
If set to true, the gateway will throw an error on startup if it fails to validate the ClickHouse connection.
If null, the gateway will log a warning but continue if ClickHouse is not available, and it will use ClickHouse if available.
If false, the gateway will not use ClickHouse.
observability.disable_automatic_migrations
- Type: boolean
- Required: no (default: false)
If set to true, then the migrations are not applied upon launch and must instead be applied manually
by running docker run --rm -e TENSORZERO_CLICKHOUSE_URL=$TENSORZERO_CLICKHOUSE_URL tensorzero/gateway:{version} --run-clickhouse-migrations or docker compose run --rm gateway --run-clickhouse-migrations.
If false, then the migrations are run automatically upon launch.
template_filesystem_access
- Type: object
- Required: no (default: disabled)
The template_filesystem_access object determines whether templates can access other files on the filesystem using the include directive.
The object has two fields:
- enabled (boolean): Determines whether to enable file system access for templates.
- base_path (string, optional): Determines the base path for template file system access.
If not set, base_path will be the directory containing the configuration file.
If you split your configuration into multiple files, you must specify gateway.template_filesystem_access.base_path (see Organize your configuration for details).
The include paths must be relative to base_path, and can only access files in that directory or its sub-directories.
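For illustration, a sketch of enabling this feature (the templates directory name is hypothetical):
```toml
[gateway.template_filesystem_access]
enabled = true
base_path = "templates"  # include paths resolve relative to this directory
```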
[models.model_name]
The [models.model_name] section defines the behavior of a model.
You can define multiple models by including multiple [models.model_name] sections.
A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below).
If your model_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct as [models."llama-3.1-8b-instruct"].
routing
- Type: array of strings
- Required: yes
The routing field lists the provider names to try, which must be defined in the providers sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fallback to subsequent providers in order if the request is not successful.
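For illustration, a sketch of a model with two providers and fallback routing (the model name, deployment, and endpoint are hypothetical):
```toml
[models.gpt-4o]
routing = ["openai", "azure"]  # try openai first, then fall back to azure

[models.gpt-4o.providers.openai]
type = "openai"
model_name = "gpt-4o"

[models.gpt-4o.providers.azure]
type = "azure"
deployment_id = "gpt-4o"
endpoint = "https://your-resource.openai.azure.com"  # hypothetical endpoint
```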
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests to this model.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
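A sketch of what that might look like (the table path follows this section's naming):
```toml
[models.model_name.timeouts]
non_streaming.total_ms = 15000 # 15 seconds for the entire request
streaming.ttft_ms = 3000       # 3 seconds to first token
```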
[models.model_name.providers.provider_name]
The providers sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple [models.model_name.providers.provider_name] sections.
If your provider_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define vllm.internal as [models.model_name.providers."vllm.internal"].
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
You can also set extra_body for a variant entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
Example: `extra_body`
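For illustration, a sketch of extra_body at the provider level (the pointer and value are hypothetical):
```toml
[models.model_name.providers.provider_name]
# ...
extra_body = [
  { pointer = "/generationConfig/topK", value = 40 },  # insert a provider-specific field
  { pointer = "/safety_settings", delete = true },     # remove a field if present
]
```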
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a variant entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
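For illustration, a sketch at the provider level (the header values mirror the examples above):
```toml
[models.model_name.providers.provider_name]
# ...
extra_headers = [
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
  { name = "x-unwanted-header", delete = true },  # hypothetical header to strip
]
```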
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for individual requests to a model provider.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
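A sketch at the provider level (same fields as the model-level example above):
```toml
[models.model_name.providers.provider_name.timeouts]
non_streaming.total_ms = 15000 # 15 seconds total
streaming.ttft_ms = 3000       # 3 seconds to first token
```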
Alternatively, you can enforce a deadline client-side by setting your client's timeout field (or simply killing the request if you're using a different client).
type
- Type: string
- Required: yes
The supported provider types are anthropic, aws_bedrock, aws_sagemaker, azure, deepseek, fireworks, gcp_vertex_anthropic, gcp_vertex_gemini, google_ai_studio_gemini, groq, hyperbolic, mistral, openai, openrouter, sglang, tgi, together, vllm, and xai.
The other fields in the provider sub-section depend on the provider type.
type: "anthropic"
type: "anthropic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::ANTHROPIC_API_KEY unless set otherwise in provider_types.anthropic.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "aws_bedrock"
allow_auto_detect_region
- Type: boolean
- Required: no (default: false)
If set to true, the gateway will attempt to auto-detect the AWS region. Alternatively, you can explicitly specify the region field (recommended).
model_id
- Type: string
- Required: yes
Some values of model_id require a special prefix (e.g. the us. prefix in us.anthropic.claude-3-7-sonnet-20250219-v1:0).
See the AWS documentation on inference profiles.
region
- Type: string
- Required: no (default: based on credentials if set, otherwise us-east-1)
type: "aws_sagemaker"
type: "aws_sagemaker"
allow_auto_detect_region
- Type: boolean
- Required: no (default: false)
If set to true, the gateway will attempt to auto-detect the AWS region. Alternatively, you can explicitly specify the region field (recommended).
endpoint_name
- Type: string
- Required: yes
hosted_provider
- Type: string
- Required: yes
The aws_sagemaker provider is a wrapper around other providers.
Currently, the only supported hosted_provider options are:
- openai (including any OpenAI-compatible server, e.g. Ollama)
- tgi
model_name
- Type: string
- Required: yes
region
- Type: string
- Required: no (default: based on credentials if set, otherwise us-east-1)
type: "azure"
type: "azure"
The gateway targets a recent Azure OpenAI API version (2025-04-01-preview).
You only need to set the deployment_id and endpoint fields.
deployment_id
- Type: string
- Required: yes
endpoint
- Type: string
- Required: yes
If the endpoint starts with env::, the subsequent value will be treated as an environment variable name and the gateway will attempt to retrieve the value from the environment on startup.
If the endpoint starts with dynamic::, the subsequent value will be treated as a dynamic credential name and the gateway will attempt to retrieve the value from the dynamic_credentials field on each inference where it is needed.
api_key_location
- Type: string
- Required: no (default: env::AZURE_OPENAI_API_KEY unless set otherwise in provider_types.azure.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "deepseek"
model_name
- Type: string
- Required: yes
The supported model names are deepseek-chat (DeepSeek-v3) and deepseek-reasoner (R1).
api_key_location
- Type: string
- Required: no (default: env::DEEPSEEK_API_KEY unless set otherwise in provider_types.deepseek.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "fireworks"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::FIREWORKS_API_KEY unless set otherwise in provider_types.fireworks.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "gcp_vertex_anthropic"
endpoint_id
- Type: string
- Required: no (exactly one of endpoint_id or model_id must be set)
Use model_id for off-the-shelf models and endpoint_id for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of model_id or endpoint_id must be set)
Use model_id for off-the-shelf models and endpoint_id for fine-tuned models and custom endpoints.
project_id
- Type: string
- Required: yes
credential_location
- Type: string
- Required: no (default: env::GCP_CREDENTIALS_PATH unless otherwise set in provider_types.gcp_vertex_anthropic.defaults.credential_location)
The supported values are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME (see the API reference for more details), and file::PATH_TO_CREDENTIALS_FILE.
type: "gcp_vertex_gemini"
endpoint_id
- Type: string
- Required: no (exactly one of endpoint_id or model_id must be set)
Use model_id for off-the-shelf models and endpoint_id for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of model_id or endpoint_id must be set)
project_id
- Type: string
- Required: yes
credential_location
- Type: string
- Required: no (default: env::GCP_CREDENTIALS_PATH unless otherwise set in provider_types.gcp_vertex_gemini.defaults.credential_location)
The supported values are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME (see the API reference for more details), and file::PATH_TO_CREDENTIALS_FILE.
type: "google_ai_studio_gemini"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::GOOGLE_AI_STUDIO_API_KEY unless otherwise set in provider_types.google_ai_studio.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "groq"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::GROQ_API_KEY unless otherwise set in provider_types.groq.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "hyperbolic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::HYPERBOLIC_API_KEY unless otherwise set in provider_types.hyperbolic.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "mistral"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::MISTRAL_API_KEY unless otherwise set in provider_types.mistral.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "openai"
api_base
- Type: string
- Required: no (default: https://api.openai.com/v1/)
You can use the api_base field to target an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
api_key_location
- Type: string
- Required: no (default: env::OPENAI_API_KEY unless otherwise set in provider_types.openai.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
api_type
- Type: string
- Required: no (default: chat_completions)
Set to chat_completions for the standard Chat Completions API.
Set to responses to use the Responses API, which provides access to built-in tools like web search and reasoning capabilities.
include_encrypted_reasoning
- Type: boolean
- Required: no (default: false)
Only applies when api_type = "responses".
model_name
- Type: string
- Required: yes
provider_tools
- Type: array of objects
- Required: no (default: [])
Only applies when api_type = "responses".
type: "openrouter"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::OPENROUTER_API_KEY unless otherwise set in provider_types.openrouter.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "sglang"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: none)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
type: "together"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::TOGETHER_API_KEY unless otherwise set in provider_types.together.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "vllm"
api_base
- Type: string
- Required: yes (default: http://localhost:8000/v1/)
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::VLLM_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
type: "xai"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::XAI_API_KEY unless otherwise set in provider_types.xai.defaults.api_key_location)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
type: "tgi"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: none)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
[embedding_models.model_name]
The [embedding_models.model_name] section defines the behavior of an embedding model.
You can define multiple models by including multiple [embedding_models.model_name] sections.
A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below).
If your model_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define embedding-0.1 as [embedding_models."embedding-0.1"].
routing
- Type: array of strings
- Required: yes
The routing field lists the provider names to try, which must be defined in the providers sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fallback to subsequent providers in order if the request is not successful.
timeout_ms
- Type: integer
- Required: no
[embedding_models.model_name.providers.provider_name]
The providers sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple [embedding_models.model_name.providers.provider_name] sections.
If your provider_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define vllm.internal as [embedding_models.model_name.providers."vllm.internal"].
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to the embedding model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
You can also set extra_body at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
timeout_ms
- Type: integer
- Required: no
type
- Type: string
- Required: yes
type: "openai"
type: "openai"
api_base
- Type: string
- Required: no (default: https://api.openai.com/v1/)
You can use the api_base field to target an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
model_name
- Type: string
- Required: yes
api_key_location
- Type: string
- Required: no (default: env::OPENAI_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference for more details).
[provider_types]
The provider_types section of the configuration allows users to specify global settings that are related to the handling of a particular inference provider type (like "openai" or "anthropic"), such as where to look by default for credentials.
[provider_types.anthropic]
defaults.api_key_location
- Type: string
- Required: no (default: env::ANTHROPIC_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.azure]
defaults.api_key_location
- Type: string
- Required: no (default: env::AZURE_OPENAI_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.deepseek]
defaults.api_key_location
- Type: string
- Required: no (default: env::DEEPSEEK_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.fireworks]
defaults.api_key_location
- Type: string
- Required: no (default: env::FIREWORKS_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.gcp_vertex_anthropic]
defaults.credential_location
- Type: string
- Required: no (default: env::GCP_CREDENTIALS_PATH)
The supported values are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME (see the API reference for more details), and file::PATH_TO_CREDENTIALS_FILE.
[provider_types.gcp_vertex_gemini]
batch
- Type: object
- Required: no (default: null)
The batch object allows you to configure batch processing for GCP Vertex models.
Today we support batch inference through GCP Vertex using Google Cloud Storage as documented here.
To do this you must also have object_storage (see the object_storage section) configured using GCP.
The batch object supports the following configuration:
storage_type
- Type: string
- Required: no (default: "none")
The supported values are "cloud_storage" and "none".
input_uri_prefix
- Type: string
- Required: yes when storage_type is "cloud_storage"
output_uri_prefix
- Type: string
- Required: yes when storage_type is "cloud_storage"
defaults.credential_location
- Type: string
- Required: no (default: env::GCP_CREDENTIALS_PATH)
The supported values are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME (see the API reference for more details), and file::PATH_TO_CREDENTIALS_FILE.
[provider_types.google_ai_studio]
defaults.api_key_location
- Type: string
- Required: no (default: env::GOOGLE_AI_STUDIO_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.groq]
defaults.api_key_location
- Type: string
- Required: no (default: env::GROQ_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.hyperbolic]
defaults.api_key_location
- Type: string
- Required: no (default: env::HYPERBOLIC_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.mistral]
defaults.api_key_location
- Type: string
- Required: no (default: env::MISTRAL_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.openai]
defaults.api_key_location
- Type: string
- Required: no (default: env::OPENAI_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.openrouter]
defaults.api_key_location
- Type: string
- Required: no (default: env::OPENROUTER_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.together]
defaults.api_key_location
- Type: string
- Required: no (default: env::TOGETHER_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[provider_types.xai]
defaults.api_key_location
- Type: string
- Required: no (default: env::XAI_API_KEY)
The supported values are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference for more details).
[functions.function_name]
The [functions.function_name] section defines the behavior of a function.
You can define multiple functions by including multiple [functions.function_name] sections.
A function can have multiple variants, and each variant is defined in the variants sub-section (see below).
A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).
If your function_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define summarize-2.0 as [functions."summarize-2.0"].
assistant_schema
- Type: string (path)
- Required: no
description
- Type: string
- Required: no
system_schema
- Type: string (path)
- Required: no
type
- Type: string
- Required: yes
The supported function types are chat and json.
Most other fields in the function section depend on the function type.
type: "chat"
type: "chat"
parallel_tool_calls
- Type: boolean
- Required: no
tool_choice
- Type: string
- Required: no (default: auto)
The supported values are:
- none: The function should not use any tools.
- auto: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- required: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- { specific = "tool_name" }: The model should use a specific tool. The tool must be defined in the tools field (see below).
tools
- Type: array of strings
- Required: no (default: [])
The tools must be defined in [tools.tool_name] sections (see below).
type: "json"
output_schema
- Type: string (path)
- Required: no (default: {}, the empty JSON schema that accepts any valid JSON output)
user_schema
- Type: string (path)
- Required: no
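For illustration, a sketch of a json function (the function name and schema path are hypothetical):
```toml
[functions.extract-entities]
type = "json"
output_schema = "functions/extract-entities/output_schema.json"
```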
[functions.function_name.variants.variant_name]
The variants sub-section defines the behavior of a specific variant of a function.
You can define multiple variants by including multiple [functions.function_name.variants.variant_name] sections.
If your variant_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct as [functions.function_name.variants."llama-3.1-8b-instruct"].
type
- Type: string
- Required: yes
| Type | Description |
|---|---|
| chat_completion | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
| experimental_best_of_n | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
| experimental_chain_of_thought | Encourages the model to reason step by step using a chain-of-thought prompting strategy, which is particularly useful for tasks requiring logical reasoning or multi-step problem-solving. Only available for non-streaming requests to JSON functions. |
| experimental_dynamic_in_context_learning | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
| experimental_mixture_of_n | Generates multiple response candidates with other variants, and combines the responses using a fuser model. |
type: "chat_completion"
type: "chat_completion"
assistant_template
- Type: string (path)
- Required: no
If the template uses variables, they must be defined in the assistant_schema field.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant’s model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
You can also set extra_body for a model provider entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
Example: `extra_body`
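A sketch at the variant level (the function, variant, and pointer are hypothetical):
```toml
[functions.my_function.variants.my_variant]
# ...
extra_body = [
  { pointer = "/reasoning_effort", value = "high" },  # hypothetical provider-specific field
]
```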
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a model provider entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
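A sketch at the variant level (the header name and value mirror the examples above):
```toml
[functions.my_function.variants.my_variant]
# ...
extra_headers = [
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
]
```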
frequency_penalty
- Type: float
- Required: no (default: null)
json_mode
- Type: string
- Required: yes for json functions, forbidden for chat functions
The supported values are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
max_tokens
- Type: integer
- Required: no (default: null)
model
- Type: string
- Required: yes
| To call… | Use this format… |
|---|---|
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model = "my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model = "{provider_type}::{model_name}" |
The shorthand form supports the following provider types: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
For example:
- model = "gpt-4o" calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
- model = "openai::gpt-4o" calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.
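For illustration, a sketch of a chat_completion variant using the shorthand form (the function and variant names are hypothetical):
```toml
[functions.my_function.variants.my_variant]
type = "chat_completion"
model = "openai::gpt-4o"  # or "my_model" to reference [models.my_model]
```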
presence_penalty
- Type: float
- Required: no (default:
null)
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default: null)
system_template
- Type: string (path)
- Required: no
If the template uses variables, they must be defined in the system_schema field.
temperature
- Type: float
- Required: no (default: null)
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT), using the same syntax as the model-level timeouts example above.
top_p
- Type: float, between 0 and 1
- Required: no (default: null)
The top_p value to use for the variant during nucleus sampling.
Typically at most one of top_p and temperature is set.
user_template
- Type: string (path)
- Required: no
If the template uses variables, they must be defined in the user_schema field.
weight
- Type: float
- Required: no (default: 0)
The weight determines the relative probability of sampling this variant. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_best_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below.
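A sketch of that candidates list (promptA and promptB are the hypothetical variant names from the sentence above):
```toml
candidates = ["promptA", "promptA", "promptB"]
```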
The evaluator would then choose the best response from these three candidates.
evaluator
- Type: object
- Required: yes
The evaluator parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates.
The evaluator is configured similarly to a chat_completion variant for a JSON function, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to an evaluator.
The evaluator can optionally include a json_mode parameter (see the json_mode documentation under chat_completion variants). If not specified, it defaults to strict.
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT), using the same syntax as the model-level timeouts example above.
weight
- Type: float
- Required: no (default: 0)
The weight determines the relative probability of sampling this variant. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_chain_of_thought"
The experimental_chain_of_thought variant type uses the same configuration as a chat_completion variant.
type: "experimental_mixture_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below.
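A sketch of that candidates list (promptA and promptB are the hypothetical variant names from the sentence above):
```toml
candidates = ["promptA", "promptA", "promptB"]
```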
The fuser would then combine the three responses.
fuser
- Type: object
- Required: yes for json functions, forbidden for chat functions
The fuser parameter specifies the configuration for the model that will evaluate and combine the candidate responses.
The fuser is configured similarly to a chat_completion variant, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to a fuser.
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT), using the same syntax as the model-level timeouts example above.
weight
- Type: float
- Required: no (default: 0)
The weight determines the relative probability of sampling this variant. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_dynamic_in_context_learning"
embedding_model
- Type: string
- Required: yes
| To call… | Use this format… |
|---|---|
| A model defined as [models.my_model] in your tensorzero.toml configuration file | embedding_model = "my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | embedding_model = "{provider_type}::{model_name}" |
The shorthand form supports the following provider types: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
For example:
- embedding_model = "text-embedding-3-small" calls the text-embedding-3-small model in your configuration.
- embedding_model = "openai::text-embedding-3-small" calls the OpenAI API directly for the text-embedding-3-small model, ignoring the text-embedding-3-small model defined above.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant’s model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
For experimental_dynamic_in_context_learning variants, extra_body only applies to the chat completion request.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
You can also set extra_body for a model provider entry.
The model provider extra_body entries take priority over variant extra_body entries.
Additionally, you can set extra_body at inference-time.
The values provided at inference-time take priority over the values in the configuration file.
Example: `extra_body`
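A sketch for a DICL variant (the function, variant, and pointer are hypothetical; the entry applies to the chat completion request):
```toml
[functions.my_function.variants.my_dicl_variant]
# ...
extra_body = [
  { pointer = "/logit_bias", delete = true },  # hypothetical field to drop
]
```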
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
You can also set extra_headers for a model provider entry.
The model provider extra_headers entries take priority over variant extra_headers entries.
Example: `extra_headers`
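A sketch for a DICL variant (the header name and value mirror the examples above):
```toml
[functions.my_function.variants.my_dicl_variant]
# ...
extra_headers = [
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
]
```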
json_mode
- Type: string
- Required: yes for json functions, forbidden for chat functions
The supported values are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
k
- Type: non-negative integer
- Required: yes
max_distance
- Type: non-negative float
- Required: no (default: none)
max_tokens
- Type: integer
- Required: no (default: null)
model
- Type: string
- Required: yes
| To call… | Use this format… |
|---|---|
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model = "my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model = "{provider_type}::{model_name}" |
The shorthand form supports the following provider types: anthropic, deepseek, fireworks, google_ai_studio_gemini, gcp_vertex_gemini, gcp_vertex_anthropic, hyperbolic, groq, mistral, openai, openrouter, together, and xai.
For example:
- model = "gpt-4o" calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
- model = "openai::gpt-4o" calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default: null)
system_instructions
- Type: string (path)
- Required: no
Unlike system_template, it doesn’t support variables.
This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.
temperature
- Type: float
- Required: no (default: null)
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT).
For example, you can set a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT), using the same syntax as the model-level timeouts example above.
weight
- Type: float
- Required: no (default: 0)
The weight determines the relative probability of sampling this variant. For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0.
The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name.
This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_chain_of_thought"
Besides the type parameter, this variant has the same configuration options as the chat_completion variant type.
Please refer to that documentation to see what options are available.
[metrics]
The [metrics] section defines the behavior of a metric.
You can define multiple metrics by including multiple [metrics.metric_name] sections.
The metric name can’t be comment or demonstration, as those names are reserved for internal use.
If your metric_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define beats-gpt-4.1 as [metrics."beats-gpt-4.1"].
level
- Type: string
- Required: yes
The supported values are inference and episode.
optimize
- Type: string
- Required: yes
The supported values are max and min.
type
- Type: string
- Required: yes
The supported values are boolean and float.
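For illustration, a sketch of a metric definition (the metric name is hypothetical):
```toml
[metrics.task_success]
type = "boolean"
optimize = "max"
level = "inference"
```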
[tools.tool_name]
The [tools.tool_name] section defines the behavior of a tool.
You can define multiple tools by including multiple [tools.tool_name] sections.
If your tool_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define run-python-3.10 as [tools."run-python-3.10"].
You can enable a tool for a function by adding it to the function’s tools field.
description
- Type: string
- Required: yes
parameters
- Type: string (path)
- Required: yes
strict
- Type: boolean
- Required: no (default: false)
If set to true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters.
This typically improves the quality of responses.
Only a few providers support strict JSON generation.
For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
If the provider does not support strict mode, the TensorZero Gateway ignores this field.
name
- Type: string
- Required: no (defaults to the tool ID)
For example, if you define [tools.my_tool] but don’t specify the name, the name will be my_tool.
This field allows you to specify a different name to be sent.
This field is particularly useful if you want to define multiple tools that share the same name (e.g. for different functions).
At inference time, the gateway ensures that an inference request doesn’t have multiple tools with the same name.
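For illustration, a sketch of a tool definition (the tool name and schema path are hypothetical):
```toml
[tools.run-python]
description = "Run a Python script and return its standard output."
parameters = "tools/run-python.json"  # path to a JSON Schema file for the tool's parameters
strict = true
```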
[object_storage]
The [object_storage] section defines the behavior of object storage, which is used for storing images used during multimodal inference.
type
- Type: string
- Required: yes
- s3_compatible: Use an S3-compatible object storage service.
- filesystem: Store images in a local directory.
- disabled: Disable object storage.
type: "s3_compatible"
type: "s3_compatible"
type = "s3_compatible", TensorZero will use an S3-compatible object storage service to store and retrieve images.The TensorZero Gateway will attempt to retrieve credentials from the following resources in order of priority:S3_ACCESS_KEY_IDandS3_SECRET_ACCESS_KEYenvironment variablesAWS_ACCESS_KEY_IDandAWS_SECRET_ACCESS_KEYenvironment variables- Credentials from the AWS SDK (default profile)
type = "s3_compatible", the following fields are available.endpoint
- Type: string
- Required: no (defaults to AWS S3)
bucket_name
- Type: string
- Required: no
You can omit this field if the bucket is specified in the endpoint field.
region
- Type: string
- Required: no
allow_http
- Type: boolean
- Required: no (defaults to false)
If set to true, the TensorZero Gateway will instead use HTTP to access the object storage service.
This is useful for local development (e.g. a local MinIO deployment), but not recommended for production environments.
In production, avoid the allow_http setting and use a secure method of authentication in combination with a production-grade object storage service.
type: "filesystem"
path
- Type: string
- Required: yes
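For illustration, a sketch of filesystem object storage (the directory is hypothetical):
```toml
[object_storage]
type = "filesystem"
path = "/var/lib/tensorzero/object-storage"
```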
type: "disabled"
type: "disabled"
type = "disabled", the TensorZero Gateway will not store or retrieve images.
There are no additional fields available for this type.[rate_limiting]
The [rate_limiting] section allows you to configure granular rate limits for your TensorZero Gateway.
Rate limits help you control usage, manage costs, and prevent abuse.
See Enforce Custom Rate Limits for a comprehensive guide on rate limiting.
enabled
- Type: boolean
- Required: no (default: true)
If set to false, rate limiting rules will not be enforced even if they are defined.
[[rate_limiting.rules]]
Rate limiting rules are defined as an array of rule configurations.
Each rule specifies rate limits for specific resources (model inferences, tokens), time windows, scopes, and priorities.
Rate Limit Fields
You can set rate limits for different resources and time windows using the following field formats:
- model_inferences_per_second
- model_inferences_per_minute
- model_inferences_per_hour
- model_inferences_per_day
- model_inferences_per_week
- model_inferences_per_month
- tokens_per_second
- tokens_per_minute
- tokens_per_hour
- tokens_per_day
- tokens_per_week
- tokens_per_month
Each limit can be specified as a plain integer, or as an object with capacity and refill_rate fields for fine-grained control over the token bucket algorithm.
A plain integer is equivalent to setting capacity and refill_rate to the same value.
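For illustration, a sketch of both forms inside a rule (the numbers are hypothetical):
```toml
[[rate_limiting.rules]]
always = true
model_inferences_per_minute = 600  # plain integer form
tokens_per_minute = { capacity = 100000, refill_rate = 50000 }  # token bucket form
```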
The bucket format allows you to configure burst capacity independently from the sustained rate.
priority
- Type: integer
- Required: yes (unless always is set to true)
always
- Type: boolean
- Required: no (mutually exclusive with priority)
true, this rule will always be applied regardless of priority.
This is useful for global fallback limits.
You cannot specify both always and priority in the same rule.
scope
- Type: array of scope objects
- Required: no (default: [])
Each scope object matches requests by tag:
- tag_key (string): The tag key to match against.
- tag_value (string): The tag value to match against. This can be:
  - tensorzero::each: Apply the limit separately to each unique value of the tag.
  - tensorzero::total: Apply the limit to the aggregate of all requests with this tag, regardless of the tag’s value.
  - Any other string: Apply the limit only when the tag has this specific value.
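For illustration, a sketch of a per-user scope (the tag key is hypothetical):
```toml
[[rate_limiting.rules]]
priority = 1
tokens_per_minute = 10000
scope = [
  { tag_key = "user_id", tag_value = "tensorzero::each" },  # separate limit per user_id value
]
```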