The configuration file is organized into the following sections: `gateway`, `clickhouse`, `postgres`, `models`, `model_providers`, `functions`, `variants`, `tools`, `metrics`, `rate_limiting`, and `object_storage`.
[gateway]
The [gateway] section defines the behavior of the TensorZero Gateway.
auth.cache.enabled
- Type: boolean
- Required: no (default: `true`)
auth.cache.ttl_ms
- Type: integer
- Required: no (default: `1000`)
auth.enabled
- Type: boolean
- Required: no (default: `false`)
If set to `true`, all endpoints except `/status` and `/health` will require a valid API key.
You must set up Postgres to use authentication features.
The gateway will fail to start if authentication is enabled but TENSORZERO_POSTGRES_URL is not set.
API keys can be created and managed through the TensorZero UI or CLI.
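Putting these fields together, a gateway with authentication enabled might look like this (the values are illustrative, and the cache is assumed here to cover API key validation lookups):

```toml
[gateway]
# Require a valid API key (Postgres must be configured)
auth.enabled = true
# Cache key validation results for one second
auth.cache.enabled = true
auth.cache.ttl_ms = 1000
```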
base_path
- Type: string
- Required: no (default: `/`)
If `base_path` is set to `/custom/prefix`, the inference endpoint becomes `/custom/prefix/inference` instead of `/inference`.
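For instance, a hypothetical deployment behind a reverse proxy might set:

```toml
[gateway]
base_path = "/custom/prefix"  # inference endpoint becomes /custom/prefix/inference
```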
bind_address
- Type: string
- Required: no (default: `[::]:3000`)
Depending on the operating system, this value binds only to IPv6 (e.g. Windows) or to both (e.g. Linux by default).
You can also set the bind address using the --bind-address CLI flag or the TENSORZERO_GATEWAY_BIND_ADDRESS environment variable.
Only one of these methods can be used at a time.
See Customize the bind address and port for more details.
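As a sketch, binding explicitly to all IPv4 interfaces on port 3000:

```toml
[gateway]
bind_address = "0.0.0.0:3000"
```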
cache.valkey.ttl_s
- Type: integer
- Required: no (default: `86400` = 24 hours)
This TTL applies when the inference cache uses Valkey (i.e. when `TENSORZERO_VALKEY_URL` is set); otherwise, the inference cache uses ClickHouse.
debug
- Type: boolean
- Required: no (default: `false`)
If set to `true`, the gateway will log more verbose errors to assist with debugging.
disable_pseudonymous_usage_analytics
- Type: boolean
- Required: no (default: `false`)
If set to `true`, TensorZero will not collect or share pseudonymous usage analytics.
export.otlp.traces.enabled
- Type: boolean
- Required: no (default: `false`)
export.otlp.traces.extra_headers
- Type: object (map of string to string)
- Required: no (default: `{}`)
export.otlp.traces.format
- Type: either “opentelemetry” or “openinference”
- Required: no (default: `"opentelemetry"`)
If set to `"opentelemetry"`, TensorZero will set `gen_ai` attributes based on the OpenTelemetry GenAI semantic conventions.
If set to "openinference", TensorZero will set attributes based on the OpenInference semantic conventions.
fetch_and_encode_input_files_before_inference
- Type: boolean
- Required: no (default: `false`)
If set to `true`, the gateway will fetch remote input files and send them as a base64-encoded payload in the prompt.
This is recommended to ensure that TensorZero and the model providers see identical inputs, which is important for observability and reproducibility.
If set to false, TensorZero will forward the input file URLs directly to the model provider (when supported) and fetch them for observability in parallel with inference.
This can be more efficient, but may result in different content being observed if the URL content changes between when the provider fetches it and when TensorZero fetches it for observability.
global_outbound_http_timeout_ms
- Type: integer
- Required: no (default: `900000` = 15 minutes)
global_outbound_http_timeout_ms acts as an upper bound for all more specific timeout configurations in your system.
Any variant-level timeouts (e.g., timeouts.non_streaming.total_ms, timeouts.streaming.ttft_ms, timeouts.streaming.total_ms), provider-level timeouts, or embedding model timeouts must be less than or equal to this global timeout.
metrics.tensorzero_inference_latency_overhead_seconds_buckets
- Type: array of floats
- Required: no (default: `[0.001, 0.01, 0.1]`)
This field configures the histogram buckets for the `tensorzero_inference_latency_overhead_seconds` Prometheus metric.
This metric tracks the latency overhead introduced by TensorZero on HTTP requests.
The buckets must be in strictly ascending order and contain at least one value.
observability.async_writes
- Type: boolean
- Required: no (default: `false`)
You cannot enable `async_writes` and `batch_writes` at the same time.
observability.batch_writes
- Type: object
- Required: no (default: disabled)
With `batch_writes` enabled, multiple records are collected and written together in batches to improve efficiency.
The `batch_writes` object supports the following fields:
- `enabled` (boolean): must be set to `true` to enable batch writes
- `flush_interval_ms` (integer, optional): maximum time in milliseconds to wait before flushing a batch (default: `100`)
- `max_rows` (integer, optional): maximum number of rows to collect before flushing a batch (default: `1000`)
- `max_rows_postgres` (integer, optional): maximum number of rows to collect before flushing a Postgres batch (default: `max_rows`)
You cannot enable `async_writes` and `batch_writes` at the same time.
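A sketch of enabling batch writes with the default thresholds:

```toml
[gateway.observability.batch_writes]
enabled = true            # required to turn on batch writes
flush_interval_ms = 100   # flush at least every 100 ms
max_rows = 1000           # ... or once 1000 rows have accumulated
```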
observability.enabled
- Type: boolean
- Required: no (default: `null`)
If set to `true`, the gateway will throw an error on startup if it fails to validate the ClickHouse connection.
If null, the gateway will log a warning but continue if ClickHouse is not available, and it will use ClickHouse if available.
If false, the gateway will not use ClickHouse.
observability.disable_automatic_migrations
- Type: boolean
- Required: no (default: `false`)
If set to `true`, the migrations are not applied upon launch and must instead be applied manually by running `docker run --rm -e TENSORZERO_CLICKHOUSE_URL=$TENSORZERO_CLICKHOUSE_URL tensorzero/gateway:{version} --run-clickhouse-migrations` or `docker compose run --rm gateway --run-clickhouse-migrations`.
If false, then the migrations are run automatically upon launch.
relay
Configure gateway relay to forward inference requests through another TensorZero Gateway.
See Centralize auth, rate limits, and more for a complete guide.
api_key_location
- Type: string or object
- Required: no
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none.
See the API reference and Credential Management for more details.
gateway_url
- Type: string (URL)
- Required: no
template_filesystem_access.base_path
- Type: string
- Required: no (default: disabled)
Set `template_filesystem_access.base_path` to allow MiniJinja templates to load sub-templates using the {% include %} and {% import %} directives.
The directives will be relative to base_path and can only access files within that directory or its subdirectories.
The base_path can be absolute or relative to the configuration file’s location.
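For example, assuming your templates live in a directory next to the configuration file (the path is hypothetical):

```toml
[gateway.template_filesystem_access]
base_path = "./templates"  # relative to the config file's location
```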
[models.model_name]
The [models.model_name] section defines the behavior of a model.
You can define multiple models by including multiple [models.model_name] sections.
A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below).
If your model_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct as [models."llama-3.1-8b-instruct"].
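Putting this together, a minimal sketch of a model definition with an escaped name (the provider details are hypothetical):

```toml
[models."llama-3.1-8b-instruct"]
routing = ["fireworks"]

[models."llama-3.1-8b-instruct".providers.fireworks]
type = "fireworks"
model_name = "accounts/fireworks/models/llama-v3p1-8b-instruct"  # hypothetical provider model name
```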
routing
- Type: array of strings
- Required: yes
A list of provider names, as defined in the `providers` sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fallback to subsequent providers in order if the request is not successful.
skip_relay
- Type: boolean
- Required: no (default: `false`)
If set to `true`, this model will bypass the relay gateway and call its providers directly.
This is useful when you want certain models to skip centralized controls like rate limits or credential management.
timeouts
- Type: object
- Required: no
The `timeouts` object allows you to set granular timeouts for requests to this model.
You can define timeouts for non-streaming and streaming requests separately:
- `timeouts.non_streaming.total_ms` — the total time allowed for a non-streaming request.
- `timeouts.streaming.ttft_ms` — the time allowed to receive the first token (TTFT) in a streaming request.
- `timeouts.streaming.total_ms` — the total time allowed for the entire streaming request (measured from request start).
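A sketch of these timeouts for a hypothetical model named `my_model`:

```toml
[models.my_model.timeouts]
non_streaming.total_ms = 30000  # 30 s for a complete non-streaming request
streaming.ttft_ms = 5000        # 5 s to first token
streaming.total_ms = 120000     # 2 min for the entire stream
```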
namespace
- Type: string
- Required: no
[models.model_name.providers.provider_name]
The providers sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple [models.model_name.providers.provider_name] sections.
If your provider_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define vllm.internal as [models.model_name.providers."vllm.internal"].
batch_cost
- Type: array of objects
- Required: no
If batch inference is billed at different rates, use `batch_cost` to configure the batch-specific rates.
If batch_cost is not configured, the gateway falls back to the regular cost configuration.
Since batch inference responses are always non-streaming, batch_cost only supports a single pointer (not the pointer_nonstreaming/pointer_streaming split available in cost).
Each entry has the following fields:
- `pointer` (string) — a JSON Pointer into the provider's batch response.
- Rate (exactly one of the following):
  - `cost_per_million` (number) — cost in dollars per million units of the extracted value.
  - `cost_per_unit` (number) — cost in dollars per unit of the extracted value.
- `required` (boolean, default: `false`) — if `true` and the field is missing from the response, the cost for the entire request is reported as `null`. If `false`, missing fields contribute $0 to the total.
The total cost is the sum of `extracted_value * rate` across all entries.
cost
- Type: array of objects
- Required: no
Each entry has the following fields:
- Pointer (exactly one of the following):
  - `pointer` — a single JSON Pointer used for both streaming and non-streaming responses.
  - `pointer_nonstreaming` and `pointer_streaming` — separate JSON Pointers for non-streaming and streaming responses respectively. Both must be provided together.
- Rate (exactly one of the following):
  - `cost_per_million` (number) — cost in dollars per million units of the extracted value. This is the most common option for token-based pricing.
  - `cost_per_unit` (number) — cost in dollars per unit of the extracted value. This is useful for features billed per unit (e.g. web search calls), or when the provider response contains a cost value directly (set to `1.0`).
- `required` (boolean, default: `false`) — if `true` and the field is missing from the response, the cost for the entire request is reported as `null`. If `false`, missing fields contribute $0 to the total.
The total cost is the sum of `extracted_value * rate` across all entries.
Rates can be negative, which is useful when a provider double-counts cached tokens in its base token field (e.g. OpenAI includes cached tokens in prompt_tokens).
In that case, use a negative rate to subtract the discount.
Other providers (e.g. Anthropic) report cached tokens separately, so you sum them at their discounted rate instead.
See Track usage and cost for more information.
Example: Token-based pricing (e.g. OpenAI)
Token-based pricing typically uses `cost_per_million`.
Include optional fields for usage that may vary (e.g. cached input tokens).
OpenAI includes cached tokens in `prompt_tokens`, so use a negative rate to subtract the discount.
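A sketch of this setup (model and provider names, pointers, and rates are illustrative, not official pricing):

```toml
[[models.my_model.providers.my_openai.cost]]
pointer = "/usage/prompt_tokens"
cost_per_million = 2.50

[[models.my_model.providers.my_openai.cost]]
pointer = "/usage/completion_tokens"
cost_per_million = 10.00

# Cached tokens are already counted in prompt_tokens,
# so a negative rate subtracts the discount
[[models.my_model.providers.my_openai.cost]]
pointer = "/usage/prompt_tokens_details/cached_tokens"
cost_per_million = -1.25
```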
Example: Split pointers for streaming vs. non-streaming
Some providers report usage at different locations in streaming and non-streaming responses; use `pointer_nonstreaming` and `pointer_streaming` to handle this.
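A sketch with hypothetical pointer locations:

```toml
[[models.my_model.providers.my_provider.cost]]
pointer_nonstreaming = "/usage/input_tokens"
pointer_streaming = "/message/usage/input_tokens"  # hypothetical streaming location
cost_per_million = 3.00
```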
Example: Per-unit pricing (e.g. web search)
Use `cost_per_unit` to set the price per occurrence.
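A sketch of per-unit pricing for a provider-billed feature (the pointer is hypothetical):

```toml
[[models.my_model.providers.my_provider.cost]]
pointer = "/usage/web_search_requests"  # hypothetical counter in the response
cost_per_unit = 0.01                    # $0.01 per occurrence
```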
This example configures Anthropic’s web search tool at 0.01 per search):discard_unknown_chunks
- Type: boolean
- Required: no (default: `false`)
If set to `true`, the gateway will silently discard streaming chunks with unknown or unsupported types instead of forwarding them.
A warning is emitted for each discarded chunk.
By default (false), unknown chunks are forwarded in the stream as-is.
This is useful when a model provider introduces new chunk types that TensorZero doesn’t yet support, and you’d prefer to drop them rather than receive unrecognized data.
extra_body
- Type: array of objects (see below)
- Required: no
The `extra_body` field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- `pointer`: A JSON Pointer string specifying where to modify the request body.
  - Use `-` as the final path element to append to an array (e.g., `/messages/-` appends to `messages`).
- One of the following:
  - `value`: The value to insert at that location; it can be of any type including nested types.
  - `delete = true`: Deletes the field at the specified location, if present.
Example: `extra_body`
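A sketch of `extra_body` in both modes (the model, provider, and field names are hypothetical):

```toml
# Insert a provider-specific field into the request body
[[models.my_model.providers.my_provider.extra_body]]
pointer = "/some_provider_field"
value = { nested = true }

# Remove a field TensorZero would otherwise send
[[models.my_model.providers.my_provider.extra_body]]
pointer = "/temperature"
delete = true
```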
extra_headers
- Type: array of objects (see below)
- Required: no
The `extra_headers` field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
name(string): The name of the header to modify (e.g.anthropic-beta)- One of the following:
value(string): The value of the header (e.g.token-efficient-tools-2025-02-19)delete = true: Deletes the header from the request, if present
Example: `extra_headers`
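A sketch of `extra_headers` using the header named above (the model and provider names are hypothetical):

```toml
[[models.my_model.providers.my_provider.extra_headers]]
name = "anthropic-beta"
value = "token-efficient-tools-2025-02-19"

# Remove a header, if present
[[models.my_model.providers.my_provider.extra_headers]]
name = "x-unwanted-header"
delete = true
```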
timeouts
- Type: object
- Required: no
The `timeouts` object allows you to set granular timeouts for individual requests to a model provider.
You can define timeouts for non-streaming and streaming requests separately:
- `timeouts.non_streaming.total_ms` — the total time allowed for a non-streaming request.
- `timeouts.streaming.ttft_ms` — the time allowed to receive the first token (TTFT) in a streaming request.
- `timeouts.streaming.total_ms` — the total time allowed for the entire streaming request (measured from request start).
Alternatively, you can enforce timeouts client-side using your client's `timeout` field (or simply killing the request if you're using a different client).
type
- Type: string
- Required: yes
The supported provider types are `anthropic`, `aws_bedrock`, `aws_sagemaker`, `azure`, `deepseek`, `fireworks`, `gcp_vertex_anthropic`, `gcp_vertex_gemini`, `google_ai_studio_gemini`, `groq`, `hyperbolic`, `mistral`, `openai`, `openrouter`, `sglang`, `tgi`, `together`, `vllm`, and `xai`.
The other fields in the provider sub-section depend on the provider type.
type: "anthropic"
model_name
- Type: string
- Required: yes
api_key_location
- Type: string or object
- Required: no (default: `env::ANTHROPIC_API_KEY` unless set otherwise in `provider_type.anthropic.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none`.
See the API reference and Credential Management for more details.
api_base
- Type: string
- Required: no (default: `https://api.anthropic.com/v1/messages`)
Override this to use a different Anthropic-compatible endpoint (e.g. `https://example.com/v1/messages`).
provider_tools
- Type: array of objects
- Required: no (default: `[]`)
This field can be set statically in the configuration file or dynamically at inference time via the `provider_tools` parameter in the `/inference` endpoint or `tensorzero::provider_tools` in the OpenAI-compatible endpoint.
See the Inference API Reference for more details on dynamic usage.
type: "aws_bedrock"
access_key_id
- Type: string
- Required: no
The supported locations are:
- `env::VAR_NAME` — read from an environment variable at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `sdk` — use the AWS SDK default credential chain
If this field is set, `secret_access_key` must also be provided.
Both fields must use the same source type (both `env::`, both `dynamic::`, or both `sdk`).
You can't combine this field with `api_key`.
api_key
- Type: string
- Required: no
If set, the gateway authenticates with `Authorization: Bearer <token>` instead of AWS SigV4 signing.
The supported locations are:
- `env::VAR_NAME` — read from an environment variable at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
If neither `api_key` nor IAM credentials are configured, TensorZero checks for the `AWS_BEARER_TOKEN_BEDROCK` environment variable first.
If found, it uses bearer token authentication.
Otherwise, it falls back to the AWS SDK credential chain.
You can't combine this field with IAM credentials like `access_key_id`.
endpoint_url
- Type: string
- Required: no
The supported values are:
- Static URLs (e.g., `https://bedrock-runtime.us-east-1.amazonaws.com`)
- `env::VAR_NAME` — read from an environment variable at startup
- `path::/path/to/file` — read from a file at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `none` — treat as unspecified
China regions (e.g. `cn-north-1`, `cn-northwest-1`) and AWS GovCloud regions use different DNS suffixes than standard AWS regions.
For these partitions, you must specify the full `endpoint_url`.
model_id
- Type: string
- Required: yes
region
- Type: string
- Required: yes
The supported values are:
- Static values (e.g., `"us-east-1"`)
- `env::VAR_NAME` — read from an environment variable at startup
- `path::/path/to/file` — read from a file at startup
- `dynamic::key_name` — resolve at inference time from the `credentials` field
- `sdk` — use AWS SDK auto-detection (may slow down initialization in non-AWS environments)
secret_access_key
- Type: string
- Required: no (required if
access_key_idis specified)
env::VAR_NAME- read from environment variable at startupdynamic::key_name- resolve at request time fromcredentialsfieldsdk- use AWS SDK default credential chain
access_key_id must also be provided.
Both fields must use the same source type (both env::, both dynamic::, or both sdk).You can’t combine this field with api_key.session_token
- Type: string
- Required: no
The supported locations are:
- `env::VAR_NAME` — read from an environment variable at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `sdk` — use the AWS SDK default credential chain
If this field is set, `access_key_id` and `secret_access_key` must also be provided.
All three fields must use the same source type (all `env::`, all `dynamic::`, or all `sdk`).
You can't combine this field with `api_key`.
type: "aws_sagemaker"
access_key_id
- Type: string
- Required: no
The supported locations are:
- `env::VAR_NAME` — read from an environment variable at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `sdk` — use the AWS SDK default credential chain
If this field is set, `secret_access_key` must also be provided.
Both fields must use the same source type (both `env::`, both `dynamic::`, or both `sdk`).
endpoint_name
- Type: string
- Required: yes
endpoint_url
- Type: string
- Required: no
The supported values are:
- Static URLs (e.g., `https://runtime.sagemaker.us-east-1.amazonaws.com`)
- `env::VAR_NAME` — read from an environment variable at startup
- `path::/path/to/file` — read from a file at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `none` — treat as unspecified
China regions (e.g. `cn-north-1`, `cn-northwest-1`) and AWS GovCloud regions use different DNS suffixes than standard AWS regions.
For these partitions, you must specify the full `endpoint_url`.
hosted_provider
- Type: string
- Required: yes
The `aws_sagemaker` provider is a wrapper around other providers.
Currently, the only supported `hosted_provider` options are:
- `openai` (including any OpenAI-compatible server, e.g. Ollama)
- `tgi`
model_name
- Type: string
- Required: yes
region
- Type: string
- Required: yes
The supported values are:
- Static values (e.g., `"us-east-1"`)
- `env::VAR_NAME` — read from an environment variable at startup
- `path::/path/to/file` — read from a file at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `sdk` — use AWS SDK auto-detection (may slow down initialization in non-AWS environments)
secret_access_key
- Type: string
- Required: no (required if
access_key_idis specified)
env::VAR_NAME- read from environment variable at startupdynamic::key_name- resolve at request time fromcredentialsfieldsdk- use AWS SDK default credential chain
access_key_id must also be provided.
Both fields must use the same source type (both env::, both dynamic::, or both sdk).session_token
- Type: string
- Required: no
The supported locations are:
- `env::VAR_NAME` — read from an environment variable at startup
- `dynamic::key_name` — resolve at request time from the `credentials` field
- `sdk` — use the AWS SDK default credential chain
If this field is set, `access_key_id` and `secret_access_key` must also be provided.
All three fields must use the same source type (all `env::`, all `dynamic::`, or all `sdk`).
type: "azure"
The provider targets Azure OpenAI API version `2025-04-01-preview`.
You only need to set the `deployment_id` and `endpoint` fields.
api_key_location
- Type: string or object
- Required: no (default: `env::AZURE_API_KEY` unless set otherwise in `provider_type.azure.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none`.
See the API reference and Credential Management for more details.
deployment_id
- Type: string
- Required: yes
endpoint
- Type: string
- Required: yes
If the endpoint starts with `env::`, the succeeding value will be treated as an environment variable name and the gateway will attempt to retrieve the value from the environment on startup.
If the endpoint starts with `dynamic::`, the succeeding value will be treated as a dynamic credential name and the gateway will attempt to retrieve the value from the `dynamic_credentials` field on each inference where it is needed.
type: "deepseek"
api_key_location
- Type: string or object
- Required: no (default: `env::DEEPSEEK_API_KEY` unless set otherwise in `provider_type.deepseek.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
The supported models are `deepseek-chat` (DeepSeek-v3) and `deepseek-reasoner` (R1).
type: "fireworks"
api_key_location
- Type: string or object
- Required: no (default: `env::FIREWORKS_API_KEY` unless set otherwise in `provider_type.fireworks.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
type: "gcp_vertex_anthropic"
credential_location
- Type: string or object
- Required: no (default: `path_from_env::GCP_VERTEX_CREDENTIALS_PATH` unless otherwise set in `provider_type.gcp_vertex_anthropic.defaults.credential_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `path_from_env::ENVIRONMENT_VARIABLE`, `dynamic::CREDENTIALS_ARGUMENT_NAME`, `path::PATH_TO_CREDENTIALS_FILE`, and `sdk` (use the Google Cloud SDK to auto-discover credentials).
See the API reference and Credential Management for more details.
endpoint_id
- Type: string
- Required: no (exactly one of `endpoint_id` or `model_id` must be set)
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of `model_id` or `endpoint_id` must be set)
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.
project_id
- Type: string
- Required: yes
provider_tools
- Type: array of objects
- Required: no (default: `[]`)
This field can be set statically in the configuration file or dynamically at inference time via the `provider_tools` parameter in the `/inference` endpoint or `tensorzero::provider_tools` in the OpenAI-compatible endpoint.
See the Inference API Reference for more details on dynamic usage.
type: "gcp_vertex_gemini"
credential_location
- Type: string or object
- Required: no (default: `path_from_env::GCP_VERTEX_CREDENTIALS_PATH` unless otherwise set in `provider_type.gcp_vertex_gemini.defaults.credential_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::PATH_TO_CREDENTIALS_FILE`, `path_from_env::ENVIRONMENT_VARIABLE`, `dynamic::CREDENTIALS_ARGUMENT_NAME`, `path::PATH_TO_CREDENTIALS_FILE`, and `sdk` (use the Google Cloud SDK to auto-discover credentials).
See the API reference and Credential Management for more details.
endpoint_id
- Type: string
- Required: no (exactly one of `endpoint_id` or `model_id` must be set)
Use `model_id` for off-the-shelf models and `endpoint_id` for fine-tuned models and custom endpoints.
location
- Type: string
- Required: yes
model_id
- Type: string
- Required: no (exactly one of `model_id` or `endpoint_id` must be set)
project_id
- Type: string
- Required: yes
type: "google_ai_studio_gemini"
api_key_location
- Type: string or object
- Required: no (default: `env::GOOGLE_AI_STUDIO_API_KEY` unless otherwise set in `provider_type.google_ai_studio.defaults.credential_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
type: "groq"
api_key_location
- Type: string or object
- Required: no (default: `env::GROQ_API_KEY` unless otherwise set in `provider_type.groq.defaults.credential_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
reasoning_format
- Type: string
- Required: no (default: none)
If set to `"parsed"`, Groq will return structured reasoning content (thinking) separately from the main response.
This should only be set for reasoning models that support the `reasoning_format` parameter (e.g. `qwen/qwen3-32b`).
Non-reasoning models will reject requests with this parameter.
type: "hyperbolic"
api_key_location
- Type: string or object
- Required: no (default: `env::HYPERBOLIC_API_KEY` unless otherwise set in `provider_type.hyperbolic.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
type: "mistral"
api_key_location
- Type: string or object
- Required: no (default: `env::MISTRAL_API_KEY` unless otherwise set in `provider_type.mistral.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
prompt_mode
- Type: string
- Required: no
Set to `"reasoning"` to enable reasoning output with models like `magistral-small-latest` and `magistral-medium-latest`.
type: "openai"
api_base
- Type: string
- Required: no (default: `https://api.openai.com/v1/`)
You can use the `api_base` field to use an API provider that is compatible with the OpenAI API.
However, many providers are only "approximately compatible" with the OpenAI API, so you might need to use a specialized model provider in those cases.
api_key_location
- Type: string or object
- Required: no (default: `env::OPENAI_API_KEY` unless otherwise set in `provider_types.openai.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference and Credential Management for more details).
api_type
- Type: string
- Required: no (default: `chat_completions`)
Use `chat_completions` for the standard Chat Completions API.
Set to `responses` to use the Responses API, which provides access to built-in tools like web search and reasoning capabilities.
include_encrypted_reasoning
- Type: boolean
- Required: no (default: `false`)
This field only applies when `api_type = "responses"`.
model_name
- Type: string
- Required: yes
provider_tools
- Type: array of objects
- Required: no (default: `[]`)
For example, OpenAI offers a `web_search` tool that enables the model to search the web for information.
This field can be set statically in the configuration file or dynamically at inference time via the `provider_tools` parameter in the `/inference` endpoint or `tensorzero::provider_tools` in the OpenAI-compatible endpoint.
See the Inference API Reference for more details on dynamic usage.
type: "openrouter"
api_key_location
- Type: string or object
- Required: no (default: `env::OPENROUTER_API_KEY` unless otherwise set in `provider_types.openrouter.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
type: "sglang"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string or object
- Required: no (default: `none`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference and Credential Management for more details).
type: "together"
api_key_location
- Type: string or object
- Required: no (default: `env::TOGETHER_API_KEY` unless otherwise set in `provider_types.together.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
type: "vllm"
api_base
- Type: string
- Required: yes (e.g. `http://localhost:8000/v1/`)
model_name
- Type: string
- Required: yes
api_key_location
- Type: string or object
- Required: no (default: `env::VLLM_API_KEY`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference and Credential Management for more details).
type: "xai"
api_key_location
- Type: string or object
- Required: no (default: `env::XAI_API_KEY` unless otherwise set in `provider_types.xai.defaults.api_key_location`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE` and `dynamic::ARGUMENT_NAME` (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
type: "tgi"
api_base
- Type: string
- Required: yes
api_key_location
- Type: string or object
- Required: no (default: `none`)
You can also provide an object with `default` and `fallback` fields for credential fallback support.
The supported locations are `env::ENVIRONMENT_VARIABLE`, `dynamic::ARGUMENT_NAME`, and `none` (see the API reference and Credential Management for more details).
[embedding_models.model_name]
The [embedding_models.model_name] section defines the behavior of an embedding model.
You can define multiple models by including multiple [embedding_models.model_name] sections.
A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below).
If your model_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define embedding-0.1 as [embedding_models."embedding-0.1"].
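For example, a sketch of an embedding model with an escaped name (the provider details are hypothetical):

```toml
[embedding_models."embedding-0.1"]
routing = ["openai"]

[embedding_models."embedding-0.1".providers.openai]
type = "openai"
model_name = "text-embedding-3-small"  # hypothetical model name
```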
routing
- Type: array of strings
- Required: yes
A list of provider names, as defined in the `providers` sub-section (see below).
The TensorZero Gateway will attempt to route a request to the first provider in the list, and fallback to subsequent providers in order if the request is not successful.
timeout_ms
- Type: integer
- Required: no
[embedding_models.model_name.providers.provider_name]
The providers sub-section defines the behavior of a specific provider for a model.
You can define multiple providers by including multiple [embedding_models.model_name.providers.provider_name] sections.
If your provider_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define vllm.internal as [embedding_models.model_name.providers."vllm.internal"].
cost
- Type: array of objects
- Required: no
Each cost entry is an object with the following fields:
- pointer (string) — a JSON Pointer into the provider’s response.
- Rate (exactly one of the following):
  - cost_per_million (number) — cost in dollars per million units of the extracted value. This is the most common option for token-based pricing.
  - cost_per_unit (number) — cost in dollars per unit of the extracted value. This is useful for features billed per unit, or when the provider response contains a cost value directly (set to 1.0).
- required (boolean, default: false) — if true and the field is missing from the response, the cost for the entire request is reported as null. If false, missing fields contribute $0 to the total.
The total cost for a request is the sum of extracted_value * rate across all entries.
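For illustration, a hedged sketch of a cost entry written as a TOML array-of-tables; the pointer path and rate are hypothetical, so check your provider's actual response shape:

```toml
[[embedding_models.my-embedding-model.providers.openai.cost]]
pointer = "/usage/prompt_tokens"   # hypothetical JSON Pointer into the provider response
cost_per_million = 0.02            # hypothetical rate: $0.02 per million tokens
required = false                   # a missing field contributes $0 to the total
```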
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to the embedding model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
  - Use - as the final path element to append to an array (e.g., /messages/- appends to messages)
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to an embedding model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify
- One of the following:
  - value (string): The value of the header
  - delete = true: Deletes the header from the request, if present
timeout_ms
- Type: integer
- Required: no
type
- Type: string
- Required: yes
type: "openai"
type: "openai"
api_base
- Type: string
- Required: no (default:
https://api.openai.com/v1/)
You can set the api_base field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
api_key_location
- Type: string or object
- Required: no (default:
env::OPENAI_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE, dynamic::ARGUMENT_NAME, and none (see the API reference and Credential Management for more details).
model_name
- Type: string
- Required: yes
[provider_types]
The provider_types section of the configuration allows users to specify global settings that are related to the handling of a particular inference provider type (like "openai" or "anthropic"), such as where to look by default for credentials.
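For example, a sketch that changes the default credential location for a provider type; the environment variable name is illustrative:

```toml
[provider_types.openai]
defaults.api_key_location = "env::MY_OPENAI_API_KEY"
```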
[provider_types.anthropic]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::ANTHROPIC_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.azure]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::AZURE_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.deepseek]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::DEEPSEEK_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.fireworks]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::FIREWORKS_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
sft
- Type: object
- Required: no (default:
null)
The sft object configures supervised fine-tuning for Fireworks models.
account_id
- Type: string
- Required: yes
[provider_types.gcp_vertex_anthropic]
defaults.credential_location
- Type: string or object
- Required: no (default:
path_from_env::GCP_VERTEX_CREDENTIALS_PATH)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME, path::PATH_TO_CREDENTIALS_FILE, and path_from_env::ENVIRONMENT_VARIABLE (see the API reference and Credential Management for more details).
[provider_types.gcp_vertex_gemini]
batch
- Type: object
- Required: no (default:
null)
The batch object allows you to configure batch inference for GCP Vertex Gemini models using Google Cloud Storage. The batch object supports the following configuration:
storage_type
- Type: string
- Required: no (default:
"none")
"cloud_storage" and "none" are supported.input_uri_prefix
- Type: string
- Required: yes when
storage_type is "cloud_storage"
output_uri_prefix
- Type: string
- Required: yes when
storage_type is "cloud_storage"
sft
- Type: object
- Required: no (default:
null)
The sft object configures supervised fine-tuning for GCP Vertex Gemini models.
bucket_name
- Type: string
- Required: yes
bucket_path_prefix
- Type: string
- Required: no
kms_key_name
- Type: string
- Required: no
project_id
- Type: string
- Required: yes
region
- Type: string
- Required: yes
"us-central1").service_account
- Type: string
- Required: no
defaults.credential_location
- Type: string or object
- Required: no (default:
path_from_env::GCP_VERTEX_CREDENTIALS_PATH)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::PATH_TO_CREDENTIALS_FILE, dynamic::CREDENTIALS_ARGUMENT_NAME, path::PATH_TO_CREDENTIALS_FILE, and path_from_env::ENVIRONMENT_VARIABLE (see the API reference and Credential Management for more details).
[provider_types.google_ai_studio]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::GOOGLE_AI_STUDIO_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.groq]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::GROQ_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.hyperbolic]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::HYPERBOLIC_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.mistral]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::MISTRAL_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.openai]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::OPENAI_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.openrouter]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::OPENROUTER_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[provider_types.together]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::TOGETHER_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
sft
- Type: object
- Required: no (default:
null)
The sft object configures supervised fine-tuning for Together models.
hf_api_token
- Type: string
- Required: no
wandb_api_key
- Type: string
- Required: no
wandb_base_url
- Type: string
- Required: no
wandb_project_name
- Type: string
- Required: no
[provider_types.xai]
defaults.api_key_location
- Type: string or object
- Required: no (default:
env::XAI_API_KEY)
An object value supports default and fallback fields for credential fallback support. The supported locations are env::ENVIRONMENT_VARIABLE and dynamic::ARGUMENT_NAME (see the API reference and Credential Management for more details).
[functions.function_name]
The [functions.function_name] section defines the behavior of a function.
You can define multiple functions by including multiple [functions.function_name] sections.
A function can have multiple variants, and each variant is defined in the variants sub-section (see below).
A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).
If your function_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define summarize-2.0 as [functions."summarize-2.0"].
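A minimal sketch of a function with a single variant; the function, variant, and model names are illustrative:

```toml
[functions.summarize]
type = "chat"

[functions.summarize.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```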
assistant_schema
- Type: string (path)
- Required: no
description
- Type: string
- Required: no
system_schema
- Type: string (path)
- Required: no
type
- Type: string
- Required: yes
The supported types are chat and json.
Most other fields in the function section depend on the function type.
type: "chat"
type: "chat"
parallel_tool_calls
- Type: boolean
- Required: no
tool_choice
- Type: string
- Required: no (default:
auto)
The supported values are:
- none: The function should not use any tools.
- auto: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- required: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- { specific = "tool_name" }: The model should use a specific tool. The tool must be defined in the tools field (see below).
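For example, a hedged sketch of a chat function that forces a specific tool; the function and tool names are illustrative:

```toml
[functions.extract-weather]
type = "chat"
tools = ["get_weather"]
tool_choice = { specific = "get_weather" }
```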
tools
- Type: array of strings
- Required: no (default:
[])
Each tool must be defined in a [tools.tool_name] section (see below).
type: "json"
output_schema
- Type: string (path)
- Required: no (default:
{}, the empty JSON schema that accepts any valid JSON output)
user_schema
- Type: string (path)
- Required: no
[functions.function_name.variants.variant_name]
The variants sub-section defines the behavior of a specific variant of a function.
You can define multiple variants by including multiple [functions.function_name.variants.variant_name] sections.
If your variant_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct as [functions.function_name.variants."llama-3.1-8b-instruct"].
type
- Type: string
- Required: yes
| Type | Description |
|---|---|
| chat_completion | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
| experimental_best_of_n | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
| experimental_dynamic_in_context_learning | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
| experimental_mixture_of_n | Generates multiple response candidates with other variants, and combines the responses using a fuser model. |
type: "chat_completion"
type: "chat_completion"
assistant_template
- Type: string (path)
- Required: no
If the template uses any variables, they must be defined in the assistant_schema field.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant’s model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
  - Use - as the final path element to append to an array (e.g., /messages/- appends to messages)
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
Example: `extra_body`
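A hedged sketch of what such entries might look like as TOML arrays-of-tables; the pointers and value are hypothetical provider-specific fields, and the function and variant names are illustrative:

```toml
[[functions.my_function.variants.my_variant.extra_body]]
pointer = "/service_tier"   # hypothetical provider-specific field to set
value = "flex"

[[functions.my_function.variants.my_variant.extra_body]]
pointer = "/logit_bias"     # hypothetical field to remove from the request body
delete = true
```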
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
Example: `extra_headers`
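A hedged sketch using the header mentioned above; the function and variant names are illustrative:

```toml
[[functions.my_function.variants.my_variant.extra_headers]]
name = "anthropic-beta"
value = "token-efficient-tools-2025-02-19"
```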
frequency_penalty
- Type: float
- Required: no (default:
null)
json_mode
- Type: string
- Required: yes for
json functions, forbidden for chat functions
The supported values are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
max_tokens
- Type: integer
- Required: no (default:
null)
model
- Type: string
- Required: yes
| To call… | Use this format… |
|---|---|
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model_name = "my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model_name = "{provider_type}::{model_name}" |
model = "gpt-4o"calls thegpt-4omodel in your configuration, which supports fallback fromopenaitoazure. See Retries & Fallbacks for details.model = "openai::gpt-4o"calls the OpenAI API directly for thegpt-4omodel using the Chat Completions API, ignoring thegpt-4omodel defined above.model = "openai::responses::gpt-5-codex"calls the OpenAI Responses API directly for thegpt-5-codexmodel. See OpenAI Responses API for details.
presence_penalty
- Type: float
- Required: no (default:
null)
reasoning_effort
- Type: string
- Required: no (default:
null)
For Anthropic, this value enables adaptive thinking (thinking.type: "adaptive") with the specified effort level via output_config.effort.
For Gemini, this value corresponds to generationConfig.thinkingConfig.thinkingLevel.
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default:
null)
service_tier
- Type: string
- Required: no (default:
"auto")
The supported values are:
- auto: Let the provider automatically select the appropriate service tier (default).
- default: Use the provider’s standard service tier.
- priority: Use a higher-priority service tier with lower latency (may have higher costs).
- flex: Use a lower-priority service tier optimized for cost efficiency (may have higher latency).
stop_sequences
- Type: array of strings
- Required: no (default:
null)
system_template
- Type: string (path)
- Required: no
If the template uses any variables, they must be defined in the system_schema field.
temperature
- Type: float
- Required: no (default:
null)
thinking_budget_tokens
- Type: integer
- Required: no (default:
null)
For Anthropic, this value corresponds to thinking.budget_tokens (manual thinking mode with thinking.type: "enabled").
For Gemini, this value corresponds to generationConfig.thinkingConfig.thinkingBudget.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately:
- timeouts.non_streaming.total_ms — the total time allowed for a non-streaming request.
- timeouts.streaming.ttft_ms — the time allowed to receive the first token (TTFT) in a streaming request.
- timeouts.streaming.total_ms — the total time allowed for the entire streaming request (measured from request start).
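For example, a sketch of variant-level timeouts; the function and variant names and the values are illustrative:

```toml
[functions.my_function.variants.my_variant.timeouts]
non_streaming.total_ms = 30_000   # 30s budget for non-streaming requests
streaming.ttft_ms = 5_000         # 5s budget to receive the first token
streaming.total_ms = 60_000       # 60s budget for the entire stream
```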
top_p
- Type: float, between 0 and 1
- Required: no (default:
null)
The top_p to use for the variant during nucleus sampling.
Typically at most one of top_p and temperature is set.
verbosity
- Type: string
- Required: no (default:
null)
user_template
- Type: string (path)
- Required: no
If the template uses any variables, they must be defined in the user_schema field.
type: "experimental_best_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below.
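The snippet described above might be sketched as follows; the variant and evaluator model names are illustrative:

```toml
[functions.my_function.variants.best_of_n]
type = "experimental_best_of_n"
# Two responses from promptA, one from promptB
candidates = ["promptA", "promptA", "promptB"]

[functions.my_function.variants.best_of_n.evaluator]
model = "openai::gpt-4o-mini"
```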
The evaluator would then choose the best response from these three candidates.
evaluator
- Type: object
- Required: yes
The evaluator parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates.
The evaluator is configured similarly to a chat_completion variant for a JSON function, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to an evaluator.
The evaluator can optionally include a json_mode parameter (see the json_mode documentation under chat_completion variants). If not specified, it defaults to strict.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately:
- timeouts.non_streaming.total_ms — the total time allowed for a non-streaming request.
- timeouts.streaming.ttft_ms — the time allowed to receive the first token (TTFT) in a streaming request.
- timeouts.streaming.total_ms — the total time allowed for the entire streaming request (measured from request start).
type: "experimental_mixture_of_n"
type: "experimental_mixture_of_n"
candidates
- Type: list of strings
- Required: yes
The candidates parameter specifies a list of variant names used to generate candidate responses.
For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below.
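The snippet described above might be sketched as follows; the variant and fuser model names are illustrative:

```toml
[functions.my_function.variants.mixture]
type = "experimental_mixture_of_n"
# Two responses from promptA, one from promptB
candidates = ["promptA", "promptA", "promptB"]

[functions.my_function.variants.mixture.fuser]
model = "openai::gpt-4o-mini"
```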
The fuser would then combine the three responses.
fuser
- Type: object
- Required: yes for
json functions, forbidden for chat functions
The fuser parameter specifies the configuration for the model that will evaluate and combine the elements.
The fuser is configured similarly to a chat_completion variant, but without the type field.
The prompts here should be prompts that you would use to solve the original problem, as the gateway has special-purpose handling and templates to convert them to a fuser.
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately:
- timeouts.non_streaming.total_ms — the total time allowed for a non-streaming request.
- timeouts.streaming.ttft_ms — the time allowed to receive the first token (TTFT) in a streaming request.
- timeouts.streaming.total_ms — the total time allowed for the entire streaming request (measured from request start).
type: "experimental_dynamic_in_context_learning"
type: "experimental_dynamic_in_context_learning"
embedding_model
- Type: string
- Required: yes
| To call… | Use this format… |
|---|---|
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model_name = "my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model_name = "{provider_type}::{model_name}" |
embedding_model = "text-embedding-3-small"calls thetext-embedding-3-smallmodel in your configuration.embedding_model = "openai::text-embedding-3-small"calls the OpenAI API directly for thetext-embedding-3-smallmodel, ignoring thetext-embedding-3-smallmodel defined above.
extra_body
- Type: array of objects (see below)
- Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a variant’s model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
For experimental_dynamic_in_context_learning variants, extra_body only applies to the chat completion request.
Each object in the array must have two fields:
- pointer: A JSON Pointer string specifying where to modify the request body
  - Use - as the final path element to append to an array (e.g., /messages/- appends to messages)
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present.
Example: `extra_body`
extra_headers
- Type: array of objects (see below)
- Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have two fields:
- name (string): The name of the header to modify (e.g. anthropic-beta)
- One of the following:
  - value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
  - delete = true: Deletes the header from the request, if present
Example: `extra_headers`
json_mode
- Type: string
- Required: yes for
json functions, forbidden for chat functions
The supported values are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
k
- Type: non-negative integer
- Required: yes
max_distance
- Type: non-negative float
- Required: no (default: none)
max_tokens
- Type: integer
- Required: no (default:
null)
model
- Type: string
- Required: yes
| To call… | Use this format… |
|---|---|
| A model defined as [models.my_model] in your tensorzero.toml configuration file | model_name = "my_model" |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | model_name = "{provider_type}::{model_name}" |
model = "gpt-4o"calls thegpt-4omodel in your configuration, which supports fallback fromopenaitoazure. See Retries & Fallbacks for details.model = "openai::gpt-4o"calls the OpenAI API directly for thegpt-4omodel using the Chat Completions API, ignoring thegpt-4omodel defined above.model = "openai::responses::gpt-5-codex"calls the OpenAI Responses API directly for thegpt-5-codexmodel. See OpenAI Responses API for details.
retries
- Type: object with optional keys num_retries and max_delay_s
- Required: no (defaults to num_retries = 0 and max_delay_s = 10)
The num_retries parameter defines the number of retries (not including the initial request).
The max_delay_s parameter defines the maximum delay between retries.
seed
- Type: integer
- Required: no (default:
null)
system_instructions
- Type: string (path)
- Required: no
Unlike system_template, it doesn’t support variables.
This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.
temperature
- Type: float
- Required: no (default:
null)
timeouts
- Type: object
- Required: no
The timeouts object allows you to set granular timeouts for requests using this variant.
You can define timeouts for non-streaming and streaming requests separately:
- timeouts.non_streaming.total_ms — the total time allowed for a non-streaming request.
- timeouts.streaming.ttft_ms — the time allowed to receive the first token (TTFT) in a streaming request.
- timeouts.streaming.total_ms — the total time allowed for the entire streaming request (measured from request start).
[functions.function_name.experimentation]
This section configures experimentation (A/B testing) over a set of variants in a function.
At inference time, the gateway will sample a variant from the function to complete the request.
By default, the gateway will sample a variant uniformly at random.
TensorZero supports multiple types of experiments that can help you learn about the relative performance of the variants.
type
- Type: string
- Required: yes
| Type | Description |
|---|---|
| static | Samples variants according to fixed probabilities. Supports both uniform sampling (from a list of variant names) and weighted sampling (from a map of variant names to weights). |
| adaptive | Samples variants according to probabilities that dynamically update based on accumulating feedback data. Designed to maximize experiment efficiency by minimizing the number of inferences needed to identify the best variant. |
type: "static"
type: "static"
The static type samples variants according to fixed probabilities.
This is the default behavior when no [functions.function_name.experimentation] section is specified.
By default, all variants defined in the function are sampled with equal probability.
You can specify candidate_variants as either a list of variant names (uniform sampling) or a map of variant names to weights (weighted sampling), and fallback_variants for sequential fallback behavior.
candidate_variants
- Type: array of strings or map of strings to floats
- Required: yes
Each candidate variant must be defined as [functions.function_name.variants.variant_name] in the variants sub-section.
When specified as a list of variant names, each variant is sampled with equal probability (uniform sampling).
When specified as a map, each variant is sampled in proportion to its weight; for example, weights of {"variant-a" = 5.0, "variant-b" = 1.0} result in sampling probabilities of 5/6 and 1/6 respectively.
fallback_variants
- Type: array of strings
- Required: no
Each fallback variant must be defined as [functions.function_name.variants.variant_name] in the variants sub-section.
If all candidate variants fail during inference, the gateway will select variants sequentially from fallback_variants (in order, not uniformly).
This behaves like a ranked list where the first active fallback variant is always selected.
Examples
Default uniform sampling (all variants):
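A hedged sketch of both forms; the function and variant names are illustrative:

```toml
# Uniform sampling over explicit candidates, with a sequential fallback
[functions.my_function.experimentation]
type = "static"
candidate_variants = ["variant-a", "variant-b"]
fallback_variants = ["variant-c"]

# Weighted sampling: variant-a is sampled with probability 5/6
# (commented out because the table above already defines experimentation)
# [functions.my_function.experimentation]
# type = "static"
# candidate_variants = { "variant-a" = 5.0, "variant-b" = 1.0 }
```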
type: "adaptive"
The adaptive type samples variants according to probabilities that dynamically update based on accumulating feedback data.
It is designed to efficiently identify the best variant with a specified level of confidence.
candidate_variants
- Type: array of strings
- Required: yes
Each candidate variant must be defined as [functions.function_name.variants.variant_name] in the variants sub-section (see above).
Variants that are not included in candidate_variants will not be sampled.
delta
- Type: float
- Required: no (default: 0.05)
delta must be a probability in the (0, 1) range.
In simple terms, delta is the probability that the algorithm will incorrectly identify a variant as the winner.
A commonly used value in experimentation settings is 0.05, which caps the probability that an epsilon-best variant is not chosen as the winner at 5%.
The algorithm aims to identify a “winner” variant that has the best average value for the chosen metric, or nearly the best (where “best” means highest if optimize = "max" or lowest if optimize = "min" for the chosen metric, and “nearly” is determined by a tolerance epsilon, defined below).
Once this variant is identified, random sampling ceases and the winner variant is used exclusively going forward.
The value delta instantiates a trade-off between the speed of identification and the confidence in the identified variant.
The smaller the value of delta, the higher the chance that the algorithm will correctly identify an epsilon-best variant, and the more data required to do so.
epsilon
- Type: float
- Required: no (default: 0.0)
Larger values of epsilon allow the algorithm to label a winner more quickly.
As an example, consider an experiment over three function variants with underlying (unknown) mean metric values of [0.6, 0.8, 0.85] for a metric with optimize = "max".
If delta = 0.05 and epsilon = 0.05, then the algorithm will label either the second or third variant as the winner with probability at least 1 - delta = 95%.
If delta = 0.05 and epsilon = 0, then the experiment will run longer and the algorithm will label the third variant as the winner with probability at least 95%.
If delta = 0.01 and epsilon = 0, then the experiment will run for even longer, and the algorithm will label the third variant as the winner with probability at least 99%.
It is always possible to set epsilon = 0 to insist on identifying the strictly best variant with high probability.
Reasonable nonzero values of epsilon depend on the scale of the chosen metric.
fallback_variants
- Type: array of strings
- Required: no
Each fallback variant must be defined as [functions.function_name.variants.variant_name] in the variants sub-section (see above).
If inference fails with all of the candidate_variants, then variants will be sampled uniformly at random from fallback_variants.
Feedback for fallback variants will not be used in the experiment itself; the sampling probabilities will be dynamically updated based only on feedback for the candidate_variants.
metric
- Type: string
- Required: yes
The metric must be defined in the [metrics] section.
The adaptive algorithm can handle both inference-level and episode-level metrics.
Plots based on the chosen metric are displayed in the Experimentation section of the Functions tab in the TensorZero UI.
min_prob
- Type: float
- Required: no (default:
0)
min_prob times the number of candidate_variants must not exceed 1.0, since the minimum probabilities for all candidate variants must sum to at most 1.0.
The aim of the adaptive algorithm is to identify an epsilon-best variant, without necessarily differentiating sub-optimal variants, so the primary use for this field is to enable the user to ensure that sufficient data is gathered to learn about the performance of sub-optimal variants.
Note that this field has no effect once the algorithm picks a winner variant, since at that point random sampling ceases and the winner variant is used exclusively.
min_samples_per_variant
- Type: integer
- Required: no (default: 10)
Sampling among candidate_variants will proceed round-robin (deterministically) until each variant has at least min_samples_per_variant feedback data points, at which point random sampling will begin.
It is strongly recommended to set this value to at least 10 so that the feedback sample statistics can stabilize before they are used to guide the sampling probabilities.
update_period_s
- Type: integer
- Required: no (default: 300)
Smaller values of update_period_s relative to the feedback throughput enable the algorithm to more quickly guide the sampling probabilities toward their theoretical optimum, which allows it to more quickly label the “winner” variant.
For example, updating the sampling probabilities every ~100 inferences should lead to faster convergence than updating them every ~500 inferences.
Namespace-Specific Experimentation
You can override the base experimentation config for specific namespaces using the namespaces sub-section.
Each namespace key maps to an experimentation config that follows the same schema as the base config (i.e. type, candidate_variants, etc.).
When a request provides a namespace, the gateway uses the matching namespace config if one exists.
If no matching namespace config is found, the base experimentation config is used as a fallback.
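To illustrate how these fields fit together, here is a sketch of an experimentation config with a namespace override. The function name, variant names, namespace, and type value are hypothetical, and the exact table path may differ in your deployment; consult the schema for your TensorZero version.

```toml
# Hypothetical function "extract_data"; "adaptive" stands in for whatever
# experimentation algorithm type your TensorZero version supports.
[functions.extract_data.experimentation]
type = "adaptive"
candidate_variants = ["variant_a", "variant_b"]
fallback_variants = ["baseline"]
metric = "task_success"      # must be defined under [metrics]
min_prob = 0.05              # 0.05 * 2 candidates = 0.1 <= 1.0
min_samples_per_variant = 10
update_period_s = 300

# Namespace-specific override: same schema as the base config.
[functions.extract_data.experimentation.namespaces.enterprise]
type = "adaptive"
candidate_variants = ["variant_b", "variant_c"]
metric = "task_success"
```

Requests that provide the namespace `enterprise` would use the override; all other requests would fall back to the base config.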
[metrics]
The [metrics] section defines the behavior of a metric.
You can define multiple metrics by including multiple [metrics.metric_name] sections.
The metric name can’t be comment or demonstration, as those names are reserved for internal use.
If your metric_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define beats-gpt-4.1 as [metrics."beats-gpt-4.1"].
description
- Type: string
- Required: no
level
- Type: string
- Required: yes
The supported values are inference and episode.
optimize
- Type: string
- Required: yes
The supported values are max and min.
type
- Type: string
- Required: yes
The supported values are boolean and float.
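Putting these fields together, a metric definition might look like the following (the metric names and description are illustrative):

```toml
# A boolean inference-level metric to maximize.
[metrics.task_success]
type = "boolean"
optimize = "max"
level = "inference"
description = "Whether the model completed the task correctly."

# Metric names that aren't basic strings must be quoted.
[metrics."beats-gpt-4.1"]
type = "boolean"
optimize = "max"
level = "episode"
```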
[tools.tool_name]
The [tools.tool_name] section defines the behavior of a tool.
You can define multiple tools by including multiple [tools.tool_name] sections.
If your tool_name is not a basic string, it can be escaped with quotation marks.
For example, periods are not allowed in basic strings, so you can define run-python-3.10 as [tools."run-python-3.10"].
You can enable a tool for a function by adding it to the function’s tools field.
description
- Type: string
- Required: yes
parameters
- Type: string (path)
- Required: yes
strict
- Type: boolean
- Required: no (default: false)
If true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters.
This typically improves the quality of responses.
Only a few providers support strict JSON generation.
For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
If the provider does not support strict mode, the TensorZero Gateway ignores this field.
name
- Type: string
- Required: no (defaults to the tool ID)
For example, if you define [tools.my_tool] but don’t specify the name, the name will be my_tool.
This field allows you to specify a different name to be sent.
This field is particularly useful if you want to define multiple tools that share the same name (e.g. for different functions).
At inference time, the gateway ensures that an inference request doesn’t have multiple tools with the same name.
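As a sketch, here are two tool definitions that share an inference-time name. The tool IDs, schema paths, and descriptions are illustrative, and the parameters files are assumed to contain JSON Schemas:

```toml
[tools.run_python]
description = "Execute a Python snippet and return stdout."
parameters = "tools/run_python.json"     # illustrative path to the schema file
strict = true

# A second tool ID exposed under the same name, e.g. for a different
# function. A single inference request must not include both.
[tools.run_python_v2]
description = "Execute a Python snippet in a sandbox."
parameters = "tools/run_python_v2.json"  # illustrative path
name = "run_python"
```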
[object_storage]
The [object_storage] section defines the behavior of object storage, which is used for storing images used during multimodal inference.
type
- Type: string
- Required: yes
The supported values are:
- s3_compatible: Use an S3-compatible object storage service.
- filesystem: Store images in a local directory.
- disabled: Disable object storage.
type: "s3_compatible"
If type = "s3_compatible", TensorZero will use an S3-compatible object storage service to store and retrieve images.
The TensorZero Gateway will attempt to retrieve credentials from the following resources in order of priority:
- S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
- Credentials from the AWS SDK (default profile)
If type = "s3_compatible", the following fields are available.
endpoint
- Type: string
- Required: no (defaults to AWS S3)
bucket_name
- Type: string
- Required: no
If not set, the bucket must be specified in the endpoint field.
region
- Type: string
- Required: no
allow_http
- Type: boolean
- Required: no (default: false)
If true, the TensorZero Gateway will use HTTP instead of HTTPS to access the object storage service.
This is useful for local development (e.g. a local MinIO deployment), but not recommended for production environments.
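As a sketch, an S3-compatible configuration pointing at a local MinIO deployment might look like the following (the endpoint, bucket name, and region are illustrative):

```toml
[object_storage]
type = "s3_compatible"
endpoint = "http://localhost:9000"  # illustrative: a local MinIO deployment
bucket_name = "tensorzero-images"   # illustrative bucket name
region = "us-east-1"
allow_http = true                   # needed for the plain-HTTP endpoint above
```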
type: "filesystem"
path
- Type: string
- Required: yes
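For local development without an object storage service, the filesystem type only needs a directory path (the path shown is illustrative):

```toml
[object_storage]
type = "filesystem"
path = "/var/lib/tensorzero/object-storage"  # illustrative local directory
```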
type: "disabled"
If type = "disabled", the TensorZero Gateway will not store or retrieve images.
There are no additional fields available for this type.
[postgres]
The [postgres] section defines the configuration for Postgres connectivity.
Postgres is required for certain TensorZero features including adaptive experimentation and authentication.
Postgres can also be used for rate limiting, though Valkey is recommended for high-throughput deployments.
You can connect to Postgres by setting the TENSORZERO_POSTGRES_URL environment variable.
connection_pool_size
- Type: integer
- Required: no (default: 20)
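A minimal sketch: the connection URL comes from the environment rather than the config file, so the [postgres] section only tunes the pool (the URL shown is illustrative).

```toml
# The connection URL is provided out-of-band, e.g.:
#   export TENSORZERO_POSTGRES_URL="postgres://user:pass@localhost:5432/tensorzero"
[postgres]
connection_pool_size = 20  # documented default
```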
[rate_limiting]
The [rate_limiting] section allows you to configure granular rate limits for your TensorZero Gateway.
Rate limits help you control usage, manage costs, and prevent abuse.
See Enforce Custom Rate Limits for a comprehensive guide on rate limiting.
default_cost
- Type: number (dollars)
- Required: no (default: 1.00)
enabled
- Type: boolean
- Required: no (default: true)
If true and there are rate limiting rules, the gateway validates that a valid rate limiting backend (Postgres or Valkey) is available, and fails to start otherwise.
When false, rate limiting rules will not be enforced even if they are defined.
[[rate_limiting.rules]]
Rate limiting rules are defined as an array of rule configurations.
Each rule specifies rate limits for specific resources (model inferences, tokens, cost), time windows, scopes, and priorities.
Rate Limit Fields
You can set rate limits for different resources and time windows using the following field formats:
- model_inferences_per_second, model_inferences_per_minute, model_inferences_per_hour, model_inferences_per_day, model_inferences_per_week, model_inferences_per_month
- tokens_per_second, tokens_per_minute, tokens_per_hour, tokens_per_day, tokens_per_week, tokens_per_month
- cost_per_second, cost_per_minute, cost_per_hour, cost_per_day, cost_per_week, cost_per_month
For model_inferences and tokens, this is an integer. For cost, this is a number in dollars.
Alternatively, each rate limit field accepts an object with capacity and refill_rate fields for fine-grained control over the token bucket algorithm.
The simple numeric format sets both capacity and refill_rate to the same value.
The bucket format allows you to configure burst capacity independently from the sustained rate.
priority
- Type: integer
- Required: yes (unless always is set to true)
always
- Type: boolean
- Required: no (mutually exclusive with priority)
If true, this rule will always be applied regardless of priority.
This is useful for global fallback limits.
You cannot specify both always and priority in the same rule.
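A sketch of the two rule styles: an always-on fallback and a prioritized rule. The limits are illustrative, and the inline-table syntax for the bucket format is an assumption based on the documented capacity and refill_rate fields:

```toml
[rate_limiting]
enabled = true

# Global fallback limit that always applies.
[[rate_limiting.rules]]
always = true
model_inferences_per_minute = 1000

# Prioritized rule using the bucket format for burst control:
# up to 500 tokens at once, refilling at 100 tokens per second.
[[rate_limiting.rules]]
priority = 1
tokens_per_second = { capacity = 500, refill_rate = 100 }
```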
scope
- Type: array of scope objects
- Required: no (default: [])
- Tags:
  - tag_key (string): The tag key to match against.
  - tag_value (string): The tag value to match against. This can be:
    - tensorzero::each: Apply the limit separately to each unique value of the tag.
    - tensorzero::total: Apply the limit to the aggregate of all requests with this tag, regardless of the tag’s value.
    - Any other string: Apply the limit only when the tag has this specific value.
- API Key Public ID (requires authentication to be enabled):
  - api_key_public_id (string): The API key public ID to match against. This can be:
    - tensorzero::each: Apply the limit separately to each API key.
    - A specific 12-character public ID: Apply the limit only to requests authenticated with this API key.
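The scope objects above can be sketched as follows; the tag key, limits, and priorities are illustrative:

```toml
# Per-customer daily spend cap: applied separately to each unique
# value of the "customer_id" tag.
[[rate_limiting.rules]]
priority = 2
cost_per_day = 10.00
scope = [
  { tag_key = "customer_id", tag_value = "tensorzero::each" },
]

# Per-API-key inference limit (requires authentication to be enabled).
[[rate_limiting.rules]]
priority = 3
model_inferences_per_hour = 100
scope = [
  { api_key_public_id = "tensorzero::each" },
]
```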