The configuration file is the backbone of TensorZero. It defines the behavior of the gateway, including the models and their providers, functions and their variants, tools, metrics, and more. Developers express the behavior of LLM calls by defining the relevant prompt templates, schemas, and other parameters in this configuration file. You can see an example configuration file here. The configuration file is a TOML file with a few major sections (TOML tables): gateway, clickhouse, models, model_providers, functions, variants, tools, and metrics.
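To show how these sections fit together, here is a minimal skeleton. It is a sketch rather than a complete working configuration: every name in it (my-model, my-provider, my-function, my-variant, my-metric, my-tool, and the parameters path) is a hypothetical placeholder, and the `# ...` comments mark fields that are omitted.

```toml
# tensorzero.toml (skeleton; all names are hypothetical placeholders)

[gateway]
# gateway-wide settings, e.g. bind_address

[models.my-model]
routing = ["my-provider"]

[models.my-model.providers.my-provider]
type = "openai"
# ... provider-specific fields

[functions.my-function]
type = "chat"

[functions.my-function.variants.my-variant]
type = "chat_completion"
# ... variant-specific fields

[metrics.my-metric]
type = "boolean"
optimize = "max"
level = "inference"

[tools.my-tool]
description = "..."
parameters = "./tools/my-tool.json"
```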

[gateway]

The [gateway] section defines the behavior of the TensorZero Gateway.

base_path

  • Type: string
  • Required: no (default: /)
If set, the gateway will prefix its HTTP endpoints with this base path. For example, if base_path is set to /custom/prefix, the inference endpoint will become /custom/prefix/inference instead of /inference.
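The prefix from the example above would be configured as follows (a minimal sketch):

```toml
# tensorzero.toml
[gateway]
# ...
base_path = "/custom/prefix"  # the inference endpoint becomes /custom/prefix/inference
# ...
```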

bind_address

  • Type: string
  • Required: no (default: [::]:3000)
Defines the socket address to bind the TensorZero Gateway to. You can bind the gateway to IPv4 and/or IPv6 addresses. To bind to an IPv6 address, you can set this field to a value like [::]:3000. Depending on the operating system, this value binds only to IPv6 (e.g. Windows) or to both (e.g. Linux by default).
// tensorzero.toml
[gateway]
# ...
bind_address = "0.0.0.0:3000"
# ...

debug

  • Type: boolean
  • Required: no (default: false)
Typically, TensorZero will not include inputs and outputs in logs or errors to avoid leaking sensitive data. It may be helpful during development to be able to see more information about requests and responses. When this field is set to true, the gateway will log more verbose errors to assist with debugging.
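A minimal sketch of turning this on for local development:

```toml
# tensorzero.toml
[gateway]
# ...
debug = true  # more verbose errors; intended for development use
# ...
```

Because the more verbose errors may expose request and response data, it is probably best to leave this at its default outside development.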

enable_template_filesystem_access

  • Type: boolean
  • Required: no (default: false)
Enabling this setting will allow MiniJinja templates to load sub-templates from the file system (using the include directive). Paths must be relative to tensorzero.toml, and can only access files in that directory or its sub-directories.
Make sure to sanitize all user-provided template data before using this setting. Otherwise, a malicious input could read unintended files in the file system.
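A minimal sketch of enabling the setting. The template path in the comment is a hypothetical example, not a file that TensorZero provides.

```toml
# tensorzero.toml
[gateway]
# ...
enable_template_filesystem_access = true
# ...

# A template could then load a sub-template relative to tensorzero.toml, e.g.:
#   {% include "templates/shared/header.minijinja" %}
```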

export.otlp.traces.enabled

  • Type: boolean
  • Required: no (default: false)
Enable exporting traces to an external OpenTelemetry-compatible observability system.
Note that you will still need to set the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable. See the guide linked above for details.
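A minimal sketch of the gateway side of this setup. The endpoint itself is supplied through the environment variable, so it does not appear in the file.

```toml
# tensorzero.toml
[gateway]
# ...
export.otlp.traces.enabled = true
# ...

# Also set OTEL_EXPORTER_OTLP_TRACES_ENDPOINT in the gateway's environment.
```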

observability.async_writes

  • Type: boolean
  • Required: no (default: true)
Enabling this setting will improve the latency of the gateway by offloading the responsibility of writing inference responses to ClickHouse to a background task, instead of waiting for ClickHouse to return the inference response.
If you enable this setting, make sure that the gateway lives long enough to complete the writes. This can be problematic in serverless environments that terminate the gateway instance after the response is returned but before the writes are completed.
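For a deployment where the gateway may be terminated right after responding, one option is to turn asynchronous writes off. This is a sketch under that assumption, not a general recommendation.

```toml
# tensorzero.toml
[gateway]
# ...
observability.enabled = true
observability.async_writes = false  # wait for the ClickHouse write before responding
# ...
```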

observability.enabled

  • Type: boolean
  • Required: no (default: null)
Enable the observability features of the TensorZero Gateway. If true, the gateway will throw an error on startup if it fails to validate the ClickHouse connection. If null, the gateway will log a warning but continue if ClickHouse is not available, and it will use ClickHouse if available. If false, the gateway will not use ClickHouse.
// tensorzero.toml
[gateway]
# ...
observability.enabled = true
# ...

[models.model_name]

The [models.model_name] section defines the behavior of a model. You can define multiple models by including multiple [models.model_name] sections. A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below). If your model_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define llama-3.1-8b-instruct as [models."llama-3.1-8b-instruct"].
// tensorzero.toml
[models.claude-3-haiku-20240307]
# fieldA = ...
# fieldB = ...
# ...

[models."llama-3.1-8b-instruct"]
# fieldA = ...
# fieldB = ...
# ...

routing

  • Type: array of strings
  • Required: yes
A list of provider names to route requests to. The providers must be defined in the providers sub-section (see below). The TensorZero Gateway will attempt to route a request to the first provider in the list, and fall back to subsequent providers in order if the request is not successful.
// tensorzero.toml
[models.gpt-4o]
# ...
routing = ["openai", "azure"]
# ...

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

timeouts

  • Type: object
  • Required: no
The timeouts object allows you to set granular timeouts for requests to this model. You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT). For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
[models.model_name]
# ...
timeouts = { non_streaming.total_ms = 15000, streaming.ttft_ms = 3000 }
# ...
The specified timeouts apply to the scope of an entire model inference request, including all retries and fallbacks across its providers. You can also set timeouts at the variant level and provider level. Multiple timeouts can be active simultaneously.

[models.model_name.providers.provider_name]

The providers sub-section defines the behavior of a specific provider for a model. You can define multiple providers by including multiple [models.model_name.providers.provider_name] sections. If your provider_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define vllm.internal as [models.model_name.providers."vllm.internal"].
// tensorzero.toml
[models.gpt-4o]
# ...
routing = ["openai", "azure"]
# ...

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

extra_body

  • Type: array of objects (see below)
  • Required: no
The extra_body field allows you to modify the request body that TensorZero sends to a model provider. This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet. Each object in the array must have two fields:
  • pointer: A JSON Pointer string specifying where to modify the request body
  • One of the following:
    • value: The value to insert at that location; it can be of any type including nested types
    • delete = true: Deletes the field at the specified location, if present.
You can also set extra_body for a variant entry. The model provider extra_body entries take priority over variant extra_body entries. Additionally, you can set extra_body at inference-time. The values provided at inference-time take priority over the values in the configuration file.
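The following sketch shows the shape of a provider-level extra_body array. The model and provider names and all three pointers (/custom_flag, /custom_options, /stop) are hypothetical; substitute fields that your provider actually accepts.

```toml
# tensorzero.toml
[models.my-model.providers.my-provider]
# ...
extra_body = [
  # Insert a (hypothetical) provider-specific field at the top level of the body
  { pointer = "/custom_flag", value = true },
  # Values can be nested
  { pointer = "/custom_options", value = { mode = "fast", retries = 2 } },
  # Delete a field from the request body, if present
  { pointer = "/stop", delete = true },
]
# ...
```

Each entry is written as a single-line inline table, which keeps the array valid TOML.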

extra_headers

  • Type: array of objects (see below)
  • Required: no
The extra_headers field allows you to set or overwrite the request headers that TensorZero sends to a model provider. This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet. Each object in the array must have two fields:
  • name (string): The name of the header to modify (e.g. anthropic-beta)
  • One of the following:
    • value (string): The value of the header (e.g. token-efficient-tools-2025-02-19)
    • delete = true: Deletes the header from the request, if present
You can also set extra_headers for a variant entry. The model provider extra_headers entries take priority over variant extra_headers entries.
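A sketch of a provider-level extra_headers array, reusing the header name and value mentioned above. The model and provider names and the x-unwanted-header entry are hypothetical.

```toml
# tensorzero.toml
[models.my-model.providers.my-provider]
# ...
extra_headers = [
  # Set (or overwrite) a header
  { name = "anthropic-beta", value = "token-efficient-tools-2025-02-19" },
  # Delete a header from the request, if present (hypothetical header name)
  { name = "x-unwanted-header", delete = true },
]
# ...
```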

timeouts

  • Type: object
  • Required: no
The timeouts object allows you to set granular timeouts for individual requests to a model provider. You can define timeouts for non-streaming and streaming requests separately: timeouts.non_streaming.total_ms corresponds to the total request duration and timeouts.streaming.ttft_ms corresponds to the time to first token (TTFT). For example, the following configuration sets a 15-second timeout for non-streaming requests and a 3-second timeout for streaming requests (TTFT):
[models.model_name.providers.provider_name]
# ...
timeouts = { non_streaming.total_ms = 15000, streaming.ttft_ms = 3000 }
# ...
This setting applies to individual requests to the model provider. If you’re using an advanced variant type that performs multiple requests, the timeout will apply to each request separately. If you’ve defined retries and fallbacks, the timeout will apply to each retry and fallback separately. This setting is particularly useful if you’d like to retry or fall back on a request that’s taking too long. You can also set timeouts at the model level and variant level. Multiple timeouts can be active simultaneously. Separately, you can set a global timeout for the entire inference request using the TensorZero client’s timeout field (or by simply killing the request if you’re using a different client).
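To illustrate how several timeouts can be active at once, the sketch below combines a model-level budget with tighter provider-level limits. The model name, provider names, and all durations are hypothetical choices for illustration.

```toml
# tensorzero.toml
[models.my-model]
# ...
routing = ["primary", "backup"]
# Budget for the whole model request, including fallbacks
timeouts = { non_streaming.total_ms = 30000, streaming.ttft_ms = 6000 }

[models.my-model.providers.primary]
# ...
# Limit for each individual request to this provider
timeouts = { non_streaming.total_ms = 10000, streaming.ttft_ms = 2000 }

[models.my-model.providers.backup]
# ...
timeouts = { non_streaming.total_ms = 15000, streaming.ttft_ms = 3000 }
```

Under the behavior described above, a non-streaming request that times out on primary after 10 seconds can still fall back to backup for up to 15 seconds, which fits within the 30-second model-level budget.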

type

  • Type: string
  • Required: yes
Defines the type of the provider. See Integrations » Model Providers for details. The supported provider types are anthropic, aws_bedrock, aws_sagemaker, azure, deepseek, fireworks, gcp_vertex_anthropic, gcp_vertex_gemini, google_ai_studio_gemini, groq, hyperbolic, mistral, openai, openrouter, sglang, tgi, together, vllm, and xai. The other fields in the provider sub-section depend on the provider type.
// tensorzero.toml
[models.gpt-4o.providers.azure]
# ...
type = "azure"
# ...

[embedding_models.model_name]

The [embedding_models.model_name] section defines the behavior of an embedding model. You can define multiple models by including multiple [embedding_models.model_name] sections. A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below). If your model_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define embedding-0.1 as [embedding_models."embedding-0.1"].
// tensorzero.toml
[embedding_models.openai-text-embedding-3-small]
# fieldA = ...
# fieldB = ...
# ...

[embedding_models."t0-text-embedding-3.5-massive"]
# fieldA = ...
# fieldB = ...
# ...

routing

  • Type: array of strings
  • Required: yes
A list of provider names to route requests to. The providers must be defined in the providers sub-section (see below). The TensorZero Gateway will attempt to route a request to the first provider in the list, and fall back to subsequent providers in order if the request is not successful.
// tensorzero.toml
[embedding_models.model-name]
# ...
routing = ["openai", "alternative-provider"]
# ...

[embedding_models.model-name.providers.openai]
# ...

[embedding_models.model-name.providers.alternative-provider]
# ...

[embedding_models.model_name.providers.provider_name]

The providers sub-section defines the behavior of a specific provider for a model. You can define multiple providers by including multiple [embedding_models.model_name.providers.provider_name] sections. If your provider_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define vllm.internal as [embedding_models.model_name.providers."vllm.internal"].
// tensorzero.toml
[embedding_models.model-name]
# ...
routing = ["openai", "alternative-provider"]
# ...

[embedding_models.model-name.providers.openai]
# ...

[embedding_models.model-name.providers.alternative-provider]
# ...

type

  • Type: string
  • Required: yes
Defines the type of the provider. See Integrations » Model Providers for details. TensorZero currently only supports openai as a provider for embedding models. More integrations are on the way. The other fields in the provider sub-section depend on the provider type.
// tensorzero.toml
[embedding_models.model-name.providers.openai]
# ...
type = "openai"
# ...

[functions.function_name]

The [functions.function_name] section defines the behavior of a function. You can define multiple functions by including multiple [functions.function_name] sections. A function can have multiple variants, and each variant is defined in the variants sub-section (see below). A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models). If your function_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define summarize-2.0 as [functions."summarize-2.0"].
// tensorzero.toml
[functions.draft-email]
# fieldA = ...
# fieldB = ...
# ...

[functions.summarize-email]
# fieldA = ...
# fieldB = ...
# ...

assistant_schema

  • Type: string (path)
  • Required: no
Defines the path to the assistant schema file. The path is relative to the configuration file. If provided, the assistant schema file should contain a JSON Schema for the assistant messages. The variables in the schema are used for templating the assistant messages. If a schema is provided, all function variants must also provide an assistant template (see below).
// tensorzero.toml
[functions.draft-email]
# ...
assistant_schema = "./functions/draft-email/assistant_schema.json"
# ...

[functions.draft-email.variants.prompt-v1]
# ...
assistant_template = "./functions/draft-email/prompt-v1/assistant_template.minijinja"
# ...

description

  • Type: string
  • Required: no
Defines a description of the function. In the future, this description will inform automated optimization recipes.
// tensorzero.toml
[functions.extract_data]
# ...
description = "Extract the sender's name (e.g. 'John Doe'), email address (e.g. 'john.doe@example.com'), and phone number (e.g. '+1234567890') from a customer's email."
# ...

system_schema

  • Type: string (path)
  • Required: no
Defines the path to the system schema file. The path is relative to the configuration file. If provided, the system schema file should contain a JSON Schema for the system message. The variables in the schema are used for templating the system message. If a schema is provided, all function variants must also provide a system template (see below).
// tensorzero.toml
[functions.draft-email]
# ...
system_schema = "./functions/draft-email/system_schema.json"
# ...

[functions.draft-email.variants.prompt-v1]
# ...
system_template = "./functions/draft-email/prompt-v1/system_template.minijinja"
# ...

type

  • Type: string
  • Required: yes
Defines the type of the function. The supported function types are chat and json. Most other fields in the function section depend on the function type.
// tensorzero.toml
[functions.draft-email]
# ...
type = "chat"
# ...

user_schema

  • Type: string (path)
  • Required: no
Defines the path to the user schema file. The path is relative to the configuration file. If provided, the user schema file should contain a JSON Schema for the user messages. The variables in the schema are used for templating the user messages. If a schema is provided, all function variants must also provide a user template (see below).
// tensorzero.toml
[functions.draft-email]
# ...
user_schema = "./functions/draft-email/user_schema.json"
# ...

[functions.draft-email.variants.prompt-v1]
# ...
user_template = "./functions/draft-email/prompt-v1/user_template.minijinja"
# ...

[functions.function_name.variants.variant_name]

The variants sub-section defines the behavior of a specific variant of a function. You can define multiple variants by including multiple [functions.function_name.variants.variant_name] sections. If your variant_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define llama-3.1-8b-instruct as [functions.function_name.variants."llama-3.1-8b-instruct"].
// tensorzero.toml
[functions.draft-email]
# ...

[functions.draft-email.variants."llama-3.1-8b-instruct"]
# ...

[functions.draft-email.variants.claude-3-haiku]
# ...

type

  • Type: string
  • Required: yes
Defines the type of the variant. TensorZero currently supports the following variant types:
| Type | Description |
|---|---|
| chat_completion | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
| experimental_best_of_n | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
| experimental_chain_of_thought | Encourages the model to reason step by step using a chain-of-thought prompting strategy, which is particularly useful for tasks requiring logical reasoning or multi-step problem-solving. Only available for non-streaming requests to JSON functions. |
| experimental_dynamic_in_context_learning | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
| experimental_mixture_of_n | Generates multiple response candidates with other variants, and combines the responses using a fuser model. |
// tensorzero.toml
[functions.draft-email.variants.prompt-v1]
# ...
type = "chat_completion"
# ...

type: "experimental_chain_of_thought"

Besides the type parameter, this variant has the same configuration options as the chat_completion variant type. Please refer to that documentation to see what options are available.
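As a minimal sketch, a chain-of-thought variant would be attached to a JSON function, since this variant type is only available for JSON functions. The function name extract-data and the variant name cot-v1 are hypothetical, and the remaining fields are omitted.

```toml
# tensorzero.toml
[functions.extract-data]
# ...
type = "json"
# ...

[functions.extract-data.variants.cot-v1]
# ...
type = "experimental_chain_of_thought"
# ... remaining fields as for a chat_completion variant
```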

[metrics]

The [metrics.metric_name] section defines the behavior of a metric. You can define multiple metrics by including multiple [metrics.metric_name] sections. The metric name can’t be comment or demonstration, as those names are reserved for internal use. If your metric_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define beats-gpt-3.5 as [metrics."beats-gpt-3.5"].
// tensorzero.toml
[metrics.task-completed]
# fieldA = ...
# fieldB = ...
# ...

[metrics.user-rating]
# fieldA = ...
# fieldB = ...
# ...

level

  • Type: string
  • Required: yes
Defines whether the metric applies to individual inferences or across entire episodes. The supported levels are inference and episode.
// tensorzero.toml
[metrics.valid-output]
# ...
level = "inference"
# ...

[metrics.task-completed]
# ...
level = "episode"
# ...

optimize

  • Type: string
  • Required: yes
Defines whether the metric should be maximized or minimized. The supported values are max and min.
// tensorzero.toml
[metrics.mistakes-made]
# ...
optimize = "min"
# ...

[metrics.user-rating]
# ...
optimize = "max"
# ...

type

  • Type: string
  • Required: yes
Defines the type of the metric. The supported metric types are boolean and float.
// tensorzero.toml
[metrics.user-rating]
# ...
type = "float"
# ...

[metrics.task-completed]
# ...
type = "boolean"
# ...
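
Putting the three required fields together, complete metric definitions might look like the sketch below. The metric names are reused from the examples above, and the particular combinations of values are illustrative assumptions.

```toml
# tensorzero.toml
[metrics.task-completed]
type = "boolean"
optimize = "max"
level = "episode"

[metrics.mistakes-made]
type = "float"
optimize = "min"
level = "inference"
```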

[tools.tool_name]

The [tools.tool_name] section defines the behavior of a tool. You can define multiple tools by including multiple [tools.tool_name] sections. If your tool_name is not a valid TOML bare key (ASCII letters, digits, underscores, and dashes only), wrap it in quotation marks. For example, a period in an unquoted key is read as a table separator, so you must define run-python-3.10 as [tools."run-python-3.10"]. You can enable a tool for a function by adding it to the function’s tools field.
// tensorzero.toml
[functions.weather-chatbot]
# ...
type = "chat"
tools = [
  # ...
  "get-temperature"
  # ...
]
# ...

[tools.get-temperature]
# ...

description

  • Type: string
  • Required: yes
Defines the description of the tool provided to the model. A detailed description of the tool typically improves the quality of responses materially.
// tensorzero.toml
[tools.get-temperature]
# ...
description = "Get the current temperature in a given location (e.g. \"Tokyo\") using the specified unit (must be \"celsius\" or \"fahrenheit\")."
# ...

parameters

  • Type: string (path)
  • Required: yes
Defines the path to the parameters file. The path is relative to the configuration file. This file should contain a JSON Schema for the parameters of the tool.
// tensorzero.toml
[tools.get-temperature]
# ...
parameters = "./tools/get-temperature.json"
# ...

strict

  • Type: boolean
  • Required: no (default: false)
If set to true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters. This typically improves the quality of responses. Only a few providers support strict JSON generation. For example, the TensorZero Gateway uses Structured Outputs for OpenAI. If the provider does not support strict mode, the TensorZero Gateway ignores this field.
// tensorzero.toml
[tools.get-temperature]
# ...
strict = true
# ...

name

  • Type: string
  • Required: no (defaults to the tool ID)
Defines the tool name to be sent to model providers. By default, TensorZero will use the tool ID in the configuration as the tool name sent to model providers. For example, if you define a tool as [tools.my_tool] but don’t specify the name, the name will be my_tool. This field allows you to specify a different name to be sent. This field is particularly useful if you want to define multiple tools that share the same name (e.g. for different functions). At inference time, the gateway ensures that an inference request doesn’t have multiple tools with the same name.
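The sketch below shows two tools that share one provider-facing name while being used by different functions. All tool IDs, function names, and file paths are hypothetical.

```toml
# tensorzero.toml
[tools.get-temperature-celsius]
name = "get_temperature"  # name sent to the model provider
description = "..."
parameters = "./tools/get-temperature-celsius.json"

[tools.get-temperature-fahrenheit]
name = "get_temperature"  # same provider-facing name, different tool ID
description = "..."
parameters = "./tools/get-temperature-fahrenheit.json"

[functions.weather-chatbot-eu]
type = "chat"
tools = ["get-temperature-celsius"]

[functions.weather-chatbot-us]
type = "chat"
tools = ["get-temperature-fahrenheit"]
```

Each function enables only one of the two tools, so no single inference request should see two tools named get_temperature.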

[object_storage]

The [object_storage] section defines the behavior of object storage, which is used for storing images used during multimodal inference.

type

  • Type: string
  • Required: yes
Defines the type of object storage to use. The supported types are:
  • s3_compatible: Use an S3-compatible object storage service.
  • filesystem: Store images in a local directory.
  • disabled: Disable object storage.
See the following sections for more details on each type.
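
As a minimal sketch, the simplest configuration turns object storage off. The other two types take additional type-specific fields that are not shown here.

```toml
# tensorzero.toml
[object_storage]
type = "disabled"
```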