Configuration Reference
The configuration file is the backbone of TensorZero. It defines the behavior of the gateway, including the models and their providers, functions and their variants, tools, metrics, and more. Developers express the behavior of LLM calls by defining the relevant prompt templates, schemas, and other parameters in this configuration file.
You can see an example configuration file here.
The configuration file is a TOML file with a few major sections (TOML tables): gateway, clickhouse, models, model_providers, functions, variants, tools, and metrics.
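For orientation, here is a minimal, hypothetical skeleton showing how these tables nest. All of the names (my_model, my_provider, my_function, my_variant, my_metric) and the OpenAI model name are placeholders, not part of the reference itself.

```toml
[models.my_model]
routing = ["my_provider"]

[models.my_model.providers.my_provider]
type = "openai"
model_name = "gpt-4o-mini"   # placeholder model name

[functions.my_function]
type = "chat"

[functions.my_function.variants.my_variant]
type = "chat_completion"
model = "my_model"

[metrics.my_metric]
type = "boolean"
optimize = "max"
level = "inference"
```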
[gateway]
The [gateway] section defines the behavior of the TensorZero Gateway.
bind_address
- Type: string
- Required: no (default: 0.0.0.0:3000)
Defines the socket address to bind the TensorZero Gateway to.
disable_observability
- Type: boolean
- Required: no (default: false)
Disables the observability features of the TensorZero Gateway (not recommended).
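For example, a gateway section that states these defaults explicitly would look like this (the values shown are simply the documented defaults):

```toml
[gateway]
bind_address = "0.0.0.0:3000"   # socket address the gateway listens on (default)
disable_observability = false   # keep observability enabled (recommended)
```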
[models.model_name]
The [models.model_name] section defines the behavior of a model. You can define multiple models by including multiple [models.model_name] sections.
A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below).
If your model_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct as [models."llama-3.1-8b-instruct"].
routing
- Type: array of strings
- Required: yes
A list of provider names to route requests to. The providers must be defined in the providers sub-section (see below). The TensorZero Gateway attempts to route a request to the first provider in the list, and falls back to subsequent providers in order if the request is not successful.
[models.model_name.providers.provider_name]
The providers sub-section defines the behavior of a specific provider for a model. You can define multiple providers by including multiple [models.model_name.providers.provider_name] sections.
If your provider_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define vllm.internal as [models.model_name.providers."vllm.internal"].
type
- Type: string
- Required: yes
Defines the type of the provider. See Integrations » Model Providers for details.
The supported provider types are anthropic, aws_bedrock, azure, fireworks, gcp_vertex, mistral, openai, together, and vllm.
The other fields in the provider sub-section depend on the provider type.
type: "anthropic"
model_name
- Type: string
- Required: yes
Defines the model name to use with the Anthropic API. See Anthropic’s documentation for the list of available model names.
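As an illustrative sketch, a model backed by a single Anthropic provider could be declared as follows. The model key and the Anthropic model name are assumptions for the example; check Anthropic's documentation for current model names.

```toml
[models.claude-3-haiku]
routing = ["anthropic"]

[models.claude-3-haiku.providers.anthropic]
type = "anthropic"
model_name = "claude-3-haiku-20240307"   # illustrative Anthropic model name
```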
type: "aws_bedrock"
model_id
- Type: string
- Required: yes
Defines the model ID to use with the AWS Bedrock API. See AWS Bedrock’s documentation for the list of available model IDs.
region
- Type: string
- Required: no (default: based on credentials if set, otherwise us-east-1)
Defines the AWS region to use with the AWS Bedrock API.
type: "azure"
The TensorZero Gateway handles the API version under the hood (currently 2024-06-01). You only need to set the deployment_id and endpoint fields.
deployment_id
- Type: string
- Required: yes
Defines the deployment ID of the Azure OpenAI deployment.
See Azure OpenAI’s documentation for the list of available models.
endpoint
- Type: string
- Required: yes
Defines the endpoint of the Azure OpenAI deployment (protocol and hostname).
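A hypothetical Azure provider entry within a model's providers sub-section might look like the following; the deployment ID and endpoint are placeholders for your own Azure OpenAI deployment.

```toml
[models.gpt-4o.providers.azure]
type = "azure"
deployment_id = "my-gpt-4o-deployment"             # placeholder deployment ID
endpoint = "https://my-resource.openai.azure.com"  # protocol and hostname only
```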
type: "fireworks"
model_name
- Type: string
- Required: yes
Defines the model name to use with the Fireworks API.
See Fireworks’ documentation for the list of available model names. You can also deploy your own models on Fireworks AI.
type: "gcp_vertex"
location
- Type: string
- Required: yes
Defines the location (region) of the GCP Vertex AI model.
model_id
- Type: string
- Required: yes
Defines the model ID of the GCP Vertex AI model.
See GCP Vertex AI’s documentation for the list of available model IDs.
project_id
- Type: string
- Required: yes
Defines the project ID of the GCP Vertex AI model.
type: "mistral"
model_name
- Type: string
- Required: yes
Defines the model name to use with the Mistral API.
See Mistral’s documentation for the list of available model names.
type: "openai"
api_base
- Type: string
- Required: no (default: https://api.openai.com/v1/)
Defines the base URL of the OpenAI API. You can use the api_base field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
model_name
- Type: string
- Required: yes
Defines the model name to use with the OpenAI API.
See OpenAI’s documentation for the list of available model names.
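For example, to point the openai provider type at an OpenAI-compatible endpoint, you might write something like the sketch below; the endpoint URL and model name are placeholders.

```toml
[models.my-model.providers.openai]
type = "openai"
model_name = "my-model"                    # model name expected by the endpoint
api_base = "https://llm.example.com/v1/"   # OpenAI-compatible endpoint (placeholder)
```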
type: "together"
model_name
- Type: string
- Required: yes
Defines the model name to use with the Together API.
See Together’s documentation for the list of available model names. You can also deploy your own models on Together AI.
type: "vllm"
api_base
- Type: string
- Required: no (default: http://localhost:8000/v1/)
Defines the base URL of the vLLM API.
model_name
- Type: string
- Required: yes
Defines the model name to use with the vLLM API.
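A sketch of a self-hosted vLLM provider, assuming a local vLLM server and an illustrative model name that matches what the server is serving:

```toml
[models."llama-3.1-8b-instruct".providers.vllm]
type = "vllm"
model_name = "meta-llama/Llama-3.1-8B-Instruct"   # illustrative; must match the model served by vLLM
api_base = "http://localhost:8000/v1/"
```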
[embedding_models.model_name]
The [embedding_models.model_name] section defines the behavior of an embedding model. You can define multiple models by including multiple [embedding_models.model_name] sections.
A model is provider agnostic, and the relevant providers are defined in the providers sub-section (see below).
If your model_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define embedding-0.1 as [embedding_models."embedding-0.1"].
routing
- Type: array of strings
- Required: yes
A list of provider names to route requests to. The providers must be defined in the providers sub-section (see below). The TensorZero Gateway attempts to route a request to the first provider in the list, and falls back to subsequent providers in order if the request is not successful.
[embedding_models.model_name.providers.provider_name]
The providers sub-section defines the behavior of a specific provider for a model. You can define multiple providers by including multiple [embedding_models.model_name.providers.provider_name] sections.
If your provider_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define vllm.internal as [embedding_models.model_name.providers."vllm.internal"].
type
- Type: string
- Required: yes
Defines the type of the provider. See Integrations » Model Providers for details.
TensorZero currently only supports openai as a provider for embedding models. More integrations are on the way.
The other fields in the provider sub-section depend on the provider type.
type: "openai"
api_base
- Type: string
- Required: no (default: https://api.openai.com/v1/)
Defines the base URL of the OpenAI API. You can use the api_base field to use an API provider that is compatible with the OpenAI API.
However, many providers are only “approximately compatible” with the OpenAI API, so you might need to use a specialized model provider in those cases.
model_name
- Type: string
- Required: yes
Defines the model name to use with the OpenAI API.
See OpenAI’s documentation for the list of available model names.
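For instance, an embedding model backed by OpenAI might be configured as in the sketch below; the embedding model name is an assumption, so check OpenAI's documentation for current names.

```toml
[embedding_models.text-embedding-3-small]
routing = ["openai"]

[embedding_models.text-embedding-3-small.providers.openai]
type = "openai"
model_name = "text-embedding-3-small"   # illustrative embedding model name
```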
[functions.function_name]
The [functions.function_name] section defines the behavior of a function. You can define multiple functions by including multiple [functions.function_name] sections.
A function can have multiple variants, and each variant is defined in the variants sub-section (see below). A function expresses the abstract behavior of an LLM call (e.g. the schemas for the messages), and its variants express concrete instantiations of that LLM call (e.g. specific templates and models).
If your function_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define summarize-2.0 as [functions."summarize-2.0"].
assistant_schema
- Type: string (path)
- Required: no
Defines the path to the assistant schema file. The path is relative to the configuration file.
If provided, the assistant schema file should contain a JSON Schema for the assistant messages. The variables in the schema are used for templating the assistant messages. If a schema is provided, all function variants must also provide an assistant template (see below).
system_schema
- Type: string (path)
- Required: no
Defines the path to the system schema file. The path is relative to the configuration file.
If provided, the system schema file should contain a JSON Schema for the system message. The variables in the schema are used for templating the system message. If a schema is provided, all function variants must also provide a system template (see below).
type
- Type: string
- Required: yes
Defines the type of the function.
The supported function types are chat and json.
Most other fields in the function section depend on the function type.
type: "chat"
parallel_tool_calls
- Type: boolean
- Required: no (default: false)
Determines whether the function should be allowed to call multiple tools in a single conversation turn.
Most model providers do not support this feature. In those cases, this field will be ignored.
tool_choice
- Type: string
- Required: no (default: auto)
Determines the tool choice strategy for the function.
The supported tool choice strategies are:
- none: The function should not use any tools.
- auto: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- required: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- { specific = "tool_name" }: The model should use a specific tool. The tool must be defined in the tools field (see below).
tools
- Type: array of strings
- Required: no (default: [])
Determines the tools that the function can use. The supported tools are defined in [tools.tool_name] sections (see below).
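Putting these fields together, a hypothetical chat function that can call a weather tool might look like this; the function and tool names are placeholders, and the tool itself would be defined in its own [tools.get_temperature] section.

```toml
[functions.weather-assistant]
type = "chat"
tools = ["get_temperature"]   # must match a [tools.get_temperature] section
tool_choice = "auto"
parallel_tool_calls = false
```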
type: "json"
output_schema
- Type: string (path)
- Required: no (default: {}, the empty JSON schema that accepts any valid JSON output)
Defines the path to the output schema file, which should contain a JSON Schema for the output of the function. The path is relative to the configuration file.
This schema is used for validating the output of the function.
user_schema
- Type: string (path)
- Required: no
Defines the path to the user schema file. The path is relative to the configuration file.
If provided, the user schema file should contain a JSON Schema for the user messages. The variables in the schema are used for templating the user messages. If a schema is provided, all function variants must also provide a user template (see below).
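A hypothetical JSON function with an output schema and a user schema could be declared like this; the function name and schema paths are placeholders, relative to the configuration file.

```toml
[functions.extract-entities]
type = "json"
output_schema = "functions/extract-entities/output_schema.json"   # placeholder path
user_schema = "functions/extract-entities/user_schema.json"       # placeholder path
```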
[functions.function_name.variants.variant_name]
The variants sub-section defines the behavior of a specific variant of a function. You can define multiple variants by including multiple [functions.function_name.variants.variant_name] sections.
If your variant_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define llama-3.1-8b-instruct as [functions.function_name.variants."llama-3.1-8b-instruct"].
type
- Type: string
- Required: yes
Defines the type of the variant.
TensorZero currently supports the following variant types:
| Type | Description |
| --- | --- |
| chat_completion | Uses a chat completion model to generate responses by processing a series of messages in a conversational format. This is typically what you use out of the box with most LLMs. |
| experimental_best_of_n | Generates multiple response candidates with other variants, and selects the best one using an evaluator model. |
| experimental_dynamic_in_context_learning | Selects similar high-quality examples using an embedding of the input, and incorporates them into the prompt to enhance context and improve response quality. |
type: "chat_completion"
assistant_template
- Type: string (path)
- Required: no
Defines the path to the assistant template file. The path is relative to the configuration file.
This file should contain a MiniJinja template for the assistant messages.
If the template uses any variables, the variables should be defined in the function’s assistant_schema field.
json_mode
- Type: string
- Required: no (default: on)
Defines the strategy for generating JSON outputs. This parameter is only supported for variants of functions with type = "json".
The supported modes are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
max_tokens
- Type: integer
- Required: no (default: null)
Defines the maximum number of tokens to generate.
model
- Type: string
- Required: yes
Defines the model to use for the variant. The model must be defined in the [models.model_name] section (see above).
seed
- Type: integer
- Required: no (default: null)
Defines the seed to use for the variant.
system_template
- Type: string (path)
- Required: no
Defines the path to the system template file. The path is relative to the configuration file.
This file should contain a MiniJinja template for the system messages.
If the template uses any variables, the variables should be defined in the function’s system_schema field.
temperature
- Type: float
- Required: no (default: null)
Defines the temperature to use for the variant.
user_template
- Type: string (path)
- Required: no
Defines the path to the user template file. The path is relative to the configuration file.
This file should contain a MiniJinja template for the user messages.
If the template uses any variables, the variables should be defined in the function’s user_schema field.
weight
- Type: float
- Required: no (default: 0)
Defines the weight of the variant. When you call a function, the weight determines the relative importance of the variant when sampling.
Variants will be sampled with a probability proportional to their weight.
For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.
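For example, a chat_completion variant combining several of these fields might look like the following sketch; the model name and template path are placeholders.

```toml
[functions.extract-entities.variants.baseline]
type = "chat_completion"
model = "gpt-4o-mini"   # placeholder; must be defined under [models.gpt-4o-mini]
system_template = "functions/extract-entities/baseline/system_template.minijinja"
temperature = 0.5
max_tokens = 512
weight = 1.0
```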
type: "experimental_best_of_n"
candidates
- Type: list of strings
- Required: yes
This inference strategy generates N candidate responses, and an evaluator model selects the best one. This approach allows you to leverage multiple prompts or variants to increase the likelihood of getting a high-quality response.
The candidates parameter specifies a list of variant names used to generate candidate responses. For example, if you have two variants defined (promptA and promptB), you could set up the candidates list to generate two responses using promptA and one using promptB using the snippet below. The evaluator would then choose the best response from these three candidates.
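A sketch of what that snippet could look like; the function and variant names are placeholders, while promptA and promptB are the variants mentioned above.

```toml
[functions.my_function.variants.best_of_n]
type = "experimental_best_of_n"
candidates = ["promptA", "promptA", "promptB"]   # two responses from promptA, one from promptB
```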
evaluator
- Type: object
- Required: yes
The evaluator parameter specifies the configuration for the model that will evaluate and select the best response from the generated candidates. The evaluator is configured similarly to a chat_completion variant, but without the type field.
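One way to express that object in TOML is as a nested table on the variant; this is a hedged sketch, and the model name and template path are placeholders.

```toml
[functions.my_function.variants.best_of_n.evaluator]
model = "gpt-4o-mini"   # placeholder; configured like a chat_completion variant, minus the type field
system_template = "functions/my_function/best_of_n/evaluator_system_template.minijinja"
```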
timeout_s
- Type: float
- Required: no (default: 300s)
The timeout_s parameter specifies the maximum time in seconds allowed for generating candidate responses.
Any candidate that takes longer than this duration to generate a response will be dropped from consideration.
weight
- Type: float
- Required: no (default: 0)
Defines the weight of the variant. When you call a function, the weight determines the relative importance of the variant when sampling.
Variants will be sampled with a probability proportional to their weight.
For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.
type: "experimental_dynamic_in_context_learning"
embedding_model
- Type: string
- Required: yes
Defines the model to use for retrieving the similar examples.
The model must be defined in the [embedding_models.model_name] section (see above).
The embedding model used for inference should be the same model previously used to generate the embeddings stored in ClickHouse.
json_mode
- Type: string
- Required: no (default: on)
Defines the strategy for generating JSON outputs. This parameter is only supported for variants of functions with type = "json".
The supported modes are:
- off: Make a chat completion request without any special JSON handling (not recommended).
- on: Make a chat completion request with JSON mode (if supported by the provider).
- strict: Make a chat completion request with strict JSON mode (if supported by the provider). For example, the TensorZero Gateway uses Structured Outputs for OpenAI.
- implicit_tool: Make a special-purpose tool use request under the hood, and convert the tool call into a JSON response.
k
- Type: non-negative integer
- Required: yes
Defines the number of examples to retrieve for the inference.
max_tokens
- Type: integer
- Required: no (default: null)
Defines the maximum number of tokens to generate.
model
- Type: string
- Required: yes
Defines the model to use for the variant.
The model must be defined in the [models.model_name] section (see above).
seed
- Type: integer
- Required: no (default: null)
Defines the seed to use for the variant.
system_instructions
- Type: string (path)
- Required: no
Defines the path to the system instructions file. The path is relative to the configuration file.
The system instruction is a text file that will be added to the evaluator’s system prompt.
Unlike system_template, it doesn’t support variables.
This file contains static instructions that define the behavior and role of the AI assistant for the specific function variant.
temperature
- Type: float
- Required: no (default: null)
Defines the temperature to use for the variant.
weight
- Type: float
- Required: no (default: 0)
Defines the weight of the variant. When you call a function, the weight determines the relative importance of the variant when sampling.
Variants will be sampled with a probability proportional to their weight.
For example, if variant A has a weight of 1.0 and variant B has a weight of 3.0, variant A will be sampled with probability 1.0 / (1.0 + 3.0) = 25% and variant B will be sampled with probability 3.0 / (1.0 + 3.0) = 75%.
You can disable a variant by setting its weight to 0. The variant will only be used if there are no other variants available for sampling or if the variant is requested explicitly in the request with variant_name. This is useful for defining fallback variants, which won’t be used unless no other variants are available.
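A hedged sketch of a dynamic in-context learning variant; all names are placeholders, and the model and embedding model must be defined in their respective sections.

```toml
[functions.my_function.variants.dicl]
type = "experimental_dynamic_in_context_learning"
model = "gpt-4o-mini"                        # placeholder; must be defined under [models.gpt-4o-mini]
embedding_model = "text-embedding-3-small"   # placeholder; must be defined under [embedding_models.text-embedding-3-small]
k = 10                                       # number of similar examples to retrieve
weight = 1.0
```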
[metrics.metric_name]
The [metrics.metric_name] section defines the behavior of a metric. You can define multiple metrics by including multiple [metrics.metric_name] sections.
The metric name can’t be comment or demonstration, as those names are reserved for internal use.
If your metric_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define beats-gpt-3.5 as [metrics."beats-gpt-3.5"].
level
- Type: string
- Required: yes
Defines whether the metric applies to individual inferences or across entire episodes. The supported levels are inference and episode.
optimize
- Type: string
- Required: yes
Defines whether the metric should be maximized or minimized.
The supported values are max and min.
type
- Type: string
- Required: yes
Defines the type of the metric.
The supported metric types are boolean and float.
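For example, a boolean metric that is tracked per inference and maximized could be defined as follows; the metric name is a placeholder.

```toml
[metrics.solved]
type = "boolean"      # boolean or float
optimize = "max"      # max or min
level = "inference"   # inference or episode
```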
[tools.tool_name]
The [tools.tool_name] section defines the behavior of a tool. You can define multiple tools by including multiple [tools.tool_name] sections.
If your tool_name is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define run-python-3.10 as [tools."run-python-3.10"].
You can enable a tool for a function by adding it to the function’s tools field.
description
- Type: string
- Required: yes
Defines the description of the tool provided to the model.
You can typically materially improve the quality of responses by providing a detailed description of the tool.
parameters
- Type: string (path)
- Required: yes
Defines the path to the parameters file. The path is relative to the configuration file.
This file should contain a JSON Schema for the parameters of the tool.
strict
- Type: boolean
- Required: no (default: false)
If set to true, the TensorZero Gateway attempts to use strict JSON generation for the tool parameters.
This typically improves the quality of responses.
Only a few providers support strict JSON generation. For example, the TensorZero Gateway uses Structured Outputs for OpenAI. If the provider does not support strict mode, the TensorZero Gateway ignores this field.
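Putting it together, a hypothetical tool definition might look like this; the tool name and parameters path are placeholders.

```toml
[tools.get_temperature]
description = "Get the current temperature for a given location."
parameters = "tools/get_temperature.json"   # placeholder path to a JSON Schema file
strict = false
```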