You can configure the TensorZero Gateway to distribute inference requests between different variants (prompts, models, etc.) of a function (a “task” or “agent”). Variants enable you to experiment with different models, prompts, parameters, inference strategies, and more.
We recommend running adaptive A/B tests if you have a metric you can optimize for.

Configure multiple variants

If you specify multiple variants for a function, by default the gateway samples among them with equal probability (uniform sampling). For example, if you call the draft_email function below, the gateway will choose either of the two variants with equal probability.
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"
During an episode, multiple inference requests to the same function will receive the same variant (unless fallbacks are necessary). This consistent variant assignment acts as a randomized controlled experiment, providing the statistical foundation needed to make causal inferences about which configurations perform best.
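As a client-side illustration (this sketch assumes the tensorzero Python client and a gateway running at http://localhost:3000; check the client documentation for the exact API), reusing the episode ID from the first response keeps subsequent inferences on the same variant:
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    # First inference: the gateway samples a variant and starts a new episode.
    first = client.inference(
        function_name="draft_email",
        input={"messages": [{"role": "user", "content": "Draft a follow-up email."}]},
    )

    # Later inference in the same episode: passing the episode ID back means the
    # gateway reuses the variant it already sampled (barring fallbacks).
    second = client.inference(
        function_name="draft_email",
        input={"messages": [{"role": "user", "content": "Make it more formal."}]},
        episode_id=first.episode_id,
    )

    print(first.variant_name, second.variant_name)  # expected to match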

Configure candidate variants explicitly

You can explicitly specify which variants to sample uniformly from using candidate_variants.
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"

[functions.draft_email.variants.grok_4]
type = "chat_completion"
model = "xai::grok-4-0709"

[functions.draft_email.experimentation]  
type = "uniform"
candidate_variants = ["gpt_5_mini", "claude_haiku_4_5"]
In this example, the gateway samples uniformly between gpt_5_mini and claude_haiku_4_5 (50% each).

Configure sampling weights for variants

You can configure weights for variants to control the probability of each variant being sampled. This is particularly useful for canary tests where you want to gradually roll out a new variant to a small percentage of users.
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"

[functions.draft_email.experimentation] 
type = "static_weights"
candidate_variants = {"gpt_5_mini" = 0.9, "claude_haiku_4_5" = 0.1}
In this example, the gateway will sample the gpt_5_mini variant for 90% of episodes and the claude_haiku_4_5 variant for the remaining 10%.
If the weights don’t add up to 1, TensorZero will automatically normalize them and sample the variants accordingly. For example, if a variant has weight 5 and another has weight 1, the first variant will be sampled 5/6 of the time (≈ 83.3%) and the second variant will be sampled 1/6 of the time (≈ 16.7%).
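As a quick illustration of that arithmetic (the variant names below are hypothetical):
# Unnormalized weights are divided by their sum to get sampling probabilities.
weights = {"variant_a": 5, "variant_b": 1}
total = sum(weights.values())
probabilities = {name: weight / total for name, weight in weights.items()}
print(probabilities)  # {'variant_a': 0.8333..., 'variant_b': 0.1666...}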

Configure fallback-only variants

You can use the fallback_variants field to configure variants that are only used as fallbacks. This field works with both uniform and static_weights sampling.
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"

[functions.draft_email.variants.grok_4] 
type = "chat_completion"
model = "xai::grok-4-0709"

[functions.draft_email.experimentation]
type = "static_weights"
candidate_variants = {"gpt_5_mini" = 0.9, "claude_haiku_4_5" = 0.1}
fallback_variants = ["grok_4"] 
The gateway will first sample among the candidate_variants. If all candidates fail, the gateway attempts each variant in fallback_variants in order. See Retries & Fallbacks for more information.
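As a rough mental model (this is an illustrative sketch, not the gateway's actual implementation; see Retries & Fallbacks for the precise retry semantics), the selection order works roughly like this:
import random

def choose_variant(candidates, fallbacks, has_failed):
    """Pick a variant: weighted sampling over candidates, then fallbacks in order."""
    remaining = dict(candidates)
    while remaining:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = random.choices(names, weights=weights, k=1)[0]
        if not has_failed(pick):
            return pick
        del remaining[pick]  # drop the failed candidate and resample the rest
    for name in fallbacks:  # every candidate failed: try fallbacks in order
        if not has_failed(name):
            return name
    return None  # nothing succeeded

# Example: grok_4 is only reached once both candidates have failed.
failed = {"gpt_5_mini", "claude_haiku_4_5"}
print(choose_variant(
    {"gpt_5_mini": 0.9, "claude_haiku_4_5": 0.1},
    ["grok_4"],
    has_failed=lambda name: name in failed,
))  # -> grok_4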