You can configure the TensorZero Gateway to distribute inference requests between different variants (prompts, models, etc.) of a function (a “task” or “agent”).
Variants enable you to experiment with different models, prompts, parameters, inference strategies, and more.
If you specify multiple variants for a function, by default the gateway will sample between them with equal probability (uniform sampling).
For example, if you call the `draft_email` function below, the gateway will sample between the two variants with equal probability.
```toml
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"
```
During an episode, multiple inference requests to the same function will receive the same variant (unless fallbacks are necessary).
This consistent variant assignment acts as a randomized controlled experiment, providing the statistical foundation needed to make causal inferences about which configurations perform best.
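One way to picture this sticky per-episode assignment is as a deterministic function of the episode ID. This is only an illustrative model of the property described above, not TensorZero's actual sampling implementation; the hashing scheme and function name are made up for the sketch:

```python
import hashlib

def assign_variant(episode_id: str, variants: list[str]) -> str:
    # Hypothetical sketch: derive a stable index from the episode ID so that
    # every inference request in the same episode maps to the same variant.
    digest = hashlib.sha256(episode_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]

variants = ["gpt_5_mini", "claude_haiku_4_5"]
# Repeated calls within one episode always return the same variant:
first = assign_variant("episode-123", variants)
assert all(assign_variant("episode-123", variants) == first for _ in range(10))
```

Because the assignment is fixed for the duration of an episode, differences in downstream feedback between episodes can be attributed to the variant rather than to mid-episode switching.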
You can explicitly specify which variants to sample uniformly from using `candidate_variants`.
```toml
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"

[functions.draft_email.variants.grok_4]
type = "chat_completion"
model = "xai::grok-4-0709"

[functions.draft_email.experimentation]
type = "uniform"
candidate_variants = ["gpt_5_mini", "claude_haiku_4_5"]
```
In this example, the gateway samples uniformly between `gpt_5_mini` and `claude_haiku_4_5` (50% each).
You can configure weights for variants to control the probability of each variant being sampled.
This is particularly useful for canary tests where you want to gradually roll out a new variant to a small percentage of users.
```toml
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"

[functions.draft_email.experimentation]
type = "static_weights"
candidate_variants = {"gpt_5_mini" = 0.9, "claude_haiku_4_5" = 0.1}
```
In this example, 90% of episodes will use the `gpt_5_mini` variant and 10% will use the `claude_haiku_4_5` variant.
If the weights don’t add up to 1, TensorZero will automatically normalize them and sample the variants accordingly.
For example, if a variant has weight 5 and another has weight 1, the first variant will be sampled 5/6 of the time (≈ 83.3%) and the second variant will be sampled 1/6 of the time (≈ 16.7%).
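The normalization is simple proportional scaling: each weight is divided by the sum of all weights. A minimal sketch of the arithmetic (the helper name is illustrative, not part of TensorZero):

```python
def normalized_probabilities(weights: dict[str, float]) -> dict[str, float]:
    # Divide each weight by the total so the resulting probabilities sum to 1.
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

probs = normalized_probabilities({"first_variant": 5, "second_variant": 1})
# first_variant -> 5/6 ≈ 0.833, second_variant -> 1/6 ≈ 0.167
```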
You can configure variants that are only used as fallbacks with `fallback_variants`.
You can use this field with both `uniform` and `static_weights` sampling.
```toml
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.gpt_5_mini]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.draft_email.variants.claude_haiku_4_5]
type = "chat_completion"
model = "anthropic::claude-haiku-4-5"

[functions.draft_email.variants.grok_4]
type = "chat_completion"
model = "xai::grok-4-0709"

[functions.draft_email.experimentation]
type = "static_weights"
candidate_variants = {"gpt_5_mini" = 0.9, "claude_haiku_4_5" = 0.1}
fallback_variants = ["grok_4"]
```
The gateway will first sample among the `candidate_variants`.
If all candidates fail, the gateway attempts each variant in `fallback_variants` in order.
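The candidate-then-fallback order can be modeled roughly as follows. This is an illustrative sketch of the behavior just described, not the gateway's actual code; the `failed` set stands in for variants whose inference attempts have errored:

```python
import random

def pick_variant(candidate_weights: dict[str, float],
                 fallback_variants: list[str],
                 failed: set[str]) -> str:
    # 1) Weighted sampling among candidates that have not failed.
    healthy = {v: w for v, w in candidate_weights.items() if v not in failed}
    if healthy:
        names, weights = zip(*healthy.items())
        return random.choices(names, weights=weights, k=1)[0]
    # 2) All candidates failed: try the fallbacks in their listed order.
    for variant in fallback_variants:
        if variant not in failed:
            return variant
    raise RuntimeError("all variants failed")

# If both candidates have failed, the first fallback is chosen:
choice = pick_variant({"gpt_5_mini": 0.9, "claude_haiku_4_5": 0.1},
                      ["grok_4"],
                      failed={"gpt_5_mini", "claude_haiku_4_5"})
```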
See Retries & Fallbacks for more information.