Best-of-N Sampling

- Generate multiple response candidates using one or more variants (i.e. possibly using different models and prompts)
- Use an evaluator model to select the best response from these candidates
- Return the selected response as the final output
TensorZero also supports a similar inference-time strategy called Mixture-of-N Sampling.
To use best-of-N sampling in TensorZero, you need to configure a variant with the `experimental_best_of_n` type.
Here’s a simple example configuration:
tensorzero.toml
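A configuration along these lines is a reasonable sketch (the function name `draft_email` and the model names are placeholders, and the exact field names should be checked against the Configuration Reference):

```toml
[functions.draft_email.variants.promptA]
type = "chat_completion"
model = "openai::gpt-4o-mini"
# ... prompt templates, etc.

[functions.draft_email.variants.promptB]
type = "chat_completion"
model = "openai::gpt-4o-mini"
# ... prompt templates, etc.

[functions.draft_email.variants.best_of_n]
type = "experimental_best_of_n"
# two candidates from promptA, one from promptB
candidates = ["promptA", "promptA", "promptB"]

[functions.draft_email.variants.best_of_n.evaluator]
# the evaluator is configured like a model solving the task itself
model = "openai::gpt-4o-mini"
```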
- We define a `best_of_n` variant that uses two different variants (`promptA` and `promptB`) to generate candidates. It generates two candidates using `promptA` and one candidate using `promptB`.
- The `evaluator` block specifies the model and instructions for selecting the best response.
You should define the evaluator model as if it were solving the problem (not judging the quality of the candidates).
TensorZero will automatically make the necessary prompt modifications to evaluate the candidates.
For a comprehensive list of configuration options for the `experimental_best_of_n` variant type, see Configuration Reference.
We also provide a complete runnable example: Improving LLM Chess Ability with Best/Mixture-of-N Sampling. This example showcases how best-of-N sampling can significantly enhance an LLM’s chess-playing abilities by selecting the most promising moves from multiple generated options.
Chain-of-Thought (CoT)
The `experimental_chain_of_thought` variant type is only available for non-streaming requests to JSON functions.
For chat functions, we recommend using reasoning models instead (e.g. OpenAI o3, DeepSeek R1).
To use CoT in TensorZero, you need to configure a variant with the `experimental_chain_of_thought` type.
It uses the same configuration as a `chat_completion` variant.
Under the hood, TensorZero will prepend an additional field to the desired output schema to include the chain-of-thought reasoning and remove it from the final output.
The reasoning is stored in the database for downstream observability and optimization.
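A minimal sketch of such a variant might look as follows (the function name, schema path, and model are placeholders; since the variant accepts the same fields as `chat_completion`, options like `temperature` carry over):

```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.cot]
type = "experimental_chain_of_thought"
model = "openai::gpt-4o-mini"
temperature = 0.5
```

At inference time, TensorZero injects a reasoning field into this output schema, then strips it from the response returned to the caller.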
Dynamic In-Context Learning (DICL)

- Before inference: curate reference examples, embed them, and store them in the database
- At inference time: embed the current input using an embedding model and retrieve similar high-quality examples from the database of past interactions
- Incorporate these examples into the prompt to provide additional context
- Generate a response using the enhanced prompt
To use DICL in TensorZero, you need to configure a variant with the `experimental_dynamic_in_context_learning` type.
Here’s a simple example configuration:
tensorzero.toml
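A sketch of such a configuration (the function name, model names, and the exact shape of the `embedding_models` block are placeholders to be checked against the Configuration Reference):

```toml
[embedding_models.text-embedding-3-small]
routing = ["openai"]

[embedding_models.text-embedding-3-small.providers.openai]
type = "openai"
model_name = "text-embedding-3-small"

[functions.extract_data.variants.dicl]
type = "experimental_dynamic_in_context_learning"
model = "openai::gpt-4o-mini"
embedding_model = "text-embedding-3-small"
# retrieve the 10 most similar stored examples
k = 10
```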
- We define a `dicl` variant that uses the `experimental_dynamic_in_context_learning` type.
- The `embedding_model` field specifies the model used to embed inputs for similarity search. We also need to define this model in the `embedding_models` section.
- The `k` parameter determines the number of similar examples to retrieve and incorporate into the prompt.
Before using DICL, you also need to add relevant examples to the `DynamicInContextLearningExample` table in your ClickHouse database.
These examples will be used by the DICL variant to enhance the context of your prompts at inference time.
The process of adding these examples to the database is crucial for DICL to function properly.
We provide a sample recipe that simplifies this process: Dynamic In-Context Learning with OpenAI.
This recipe supports selecting examples based on boolean metrics, float metrics, and demonstrations.
It helps you populate the `DynamicInContextLearningExample` table with high-quality, relevant examples from your historical data.
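As a rough illustration of what populating that table involves, here is a hypothetical Python sketch. The helper name `build_dicl_row`, the column set, and the commented-out client calls are all assumptions for illustration; the recipe above and the Data Model page are authoritative.

```python
import json
import uuid


def build_dicl_row(function_name, variant_name, input_, output, embedding, namespace=""):
    """Assemble one example row for the DynamicInContextLearningExample table.

    The column set here is illustrative; check the Data Model reference
    for the actual table schema.
    """
    return {
        "id": str(uuid.uuid4()),
        "function_name": function_name,
        "variant_name": variant_name,
        "namespace": namespace,
        "input": json.dumps(input_),
        "output": json.dumps(output),
        "embedding": embedding,
    }


# In practice you would embed each input with the same model configured as
# `embedding_model`, then insert the rows, e.g. with clickhouse-connect:
#
#   client = clickhouse_connect.get_client(host=..., username=..., password=...)
#   client.insert("DynamicInContextLearningExample", rows, column_names=[...])

row = build_dicl_row(
    "extract_data",  # function this example belongs to
    "dicl",          # DICL variant that should retrieve it
    {"messages": [{"role": "user", "content": "Acme hired Jane."}]},
    {"entities": ["Acme", "Jane"]},
    [0.12, -0.03, 0.57],  # embedding vector (illustrative values)
)
```

The key point is that each stored example pairs a historical input/output with the embedding of its input, so the variant can retrieve the `k` nearest neighbors of a new input at inference time.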
For more information on the `DynamicInContextLearningExample` table and its role in the TensorZero data model, see Data Model.
For a comprehensive list of configuration options for the `experimental_dynamic_in_context_learning` variant type, see Configuration Reference.
We also provide a complete runnable example: Optimizing Data Extraction (NER) with TensorZero. This example demonstrates how Dynamic In-Context Learning (DICL) can enhance Named Entity Recognition (NER) performance by leveraging relevant historical examples to improve data extraction accuracy and consistency without having to fine-tune a model.
Mixture-of-N Sampling

- Generate multiple response candidates using one or more variants (i.e. possibly using different models and prompts)
- Use a fuser model to combine the candidates into a single response
- Return the combined response as the final output
TensorZero also supports a similar inference-time strategy called Best-of-N Sampling.
To use mixture-of-N sampling in TensorZero, you need to configure a variant with the `experimental_mixture_of_n` type.
Here’s a simple example configuration:
tensorzero.toml
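A sketch of such a configuration, mirroring the best-of-N setup (the function name `draft_email` and the model names are placeholders; check the exact field names against the Configuration Reference):

```toml
[functions.draft_email.variants.promptA]
type = "chat_completion"
model = "openai::gpt-4o-mini"
# ... prompt templates, etc.

[functions.draft_email.variants.promptB]
type = "chat_completion"
model = "openai::gpt-4o-mini"
# ... prompt templates, etc.

[functions.draft_email.variants.mixture_of_n]
type = "experimental_mixture_of_n"
# two candidates from promptA, one from promptB
candidates = ["promptA", "promptA", "promptB"]

[functions.draft_email.variants.mixture_of_n.fuser]
# the fuser is configured like a model solving the task itself
model = "openai::gpt-4o-mini"
```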
- We define a `mixture_of_n` variant that uses two different variants (`promptA` and `promptB`) to generate candidates. It generates two candidates using `promptA` and one candidate using `promptB`.
- The `fuser` block specifies the model and instructions for combining the candidates into a single response.
You should define the fuser model as if it were solving the problem (not judging the quality of the candidates).
TensorZero will automatically make the necessary prompt modifications to combine the candidates.
For a comprehensive list of configuration options for the `experimental_mixture_of_n` variant type, see Configuration Reference.
We also provide a complete runnable example: Improving LLM Chess Ability with Best/Mixture-of-N Sampling. This example showcases how mixture-of-N sampling can significantly enhance an LLM’s chess-playing abilities by combining the most promising moves from multiple generated options into a single response.