Learn how to use Dynamic In-Context Learning to optimize your LLM applications.
Dynamic In-Context Learning (DICL) is an inference-time optimization that improves LLM performance by incorporating relevant historical examples into your prompt.
Instead of manually incorporating static examples in your prompts, DICL selects the most relevant examples at inference time. Here's how it works:
1. Before inference: You curate examples of good LLM behavior. TensorZero embeds them using an embedding model and stores them in your database.
2. At inference time: TensorZero embeds the inference input before sending it to the LLM and retrieves similar curated examples from your database.
3. TensorZero inserts these examples into your prompt and sends the request to the LLM.
4. The LLM generates a response using the enhanced prompt.
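The retrieval step can be sketched in plain Python. This is an illustrative toy, not TensorZero's implementation: the `embed` function below is a hypothetical stand-in for a real embedding model, and the curated examples are made up.

```python
import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model: hash character
    # bigrams into a small fixed-size vector. A real deployment would call
    # an embedding model such as text-embedding-3-small.
    vec = [0.0] * 64
    lowered = text.lower()
    for a, b in zip(lowered, lowered[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    return vec

def cosine_distance(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)

# "Before inference": curated examples are embedded and stored.
curated = [
    {"input": "Acme Corp hired Jane Doe in Paris.", "output": '{"person": ["Jane Doe"], ...}'},
    {"input": "The weather was sunny all week.", "output": '{"person": [], ...}'},
]
index = [(embed(ex["input"]), ex) for ex in curated]

def retrieve(query: str, k: int) -> list[dict]:
    # "At inference time": embed the input and keep the k nearest examples.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_distance(q, item[0]))
    return [ex for _, ex in ranked[:k]]

# The retrieved examples would then be inserted into the prompt.
examples = retrieve("Globex appointed John Smith in Berlin.", k=1)
```

The retrieved examples are then formatted into the prompt alongside the new input before the request is sent to the LLM.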
If your dataset of curated examples is still very large, consider supervised fine-tuning instead.
Inference cost (and, to a lesser extent, latency) should not be a bottleneck. The optimization itself is relatively cheap (generating embeddings), but DICL materially increases input tokens at inference time. If inference cost matters, consider supervised fine-tuning instead, which shifts the marginal cost to a one-time optimization workflow.
If your prompt has a lot of boilerplate, configure prompt templates. DICL operates on template variables, so templates improve retrieval (and therefore inference quality) and mitigate the marginal cost and latency. Move the boilerplate into system_instructions in your variant configuration instead.
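Putting these options together, a DICL variant configuration might look roughly like the sketch below. The variant type name and file path are assumptions for illustration; the fields (embedding_model, model, k, system_instructions, max_distance) mirror the options discussed on this page, but check the TensorZero configuration reference for exact names.

```toml
[functions.extract_entities.variants.dicl]
# Variant type name is an assumption; verify against the configuration reference.
type = "experimental_dynamic_in_context_learning"
embedding_model = "openai::text_embedding_3_small"
model = "openai::gpt-5-mini"
k = 10
# Keep boilerplate out of the embedded input by moving it here (hypothetical path):
system_instructions = "functions/extract_entities/dicl/system_instructions.txt"
# Optionally skip retrieved examples beyond this cosine distance:
# max_distance = 0.5
```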
Example: Data Extraction (Named Entity Recognition) — Configuration
system_template.minijinja

```minijinja
You are an assistant that is performing a named entity recognition task.

Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
    "person": ["person1", "person2", ...],
    "organization": ["organization1", "organization2", ...],
    "location": ["location1", "location2", ...],
    "miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
```
2. Collect your optimization data
TensorZero Dataset: After deploying the TensorZero Gateway with ClickHouse, build a dataset of good examples for the extract_entities function you configured. You can create datapoints from historical inferences or external/synthetic datasets.

Historical Inferences: After deploying the TensorZero Gateway with ClickHouse, make inference calls to the extract_entities function you configured. TensorZero automatically collects structured data about those inferences, which can later be used as training examples for DICL.

You can curate good examples in multiple ways:
Collecting demonstrations: Collect demonstrations of good behavior (or labels) from human annotation or other sources.
Filtering with metrics: Query inferences that scored well on your metrics (e.g. output_source="inference" with a filter for high scores).
Examples from an expensive model: Run inferences with a powerful model (e.g. GPT-5) and use those outputs as demonstrations for a smaller model (e.g. GPT-5 Mini).
DICL's performance degrades as the curated examples become noisier (i.e. include examples of bad behavior), so there is a trade-off between dataset size and datapoint quality.
For this example, we’ll use demonstrations.
You can submit demonstration feedback using the demonstration metric:
```python
t0.feedback(
    metric_name="demonstration",
    value=corrected_output,  # Provide the ideal output for that inference
    inference_id=response.inference_id,
)
```
Then, query inferences with output_source="demonstration" to get examples where the output has been corrected:
Configure DICL by specifying the name of your function, variant, and embedding model.
```python
from tensorzero import DICLOptimizationConfig

optimization_config = DICLOptimizationConfig(
    function_name="extract_entities",
    variant_name="dicl",
    embedding_model="openai::text_embedding_3_small",
    k=10,  # how many examples are retrieved and injected as context
    model="openai::gpt-5-mini",  # LLM that will generate outputs using the retrieved examples
)
```
You should experiment with different choices of k.
Typical values are 3-10, with smaller values when inputs tend to be larger.
If you see inferences with irrelevant examples, consider setting a max_distance in your variant configuration later. With this setting, the retrieval step can return fewer than k examples if some don't meet the cosine distance threshold. Make sure to tune the value according to your embedding model.
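As a rough sketch of that behavior (not TensorZero's implementation), assuming retrieval returns (distance, example) pairs sorted by ascending cosine distance:

```python
def filter_by_max_distance(ranked, k, max_distance):
    """Keep at most k retrieved examples whose cosine distance is within the threshold.

    `ranked` is a list of (distance, example) pairs sorted by ascending distance.
    """
    return [ex for dist, ex in ranked[:k] if dist <= max_distance]

# Hypothetical retrieval results:
ranked = [(0.12, "ex_a"), (0.30, "ex_b"), (0.55, "ex_c"), (0.80, "ex_d")]

kept = filter_by_max_distance(ranked, k=3, max_distance=0.5)  # ["ex_a", "ex_b"]
```

Note that "ex_c" is within the top k=3 but is dropped because its distance exceeds the threshold, which is why fewer than k examples can reach the prompt.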
4. Launch DICL
You can now launch your DICL optimization job using the TensorZero Gateway:
The embedding_model in the configuration must match the embedding model you used during optimization.
That’s it!
At inference time, the DICL variant will retrieve the k most similar examples from your training data and include them as context for in-context learning.
You can run experiments comparing your baseline and DICL variants using adaptive A/B testing.
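As a loose illustration of what adaptive traffic allocation does (this is a generic Thompson-sampling sketch, not TensorZero's A/B testing API), each variant's success rate gets a Beta posterior, and traffic shifts toward the variant that samples higher:

```python
import random

def choose_variant(stats, rng):
    # Thompson sampling: draw from each variant's Beta posterior and pick the max.
    draws = {
        name: rng.betavariate(s["successes"] + 1, s["failures"] + 1)
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)

# Hypothetical metric feedback accumulated so far:
stats = {
    "baseline": {"successes": 40, "failures": 60},
    "dicl": {"successes": 55, "failures": 45},
}

rng = random.Random(0)
picks = [choose_variant(stats, rng) for _ in range(100)]
# The better-performing variant is sampled more often, while the other
# still receives occasional traffic for continued exploration.
```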