Dynamic In-Context Learning (DICL) is an inference-time optimization that improves LLM performance by incorporating relevant historical examples into your prompt. Instead of hardcoding static examples into your prompts, DICL selects the most relevant examples at inference time. Here’s how it works:
  1. Before inference: You curate examples of good LLM behavior. TensorZero embeds them using an embedding model and stores them in your database.
  2. At inference time: TensorZero embeds the incoming input and retrieves the most similar curated examples from your database before sending anything to the LLM.
  3. TensorZero inserts these examples into your prompt and sends the request to the LLM.
  4. The LLM generates a response using the enhanced prompt.
Diagram: Dynamic In-Context Learning
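Conceptually, the retrieval step works like the following Python sketch (illustrative only, not TensorZero’s implementation; the embed function and the example fields are hypothetical):

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))

def build_dicl_messages(input_text, embed, examples, k=10, max_distance=None):
    """Retrieve the k most similar curated examples and prepend them as few-shot messages.

    `embed` maps text to a vector; each example is a dict with precomputed
    "embedding", "input", and "output" fields (hypothetical field names).
    """
    query = embed(input_text)
    ranked = sorted(examples, key=lambda e: cosine_distance(query, e["embedding"]))
    if max_distance is not None:
        # Optionally drop dissimilar examples (cf. max_distance below)
        ranked = [e for e in ranked if cosine_distance(query, e["embedding"]) <= max_distance]
    messages = []
    for example in ranked[:k]:
        messages.append({"role": "user", "content": example["input"]})
        messages.append({"role": "assistant", "content": example["output"]})
    messages.append({"role": "user", "content": input_text})
    return messages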

When should you use DICL?

DICL is particularly useful if you have limited high-quality data.
Criterion | Impact | Details
Complexity | Low | Requires data curation; few parameters
Data Efficiency | High | Achieves good results with limited data
Optimization Ceiling | Moderate | Plateaus quickly with more data; prompt-only but dynamic
Optimization Cost | Low | Generates embeddings for curated examples
Inference Cost | High | Input tokens scale with k
Inference Latency | Moderate | Requires embedding and retrieval before the LLM call
DICL tends to work best when:
  • You have dozens to thousands of curated examples of good LLM behavior.
    • If fewer: label a few dozen datapoints manually first.
    • If more: DICL still works well, but consider supervised fine-tuning instead.
  • The inference inputs are reasonably sized. Large inputs inflate the context and limit k (see below), degrading performance.
    • If prompts have a lot of boilerplate: configure prompt templates to mitigate the impact.
    • If inputs are still very large: consider supervised fine-tuning instead.
  • Inference cost (and to a lesser extent, latency) is not a bottleneck. Optimization is relatively cheap (generating embeddings), but DICL materially increases input tokens at inference time (see the back-of-envelope sketch after this list).
    • If inference cost matters: consider supervised fine-tuning instead, which shifts the marginal cost to a one-time optimization workflow.
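To make the cost tradeoff concrete, here is a hedged back-of-envelope estimate of input tokens for a single DICL call (every number below is an assumption for illustration):

# Back-of-envelope input-token estimate for one DICL inference
k = 10
avg_example_tokens = 300    # assumed average size of a curated example (input + output)
base_prompt_tokens = 500    # assumed system template + user message
dicl_input_tokens = base_prompt_tokens + k * avg_example_tokens
print(dicl_input_tokens)    # 3500 tokens, roughly 7x the baseline prompt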

Optimize your LLM inferences with Dynamic In-Context Learning

You can find a complete runnable example of this guide on GitHub.

Step 1: Configure your LLM application

Define a function with a baseline variant for your application.
tensorzero.toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-5-mini"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
If your prompt has a lot of boilerplate, configure prompt templates. DICL operates on template variables rather than the full rendered prompt, so templates improve retrieval (and therefore inference quality) and reduce the marginal cost and latency of each injected example. Move the boilerplate into system_instructions in your variant configuration instead.
system_template.minijinja
You are an assistant that is performing a named entity recognition task.
Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
  "person": ["person1", "person2", ...],
  "organization": ["organization1", "organization2", ...],
  "location": ["location1", "location2", ...],
  "miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
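For reference, a matching functions/extract_entities/output_schema.json could look like the following sketch (derived from the template above; the actual schema in the runnable example may differ):
output_schema.json
{
  "type": "object",
  "properties": {
    "person": { "type": "array", "items": { "type": "string" } },
    "organization": { "type": "array", "items": { "type": "string" } },
    "location": { "type": "array", "items": { "type": "string" } },
    "miscellaneous": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["person", "organization", "location", "miscellaneous"],
  "additionalProperties": false
}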

Step 2: Collect your optimization data

After deploying the TensorZero Gateway with ClickHouse, build a dataset of good examples for the extract_entities function you configured. You can create datapoints from historical inferences or external/synthetic datasets.
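If you are starting from scratch, you can seed the dataset programmatically. The snippet below is a hedged sketch: it assumes the Python client’s bulk_insert_datapoints and JsonDatapointInsert APIs and uses a made-up example.

from tensorzero import JsonDatapointInsert, TensorZeroGateway

# Connect to a running TensorZero Gateway (adjust the URL for your deployment)
t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

# Insert one synthetic datapoint (input and output are made up)
t0.bulk_insert_datapoints(
    dataset_name="extract_entities_dataset",
    datapoints=[
        JsonDatapointInsert(
            function_name="extract_entities",
            input={"messages": [{"role": "user", "content": "Jane Doe joined Acme Corp in Paris."}]},
            output={
                "person": ["Jane Doe"],
                "organization": ["Acme Corp"],
                "location": ["Paris"],
                "miscellaneous": [],
            },
        )
    ],
)

With the dataset in place, list the datapoints and render them with your baseline variant: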
from tensorzero import ListDatapointsRequest

# List the stored datapoints for the function
datapoints = t0.list_datapoints(
    dataset_name="extract_entities_dataset",
    request=ListDatapointsRequest(
        function_name="extract_entities",
    ),
)

# Render each datapoint with the baseline variant's prompt templates
rendered_samples = t0.experimental_render_samples(
    stored_samples=datapoints.datapoints,
    variants={"extract_entities": "baseline"},
)

Step 3: Configure DICL

Configure DICL by specifying the names of your function, variant, and embedding model.
from tensorzero import DICLOptimizationConfig

optimization_config = DICLOptimizationConfig(
    function_name="extract_entities",
    variant_name="dicl",
    embedding_model="openai::text-embedding-3-small",
    k=10,  # how many examples are retrieved and injected as context
    model="openai::gpt-5-mini",  # LLM that will generate outputs using the retrieved examples
)
You can also define a custom embedding model in your configuration.
You should experiment with different choices of k. Typical values are 3–10; prefer smaller values when inputs tend to be large.
If retrieval surfaces irrelevant examples, consider setting max_distance in your variant configuration later. With this setting, the retrieval step returns fewer than k examples when some exceed the cosine-distance threshold. Make sure to tune the value for your embedding model.

Step 4: Launch DICL

You can now launch your DICL optimization job using the TensorZero Gateway:
# Launch the optimization job: TensorZero embeds the rendered samples
job_handle = t0.experimental_launch_optimization(
    train_samples=rendered_samples,
    optimization_config=optimization_config,
)

# Retrieve the current status of the optimization job
job_info = t0.experimental_poll_optimization(
    job_handle=job_handle
)
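If the poll call in your client version returns a status snapshot rather than blocking, you can wait for completion with a small loop (a sketch; the status attribute and its values are assumptions):

import time

# Re-poll every few seconds until the job reaches a terminal state
while str(job_info.status).lower() not in ("completed", "failed"):  # assumed terminal states
    time.sleep(5)
    job_info = t0.experimental_poll_optimization(job_handle=job_handle)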
DICL will embed all your training samples and store them in ClickHouse.
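To sanity-check the result, you can query ClickHouse directly with the clickhouse-connect Python package. This sketch assumes the examples land in a table named DynamicInContextLearningExample in a database named tensorzero:

import clickhouse_connect

# Count stored DICL examples per function/variant (table and database names are assumptions)
client = clickhouse_connect.get_client(host="localhost", port=8123, database="tensorzero")
result = client.query(
    "SELECT function_name, variant_name, count() AS examples "
    "FROM DynamicInContextLearningExample "
    "GROUP BY function_name, variant_name"
)
print(result.result_rows)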

Step 5: Update your configuration

After optimization completes, add the DICL variant to your configuration:
tensorzero.toml
[functions.extract_entities.variants.dicl]
type = "experimental_dynamic_in_context_learning"
embedding_model = "openai::text-embedding-3-small"
k = 10
model = "openai::gpt-5-mini"
json_mode = "strict"
The embedding_model in the configuration must match the embedding model you used during optimization.
That’s it! At inference time, the DICL variant will retrieve the k most similar examples from your training data and include them as context for in-context learning.
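For example, you can smoke-test the new variant by pinning it explicitly (a sketch; in production you’d usually omit variant_name so TensorZero samples among variants):

from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as t0:
    response = t0.inference(
        function_name="extract_entities",
        variant_name="dicl",  # pin the DICL variant for testing
        input={"messages": [{"role": "user", "content": "Gabriel visited the TensorZero office in New York."}]},
    )
    print(response.output)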
You can run experiments comparing your baseline and DICL variants using adaptive A/B testing.

DICLOptimizationConfig

Configure DICL optimization by creating a DICLOptimizationConfig object with the following parameters:
Parameter | Type | Required / Default | Description
embedding_model | str | required | Name of the embedding model to use.
function_name | str | required | Name of the TensorZero function to optimize.
variant_name | str | required | Name to use for the DICL variant.
append_to_existing_variants | bool | default: false | Whether to append to existing variants. If false, raises an error if the variant already exists.
batch_size | int | default: 128 | Batch size for embedding generation.
dimensions | int | optional | Embedding dimensions. If not specified, uses the embedding model’s default.
k | int | default: 10 | Number of nearest neighbors to retrieve at inference time.
max_concurrency | int | default: 10 | Maximum concurrent embedding requests.
model | str | default: "openai::gpt-5-mini" | Model to use for the DICL variant.
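Putting it together, a fully specified config might look like this (the dimensions value is an assumption based on text-embedding-3-small’s default of 1536):

from tensorzero import DICLOptimizationConfig

optimization_config = DICLOptimizationConfig(
    embedding_model="openai::text-embedding-3-small",
    function_name="extract_entities",
    variant_name="dicl",
    append_to_existing_variants=False,  # error out if the variant already exists
    batch_size=128,       # embeddings generated per batch
    dimensions=1536,      # assumption: the embedding model's default dimensionality
    k=10,                 # nearest neighbors retrieved at inference time
    max_concurrency=10,   # concurrent embedding requests
    model="openai::gpt-5-mini",
)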