Learn how to use Dynamic In-Context Learning to optimize your LLM applications.
Dynamic In-Context Learning (DICL) is an inference-time optimization that improves LLM performance by incorporating relevant historical examples into your prompt.
Instead of manually incorporating static examples in your prompts, DICL selects the most relevant examples at inference time. Here's how it works:
1. Before inference: You curate examples of good LLM behavior. TensorZero embeds them using an embedding model and stores them in your database.
2. At inference time: TensorZero embeds the inference input before sending it to the LLM and retrieves similar curated examples from your database.
3. TensorZero inserts these examples into your prompt and sends the request to the LLM.
4. The LLM generates a response using the enhanced prompt.
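The retrieval step can be sketched in plain Python. This is an illustrative toy, not TensorZero's implementation: the `embed` function below is a hypothetical stand-in for a real embedding model, and the curated examples are made up.

```python
import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model: hash character
    # bigrams into a small fixed-size vector. A real deployment would call
    # an embedding model such as text-embedding-3-small.
    vec = [0.0] * 64
    lowered = text.lower()
    for a, b in zip(lowered, lowered[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    return vec

def cosine_distance(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)

# "Before inference": curated examples are embedded and stored.
curated = [
    {"input": "Acme Corp hired Jane Doe in Paris.", "output": '{"person": ["Jane Doe"], ...}'},
    {"input": "The weather was sunny all week.", "output": '{"person": [], ...}'},
]
index = [(embed(ex["input"]), ex) for ex in curated]

def retrieve(query: str, k: int) -> list[dict]:
    # "At inference time": embed the input and keep the k nearest examples.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine_distance(q, item[0]))
    return [ex for _, ex in ranked[:k]]

# The retrieved examples would then be inserted into the prompt.
examples = retrieve("Globex appointed John Smith in Berlin.", k=1)
```

The retrieved examples are then formatted into the prompt alongside the new input before the request is sent to the LLM.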
If your dataset of curated examples is still very large, consider supervised fine-tuning instead.
Inference cost (and, to a lesser extent, latency) should not be a bottleneck. The optimization itself is relatively cheap (generating embeddings), but DICL materially increases input tokens at inference time. If inference cost matters, consider supervised fine-tuning instead, which shifts the marginal cost to a one-time optimization workflow.
If your prompt has a lot of boilerplate, configure prompt templates. DICL operates on template variables, so templates improve retrieval (and therefore inference quality) and mitigate the marginal cost and latency. Move the boilerplate into system_instructions in your variant configuration instead.
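Putting these options together, a DICL variant configuration might look roughly like the sketch below. The variant type name and file path are assumptions for illustration; the fields (embedding_model, model, k, system_instructions, max_distance) mirror the options discussed on this page, but check the TensorZero configuration reference for exact names.

```toml
[functions.extract_entities.variants.dicl]
# Variant type name is an assumption; verify against the configuration reference.
type = "experimental_dynamic_in_context_learning"
embedding_model = "openai::text_embedding_3_small"
model = "openai::gpt-5-mini"
k = 10
# Keep boilerplate out of the embedded input by moving it here (hypothetical path):
system_instructions = "functions/extract_entities/dicl/system_instructions.txt"
# Optionally skip retrieved examples beyond this cosine distance:
# max_distance = 0.5
```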
Example: Data Extraction (Named Entity Recognition) — Configuration
system_template.minijinja

```minijinja
You are an assistant that is performing a named entity recognition task.

Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
    "person": ["person1", "person2", ...],
    "organization": ["organization1", "organization2", ...],
    "location": ["location1", "location2", ...],
    "miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
```
2. Collect your optimization data
TensorZero Dataset: After deploying the TensorZero Gateway with ClickHouse, build a dataset of good examples for the extract_entities function you configured. You can create datapoints from historical inferences or external/synthetic datasets.

Historical Inferences: After deploying the TensorZero Gateway with ClickHouse, make inference calls to the extract_entities function you configured. TensorZero automatically collects structured data about those inferences, which can later be used as training examples for DICL.

You can curate good examples in multiple ways:
Collecting demonstrations: Collect demonstrations of good behavior (or labels) from human annotation or other sources.
Filtering with metrics: Query inferences that scored well on your metrics (e.g. output_source="inference" with a filter for high scores).
Examples from an expensive model: Run inferences with a powerful model (e.g. GPT-5) and use those outputs as demonstrations for a smaller model (e.g. GPT-5 Mini).
DICL's performance degrades as the curated examples become noisier (i.e. include examples of bad behavior), so there is a trade-off between dataset size and datapoint quality.
For this example, we’ll use demonstrations.
You can submit demonstration feedback using the demonstration metric:
```python
t0.feedback(
    metric_name="demonstration",
    value=corrected_output,  # Provide the ideal output for that inference
    inference_id=response.inference_id,
)
```
Then, query inferences with output_source="demonstration" to get examples where the output has been corrected:
Configure DICL by specifying the name of your function, variant, and embedding model.
```python
from tensorzero import DICLOptimizationConfig

optimization_config = DICLOptimizationConfig(
    function_name="extract_entities",
    variant_name="dicl",
    embedding_model="openai::text_embedding_3_small",
    k=10,  # how many examples are retrieved and injected as context
    model="openai::gpt-5-mini",  # LLM that will generate outputs using the retrieved examples
)
```
You should experiment with different choices of k.
Typical values are 3-10, with smaller values when inputs tend to be larger.
If you see inferences with irrelevant examples, consider setting a max_distance in your variant configuration later. With this setting, the retrieval step can return fewer than k examples if some don't meet the cosine distance threshold. Make sure to tune the value according to your embedding model.
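As a rough sketch of that behavior (not TensorZero's implementation), assuming retrieval returns (distance, example) pairs sorted by ascending cosine distance:

```python
def filter_by_max_distance(ranked, k, max_distance):
    """Keep at most k retrieved examples whose cosine distance is within the threshold.

    `ranked` is a list of (distance, example) pairs sorted by ascending distance.
    """
    return [ex for dist, ex in ranked[:k] if dist <= max_distance]

# Hypothetical retrieval results:
ranked = [(0.12, "ex_a"), (0.30, "ex_b"), (0.55, "ex_c"), (0.80, "ex_d")]

kept = filter_by_max_distance(ranked, k=3, max_distance=0.5)  # ["ex_a", "ex_b"]
```

Note that "ex_c" is within the top k=3 but is dropped because its distance exceeds the threshold, which is why fewer than k examples can reach the prompt.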
4. Launch DICL
You can now launch your DICL optimization job using the TensorZero Gateway:
The embedding_model in the configuration must match the embedding model you used during optimization.
That’s it!
At inference time, the DICL variant will retrieve the k most similar examples from your training data and include them as context for in-context learning.
You can run experiments comparing your baseline and DICL variants using adaptive A/B testing.
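As a loose illustration of what adaptive traffic allocation does (this is a generic Thompson-sampling sketch, not TensorZero's A/B testing API), each variant's success rate gets a Beta posterior, and traffic shifts toward the variant that samples higher:

```python
import random

def choose_variant(stats, rng):
    # Thompson sampling: draw from each variant's Beta posterior and pick the max.
    draws = {
        name: rng.betavariate(s["successes"] + 1, s["failures"] + 1)
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)

# Hypothetical metric feedback accumulated so far:
stats = {
    "baseline": {"successes": 40, "failures": 60},
    "dicl": {"successes": 55, "failures": 45},
}

rng = random.Random(0)
picks = [choose_variant(stats, rng) for _ in range(100)]
# The better-performing variant is sampled more often, while the other
# still receives occasional traffic for continued exploration.
```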