Optimize your prompt templates with GEPA
1. Configure your LLM application
Define a function and variant for your application.
The variant must have at least one prompt template (e.g. the LLM system instructions).
tensorzero.toml
Example: Data Extraction (Named Entity Recognition) — Configuration
Example: Data Extraction (Named Entity Recognition) — Configuration
system_template.minijinja
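As a rough illustration of the shape such a configuration takes, here is a minimal tensorzero.toml sketch. The variant name, model, and template path are assumptions for this example, not prescribed values:

```toml
# Illustrative sketch only: a JSON function with one variant that has a
# system prompt template. Check the TensorZero configuration reference
# for the exact fields your function type requires.
[functions.extract_entities]
type = "json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "functions/extract_entities/baseline/system_template.minijinja"
```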
2. Collect your optimization data
- Historical Inferences
- TensorZero Dataset
After deploying the TensorZero Gateway with ClickHouse, make inference calls to the extract_entities function you configured.
TensorZero automatically collects structured data about those inferences, which can later be used as training examples for GEPA.
3. Configure an evaluation
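To make the inference step concrete, the sketch below builds a request payload for the extract_entities function. The gateway URL, message content, and input structure are illustrative assumptions; the commented-out client call shows roughly where a real request would go:

```python
# Build an inference request payload for the extract_entities function.
# The message content and structure here are illustrative assumptions.
payload = {
    "function_name": "extract_entities",
    "input": {
        "messages": [
            {"role": "user", "content": "TensorZero was founded in New York."}
        ]
    },
}

# With a running gateway, the call would look roughly like this
# (left commented out since it requires a live deployment):
# from tensorzero import TensorZeroGateway
# with TensorZeroGateway.build_http_gateway(
#     gateway_url="http://localhost:3000"
# ) as client:
#     response = client.inference(**payload)

print(payload["function_name"])
```

Each such call is recorded by the gateway in ClickHouse, which is what makes the inferences usable as GEPA training examples later.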
GEPA template refinement is guided by evaluator scores.
Define an Inference Evaluation in your TensorZero configuration.
To demonstrate that GEPA works even with noisy evaluators, we don’t provide demonstrations (labels), only an LLM judge.
GEPA supports evaluations with any number of evaluators and any evaluator type (e.g. exact match, LLM judges).
Example: Data Extraction (Named Entity Recognition) — Evaluation
tensorzero.toml
system_instructions.txt
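The general shape of such an evaluation configuration is sketched below. The evaluation name, evaluator name, and field values are assumptions for illustration; verify the exact schema against the TensorZero evaluation documentation:

```toml
# Rough sketch only: an inference evaluation for extract_entities with a
# single LLM judge evaluator. Names and fields are illustrative.
[evaluations.extraction_evaluation]
type = "static"
function_name = "extract_entities"

[evaluations.extraction_evaluation.evaluators.quality_judge]
type = "llm_judge"
output_type = "float"
optimize = "max"
```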
4. Configure GEPA
Configure GEPA by specifying the name of your function and evaluation.
You are also free to choose the models used to analyze inferences and generate new templates.
The analysis_model reflects on individual inferences, reports whether each one is optimal, needs improvement, or is erroneous, and provides suggestions for improving the prompt template.
The mutation_model generates new templates based on the collected analysis reports.
We recommend using strong models for these tasks.
5. Launch GEPA
You can now launch your GEPA optimization job using the TensorZero Gateway:
6. Update your configuration
Review the generated templates and write them to your config directory.
Finally, add the new variant to your configuration.
That’s it!
You are now ready to deploy your GEPA-optimized LLM application!
Example: Data Extraction (Named Entity Recognition) — Optimized Variant
tensorzero.toml
gepa-iter-9-gepa-iter-6-gepa-iter-4-baseline/system_template.minijinja
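Registering the optimized variant amounts to adding one more variant block to the function, pointing at the template GEPA wrote out. The variant name and model below are illustrative assumptions:

```toml
# Illustrative sketch: add the GEPA-generated template as a new variant
# alongside the baseline. Paths and names are assumptions for this example.
[functions.extract_entities.variants.gepa_optimized]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "functions/extract_entities/gepa_optimized/system_template.minijinja"
```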
GEPAConfig
Configure GEPA optimization by creating a GEPAConfig object with the following parameters:
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| function_name | str | Name of the TensorZero function to optimize. |
| evaluation_name | str | Name of the evaluation used to score candidate variants. |
| analysis_model | str | Model used to analyze inference results (e.g. "anthropic::claude-sonnet-4-5"). |
| mutation_model | str | Model used to generate prompt mutations (e.g. "anthropic::claude-sonnet-4-5"). |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| initial_variants | list[str] | All variants | List of variant names to initialize GEPA with. If not specified, uses all variants defined for the function. |
| variant_prefix | str | None | Prefix for naming newly generated variants. |
| batch_size | int | 5 | Number of training samples to analyze per iteration. |
| max_iterations | int | 1 | Maximum number of optimization iterations. |
| max_concurrency | int | 10 | Maximum number of concurrent inference calls. |
| seed | int | None | Random seed for reproducibility. |
| timeout | int | 300 | Client timeout in seconds for TensorZero gateway operations. |
| include_inference_for_mutation | bool | True | Whether to include inference input/output in the analysis passed to the mutation model. Useful for few-shot examples but can cause context overflow with long conversations or outputs. |
| retries | RetryConfig | None | Retry configuration for inference calls during optimization. |
| max_tokens | int | None | Maximum tokens for analysis and mutation model calls. Required for Anthropic models. |
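Putting the tables above together, the sketch below collects the GEPAConfig parameters as a plain dict. The function and evaluation names are hypothetical placeholders; the defaults are taken from the reference table, and the final construction (commented out) assumes GEPAConfig is importable from the tensorzero package:

```python
# Sketch: GEPAConfig parameters as a plain dict, mirroring the reference
# tables above. function_name and evaluation_name are placeholders.
gepa_params = {
    # Required parameters
    "function_name": "extract_entities",
    "evaluation_name": "extraction_evaluation",
    "analysis_model": "anthropic::claude-sonnet-4-5",
    "mutation_model": "anthropic::claude-sonnet-4-5",
    # Optional parameters, shown at their documented defaults
    "batch_size": 5,
    "max_iterations": 1,
    "max_concurrency": 10,
    "timeout": 300,
    "include_inference_for_mutation": True,
}

# The real object would then be built roughly as:
# from tensorzero import GEPAConfig  # import path is an assumption
# config = GEPAConfig(**gepa_params)

print(sorted(gepa_params))
```

Since max_tokens defaults to None but is required for Anthropic models, add it explicitly when using Anthropic analysis or mutation models.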