Prompt Templates & Schemas
Prompt templates and schemas simplify engineering iteration, experimentation, and optimization, especially as application complexity and team size grow. Notably, they enable you to:
- Decouple prompts from application code. As you iterate on your prompts over time (or A/B test different prompts), you’ll be able to manage them in a centralized way without making changes to the application code.
- Collect a structured inference dataset. Imagine down the road you want to fine-tune a model using your historical data. If you had only stored prompts as strings, you’d be stuck with the outdated prompts that were actually used at inference time. However, if you had access to the input variables in a structured dataset, you’d easily be able to counterfactually swap new prompts into your training data before fine-tuning. This is particularly important when experimenting with new models, because prompts don’t always translate well between them.
- Implement model-specific prompts. We often find that the best prompt for one model is different from the best prompt for another. As you try out different models, you’ll need to be able to independently vary the prompt and the model and try different combinations thereof. This is commonly challenging to implement in application code, but trivial in TensorZero.
Scenario
In the Quick Start, we built a simple LLM application that writes haikus about artificial intelligence. But what if we wanted to generate haikus about different topics?
The naive solution is to parametrize the prompt in your application.
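For example, a sketch using the Python client from the Quick Start (the `generate_haiku` wrapper is our own helper, and the exact client constructor may differ slightly between client versions):

```python
from tensorzero import TensorZeroGateway

def generate_haiku(topic: str):
    with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        return client.inference(
            function_name="generate_haiku",
            input={
                "messages": [
                    {
                        "role": "user",
                        # The boilerplate ("Write a haiku about ...") is
                        # hardcoded here in application code.
                        "content": f"Write a haiku about: {topic}",
                    }
                ]
            },
        )

print(generate_haiku("artificial intelligence"))
```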
This works fine, and it’s typically how most people tackle it today. But there’s room for improvement!
For this function, what we really care about is the topic -> haiku mapping. The rest of the prompt is a detail of the current implementation, and it might evolve over time.
Instead, let’s move the boilerplate for this user message to our configuration.
Prompt Templates
TensorZero uses the MiniJinja templating language. MiniJinja is mostly compatible with Jinja2, which is used by many popular projects like Flask and Django.
We’ll save the template in a separate file and later reference it in a variant in our main configuration file, tensorzero.toml.
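For our haiku generator, the template is just the user message with the topic left as a variable. A minimal sketch:

```jinja
Write a haiku about: {{ topic }}
```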
If your template includes any variables, you must also provide a schema that fits the template.
Prompt Schemas
Schemas ensure that different templates for a function share a consistent interface and validate inputs before inference.
TensorZero uses the JSON Schema format. Similar to templates, we’ll specify it in a separate file and reference it in our configuration.
JSON Schemas are a bit cumbersome to write, but luckily LLMs are great at doing it!
Let’s give Claude (3.5 Sonnet) the following query:
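The exact wording isn’t critical; a query along these lines works:

```
Create a JSON Schema for an object with a single required string field called `topic`. The schema should not allow additional properties.
```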
It correctly generates the following schema:
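Something along these lines, which we’ll save as `user_schema.json`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "topic": {
      "type": "string"
    }
  },
  "required": ["topic"],
  "additionalProperties": false
}
```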
Putting It All Together
Let’s incorporate our template and our schema in our configuration file.
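A sketch of the relevant entries (the paths and the `gpt_4o_mini` variant name match the file layout shown below; the model name is illustrative):

```toml
[functions.generate_haiku_with_topic]
type = "chat"
user_schema = "functions/generate_haiku_with_topic/user_schema.json"

[functions.generate_haiku_with_topic.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
user_template = "functions/generate_haiku_with_topic/gpt_4o_mini/user_template.minijinja"
```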
In TensorZero, schemas belong to functions and templates belong to variants. Since a function can have multiple variants, you’ll be able to experiment with different prompts for a given function, but you’ll still ensure they have a consistent interface for your application. If you have multiple templates, you’ll need a single schema that accounts for the variables in all of them. In other words, your schema should contain all the variables you might want for your LLM message, but a particular template doesn’t need to use every variable defined in your schema.
You can use any file structure with TensorZero. We recommend the following structure to keep things organized:
- config/
  - functions/
    - generate_haiku_with_topic/
      - gpt_4o_mini/
        - …
        - user_template.minijinja
      - user_schema.json
  - tensorzero.toml
- docker-compose.yml (see below)
- run.py (see below)
With everything in place, launch the TensorZero Gateway using this configuration. You can use the same Docker Compose configuration as the Quick Start (available below for convenience).
Docker Compose Configuration
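A sketch of that setup (image tags, credentials, and ports are illustrative; see the Quick Start for the canonical file):

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.12-alpine
    environment:
      CLICKHOUSE_USER: chuser
      CLICKHOUSE_PASSWORD: chpassword
    ports:
      - "8123:8123"

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our config/ directory (templates, schemas, tensorzero.toml)
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      CLICKHOUSE_URL: http://chuser:chpassword@clickhouse:8123/tensorzero
      OPENAI_API_KEY: ${OPENAI_API_KEY:?}
    ports:
      - "3000:3000"
    depends_on:
      - clickhouse
```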
Let’s launch everything.
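From the directory containing `docker-compose.yml`:

```bash
docker compose up
```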
Structured Inference
Let’s update our original Python script to leverage our schema.
Instead of sending the entire prompt in our inference request, we now only need to provide an object with the relevant variables.
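A sketch, again assuming the Python client from the Quick Start (the exact shape of structured content blocks may differ slightly between client versions):

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        function_name="generate_haiku_with_topic",
        input={
            "messages": [
                {
                    "role": "user",
                    # Send only the schema variables; the gateway validates
                    # them against user_schema.json and renders the variant's
                    # MiniJinja template server-side.
                    "content": [
                        {
                            "type": "text",
                            "arguments": {"topic": "artificial intelligence"},
                        }
                    ],
                }
            ]
        },
    )

print(response)
```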
Sample Output
Like in the Quick Start, the gateway will store inference data in our database.
But this time, the `input` field will be structured according to our schema.
Let’s check our database.
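For example, a query along these lines against ClickHouse’s HTTP interface, using the illustrative credentials from the Docker Compose file above:

```bash
curl "http://chuser:chpassword@localhost:8123/" \
  -d "SELECT input FROM tensorzero.ChatInference ORDER BY timestamp DESC LIMIT 1 FORMAT Vertical"
```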
Sample Output
Conclusion & Next Steps
Now we can manage our prompts as configuration files, and get structured inference data from the gateway!
As discussed, it’s helpful to manage prompts in a centralized way. With TensorZero’s approach, these prompts still live in your repository, which simplifies versioning, access control, GitOps, and more. This setup also lets us easily benefit from more advanced features like A/B testing different prompts or fine-tuning a model.