Prompt templates and schemas simplify engineering iteration, experimentation, and optimization, especially as application complexity and team size grow. Notably, they enable you to:
  1. Decouple prompts from application code. As you iterate on your prompts over time (or A/B test different prompts), you’ll be able to manage them in a centralized way without making changes to the application code.
  2. Collect a structured inference dataset. Imagine down the road you want to fine-tune a model using your historical data. If you had only stored prompts as strings, you’d be stuck with the outdated prompts that were actually used at inference time. However, if you had access to the input variables in a structured dataset, you’d easily be able to counterfactually swap new prompts into your training data before fine-tuning (see the sketch after this list). This is particularly important when experimenting with new models, because prompts don’t always translate well between them.
  3. Implement model-specific prompts. We often find that the best prompt for one model is different from the best prompt for another. As you try out different models, you’ll need to be able to independently vary the prompt and the model and try different combinations thereof. This is commonly challenging to implement in application code, but trivial in TensorZero.
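As a rough illustration of point 2, the sketch below re-renders hypothetical stored inputs with a new prompt template before building a fine-tuning dataset. The rows, field names, and template are made up for illustration, and Jinja2 is used only for local rendering:
from jinja2 import Template

# Hypothetical rows exported from your inference database: the structured
# inputs (template variables) alongside the generated haikus.
rows = [
    {"topic": "artificial intelligence", "haiku": "..."},
    {"topic": "general relativity", "haiku": "..."},
]

# A new prompt you'd like to fine-tune on, even though it wasn't the prompt
# used at inference time.
new_template = Template("Write a thoughtful haiku about the topic: {{ topic }}")

# Counterfactually swap the new prompt into the training data.
training_examples = [
    {"prompt": new_template.render(topic=row["topic"]), "completion": row["haiku"]}
    for row in rows
]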
You can also find the runnable code for this example on GitHub.

Scenario

In the Quickstart, we built a simple LLM application that writes haikus about artificial intelligence. But what if we wanted to generate haikus about different topics? The naive solution is to parametrize the prompt in your application.
run.py
# Naive Solution (Not Recommended)

from tensorzero import TensorZeroGateway


def generate_haiku(topic):
    with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        return client.inference(
            function_name="generate_haiku",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": f"Write a haiku about: {topic}",
                    }
                ]
            },
        )


print(generate_haiku("artificial intelligence"))
This works fine, and it’s typically how most people tackle it today. But there’s room for improvement! What we really care about for this function is the topic -> haiku mapping. The rest of the prompt is a detail of the current implementation, and it might evolve over time. Instead, let’s move the boilerplate for this user message to our configuration.

Prompt Templates

TensorZero uses the MiniJinja templating language. MiniJinja is mostly compatible with Jinja2, which is used by many popular projects like Flask and Django. We’ll save the template in a separate file and later reference it in a variant in our main configuration file, tensorzero.toml.
user_template.minijinja
Write a haiku about: {{ topic }}
If your template includes any variables, you must also provide a schema that fits the template.
MiniJinja also provides a browser playground where you can test your templates.
If you don’t want to use a template for a particular content block, you can use the raw_text content block type instead of text.
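Because MiniJinja is mostly compatible with Jinja2, one quick way to sanity-check a template locally is to render it with Jinja2 and sample variables (a rough sketch; the Jinja2 dependency is only for this local check and isn’t required by TensorZero, and edge cases between the two engines may differ):
from jinja2 import Template

# Quick local check: render the template with sample variables.
with open("user_template.minijinja") as f:
    template = Template(f.read())

print(template.render(topic="artificial intelligence"))
# Write a haiku about: artificial intelligence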

Prompt Schemas

Schemas ensure that different templates for a function share a consistent interface, and they let the gateway validate inputs before inference.
Defining a schema is optional: you can use templates without schemas. If you don’t define a schema, the gateway will automatically supply the variables system_text, user_text, and assistant_text, which you can optionally use in your templates. These variables contain the content of the corresponding text content blocks provided at inference time.
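For example, a schema-less user template could simply wrap whatever text your application sends, using the automatically supplied variable (a minimal illustration of the behavior described above):
Write a haiku about the following topic: {{ user_text }}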
TensorZero uses the JSON Schema format. Similar to templates, we’ll specify the schema in a separate file and reference it in our configuration. JSON Schemas are a bit cumbersome to write by hand, but luckily LLMs are great at it! Let’s give Claude (3.5 Sonnet) the following query:
Generate a JSON schema with a single field: `topic`.
The `topic` field is required. No additional fields are allowed.
It correctly generates the following schema:
user_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "topic": {
      "type": "string"
    }
  },
  "required": ["topic"],
  "additionalProperties": false
}
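If you want to see the kind of validation the gateway performs with this schema, you can exercise it locally with the jsonschema Python package (a quick sketch for illustration; TensorZero performs this validation for you at inference time):
import json

from jsonschema import ValidationError, validate

with open("user_schema.json") as f:
    schema = json.load(f)

validate({"topic": "artificial intelligence"}, schema)  # passes

try:
    validate({"subject": "artificial intelligence"}, schema)  # wrong field name
except ValidationError as e:
    print(e.message)  # explains why the input was rejected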
You can export JSON Schemas from Pydantic models and Zod schemas.
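For instance, with Pydantic (v2) you could generate a comparable schema from a model instead of writing it by hand (a sketch; the generated output may differ cosmetically from the handwritten schema above, e.g. it includes title fields):
import json

from pydantic import BaseModel, ConfigDict


class UserSchema(BaseModel):
    # `extra="forbid"` maps to `"additionalProperties": false` in the JSON Schema.
    model_config = ConfigDict(extra="forbid")

    topic: str


with open("user_schema.json", "w") as f:
    json.dump(UserSchema.model_json_schema(), f, indent=2)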

Putting It All Together

Let’s incorporate our template and schema into our configuration file. In TensorZero, schemas belong to functions and templates belong to variants. Since a function can have multiple variants, you can experiment with different prompts for a given function while still ensuring they share a consistent interface with your application. If you have multiple templates, you’ll need a single schema that accounts for the variables in all of them. In other words, your schema should include every variable you might want in your LLM messages, but a particular template doesn’t need to use every variable defined in your schema.
tensorzero.toml
# We define a function and a variant, just like in our Quickstart...
# ... but this time we include a schema and a template.
[functions.generate_haiku_with_topic]
type = "chat"
user_schema = "functions/generate_haiku_with_topic/user_schema.json" # relative to tensorzero.toml
# system_schema = "..."
# assistant_schema = "..."

[functions.generate_haiku_with_topic.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
user_template = "functions/generate_haiku_with_topic/gpt_4o_mini/user_template.minijinja" # relative to tensorzero.toml
# system_template = "..."
# assistant_template = "..."
You can define separate templates and schemas for system, user, and assistant messages.
Note that our function’s name differs from the one in the Quickstart. We strongly encourage defining a new function when you change its schemas, to ensure you’ll always collect inference data with a consistent structure. We plan to introduce functionality to streamline schema migrations down the road.
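For example, to experiment with a second model under the same function, you could add another variant with its own template; both variants share the function’s schema, so your application code doesn’t change. This is only a sketch: the variant name, the anthropic:: model shorthand, the specific model ID, and the template path are illustrative assumptions.
[functions.generate_haiku_with_topic.variants.claude_3_5_haiku]
type = "chat_completion"
model = "anthropic::claude-3-5-haiku-20241022" # illustrative model name
user_template = "functions/generate_haiku_with_topic/claude_3_5_haiku/user_template.minijinja" # relative to tensorzero.toml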
You can use any file structure with TensorZero. We recommend the following structure to keep things organized:
- config/
  - functions/
    - generate_haiku_with_topic/
      - gpt_4o_mini/
        - user_template.minijinja
      - user_schema.json
  - tensorzero.toml
- docker-compose.yml (see below)
- run.py (see below)
The paths in tensorzero.toml are relative to its location, so we don’t specify the parent folder config/.
With everything in place, launch the TensorZero Gateway using this configuration. You can use the same Docker Compose configuration as the Quickstart (available below for convenience).
Let’s launch everything.
docker compose up
If the gateway is already running, you can update the configuration by restarting the container.
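For example, assuming the gateway service keeps the name gateway from the Quickstart’s Docker Compose file (an assumption about the service name), you could run:
docker compose restart gateway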

Structured Inference

Let’s update our original Python script to leverage our schema. Instead of sending the entire prompt in our inference request, we now only need to provide an object with the template variables.
from tensorzero import TensorZeroGateway


def generate_haiku(topic):
    with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        return client.inference(
            function_name="generate_haiku_with_topic",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": [{"type": "text", "arguments": {"topic": topic}}],
                    }
                ],
            },
        )


print(generate_haiku("artificial intelligence"))

Sample Output

Like in the Quickstart, the gateway will store inference data in our database. But this time, the input field will be structured according to our schema. Let’s check our database.
curl "http://localhost:8123/" \
  -d "SELECT * FROM tensorzero.ChatInference
      WHERE function_name = 'generate_haiku_with_topic'
      ORDER BY timestamp DESC
      LIMIT 1
      FORMAT Vertical"

Sample Output

Conclusion & Next Steps

Now we can manage our prompts as configuration files and collect structured inference data from the gateway! As discussed, it’s helpful to manage prompts in a centralized way. With TensorZero’s approach, these prompts still live in your repository, which simplifies versioning, access control, GitOps, and more. This setup also lets us easily benefit from more advanced features like A/B testing different prompts or fine-tuning a model.