Prompt templates and schemas simplify engineering iteration, experimentation, and optimization, especially as application complexity and team size grow. Notably, they enable you to:
  1. Decouple prompts from application code. As you iterate on your prompts over time (or A/B test different prompts), you’ll be able to manage them in a centralized way without making changes to the application code.
  2. Collect a structured inference dataset. Imagine down the road you want to fine-tune a model using your historical data. If you had only stored prompts as strings, you’d be stuck with the outdated prompts that were actually used at inference time. However, if you had access to the input variables in a structured dataset, you’d easily be able to counterfactually swap new prompts into your training data before fine-tuning (see the sketch after this list). This is particularly important when experimenting with new models, because prompts don’t always translate well between them.
  3. Implement model-specific prompts. We often find that the best prompt for one model is different from the best prompt for another. As you try out different models, you’ll need to be able to independently vary the prompt and the model and try different combinations thereof. This is commonly challenging to implement in application code, but trivial in TensorZero.
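As a rough illustration of point 2, the sketch below re-renders hypothetical stored inputs with a new prompt template before building a fine-tuning dataset. The rows, field names, and template are made up for illustration, and Jinja2 is used only for local rendering:
from jinja2 import Template

# Hypothetical rows exported from your inference database: the structured
# inputs (template variables) alongside the generated haikus.
rows = [
    {"topic": "artificial intelligence", "haiku": "..."},
    {"topic": "general relativity", "haiku": "..."},
]

# A new prompt you'd like to fine-tune on, even though it wasn't the prompt
# used at inference time.
new_template = Template("Write a thoughtful haiku about the topic: {{ topic }}")

# Counterfactually swap the new prompt into the training data.
training_examples = [
    {"prompt": new_template.render(topic=row["topic"]), "completion": row["haiku"]}
    for row in rows
]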
You can also find the runnable code for this example on GitHub.

Scenario

In the Quickstart, we built a simple LLM application that writes haikus about artificial intelligence. But what if we wanted to generate haikus about different topics? The naive solution is to parametrize the prompt in your application.
run.py
# Naive Solution (Not Recommended)

from tensorzero import TensorZeroGateway


def generate_haiku(topic):
    with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        return client.inference(
            function_name="generate_haiku",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": f"Write a haiku about: {topic}",
                    }
                ]
            },
        )


print(generate_haiku("artificial intelligence"))
This works fine, and it’s typically how most people tackle it today. But there’s room for improvement! What we really care about for this function is the topic -> haiku mapping. The rest of the prompt is a detail of the current implementation, and it might evolve over time. Instead, let’s move the boilerplate for this user message to our configuration.

Prompt Templates

TensorZero uses the MiniJinja templating language. MiniJinja is mostly compatible with Jinja2, which is used by many popular projects like Flask and Django. We’ll save the template in a separate file and later reference it in a variant in our main configuration file, tensorzero.toml.
user_template.minijinja
Write a haiku about: {{ topic }}
If your template includes any variables, you must also provide a schema that fits the template.
MiniJinja also provides a browser playground where you can test your templates.
If you don’t want to use a template for a particular content block, you can use the raw_text content block type instead of text.
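Because MiniJinja is mostly compatible with Jinja2, one quick way to sanity-check a template locally is to render it with Jinja2 and sample variables (a rough sketch; the Jinja2 dependency is only for this local check and isn’t required by TensorZero, and edge cases between the two engines may differ):
from jinja2 import Template

# Quick local check: render the template with sample variables.
with open("user_template.minijinja") as f:
    template = Template(f.read())

print(template.render(topic="artificial intelligence"))
# Write a haiku about: artificial intelligence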

Prompt Schemas

Schemas ensure that different templates for a function share a consistent interface, and they let the gateway validate inputs before inference.
Defining a schema is optional: you can use templates without schemas. If you don’t define a schema, the gateway will automatically supply the variables system_text, user_text, and assistant_text, which you can optionally use in your templates. These variables contain the content of the corresponding text content blocks provided at inference time.
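For example, a schema-less user template could simply wrap whatever text your application sends, using the automatically supplied variable (a minimal illustration of the behavior described above):
Write a haiku about the following topic: {{ user_text }}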
TensorZero uses the JSON Schema format. Similar to templates, we’ll specify the schema in a separate file and reference it in our configuration. JSON Schemas are a bit cumbersome to write by hand, but luckily LLMs are great at it! Let’s give Claude (3.5 Sonnet) the following query:
Generate a JSON schema with a single field: `topic`.
The `topic` field is required. No additional fields are allowed.
It correctly generates the following schema:
user_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "topic": {
      "type": "string"
    }
  },
  "required": ["topic"],
  "additionalProperties": false
}
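If you want to see the kind of validation the gateway performs with this schema, you can exercise it locally with the jsonschema Python package (a quick sketch for illustration; TensorZero performs this validation for you at inference time):
import json

from jsonschema import ValidationError, validate

with open("user_schema.json") as f:
    schema = json.load(f)

validate({"topic": "artificial intelligence"}, schema)  # passes

try:
    validate({"subject": "artificial intelligence"}, schema)  # wrong field name
except ValidationError as e:
    print(e.message)  # explains why the input was rejected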
You can export JSON Schemas from Pydantic models and Zod schemas.
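For instance, with Pydantic (v2) you could generate a comparable schema from a model instead of writing it by hand (a sketch; the generated output may differ cosmetically from the handwritten schema above, e.g. it includes title fields):
import json

from pydantic import BaseModel, ConfigDict


class UserSchema(BaseModel):
    # `extra="forbid"` maps to `"additionalProperties": false` in the JSON Schema.
    model_config = ConfigDict(extra="forbid")

    topic: str


with open("user_schema.json", "w") as f:
    json.dump(UserSchema.model_json_schema(), f, indent=2)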

Putting It All Together

Let’s incorporate our template and schema into our configuration file. In TensorZero, schemas belong to functions and templates belong to variants. Since a function can have multiple variants, you can experiment with different prompts for a given function while still ensuring they share a consistent interface with your application. If you have multiple templates, you’ll need a single schema that accounts for the variables in all of them. In other words, your schema should include every variable you might want in your LLM messages, but a particular template doesn’t need to use every variable defined in your schema.
tensorzero.toml
# We define a function and a variant, just like in our Quickstart...
# ... but this time we include a schema and a template.
[functions.generate_haiku_with_topic]
type = "chat"
user_schema = "functions/generate_haiku_with_topic/user_schema.json" # relative to tensorzero.toml
# system_schema = "..."
# assistant_schema = "..."

[functions.generate_haiku_with_topic.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
user_template = "functions/generate_haiku_with_topic/gpt_4o_mini/user_template.minijinja" # relative to tensorzero.toml
# system_template = "..."
# assistant_template = "..."
You can define separate templates and schemas for system, user, and assistant messages.
Note that our function’s name differs from the one in the Quickstart. We strongly encourage defining a new function when you change its schemas, to ensure you’ll always collect inference data with a consistent structure. We plan to introduce functionality to streamline schema migrations down the road.
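For example, to experiment with a second model under the same function, you could add another variant with its own template; both variants share the function’s schema, so your application code doesn’t change. This is only a sketch: the variant name, the anthropic:: model shorthand, the specific model ID, and the template path are illustrative assumptions.
[functions.generate_haiku_with_topic.variants.claude_3_5_haiku]
type = "chat_completion"
model = "anthropic::claude-3-5-haiku-20241022" # illustrative model name
user_template = "functions/generate_haiku_with_topic/claude_3_5_haiku/user_template.minijinja" # relative to tensorzero.toml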
You can use any file structure with TensorZero. We recommend the following structure to keep things organized:
- config/
  - functions/
    - generate_haiku_with_topic/
      - gpt_4o_mini/
        - user_template.minijinja
      - user_schema.json
  - tensorzero.toml
- docker-compose.yml (see below)
- run.py (see below)
The paths in tensorzero.toml are relative to its location, so we don’t specify the parent folder config/.
With everything in place, launch the TensorZero Gateway using this configuration. You can use the same Docker Compose configuration as the Quickstart (available below for convenience).
Let’s launch everything.
docker compose up
If the gateway is already running, you can update the configuration by restarting the container.
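For example, assuming the gateway service keeps the name gateway from the Quickstart’s Docker Compose file (an assumption about the service name), you could run:
docker compose restart gateway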

Structured Inference

Let’s update our original Python script to leverage our schema. Instead of sending the entire prompt in our inference request, we now only need to provide an object with the template variables.
from tensorzero import TensorZeroGateway


def generate_haiku(topic):
    with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
        return client.inference(
            function_name="generate_haiku_with_topic",
            input={
                "messages": [
                    {
                        "role": "user",
                        "content": [{"type": "text", "arguments": {"topic": topic}}],
                    }
                ],
            },
        )


print(generate_haiku("artificial intelligence"))

Sample Output

Like in the Quickstart, the gateway will store inference data in our database. But this time, the input field will be structured according to our schema. Let’s check our database.
curl "http://localhost:8123/" \
  -d "SELECT * FROM tensorzero.ChatInference
      WHERE function_name = 'generate_haiku_with_topic'
      ORDER BY timestamp DESC
      LIMIT 1
      FORMAT Vertical"

Sample Output

Conclusion & Next Steps

Now we can manage our prompts as configuration files and collect structured inference data from the gateway! As discussed, it’s helpful to manage prompts in a centralized way. With TensorZero’s approach, these prompts still live in your repository, which simplifies versioning, access control, GitOps, and more. This setup also lets us easily benefit from more advanced features like A/B testing different prompts or fine-tuning a model.