An episode is a sequence of inferences associated with a common downstream outcome.
For example, an episode could refer to a sequence of LLM calls associated with:
Resolving a support ticket
Preparing an insurance claim
Completing a phone call
Extracting data from a document
Drafting an email
An episode includes inferences from one or more functions, and sometimes multiple calls to the same function.
Your application can run arbitrary actions (e.g. interact with users, retrieve documents, actuate robotics) between function calls within an episode.
Though these actions fall outside the scope of TensorZero, it is fine (and encouraged) to build your LLM systems this way.
The /inference endpoint accepts an optional episode_id field.
When you make the first inference request, you don’t have to provide an episode_id.
The gateway will create a new episode for you and return the episode_id in the response.
When you make the second inference request, you must provide the episode_id you received in the first response.
The gateway will use the episode_id to link the two inference requests to the same episode.
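For illustration, here's a minimal sketch of this handshake in Python. The gateway address and the request/response shapes are assumptions based on the Quick Start, and the function name is hypothetical.

```python
import requests

GATEWAY_URL = "http://localhost:3000"  # assumed local gateway address

# First request: no episode_id, so the gateway starts a new episode.
first = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "my_function",  # hypothetical function
        "input": {"messages": [{"role": "user", "content": "Hello!"}]},
    },
).json()

# The response includes the episode_id the gateway created.
episode_id = first["episode_id"]

# Second request: pass the episode_id to join the same episode.
second = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "my_function",
        "episode_id": episode_id,
        "input": {"messages": [{"role": "user", "content": "Hello again!"}]},
    },
).json()
```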
Scenario
In the Quick Start, we built a simple LLM application that writes haikus about artificial intelligence.
Imagine we also want to generate commentary about the haiku in a separate inference, and then present both pieces of content to users.
We can associate both inferences with the same episode.
Let’s define an additional function in our configuration file.
Full Configuration
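Below is a sketch of what the full configuration might look like. It keeps the generate_haiku function from the Quick Start and adds an analyze_haiku function for the commentary; the variant and model names are illustrative, and the exact syntax may vary across TensorZero versions.

```toml
# tensorzero.toml

[functions.generate_haiku]
type = "chat"

[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"

# The additional function that generates commentary about the haiku.
[functions.analyze_haiku]
type = "chat"

[functions.analyze_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```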
Inferences & Episodes
This time, we’ll create a multi-step workflow that first generates a haiku and then analyzes it.
We won’t provide an episode_id in the first inference request, so the gateway will generate a new one for us.
We’ll then use that value in our second inference request.
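Here's a sketch of the workflow in Python, assuming the gateway runs at localhost:3000, the configuration above, and a response shape where chat content arrives as a list of text blocks (the prompt wording is illustrative):

```python
import requests

GATEWAY_URL = "http://localhost:3000"  # assumed local gateway address

# Step 1: generate the haiku. We omit episode_id, so the gateway
# starts a new episode and returns its ID in the response.
haiku_response = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "generate_haiku",
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    },
).json()

episode_id = haiku_response["episode_id"]
haiku = haiku_response["content"][0]["text"]  # assumed response shape

# Step 2: analyze the haiku. Passing the episode_id from step 1
# associates both inferences with the same episode.
analysis_response = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "analyze_haiku",
        "episode_id": episode_id,
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": f"Write a short commentary about this haiku:\n\n{haiku}",
                }
            ]
        },
    },
).json()

print(haiku_response)
print(analysis_response)
```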
Sample Output
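The generated text will vary from run to run, but each response should follow roughly the shape sketched below (placeholder IDs, elided text). Note that both responses carry the same episode_id, confirming the two inferences were joined into one episode.

```json
{
  "inference_id": "aaaaaaaa-....",
  "episode_id": "bbbbbbbb-....",
  "variant_name": "gpt_4o_mini",
  "content": [{ "type": "text", "text": "..." }]
}
```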
Conclusion & Next Steps
Episodes are first-class citizens in TensorZero that enable powerful workflows for multi-step LLM systems.
You can use them alongside other features like experimentation, metrics & feedback, and tool use (function calling).
For example, you can track KPIs for entire episodes instead of individual inferences, and later jointly optimize your LLMs to maximize these metrics.
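For instance, here's a sketch of assigning feedback to an entire episode, assuming a hypothetical boolean metric named user_satisfied defined in the configuration with level = "episode":

```python
import requests

episode_id = "..."  # the episode_id returned by an earlier inference

# Record episode-level feedback for the hypothetical "user_satisfied" metric.
requests.post(
    "http://localhost:3000/feedback",
    json={
        "metric_name": "user_satisfied",
        "episode_id": episode_id,
        "value": True,
    },
)
```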