An episode is a sequence of inferences associated with a common downstream outcome.
For example, an episode could refer to a sequence of LLM calls associated with:
Resolving a support ticket
Preparing an insurance claim
Completing a phone call
Extracting data from a document
Drafting an email
An episode includes inferences from one or more functions, and sometimes multiple calls to the same function.
Your application can run arbitrary actions (e.g. interact with users, retrieve documents, actuate robotics) between function calls within an episode.
Though these actions fall outside the scope of TensorZero, it is fine (and encouraged) to build your LLM systems this way.
The /inference endpoint accepts an optional episode_id field.
When you make the first inference request, you don’t have to provide an episode_id.
The gateway will create a new episode for you and return the episode_id in the response.
When you make the second inference request, you must provide the episode_id you received in the first response.
The gateway will use the episode_id to link the two inference requests to the same episode.
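For illustration, here's a minimal sketch of this handshake in Python. The gateway address and the request/response shapes are assumptions based on the Quick Start, and the function name is hypothetical.

```python
import requests

GATEWAY_URL = "http://localhost:3000"  # assumed local gateway address

# First request: no episode_id, so the gateway starts a new episode.
first = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "my_function",  # hypothetical function
        "input": {"messages": [{"role": "user", "content": "Hello!"}]},
    },
).json()

# The response includes the episode_id the gateway created.
episode_id = first["episode_id"]

# Second request: pass the episode_id to join the same episode.
second = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "my_function",
        "episode_id": episode_id,
        "input": {"messages": [{"role": "user", "content": "Hello again!"}]},
    },
).json()
```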
Scenario
In the Quick Start, we built a simple LLM application that writes haikus about artificial intelligence.
Imagine we also want to generate commentary about the haiku in a separate inference, and then present both pieces of content to users.
We can associate both inferences with the same episode.
Let’s define an additional function in our configuration file.
Full Configuration
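Below is a sketch of what the full configuration might look like. It keeps the generate_haiku function from the Quick Start and adds an analyze_haiku function for the commentary; the variant and model names are illustrative, and the exact syntax may vary across TensorZero versions.

```toml
# tensorzero.toml

[functions.generate_haiku]
type = "chat"

[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"

# The additional function that generates commentary about the haiku.
[functions.analyze_haiku]
type = "chat"

[functions.analyze_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```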
Inferences & Episodes
This time, we’ll create a multi-step workflow that first generates a haiku and then analyzes it.
We won’t provide an episode_id in the first inference request, so the gateway will generate a new one for us.
We’ll then use that value in our second inference request.
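Here's a sketch of the workflow in Python, assuming the gateway runs at localhost:3000, the configuration above, and a response shape where chat content arrives as a list of text blocks (the prompt wording is illustrative):

```python
import requests

GATEWAY_URL = "http://localhost:3000"  # assumed local gateway address

# Step 1: generate the haiku. We omit episode_id, so the gateway
# starts a new episode and returns its ID in the response.
haiku_response = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "generate_haiku",
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    },
).json()

episode_id = haiku_response["episode_id"]
haiku = haiku_response["content"][0]["text"]  # assumed response shape

# Step 2: analyze the haiku. Passing the episode_id from step 1
# associates both inferences with the same episode.
analysis_response = requests.post(
    f"{GATEWAY_URL}/inference",
    json={
        "function_name": "analyze_haiku",
        "episode_id": episode_id,
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": f"Write a short commentary about this haiku:\n\n{haiku}",
                }
            ]
        },
    },
).json()

print(haiku_response)
print(analysis_response)
```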
Sample Output
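The generated text will vary from run to run, but each response should follow roughly the shape sketched below (placeholder IDs, elided text). Note that both responses carry the same episode_id, confirming the two inferences were joined into one episode.

```json
{
  "inference_id": "aaaaaaaa-....",
  "episode_id": "bbbbbbbb-....",
  "variant_name": "gpt_4o_mini",
  "content": [{ "type": "text", "text": "..." }]
}
```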
Conclusion & Next Steps
Episodes are first-class citizens in TensorZero that enable powerful workflows for multi-step LLM systems.
You can use them alongside other features like experimentation, metrics & feedback, and tool use (function calling).
For example, you can track KPIs for entire episodes instead of individual inferences, and later jointly optimize your LLMs to maximize these metrics.
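For instance, here's a sketch of assigning feedback to an entire episode, assuming a hypothetical boolean metric named user_satisfied defined in the configuration with level = "episode":

```python
import requests

episode_id = "..."  # the episode_id returned by an earlier inference

# Record episode-level feedback for the hypothetical "user_satisfied" metric.
requests.post(
    "http://localhost:3000/feedback",
    json={
        "metric_name": "user_satisfied",
        "episode_id": episode_id,
        "value": True,
    },
)
```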