Metrics & Feedback
The TensorZero Gateway allows you to assign feedback to inferences or sequences of inferences (episodes).
Feedback captures the downstream outcomes of your LLM application, and drive the experimentation and optimization workflows in TensorZero. For example, you can fine-tune models using data from inferences that led to positive downstream behavior.
Feedback
TensorZero currently supports the following types of feedback:
Feedback Type | Examples |
---|---|
Boolean Metric | Thumbs up, task success |
Float Metric | Star rating, clicks, number of mistakes made |
Comment | Natural-language feedback from users or developers |
Demonstration | Edited drafts, labels, human-generated content |
You can send feedback data to the gateway by using the /feedback
endpoint.
Metrics
You can define metrics in your tensorzero.toml
configuration file.
The skeleton of a metric looks like the following configuration entry.
Rating Haikus
In the Quick Start, we built a simple LLM application that writes haikus about artificial intelligence.
Imagine we wanted to assign 👍 or 👎 to these haikus. Later, we can use this data to fine-tune a model using only haikus that match our tastes.
We should use a metric of type boolean
to capture this behavior since we’re optimizing for a binary outcome: whether we liked the haikus or not.
The metric applies to individual inference requests, so we’ll set level = "inference"
.
And finally, we’ll set optimize = "max"
because we want to maximize this metric.
Our metric configuration should look like this:
Full Configuration
Let’s make an inference call like we did in the Quick Start, and then assign some (positive) feedback to it.
We’ll use the inference response’s inference_id
we receive from the first API call to link the two.
Sample Output
Querying Feedback Data
The TensorZero Gateway stores feedback data in the database, just like with inferences. Let’s query it!
Sample Output
You can easily join feedback data with inference data (using the inference ID or episode ID) in ClickHouse. That’s how TensorZero Recipes collect the data for optimization.
Conclusion & Next Steps
Feedback unlocks powerful workflows in observability, optimization, and experimentation. For example, you might want to fine-tune a model with inference data from haikus that receive positive ratings.
This is exactly what we demonstrate in Writing Haikus to Satisfy a Judge with Hidden Preferences! This complete runnable example fine-tunes GPT-4o Mini to generate haikus tailored to an AI judge with hidden preferences. Continuous improvement over successive fine-tuning runs demonstrates TensorZero’s data and learning flywheel.