You can query historical inferences to analyze model behavior, debug issues, export data for fine-tuning, and more.
The TensorZero UI provides an interface to browse and filter historical inferences.
You can also query historical inferences programmatically using the TensorZero Gateway.
Query historical inferences by ID

HTTP: POST /v1/inferences/get_inferences
TensorZero SDK: client.get_inferences(...)
Retrieve specific inferences when you know their IDs.
Request
ids
List of inference IDs (UUIDs) to retrieve.
function_name
Filter by function name. Including this improves query performance since function_name is the first column in the ClickHouse primary key.
output_source
string
default: "inference"
Source of the output to return:
"inference": Returns the original model output
"demonstration": Returns human-curated feedback output (ignores inferences without one)
"none": Returns the inference without output
You can retrieve inferences by ID using the TensorZero Python SDK.

from tensorzero import TensorZeroGateway

t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

t0.get_inferences(ids=["00000000-0000-0000-0000-000000000000"])
You can retrieve inferences by ID using the HTTP API.

curl -X POST http://localhost:3000/v1/inferences/get_inferences \
  -H "Content-Type: application/json" \
  -d '{"ids": ["00000000-0000-0000-0000-000000000000"]}'
Response
StoredInference properties:
Outputs marked as dispreferred via feedback. This field is only available
if you set output_source to demonstration. It is primarily used for
preference-based optimization (e.g. DPO).
Episode (UUID) this inference belongs to.
Name of the function called.
Unique identifier (UUID) for the inference.
Parameters like temperature, max_tokens, etc.
The input provided (system prompt, messages).
The inference output (content blocks for chat, JSON for json).
Total processing time in milliseconds.
Key-value tags associated with the inference.
When the inference was made (RFC 3339 format).
Time to first token in milliseconds.
Name of the variant used.
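A common next step is exporting these inferences for fine-tuning, as noted at the top of this page. Below is a minimal sketch over the HTTP API using the third-party requests library; it assumes the response body is a JSON array of StoredInference objects with input and output keys matching the properties above, which is an assumption about the response shape rather than a documented guarantee.

import json

import requests  # third-party HTTP client

response = requests.post(
    "http://localhost:3000/v1/inferences/get_inferences",
    json={"ids": ["00000000-0000-0000-0000-000000000000"]},
)
response.raise_for_status()

# Assumption: the body is a JSON array of StoredInference objects with
# "input" and "output" keys, as suggested by the properties above.
with open("fine_tuning_data.jsonl", "w") as f:
    for inference in response.json():
        row = {"input": inference["input"], "output": inference["output"]}
        f.write(json.dumps(row) + "\n")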
Query historical inferences with filters
List inferences with filtering, pagination, and sorting.
HTTP: POST /v1/inferences/list_inferences
TensorZero SDK: client.list_inferences(request=ListInferencesRequest(...))
Request
after
Cursor pagination: get inferences after this ID (exclusive). Cannot be used with before or offset.
before
Cursor pagination: get inferences before this ID (exclusive). Cannot be used with after or offset.
episode_id
Filter by episode ID (UUID).
filters
Advanced filtering by metrics, tags, time, and demonstration feedback. Filters can be combined using logical operators (and, or, not); see the combined example after the HTTP example below. The filter variants are:

Logical AND of multiple filters ("type": "and").
children
InferenceFilter[]
required
Array of filters to AND together.

Filter by boolean metrics ("type": "boolean_metric").
Value to match (true or false).

Filter by whether demonstration feedback exists ("type": "demonstration_feedback").
Whether the inference has demonstration feedback.

Filter by numeric metric values.
Comparison operator: one of <, <=, =, >, >=, !=.
Value to compare against.

Logical OR of multiple filters ("type": "or").
children
InferenceFilter[]
required
Array of filters to OR together.

Filter by timestamp.
Comparison operator: one of <, <=, =, >, >=, !=.
Timestamp in RFC 3339 format.
limit
Maximum number of results to return.
Sort criteria. You can specify multiple sort criteria.

Sort by a metric value.
Name of the metric to sort by.
direction
string
default: "descending"
"ascending" or "descending".

Sort by search relevance (requires search_query_experimental). Must be "search_relevance".
direction
string
default: "descending"
"ascending" or "descending".

Sort by creation timestamp.
direction
string
default: "descending"
"ascending" or "descending".
output_source
string
default: "inference"
Source of the output to return:
"inference": Returns the original model output
"demonstration": Returns human-curated feedback output (ignores inferences without one)
"none": Returns the inference without output
search_query_experimental
Full-text search query (experimental, may cause full table scans).
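As a concrete illustration, the sketch below requests the ten most relevant inferences for a full-text search, sorted by search relevance. The top-level order_by field name is a hypothetical placeholder (the actual parameter name for sort criteria is not confirmed here); search_query_experimental and the search_relevance sort type are documented above.

import requests  # third-party HTTP client

# Hypothetical sketch: "order_by" is a placeholder name for the sort
# criteria parameter, not a confirmed field name. The "search_relevance"
# sort type and "search_query_experimental" field are documented above.
response = requests.post(
    "http://localhost:3000/v1/inferences/list_inferences",
    json={
        "search_query_experimental": "timeout error",
        "order_by": [{"type": "search_relevance", "direction": "descending"}],
        "limit": 10,
    },
)
response.raise_for_status()
print(response.json())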
You can list inferences with filters using the TensorZero Python SDK.

from tensorzero import TensorZeroGateway, ListInferencesRequest, InferenceFilterTag

t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

t0.list_inferences(
    request=ListInferencesRequest(
        filters=InferenceFilterTag(
            key="my_tag",
            value="my_value",
            comparison_operator="=",
        ),
        limit=10,
    )
)
You can list inferences with filters using the HTTP API.

curl -X POST http://localhost:3000/v1/inferences/list_inferences \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "type": "tag",
      "key": "my_tag",
      "value": "my_value",
      "comparison_operator": "="
    },
    "limit": 10
  }'
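Filters can also be nested using the logical operators described above. The sketch below ANDs a tag filter with a demonstration feedback filter over the HTTP API (via the third-party requests library); the has_demonstration field name on the demonstration_feedback filter is an assumption based on its description above, not a confirmed field name.

import requests  # third-party HTTP client

response = requests.post(
    "http://localhost:3000/v1/inferences/list_inferences",
    json={
        "filters": {
            "type": "and",
            "children": [
                {
                    "type": "tag",
                    "key": "my_tag",
                    "value": "my_value",
                    "comparison_operator": "=",
                },
                {
                    "type": "demonstration_feedback",
                    # Assumed field name for "whether the inference has
                    # demonstration feedback" (not confirmed above).
                    "has_demonstration": True,
                },
            ],
        },
        "limit": 10,
    },
)
response.raise_for_status()
print(response.json())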
Response
The response contains StoredInference objects with the same properties as documented for get_inferences above.
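Finally, for large result sets you can page through inferences with the cursor parameters documented above. A minimal sketch of cursor pagination using after and limit, assuming the response is a JSON array of StoredInference objects and that the inference's unique identifier key is named inference_id (both assumptions about the response shape):

import requests  # third-party HTTP client

cursor = None
while True:
    body = {"limit": 100}
    if cursor is not None:
        # Cursor pagination: fetch inferences after the last ID we saw.
        body["after"] = cursor
    response = requests.post(
        "http://localhost:3000/v1/inferences/list_inferences",
        json=body,
    )
    response.raise_for_status()
    page = response.json()  # assumed: a JSON array of StoredInference objects
    if not page:
        break
    for inference in page:
        pass  # process each inference here
    cursor = page[-1]["inference_id"]  # assumed key name for the inference ID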