You can query historical inferences to analyze model behavior, debug issues, export data for fine-tuning, and more.
The TensorZero UI provides an interface to browse and filter historical inferences.
You can also query historical inferences programmatically using the TensorZero Gateway.
Query historical inferences by ID

HTTP: POST /v1/inferences/get_inferences
TensorZero SDK: client.get_inferences(...)
Retrieve specific inferences when you know their IDs.
Request
ids
List of inference IDs (UUIDs) to retrieve.
function_name
Filter by function name. Including this improves query performance since function_name is the first column in the ClickHouse primary key.
output_source
string
default: "inference"
Source of the output to return:
"inference": Returns the original model output
"demonstration": Returns human-curated feedback output (ignores inferences without one)
"none": Returns the inference without output
You can retrieve inferences by ID using the TensorZero Python SDK.

from tensorzero import TensorZeroGateway

t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

t0.get_inferences(ids=["00000000-0000-0000-0000-000000000000"])
You can retrieve inferences by ID using the HTTP API.

curl -X POST http://localhost:3000/v1/inferences/get_inferences \
  -H "Content-Type: application/json" \
  -d '{"ids": ["00000000-0000-0000-0000-000000000000"]}'
Response
StoredInference properties:
Outputs marked as dispreferred via feedback. This field is only available
if you set output_source to demonstration. It is primarily used for
preference-based optimization (e.g. DPO).
Episode (UUID) this inference belongs to.
Name of the function called.
Unique identifier (UUID) for the inference.
Parameters like temperature, max_tokens, etc.
The input provided (system prompt, messages).
The inference output (content blocks for chat, JSON for json).
Total processing time in milliseconds.
Key-value tags associated with the inference.
When the inference was made (RFC 3339 format).
Time to first token in milliseconds.
Name of the variant used.
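A common next step is exporting these inferences for fine-tuning, as noted at the top of this page. Below is a minimal sketch over the HTTP API using the third-party requests library; it assumes the response body is a JSON array of StoredInference objects with input and output keys matching the properties above, which is an assumption about the response shape rather than a documented guarantee.

import json

import requests  # third-party HTTP client

response = requests.post(
    "http://localhost:3000/v1/inferences/get_inferences",
    json={"ids": ["00000000-0000-0000-0000-000000000000"]},
)
response.raise_for_status()

# Assumption: the body is a JSON array of StoredInference objects with
# "input" and "output" keys, as suggested by the properties above.
with open("fine_tuning_data.jsonl", "w") as f:
    for inference in response.json():
        row = {"input": inference["input"], "output": inference["output"]}
        f.write(json.dumps(row) + "\n")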
Query historical inferences with filters
List inferences with filtering, pagination, and sorting.
HTTP: POST /v1/inferences/list_inferences
TensorZero SDK: client.list_inferences(request=ListInferencesRequest(...))
Request
after
Cursor pagination: get inferences after this ID (exclusive). Cannot be used with before or offset.
before
Cursor pagination: get inferences before this ID (exclusive). Cannot be used with after or offset.
episode_id
Filter by episode ID (UUID).
filters
Advanced filtering by metrics, tags, time, and demonstration feedback. Filters can be combined using logical operators (and, or, not); see the combined example after the HTTP example below. The filter variants are:

Logical AND of multiple filters ("type": "and").
children
InferenceFilter[]
required
Array of filters to AND together.

Filter by boolean metrics ("type": "boolean_metric").
Value to match (true or false).

Filter by whether demonstration feedback exists ("type": "demonstration_feedback").
Whether the inference has demonstration feedback.

Filter by numeric metric values.
Comparison operator: one of <, <=, =, >, >=, !=.
Value to compare against.

Logical OR of multiple filters ("type": "or").
children
InferenceFilter[]
required
Array of filters to OR together.

Filter by timestamp.
Comparison operator: one of <, <=, =, >, >=, !=.
Timestamp in RFC 3339 format.
limit
Maximum number of results to return.
Sort criteria. You can specify multiple sort criteria.

Sort by a metric value.
Name of the metric to sort by.
direction
string
default: "descending"
"ascending" or "descending".

Sort by search relevance (requires search_query_experimental). Must be "search_relevance".
direction
string
default: "descending"
"ascending" or "descending".

Sort by creation timestamp.
direction
string
default: "descending"
"ascending" or "descending".
output_source
string
default: "inference"
Source of the output to return:
"inference": Returns the original model output
"demonstration": Returns human-curated feedback output (ignores inferences without one)
"none": Returns the inference without output
search_query_experimental
Full-text search query (experimental, may cause full table scans).
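As a concrete illustration, the sketch below requests the ten most relevant inferences for a full-text search, sorted by search relevance. The top-level order_by field name is a hypothetical placeholder (the actual parameter name for sort criteria is not confirmed here); search_query_experimental and the search_relevance sort type are documented above.

import requests  # third-party HTTP client

# Hypothetical sketch: "order_by" is a placeholder name for the sort
# criteria parameter, not a confirmed field name. The "search_relevance"
# sort type and "search_query_experimental" field are documented above.
response = requests.post(
    "http://localhost:3000/v1/inferences/list_inferences",
    json={
        "search_query_experimental": "timeout error",
        "order_by": [{"type": "search_relevance", "direction": "descending"}],
        "limit": 10,
    },
)
response.raise_for_status()
print(response.json())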
You can list inferences with filters using the TensorZero Python SDK.

from tensorzero import TensorZeroGateway, ListInferencesRequest, InferenceFilterTag

t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

t0.list_inferences(
    request=ListInferencesRequest(
        filters=InferenceFilterTag(
            key="my_tag",
            value="my_value",
            comparison_operator="=",
        ),
        limit=10,
    )
)
You can list inferences with filters using the HTTP API.

curl -X POST http://localhost:3000/v1/inferences/list_inferences \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "type": "tag",
      "key": "my_tag",
      "value": "my_value",
      "comparison_operator": "="
    },
    "limit": 10
  }'
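Filters can also be nested using the logical operators described above. The sketch below ANDs a tag filter with a demonstration feedback filter over the HTTP API (via the third-party requests library); the has_demonstration field name on the demonstration_feedback filter is an assumption based on its description above, not a confirmed field name.

import requests  # third-party HTTP client

response = requests.post(
    "http://localhost:3000/v1/inferences/list_inferences",
    json={
        "filters": {
            "type": "and",
            "children": [
                {
                    "type": "tag",
                    "key": "my_tag",
                    "value": "my_value",
                    "comparison_operator": "=",
                },
                {
                    "type": "demonstration_feedback",
                    # Assumed field name for "whether the inference has
                    # demonstration feedback" (not confirmed above).
                    "has_demonstration": True,
                },
            ],
        },
        "limit": 10,
    },
)
response.raise_for_status()
print(response.json())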
Response
The response contains StoredInference objects with the same properties as documented for get_inferences above.
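Finally, for large result sets you can page through inferences with the cursor parameters documented above. A minimal sketch of cursor pagination using after and limit, assuming the response is a JSON array of StoredInference objects and that the inference's unique identifier key is named inference_id (both assumptions about the response shape):

import requests  # third-party HTTP client

cursor = None
while True:
    body = {"limit": 100}
    if cursor is not None:
        # Cursor pagination: fetch inferences after the last ID we saw.
        body["after"] = cursor
    response = requests.post(
        "http://localhost:3000/v1/inferences/list_inferences",
        json=body,
    )
    response.raise_for_status()
    page = response.json()  # assumed: a JSON array of StoredInference objects
    if not page:
        break
    for inference in page:
        pass  # process each inference here
    cursor = page[-1]["inference_id"]  # assumed key name for the inference ID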