API Reference: Dynamic Evaluations

Endpoints & Methods
Starting a dynamic evaluation run
Starting an episode in a dynamic evaluation run
Making inference and feedback calls during a dynamic evaluation run

Dynamic Evaluations focus on evaluating complex workflows that might include multiple TensorZero inference calls, arbitrary application logic, and more. You can initialize and run dynamic evaluations using the TensorZero Gateway, either through the TensorZero client or the gateway’s HTTP API. Unlike static evaluations, dynamic evaluations are not defined in the TensorZero configuration file. See the Dynamic Evaluations Tutorial for a step-by-step guide.

Endpoints & Methods

Starting a dynamic evaluation run

Gateway Endpoint: POST /dynamic_evaluation_run
Client Method: dynamic_evaluation_run
Parameters:
- variants: an object (dictionary) mapping function names to variant names
- project_name (string, optional): the name of the project to associate the run with
- display_name (string, optional): the display (human-readable) name of the run
- tags (dictionary, optional): a dictionary of key-value pairs to tag the run’s inferences with
Returns:
- run_id (UUID): the ID of the run

Starting an episode in a dynamic evaluation run

Gateway Endpoint: POST /dynamic_evaluation_run/{run_id}/episode
Client Method: dynamic_evaluation_run_episode
Parameters:
- run_id (UUID): the ID of the run generated by the dynamic_evaluation_run method
- task_name (string, optional): the name of the task to associate the episode with
- tags (dictionary, optional): a dictionary of key-value pairs to tag the episode’s inferences with
Returns:
- episode_id (UUID): the ID of the episode

Making inference and feedback calls during a dynamic evaluation run

After initializing a run and an episode, you can make inference and feedback API calls like you normally would. By providing the special episode_id parameter generated by the dynamic_evaluation_run_episode method , the TensorZero Gateway will associate the inference and feedback with the evaluation run, handle variant pinning, and more.

Tutorial Run A/B tests

⌘I

Introduction

Gateway

Optimization

Evaluations

Experimentation

Deployment

Operations

API Reference: Dynamic Evaluations

Endpoints & Methods

Starting a dynamic evaluation run

Starting an episode in a dynamic evaluation run

Making inference and feedback calls during a dynamic evaluation run

Introduction

Gateway

Optimization

Evaluations

Experimentation

Deployment

Operations

​Endpoints & Methods

​Starting a dynamic evaluation run

​Starting an episode in a dynamic evaluation run

​Making inference and feedback calls during a dynamic evaluation run

Endpoints & Methods

Starting a dynamic evaluation run

Starting an episode in a dynamic evaluation run

Making inference and feedback calls during a dynamic evaluation run