Usage
We provide a tensorzero/evaluations Docker image for easy usage.
We strongly recommend using the TensorZero Evaluations CLI with Docker Compose to keep things simple.
docker-compose.yml
Building from Source
You can build the TensorZero Evaluations CLI from source if necessary. See our GitHub repository for instructions.
Inference Caching
TensorZero Evaluations uses Inference Caching to speed up inference and reduce cost. By default, it reads from and writes to the inference cache. To customize this behavior, see the --inference-cache flag below.
Environment Variables
TENSORZERO_CLICKHOUSE_URL
- Example: TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@localhost:8123/database_name
- Required: yes
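If you assemble this URL from separate settings, a small helper can keep the pieces in the right order. The sketch below is illustrative only: the function name clickhouse_url is hypothetical, and it assumes that percent-encoded credentials are accepted, which you should verify against your ClickHouse setup.

```python
from urllib.parse import quote


def clickhouse_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a URL of the form http://user:password@host:port/database."""
    # Percent-encode the credentials and database name so that characters
    # such as '@' or ':' in a password do not break the URL structure.
    # Assumption: the server accepts percent-encoded credentials.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    safe_database = quote(database, safe="")
    return f"http://{safe_user}:{safe_password}@{host}:{port}/{safe_database}"


print(clickhouse_url("chuser", "chpassword", "localhost", 8123, "database_name"))
# http://chuser:chpassword@localhost:8123/database_name
```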
Model Provider Credentials
- Example: OPENAI_API_KEY=sk-...
- Required: no
If you're using an external TensorZero Gateway (see the --gateway-url flag below), you don't need to provide these credentials to the evaluations tool.
If you're using a built-in gateway (no --gateway-url flag), you must provide the same credentials the gateway would use.
See Integrations for more information.
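The rule above can be expressed as a quick pre-flight check. This is a minimal sketch, not part of TensorZero: the function missing_credentials is hypothetical, and OPENAI_API_KEY stands in for whichever provider credentials your configuration actually needs.

```python
from typing import List, Mapping, Optional, Sequence


def missing_credentials(
    gateway_url: Optional[str],
    env: Mapping[str, str],
    provider_keys: Sequence[str] = ("OPENAI_API_KEY",),  # assumed example key
) -> List[str]:
    """Return the provider credential variables that still need to be set."""
    if gateway_url:
        # External gateway: the evaluations tool does not need the credentials.
        return []
    # Built-in gateway: the tool needs the same credentials the gateway would use.
    return [key for key in provider_keys if not env.get(key)]


print(missing_credentials(None, {}))                     # ['OPENAI_API_KEY']
print(missing_credentials("http://localhost:3000", {}))  # []
```

In practice you would pass os.environ as env before launching the evaluation.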
CLI Flags
--config-file PATH
- Example: --config-file /path/to/tensorzero.toml
- Required: no (default: ./config/tensorzero.toml)
--concurrency N
- Example: --concurrency 5
- Required: no (default: 1)
--dataset-name NAME (-d)
- Example: --dataset-name my_dataset
- Required: yes
--evaluation-name NAME (-e)
- Example: --evaluation-name my_evaluation
- Required: yes
--format FORMAT (-f)
- Options: pretty, jsonl
- Example: --format jsonl
- Required: no (default: pretty)
Use the jsonl format if you want to programmatically process the evaluation results.
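As a rough illustration of processing jsonl output, the sketch below reads one JSON value per line. The record schema is not described here, so the sample lines and their "id" field are made-up placeholders rather than real evaluation output.

```python
import json
from typing import Any, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Any]:
    """Parse one JSON value per non-empty line."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, e.g. a trailing newline at the end of output.
            continue
        records.append(json.loads(line))
    return records


# Placeholder input; real records will have a different shape.
sample = ['{"id": 1}', "", '{"id": 2}']
print(len(parse_jsonl(sample)))  # 2
```

You could feed this function the captured standard output of the evaluations process, split into lines.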
--gateway-url URL
- Example: --gateway-url http://localhost:3000
- Required: no (default: none)
--inference-cache MODE
- Options: on, read_only, write_only, off
- Example: --inference-cache read_only
- Required: no (default: on)
--variant-name NAME (-v)
This flag specifies the variant to evaluate.
The variant name should be present in your TensorZero configuration file.
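If you launch evaluations from a script, it can help to assemble the flags in one place. The following sketch only builds an argument list from the flags documented above; the function build_args is hypothetical, and it always passes --variant-name, which is an assumption because this page does not state whether that flag is required.

```python
from typing import List, Optional

FORMATS = {"pretty", "jsonl"}
CACHE_MODES = {"on", "read_only", "write_only", "off"}


def build_args(
    evaluation_name: str,
    dataset_name: str,
    variant_name: str,
    config_file: Optional[str] = None,
    concurrency: Optional[int] = None,
    output_format: Optional[str] = None,
    gateway_url: Optional[str] = None,
    inference_cache: Optional[str] = None,
) -> List[str]:
    """Return CLI arguments; flags left as None fall back to the CLI defaults."""
    if output_format is not None and output_format not in FORMATS:
        raise ValueError(f"unknown format: {output_format}")
    if inference_cache is not None and inference_cache not in CACHE_MODES:
        raise ValueError(f"unknown inference cache mode: {inference_cache}")
    args = [
        "--evaluation-name", evaluation_name,
        "--dataset-name", dataset_name,
        "--variant-name", variant_name,
    ]
    optional = [
        ("--config-file", config_file),
        ("--concurrency", concurrency),
        ("--format", output_format),
        ("--gateway-url", gateway_url),
        ("--inference-cache", inference_cache),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


print(build_args("my_evaluation", "my_dataset", "my_variant", concurrency=5))
```

The returned list is meant to be appended to whatever command you use to start the evaluations tool, for example a Docker Compose invocation.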
Exit Status
The evaluations process exits with a status code of 0 if the evaluation was successful, and a status code of 1 if the evaluation failed.
If you configure a cutoff for any of your evaluators, the evaluation will fail if the average score for any evaluator is below its cutoff.
The exit status code is helpful for integrating TensorZero Evaluations into your CI/CD pipeline.
You can define sanity checks for your variants with cutoff to detect performance regressions early before shipping to production.
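A CI step can gate on the exit status with a few lines of scripting. The sketch below is one possible approach: the function evaluation_passed is hypothetical, and the two commands are stand-ins that simply exit with 0 or 1 so the example runs anywhere. In a real pipeline you would replace them with your own evaluations invocation.

```python
import subprocess
import sys
from typing import Sequence


def evaluation_passed(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    # check=False: inspect the return code ourselves instead of raising.
    completed = subprocess.run(list(command), check=False)
    return completed.returncode == 0


# Stand-in commands that mimic a successful and a failed evaluation.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
failed = [sys.executable, "-c", "raise SystemExit(1)"]
print(evaluation_passed(ok), evaluation_passed(failed))  # True False
```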