Track token usage
TensorZero automatically collects, normalizes, and reports token usage for every inference. The normalized figures follow the OpenAI behavior (e.g. output tokens include reasoning tokens). If you're using Inference Caching, the gateway will report usage as 0 for cached inferences. If you need additional usage information from model providers (e.g. prompt caching), you can enable include_raw_usage in the inference request. In that case, the gateway will additionally report provider-specific usage fields without preprocessing.
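As a sketch, an inference request with raw usage enabled might look like the following. The function name and message content are placeholders for illustration; only the include_raw_usage flag comes from the docs above.

```python
import json

# Hypothetical request body for the TensorZero inference endpoint.
# include_raw_usage asks the gateway to additionally return
# provider-specific usage fields without preprocessing.
payload = {
    "function_name": "my_function",  # placeholder function name
    "input": {
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    "include_raw_usage": True,
}

print(json.dumps(payload, indent=2))
```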
You can browse usage data for individual inferences as well as aggregated usage statistics per model provider in the TensorZero UI.
Track cost
TensorZero can compute and report cost for LLM inferences with additional configuration.

Configure cost tracking for model providers
You can configure cost information by adding a cost section to the model provider configuration:
tensorzero.toml
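A minimal sketch of what such a configuration might look like. The exact table layout, model name, and pointer paths are assumptions for illustration; the pointer, cost_per_million, and required fields are the ones described below.

```toml
[models.gpt-4o.providers.openai]
type = "openai"
model_name = "gpt-4o"

# Hypothetical cost entries: the JSON Pointers and prices below are
# illustrative, not authoritative.
[models.gpt-4o.providers.openai.cost.input_tokens]
pointer = "/usage/prompt_tokens"  # JSON Pointer into the provider's response
cost_per_million = 2.5            # price per million units
required = true

[models.gpt-4o.providers.openai.cost.output_tokens]
pointer = "/usage/completion_tokens"
cost_per_million = 10.0
required = true
```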
The pointer is a JSON Pointer into the provider's response.

You can use both pointer_nonstreaming and pointer_streaming instead of pointer if the usage format is different for streaming inferences.

You can specify cost using cost_per_million or cost_per_unit. The latter is useful for features like web search.

You can set negative cost values. This is useful for subtracting discounts (e.g. prompt caching).

You can mark an entry as required. If a provider does not report that field, the gateway will report the cost for that inference as null and log a warning.

Cost tracking is not available for short-hand models (e.g. openai::gpt-5). Instead, you must explicitly configure the model and the model provider in your configuration, as above.

Configure cost tracking for batch inference
You can configure cost for batch inference with batch_cost alongside the cost field:
tensorzero.toml
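A sketch of a batch_cost section alongside cost, under the same assumed table layout as above. The discounted prices are illustrative only.

```toml
[models.gpt-4o.providers.openai.cost.input_tokens]
pointer = "/usage/prompt_tokens"
cost_per_million = 2.5

# Hypothetical batch pricing for the same field; batch inference is
# often discounted relative to real-time inference.
[models.gpt-4o.providers.openai.batch_cost.input_tokens]
pointer = "/usage/prompt_tokens"
cost_per_million = 1.25
```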
Configure cost tracking for embedding models
You can also configure cost for embedding model providers:
tensorzero.toml
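A sketch for an embedding model provider, assuming the same cost entry structure as for chat model providers; the model name, pointer path, and price are placeholders.

```toml
[embedding_models.text-embedding-3-small.providers.openai.cost.input_tokens]
pointer = "/usage/prompt_tokens"  # JSON Pointer into the provider's response
cost_per_million = 0.02           # illustrative price per million tokens
```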
Run an inference with cost tracking
Once you've configured cost for a model provider, inference responses will include cost (TensorZero SDK) or tensorzero_cost (OpenAI SDK) in the usage object:
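For example, the usage object returned by the TensorZero SDK might look like the following sketch. The token counts and cost value are made up, and the surrounding field names are assumptions; only the presence of the cost field comes from the docs above.

```python
# Hypothetical usage object from a TensorZero SDK inference response.
usage = {
    "input_tokens": 120,
    "output_tokens": 45,
    "cost": 0.00075,  # reported once cost is configured for the provider
}

# cost is null (None) if a required cost field was missing from the
# provider's response, so check before using it.
if usage["cost"] is not None:
    print(f"This inference cost ${usage['cost']:.6f}")
```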