The TensorZero Gateway is designed from the ground up with performance in mind. Even with default settings, the gateway is fast and lightweight enough to be unnoticeable in most applications. The best practices below help you tune the gateway for production deployments that require maximum performance.
Best practices
Observability data collection strategy
By default, the gateway uses async_writes to write observability data asynchronously, returning the response to the client immediately without waiting for database writes to complete.
Each database insert is handled immediately in its own background task.
For high-throughput applications, you can use gateway.observability.batch_writes instead, which collects multiple records and writes them to the database together in batches, reducing the per-record write overhead.
If you need strict data durability guarantees (ensuring data is persisted in the database before sending a response), you can disable async writes by setting gateway.observability.async_writes = false.
As a rule of thumb, consider the following decision matrix:
| | High throughput | Low throughput |
|---|---|---|
| Latency is critical | batch_writes | async_writes (default) |
| Latency is not critical | batch_writes | Synchronous writes |
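
The matrix above can also be read as a small lookup. The following is a minimal sketch for illustration only: `WriteStrategy` and `recommend_write_strategy` are hypothetical names, not part of TensorZero, and the function only restates the four cells of the table.

```python
from enum import Enum


class WriteStrategy(Enum):
    """Observability write modes named in the text above."""

    BATCH_WRITES = "batch_writes"
    ASYNC_WRITES = "async_writes"
    SYNCHRONOUS_WRITES = "synchronous writes"


def recommend_write_strategy(high_throughput: bool, latency_critical: bool) -> WriteStrategy:
    """Hypothetical helper that mirrors the decision matrix above."""
    if high_throughput:
        # Both high-throughput cells of the matrix recommend batching.
        return WriteStrategy.BATCH_WRITES
    if latency_critical:
        # Low throughput, latency-sensitive: keep the default async writes.
        return WriteStrategy.ASYNC_WRITES
    # Low throughput, latency not critical: wait for the database write.
    return WriteStrategy.SYNCHRONOUS_WRITES
```

Treat the result as a starting point; as the text notes, strict durability requirements may call for synchronous writes regardless of throughput.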
Other recommendations
- Ensure your application, the TensorZero Gateway, and the database are deployed in the same region to minimize network latency.
- Initialize the client once and reuse it as much as possible, to avoid initialization overhead and to keep the connection alive.
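
One way to follow the client-reuse recommendation is to construct the client lazily and cache it for the lifetime of the process. The sketch below shows the pattern only: `GatewayClient` is a hypothetical stand-in rather than the real TensorZero client, and the URL `http://localhost:3000` is an assumed placeholder.

```python
from functools import lru_cache


class GatewayClient:
    """Hypothetical stand-in for a real gateway client (not the TensorZero API)."""

    instances_created = 0  # Counts constructions so reuse can be observed.

    def __init__(self, gateway_url: str) -> None:
        self.gateway_url = gateway_url
        GatewayClient.instances_created += 1


@lru_cache(maxsize=1)
def get_client() -> GatewayClient:
    """Build the client on first use; later calls return the cached instance."""
    # Assumed placeholder URL; replace with your gateway's address.
    return GatewayClient("http://localhost:3000")


def handle_request(prompt: str) -> str:
    """Each request reuses the shared client instead of constructing a new one."""
    client = get_client()
    return f"{client.gateway_url} <- {prompt}"
```

In a real application, the body of `get_client` would construct the actual client; the rest of the code then calls `get_client()` wherever it needs one, so initialization happens once and the underlying connection can be kept alive.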