The TensorZero Gateway can achieve <1ms P99 latency overhead at 10,000+ QPS. See Benchmarks for details.
Best practices
Observability data collection strategy
By default, the gateway takes a conservative approach to observability data durability, ensuring that data is persisted in ClickHouse before sending a response to the client. This strategy provides a consistent and reliable experience but can introduce latency overhead. For scenarios where latency and throughput are critical, the gateway can be configured to sacrifice data durability guarantees for better performance. If latency is critical for your application, you can enablegateway.observability.async_writes
or gateway.observability.batch_writes
.
With either of these settings, the gateway will return the response to the client immediately and asynchronously insert data into ClickHouse.
The former will immediately insert each row individually, while the latter will batch multiple rows together for more efficient writes.
As a rule of thumb, consider the following decision matrix:
High throughput | Low throughput | |
---|---|---|
Latency is critical | batch_writes | async_writes |
Latency is not critical | batch_writes | Default strategy |
Other recommendations
- Ensure your application, the TensorZero Gateway, and ClickHouse are deployed in the same region to minimize network latency.
- Initialize the client once and reuse it as much as possible, to avoid initialization overhead and to keep the connection alive.