The TensorZero Gateway is designed from the ground up with performance in mind. Even with default settings, the gateway is fast and lightweight enough that its overhead is unnoticeable in most applications. The best practices below help you tune the gateway for production deployments that require maximum performance.
The TensorZero Gateway can achieve <1ms P99 latency overhead at 10,000+ QPS. See Benchmarks for details.

Best practices

Observability data collection strategy

By default, the gateway takes a conservative approach to observability data durability: it ensures that data is persisted in ClickHouse before sending a response to the client. This strategy provides a consistent and reliable experience but can introduce latency overhead.

When latency or throughput is critical, you can configure the gateway to trade data durability guarantees for better performance by enabling gateway.observability.async_writes or gateway.observability.batch_writes. With either setting, the gateway returns the response to the client immediately and inserts data into ClickHouse asynchronously. async_writes inserts each row individually and immediately, while batch_writes groups multiple rows together for more efficient writes.

As a rule of thumb, consider the following decision matrix:
                        | High throughput | Low throughput
Latency is critical     | batch_writes    | async_writes
Latency is not critical | batch_writes    | Default strategy
See the Configuration Reference for more details.
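
The rule of thumb above can also be written as a small helper, for example to document the choice in deployment tooling. This is only an illustrative sketch: the function name and the returned labels are hypothetical, and nothing here is part of the gateway's API. It simply encodes the decision matrix.

```python
def recommend_write_strategy(latency_critical: bool, high_throughput: bool) -> str:
    """Return the strategy suggested by the rule-of-thumb decision matrix.

    Hypothetical helper for illustration only; the returned strings are
    labels for the matrix cells, not values read by the gateway.
    """
    if high_throughput:
        # High throughput favors batching, whether or not latency is critical.
        return "batch_writes"
    if latency_critical:
        # Low throughput, but the response must not wait on ClickHouse.
        return "async_writes"
    # Low throughput and latency is not critical: keep the durable default.
    return "default"


if __name__ == "__main__":
    for latency_critical in (True, False):
        for high_throughput in (True, False):
            choice = recommend_write_strategy(latency_critical, high_throughput)
            print(latency_critical, high_throughput, choice)
```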

Other recommendations

  • Ensure your application, the TensorZero Gateway, and ClickHouse are deployed in the same region to minimize network latency.
  • Initialize the client once and reuse it as much as possible, to avoid initialization overhead and to keep the connection alive.
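
One common way to follow the last recommendation is to create the client lazily and cache it for the lifetime of the process. The sketch below uses a hypothetical stand-in class, GatewayClient, and an assumed URL so that it runs on its own. In a real application you would construct your actual TensorZero client inside get_client instead.

```python
from functools import lru_cache

# Assumed gateway address, for illustration only.
GATEWAY_URL = "http://localhost:3000"


class GatewayClient:
    """Hypothetical stand-in for a real gateway client."""

    instances_created = 0  # counts constructions to show that reuse works

    def __init__(self, gateway_url: str) -> None:
        GatewayClient.instances_created += 1
        self.gateway_url = gateway_url


@lru_cache(maxsize=None)
def get_client() -> GatewayClient:
    """Build the client on first use and return the same instance afterwards."""
    return GatewayClient(GATEWAY_URL)


if __name__ == "__main__":
    first = get_client()
    second = get_client()
    print(first is second)
    print(GatewayClient.instances_created)
```

Because get_client takes no arguments, every call hits the same cache entry, so the constructor runs once per process and any connection the client holds can stay open between requests.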