Most TensorZero deployments will not require Valkey or Redis.
TensorZero can use a Redis-compatible data store like Valkey as a high-performance backend for rate limiting and inference caching.
We recommend Valkey over Postgres if you’re handling 100+ QPS or have strict latency requirements. TensorZero’s rate limiting implementation can achieve sub-millisecond P99 latency at 10k+ QPS using Valkey.

Deploy

You can self-host Valkey or use a managed Redis-compatible service (e.g. AWS ElastiCache, GCP Memorystore).
Add Valkey to your docker-compose.yml:
docker-compose.yml
services:
  valkey:
    image: valkey/valkey:8
    ports:
      - "6379:6379"
    volumes:
      - valkey-data:/data

volumes:
  valkey-data:
If you find any compatibility issues, please open a detailed GitHub Discussion.

Configure

To configure TensorZero to use Valkey, set the TENSORZERO_VALKEY_URL environment variable with your Valkey connection details.
.env
TENSORZERO_VALKEY_URL="redis://[hostname]:[port]"

# Example:
TENSORZERO_VALKEY_URL="redis://localhost:6379"
The following URL schemes are supported:
Scheme       Description
valkey://    Standard Valkey/Redis connection
valkeys://   TLS-encrypted Valkey/Redis connection
redis://     Alias for valkey://
rediss://    Alias for valkeys://
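As an illustration only (this helper is not part of TensorZero), the scheme table maps onto TLS usage like so:

```python
from urllib.parse import urlparse

# Hypothetical helper: classify a connection URL by the scheme table above.
# valkey:// and redis:// are plaintext; valkeys:// and rediss:// use TLS.
def uses_tls(url: str) -> bool:
    scheme = urlparse(url).scheme
    if scheme in ("valkey", "redis"):
        return False
    if scheme in ("valkeys", "rediss"):
        return True
    raise ValueError(f"unsupported scheme: {scheme!r}")
```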
TensorZero automatically loads the required Lua functions into Valkey on startup. No manual setup is required.
If both TENSORZERO_VALKEY_URL and TENSORZERO_POSTGRES_URL are set, the gateway uses Valkey for rate limiting.

Best Practices

Eviction Policy

Rate limiting keys and cache entries have different tolerances for eviction:
  • Rate limiting is a correctness concern — silent eviction resets rate limit state and allows clients to exceed their limits.
  • Caching is a performance concern — eviction simply causes a cache miss and the next request re-fetches from the provider.
If you use a single Valkey instance for both, we recommend the volatile-ttl eviction policy. TensorZero sets rate limiting key TTLs to a minimum of 48 hours (compared to the default 24-hour cache TTL), so rate limiting keys are evicted last under memory pressure. Please configure cache TTL to be less than 48 hours to ensure correct eviction ordering. For stronger guarantees, use a dedicated Valkey instance for caching with its own eviction policy, while keeping the primary instance configured with noeviction.
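For a single shared instance, this can be set in valkey.conf (the maxmemory value below is illustrative; size it for your workload):

```
# Cap memory usage and evict TTL-bearing keys closest to expiry first.
# Because rate limiting keys carry TTLs of at least 48 hours, they are
# evicted after shorter-lived cache entries under memory pressure.
maxmemory 2gb
maxmemory-policy volatile-ttl
```

The same policy can also be applied to a running instance with `CONFIG SET maxmemory-policy volatile-ttl`.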

Dedicated Valkey Instance for Caching (Optional)

By default, TensorZero uses a single Valkey instance for both rate limiting and model inference caching. If your deployment is under memory pressure, you may want to separate these workloads to prevent cache eviction from disrupting rate limiting state. Set TENSORZERO_VALKEY_CACHE_URL to point caching at a dedicated Valkey instance:
.env
# Primary instance for rate limiting (and caching if no separate cache URL)
TENSORZERO_VALKEY_URL="redis://valkey:6379"

# Dedicated instance for model inference caching
TENSORZERO_VALKEY_CACHE_URL="redis://valkey-cache:6379"
If TENSORZERO_VALKEY_CACHE_URL is not set, caching uses the TENSORZERO_VALKEY_URL instance. This separation lets you configure different eviction policies per instance:
  • Primary instance (rate limiting): Use noeviction policy to ensure rate limiting keys are never silently evicted.
  • Cache instance: Use volatile-lru policy for efficient cache management.
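As a sketch, the per-instance settings might look like the following (maxmemory values are illustrative):

```
# valkey.conf — primary instance (rate limiting): never evict keys silently
maxmemory 2gb
maxmemory-policy noeviction

# valkey.conf — dedicated cache instance: evict least-recently-used TTL-bearing keys
maxmemory 4gb
maxmemory-policy volatile-lru
```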

Durability

A critical failure of Valkey (e.g. server crash) may result in loss of rate limiting data since the last backup. This is generally tolerable if your rate limiting windows are short (e.g. minutes), but if you require precise limits or longer time windows, we recommend configuring recurring RDB (point-in-time) snapshots for improved durability.
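For example, recurring RDB snapshots can be enabled in valkey.conf with save rules (the thresholds below mirror common defaults; tune them to your tolerance for data loss):

```
# Snapshot if at least 1 write occurred in the last 3600 seconds,
# 100 writes in 300 seconds, or 10000 writes in 60 seconds.
save 3600 1
save 300 100
save 60 10000

# Directory for the dump.rdb snapshot file (matches the /data volume
# mounted in the docker-compose.yml example above).
dir /data
```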