Most TensorZero deployments will not require Valkey or Redis.
## Deploy
You can self-host Valkey or use a managed Redis-compatible service (e.g. AWS ElastiCache, GCP Memorystore).
Add Valkey to your `docker-compose.yml`:
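As a minimal sketch, assuming the official `valkey/valkey` image and the default port (pin the version and adjust the port mapping for your environment):

```yaml
services:
  valkey:
    image: valkey/valkey:8
    ports:
      - "6379:6379"
    # Persist data across restarts so rate limiting state survives
    volumes:
      - valkey-data:/data
    command: ["valkey-server", "--appendonly", "yes"]

volumes:
  valkey-data:
```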
If you find any compatibility issues, please open a detailed GitHub Discussion.
## Configure
To configure TensorZero to use Valkey, set the `TENSORZERO_VALKEY_URL` environment variable with your Valkey connection details.
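For example, a `.env` entry pointing the gateway at a local Valkey instance (the host, port, and credentials below are placeholders):

```bash
# .env
TENSORZERO_VALKEY_URL="valkey://localhost:6379"
# or, with TLS and authentication:
# TENSORZERO_VALKEY_URL="valkeys://user:password@valkey.example.com:6379"
```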
| Scheme | Description |
|---|---|
| `valkey://` | Standard Valkey/Redis connection |
| `valkeys://` | TLS-encrypted Valkey/Redis connection |
| `redis://` | Alias for `valkey://` |
| `rediss://` | Alias for `valkeys://` |
## Best Practices
### Eviction Policy
Rate limiting keys and cache entries have different eviction tolerances:

- Rate limiting is a correctness concern: silent eviction resets rate limit state and allows clients to exceed their limits.
- Caching is a performance concern: eviction simply causes a cache miss, and the next request re-fetches from the provider.
If both workloads share a single Valkey instance, configure it with the `volatile-ttl` eviction policy. TensorZero sets rate limiting key TTLs to a minimum of 48 hours (compared to the default 24-hour cache TTL), so rate limiting keys are evicted last under memory pressure. Configure your cache TTL to be less than 48 hours to preserve this eviction ordering.
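As a sketch, the eviction policy can be set in `valkey.conf` or at runtime via `valkey-cli` (the `maxmemory` value below is an illustrative placeholder, not a recommendation):

```
# valkey.conf
maxmemory 2gb
maxmemory-policy volatile-ttl
```

Equivalently, at runtime: `valkey-cli CONFIG SET maxmemory-policy volatile-ttl`.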
For stronger guarantees, use a dedicated Valkey instance for caching with its own eviction policy, while keeping the primary instance configured with `noeviction`.
### Dedicated Valkey Instance for Caching (Optional)
By default, TensorZero uses a single Valkey instance for both rate limiting and model inference caching. If your deployment is under memory pressure, you may want to separate these workloads to prevent cache eviction from disrupting rate limiting state. Set `TENSORZERO_VALKEY_CACHE_URL` to point caching at a dedicated Valkey instance:
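For example, assuming two hypothetical hostnames (`rate-limit-valkey` and `cache-valkey` are placeholders for your own instances):

```bash
# .env
TENSORZERO_VALKEY_URL="valkey://rate-limit-valkey:6379"
TENSORZERO_VALKEY_CACHE_URL="valkey://cache-valkey:6379"
```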
If `TENSORZERO_VALKEY_CACHE_URL` is not set, caching uses the `TENSORZERO_VALKEY_URL` instance.
This separation lets you configure different eviction policies per instance:
- Primary instance (rate limiting): Use the `noeviction` policy to ensure rate limiting keys are never silently evicted.
- Cache instance: Use the `volatile-lru` policy for efficient cache management.
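A sketch of the corresponding per-instance settings (file names and the `maxmemory` value are illustrative):

```
# valkey-primary.conf (rate limiting)
maxmemory-policy noeviction

# valkey-cache.conf (caching)
maxmemory 4gb
maxmemory-policy volatile-lru
```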