Learn how to use inference caching with TensorZero Gateway.
write_only (default): Only write to cache but don’t serve cached responses
read_only: Only read from cache but don’t write new entries
on: Both read from and write to cache
off: Disable caching completely
The max_age_s parameter applies to the retrieval of cached responses.
The cache does not automatically delete old entries (i.e. max_age_s is not a TTL).
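To make the difference between the four modes and the retrieval-time age check concrete, here is a minimal toy model in Python. It is not TensorZero's implementation and does not call the gateway: `ToyCache`, `infer`, the injectable `clock`, and the stand-in `model` function are all hypothetical names introduced for illustration. Only the mode names and `max_age_s` come from the text above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Mode names come from the text above; the grouping below is an assumption
# about how they map to read/write behavior, based on their descriptions.
MODES = {"write_only", "read_only", "on", "off"}
READS = {"on", "read_only"}
WRITES = {"on", "write_only"}


@dataclass
class ToyCache:
    """Hypothetical in-memory stand-in for an inference cache."""
    clock: Callable[[], float] = time.time
    entries: dict = field(default_factory=dict)  # key -> (stored_at, response)

    def lookup(self, key: str, mode: str, max_age_s: Optional[float] = None):
        if mode not in READS or key not in self.entries:
            return None
        stored_at, response = self.entries[key]
        # max_age_s only filters at retrieval time; a too-old entry is
        # skipped but stays in storage (no TTL-style deletion).
        if max_age_s is not None and self.clock() - stored_at > max_age_s:
            return None
        return response

    def store(self, key: str, response: str, mode: str) -> None:
        if mode in WRITES:
            self.entries[key] = (self.clock(), response)


def infer(cache, key, mode="write_only", max_age_s=None,
          model=lambda k: f"response to {k}"):
    """Return (response, served_from_cache) for a hypothetical request key."""
    if mode not in MODES:
        raise ValueError(f"unknown cache mode: {mode!r}")
    cached = cache.lookup(key, mode, max_age_s)
    if cached is not None:
        return cached, True
    response = model(key)  # stand-in for a real model call
    cache.store(key, response, mode)
    return response, False
```

In this sketch, a request made under write_only populates the cache but is never served from it. A later read_only request for the same key is served from the cache unless the entry is older than max_age_s, in which case the model is called again while the old entry remains stored.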