- Send images and files to any LLM. TensorZero supports multimodal inputs like images, PDFs, and more across all major providers.
- Use remote file URLs or base64-encoded data. You can reference files hosted online or embed them directly in requests.
- Configure file storage for observability. Store files in S3-compatible object storage, the local filesystem, or disable storage entirely.
Configure file storage
TensorZero can store files used during multimodal inference for observability and other downstream workflows. You can configure the object storage service in the `object_storage` section of the configuration file. See Configuration Reference for more details.
For production, we recommend using an S3-compatible object storage service.
For local development, you can also use the filesystem.
If you don’t need to store files, you can disable object storage entirely.
TensorZero supports any S3-compatible object storage service, including AWS S3, GCP Cloud Storage, Cloudflare R2, and many more. The TensorZero Gateway will attempt to retrieve credentials from the following sources, in order of priority:
- `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` environment variables
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables
- Credentials from the AWS SDK (default profile)
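As a sketch, an `object_storage` block for an S3-compatible bucket might look like the following. The endpoint and bucket name are placeholders; consult the Configuration Reference for the canonical field names and values.

```toml
# tensorzero.toml (sketch; see the Configuration Reference for exact fields)
[object_storage]
type = "s3_compatible"          # other modes: filesystem storage, or disabled
endpoint = "http://minio:9000"  # placeholder: a local MinIO endpoint
bucket_name = "tensorzero"      # placeholder bucket name
```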
Deploy TensorZero
We’ll use Docker Compose to deploy the TensorZero Gateway, ClickHouse, and MinIO (an open-source S3-compatible object storage service).
See Deploy the TensorZero Gateway and Deploy ClickHouse for production deployment instructions.
docker-compose.yml
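A minimal `docker-compose.yml` along these lines might look like the sketch below. The image tags, ports, and credentials are illustrative placeholders, not the official example; adjust them to your setup.

```yaml
# Sketch: TensorZero Gateway + ClickHouse + MinIO (placeholder credentials)
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    environment:
      CLICKHOUSE_USER: chuser            # placeholder
      CLICKHOUSE_PASSWORD: chpassword    # placeholder

  minio:
    image: minio/minio
    command: server /data
    environment:
      MINIO_ROOT_USER: miniouser         # placeholder
      MINIO_ROOT_PASSWORD: miniopassword # placeholder

  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro          # mount your tensorzero.toml here
    environment:
      TENSORZERO_CLICKHOUSE_URL: http://chuser:chpassword@clickhouse:8123/tensorzero
      S3_ACCESS_KEY_ID: miniouser
      S3_SECRET_ACCESS_KEY: miniopassword
    ports:
      - "3000:3000"
    depends_on:
      - clickhouse
      - minio
```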
Call LLMs with file inputs
The TensorZero Gateway accepts both embedded files (encoded as base64 strings) and remote files (specified by a URL). See Integrations for a list of providers that support multimodal inference.
You can call the TensorZero Gateway with the OpenAI Python SDK, the OpenAI Node SDK, or directly over HTTP. For example, you can use the OpenAI Python SDK to send images to the TensorZero Gateway.
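As a sketch of the request shape, an embedded image can be sent as a base64 data URL inside an OpenAI-style chat message. The helper name, prompt text, and image bytes below are placeholders for illustration:

```python
import base64

def build_image_message(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style chat message embedding an image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            # Embedded file: a data URL carrying the base64-encoded bytes
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }

# Placeholder bytes; in practice, read them from an image file on disk.
message = build_image_message(b"fake image bytes")
```

You would then pass `[message]` as the `messages` list to an OpenAI client pointed at the gateway's OpenAI-compatible endpoint; to use a remote file instead, put its URL directly in the `image_url` object rather than a data URL.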
Advanced
Tune the image detail level
When working with image files, you can optionally specify a `detail` parameter to control the fidelity of image processing.
This parameter accepts three values: `low`, `high`, or `auto`.
The detail parameter only applies to image files and is ignored for other file types like PDFs or audio files.
Using low detail reduces token consumption and processing time at the cost of image quality, while high detail provides better image quality but consumes more tokens.
The auto setting allows the model provider to automatically choose the appropriate detail level based on the image characteristics.
Ensure reproducibility by fetching input files before inference
By default, the TensorZero Gateway forwards remote file URLs directly to the model provider and fetches them separately for observability, in parallel with inference. This means that in rare cases, the file the model provider fetches may differ from the one TensorZero stores (e.g. if the file at the URL changes between the two fetches). To ensure that TensorZero and the model provider see identical inputs, you can set `gateway.fetch_and_encode_input_files_before_inference = true` in your configuration.
When enabled, the gateway will fetch remote input files and send them as base64-encoded payloads in the prompt.
This is recommended if you require strict observability and reproducibility.
See Configuration Reference for more details.
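In the configuration file, that amounts to the following (a sketch; see the Configuration Reference for the canonical form):

```toml
# tensorzero.toml
[gateway]
fetch_and_encode_input_files_before_inference = true
```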