Multimodal Inference

TensorZero Gateway supports multimodal inference (e.g. image and PDF inputs).

See Integrations for a list of supported models.

Setup

Object Storage

TensorZero uses object storage to store files (e.g. images, PDFs) used during multimodal inference. It supports any S3-compatible object storage service, including AWS S3, GCP Cloud Storage, Cloudflare R2, and many more. You can configure the object storage service in the object_storage section of the configuration file.

In this example, we’ll use a local deployment of MinIO, an open-source S3-compatible object storage service.

[object_storage]
type = "s3_compatible"
endpoint = "http://minio:9000" # optional: defaults to AWS S3
# region = "us-east-1" # optional: depends on your S3-compatible storage provider
bucket_name = "tensorzero" # optional: depends on your S3-compatible storage provider
# IMPORTANT: for production environments, remove the following setting and use a secure method of authentication in
# combination with a production-grade object storage service.
allow_http = true

You can also store files in a local directory (type = "filesystem") or disable file storage (type = "disabled"). See Configuration Reference for more details.

The TensorZero Gateway will attempt to retrieve credentials from the following sources, in order of priority:

  1. S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables
  2. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
  3. Credentials from the AWS SDK (default profile)
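The lookup order above can be sketched in Python. This is only an illustration of the documented precedence, not the gateway's actual implementation; the function name `resolve_env_credentials` is hypothetical, and the sketch assumes both variables of a pair must be set for that pair to be used.

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable pairs, listed in the documented order of priority.
ENV_PAIRS = [
    ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"),
    ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
]


def resolve_env_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str, str]]:
    """Return (source, access_key_id, secret_access_key) for the first matching pair.

    Returns None when neither pair is set, in which case the gateway would
    fall back to the AWS SDK credentials (default profile).
    Assumption: a pair only counts when both of its variables are non-empty.
    """
    for key_var, secret_var in ENV_PAIRS:
        key, secret = env.get(key_var), env.get(secret_var)
        if key and secret:
            return key_var, key, secret
    return None
```

With the Docker Compose setup in this guide, the gateway container sets the `S3_*` pair, so that pair would take precedence over any `AWS_*` variables.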

Docker Compose

We’ll use Docker Compose to deploy the TensorZero Gateway, ClickHouse, and MinIO.

docker-compose.yml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.12-alpine
    environment:
      - CLICKHOUSE_USER=chuser
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
      - CLICKHOUSE_PASSWORD=chpassword
    ports:
      - "8123:8123"
    healthcheck:
      test: wget --spider --tries 1 http://chuser:chpassword@clickhouse:8123/ping
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
      - S3_ACCESS_KEY_ID=miniouser
      - S3_SECRET_ACCESS_KEY=miniopassword
      - TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@clickhouse:8123/tensorzero
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      clickhouse:
        condition: service_healthy
      minio:
        condition: service_healthy

  # For a production deployment, you can use AWS S3, GCP Cloud Storage, Cloudflare R2, etc.
  minio:
    image: bitnami/minio:2025.4.22
    ports:
      - "9000:9000" # API port
      - "9001:9001" # Console port
    environment:
      - MINIO_ROOT_USER=miniouser
      - MINIO_ROOT_PASSWORD=miniopassword
      - MINIO_DEFAULT_BUCKETS=tensorzero
    healthcheck:
      test: "mc ls local/tensorzero || exit 1"
      start_period: 30s
      start_interval: 1s
      timeout: 1s
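After running `docker compose up`, it can be handy to wait from a script until the services respond before sending requests. The following is a minimal sketch using only the Python standard library; the helper names `http_ok` and `wait_until_ready` are hypothetical and not part of TensorZero.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET request to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_until_ready(
    probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:8123/ping"))` polls the same ClickHouse `/ping` endpoint used by the health check above, through the port published to the host. A similar probe could target the gateway on port 3000, but the exact health path is not shown in this guide, so check the gateway documentation before relying on one.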

Inference

With the setup out of the way, you can now use the TensorZero Gateway to perform multimodal inference.

The TensorZero Gateway accepts both embedded files (encoded as base64 strings) and remote files (specified by a URL).

from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(
    gateway_url="http://localhost:3000",
) as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Do the images share any common features?",
                        },
                        # Remote image of Ferris the crab
                        {
                            "type": "file",
                            "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/ff3e17bbd3e32f483b027cf81b54404788c90dc1/tensorzero-internal/tests/e2e/providers/ferris.png",
                        },
                        # One-pixel orange image encoded as a base64 string
                        {
                            "type": "file",
                            "mime_type": "image/png",
                            "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
                        },
                    ],
                }
            ],
        },
    )

    print(response)
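To embed a local file rather than hard-coding a base64 string, you can build the content block from the file's bytes. The sketch below produces the same `type` / `mime_type` / `data` shape as the embedded image in the example above; the helper name `file_content_block` is hypothetical, and the MIME type is guessed from the file extension, which is an assumption that may not suit every file.

```python
import base64
import mimetypes
from pathlib import Path


def file_content_block(path: str) -> dict:
    """Build an embedded-file content block from a local file."""
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {path!r}")
    # Encode the raw bytes as a base64 string.
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "file", "mime_type": mime_type, "data": data}
```

The returned dictionary can be placed in the `content` list of a user message alongside the text block.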