Setup
Object Storage
TensorZero uses object storage to store files (e.g. images, PDFs) used during multimodal inference. It supports any S3-compatible object storage service, including AWS S3, GCP Cloud Storage, Cloudflare R2, and many more. You can configure the object storage service in the `object_storage` section of the configuration file.
In this example, we’ll use a local deployment of MinIO, an open-source S3-compatible object storage service.
Alternatively, you can store files on the local filesystem (`type = "filesystem"`) or disable file storage entirely (`type = "disabled"`).
See Configuration Reference for more details.
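As a sketch, a MinIO-backed configuration might look like the following (the endpoint and bucket name are placeholders; check the exact field names against the Configuration Reference):

```toml
# Store files in an S3-compatible object store (here, a local MinIO instance).
[object_storage]
type = "s3_compatible"
endpoint = "http://minio:9000" # placeholder endpoint for a local MinIO container
bucket_name = "tensorzero"     # placeholder bucket name
```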
The TensorZero Gateway will attempt to retrieve credentials from the following resources, in order of priority:
- `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` environment variables
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables
- Credentials from the AWS SDK (default profile)
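For a local MinIO deployment, you would typically supply the first pair of environment variables. The values below are MinIO's well-known default credentials, used here purely as placeholders:

```shell
# Provide S3-style credentials to the TensorZero Gateway via environment
# variables (these take priority over the AWS_* variables and the AWS SDK).
# "minioadmin" is MinIO's default credential pair; replace it in any real
# deployment.
export S3_ACCESS_KEY_ID="minioadmin"
export S3_SECRET_ACCESS_KEY="minioadmin"
```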
Docker Compose
We’ll use Docker Compose to deploy the TensorZero Gateway, ClickHouse, and MinIO, defined in a `docker-compose.yml` file.
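A rough sketch of such a Compose file is shown below. The image tags, ports, credentials, and environment variable values are illustrative placeholders; consult the TensorZero deployment documentation for a production-ready configuration:

```yaml
# Sketch of a docker-compose.yml for this setup (values are placeholders).
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    environment:
      CLICKHOUSE_USER: chuser
      CLICKHOUSE_PASSWORD: chpassword

  minio:
    image: minio/minio
    command: server /data
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin

  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro
    environment:
      TENSORZERO_CLICKHOUSE_URL: http://chuser:chpassword@clickhouse:8123/tensorzero
      S3_ACCESS_KEY_ID: minioadmin
      S3_SECRET_ACCESS_KEY: minioadmin
    ports:
      - "3000:3000"
    depends_on:
      - clickhouse
      - minio
```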
Inference
With the setup out of the way, you can now use the TensorZero Gateway to perform multimodal inference. The TensorZero Gateway accepts both embedded files (encoded as base64 strings) and remote files (specified by a URL).
- Python
- Python (OpenAI)
- HTTP
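To illustrate the HTTP variant, here is a minimal sketch in Python that builds a request body mixing text with a remote image file. The function name is a placeholder, and the exact content-block schema is an assumption to verify against the TensorZero API reference:

```python
import json

# Build an inference request that mixes text with a remote image file.
# "describe_image" is a placeholder function name; the content-block schema
# is an assumption to check against the TensorZero API reference.
payload = {
    "function_name": "describe_image",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    # Remote file: the gateway fetches it and persists it
                    # to object storage.
                    {"type": "file", "url": "https://example.com/photo.png"},
                ],
            }
        ]
    },
}

body = json.dumps(payload)
print(body)  # POST this body to the gateway's inference endpoint
```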
Image Detail Parameter
When working with image files, you can optionally specify a `detail` parameter to control the fidelity of image processing.
This parameter accepts three values: `low`, `high`, or `auto`.
The detail parameter only applies to image files and is ignored for other file types like PDFs or audio files.
Using low detail reduces token consumption and processing time at the cost of image quality, while high detail provides better image quality but consumes more tokens.
The auto setting allows the model provider to automatically choose the appropriate detail level based on the image characteristics.
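As a sketch, the `detail` field would sit alongside the image content block in the request body. The block schema and field placement here are assumptions to verify against the TensorZero API reference:

```python
import json

# An image content block with an explicit detail level. "low" trades image
# fidelity for fewer tokens; the field placement is an assumption to check
# against the TensorZero API reference.
image_block = {
    "type": "file",
    "url": "https://example.com/photo.png",
    "detail": "low",  # one of "low", "high", or "auto"
}

print(json.dumps(image_block))
```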