POST /openai/v1/chat/completions
The /openai/v1/chat/completions endpoint allows users to make TensorZero inferences with the OpenAI client.
The gateway translates the OpenAI request parameters into the arguments expected by the inference endpoint and calls the same underlying implementation.
This endpoint supports most of the features of the inference endpoint, but there are some limitations.
Most notably, this endpoint doesn’t support dynamic credentials, so they must be specified with a different method.
Request
The OpenAI-compatible inference endpoint translates the OpenAI request parameters into the arguments expected by the inference endpoint.
TensorZero-specific parameters are prefixed with tensorzero:: (e.g. tensorzero::episode_id).
These fields should be provided as extra body parameters in the request body.
tensorzero::cache_options
- Type: object
- Required: no
- enabled (string): The cache mode. Can be one of:
  - "write_only" (default): Only write to the cache but don't serve cached responses
  - "read_only": Only read from the cache but don't write new entries
  - "on": Both read from and write to the cache
  - "off": Disable caching completely
- max_age_s (integer or null): Maximum age in seconds for cache entries to be considered valid when reading from the cache. Does not set a TTL for cache expiration. Default is null (no age limit).
This field should be provided as an extra body parameter in the request body.
See the Inference Caching guide for more details.
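For illustration, here is a minimal sketch of setting tensorzero::cache_options with the OpenAI Python client; the gateway address and function name are assumptions:

```python
from openai import OpenAI

# Assumes a TensorZero gateway running locally on port 3000.
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "What is the capital of Japan?"}],
    extra_body={
        # Read from and write to the cache, but only serve entries
        # that are at most one hour old.
        "tensorzero::cache_options": {"enabled": "on", "max_age_s": 3600},
    },
)
```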
tensorzero::credentials
- Type: object (a map from dynamic credential names to API keys)
- Required: no (default: no credentials)
Each model provider in your TensorZero configuration can be configured to accept credentials at inference time by using the dynamic location (e.g. dynamic::my_dynamic_api_key_name).
See the configuration reference for more details.
The gateway expects the credentials to be provided in the tensorzero::credentials field of the request body as shown below.
The gateway will return a 400 error if the credentials are not provided and the model provider has been configured with dynamic credentials.
Example
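A minimal sketch, assuming a provider configured with the dynamic credential name my_dynamic_api_key_name (the function name is also an assumption):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        # Maps the dynamic credential name (without the dynamic:: prefix)
        # to the API key to use for this inference.
        "tensorzero::credentials": {"my_dynamic_api_key_name": "sk-..."},
    },
)
```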
tensorzero::deny_unknown_fields
- Type: boolean
- Required: no (default: false)
If true, the gateway will return an error if the request contains any unknown or unrecognized fields.
By default, unknown fields are ignored with a warning logged.
This field does not affect the tensorzero::extra_body field, only unknown fields at the root of the request body.
This field should be provided as an extra body parameter in the request body.
tensorzero::dryrun
- Type: boolean
- Required: no
If true, the inference request will be executed but won't be stored to the database.
The gateway will still call the downstream model providers.
This field is primarily for debugging and testing, and you should generally not use it in production.
This field should be provided as an extra body parameter in the request body.
tensorzero::episode_id
- Type: UUID
- Required: no
The ID of an existing episode to associate the inference with. Only use episode IDs that were returned by the TensorZero gateway.
This field should be provided as an extra body parameter in the request body.
tensorzero::extra_body
- Type: array of objects (see below)
- Required: no
The tensorzero::extra_body field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have three fields:
- variant_name or model_provider_name: The modification will only be applied to the specified variant or model provider
- pointer: A JSON Pointer string specifying where to modify the request body
- One of the following:
  - value: The value to insert at that location; it can be of any type including nested types
  - delete = true: Deletes the field at the specified location, if present
Example
If TensorZero would normally send this request body to the provider, then the following extra_body in the inference request overrides the request body (for my_variant only):
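A sketch with the OpenAI Python client; the variant name, pointer, and value here are hypothetical:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "tensorzero::extra_body": [
            {
                # Only applies when the gateway samples the variant `my_variant`...
                "variant_name": "my_variant",
                # ...and sets the field `safety_settings` at the root of the
                # provider request body to the given value (hypothetical example).
                "pointer": "/safety_settings",
                "value": [{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"}],
            }
        ],
    },
)
```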
tensorzero::extra_headers
- Type: array of objects (see below)
- Required: no
The tensorzero::extra_headers field allows you to modify the request headers that TensorZero sends to a model provider.
This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.
Each object in the array must have three fields:
- variant_name or model_provider_name: The modification will only be applied to the specified variant or model provider
- name: The name of the header to modify
- value: The value to set the header to
Example
If TensorZero would normally send certain request headers to the provider, then the following extra_headers in the inference request overrides those headers:
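A sketch with the OpenAI Python client; the variant and header names are hypothetical:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "tensorzero::extra_headers": [
            {
                # Only applies when the gateway samples the variant `my_variant`.
                "variant_name": "my_variant",
                # Sets (or overrides) this header on the provider request.
                "name": "anthropic-beta",  # hypothetical header name
                "value": "token-efficient-tools-2025-02-19",
            }
        ],
    },
)
```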
tensorzero::params
- Type: object
- Required: no
An object with a chat_completion field containing any of the following parameters:
- frequency_penalty (float): Penalizes tokens based on their frequency
- json_mode (object): Controls JSON output formatting
- max_tokens (integer): Maximum number of tokens to generate
- presence_penalty (float): Penalizes tokens based on their presence
- reasoning_effort (string): Effort level for reasoning models
- seed (integer): Random seed for deterministic outputs
- service_tier (string): Service tier for the request
- stop_sequences (list of strings): Sequences that stop generation
- temperature (float): Controls randomness in the output
- thinking_budget_tokens (integer): Token budget for thinking/reasoning
- top_p (float): Nucleus sampling parameter
- verbosity (string): Output verbosity level
When using the OpenAI-compatible endpoint, parameters provided directly in the request body (e.g. top-level temperature or max_tokens) take precedence over values specified in tensorzero::params.
Example
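A sketch of overriding chat completion parameters at inference time; the function name and values are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Write a haiku about Tokyo."}],
    extra_body={
        "tensorzero::params": {
            "chat_completion": {"temperature": 0.4, "max_tokens": 256},
        },
    },
)
```

Note that a top-level temperature in the request body would take precedence over the value above.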
tensorzero::provider_tools
- Type: array of objects
- Required: no (default: [])
Each object in the array has the following fields:
- scope (object, optional): Limits which model/provider combination can use this tool. If omitted, the tool is available to all compatible providers.
  - model_name (string): The model name as defined in your configuration
  - model_provider_name (string): The provider name for that model
- tool (object, required): The provider-specific tool configuration as defined by the provider's API
This field should be provided as an extra body parameter in the request body.
This field allows for dynamic provider tool configuration at runtime.
You should prefer to define provider tools in the configuration file if possible (see Configuration Reference).
Only use this field if dynamic provider tool configuration is necessary for your use case.
Example: OpenAI Web Search (Unscoped)
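A sketch of an unscoped provider tool. The exact tool payload is defined by the provider's API; the web search tool type below is an assumption for illustration:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "What happened in the news today?"}],
    extra_body={
        # No `scope` field, so the tool is available to all compatible providers.
        "tensorzero::provider_tools": [{"tool": {"type": "web_search"}}],
    },
)
```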
Example: OpenAI Web Search (Scoped)
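A sketch of the same tool restricted with a scope; the model and provider names assume a gpt-5-mini model defined in your configuration with an openai provider:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::model_name::gpt-5-mini",  # assumes this model is defined in your configuration
    messages=[{"role": "user", "content": "What happened in the news today?"}],
    extra_body={
        "tensorzero::provider_tools": [
            {
                # Restricts the tool to this model/provider combination.
                "scope": {"model_name": "gpt-5-mini", "model_provider_name": "openai"},
                "tool": {"type": "web_search"},  # hypothetical provider tool payload
            }
        ],
    },
)
```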
Here, the web search tool is only available to the gpt-5-mini model.
tensorzero::tags
- Type: flat JSON object with string keys and values
- Required: no
{"user_id": "123"} or {"author": "Alice"}.
frequency_penalty
- Type: float
- Required: no (default: null)
Overrides the frequency_penalty setting for any chat completion variants being used.
max_completion_tokens
- Type: integer
- Required: no (default: null)
Overrides the max_tokens setting for any chat completion variants being used. If both max_completion_tokens and max_tokens are set, the smaller value is used.
max_tokens
- Type: integer
- Required: no (default: null)
Overrides the max_tokens setting for any chat completion variants being used. If both max_tokens and max_completion_tokens are set, the smaller value is used.
messages
- Type: list
- Required: yes
Each message is an object with the following fields:
- role (required): The role of the message sender in an OpenAI message (assistant, system, tool, or user).
- content (required for user and system messages and optional for assistant and tool messages): The content of the message. The content must be either a string or an array of content blocks (see below).
- tool_calls (optional for assistant messages, otherwise disallowed): A list of tool calls. Each tool call is an object with the following fields:
  - id: A unique identifier for the tool call
  - type: The type of tool being called (currently only "function" is supported)
  - function: An object containing:
    - name: The name of the function to call
    - arguments: A JSON string containing the function arguments
- tool_call_id (required for tool messages, otherwise disallowed): The ID of the tool call to associate with the message. This should be one that was originally returned by the gateway in a tool call id field.
Each content block must have a type field, which can be text, image_url, or TensorZero-specific types.
If the content block has type text, it must have one of the following additional fields:
- text: The text for the content block.
- tensorzero::arguments: A JSON object containing the function arguments for TensorZero functions with templates and schemas (see Create a prompt template for details).
If the content block has type image_url, it must have the following additional field:
- image_url: A JSON object with the following fields:
  - url: The URL for a remote image (e.g. "https://example.com/image.png") or base64-encoded data for an embedded image (e.g. "data:image/png;base64,...").
  - detail (optional): Controls the fidelity of image processing. Only applies to image files; ignored for other file types. Can be low, high, or auto. Affects token consumption and image quality.
TensorZero also supports the following TensorZero-specific content block types:
- tensorzero::raw_text: Bypasses templates and schemas, sending text directly to the model. Useful for testing prompts or dynamic injection without configuration changes. Must have a value field containing the text.
- tensorzero::template: Explicitly specify a template to use. Must have name and arguments fields.
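A sketch combining several content block types; the function name and template arguments are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical templated function
    messages=[
        {
            "role": "system",
            # Fills in the system template's schema instead of sending raw text.
            "content": [{"type": "text", "tensorzero::arguments": {"assistant_name": "Alfred"}}],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        },
    ],
)
```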
model
- Type: string
- Required: yes
| To call… | Use this format… |
| --- | --- |
| A function defined as [functions.my_function] in your tensorzero.toml configuration file | tensorzero::function_name::my_function |
| A model defined as [models.my_model] in your tensorzero.toml configuration file | tensorzero::model_name::my_model |
| A model offered by a model provider, without defining it in your tensorzero.toml configuration file (if supported, see below) | tensorzero::model_name::{provider_type}::{model_name} |
For example, consider the following tensorzero.toml configuration file:
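A minimal sketch of such a configuration; the provider details are assumptions:

```toml
[models.gpt-4o]
# Try OpenAI first, then fall back to Azure.
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
type = "openai"
model_name = "gpt-4o"

[models.gpt-4o.providers.azure]
type = "azure"
deployment_id = "gpt-4o"
endpoint = "https://your-resource.openai.azure.com"

[functions.extract-data]
type = "chat"

[functions.extract-data.variants.baseline]
type = "chat_completion"
model = "gpt-4o"
```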
- tensorzero::function_name::extract-data calls the extract-data function defined above.
- tensorzero::model_name::gpt-4o calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
- tensorzero::model_name::openai::gpt-4o calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.
parallel_tool_calls
- Type: boolean
- Required: no (default: null)
Overrides the parallel_tool_calls setting for the function being called.
presence_penalty
- Type: float
- Required: no (default: null)
Overrides the presence_penalty setting for any chat completion variants being used.
response_format
- Type: either a string or an object
- Required: no (default: null)
The supported values are "text", "json_object", and {"type": "json_schema", "schema": ...}, where the schema field contains a valid JSON schema.
This field is ignored except for the "json_schema" variant, where the schema field can be used to dynamically set the output schema for a JSON function.
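A sketch of dynamically setting the output schema for a JSON function; the function name and schema are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::extract_email",  # hypothetical JSON function
    messages=[{"role": "user", "content": "Contact me at hello@example.com."}],
    response_format={
        "type": "json_schema",
        # Note: per the description above, TensorZero expects the schema directly
        # under `schema`, rather than OpenAI's `json_schema` wrapper object.
        "schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
            "additionalProperties": False,
        },
    },
)
```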
seed
- Type: integer
- Required: no (default: null)
Overrides the seed setting for any chat completion variants being used.
stop_sequences
- Type: list of strings
- Required: no (default: null)
Overrides the stop_sequences setting for any chat completion variants being used.
stream
- Type: boolean
- Required: no (default: false)
If true, the gateway streams the response to the client as it is generated.
stream_options
- Type: object with field "include_usage"
- Required: no (default: null)
"include_usage" is true, the gateway will include usage information in the response.
Example
If the following stream_options is provided, then the gateway will include usage information in the response:
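A sketch with the OpenAI Python client; the function name is an assumption:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
    # Requests token usage information in the stream.
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.usage is not None:
        print(chunk.usage)
```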
temperature
- Type: float
- Required: no (default: null)
Overrides the temperature setting for any chat completion variants being used.
tools
- Type: list of tool objects (see below)
- Required: no (default: null)
Each tool object has the following structure:
- type: Must be "function"
- function: An object containing:
  - name: The name of the function (string, required)
  - description: A description of what the function does (string, optional)
  - parameters: A JSON Schema object describing the function's parameters (required)
  - strict: Whether to enforce strict schema validation (boolean, defaults to false)
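A sketch of passing a tool at inference time; the function and tool definitions are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::weather_bot",  # hypothetical function
    messages=[{"role": "user", "content": "What is the weather like in Tokyo?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_temperature",
                "description": "Get the current temperature for a given location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string", "description": "City name"}},
                    "required": ["location"],
                    "additionalProperties": False,
                },
                "strict": False,
            },
        }
    ],
)
print(response.choices[0].message.tool_calls)
```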
tool_choice
- Type: string or object
- Required: no (default: "none" if no tools are present, "auto" if tools are present)
"none": The model will not call any tool and instead generates a message"auto": The model can pick between generating a message or calling one or more tools"required": The model must call one or more tools{"type": "function", "function": {"name": "my_function"}}: Forces the model to call the specified tool{"type": "allowed_tools", "allowed_tools": {"tools": [...], "mode": "auto"|"required"}}: Restricts which tools can be called
top_p
- Type: float
- Required: no (default: null)
Overrides the top_p setting for any chat completion variants being used.
tensorzero::variant_name
- Type: string
- Required: no
If set, pins the inference request to a particular variant (not recommended). You should generally let TensorZero assign a variant; this field is primarily for testing or debugging.
This field should be provided as an extra body parameter in the request body.
Response
- Regular
- Streaming
In regular (non-streaming) mode, the response is a JSON object with the following fields:
choices
- Type: list of choice objects, where each choice contains:
  - index: A zero-based index indicating the choice's position in the list (integer)
  - finish_reason: Always "stop".
  - message: An object containing:
    - content: The message content (string, optional)
    - tool_calls: List of tool calls made by the model (optional). The format is the same as in the request.
    - role: The role of the message sender (always "assistant").
created
- Type: integer
The Unix timestamp (in seconds) of when the inference was created.
episode_id
- Type: UUID
The episode ID associated with the inference.
id
- Type: UUID
The inference ID.
model
- Type: string
object
- Type: string
"chat.completion").system_fingerprint
- Type: string
usage
- Type: object
- prompt_tokens: Number of tokens in the prompt (integer)
- completion_tokens: Number of tokens in the completion (integer)
- total_tokens: Total number of tokens used (integer)
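For illustration, a regular (non-streaming) response body looks roughly like this; all values below are made up:

```json
{
  "id": "0191ddb1-72d6-7de1-a838-4e2f0dbd1f43",
  "episode_id": "0191ddb1-72d4-7fa2-b233-0f43a1e2c8d9",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Tokyo is the capital of Japan.",
        "tool_calls": []
      }
    }
  ],
  "created": 1726000000,
  "model": "my_variant",
  "object": "chat.completion",
  "system_fingerprint": "",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```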
Examples
Chat Function with Structured System Prompt
Configuration
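A minimal configuration sketch for a chat function with a structured (templated) system prompt; all names and paths are assumptions:

```toml
# Hypothetical function with a templated, schema-validated system prompt.
[functions.my_function_name]
type = "chat"
system_schema = "functions/my_function_name/system_schema.json"

[functions.my_function_name.variants.my_variant]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "functions/my_function_name/my_variant/system_template.minijinja"
```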
Request
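A sketch of the request with the OpenAI Python client, assuming the configuration above and a local gateway:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function_name",
    messages=[
        {
            "role": "system",
            # Structured arguments that fill in the system template's schema.
            "content": [{"type": "text", "tensorzero::arguments": {"assistant_name": "Alfred Pennyworth"}}],
        },
        {"role": "user", "content": "What is the capital of Japan?"},
    ],
)
print(response.choices[0].message.content)
```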
Response
The response follows the regular or streaming response format described above.
Chat Function with Dynamic Tool Use
Configuration
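A minimal configuration sketch for this example (names are assumptions); the tool itself is provided at inference time rather than in the configuration:

```toml
# Hypothetical chat function; tools are supplied dynamically per request.
[functions.weather_bot]
type = "chat"

[functions.weather_bot.variants.my_variant]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```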
Request
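A sketch of the request, assuming the configuration above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::weather_bot",  # matches the sketch above
    messages=[{"role": "user", "content": "What is the weather like in Tokyo?"}],
    # The tool is defined dynamically at inference time.
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_temperature",
                "description": "Get the current temperature for a given location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
)

# The model should respond with a tool call like get_temperature({"location": "Tokyo"}).
print(response.choices[0].message.tool_calls)
```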
Response
The response follows the regular or streaming response format described above.
JSON Function with Dynamic Output Schema
Configuration
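A minimal configuration sketch for a JSON function; names and paths are assumptions:

```toml
# Hypothetical JSON function; its default output schema can be overridden at inference time.
[functions.extract_email]
type = "json"
output_schema = "functions/extract_email/output_schema.json"

[functions.extract_email.variants.my_variant]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```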
Request
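A sketch of the request, assuming the configuration above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::extract_email",  # matches the sketch above
    messages=[{"role": "user", "content": "Reach me at hello@example.com."}],
    # Overrides the function's output schema for this inference only.
    response_format={
        "type": "json_schema",
        "schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
            "additionalProperties": False,
        },
    },
)

# The content is the raw serialized JSON output.
print(response.choices[0].message.content)
```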
Response
The response follows the regular or streaming response format described above.