API reference for the `/openai/v1/chat/completions` endpoint.
POST /openai/v1/chat/completions

The `/openai/v1/chat/completions` endpoint allows TensorZero users to make TensorZero inferences with the OpenAI client. The gateway translates the OpenAI request parameters into the arguments expected by the `inference` endpoint and calls the same underlying implementation.
This endpoint supports most of the features supported by the `inference` endpoint, but there are some limitations. Most notably, this endpoint doesn't support dynamic credentials in the same way, so they must be specified with a different method (see `tensorzero::credentials` below). See the API reference for `POST /inference` for more details on inference with the native TensorZero `inference` endpoint.
TensorZero-specific parameters are prefixed with `tensorzero::` (e.g. `tensorzero::episode_id`). These fields should be provided as extra body parameters in the request body.
tensorzero::credentials

By default, the gateway reads model provider credentials from the locations specified in your `tensorzero.toml` file. In most cases, these credentials will be environment variables available to the TensorZero gateway — not your OpenAI client. API keys sent from the OpenAI client will be ignored.

Alternatively, each model provider in your TensorZero configuration can be configured to accept credentials at inference time by using the `dynamic` location (e.g. `dynamic::my_dynamic_api_key_name`). See the configuration reference for more details.

The gateway expects the credentials to be provided in the `tensorzero::credentials` field of the request body as specified below. The gateway will return a 400 error if the credentials are not provided and the model provider has been configured with dynamic credentials.

Example
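A minimal sketch using the OpenAI Python client; the gateway URL, model name, and credential name are assumptions, not values from this reference:

```python
from openai import OpenAI

# Point the OpenAI client at the TensorZero gateway (URL is an assumption).
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::model_name::my_model",  # hypothetical model name
    messages=[{"role": "user", "content": "Hello!"}],
    # The OpenAI SDK's `extra_body` parameter forwards extra fields in the request body.
    extra_body={
        "tensorzero::credentials": {
            # The key must match a `dynamic::my_dynamic_api_key_name` location
            # configured for the model provider in tensorzero.toml.
            "my_dynamic_api_key_name": "sk-...",
        }
    },
)
print(response.choices[0].message.content)
```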
tensorzero::dryrun

If set to `true`, the inference request will be executed but won't be stored to the database. The gateway will still call the downstream model providers. This field is primarily for debugging and testing, and you should generally not use it in production. This field should be provided as an extra body parameter in the request body.
tensorzero::episode_id

The ID of an existing episode to associate the inference with. If null, the gateway will generate a new episode ID and return it in the response (`episode_id`). Only use episode IDs that were returned by the TensorZero gateway. This field should be provided as an extra body parameter in the request body.
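A sketch of multi-turn usage with the OpenAI Python client (the gateway URL and function name are assumptions); it also shows `tensorzero::dryrun` from the previous section:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

# First inference: no episode ID is provided, so the gateway starts a new episode.
first = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
)

# The gateway returns the generated episode ID as an extra response field
# (the OpenAI Python SDK exposes extra fields as attributes).
episode_id = first.episode_id

# Follow-up inference in the same episode, not stored to the database (dryrun).
second = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[{"role": "user", "content": "Tell me more."}],
    extra_body={
        "tensorzero::episode_id": episode_id,
        "tensorzero::dryrun": True,
    },
)
```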
tensorzero::extra_body

The `tensorzero::extra_body` field allows you to modify the request body that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

If you use the OpenAI client's `extra_body` field, it will override the request from the client to the gateway. If you use `tensorzero::extra_body`, it will override the request from the gateway to the model provider.

The field expects an array of objects, each with the following fields:

- `variant_name` or `model_provider_name`: The modification will only be applied to the specified variant or model provider.
- `pointer`: A JSON Pointer string specifying where to modify the request body.
- Either `value`: The value to insert at that location (it can be of any type, including nested types), or `delete = true`: Deletes the field at the specified location, if present.

You can also set `extra_body` in the configuration file. The values provided at inference time take priority over the values in the configuration file.

Example: `tensorzero::extra_body`
If we provide the following `extra_body` in the inference request, TensorZero would modify the request body it sends to the model provider (for `my_variant` only) as sketched below.
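A minimal sketch using the OpenAI Python client; the gateway URL, function name, variant name, and the provider-specific `safe_mode` field are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "tensorzero::extra_body": [
            {
                # Only applied to requests made by this variant.
                "variant_name": "my_variant",
                # JSON Pointer into the provider request body.
                "pointer": "/safe_mode",  # hypothetical provider-specific field
                "value": True,
            },
            {
                "variant_name": "my_variant",
                "pointer": "/frequency_penalty",
                "delete": True,  # remove the field if present
            },
        ]
    },
)
```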
tensorzero::extra_headers

The `tensorzero::extra_headers` field allows you to modify the request headers that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

If you use the OpenAI client's `extra_headers` field, it will override the request from the client to the gateway. If you use `tensorzero::extra_headers`, it will override the request from the gateway to the model provider.

The field expects an array of objects, each with the following fields:

- `variant_name` or `model_provider_name`: The modification will only be applied to the specified variant or model provider.
- `name`: The name of the header to modify.
- `value`: The value to set the header to.

You can also set `extra_headers` in the configuration file. The values provided at inference time take priority over the values in the configuration file.

Example: `tensorzero::extra_headers`
If we provide the following `extra_headers` in the inference request, TensorZero would modify the request headers it sends to the model provider as sketched below.
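A sketch with the OpenAI Python client; the gateway URL, function name, variant name, and the header name and value are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "tensorzero::extra_headers": [
            {
                # Only applied to requests made by this variant.
                "variant_name": "my_variant",
                "name": "x-my-custom-header",  # hypothetical header
                "value": "my-value",
            }
        ]
    },
)
```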
tensorzero::tags

User-provided tags to associate with the inference, e.g. `{"user_id": "123"}` or `{"author": "Alice"}`. This field should be provided as an extra body parameter in the request body.
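For example, with the OpenAI Python client (the gateway URL and function name are assumptions):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    # Tags are recorded alongside the inference in the database.
    extra_body={"tensorzero::tags": {"user_id": "123"}},
)
```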
frequency_penalty

(optional, default: `null`)

Overrides the `frequency_penalty` setting for any chat completion variants being used.
max_completion_tokens

(optional, default: `null`)

Overrides the `max_tokens` setting for any chat completion variants being used. If both this field and `max_tokens` are set, the smaller value is used.
max_tokens

(optional, default: `null`)

Overrides the `max_tokens` setting for any chat completion variants being used. If both this field and `max_completion_tokens` are set, the smaller value is used.
messages

(required)

A list of messages to provide to the model. Each message is an object with the following fields:

- `role` (required): The role of the message sender in an OpenAI message (`assistant`, `system`, `tool`, or `user`).
- `content` (required for `user` and `system` messages and optional for `assistant` and `tool` messages): The content of the message. The content must be either a string or an array of content blocks (see below).
- `tool_calls` (optional for `assistant` messages, otherwise disallowed): A list of tool calls. Each tool call is an object with the following fields:
  - `id`: A unique identifier for the tool call
  - `type`: The type of tool being called (currently only `"function"` is supported)
  - `function`: An object containing:
    - `name`: The name of the function to call
    - `arguments`: A JSON string containing the function arguments
- `tool_call_id` (required for `tool` messages, otherwise disallowed): The ID of the tool call to associate with the message. This should be an ID that was originally returned by the gateway in a tool call `id` field.

A content block is an object with a `type` field equal to `text` or `image_url`.

If the content block has type `text`, it must have either of the following additional fields:

- `text`: The text for the content block.
- `tensorzero::arguments`: A JSON object containing the function arguments for TensorZero functions with templates and schemas (see Prompt Templates & Schemas for details).

If the content block has type `image_url`, it must have the following additional field:

- `image_url`: A JSON object with the following field:
  - `url`: The URL for a remote image (e.g. `"https://example.com/image.png"`) or base64-encoded data for an embedded image (e.g. `"data:image/png;base64,..."`).

TensorZero also supports content blocks of type `raw_text` and `unknown`. See the Inference API Reference for details on how to provide such content blocks as input.
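A sketch of a message list mixing content block types; the gateway URL, function name, template argument fields, and image URL are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical templated function
    messages=[
        {
            "role": "system",
            # Template arguments for a function with a system schema (hypothetical field).
            "content": [{"type": "text", "tensorzero::arguments": {"assistant_name": "Alice"}}],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        },
    ],
)
```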
model

(required)

The TensorZero function or model to call, using one of the following formats:

| To call… | Use this format… |
| --- | --- |
| A function defined as `[functions.my_function]` in your `tensorzero.toml` configuration file | `tensorzero::function_name::my_function` |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `tensorzero::model_name::my_model` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `tensorzero::model_name::{provider_type}::{model_name}` |
The following provider types support this shorthand: `anthropic`, `deepseek`, `fireworks`, `gcp_vertex_anthropic`, `gcp_vertex_gemini`, `google_ai_studio_gemini`, `groq`, `hyperbolic`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.

For example:

- `tensorzero::function_name::extract-data` calls the `extract-data` function defined above.
- `tensorzero::model_name::gpt-4o` calls the `gpt-4o` model in your configuration, which supports fallback from `openai` to `azure`. See Retries & Fallbacks for details.
- `tensorzero::model_name::openai::gpt-4o` calls the OpenAI API directly for the `gpt-4o` model, ignoring the `gpt-4o` model defined above.

Note that `tensorzero::model_name::gpt-4o` will use the `[models.gpt-4o]` model defined in the `tensorzero.toml` file, whereas `tensorzero::model_name::openai::gpt-4o` will call the OpenAI API directly for the `gpt-4o` model.
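For instance, the three formats map to OpenAI client calls like this (a sketch; the gateway URL is an assumption, and only the `model` strings matter here):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")
messages = [{"role": "user", "content": "Hello!"}]

# A function from tensorzero.toml:
client.chat.completions.create(model="tensorzero::function_name::extract-data", messages=messages)

# A model from tensorzero.toml (with any configured fallbacks):
client.chat.completions.create(model="tensorzero::model_name::gpt-4o", messages=messages)

# A provider model without any configuration entry:
client.chat.completions.create(model="tensorzero::model_name::openai::gpt-4o", messages=messages)
```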
parallel_tool_calls

(optional, default: `null`)

Overrides the `parallel_tool_calls` setting for the function being called.
presence_penalty

(optional, default: `null`)

Overrides the `presence_penalty` setting for any chat completion variants being used.
response_format

(optional, default: `null`)

The supported values are `"text"`, `"json_object"`, and `{"type": "json_schema", "schema": ...}`, where the `schema` field contains a valid JSON schema. This field is not actually respected except for the `"json_schema"` variant, in which the `schema` field can be used to dynamically set the output schema for a `json` function.
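A sketch of dynamically setting the output schema for a `json` function; the gateway URL, function name, and schema are assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::extract-data",  # hypothetical json function
    messages=[{"role": "user", "content": "My email is alice@example.com"}],
    # Overrides the function's output schema for this inference only.
    response_format={
        "type": "json_schema",
        "schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
            "additionalProperties": False,
        },
    },
)
```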
seed

(optional, default: `null`)

Overrides the `seed` setting for any chat completion variants being used.
stop_sequences

(optional, default: `null`)

Overrides the `stop_sequences` setting for any chat completion variants being used.
stream

(optional, default: `false`)

If set to `true`, the gateway streams the response from the model provider as OpenAI-compatible chunks.

stream_options

(optional, default: `null`)

Options for streaming. If `"include_usage"` is `true`, the gateway will include usage information in the response.

Example: `stream_options`
If `stream_options` is provided with `"include_usage": true`, the gateway includes usage information in the stream, as sketched below.
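A minimal streaming sketch with the OpenAI Python client (the gateway URL and function name are assumptions); following the OpenAI convention, usage is expected on the final chunk:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # usage is expected on the final chunk
        print(f"\n{chunk.usage.total_tokens} tokens")
```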
temperature

(optional, default: `null`)

Overrides the `temperature` setting for any chat completion variants being used.
tools

(optional, default: `null`)

A list of `tool` objects (see below). Each `tool` object has the following structure:

- `type`: Must be `"function"`
- `function`: An object containing:
  - `name`: The name of the function (string, required)
  - `description`: A description of what the function does (string, optional)
  - `parameters`: A JSON Schema object describing the function's parameters (required)
  - `strict`: Whether to enforce strict schema validation (boolean, defaults to `false`)

tool_choice

(optional, default: `"none"` if no tools are present, `"auto"` if tools are present)

The supported values are:

- `"none"`: The model will not call any tool and instead generates a message
- `"auto"`: The model can pick between generating a message or calling one or more tools
- `"required"`: The model must call one or more tools
- `{"type": "function", "function": {"name": "my_function"}}`: Forces the model to call the specified tool
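A dynamic tool use sketch with the OpenAI Python client (the gateway URL, function name, and tool definition are assumptions):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",  # hypothetical function
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_temperature",  # hypothetical tool
                "description": "Get the current temperature for a location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                    "additionalProperties": False,
                },
            },
        }
    ],
    tool_choice="auto",
)

# Tool calls (if any) come back in the same format as in the request.
for tool_call in response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, tool_call.function.arguments)
```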
top_p

(optional, default: `null`)

Overrides the `top_p` setting for any chat completion variants being used.
tensorzero::variant_name

If set, pins the inference request to a particular variant (not recommended). You should generally not set this field, and instead let the TensorZero gateway assign a variant. This field is primarily used for testing or debugging purposes. This field should be provided as an extra body parameter in the request body.
Response

The response is an OpenAI-compatible chat completion object with the following fields:

choices

A list of `choice` objects, where each choice contains:

- `index`: A zero-based index indicating the choice's position in the list (integer)
- `finish_reason`: Always `"stop"`.
- `message`: An object containing:
  - `content`: The message content (string, optional)
  - `tool_calls`: List of tool calls made by the model (optional). The format is the same as in the request.
  - `role`: The role of the message sender (always `"assistant"`).

created

The timestamp when the inference was created (integer).

episode_id

The ID of the episode associated with the inference (UUID).

id

The ID of the inference (UUID).

model

The name of the variant used for the inference (string).

object

The type of the response object (always `"chat.completion"`).

system_fingerprint

Included for compatibility with the OpenAI API.

usage

An object containing usage information for the inference:

- `prompt_tokens`: Number of tokens in the prompt (integer)
- `completion_tokens`: Number of tokens in the completion (integer)
- `total_tokens`: Total number of tokens used (integer)

Examples

- Chat Function with Structured System Prompt
- Chat Function with Dynamic Tool Use
- JSON Function with Dynamic Output Schema