- Generate embeddings with a unified API. TensorZero unifies many LLM APIs (e.g. OpenAI) and inference servers (e.g. Ollama).
- Use any programming language. You can use any OpenAI SDK (Python, Node, Go, etc.) or the OpenAI-compatible HTTP API.
Generate embeddings from OpenAI
The TensorZero Python SDK integrates with the OpenAI Python SDK to provide a unified API for calling any LLM.
1. Set up the credentials for your LLM provider

For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.
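For instance, in your shell (replace the value with your own key):

```bash
export OPENAI_API_KEY="..."  # your OpenAI API key
```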
2. Install the OpenAI and TensorZero Python SDKs

You can install the OpenAI and TensorZero SDKs with a Python package manager like pip.
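For example, with pip:

```bash
pip install openai tensorzero
```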
3. Initialize the OpenAI client

Let’s initialize the TensorZero Gateway and patch the OpenAI client to use it. For simplicity, we’ll use an embedded gateway without observability or custom configuration.
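A minimal sketch, assuming the TensorZero Python SDK’s patch_openai_client helper (exact parameters may vary between versions):

```python
from openai import OpenAI
from tensorzero import patch_openai_client

client = OpenAI()

# Route requests from this client through an embedded TensorZero Gateway.
# Omitting clickhouse_url and config_file leaves observability and custom
# configuration disabled.
patch_openai_client(
    client,
    async_setup=False,
)
```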
4. Call the LLM
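For example, using the patched client (the model name shorthand for OpenAI’s embedding model below is an assumption; adjust it to the embedding model you want to call):

```python
# Generate an embedding through the TensorZero Gateway.
# The "tensorzero::embedding_model_name::openai::text-embedding-3-small"
# shorthand is an assumption; adjust it to your setup.
response = client.embeddings.create(
    model="tensorzero::embedding_model_name::openai::text-embedding-3-small",
    input="Hello, world!",
)

print(response.data[0].embedding[:8])  # first few dimensions of the embedding vector
```

The response follows the standard OpenAI embeddings response format.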
Define a custom embedding model
You can define a custom embedding model in your TensorZero configuration file. For example, let’s define a custom embedding model for nomic-embed-text served locally by Ollama.
1. Deploy the Ollama embedding model

Download the embedding model and launch the Ollama server. We assume that Ollama is available on http://localhost:11434.
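For example, using the Ollama CLI:

```bash
ollama pull nomic-embed-text   # download the embedding model
ollama serve                   # launch the Ollama server (listens on http://localhost:11434 by default)
```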
2. Define your custom embedding model

Add your custom model and model provider to your configuration file (tensorzero.toml).
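A sketch of the configuration, assuming TensorZero exposes an embedding_models section and that Ollama is reached through its OpenAI-compatible endpoint (section and field names are assumptions; check them against your TensorZero version):

```toml
# tensorzero.toml
# Section and field names are assumptions based on TensorZero's
# model/provider configuration pattern; adjust to your version.
[embedding_models.nomic-embed-text]
routing = ["ollama"]

[embedding_models.nomic-embed-text.providers.ollama]
type = "openai"                          # Ollama exposes an OpenAI-compatible API
model_name = "nomic-embed-text"
api_base = "http://localhost:11434/v1"
api_key_location = "none"                # no API key needed for a local Ollama server
```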
3. Call your custom embedding model
Use your custom model by referencing it with tensorzero::embedding_model_name::nomic-embed-text. For example, using the OpenAI Python SDK:
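The sketch below assumes the OpenAI client is patched as in the previous section, this time passing a config_file that points at the tensorzero.toml above (the path is an example; adjust it to your project layout):

```python
from openai import OpenAI
from tensorzero import patch_openai_client

client = OpenAI()

# Load the custom embedding model definition from tensorzero.toml.
patch_openai_client(
    client,
    config_file="config/tensorzero.toml",  # example path; adjust to your project
    async_setup=False,
)

# Call the custom embedding model served by Ollama.
response = client.embeddings.create(
    model="tensorzero::embedding_model_name::nomic-embed-text",
    input="Hello, world!",
)

print(len(response.data[0].embedding))  # dimensionality of the embedding
```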