- Generate embeddings with a unified API. TensorZero unifies many LLM APIs (e.g. OpenAI) and inference servers (e.g. Ollama).
- Use any programming language. You can use any OpenAI SDK (Python, Node, Go, etc.) or the OpenAI-compatible HTTP API.
Generate embeddings from OpenAI
- Python (OpenAI SDK)
You can point the OpenAI Python SDK to a TensorZero Gateway to generate embeddings with a unified API.
Set up the credentials for your LLM provider
For example, if you’re using OpenAI, you can set the OPENAI_API_KEY environment variable with your API key.

Install the OpenAI Python SDK
You can install the OpenAI SDK with a Python package manager like pip.

Deploy the TensorZero Gateway
Let’s deploy the TensorZero Gateway using Docker.
For simplicity, we’ll use the gateway without observability or custom configuration.
Initialize the OpenAI client
Let’s initialize the OpenAI SDK and point it to the gateway we just launched.
Define a custom embedding model
You can define a custom embedding model in your TensorZero configuration file. For example, let’s define a custom embedding model for nomic-embed-text served locally by Ollama.
Deploy the Ollama embedding model
Download the embedding model and launch the Ollama server. We assume that Ollama is running on your host machine at http://localhost:11434.

Define your custom embedding model
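The Ollama deployment step above can be sketched as follows (skip `ollama serve` if Ollama already runs as a background service on your machine):

```shell
# Start the Ollama server; it listens on http://localhost:11434 by default.
ollama serve &

# Download the nomic-embed-text embedding model.
ollama pull nomic-embed-text
```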
Add your custom model and model provider to your configuration file:
tensorzero.toml
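A sketch of what this configuration might look like, assuming an `[embedding_models]` section and an OpenAI-compatible provider type that points at Ollama's endpoint; the exact schema depends on your TensorZero version, so consult the configuration reference:

```toml
# Hypothetical sketch: a custom embedding model backed by Ollama's
# OpenAI-compatible endpoint. Section and field names may differ by version.
[embedding_models.nomic-embed-text]
routing = ["ollama"]

[embedding_models.nomic-embed-text.providers.ollama]
type = "openai"
model_name = "nomic-embed-text"
# Use http://host.docker.internal:11434/v1 instead if the gateway runs
# inside Docker and Ollama runs on the host.
api_base = "http://localhost:11434/v1"
```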
Deploy the TensorZero Gateway with your configuration
Deploy the TensorZero Gateway with your configuration file.
Make sure that the container has access to the Ollama server running on the host.