- You collect examples of good LLM behavior (demonstrations or inferences with good metrics).
- TensorZero renders these examples using your prompt templates into a training dataset.
- TensorZero uploads the dataset and launches a fine-tuning job on your provider (OpenAI, GCP Vertex AI, Fireworks, or Together).
- The provider trains a custom model and returns a model identifier.
- You update your configuration to use the fine-tuned model.
When should you use supervised fine-tuning (SFT)?
Supervised fine-tuning is particularly useful when you have substantial high-quality data and want to improve model behavior beyond what prompting alone can achieve.

| Criterion | Impact | Details |
|---|---|---|
| Complexity | Low | Requires data curation; few parameters |
| Data Efficiency | Moderate | Requires hundreds to thousands of high-quality examples |
| Optimization Ceiling | High | Can significantly improve model behavior beyond prompting |
| Optimization Cost | Moderate | More expensive than DICL, but relatively cost-effective |
| Inference Cost | Low | Fine-tuned models typically cost the same as the base model |
| Inference Latency | Low | No runtime overhead |
Fine-tune your LLM with Supervised Fine-Tuning
Configure your LLM application
Define a function with a baseline variant for your application.
tensorzero.toml
Example: Data Extraction (Named Entity Recognition) — Configuration
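A minimal sketch of what this configuration might contain (the model choice and file paths are illustrative assumptions, not the original example's exact values):

```toml
# A minimal sketch; the model and paths are illustrative assumptions.
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
system_template = "functions/extract_entities/baseline/system_template.minijinja"
```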
system_template.minijinja
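And an illustrative system template (the wording is an assumption, not the original example's template):

```jinja
{# An illustrative system template for named entity recognition #}
Extract the named entities (people, organizations, and locations) from the
user's message. Return them as JSON that matches the output schema.
```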
Collect your optimization data
- TensorZero Dataset
- Historical Inferences
After deploying the TensorZero Gateway with ClickHouse, build a dataset of good examples for the extract_entities function you configured.
You can create datapoints from historical inferences or external/synthetic datasets, as sketched below.
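For example, you might list stored inferences that received demonstration feedback and render them with your prompt templates. This sketch assumes the Python client's experimental_list_inferences and experimental_render_samples methods; names and signatures may differ across TensorZero versions:

```python
from tensorzero import TensorZeroGateway

# Embedded gateway; the ClickHouse URL and config path are illustrative.
client = TensorZeroGateway.build_embedded(
    clickhouse_url="http://chuser:chpassword@localhost:8123/tensorzero",
    config_file="config/tensorzero.toml",
)

# Fetch stored inferences for the function, using demonstrations as labels.
stored_inferences = client.experimental_list_inferences(
    function_name="extract_entities",
    output_source="demonstration",
)

# Render the stored inferences into training samples using the variant's templates.
rendered_samples = client.experimental_render_samples(
    stored_samples=stored_inferences,
    variants={"extract_entities": "baseline"},
)
```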
Split data for training and validation
SFT providers use a validation set to monitor training progress and prevent overfitting.
Split your data into training and validation sets:
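A simple random split might look like this (the 90/10 ratio is an arbitrary choice; rendered_samples is the list from the previous step):

```python
import random

random.seed(0)  # fix the shuffle for reproducibility
random.shuffle(rendered_samples)

split_index = int(0.9 * len(rendered_samples))  # 90% train, 10% validation
train_samples = rendered_samples[:split_index]
val_samples = rendered_samples[split_index:]
```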
Configure SFT optimization
Configure SFT by specifying the base model to fine-tune and any hyperparameters.
- OpenAI
- GCP Vertex AI Gemini
- Fireworks
- Together
OpenAI uses credentials from the OPENAI_API_KEY environment variable by default.
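For example, with OpenAI (the keyword name model mirrors the description in the reference below; treat it as an assumption and check your client version):

```python
from tensorzero import OpenAISFTConfig

# Minimal configuration: fine-tune an OpenAI model with default hyperparameters.
optimization_config = OpenAISFTConfig(model="gpt-4o-mini-2024-07-18")
```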
Launch the SFT job
Launch the SFT job using the TensorZero Gateway, as sketched below. The job handle contains URLs for monitoring progress on the provider's dashboard.
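A sketch of launching the job, continuing from the previous steps; the method name experimental_launch_optimization is an assumption based on recent client releases:

```python
# Launch the fine-tuning job on the provider's infrastructure.
job_handle = client.experimental_launch_optimization(
    train_samples=train_samples,
    val_samples=val_samples,
    optimization_config=optimization_config,
)
print(job_handle)  # includes URLs for the provider's dashboard
```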
Poll for completion
SFT jobs run asynchronously on the provider’s infrastructure.
Poll for completion:
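A polling loop might look like the following; the method name experimental_poll_optimization and the status values are assumptions, so consult your client's reference:

```python
import time

# Poll until the provider reports a terminal state.
while True:
    job_info = client.experimental_poll_optimization(job_handle=job_handle)
    print(job_info.status)
    if job_info.status in ("completed", "failed"):
        break
    time.sleep(30)  # provider jobs can take minutes to hours
```

Once the job completes, the provider returns a model identifier; add it to your configuration as a new variant (the variant name and model identifier below are placeholders):

```toml
[functions.extract_entities.variants.fine_tuned]
type = "chat_completion"
model = "openai::ft:gpt-4o-mini-2024-07-18:my-org::abc123"  # placeholder identifier
system_template = "functions/extract_entities/baseline/system_template.minijinja"
```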
Provider Configuration Reference
OpenAISFTConfig
Configure OpenAI supervised fine-tuning by creating an OpenAISFTConfig object with the following parameters:
- The base model to fine-tune. See OpenAI’s supported models for available options.
- Batch size for training. If not specified, OpenAI chooses automatically.
- Learning rate multiplier. Values between 0.5 and 2.0 are typical.
- Number of training epochs. If not specified, OpenAI chooses automatically based on dataset size.
- Random seed for reproducibility.
- Suffix to add to the fine-tuned model name for identification.
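A hypothetical construction covering these parameters (the keyword names are assumptions that mirror OpenAI's fine-tuning hyperparameters; verify them against your client version):

```python
from tensorzero import OpenAISFTConfig

# Keyword names are assumptions mirroring OpenAI's fine-tuning hyperparameters.
optimization_config = OpenAISFTConfig(
    model="gpt-4o-mini-2024-07-18",  # base model to fine-tune
    batch_size=8,                    # omit to let OpenAI choose automatically
    learning_rate_multiplier=1.0,    # values between 0.5 and 2.0 are typical
    n_epochs=3,                      # omit to let OpenAI choose from dataset size
    seed=42,                         # for reproducibility
    suffix="extract-entities",       # appended to the fine-tuned model name
)
```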
GCPVertexGeminiSFTConfig
Configure GCP Vertex AI Gemini supervised fine-tuning by creating a GCPVertexGeminiSFTConfig object with the following parameters:
- The base model to fine-tune. See Vertex AI’s supported models for available options.
- Adapter size for parameter-efficient tuning.
- Whether to export only the final checkpoint instead of all checkpoints.
- Learning rate multiplier for training.
- Number of training epochs.
- Random seed for reproducibility.
- Display name for the tuned model in the Vertex AI console.
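A hypothetical construction (keyword names are assumptions modeled on Vertex AI's tuning parameters; additional provider settings such as your GCP project and region may also be required):

```python
from tensorzero import GCPVertexGeminiSFTConfig

# Keyword names are assumptions modeled on Vertex AI's tuning parameters.
optimization_config = GCPVertexGeminiSFTConfig(
    model="gemini-2.0-flash-lite-001",  # base model to fine-tune
    adapter_size=4,                     # parameter-efficient tuning adapter size
    learning_rate_multiplier=1.0,
    epoch_count=3,
    seed=42,
    tuned_model_display_name="extract-entities-sft",
)
```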
FireworksSFTConfig
Configure Fireworks supervised fine-tuning by creating a FireworksSFTConfig object with the following parameters:
- The base model to fine-tune. See Fireworks’ supported models for available options.
- Batch size in tokens for training.
- Whether to automatically deploy the model after training completes.
- Display name for the fine-tuning job.
- Whether to enable early stopping based on validation loss.
- Number of training epochs.
- Whether to automatically carve out a portion of training data for evaluation.
- Whether to enable turbo mode for faster training.
- Learning rate for training.
- LoRA rank for parameter-efficient fine-tuning.
- Maximum context length for training examples.
- Whether to enable Multi-Token Prediction (MTP).
- Whether to freeze the base model when using MTP.
- Number of draft tokens for Multi-Token Prediction.
- Number of nodes for distributed training.
- Custom model ID for the fine-tuned model. Defaults to the job ID.
- PEFT addon model to start from. Mutually exclusive with model.
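A hypothetical construction using a few of these parameters (keyword names are assumptions mapped from the descriptions above; account settings such as your Fireworks account ID may also be required):

```python
from tensorzero import FireworksSFTConfig

# Keyword names are assumptions mapped from the descriptions above.
optimization_config = FireworksSFTConfig(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # base model
    epochs=1,
    learning_rate=1e-5,
    lora_rank=8,          # LoRA rank for parameter-efficient fine-tuning
    early_stop=True,      # stop early based on validation loss
    display_name="extract-entities-sft",
)
```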
TogetherSFTConfig
Configure Together supervised fine-tuning by creating a TogetherSFTConfig object with the following parameters:
- The base model to fine-tune. See Together’s supported models for available options.
- Batch size for training. Can be an integer or "max" for automatic optimization.
- Job ID of a previous fine-tuning job to continue from.
- Hugging Face model to start from instead of a Together model.
- Hugging Face model revision/commit to use.
- Hugging Face repository name for uploading the fine-tuned model.
- Learning rate for training.
- Learning rate scheduler configuration. Supports "linear" and "cosine" types.
- Maximum gradient norm for gradient clipping. Set to 0 to disable.
- Number of intermediate checkpoints to save during training.
- Number of training epochs.
- Number of evaluations to run on the validation set during training.
- Suffix for the fine-tuned model name.
- Training method configuration. Supports SFT with options like train_on_inputs.
- Training type configuration. Supports "full" and "lora" with parameters like lora_r, lora_alpha, lora_dropout.
- Weights & Biases run name for experiment tracking.
- Warmup ratio as a percentage of total training steps.
- Weight decay regularization parameter.
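A hypothetical construction (keyword names are assumptions modeled on Together's fine-tuning API; verify them against your client version):

```python
from tensorzero import TogetherSFTConfig

# Keyword names are assumptions modeled on Together's fine-tuning API.
optimization_config = TogetherSFTConfig(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # base model
    n_epochs=3,
    batch_size="max",     # or an integer
    learning_rate=1e-5,
    suffix="extract-entities",
    warmup_ratio=0.05,    # portion of total training steps used for warmup
)
```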