Supervised Fine-Tuning (SFT) trains a language model on curated examples of good behavior, resulting in a custom model that performs better on your specific use case. TensorZero simplifies the SFT workflow by helping you curate training data from your historical inferences and feedback, then launching fine-tuning jobs on your preferred provider. Here’s how it works:
  1. You collect examples of good LLM behavior (demonstrations or inferences with good metrics).
  2. TensorZero renders these examples using your prompt templates into a training dataset.
  3. TensorZero uploads the dataset and launches a fine-tuning job on your provider (OpenAI, GCP Vertex AI, Fireworks, or Together).
  4. The provider trains a custom model and returns a model identifier.
  5. You update your configuration to use the fine-tuned model.

When should you use supervised fine-tuning (SFT)?

Supervised fine-tuning is particularly useful when you have substantial high-quality data and want to improve model behavior beyond what prompting alone can achieve.
| Criterion | Impact | Details |
| --- | --- | --- |
| Complexity | Low | Requires data curation; few parameters |
| Data Efficiency | Moderate | Requires hundreds to thousands of high-quality examples |
| Optimization Ceiling | High | Can significantly improve model behavior beyond prompting |
| Optimization Cost | Moderate | More expensive than DICL, but relatively cost-effective |
| Inference Cost | Low | Fine-tuned models typically cost the same as the base model |
| Inference Latency | Low | No runtime overhead |

SFT tends to work best when:
  • You have hundreds to thousands of high-quality examples.
  • Inference cost and latency are important. Unlike DICL, SFT shifts the cost to a one-time optimization workflow.
    • If inference cost matters: SFT is often more economical than DICL at scale.
  • You want to improve model behavior beyond what prompting can achieve.
    • If prompts are sufficient: consider GEPA for automated prompt engineering.

Fine-tune your LLM with Supervised Fine-Tuning

You can find a complete runnable example of this guide on GitHub.

1. Configure your LLM application

Define a function with a baseline variant for your application.
tensorzero.toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
system_template.minijinja
You are an assistant that is performing a named entity recognition task.
Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
  "person": ["person1", "person2", ...],
  "organization": ["organization1", "organization2", ...],
  "location": ["location1", "location2", ...],
  "miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
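
The output_schema.json file referenced in the function config constrains the model’s JSON output. Here’s a minimal sketch matching the format shown in the template (the exact schema is an assumption, not taken from the example repository):
output_schema.json
{
  "type": "object",
  "properties": {
    "person": { "type": "array", "items": { "type": "string" } },
    "organization": { "type": "array", "items": { "type": "string" } },
    "location": { "type": "array", "items": { "type": "string" } },
    "miscellaneous": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["person", "organization", "location", "miscellaneous"],
  "additionalProperties": false
}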

2. Collect your optimization data

After deploying the TensorZero Gateway with ClickHouse, build a dataset of good examples for the extract_entities function you configured. You can create datapoints from historical inferences or external/synthetic datasets.
from tensorzero import ListDatapointsRequest

# `t0` is an initialized TensorZero client
# (e.g., TensorZeroGateway.build_http(gateway_url="http://localhost:3000"))
datapoints = t0.list_datapoints(
    dataset_name="extract_entities_dataset",
    request=ListDatapointsRequest(
        function_name="extract_entities",
    ),
)

rendered_samples = t0.experimental_render_samples(
    stored_samples=datapoints.datapoints,
    variants={"extract_entities": "baseline"},
)
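
Before moving on, check that you have enough data; as noted above, SFT tends to need hundreds to thousands of high-quality examples. A quick sanity check (the cutoff is illustrative):
# SFT generally needs hundreds of good examples; the cutoff here is illustrative
if len(rendered_samples) < 100:
    raise ValueError(f"Only {len(rendered_samples)} samples; consider collecting more")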

3. Split data for training and validation

SFT providers use a validation set to monitor training progress and prevent overfitting. Split your data into training and validation sets:
import random

random.seed(0)  # fix the shuffle order for a reproducible split
random.shuffle(rendered_samples)
split_idx = int(len(rendered_samples) * 0.8)  # 80% training, 20% validation
train_samples = rendered_samples[:split_idx]
val_samples = rendered_samples[split_idx:]

print(f"Training samples: {len(train_samples)}")
print(f"Validation samples: {len(val_samples)}")
A typical split is 80% training and 20% validation. For smaller datasets, you may want to use a larger training proportion (e.g. 90/10).
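If you’d rather pick the proportion programmatically, here’s a quick sketch (the 500-example cutoff is arbitrary):
# Use a larger training share for small datasets (cutoff is illustrative)
train_frac = 0.9 if len(rendered_samples) < 500 else 0.8
split_idx = int(len(rendered_samples) * train_frac)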

4. Configure SFT optimization

Configure SFT by specifying the base model to fine-tune and any hyperparameters.
from tensorzero import OpenAISFTConfig

optimization_config = OpenAISFTConfig(
    model="gpt-4.1-2025-04-14",
)
OpenAI uses credentials from the OPENAI_API_KEY environment variable by default.
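You can also set optional hyperparameters, which are documented in the reference below; the values here are illustrative:
optimization_config = OpenAISFTConfig(
    model="gpt-4.1-2025-04-14",
    n_epochs=3,                    # OpenAI chooses automatically if omitted
    learning_rate_multiplier=1.0,  # values between 0.5 and 2.0 are typical
    seed=42,                       # for reproducibility
    suffix="extract-entities",     # tags the fine-tuned model name
)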

5. Launch the SFT job

Launch the SFT job using the TensorZero Gateway:
job_handle = t0.experimental_launch_optimization(
    train_samples=train_samples,
    val_samples=val_samples,
    optimization_config=optimization_config,
)

print(f"Job launched! Monitor at: {job_handle.job_url}")
The job handle contains URLs for monitoring progress on the provider’s dashboard.

6. Poll for completion

SFT jobs run asynchronously on the provider’s infrastructure. Poll for completion:
import time
from tensorzero import OptimizationJobStatus

job_info = t0.experimental_poll_optimization(job_handle=job_handle)

# For long-running jobs, poll periodically:
while job_info.status == OptimizationJobStatus.Pending:
    print(f"Job status: {job_info.status}")
    time.sleep(60)  # wait 1 minute between polls
    job_info = t0.experimental_poll_optimization(job_handle=job_handle)

if job_info.status == OptimizationJobStatus.Completed:
    print("Fine-tuning complete!")
else:
    print(f"Job failed: {job_info.message}")
Fine-tuning typically takes 10-30 minutes for small datasets, but can take hours for large datasets. You can close your script and poll later using the job handle.

7. Update your configuration with the fine-tuned model

After optimization completes, extract the fine-tuned model name and update your configuration:
fine_tuned_model = job_info.output["routing"][0]
print(f"Fine-tuned model: {fine_tuned_model}")
Add the fine-tuned model and a new variant to your tensorzero.toml:
tensorzero.toml
[models.extract_entities_fine_tuned]
routing = ["openai"]

[models.extract_entities_fine_tuned.providers.openai]
type = "openai"
model_name = "ft:gpt-4.1-2025-04-14:org::xxxxx"  # from above

[functions.extract_entities.variants.fine_tuned]
type = "chat_completion"
model = "extract_entities_fine_tuned"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
For most model providers, you can also use the shorthand syntax in your variant configuration:
model = "openai::ft:gpt-4.1-2025-04-14:org::xxxxx"
This avoids needing to define a separate [models.*] section.
That’s it! Your fine-tuned model is now ready to use.
You can run experiments comparing your baseline and fine-tuned variants using adaptive A/B testing.
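
For example, assuming variants support an optional weight field (the 50/50 split below is illustrative), you could start by routing traffic evenly:
tensorzero.toml
[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
weight = 0.5  # illustrative: send half of traffic to the baseline

[functions.extract_entities.variants.fine_tuned]
type = "chat_completion"
model = "extract_entities_fine_tuned"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
weight = 0.5  # illustrative: send half of traffic to the fine-tuned model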

Provider Configuration Reference

OpenAISFTConfig

Configure OpenAI supervised fine-tuning by creating an OpenAISFTConfig object with the following parameters:
  • model (str, required): The base model to fine-tune. See OpenAI’s supported models for available options.
  • batch_size (int): Batch size for training. If not specified, OpenAI chooses automatically.
  • learning_rate_multiplier (float): Learning rate multiplier. Values between 0.5 and 2.0 are typical.
  • n_epochs (int): Number of training epochs. If not specified, OpenAI chooses automatically based on dataset size.
  • seed (int): Random seed for reproducibility.
  • suffix (str): Suffix to add to the fine-tuned model name for identification.

GCPVertexGeminiSFTConfig

Configure GCP Vertex AI Gemini supervised fine-tuning by creating a GCPVertexGeminiSFTConfig object with the following parameters:
  • model (str, required): The base model to fine-tune. See Vertex AI’s supported models for available options.
  • adapter_size (int): Adapter size for parameter-efficient tuning.
  • export_last_checkpoint_only (bool): Whether to export only the final checkpoint instead of all checkpoints.
  • learning_rate_multiplier (float): Learning rate multiplier for training.
  • n_epochs (int): Number of training epochs.
  • seed (int): Random seed for reproducibility.
  • tuned_model_display_name (str): Display name for the tuned model in the Vertex AI console.
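
A minimal construction sketch (the model name and display name are illustrative; check Vertex AI’s supported-models list):
from tensorzero import GCPVertexGeminiSFTConfig

optimization_config = GCPVertexGeminiSFTConfig(
    model="gemini-2.0-flash-lite-001",  # illustrative; see supported models
    n_epochs=2,
    tuned_model_display_name="extract-entities-sft",
)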

FireworksSFTConfig

Configure Fireworks supervised fine-tuning by creating a FireworksSFTConfig object with the following parameters:
  • model (str, required): The base model to fine-tune. See Fireworks’ supported models for available options.
  • batch_size (int): Batch size in tokens for training.
  • deploy_after_training (bool, default: false): Whether to automatically deploy the model after training completes.
  • display_name (str): Display name for the fine-tuning job.
  • early_stop (bool): Whether to enable early stopping based on validation loss.
  • epochs (int): Number of training epochs.
  • eval_auto_carveout (bool): Whether to automatically carve out a portion of training data for evaluation.
  • is_turbo (bool): Whether to enable turbo mode for faster training.
  • learning_rate (float): Learning rate for training.
  • lora_rank (int): LoRA rank for parameter-efficient fine-tuning.
  • max_context_length (int): Maximum context length for training examples.
  • mtp_enabled (bool): Whether to enable Multi-Token Prediction (MTP).
  • mtp_freeze_base_model (bool): Whether to freeze the base model when using MTP.
  • mtp_num_draft_tokens (int): Number of draft tokens for Multi-Token Prediction.
  • nodes (int): Number of nodes for distributed training.
  • output_model (str): Custom model ID for the fine-tuned model. Defaults to the job ID.
  • warm_start_from (str): PEFT addon model to start from. Mutually exclusive with model.
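
A minimal construction sketch (the model ID and hyperparameters are illustrative; check Fireworks’ supported-models list):
from tensorzero import FireworksSFTConfig

optimization_config = FireworksSFTConfig(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # illustrative
    epochs=2,
    lora_rank=8,
    deploy_after_training=True,  # serve the model once training completes
)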

TogetherSFTConfig

Configure Together supervised fine-tuning by creating a TogetherSFTConfig object with the following parameters:
  • model (str, required): The base model to fine-tune. See Together’s supported models for available options.
  • batch_size (int | str, default: "max"): Batch size for training. Can be an integer or "max" for automatic optimization.
  • from_checkpoint (str): Job ID of a previous fine-tuning job to continue from.
  • from_hf_model (str): Hugging Face model to start from instead of a Together model.
  • hf_model_revision (str): Hugging Face model revision/commit to use.
  • hf_output_repo_name (str): Hugging Face repository name for uploading the fine-tuned model.
  • learning_rate (float): Learning rate for training.
  • lr_scheduler (dict): Learning rate scheduler configuration. Supports "linear" and "cosine" types.
  • max_grad_norm (float): Maximum gradient norm for gradient clipping. Set to 0 to disable.
  • n_checkpoints (int, default: 1): Number of intermediate checkpoints to save during training.
  • n_epochs (int, default: 1): Number of training epochs.
  • n_evals (int): Number of evaluations to run on the validation set during training.
  • suffix (str): Suffix for the fine-tuned model name.
  • training_method (dict): Training method configuration. Supports SFT with options like train_on_inputs.
  • training_type (dict): Training type configuration. Supports "full" and "lora" with parameters like lora_r, lora_alpha, lora_dropout.
  • wandb_name (str): Weights & Biases run name for experiment tracking.
  • warmup_ratio (float): Warmup ratio as a percentage of total training steps.
  • weight_decay (float): Weight decay regularization parameter.
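
A minimal construction sketch (the model name is illustrative, and the training_type shape is an assumption based on the parameters listed above):
from tensorzero import TogetherSFTConfig

optimization_config = TogetherSFTConfig(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # illustrative
    n_epochs=2,
    suffix="extract-entities",
    # assumed structure for LoRA fine-tuning, per the parameter reference above
    training_type={"type": "lora", "lora_r": 8, "lora_alpha": 16},
)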