Fine-tuning methods
- Reinforcement Fine Tuning: Train models using custom reward functions for complex reasoning tasks
- Supervised Fine Tuning - Text: Train text models with labeled examples of desired outputs
- Supervised Fine Tuning - Vision: Train vision-language models with image and text pairs
- Direct Preference Optimization: Align models with human preferences using pairwise comparisons
Supported models
Fireworks supports fine-tuning for most major open source models, including the DeepSeek, Qwen, Kimi, and Llama model families, and supports fine-tuning large state-of-the-art models like Kimi K2 0905 and DeepSeek V3.1. To see all models that support fine-tuning, visit the Model Library for text models or vision models.
Fireworks uses LoRA
Fireworks uses Low-Rank Adaptation (LoRA) to fine-tune models efficiently. The fine-tuning process generates a LoRA addon: a small adapter that modifies the base model’s behavior without retraining all its weights (a conceptual sketch follows the list below). This approach is:
- Faster and cheaper - Train models in hours, not days
- Easy to deploy - Deploy LoRA addons instantly on Fireworks
- Flexible - Run multiple LoRAs on a single base model deployment
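As a rough illustration of what a LoRA addon is (not Fireworks’ training implementation), the sketch below shows the core idea in PyTorch: the base weights stay frozen while two small low-rank matrices are trained and added to the layer’s output. The class name, rank, and alpha values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: a frozen linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the base model's weights are never touched
        # The "addon": two small matrices whose product approximates a full weight update.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen base projection + scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Only lora_a and lora_b are trained, so the addon is tiny compared to the base
# model, and several addons can share one deployed copy of the base weights.
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
```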
When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)
In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs (a sample dataset is sketched after the list below). In reinforcement fine-tuning, you provide a grader function that scores the model’s outputs, and the model is iteratively trained to produce outputs that maximize this score.
Supervised fine-tuning (SFT) works well for many common scenarios, especially when:
- You have a sizable dataset (~1,000+ examples) with high-quality, ground-truth labels.
- The dataset covers most possible input scenarios.
- Tasks are relatively straightforward, such as:
  - Classification
  - Content extraction
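As referenced above, here is a minimal sketch of what an SFT dataset can look like for a simple classification task. The chat-style `messages` schema is the common convention for conversational fine-tuning data; the file name and exact fields shown are assumptions, so check the Fireworks dataset documentation for the authoritative format.

```python
import json

# Hypothetical ticket-classification task: each example pairs an input with the
# desired ("good") output. One JSON object per line (JSONL).
examples = [
    {
        "messages": [
            {"role": "system", "content": "Classify the support ticket."},
            {"role": "user", "content": "I was charged twice this month."},
            {"role": "assistant", "content": "billing"},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "Classify the support ticket."},
            {"role": "user", "content": "The app crashes when I upload a photo."},
            {"role": "assistant", "content": "bug"},
        ]
    },
]

with open("sft_dataset.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```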
Reinforcement fine-tuning (RFT) is a better fit when:
- Your dataset is small.
- You lack ground-truth outputs (a.k.a. “golden generations”).
- The task requires multi-step reasoning.
- Outputs are verifiable.
Verifiable means it is relatively easy to judge the quality of a model generation.
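For contrast, a grader for RFT is simply a function that scores a generation; training then pushes the model toward higher scores. The sketch below is a hypothetical grader for a verifiable math task; the function name, signature, and scoring weights are illustrative assumptions, not the Fireworks evaluator interface.

```python
import re

def grade(prompt: str, generation: str, expected_answer: str) -> float:
    """Hypothetical grader: returns a score in [0, 1] that RFT would maximize."""
    # Reward a well-formed final answer line, e.g. "Answer: 42".
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", generation)
    if match is None:
        return 0.0  # no verifiable final answer
    formatting_score = 0.2
    correctness_score = 0.8 if match.group(1) == expected_answer else 0.0
    return formatting_score + correctness_score

print(grade("What is 6 * 7?", "6 * 7 = 42. Answer: 42", "42"))  # 1.0
```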