Fine-tuning adapts general-purpose models to domain-specific tasks, significantly improving performance in real-world applications. In particular, fine-tuning can offer you:
For example, we have seen fine-tuning be especially helpful in these tasks:
Fireworks supports both Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). In supervised fine-tuning, you provide a dataset of labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that scores the model’s outputs, and the model is iteratively trained to produce outputs that maximize this score. To learn more about the differences between the two, see when to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT).
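To make the difference concrete, here is a minimal sketch of the two inputs you prepare. The field names and grading heuristic below are illustrative assumptions, not the exact Fireworks dataset schema or grader interface:

```python
# Illustrative only: field names and grading logic are hypothetical,
# not the exact Fireworks dataset schema or grader API.

# SFT: each training example pairs a prompt with a labeled "good" completion.
sft_example = {
    "messages": [
        {"role": "user", "content": "Summarize this support ticket: ..."},
        {"role": "assistant", "content": "Customer reports a billing error ..."},
    ]
}

# RFT: instead of labeled outputs, you supply a grader that scores any output.
def grade(prompt: str, model_output: str) -> float:
    """Return a score to maximize (higher = better output for this prompt)."""
    score = 0.0
    if "billing" in model_output.lower():                   # reward staying on topic
        score += 1.0
    score -= 0.01 * max(0, len(model_output.split()) - 80)  # penalize rambling
    return score
```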
To fine-tune a model efficiently, Fireworks uses a technique called Low-Rank Adaptation (LoRA). The fine-tuning process generates a LoRA addon that gets deployed onto a base model at inference time. The advantages of using LoRA are:
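For intuition about what a LoRA addon is, here is a minimal sketch in plain NumPy (an assumption for illustration, not the Fireworks implementation): the base weight matrix stays frozen, and only a small pair of low-rank matrices is trained and shipped as the addon.

```python
import numpy as np

# Minimal LoRA sketch: the base weight W is frozen; only the low-rank
# factors A and B are trained, and together they form the "addon".
d_out, d_in, rank = 1024, 1024, 8          # rank is much smaller than d_out, d_in

W = np.random.randn(d_out, d_in)           # frozen base-model weight
A = np.random.randn(rank, d_in) * 0.01     # trainable, small random init
B = np.zeros((d_out, rank))                # trainable, zero init => no change at start

def forward(x: np.ndarray) -> np.ndarray:
    # Base projection plus the low-rank correction learned during fine-tuning.
    return x @ W.T + x @ (B @ A).T

x = np.random.randn(2, d_in)               # a batch of 2 activations
print(forward(x).shape)                    # (2, 1024)
```

Because only A and B change, the addon is a small fraction of the base model's size and can be applied on top of the shared base weights at inference time.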
Supervised fine-tuning (SFT) works well for many common scenarios, especially when:
However, SFT may struggle in situations where:
Here is a simple decision tree:
Verifiable refers to whether it is relatively easy to judge the quality of the model’s generation.