Fine Tuning Overview

Fireworks helps you fine-tune models to improve quality and performance for your product use cases, without the burden of building & maintaining your own training infrastructure.

Fine-tuning methods

Reinforcement Fine Tuning

Train models using custom reward functions for complex reasoning tasks

Supervised Fine Tuning - Text

Train text models with labeled examples of desired outputs

Supervised Fine Tuning - Vision

Train vision-language models with image and text pairs

Direct Preference Optimization

Align models with human preferences using pairwise comparisons

Supported models

Fireworks supports fine-tuning for most major open source models, including DeepSeek, Qwen, Kimi, and Llama model families, and supports fine-tuning large state-of-the-art models like Kimi K2 0905 and DeepSeek V3.1. To see all models that support fine-tuning, visit the Model Library for text models or vision models.

Fireworks uses LoRA

Fireworks uses Low-Rank Adaptation (LoRA) to fine-tune models efficiently. The fine-tuning process generates a LoRA addon—a small adapter that modifies the base model’s behavior without retraining all its weights. This approach is:

Faster and cheaper - Train models in hours, not days
Easy to deploy - Deploy LoRA addons instantly on Fireworks
Flexible - Run multiple LoRAs on a single base model deployment

When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)

In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that can be used to score the model’s outputs. The model is iteratively trained to produce outputs that maximize this score. To learn more about the differences between SFT and RFT, see when to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT). Supervised fine-tuning (SFT) works well for many common scenarios, especially when:

You have a sizable dataset (~1000+ examples) with high-quality, ground-truth lables.
The dataset covers most possible input scenarios.
Tasks are relatively straightforward, such as:
- Classification
- Content extraction

However, SFT may struggle in situations where:

Your dataset is small.
You lack ground-truth outputs (a.k.a. “golden generations”).
The task requires multi-step reasoning.

Here is a simple decision tree:

Verifiable refers to whether it is relatively easy to make a judgement on the quality of the model generation.

Get Started

Deployments

Models & Inference

Fine Tuning

Administration

Security & Compliance

Integrations

Fine-tuning methods

Reinforcement Fine Tuning

Supervised Fine Tuning - Text

Supervised Fine Tuning - Vision

Direct Preference Optimization

Supported models

Fireworks uses LoRA

When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)

Get Started

Deployments

Models & Inference

Fine Tuning

Administration

Security & Compliance

Integrations

​Fine-tuning methods

Reinforcement Fine Tuning

Supervised Fine Tuning - Text

Supervised Fine Tuning - Vision

Direct Preference Optimization

​Supported models

​Fireworks uses LoRA

​When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)

Fine-tuning methods

Supported models

Fireworks uses LoRA

When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)