Fireworks helps you fine-tune models to improve quality and performance for your product use cases, without the burden of building & maintaining your own training infrastructure.

Two ways to fine-tune

Fireworks offers two fundamentally different approaches to fine-tuning. Choose the one that fits your needs:

Managed Fine-Tuning

Give Fireworks your data and configuration. The platform handles scheduling, training, checkpointing, and model output. No custom code required.
Best for teams that want fast results with standard objectives (SFT, DPO, RFT).

Training SDK (Tinker compatible)

Write custom Python training loops. You control the loss function, optimizer step, checkpointing, and weight sync. Fireworks handles the distributed GPU infrastructure.
Best for research teams needing custom objectives, full-parameter tuning, or inference-in-the-loop evaluation.
| | Managed Fine-Tuning | Training SDK |
|---|---|---|
| Control | Configuration-driven | Full Python loop control |
| Objectives | Built-in SFT, DPO, RFT | Any custom loss function |
| Tuning method | LoRA | Full-parameter or LoRA |
| Inference during training | Not available | Hotload + sample mid-training |
| Interface | UI, firectl, REST API | Python SDK |
| Best for | Production fine-tuning with standard methods | Research, custom RL, hybrid losses |

When to use SFT vs. RFT

In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that scores the model’s outputs, and the model is iteratively trained to produce outputs that maximize this score.

Supervised fine-tuning (SFT) works well for many common scenarios, especially when:
  • You have a sizable dataset (~1000+ examples) with high-quality, ground-truth labels.
  • The dataset covers most possible input scenarios.
  • Tasks are relatively straightforward, such as:
    • Classification
    • Content extraction
However, SFT may struggle in situations where:
  • Your dataset is small.
  • You lack ground-truth outputs (a.k.a. “golden generations”).
  • The task requires multi-step reasoning.
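Concretely, the labeled examples SFT expects are usually chat-style records stored one JSON object per line (JSONL). The sketch below shows one such record, assuming an OpenAI-style `messages` schema; check the Fireworks dataset documentation for the exact fields your job expects.

```python
import json

# One SFT training record: a labeled example of a "good" output
# for a simple classification task. The field names assume an
# OpenAI-style chat schema; verify against the Fireworks dataset
# docs before uploading.
record = {
    "messages": [
        {"role": "system", "content": "You are a support-ticket classifier."},
        {"role": "user", "content": "My card was charged twice for one order."},
        {"role": "assistant", "content": "billing"},
    ]
}

# An SFT dataset is many such records, one JSON object per line.
line = json.dumps(record)
print(line)
```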
Here is a simple decision tree:
[Decision tree diagram]
“Verifiable” refers to whether it is relatively easy to judge the quality of a model generation.
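For a verifiable task, the RFT grader is just a function that maps a model generation to a score. The sketch below illustrates the idea with a toy task; the function name and signature are hypothetical, not the Fireworks evaluator API.

```python
def grade(prompt: str, generation: str) -> float:
    """Hypothetical grader: score a generation between 0.0 and 1.0.

    This toy task is verifiable with string checks alone; real graders
    might run unit tests against generated code, compare a numeric
    answer to a reference, or call a judge model.
    """
    score = 0.0
    if generation.strip():            # produced a non-empty answer
        score += 0.5
    if generation.strip().isupper():  # toy requirement: answer in caps
        score += 0.5
    return score

print(grade("Shout the answer.", "FORTY-TWO"))  # -> 1.0
```

During RFT, the model samples many candidate generations and is updated toward those the grader scores highly, which is why no ground-truth outputs are needed.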

When to use the Training SDK instead

Move from managed fine-tuning to the Training SDK when you need:
  • Custom loss functions — hybrid GRPO + DPO, custom reward shaping, or any non-standard objective
  • Full-parameter tuning — update all model weights instead of a LoRA adapter
  • Inference-in-the-loop evaluation — hotload checkpoints onto a serving deployment and sample mid-training
  • Per-step control — custom gradient accumulation, dynamic learning rate schedules, or algorithm research
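A custom loop with the Training SDK typically looks like ordinary Python: compute a loss you define, step the optimizer, and checkpoint on your own schedule. The sketch below shows that shape using stand-in stubs so it runs end to end; the object, method names, and hybrid-loss weights are illustrative assumptions, not the SDK's actual API.

```python
import random

# Stand-in stubs so the sketch is self-contained; in real use these
# would come from the Training SDK, and the method names here are
# assumptions, not the SDK's actual API.
class StubModel:
    def __init__(self):
        self.checkpoints = []

    def loss_grpo(self, batch):
        return random.random()  # placeholder GRPO loss term

    def loss_dpo(self, batch):
        return random.random()  # placeholder DPO loss term

    def step(self, loss):
        pass                    # backward pass + optimizer update

    def save_checkpoint(self, tag):
        self.checkpoints.append(tag)

model = StubModel()
batches = [object()] * 10

# The custom objective: a hybrid GRPO + DPO loss, weighted 70/30.
# Mixing losses like this is exactly the kind of non-standard
# objective a Python loop can express and a managed job cannot.
for step, batch in enumerate(batches):
    loss = 0.7 * model.loss_grpo(batch) + 0.3 * model.loss_dpo(batch)
    model.step(loss)
    if step % 5 == 0:           # per-step control over checkpointing
        model.save_checkpoint(f"step-{step}")

print(model.checkpoints)  # -> ['step-0', 'step-5']
```

Because the loop is plain Python, swapping the loss weights, adding gradient accumulation, or sampling from a hotloaded checkpoint mid-training are all ordinary code changes rather than new platform features.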