
Fireworks helps you fine-tune models to improve quality and performance for your product use cases, without the burden of building & maintaining your own training infrastructure.
Coming from OpenAI? Fireworks uses the same OpenAI-compatible chat completion format for training data — the same messages array with role, content, tool_calls, and weight fields. You can use your existing SFT datasets with no conversion required. See our OpenAI compatibility guide for more details.
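For illustration, a single chat-format training record could look like the sketch below. The example content is made up, and the assumption that a weight of 0 excludes an assistant message from the loss follows the OpenAI fine-tuning convention; check the dataset documentation for the exact semantics.

```python
import json

# One illustrative SFT record in the OpenAI-compatible chat format (contents are made up).
record = {
    "messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {
            "role": "assistant",
            "content": "Open Settings > Security and choose Reset password.",
            # Following the OpenAI fine-tuning convention: weight 0 would exclude
            # this assistant message from the training loss.
            "weight": 1,
        },
    ]
}

# Datasets are typically uploaded as JSONL: one JSON record per line.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```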

Two ways to fine-tune

Fireworks offers two fundamentally different approaches to fine-tuning. Choose the one that fits your needs:

Managed Fine-Tuning

Give Fireworks your data and configuration. The platform handles scheduling, training, checkpointing, and model output. No custom code required. Best for teams that want managed SFT, DPO, or RFT with LoRA or full-parameter tuning.

Training API (Tinker compatible)

Write custom Python training loops. You control the loss function, optimizer step, checkpointing, and weight sync. Fireworks handles the distributed GPU infrastructure. Best for research teams needing custom loops, custom rollout orchestration, or inference-in-the-loop evaluation.
| | Managed Fine-Tuning | Training API |
|---|---|---|
| Control | Configuration-driven | Full Python loop control |
| Objectives | Built-in SFT, DPO, RFT | Recipe objectives or any custom loss function |
| Tuning method | Full-parameter or LoRA | Full-parameter or LoRA |
| Inference during training | Not available | Hotload + sample mid-training |
| Interface | UI, firectl, REST API | Python API |
| Best for | Production fine-tuning with standard methods | Research, custom RL, hybrid losses |

When to use SFT vs. RFT

In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that can be used to score the model’s outputs. The model is iteratively trained to produce outputs that maximize this score.

Supervised fine-tuning (SFT) works well for many common scenarios, especially when:
  • You have a sizable dataset (~1000+ examples) with high-quality, ground-truth labels.
  • The dataset covers most possible input scenarios.
  • Tasks are relatively straightforward, such as:
    • Classification
    • Content extraction
However, SFT may struggle in situations where:
  • Your dataset is small.
  • You lack ground-truth outputs (a.k.a. “golden generations”).
  • The task requires multi-step reasoning.
Here is a simple decision tree:
Verifiable refers to whether it is relatively easy to judge the quality of the model’s generation.
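To make the RFT side concrete, a grader is just a function that scores a model generation. The sketch below is a minimal, hypothetical example for a verifiable task (exact numeric answers); the actual Fireworks evaluator interface and signature may differ, so treat the function shape as an assumption.

```python
import re

# Hypothetical grader for a task with a verifiable answer (e.g., math word problems).
# The signature is illustrative; the real RFT evaluator interface may differ.
def grade(model_output: str, expected_answer: str) -> float:
    """Return 1.0 if the last number in the model's output matches the expected answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == expected_answer else 0.0

# Example: grade("The total is 42.", "42") -> 1.0
```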

When to use the Training API instead

Move from managed fine-tuning to the Training API when you need:
  • Custom training logic — hybrid objectives, custom reward shaping, or a non-standard algorithm beyond managed settings
  • Inference-in-the-loop evaluation — hotload checkpoints onto a serving deployment and sample mid-training (see the sketch after this list)
  • Per-step control — custom gradient accumulation, dynamic learning rate schedules, or algorithm research
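As a rough sketch of inference-in-the-loop evaluation: publish the current weights for sampling, then query the serving deployment between training steps through the OpenAI-compatible inference endpoint. save_weights_for_sampler_ext is named on this page, but the commented call, the checkpoint name, and the deployed model identifier below are illustrative assumptions.

```python
from openai import OpenAI

# Hypothetical: the training client publishes current weights so a serving deployment
# can hotload them (exact save_weights_for_sampler_ext signature may differ):
# training_client.save_weights_for_sampler_ext(name=f"eval-step-{step}")

# Sample from the hotloaded checkpoint via the OpenAI-compatible inference endpoint.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)
response = client.chat.completions.create(
    model="<your-hotloaded-checkpoint-id>",  # placeholder identifier
    messages=[{"role": "user", "content": "Summarize this ticket: ..."}],
)
print(response.choices[0].message.content)
```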

Detailed capability comparison

| Capability | Managed RFT | Training API |
|---|---|---|
| Launch training | CLI or UI | Python script |
| Loss functions | grpo, dapo, gspo-token (built-in) | Any custom loss via forward_backward_custom |
| Tuning modes | Full-parameter or LoRA | Full-parameter or LoRA |
| Context length | Full context length supported by the selected training shape | Full context length supported by the selected training shape |
| Training loop | Fully managed | You write the loop |
| Per-step diagnostics | Dashboard (reward, loss, rollouts) | Full Python access to all metrics |
| Zero-variance filtering | Automatic | You implement |
| Checkpoint management | Automatic | You control via save_weights_for_sampler_ext |
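To make the "any custom loss via forward_backward_custom" row concrete, here is a minimal sketch of a clipped policy-gradient objective. It assumes the custom loss receives per-token logprobs from the training forward pass plus tensors you attach to the training datum; the field names and the callback signature are illustrative assumptions rather than the documented interface.

```python
import torch

# Illustrative custom loss one might hand to forward_backward_custom.
# Assumed inputs: per-token logprobs from the training forward pass, plus tensors
# attached to the datum (field names here are hypothetical).
def clipped_pg_loss(logprobs: torch.Tensor, datum: dict) -> torch.Tensor:
    old_logprobs = datum["sampler_logprobs"]  # logprobs recorded when the rollout was sampled
    advantages = datum["advantages"]          # per-token advantage estimates
    ratio = torch.exp(logprobs - old_logprobs)
    clipped = torch.clamp(ratio, 1.0 - 0.2, 1.0 + 0.2)
    # Negative sign because the training loop minimizes the returned loss.
    return -torch.minimum(ratio * advantages, clipped * advantages).mean()
```

A hybrid objective would simply add further terms, for example a weighted token-level cross-entropy, inside the same function.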

Migrating from managed flow to Training API

If you’ve been using managed RFT and want more control — custom loss functions, richer diagnostics, or algorithm experimentation — the Training API lets you implement your own training loop while keeping the same GPU infrastructure. Managed jobs and cookbook recipes now use the same core tuning capabilities, including LoRA or full-parameter tuning and the full context length supported by the selected training shape.

MoE models and Routing Replay

For Mixture-of-Experts (MoE) models like Kimi K2 (384 experts), training stability benefits from Routing Replay — caching the expert routing assignments from the reference policy’s forward pass and replaying them during the training forward pass. This ensures that the same experts process the same tokens in both the reference and policy models, reducing gradient noise from routing changes. Routing Replay is available in the Training API via the loss_fn_inputs mechanism — you can pass routing matrices from the reference forward pass into the training datum. Use the Training API when you need to inspect or customize those forward-pass inputs directly.
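As a rough illustration of that mechanism: capture the per-layer routing assignments during the reference forward pass and attach them to the training datum under loss_fn_inputs so the training forward pass can replay them. Only loss_fn_inputs is named on this page; the datum structure and field names below are assumptions.

```python
import torch

# Hypothetical sketch: package one example with routing matrices captured from the
# reference policy's forward pass (one token-to-expert assignment matrix per MoE layer).
def build_datum_with_routing_replay(
    input_tokens: torch.Tensor,
    routing_matrices: list[torch.Tensor],
) -> dict:
    return {
        "input_tokens": input_tokens,
        "loss_fn_inputs": {
            # Replayed during the training forward pass so the same experts see the
            # same tokens as in the reference pass, reducing routing-induced gradient noise.
            "routing_replay": routing_matrices,
        },
    }
```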