What this is

Supervised Fine-Tuning (SFT) trains the model to produce desired outputs by minimizing cross-entropy loss on (prompt, response) pairs. The cookbook’s sft_loop recipe handles data loading, tokenization, batching, gradient accumulation, and checkpointing automatically.
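The objective can be illustrated with a minimal sketch (this is not the recipe's implementation, just the idea): per-token negative log-likelihood, averaged over the tokens that carry weight 1.0.

```python
def masked_cross_entropy(token_logprobs, weights):
    """Illustrative only: weighted negative log-likelihood.

    token_logprobs: log-probability the model assigned to each target token.
    weights: 1.0 for response tokens (contribute to the loss), 0.0 for prompt tokens.
    """
    total = sum(-lp * w for lp, w in zip(token_logprobs, weights))
    denom = sum(weights)
    return total / denom if denom else 0.0

# Prompt tokens (weight 0.0) are ignored; only response tokens drive the loss.
logprobs = [-0.1, -0.2, -0.5, -0.7]   # per-token log-probs from the model
weights  = [ 0.0,  0.0,  1.0,  1.0]   # first two tokens belong to the prompt
loss = masked_cross_entropy(logprobs, weights)  # (0.5 + 0.7) / 2 = 0.6
```

This mirrors the weighting scheme described below: prompt tokens get weight 0.0 and response tokens get weight 1.0, so the model is only penalized for the response it is supposed to learn.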

Using the recipe

from training.recipes.sft_loop import Config, main
from training.utils import InfraConfig

cfg = Config(
    log_path="./sft_logs",
    base_model="accounts/fireworks/models/qwen3-8b",
    dataset="/path/to/training_data.jsonl",
    tokenizer_model="Qwen/Qwen3-8B",
    max_seq_len=4096,
    epochs=1,
    batch_size=4,
    learning_rate=1e-5,
    grad_accum=4,
    infra=InfraConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
)

main(cfg)

Dataset format

SFT datasets use the standard messages format (JSONL with one example per line):
{"messages": [
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."}
]}
Multi-turn conversations are supported:
{"messages": [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help?"},
  {"role": "user", "content": "What is 2+2?"},
  {"role": "assistant", "content": "2+2 = 4"}
]}
The recipe automatically tokenizes conversations using the chat template, setting token weights to 0.0 for prompt tokens and 1.0 for response tokens.
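Before launching a run, it can be worth sanity-checking each JSONL line against this format. The helper below is hypothetical (not part of the recipe), written with the standard library:

```python
import json

def validate_example(line):
    """Hypothetical pre-flight check for one JSONL line in the messages format."""
    example = json.loads(line)
    messages = example["messages"]
    assert messages, "example must contain at least one message"
    for m in messages:
        assert m["role"] in {"system", "user", "assistant"}
        assert isinstance(m["content"], str)
    # The last message should be an assistant turn, so there is a response to learn.
    assert messages[-1]["role"] == "assistant"
    return example

line = json.dumps({"messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]})
example = validate_example(line)
```

Running a check like this over the whole file catches malformed roles or missing assistant turns before the trainer does.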

Checkpointing and resume

The current sft_loop recipe manages the trainer-side loop only. It does not create a deployment or run weight sync during training, but it does expose DCP checkpointing and resume controls:
from training.utils import InfraConfig, WandBConfig

cfg = Config(
    log_path="./sft_logs",
    base_model="accounts/fireworks/models/qwen3-8b",
    dataset="/path/to/training_data.jsonl",
    tokenizer_model="Qwen/Qwen3-8B",
    max_seq_len=4096,
    epochs=1,
    batch_size=4,
    infra=InfraConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
    dcp_save_interval=50,
    init_from_checkpoint="previous-job-id:step-100",  # optional
    wandb=WandBConfig(entity="my-team", project="sft-experiment"),
)

main(cfg)

Operational guidance

  • Set infra.training_shape_id — cookbook trainer launches use training shapes.
  • Only one trainer job is needed — SFT does not require a reference trainer.
  • The current recipe does not provision a deployment — use the SDK directly if you want deployment-side evaluation or hotloading during SFT.
  • Use batch_size and grad_accum together to control effective batch size: effective = batch_size * grad_accum.
  • Gradient accumulation normalization defaults to None — the SFT loss is already normalized client-side, so adding server-side normalization would double-normalize gradients.
  • Resume: The recipe uses checkpoint_utils.resolve_resume() to automatically restore from the last saved state on restart.
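The effective-batch arithmetic above, and why normalization must happen exactly once, can be checked with a toy sketch (illustrative only, using scalar "gradients" rather than the real trainer):

```python
# How batch_size and grad_accum combine, and why dividing accumulated
# per-microbatch means by grad_accum recovers the full-batch mean gradient.
batch_size, grad_accum = 4, 4
effective_batch = batch_size * grad_accum   # 16 examples per optimizer step

# Toy per-example gradients for one effective batch.
grads = [float(i) for i in range(effective_batch)]

# Full-batch mean gradient: what one optimizer step should see.
full_mean = sum(grads) / effective_batch

# Accumulated: mean within each microbatch, summed across microbatches,
# then normalized once by grad_accum.
acc = 0.0
for step in range(grad_accum):
    micro = grads[step * batch_size:(step + 1) * batch_size]
    acc += sum(micro) / batch_size
acc /= grad_accum

assert abs(acc - full_mean) < 1e-12
```

Normalizing a second time (e.g. server-side on top of this) would divide by grad_accum again, shrinking gradients by that factor — which is the double-normalization the default of None avoids.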