What this is

Supervised Fine-Tuning (SFT) trains the model to produce desired outputs by minimizing cross-entropy loss on (prompt, response) pairs. The cookbook’s sft_loop recipe handles data loading, tokenization, batching, gradient accumulation, and checkpointing automatically.
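The objective can be illustrated with a minimal sketch (this is not the recipe's implementation, just the idea): per-token negative log-likelihood, averaged over the tokens that carry weight 1.0.

```python
def masked_cross_entropy(token_logprobs, weights):
    """Illustrative only: weighted negative log-likelihood.

    token_logprobs: log-probability the model assigned to each target token.
    weights: 1.0 for response tokens (contribute to the loss), 0.0 for prompt tokens.
    """
    total = sum(-lp * w for lp, w in zip(token_logprobs, weights))
    denom = sum(weights)
    return total / denom if denom else 0.0

# Prompt tokens (weight 0.0) are ignored; only response tokens drive the loss.
logprobs = [-0.1, -0.2, -0.5, -0.7]   # per-token log-probs from the model
weights  = [ 0.0,  0.0,  1.0,  1.0]   # first two tokens belong to the prompt
loss = masked_cross_entropy(logprobs, weights)  # (0.5 + 0.7) / 2 = 0.6
```

This mirrors the weighting scheme described below: prompt tokens get weight 0.0 and response tokens get weight 1.0, so the model is only penalized for the response it is supposed to learn.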

Using the recipe

from training.recipes.sft_loop import Config, main
from training.utils import InfraConfig

cfg = Config(
    log_path="./sft_logs",
    base_model="accounts/fireworks/models/qwen3-8b",
    dataset="/path/to/training_data.jsonl",
    tokenizer_model="Qwen/Qwen3-8B",
    max_seq_len=4096,
    epochs=1,
    batch_size=4,
    learning_rate=1e-5,
    grad_accum=4,
    infra=InfraConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
)

main(cfg)

Dataset format

SFT datasets use the standard messages format (JSONL with one example per line):
{"messages": [
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."}
]}
Multi-turn conversations are supported:
{"messages": [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help?"},
  {"role": "user", "content": "What is 2+2?"},
  {"role": "assistant", "content": "2+2 = 4"}
]}
The recipe automatically tokenizes conversations using the chat template, setting token weights to 0.0 for prompt tokens and 1.0 for response tokens.
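Before launching a run, it can be worth sanity-checking each JSONL line against this format. The helper below is hypothetical (not part of the recipe), written with the standard library:

```python
import json

def validate_example(line):
    """Hypothetical pre-flight check for one JSONL line in the messages format."""
    example = json.loads(line)
    messages = example["messages"]
    assert messages, "example must contain at least one message"
    for m in messages:
        assert m["role"] in {"system", "user", "assistant"}
        assert isinstance(m["content"], str)
    # The last message should be an assistant turn, so there is a response to learn.
    assert messages[-1]["role"] == "assistant"
    return example

line = json.dumps({"messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]})
example = validate_example(line)
```

Running a check like this over the whole file catches malformed roles or missing assistant turns before the trainer does.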

Checkpointing and resume

The current sft_loop recipe manages the trainer-side loop only. It does not create a deployment or run weight sync during training, but it does expose DCP checkpointing and resume controls:
from training.utils import InfraConfig, WandBConfig

cfg = Config(
    log_path="./sft_logs",
    base_model="accounts/fireworks/models/qwen3-8b",
    dataset="/path/to/training_data.jsonl",
    tokenizer_model="Qwen/Qwen3-8B",
    max_seq_len=4096,
    epochs=1,
    batch_size=4,
    infra=InfraConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
    dcp_save_interval=50,
    init_from_checkpoint="previous-job-id:step-100",  # optional
    wandb=WandBConfig(entity="my-team", project="sft-experiment"),
)

main(cfg)

Operational guidance

  • Set infra.training_shape_id — cookbook trainer launches use training shapes.
  • Only one trainer job is needed — SFT does not require a reference trainer.
  • The current recipe does not provision a deployment — use the SDK directly if you want deployment-side evaluation or hotloading during SFT.
  • Use batch_size and grad_accum together to control effective batch size: effective = batch_size * grad_accum.
  • Gradient accumulation normalization defaults to None — the SFT loss is already normalized client-side, so adding server-side normalization would double-normalize gradients.
  • Resume: The recipe uses checkpoint_utils.resolve_resume() to automatically restore from the last saved state on restart.
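The effective-batch arithmetic above, and why normalization must happen exactly once, can be checked with a toy sketch (illustrative only, using scalar "gradients" rather than the real trainer):

```python
# How batch_size and grad_accum combine, and why dividing accumulated
# per-microbatch means by grad_accum recovers the full-batch mean gradient.
batch_size, grad_accum = 4, 4
effective_batch = batch_size * grad_accum   # 16 examples per optimizer step

# Toy per-example gradients for one effective batch.
grads = [float(i) for i in range(effective_batch)]

# Full-batch mean gradient: what one optimizer step should see.
full_mean = sum(grads) / effective_batch

# Accumulated: mean within each microbatch, summed across microbatches,
# then normalized once by grad_accum.
acc = 0.0
for step in range(grad_accum):
    micro = grads[step * batch_size:(step + 1) * batch_size]
    acc += sum(micro) / batch_size
acc /= grad_accum

assert abs(acc - full_mean) < 1e-12
```

Normalizing a second time (e.g. server-side on top of this) would divide by grad_accum again, shrinking gradients by that factor — which is the double-normalization the default of None avoids.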