What this is
Supervised Fine-Tuning (SFT) trains the model to produce desired outputs by minimizing cross-entropy loss on (prompt, response) pairs. The cookbook's `sft_loop` recipe handles data loading, tokenization, batching, gradient accumulation, and checkpointing automatically.
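To make the objective concrete, here is a minimal sketch of the SFT loss: token-level cross-entropy computed only over response tokens, averaged per token. This is illustrative plain Python, not the recipe's actual implementation; the function name and normalization are assumptions based on the description above.

```python
import math

def sft_loss(logits, targets, mask):
    """Masked token-level cross-entropy (illustrative sketch).

    logits:  per-position logit vectors over the vocabulary
    targets: per-position target token ids
    mask:    1.0 for response tokens, 0.0 for prompt tokens
    """
    total, n = 0.0, 0.0
    for vec, t, m in zip(logits, targets, mask):
        if m == 0.0:
            continue  # prompt tokens contribute nothing to the loss
        # log-sum-exp with max subtracted for numerical stability
        z = max(vec)
        log_norm = z + math.log(sum(math.exp(v - z) for v in vec))
        total += -(vec[t] - log_norm)  # negative log-likelihood of target
        n += m
    return total / max(n, 1.0)  # mean over response tokens only
```

With uniform logits over a vocabulary of size V, the loss is ln(V) regardless of the target, which is a handy sanity check when wiring up a data pipeline.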
Using the recipe
Dataset format
SFT datasets use the standard messages format (JSONL with one example per line). The recipe derives a per-token loss mask from the messages: 0.0 for prompt tokens and 1.0 for response tokens, so cross-entropy is computed only on the response.
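The mask construction can be sketched as below. A toy whitespace split stands in for the real tokenizer, and the function name is illustrative, not part of the recipe's API; the point is only that assistant-role tokens get weight 1.0 and everything else gets 0.0.

```python
def build_example(messages):
    """Flatten a messages-format example into (tokens, loss_mask).

    0.0 for prompt (system/user) tokens, 1.0 for response (assistant)
    tokens. Illustrative sketch; real code would use the model tokenizer.
    """
    tokens, mask = [], []
    for msg in messages:
        toks = msg["content"].split()  # stand-in for tokenizer.encode(...)
        weight = 1.0 if msg["role"] == "assistant" else 0.0
        tokens.extend(toks)
        mask.extend([weight] * len(toks))
    return tokens, mask

example = {"messages": [
    {"role": "user", "content": "What is 2 + 2 ?"},
    {"role": "assistant", "content": "2 + 2 = 4"},
]}
tokens, mask = build_example(example["messages"])
```

Here the six user tokens get weight 0.0 and the five assistant tokens get weight 1.0.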
Checkpointing and resume
The current `sft_loop` recipe manages the trainer-side loop only. It does not create a deployment or run weight sync during training, but it does expose DCP checkpointing and resume controls.
Operational guidance
- Set `infra.training_shape_id`; cookbook trainer launches use training shapes.
- Only one trainer job is needed; SFT does not require a reference trainer.
- The current recipe does not provision a deployment. Use the SDK directly if you want deployment-side evaluation or hotloading during SFT.
- Use `batch_size` and `grad_accum` together to control effective batch size: `effective = batch_size * grad_accum`.
- Gradient accumulation normalization defaults to `None`; the SFT loss is already normalized client-side, so adding server-side normalization would double-normalize gradients.
- Resume: the recipe uses `checkpoint_utils.resolve_resume()` to automatically restore from the last saved state on restart.
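The batch-size and normalization points above can be sketched together. This is an assumed semantics, not the recipe's code: each micro-batch loss is already a per-response-token mean (client-side normalization), so the accumulation loop divides by `grad_accum` exactly once; normalizing again downstream would shrink gradients by an extra factor of `grad_accum`.

```python
def train_step(micro_batches, loss_fn, apply_grads):
    """One optimizer step over grad_accum micro-batches (sketch).

    Each micro-batch holds batch_size examples, so
    effective batch size = batch_size * grad_accum.
    loss_fn already returns a normalized (per-token mean) loss, so we
    scale by 1/grad_accum here and nowhere else; dividing again on the
    server side would double-normalize the gradients.
    """
    grad_accum = len(micro_batches)
    accumulated = 0.0
    for mb in micro_batches:
        accumulated += loss_fn(mb) / grad_accum
    apply_grads(accumulated)  # single optimizer update per step
    return accumulated
```

Averaging once over micro-batches keeps the gradient scale independent of `grad_accum`, so changing it trades memory for throughput without silently changing the learning rate's effective magnitude.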
Related guides
- Cookbook RL (GRPO) — reinforcement learning recipes
- Cookbook DPO — preference optimization
- Cookbook Reference — all config classes and parameters
- Loss Functions — SDK-level SFT loss details