Use this page for training checkpoint and resume knobs and GRPO metric interpretation that are easy to miss when running reinforcement fine-tuning (RFT) and cookbook-driven training. For sampling and optimization hyperparameters (learning rate, epochs, temperature, KL targets, etc.), see Parameter tuning. The canonical cookbook reference for save, resume, and promote is Checkpoints and Resume. Low-level SDK APIs are documented in Saving and loading.
dcp_save_interval
Controls how often full training state (weights and optimizer) is checkpointed using DCP (Distributed Checkpoint) format.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Typical config — SFT / RL / DPO cookbooks | WeightSyncConfig(dcp_save_interval=N) on the recipe Config (see Cookbook: RL and Checkpoints) |
When left at 0 (the default), no periodic DCP checkpoints are written for resume. Only sampler and HuggingFace-format weight snapshots may be produced; these preserve model weights but not optimizer state.
When set to a positive integer N, a full DCP checkpoint is written every N steps.
Why this matters: If a training job is interrupted, optimizer state is lost unless dcp_save_interval is set. The model resumes from the last checkpoint, but the optimizer re-initializes from scratch — which can affect training stability and effective learning rate.
Example (cookbook Config)
Set dcp_save_interval on the WeightSyncConfig attached to your recipe's Config; see your recipe's Config dataclass for the exact attribute path.
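A minimal sketch, assuming a recipe whose Config carries a WeightSyncConfig field as in the table above; the field name (weight_sync), the import path, and the value 50 are illustrative:

```python
# Illustrative only: the Config / WeightSyncConfig import path and the name of
# the field holding WeightSyncConfig depend on your recipe.
config = Config(
    weight_sync=WeightSyncConfig(
        dcp_save_interval=50,  # full DCP checkpoint (weights + optimizer) every 50 steps
    ),
)
```

Leaving dcp_save_interval at 0 keeps the default behavior described above: no resumable DCP checkpoints are written.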
Job recovery and preemption
For transient control-plane or worker interruptions, the trainer job manager exposes reconnect_and_wait so your driver can wait for a resumable state and resume cleanly.
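A rough sketch of the driver-side pattern; only reconnect_and_wait comes from the trainer job manager described above, while the job_manager object, the exception type, and the resume helper are placeholders for whatever your driver actually uses:

```python
# Placeholder driver loop: run_training, WorkerInterrupted, and resume_from are
# illustrative names, not part of the trainer API.
try:
    run_training(job_manager)
except WorkerInterrupted:              # illustrative: a transient worker/control-plane interruption
    job_manager.reconnect_and_wait()   # block until the job is back in a resumable state
    resume_from(last_dcp_checkpoint)   # reload the full state saved via dcp_save_interval
```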
load_state_with_optimizer() only restores optimizer state from DCP-format checkpoints. If you point it at an HF or sampler snapshot, optimizer state silently won't be restored. Always load from the path returned by save_state() when you need a full optimizer restore. See Saving and loading.
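A minimal sketch of the save/restore round trip; the trainer object shown here is an assumption, and the exact API surface is documented in Saving and loading:

```python
# Sketch only: `trainer` stands in for whatever object exposes
# save_state() / load_state_with_optimizer() in your setup.
ckpt_path = trainer.save_state()              # writes a DCP checkpoint: weights + optimizer state

# ...later, after an interruption...
trainer.load_state_with_optimizer(ckpt_path)  # restores both; pointing this at an HF or sampler
                                              # snapshot would silently skip optimizer state
```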
Metrics reference
ppo_kl vs ref_kld
GRPO training logs two KL divergence metrics that measure different things:
| Metric | What it measures | Expected behavior |
|---|---|---|
| ppo_kl | KL between the current policy and the previous policy (the importance-sampling ratio inside the PPO clip objective) | Stays near 0 with one minibatch per rollout; this is correct, not a bug |
| ref_kld | KL between the current policy and the reference (base) model | Starts near 0, increases gradually as the policy diverges from the base model during training |
ref_kld is the metric to watch for policy drift. A sudden large jump in ref_kld may indicate reward hacking or that the KL penalty coefficient needs tuning.
The cookbook does not always surface ref_kld by default. To add it, you can use the k3 unbiased estimator:
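A minimal sketch of the k3 estimator, assuming you can access per-token log-probabilities of the sampled tokens under both the current policy and the reference model; the function name, tensor names, and where you log the result are illustrative:

```python
import torch

def ref_kld_k3(policy_logprobs: torch.Tensor, ref_logprobs: torch.Tensor) -> torch.Tensor:
    """k3 estimator of KL(policy || reference) over tokens sampled from the current policy."""
    log_ratio = ref_logprobs - policy_logprobs            # log(p_ref / p_policy) per token
    # k3 = r - log(r) - 1 with r = p_ref / p_policy: unbiased and non-negative per sample
    return (log_ratio.exp() - log_ratio - 1.0).mean()
```

Log the returned scalar alongside ppo_kl each optimization step so you can watch for the drift pattern described above.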