Overview
WeightSyncer coordinates saving sampler checkpoints and syncing them to a deployment, including automatic base/delta chain state tracking, session-scoped snapshot naming, and post-sync warmup.
For full-parameter training, only the first checkpoint (saved as base) is promotable; subsequent delta checkpoints are not. LoRA checkpoints are always promotable (the delta chain is disabled when lora_rank > 0). See Checkpoint kinds for the full promotability matrix.

Constructor
| Field | Type | Default | Description |
|---|---|---|---|
| policy_client | FiretitanTrainingClient | — | Training client for save operations |
| deploy_mgr | DeploymentManager \| None | None | Deployment manager for weight sync (None = no weight sync) |
| deployment_id | str \| None | None | Target deployment for weight sync |
| base_model | str | "" | Model name for weight sync API calls |
| hotload_timeout | int | 600 | Timeout in seconds for hotload_and_wait |
| first_checkpoint_type | str | "base" | Type for the first checkpoint ("base" or "delta") |
| compression_format | str | "arc_v2" | Delta compression format |
| warmup_after_hotload | bool | True | Send a warmup request after each successful weight sync |
| warmup_max_retries | int | 10 | Max retries for post-weight-sync warmup |
| reset_prompt_cache | bool | True | Reset the deployment’s prompt cache after each weight sync |
| lora_rank | int | 0 | When > 0, forces all checkpoints to base type (no delta chain); LoRA adapter exports are standalone PEFT artifacts that cannot use incremental delta compression. |
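For illustration, a configuration sketch built from the fields above. The deployment ID and model name are placeholders, and construction is shown commented out because it requires a live training client and deployment manager:

```python
# Hypothetical field values for a LoRA run; IDs and model names are placeholders.
weight_sync_config = dict(
    deployment_id="my-deployment",
    base_model="accounts/fireworks/models/llama-v3p1-8b-instruct",
    hotload_timeout=600,
    warmup_after_hotload=True,   # warm up the deployment after each sync
    reset_prompt_cache=True,     # avoid serving prefixes cached under old weights
    lora_rank=16,                # > 0: every checkpoint is saved as base
)

# syncer = WeightSyncer(policy_client=client, deploy_mgr=mgr, **weight_sync_config)
```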
Methods
save_and_hotload(name, checkpoint_type=None)
Save sampler weights and sync to deployment. Automatically handles base (first) vs delta (subsequent) checkpoint types.
Returns the snapshot_name (str | None) on success, or raises on failure.
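Because save_and_hotload raises on failure, a call site that prefers to keep training can wrap it. A sketch; the exact exception type is not specified on this page, so this catches broadly:

```python
def try_sync(syncer, step):
    """Attempt a save-and-sync; return the snapshot name, or None on failure."""
    try:
        return syncer.save_and_hotload(f"step-{step}")
    except Exception:
        # Exception type is an assumption here; swallow the error and let the
        # caller retry on a later step rather than aborting training.
        return None
```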
save_only(name, checkpoint_type=None)
Save sampler weights without syncing to the deployment.
Returns snapshot_name or None.
hotload(snapshot_name, checkpoint_type)
Sync a previously saved snapshot to the deployment.
Returns True on success, False on failure.
check_deployment_state()
Query the deployment’s current weight sync state.
wait_for_hotload_ready(timeout_s=300, poll_interval_s=5)
Block until the deployment’s weight sync manager is initialized.
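A typical startup sequence blocks on readiness before attempting the first sync. A sketch, assuming the syncer is already constructed:

```python
def first_sync(syncer):
    # Don't attempt a hotload until the deployment's weight sync manager is up.
    syncer.wait_for_hotload_ready(timeout_s=300, poll_interval_s=5)
    return syncer.save_and_hotload("init")
```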
reset_delta_chain()
Force the next save to be treated as base. Call when the deployment’s bucket changes under you — otherwise the next delta references a base the deployment never loaded. Re-attaching a live deployment to a new trainer is not a user workflow; reach out to Fireworks support for that.
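For example, when the deployment is pointed at a new bucket, reset before the next save so the chain restarts from a fresh base. A sketch; the bucket-change hook is hypothetical:

```python
def on_bucket_changed(syncer, step):
    # The old delta chain references a base the deployment never loaded from
    # the new bucket, so force the next save to be a base checkpoint.
    syncer.reset_delta_chain()
    return syncer.save_and_hotload(f"rebase-step-{step}")
```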
Usage patterns
On-policy weight sync (every step)
For on-policy training (e.g. GRPO), sync weights after every optimizer step:
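A sketch of the per-step loop; train_step stands in for one optimizer update, and the naming scheme is illustrative rather than part of the API:

```python
def on_policy_loop(syncer, train_step, num_steps):
    for step in range(1, num_steps + 1):
        train_step(step)                         # one optimizer update (e.g. GRPO)
        # Sync immediately so the sampler never generates with stale weights.
        syncer.save_and_hotload(f"step-{step}")
```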
Interval weight sync (off-policy)
For off-policy training, sync weights every N steps:
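A sketch of the interval variant, under the same illustrative scaffolding as above:

```python
def interval_sync_loop(syncer, train_step, num_steps, sync_every=10):
    for step in range(1, num_steps + 1):
        train_step(step)
        # Off-policy methods tolerate slightly stale sampler weights, so sync
        # only every `sync_every` steps to cut hotload overhead.
        if step % sync_every == 0:
            syncer.save_and_hotload(f"step-{step}")
```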
Split save and sync
Separate save from weight sync when you need intermediate steps (e.g. warmup):
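A sketch combining save_only and hotload with an injected intermediate step; the helper name and the `between` callback are illustrative:

```python
def save_then_hotload(syncer, name, checkpoint_type, between=None):
    """Save first, run an intermediate step (e.g. warmup), then sync."""
    snapshot = syncer.save_only(name, checkpoint_type=checkpoint_type)
    if snapshot is None:
        return None
    if between is not None:
        between(snapshot)                  # custom work between save and sync
    ok = syncer.hotload(snapshot, checkpoint_type)
    return snapshot if ok else None
```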
DCP checkpoints for resume
Save DCP checkpoints at intervals using the training client directly:
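A sketch of the interval gating only. The `save_dcp` method name is a placeholder, not a confirmed part of FiretitanTrainingClient; consult the training client reference for the actual checkpoint-save call:

```python
def maybe_save_dcp(policy_client, step, every=100):
    # DCP checkpoints serve trainer resume and are independent of weight sync.
    # `save_dcp` is a hypothetical method name used for illustration.
    if step % every == 0:
        policy_client.save_dcp(f"dcp-step-{step}")
        return True
    return False
```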
Related guides
- DeploymentManager — deployment lifecycle and hotload API
- Saving and Loading — checkpoint concepts
- Training and Sampling — end-to-end workflow