Overview

WeightSyncer coordinates saving sampler checkpoints and syncing them to a deployment, including automatic base/delta chain state tracking, session-scoped snapshot naming, and post-sync warmup.
from fireworks.training.sdk import WeightSyncer

Constructor

tracker = WeightSyncer(
    policy_client=training_client,
    deploy_mgr=deploy_mgr,
    deployment_id="my-deployment",
    base_model="accounts/fireworks/models/qwen3-8b",
    hotload_timeout=600,
    first_checkpoint_type="base",
    warmup_after_hotload=True,
)
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| policy_client | FiretitanTrainingClient | | Training client for save operations |
| deploy_mgr | DeploymentManager \| None | None | Deployment manager for weight sync (None = no weight sync) |
| deployment_id | str \| None | None | Target deployment for weight sync |
| base_model | str | "" | Model name for weight sync API calls |
| hotload_timeout | int | 600 | Timeout in seconds for hotload_and_wait |
| first_checkpoint_type | str | "base" | Type for the first checkpoint ("base" or "delta") |
| dcp_timeout | int | 2700 | Timeout for DCP save operations |
| compression_format | str | "arc_v2" | Delta compression format |
| warmup_after_hotload | bool | True | Send a warmup request after each successful weight sync |
| warmup_max_retries | int | 10 | Max retries for post-weight-sync warmup |

Methods

save_and_hotload(name, checkpoint_type=None)

Save sampler weights and sync them to the deployment. The checkpoint type is handled automatically: the first checkpoint uses first_checkpoint_type ("base" by default) and subsequent checkpoints are deltas. Returns snapshot_name (str | None) on success and raises on failure:
tracker.save_and_hotload(f"step-{step:05d}")
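
The automatic base-versus-delta choice can be pictured with a small stand-in. This is a hedged sketch of the behavior described above, not the SDK's implementation: `ChainState` and `next_type` are hypothetical names, and the rule that an explicit `checkpoint_type` overrides the automatic choice is an assumption inferred from the `checkpoint_type=None` default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainState:
    """Hypothetical stand-in for base/delta chain tracking (not the SDK class)."""

    first_checkpoint_type: str = "base"
    saved_count: int = 0  # checkpoints saved so far in this session

    def next_type(self, checkpoint_type: Optional[str] = None) -> str:
        if checkpoint_type is not None:
            # Assumption: an explicit type overrides the automatic choice.
            chosen = checkpoint_type
        elif self.saved_count == 0:
            # The first checkpoint uses the configured type ("base" by default).
            chosen = self.first_checkpoint_type
        else:
            # Later checkpoints extend the chain as deltas.
            chosen = "delta"
        self.saved_count += 1
        return chosen


state = ChainState()
types = [state.next_type() for _ in range(3)]
print(types)
```

Under these assumptions, three consecutive automatic saves produce one base checkpoint followed by two deltas.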

save_only(name, checkpoint_type=None)

Save sampler weights without syncing to deployment:
snapshot = tracker.save_only("checkpoint-name", checkpoint_type="base")
Returns snapshot_name or None.

hotload(snapshot_name)

Sync a previously saved snapshot to the deployment:
tracker.hotload(snapshot)
Returns True on success, False on failure.
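
Because hotload reports failure through its return value rather than an exception, callers may want to check it explicitly. The sketch below is one possible wrapper, not part of the SDK: `hotload_with_retry` and `FlakySyncer` are hypothetical names, and whether retrying is appropriate for a given failure is an assumption that the source does not address.

```python
import time


def hotload_with_retry(syncer, snapshot_name: str, attempts: int = 3,
                       delay_s: float = 0.0) -> bool:
    """Call syncer.hotload until it returns True or attempts run out."""
    for attempt in range(1, attempts + 1):
        if syncer.hotload(snapshot_name):
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # pause before the next attempt
    return False


class FlakySyncer:
    """Hypothetical test double: fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def hotload(self, snapshot_name: str) -> bool:
        self.calls += 1
        return self.calls > self.failures


flaky = FlakySyncer(failures=2)
succeeded = hotload_with_retry(flaky, "step-00001", attempts=3)
print(succeeded, flaky.calls)
```

The wrapper works with any object exposing a `hotload(snapshot_name)` method that returns a bool, so a real `tracker` could be passed in place of the test double.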

save_dcp(name)

Save a DCP checkpoint only (for resume). No sampler checkpoint, no weight sync. Returns bool (True on success):
tracker.save_dcp(f"step-{step}")
If you need direct control over DCP save futures, call training_client.save_state(...).result() on the raw SDK client instead.

check_deployment_state()

Query the deployment’s current weight sync state:
current = tracker.check_deployment_state()
print(current)  # current_snapshot_identity or None

wait_for_hotload_ready(timeout_s=300, poll_interval_s=5)

Block until the deployment’s weight sync manager is initialized.
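
A blocking wait with `timeout_s` and `poll_interval_s` parameters is commonly built as a polling loop with a deadline. The sketch below illustrates that general pattern only; it is not the SDK's implementation. `wait_until_ready` and `is_ready` are hypothetical names, and returning False on timeout is an assumption, since the source does not say whether the real method returns or raises.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool], timeout_s: float = 300,
                     poll_interval_s: float = 5) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # assumption: report timeout instead of raising
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))


polls = {"count": 0}


def ready_on_third_poll() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


became_ready = wait_until_ready(ready_on_third_poll, timeout_s=5, poll_interval_s=0)
print(became_ready, polls["count"])
```

Using `time.monotonic()` for the deadline keeps the loop unaffected by system clock adjustments.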

Usage patterns

On-policy weight sync (every step)

For on-policy training (e.g. GRPO), sync weights after every optimizer step:
import asyncio

for step in range(total_steps):
    # ... training step ...
    # Sync the updated weights first so the samples below come from the current policy.
    tracker.save_and_hotload(f"step-{step:05d}")
    completions = asyncio.run(
        sampler.sample_with_tokens(messages=input_messages, n=4)
    )

Interval weight sync (off-policy)

For off-policy training, sync weights every N steps:
for step in range(total_steps):
    # ... training step ...
    if step % weight_sync_interval == 0:
        tracker.save_and_hotload(f"step-{step:05d}")
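
The condition `step % weight_sync_interval == 0` is true at step 0, so the loop above also syncs after the very first training step. The helpers below (hypothetical names, for illustration only) list which steps trigger a sync under that condition and under the alternative `(step + 1) % weight_sync_interval == 0`, which waits until N steps have completed.

```python
from typing import List


def sync_steps(total_steps: int, weight_sync_interval: int) -> List[int]:
    """Steps that sync under `step % weight_sync_interval == 0`."""
    return [s for s in range(total_steps) if s % weight_sync_interval == 0]


def sync_steps_after(total_steps: int, weight_sync_interval: int) -> List[int]:
    """Steps that sync under `(step + 1) % weight_sync_interval == 0`."""
    return [s for s in range(total_steps) if (s + 1) % weight_sync_interval == 0]


print(sync_steps(10, 4))        # [0, 4, 8]
print(sync_steps_after(10, 4))  # [3, 7]
```

Which variant fits depends on whether the deployment should receive weights immediately after the first step or only once a full interval has elapsed.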

Split save and sync

Separate save from weight sync when you need intermediate steps (e.g. warmup):
snapshot = tracker.save_only("resume-step-0", checkpoint_type="base")
deploy_mgr.warmup(model)
tracker.hotload(snapshot)
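
Since save_only is documented as returning snapshot_name or None, it can be worth guarding the sync step. The sketch below is one way to do that; `save_then_sync` and `FakeSyncer` are hypothetical names, and treating None as "nothing to sync" is an assumption rather than documented behavior.

```python
from typing import Callable, List, Optional


def save_then_sync(syncer, name: str,
                   between: Optional[Callable[[], None]] = None) -> bool:
    """Save a base snapshot, run an optional intermediate step, then sync."""
    snapshot = syncer.save_only(name, checkpoint_type="base")
    if snapshot is None:
        # Assumption: None means no snapshot was produced, so skip the sync.
        return False
    if between is not None:
        between()  # e.g. a warmup call
    return syncer.hotload(snapshot)


class FakeSyncer:
    """Hypothetical test double; records calls instead of contacting a deployment."""

    def __init__(self, snapshot: Optional[str]):
        self.snapshot = snapshot
        self.hotloaded: List[str] = []

    def save_only(self, name: str, checkpoint_type: Optional[str] = None):
        return self.snapshot

    def hotload(self, snapshot_name: str) -> bool:
        self.hotloaded.append(snapshot_name)
        return True


fake = FakeSyncer("snap-1")
print(save_then_sync(fake, "resume-step-0"), fake.hotloaded)
```

With a real `tracker`, the `between` callback is where an intermediate step such as the warmup call would go.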

DCP checkpoints for resume

Save DCP checkpoints at intervals alongside weight sync saves:
for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    if step % dcp_interval == 0:
        tracker.save_dcp(f"step-{step}")