WeightSyncer

Overview

WeightSyncer coordinates saving sampler checkpoints and syncing them to a deployment, including automatic base/delta chain state tracking, session-scoped snapshot naming, and post-sync warmup.

from fireworks.training.sdk import WeightSyncer

Constructor

tracker = WeightSyncer(
    policy_client=training_client,
    deploy_mgr=deploy_mgr,
    deployment_id="my-deployment",
    base_model="accounts/fireworks/models/qwen3-8b",
    hotload_timeout=600,
    first_checkpoint_type="base",
    warmup_after_hotload=True,
)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `policy_client` | `FiretitanTrainingClient` | - | Training client for save operations |
| `deploy_mgr` | `DeploymentManager \| None` | `None` | Deployment manager for weight sync (`None` = no weight sync) |
| `deployment_id` | `str \| None` | `None` | Target deployment for weight sync |
| `base_model` | `str` | `""` | Model name for weight sync API calls |
| `hotload_timeout` | `int` | `600` | Timeout in seconds for `hotload_and_wait` |
| `first_checkpoint_type` | `str` | `"base"` | Type for the first checkpoint (`"base"` or `"delta"`) |
| `dcp_timeout` | `int` | `2700` | Timeout for DCP save operations |
| `compression_format` | `str` | `"arc_v2"` | Delta compression format |
| `warmup_after_hotload` | `bool` | `True` | Send a warmup request after each successful weight sync |
| `warmup_max_retries` | `int` | `10` | Max retries for post-weight-sync warmup |
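One way to keep these options in a single place is a small settings object that is validated once and then unpacked into the constructor. The sketch below is only an illustration: `WeightSyncSettings` and `to_kwargs` are hypothetical names that do not exist in the SDK, and the defaults are simply copied from the table above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WeightSyncSettings:
    """Hypothetical container (not an SDK class) mirroring the documented defaults."""

    deployment_id: Optional[str] = None
    base_model: str = ""
    hotload_timeout: int = 600
    first_checkpoint_type: str = "base"
    dcp_timeout: int = 2700
    compression_format: str = "arc_v2"
    warmup_after_hotload: bool = True
    warmup_max_retries: int = 10

    def __post_init__(self) -> None:
        # The table lists "base" and "delta" as the only checkpoint types.
        if self.first_checkpoint_type not in ("base", "delta"):
            raise ValueError(
                f"first_checkpoint_type must be 'base' or 'delta', "
                f"got {self.first_checkpoint_type!r}"
            )

    def to_kwargs(self) -> dict:
        """Return the fields as keyword arguments for the constructor."""
        return asdict(self)


settings = WeightSyncSettings(
    deployment_id="my-deployment",
    base_model="accounts/fireworks/models/qwen3-8b",
)
# Intended use, with training_client and deploy_mgr created elsewhere:
# tracker = WeightSyncer(policy_client=training_client, deploy_mgr=deploy_mgr,
#                        **settings.to_kwargs())
```

Validating up front means a typo in `first_checkpoint_type` fails before any training step runs.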

Methods

`save_and_hotload(name, checkpoint_type=None)`

Save sampler weights and sync them to the deployment. The checkpoint type is chosen automatically: base for the first checkpoint, delta for subsequent ones. Returns `snapshot_name` (`str | None`) on success and raises on failure:

tracker.save_and_hotload(f"step-{step:05d}")
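Because this method raises on failure and may also return `None`, a training loop might wrap the call so that both outcomes are logged with the step name. The following is a minimal sketch under that reading; `sync_step` and `FakeTracker` are hypothetical names, and treating `None` as "no snapshot was produced" is an assumption rather than documented behavior.

```python
import logging
from typing import Optional

logger = logging.getLogger("weight_sync")


def sync_step(tracker, step: int) -> Optional[str]:
    """Save and sync one step; `tracker` is any object with save_and_hotload()."""
    name = f"step-{step:05d}"
    try:
        snapshot = tracker.save_and_hotload(name)
    except Exception:
        # Documented to raise on failure: log the step name, then re-raise so
        # the caller does not keep sampling from stale weights.
        logger.exception("weight sync failed for %s", name)
        raise
    if snapshot is None:
        # Assumption: None means no snapshot was produced for this call.
        logger.warning("no snapshot returned for %s", name)
    return snapshot


class FakeTracker:
    """Stand-in that makes the sketch runnable without a real deployment."""

    def save_and_hotload(self, name, checkpoint_type=None):
        return f"session-{name}"


result = sync_step(FakeTracker(), 3)
```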

`save_only(name, checkpoint_type=None)`

Save sampler weights without syncing to deployment:

snapshot = tracker.save_only("checkpoint-name", checkpoint_type="base")

Returns snapshot_name or None.

`hotload(snapshot_name)`

Sync a previously saved snapshot to the deployment:

tracker.hotload(snapshot)

Returns True on success, False on failure.
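Since `hotload` reports failure through its return value rather than an exception, callers need to check it explicitly. One possible pattern is a bounded retry, sketched below. `hotload_with_retry`, its `attempts` and `delay_s` parameters, and `FlakyTracker` are hypothetical; whether retrying is appropriate for a given failure is an assumption this page does not confirm.

```python
import time


def hotload_with_retry(tracker, snapshot_name: str,
                       attempts: int = 3, delay_s: float = 5.0) -> bool:
    """Call tracker.hotload() up to `attempts` times with linear backoff."""
    for attempt in range(1, attempts + 1):
        if tracker.hotload(snapshot_name):
            return True
        if attempt < attempts:
            # Wait a little longer after each failed attempt.
            time.sleep(delay_s * attempt)
    return False


class FlakyTracker:
    """Stand-in that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def hotload(self, snapshot_name: str) -> bool:
        self.calls += 1
        return self.calls > self.failures


flaky = FlakyTracker(failures=2)
ok = hotload_with_retry(flaky, "snapshot-a", attempts=3, delay_s=0.0)
```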

`save_dcp(name)`

Save a DCP checkpoint only (for resume). No sampler checkpoint, no weight sync. Returns bool (True on success):

tracker.save_dcp(f"step-{step}")

If you need direct control over DCP save futures, call training_client.save_state(...).result() on the raw SDK client instead.

`check_deployment_state()`

Query the deployment’s current weight sync state:

current = tracker.check_deployment_state()
print(current)  # current_snapshot_identity or None
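The returned identity could be used to skip a sync when the deployment already serves the intended snapshot. The sketch below assumes that the value returned by `check_deployment_state()` is directly comparable to the snapshot name returned by `save_only`; this page does not confirm that, so treat the comparison as an assumption. `ensure_synced` and `FakeTracker` are hypothetical names.

```python
from typing import Optional


def ensure_synced(tracker, snapshot_name: str) -> bool:
    """Hotload only if the deployment does not already report this snapshot."""
    current = tracker.check_deployment_state()
    # Assumption: the reported identity equals the snapshot name when in sync.
    if current == snapshot_name:
        return True
    return bool(tracker.hotload(snapshot_name))


class FakeTracker:
    """Stand-in that remembers the last snapshot it was asked to load."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.hotloads = 0

    def check_deployment_state(self) -> Optional[str]:
        return self.current

    def hotload(self, snapshot_name: str) -> bool:
        self.hotloads += 1
        self.current = snapshot_name
        return True


fake = FakeTracker()
first = ensure_synced(fake, "snap-1")   # deployment is empty, so this hotloads
second = ensure_synced(fake, "snap-1")  # already current, so no second hotload
```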

`wait_for_hotload_ready(timeout_s=300, poll_interval_s=5)`

Block until the deployment’s weight sync manager is initialized.
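To make the roles of `timeout_s` and `poll_interval_s` concrete, here is a generic poll-until-ready loop. It is not the SDK's implementation, and the real method's return value and timeout behavior are not stated on this page; `wait_until` and `is_ready` are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool],
               timeout_s: float = 300, poll_interval_s: float = 5) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))


# Demo: a readiness check that succeeds on its third call.
calls = {"n": 0}


def ready_on_third_call() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


became_ready = wait_until(ready_on_third_call, timeout_s=1.0, poll_interval_s=0.01)
```

A typical place to block like this would be once, before the first sync of a run, rather than inside the training loop.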

Usage patterns

On-policy weight sync (every step)

For on-policy training (e.g. GRPO), sync weights after every optimizer step:

import asyncio

for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    completions = asyncio.run(
        sampler.sample_with_tokens(messages=input_messages, n=4)
    )

Interval weight sync (off-policy)

For off-policy training, sync weights every N steps:

for step in range(total_steps):
    # ... training step ...
    if step % weight_sync_interval == 0:
        tracker.save_and_hotload(f"step-{step:05d}")
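Note that `step % weight_sync_interval == 0` syncs at step 0 and can leave the final step unsynced when `total_steps - 1` is not a multiple of the interval. If the deployment should end up with the final weights, one option is a small predicate like the one below; `should_sync` is a hypothetical helper, not part of the SDK.

```python
def should_sync(step: int, total_steps: int, interval: int) -> bool:
    """Sync every `interval` steps and always on the last step."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    is_last_step = step == total_steps - 1
    return step % interval == 0 or is_last_step


# With 10 steps and an interval of 4, the plain modulo check syncs at 0, 4, 8;
# the extra condition adds the final step.
synced_steps = [s for s in range(10) if should_sync(s, total_steps=10, interval=4)]
print(synced_steps)  # [0, 4, 8, 9]
```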

Split save and sync

Separate save from weight sync when you need intermediate steps (e.g. warmup):

snapshot = tracker.save_only("resume-step-0", checkpoint_type="base")
deploy_mgr.warmup(model)
tracker.hotload(snapshot)
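Since `save_only` may return `None` and `hotload` returns `False` on failure, the split flow benefits from checking both results. A minimal sketch follows, with the intermediate step passed in as a callable; `save_then_sync`, `between`, and `FakeTracker` are hypothetical names, and skipping the sync when no snapshot is returned is an assumption.

```python
from typing import Callable, List, Optional


def save_then_sync(tracker, name: str, between: Callable[[], None]) -> bool:
    """Save a base snapshot, run an intermediate step, then sync it."""
    snapshot = tracker.save_only(name, checkpoint_type="base")
    if snapshot is None:
        # Assumption: without a snapshot name there is nothing to sync.
        return False
    between()  # e.g. a warmup call
    return bool(tracker.hotload(snapshot))


class FakeTracker:
    """Stand-in that records the order of calls."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def save_only(self, name: str, checkpoint_type: Optional[str] = None) -> str:
        self.log.append(f"save:{name}:{checkpoint_type}")
        return f"snap-{name}"

    def hotload(self, snapshot_name: str) -> bool:
        self.log.append(f"hotload:{snapshot_name}")
        return True


events: List[str] = []
done = save_then_sync(FakeTracker(events), "resume-step-0",
                      between=lambda: events.append("warmup"))
```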

DCP checkpoints for resume

Save DCP checkpoints at intervals alongside weight sync saves:

for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    if step % dcp_interval == 0:
        tracker.save_dcp(f"step-{step}")
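The loop above discards the `bool` returned by `save_dcp`. For resume to be reliable, it can help to remember the most recent DCP checkpoint that actually succeeded. The sketch below shows one way to do that; `checkpoint_step`, `last_good`, and `FakeTracker` are hypothetical names.

```python
from typing import Optional


def checkpoint_step(tracker, step: int, dcp_interval: int,
                    last_good: Optional[str]) -> Optional[str]:
    """Return the name of the latest DCP checkpoint known to have succeeded."""
    if step % dcp_interval != 0:
        return last_good
    name = f"step-{step}"
    # save_dcp is documented to return True on success.
    return name if tracker.save_dcp(name) else last_good


class FakeTracker:
    """Stand-in whose save_dcp fails for one specific checkpoint name."""

    def save_dcp(self, name: str) -> bool:
        return name != "step-4"


tracker = FakeTracker()
last_good: Optional[str] = None
for step in range(6):
    last_good = checkpoint_step(tracker, step, dcp_interval=2, last_good=last_good)
# Steps 0 and 2 succeed, step 4 fails, so the resume point stays at step-2.
```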

Related guides

DeploymentManager: deployment lifecycle and hotload API
Saving and Loading: checkpoint concepts
Training and Sampling: end-to-end workflow
