WeightSyncer (Legacy)

Overview

WeightSyncer is a legacy low-level helper kept in the SDK API reference only for backward compatibility. Do not use it in new cookbook recipes or in user-written training loops. Use the SDK-managed service flow instead: call saved = training_client.save_weights_for_sampler(...).result(), then pass the saved path to service.create_sampling_client(model_path=saved.path) or service.create_deployment_sampler(model_path=saved.path).
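
The managed flow named above can be sketched as a small helper. This is only an illustration under stated assumptions: `refresh_sampler` is a hypothetical name, the clients are treated as duck-typed objects, and the arguments to `save_weights_for_sampler` are forwarded unchanged rather than guessed, since they are not spelled out here.

```python
from typing import Any


def refresh_sampler(
    training_client: Any, service: Any, *save_args: Any, **save_kwargs: Any
) -> Any:
    """Save sampler weights, then build a sampling client on the saved path.

    Hypothetical helper, not part of the SDK.
    """
    # Arguments are passed through as-is; consult the SDK reference for them.
    saved = training_client.save_weights_for_sampler(*save_args, **save_kwargs).result()
    # Only the `path` attribute of the result is relied on, as in the text above.
    return service.create_sampling_client(model_path=saved.path)
```

Swapping `create_sampling_client` for `create_deployment_sampler` would follow the same shape.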

WeightSyncer coordinates saving sampler checkpoints and syncing them to a deployment, including automatic base/delta chain state tracking, session-scoped snapshot naming, and post-sync warmup. The managed service client now owns this logic internally.

from fireworks.training.sdk import WeightSyncer

For full-parameter training, only the first checkpoint (saved as base) is promotable; subsequent delta checkpoints are not. LoRA checkpoints are always promotable, because setting lora_rank > 0 disables the delta chain and every checkpoint is saved as base. See Checkpoint kinds for the full promotability matrix.
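
As a rough illustration, the two cases above reduce to a one-line rule. The function below is a hypothetical helper that covers only those two cases; the Checkpoint kinds matrix remains the authoritative source.

```python
def is_promotable(checkpoint_type: str, lora_rank: int = 0) -> bool:
    """Return True if a checkpoint is promotable under the rule stated above."""
    if lora_rank > 0:
        # LoRA: the delta chain is disabled, so every checkpoint is a base.
        return True
    # Full-parameter training: base checkpoints only, deltas are not promotable.
    return checkpoint_type == "base"
```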

Constructor

tracker = WeightSyncer(
    policy_client=training_client,
    deploy_mgr=deploy_mgr,
    deployment_id="my-deployment",
    base_model="accounts/fireworks/models/qwen3-8b",
    hotload_timeout=600,
    first_checkpoint_type="base",
    warmup_after_hotload=True,
    reset_prompt_cache=True,
    lora_rank=0,  # >0 for LoRA adapters (disables delta chain)
)

Field	Type	Default	Description
`policy_client`	`FiretitanTrainingClient`	—	Training client for save operations
`deploy_mgr`	`DeploymentManager | None`	`None`	Deployment manager for weight sync (`None` = no weight sync)
`deployment_id`	`str | None`	`None`	Target deployment for weight sync
`base_model`	`str`	`""`	Model name for weight sync API calls
`hotload_timeout`	`int`	`600`	Timeout in seconds for `hotload_and_wait`
`first_checkpoint_type`	`str`	`"base"`	Type for the first checkpoint (`"base"` or `"delta"`)
`compression_format`	`str`	`"arc_v2"`	Delta compression format
`warmup_after_hotload`	`bool`	`True`	Send a warmup request after each successful weight sync
`warmup_max_retries`	`int`	`10`	Max retries for post-weight-sync warmup
`reset_prompt_cache`	`bool`	`True`	Reset the deployment’s prompt cache after each weight sync. See KV cache behavior for RL rollouts for active stream, session ID, and reset-option semantics.
`lora_rank`	`int`	`0`	When > 0, forces all checkpoints to `base` type (no delta chain). LoRA adapter exports are standalone PEFT artifacts that cannot use incremental delta compression.

Methods

`save_and_hotload(name, checkpoint_type=None)`

Save sampler weights and sync them to the deployment. The checkpoint type is chosen automatically: base for the first save, delta for subsequent saves. Returns the snapshot name (str | None) on success and raises on failure:

tracker.save_and_hotload(f"step-{step:05d}")

`save_only(name, checkpoint_type=None)`

Save sampler weights without syncing to deployment:

snapshot = tracker.save_only("checkpoint-name", checkpoint_type="base")

Returns snapshot_name or None.

`hotload(snapshot_name, checkpoint_type)`

Sync a previously saved snapshot to the deployment:

tracker.hotload(snapshot, checkpoint_type="base")

Returns True on success, False on failure.
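
Because `hotload` reports failure through its return value rather than an exception, a caller that ignores the result can silently keep sampling from old weights. A minimal sketch of a wrapper that turns a failed sync into an error follows; `hotload_or_raise` and `HotloadFailed` are hypothetical names, not SDK symbols.

```python
from typing import Any


class HotloadFailed(RuntimeError):
    """Hypothetical error for a weight sync that reported failure."""


def hotload_or_raise(tracker: Any, snapshot_name: str, checkpoint_type: str) -> None:
    """Call tracker.hotload and raise if it does not report success."""
    ok = tracker.hotload(snapshot_name, checkpoint_type=checkpoint_type)
    if not ok:
        raise HotloadFailed(
            f"hotload of {snapshot_name!r} ({checkpoint_type}) returned {ok!r}"
        )
```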

`check_deployment_state()`

Query the deployment’s current weight sync state:

current = tracker.check_deployment_state()
print(current)  # current_snapshot_identity or None

`wait_for_hotload_ready(timeout_s=300, poll_interval_s=5)`

Block until the deployment’s weight sync manager is initialized.
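
The `timeout_s` and `poll_interval_s` parameters suggest a poll-until-ready loop. The sketch below shows that general pattern only; it is not the SDK's implementation, `wait_until` is a hypothetical name, and how the real method behaves on timeout is not stated here.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 300,
    poll_interval_s: float = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # timed out without ever seeing a ready state
        sleep(poll_interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.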

`reset_delta_chain()`

Force the next save to be treated as base. Call it when the deployment’s bucket or trainer session changes, for example after attaching an existing deployment to a new trainer job. Otherwise the next delta could reference a base checkpoint the deployment never loaded.
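
The base/delta bookkeeping and the effect of a reset can be modeled in a few lines. This is a hypothetical model for intuition, not the SDK's internal state; it assumes the default `first_checkpoint_type="base"`.

```python
from dataclasses import dataclass


@dataclass
class DeltaChainState:
    """Hypothetical model of base/delta selection across saves."""

    lora_rank: int = 0
    base_saved: bool = False  # True once a base exists for the current chain

    def next_type(self) -> str:
        """Return the checkpoint type for the next save and record it."""
        if self.lora_rank > 0 or not self.base_saved:
            # LoRA always saves base; otherwise the first save starts the chain.
            self.base_saved = True
            return "base"
        return "delta"

    def reset(self) -> None:
        """Forget the current base so the next save starts a new chain."""
        self.base_saved = False
```

In this model, three saves yield base, delta, delta; after `reset()` the next save is base again.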

Usage patterns

These patterns are for maintaining older integrations. New code should use the service-client sampler refresh pattern documented in Training and Sampling.

Sync weights every step

To minimize sampler staleness in a synchronous loop, sync a new sampler snapshot after every optimizer step before submitting the next rollout batch. This makes new rollout requests target the latest synced checkpoint, but the loop still owns draining or rejecting any stale in-flight requests before training on them:

import asyncio

for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    completions = asyncio.run(
        sampler.sample_with_tokens(messages=input_messages, n=4)
    )
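
One way a loop might reject stale requests is to tag each rollout at submission time with the snapshot that was live, then filter before training. The sketch below assumes that tagging scheme; `Rollout` and `drop_stale` are hypothetical names, not SDK types.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record: a completion plus the snapshot live at submission."""

    snapshot_name: str
    completion: str


def drop_stale(rollouts: Iterable[Rollout], current_snapshot: str) -> List[Rollout]:
    """Keep only rollouts that were submitted against the current snapshot."""
    return [r for r in rollouts if r.snapshot_name == current_snapshot]
```

Draining in-flight requests before the next sync is the alternative; filtering is simpler but discards work.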

Interval weight sync

For throughput-oriented loops that tolerate stale sampler weights, sync a new sampler snapshot every N steps. This only controls when new sampler snapshots are saved and synced; it does not prove that already-submitted or in-flight requests were generated by the latest policy:

for step in range(total_steps):
    # ... training step ...
    if step % weight_sync_interval == 0:
        tracker.save_and_hotload(f"step-{step:05d}")
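
With the `step % weight_sync_interval == 0` condition above, the first sync happens at step 0 and new requests lag the trainer by at most `weight_sync_interval - 1` optimizer steps. A small hypothetical helper makes that lag explicit:

```python
def steps_since_sync(step: int, weight_sync_interval: int) -> int:
    """Optimizer steps since the last sync under `step % interval == 0`."""
    if weight_sync_interval <= 0:
        raise ValueError("weight_sync_interval must be positive")
    # The last sync happened at the largest multiple of the interval <= step.
    return step % weight_sync_interval
```

For example, with an interval of 3 the lag over steps 0 through 6 is 0, 1, 2, 0, 1, 2, 0.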

Split save and sync

Separate save from weight sync when you need intermediate steps (e.g. warmup):

snapshot = tracker.save_only("resume-step-0", checkpoint_type="base")
deploy_mgr.warmup(model)
tracker.hotload(snapshot, checkpoint_type="base")
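
Since `save_only` may return None and `hotload` returns a boolean, the split pattern benefits from guarding both results. A minimal sketch, assuming a duck-typed tracker; `save_then_hotload` and its `between` callback are hypothetical names introduced for illustration:

```python
from typing import Any, Callable, Optional


def save_then_hotload(
    tracker: Any,
    name: str,
    checkpoint_type: str = "base",
    between: Optional[Callable[[], None]] = None,
) -> bool:
    """Save, run an optional intermediate step, then sync. True on success."""
    snapshot = tracker.save_only(name, checkpoint_type=checkpoint_type)
    if snapshot is None:
        return False  # nothing was saved, so there is nothing to sync
    if between is not None:
        between()  # e.g. a warmup call placed between save and sync
    return bool(tracker.hotload(snapshot, checkpoint_type=checkpoint_type))
```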

DCP checkpoints for resume

Save DCP checkpoints at intervals using the training client directly:

for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    if step % dcp_interval == 0:
        training_client.save_state(f"step-{step}")

Related guides

DeploymentManager: deployment lifecycle and weight-sync API
Saving and Loading: checkpoint concepts
Training and Sampling: end-to-end workflow
