Overview

WeightSyncer is a legacy low-level helper, kept in the SDK API reference only for backward compatibility. Do not use it in new cookbook recipes or in user-written training loops. Use the SDK-managed service flow instead: call training_client.save_weights_for_sampler(...).result(), then pass the saved path to service.create_sampling_client(model_path=saved.path) or service.create_deployment_sampler(model_path=saved.path).
WeightSyncer coordinates saving sampler checkpoints and syncing them to a deployment, including automatic base/delta chain state tracking, session-scoped snapshot naming, and post-sync warmup. The managed service client now owns this logic internally.
from fireworks.training.sdk import WeightSyncer
For full-parameter training, only the first checkpoint (saved as base) is promotable; subsequent delta checkpoints are not. LoRA checkpoints are always promotable (delta chain is disabled via lora_rank > 0). See Checkpoint kinds for the full promotability matrix.
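The promotability rule above can be written as a small predicate. This is a minimal sketch of the rule as stated in this paragraph only; is_promotable is a hypothetical helper, not part of the SDK, and the Checkpoint kinds page remains the authoritative matrix.

```python
def is_promotable(checkpoint_type: str, lora_rank: int = 0) -> bool:
    """Return True if a checkpoint is promotable under the rule stated above.

    Hypothetical helper for illustration; not an SDK function.
    """
    if lora_rank > 0:
        # LoRA: the delta chain is disabled, so every checkpoint is standalone.
        return True
    # Full-parameter training: only base checkpoints are promotable.
    return checkpoint_type == "base"
```

Under this sketch, a full-parameter run that saves one base checkpoint followed by deltas has exactly one promotable checkpoint, the first.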

Constructor

tracker = WeightSyncer(
    policy_client=training_client,
    deploy_mgr=deploy_mgr,
    deployment_id="my-deployment",
    base_model="accounts/fireworks/models/qwen3-8b",
    hotload_timeout=600,
    first_checkpoint_type="base",
    warmup_after_hotload=True,
    reset_prompt_cache=True,
    lora_rank=0,  # >0 for LoRA adapters (disables delta chain)
)
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| policy_client | FiretitanTrainingClient | | Training client for save operations |
| deploy_mgr | DeploymentManager \| None | None | Deployment manager for weight sync (None = no weight sync) |
| deployment_id | str \| None | None | Target deployment for weight sync |
| base_model | str | "" | Model name for weight sync API calls |
| hotload_timeout | int | 600 | Timeout in seconds for hotload_and_wait |
| first_checkpoint_type | str | "base" | Type for the first checkpoint ("base" or "delta") |
| compression_format | str | "arc_v2" | Delta compression format |
| warmup_after_hotload | bool | True | Send a warmup request after each successful weight sync |
| warmup_max_retries | int | 10 | Max retries for post-weight-sync warmup |
| reset_prompt_cache | bool | True | Reset the deployment’s prompt cache after each weight sync |
| lora_rank | int | 0 | When > 0, forces all checkpoints to base type (no delta chain). LoRA adapter exports are standalone PEFT artifacts that cannot use incremental delta compression. |

Methods

save_and_hotload(name, checkpoint_type=None)

Save sampler weights and sync to deployment. Automatically handles base (first) vs delta (subsequent) checkpoint types. Returns the snapshot_name (str | None) on success or raises on failure:
tracker.save_and_hotload(f"step-{step:05d}")

save_only(name, checkpoint_type=None)

Save sampler weights without syncing to deployment:
snapshot = tracker.save_only("checkpoint-name", checkpoint_type="base")
Returns snapshot_name or None.

hotload(snapshot_name, checkpoint_type)

Sync a previously saved snapshot to the deployment:
tracker.hotload(snapshot, checkpoint_type="base")
Returns True on success, False on failure.
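Because save_only can return None and hotload reports failure as False rather than raising, a caller that splits the two steps has to check both results. The following is a rough sketch of one way to do that. save_then_hotload and SyncerLike are hypothetical names introduced here, and raising RuntimeError on a failed sync is a design choice of this sketch, not documented SDK behavior.

```python
from typing import Optional, Protocol


class SyncerLike(Protocol):
    """The two WeightSyncer methods this sketch relies on (signatures as listed above)."""

    def save_only(self, name: str, checkpoint_type: Optional[str] = None) -> Optional[str]: ...

    def hotload(self, snapshot_name: str, checkpoint_type: str) -> bool: ...


def save_then_hotload(
    tracker: SyncerLike, name: str, checkpoint_type: str = "base"
) -> Optional[str]:
    """Save a snapshot, then sync it, checking both return values.

    Returns the snapshot name when both steps succeed, None when nothing
    was saved, and raises RuntimeError when the sync reports failure.
    """
    snapshot = tracker.save_only(name, checkpoint_type=checkpoint_type)
    if snapshot is None:
        # Nothing was saved, so there is nothing to sync.
        return None
    if not tracker.hotload(snapshot, checkpoint_type=checkpoint_type):
        # The snapshot exists, but the deployment did not load it.
        raise RuntimeError(f"hotload failed for snapshot {snapshot!r}")
    return snapshot
```

Any intermediate work, such as a warmup call, would go between the two checks.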

check_deployment_state()

Query the deployment’s current weight sync state:
current = tracker.check_deployment_state()
print(current)  # current_snapshot_identity or None

wait_for_hotload_ready(timeout_s=300, poll_interval_s=5)

Block until the deployment’s weight sync manager is initialized.
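To illustrate what the timeout_s and poll_interval_s parameters control, here is a generic poll-until-ready loop. It is a sketch of the general pattern, not the SDK's implementation: wait_until_ready and the is_ready callable are hypothetical, and this page does not say whether wait_for_hotload_ready returns a value or raises on timeout, so the boolean return below is an assumption.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 300,
    poll_interval_s: float = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the check passed, False if the deadline was reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.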

reset_delta_chain()

Force the next save to be treated as base. Call when the deployment’s bucket or trainer session changes — for example, after attaching an existing deployment to a new trainer job — otherwise the next delta could reference a base checkpoint the deployment never loaded.
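The effect of reset_delta_chain on checkpoint type selection can be pictured with a toy state machine. This is a simplified model built only from the behavior described on this page (first_checkpoint_type, lora_rank, and the reset); DeltaChainState is a hypothetical class and does not reflect how WeightSyncer tracks state internally.

```python
class DeltaChainState:
    """Toy model of base/delta selection; not the SDK's implementation."""

    def __init__(self, first_checkpoint_type: str = "base", lora_rank: int = 0) -> None:
        self.first_checkpoint_type = first_checkpoint_type
        self.lora_rank = lora_rank
        self._saved_any = False   # becomes True after the first save
        self._force_base = False  # set by reset_delta_chain()

    def next_type(self) -> str:
        """Return the checkpoint type for the next save and record the save."""
        if self.lora_rank > 0 or self._force_base:
            # LoRA disables the delta chain; a reset forces one base save.
            self._force_base = False
            self._saved_any = True
            return "base"
        if not self._saved_any:
            self._saved_any = True
            return self.first_checkpoint_type
        return "delta"

    def reset_delta_chain(self) -> None:
        """Force the next save to be treated as base."""
        self._force_base = True
```

In this model, three saves yield base, delta, delta; calling reset_delta_chain() makes the fourth save base again, after which the chain resumes with deltas.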

Usage patterns

These patterns are for maintaining older integrations. New code should use the service-client sampler refresh pattern documented in Training and Sampling.

Sync weights every step

To minimize sampler staleness in a synchronous loop, sync a new sampler snapshot after every optimizer step before submitting the next rollout batch. This makes new rollout requests target the latest synced checkpoint, but the loop still owns draining or rejecting any stale in-flight requests before training on them:
import asyncio

for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    completions = asyncio.run(
        sampler.sample_with_tokens(messages=input_messages, n=4)
    )

Interval weight sync

For throughput-oriented loops that tolerate stale sampler weights, sync a new sampler snapshot every N steps. This only controls when new sampler snapshots are saved and synced; it does not prove that already-submitted or in-flight requests were generated by the latest policy:
for step in range(total_steps):
    # ... training step ...
    if step % weight_sync_interval == 0:
        tracker.save_and_hotload(f"step-{step:05d}")
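For the interval loop above, the staleness of newly submitted requests follows from simple arithmetic. The sketch below assumes steps start at 0, a sync happens whenever step % weight_sync_interval == 0, and every sync succeeds. Both helper names are hypothetical, and the result says nothing about requests that were already in flight when a sync landed.

```python
def last_synced_step(step: int, weight_sync_interval: int) -> int:
    """Most recent step at or before `step` at which the loop above synced."""
    if weight_sync_interval <= 0:
        raise ValueError("weight_sync_interval must be positive")
    return step - (step % weight_sync_interval)


def staleness(step: int, weight_sync_interval: int) -> int:
    """Number of optimizer steps the sampler lags behind the trainer at `step`."""
    return step - last_synced_step(step, weight_sync_interval)
```

With weight_sync_interval = 4, the sampler at step 7 still serves the step-4 snapshot, so new requests are three steps stale; at step 8 the lag drops back to zero.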

Split save and sync

Separate save from weight sync when you need intermediate steps (e.g. warmup):
snapshot = tracker.save_only("resume-step-0", checkpoint_type="base")
deploy_mgr.warmup(model)
tracker.hotload(snapshot, checkpoint_type="base")

DCP checkpoints for resume

Save DCP checkpoints at intervals using the training client directly:
for step in range(total_steps):
    # ... training step ...
    tracker.save_and_hotload(f"step-{step:05d}")
    if step % dcp_interval == 0:
        training_client.save_state(f"step-{step}")
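When resuming, the most recent checkpoint has to be chosen by step number rather than by string order, because the unpadded names used above ("step-20", "step-100") do not sort lexicographically. The sketch below shows one way to pick it. latest_step is a hypothetical helper, and how the list of saved names is obtained is not covered on this page, so the input is assumed to be supplied by the caller.

```python
import re
from typing import Iterable, Optional

# Matches the "step-{step}" names used in the loop above.
_STEP_NAME = re.compile(r"^step-(\d+)$")


def latest_step(checkpoint_names: Iterable[str]) -> Optional[int]:
    """Return the highest step number among names like 'step-120', or None."""
    steps = [
        int(match.group(1))
        for name in checkpoint_names
        if (match := _STEP_NAME.match(name))
    ]
    return max(steps, default=None)
```

Names that do not match the pattern are ignored, and an empty input yields None so the caller can fall back to starting from scratch.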