During RL training the policy changes with every optimizer step, and the inference deployment needs those updated weights to generate the next batch of rollouts. The cookbook connects the two through a shared GCS bucket:
- The trainer writes a fresh checkpoint to the bucket after each optimizer step (or on a configurable cadence).
- The deployment watches the same bucket and swaps in new weights without a pod restart.
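The write-then-watch pattern in the two bullets above can be illustrated with a minimal local sketch. This is not the cookbook's implementation: a temporary directory stands in for the GCS bucket, and `write_checkpoint`, `latest_step`, and `Deployment` are hypothetical names invented for this illustration, as is the `step-<n>.json` file layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_checkpoint(bucket: Path, step: int, weights: dict) -> Path:
    """Trainer side: publish the weights for one optimizer step."""
    tmp = bucket / f"step-{step:08d}.json.tmp"
    final = bucket / f"step-{step:08d}.json"
    tmp.write_text(json.dumps(weights))
    tmp.replace(final)  # publish by rename so a watcher never reads a partial file
    return final


def latest_step(bucket: Path) -> Optional[int]:
    """Return the highest published step, or None if nothing is published yet."""
    steps = [int(p.stem.split("-")[1]) for p in bucket.glob("step-*.json")]
    return max(steps, default=None)


class Deployment:
    """Deployment side: swaps in newer weights without being recreated."""

    def __init__(self, bucket: Path) -> None:
        self.bucket = bucket
        self.loaded_step: Optional[int] = None
        self.weights: dict = {}

    def poll(self) -> bool:
        """Load the newest checkpoint if it differs from the one already loaded."""
        step = latest_step(self.bucket)
        if step is None or step == self.loaded_step:
            return False
        path = self.bucket / f"step-{step:08d}.json"
        self.weights = json.loads(path.read_text())
        self.loaded_step = step
        return True


# Usage: the same Deployment object picks up each new checkpoint in turn.
with tempfile.TemporaryDirectory() as tmpdir:
    bucket = Path(tmpdir)
    deployment = Deployment(bucket)
    for step in (1, 2):
        write_checkpoint(bucket, step, {"step": step})
        deployment.poll()
    final_step = deployment.loaded_step
```

The point of the sketch is that the deployment object outlives every checkpoint: only its loaded weights change, which is the local analogue of swapping weights without a pod restart.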
Terminology. The internal Fireworks name for this mechanism is hotload. You’ll see that name in SDK field names (hot_load_trainer_job, hot_load_deployment_id, hot_load_bucket_url), methods (WeightSyncer.save_and_hotload), and server error messages. “Weight sync” and “hotload” refer to the same thing.
Normal flow
Use the cookbook’s setup_infra entrypoint: it creates the trainer, then creates the deployment pointing at it, so no extra wiring is needed. The default DeployConfig(weight_sync_scope=WeightSyncScope.PER_TRAINER) is what you want for almost every run. If you misconfigure the pairing, the server rejects the CreateDeployment or CreateRlorTrainerJob call up front with an error that links back here.
WeightSyncScope: who owns the bucket
DeployConfig.weight_sync_scope controls which resource must be created first:
| Scope | Bucket owner | Use when |
|---|---|---|
| PER_TRAINER (default) | Trainer (one bucket per run) | Single run, or one trainer feeding multiple deployments (sampler + held-out eval) |
| PER_DEPLOYMENT | Deployment (stable bucket across trainer runs) | Long-lived deployment, many sequential trainers, can’t tolerate deployment restarts between runs |
setup_infra dispatches on this single field and wires the rest correctly. The two scopes are mutually exclusive for the same trainer ↔ deployment pair — don’t mix them.
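As a rough illustration of dispatching on the scope, the sketch below maps each scope to a creation order. It does not import or reproduce the SDK: the `WeightSyncScope` enum here is a local stand-in that borrows only the member names, `creation_order` is a hypothetical helper, and the rule that the bucket owner is created first is an assumption inferred from the scope descriptions rather than confirmed behavior of setup_infra.

```python
from enum import Enum
from typing import Tuple


class WeightSyncScope(Enum):
    """Local stand-in for the SDK enum; only the member names come from the docs."""

    PER_TRAINER = "per_trainer"
    PER_DEPLOYMENT = "per_deployment"


def creation_order(scope: WeightSyncScope) -> Tuple[str, str]:
    """Return (first, second), assuming the bucket owner is created first."""
    if scope is WeightSyncScope.PER_TRAINER:
        # Trainer owns the bucket, so the deployment is created pointing at it.
        return ("trainer", "deployment")
    if scope is WeightSyncScope.PER_DEPLOYMENT:
        # Deployment owns the bucket, so each trainer is created pointing at it.
        return ("deployment", "trainer")
    raise ValueError(f"unknown scope: {scope!r}")


default_order = creation_order(WeightSyncScope.PER_TRAINER)
```

Because a single field decides the order, a given trainer and deployment pair ends up under exactly one scope, which matches the rule that the two must not be mixed.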
Diagnosing errors
The control plane catches scope-mix mistakes at create time and returns an error that names both resources and suggests the fix. For the full list of server error strings and per-error recovery steps, see the cookbook’s dev skill: skills/dev/references/rl/hotload.md. It also covers trainer retention, the unified promote API, and runtime bucket-mismatch warnings.
See also