Early Access Feature. External-bucket hot-load for RL rollouts is a private preview. Contact Fireworks to enable this path on your account before you use S3, MINIO, NEBIUS, or similar non-FW_HOSTED storage.
This guide is for teams that already run their own RL trainer (PyTorch FSDP, Megatron, a custom Ray cluster, etc.) and want Fireworks for large-scale inference during rollouts. If you are using Fireworks-hosted RL trainers with FW_HOSTED, start from Tinker API Compatibility & Full Parameter Tuning instead; Fireworks manages the bucket plumbing in that path.

Architecture

You own: trainer, reward shaping, checkpoint cadence, rollout orchestration. Fireworks owns: hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.

End-to-end loop

  1. Create a hot-load deployment.
  2. Upload and hot-load an initial full snapshot.
  3. Run rollouts against that snapshot.
  4. Upload and hot-load the next incremental snapshot.
  5. Run rollouts again.
  6. Every 20th or 30th step, publish another full snapshot instead of an incremental one. Otherwise, repeat from step 4.
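The full-vs-incremental cadence above can be sketched as a small helper (`publish_kind` and the 20-step interval are illustrative choices, not part of the API):

```python
def publish_kind(step, full_interval=20):
    """Decide whether a training step publishes a full or incremental snapshot.

    Step 0 is the initial full snapshot; every `full_interval`-th step
    thereafter re-publishes a full snapshot to keep the incremental chain short.
    """
    return "full" if step % full_interval == 0 else "incremental"
```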

1. Create a hot-load deployment

Create the deployment that will serve rollouts. The --enable-hot-load family of flags is currently hidden during preview, so you may need to pass them explicitly.
firectl create deployment <base_model> \
    --deployment-shape <shape_name> \
    --deployment-id <deployment_id> \
    --enable-hot-load \
    --hot-load-bucket-type S3 \
    --hot-load-bucket-url s3://<your_bucket>/<your_upload_path> \
    --region US_OHIO_1
  • --deployment-shape is optional. If omitted, firectl will prompt you to pick one interactively.
  • --hot-load-bucket-type currently accepts MINIO, S3, NEBIUS, or FW_HOSTED.
  • FW_HOSTED is the Fireworks-managed trainer path. This guide focuses on external-bucket BYOT integrations.
  • --hot-load-bucket-url is required for external-bucket flows when --enable-hot-load is set. Format examples: s3://mybucket/path, gs://mybucket/path. No trailing slash.
  • --region picks where the deployment runs (for example US_OHIO_1, US_VIRGINIA_1). Keep the trainer and bucket geographically close for upload speed.
Take note of the account ID, deployment ID, and model ID from the output. You will use them in the hot-load and rollout calls below. If you do not set a shape, the CLI will show a shape picker:
[Screenshot: firectl deployment shape picker]

2. Upload and hot-load an initial full snapshot

For the first step, upload a full HuggingFace-format checkpoint and then signal Fireworks to load it.

Snapshot layout

Place each snapshot under its own subdirectory keyed by an opaque checkpoint_id:
s3://<your_bucket>/<account_id>/<account_id>-<deployment_id>/<checkpoint_id>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── model-00000.safetensors
├── model-00001.safetensors
└── ...
  • checkpoint_id is any string you pick (for example version_001 or step_00100).
  • The checkpoint must look like the base model on HuggingFace: config.json, tokenizer, and safetensors weights.
  • Split weights into multiple safetensors files, each under about 5 GB.
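A sketch of the sharding constraint: a greedy planner that groups tensors into shard files under the size cap. The function name is hypothetical; in practice the safetensors and Hugging Face serialization utilities handle sharded saves for you.

```python
def plan_shards(tensor_sizes, max_shard_bytes=5 * 1024**3):
    """Greedily group tensors into safetensors shards, each under the cap.

    `tensor_sizes` maps tensor name -> size in bytes. Returns a list of
    name lists, one per shard file (model-00000.safetensors, ...).
    """
    shards, current, current_bytes = [], [], 0
    for name, size in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the cap.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards
```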

Signal the snapshot is ready

Once all files for the snapshot are uploaded, signal Fireworks to begin loading, passing the snapshot's checkpoint_id as the identity:
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{ "identity": "version_001" }'

Wait until replicas are ready

Poll the system state until every replica reports readiness on the new snapshot:
curl https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>"
Wait until:
  • every replica has readiness: true, and
  • every replica’s current_snapshot_identity equals the identity you just signaled.
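A minimal readiness predicate for the poll loop might look like the following. This assumes the status response exposes a top-level replicas list whose entries carry the readiness and current_snapshot_identity fields named above; the exact response envelope may differ.

```python
def all_replicas_ready(status, identity):
    """Return True once every replica is ready on the expected snapshot.

    `status` is the parsed JSON from the hot_load status endpoint
    (assumed shape: {"replicas": [{"readiness": ..., "current_snapshot_identity": ...}, ...]}).
    """
    replicas = status.get("replicas", [])
    return bool(replicas) and all(
        r.get("readiness") is True
        and r.get("current_snapshot_identity") == identity
        for r in replicas
    )
```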

3. Run rollouts

Once replicas are ready, call the regular OpenAI-compatible inference API. For RL rollouts you’ll usually want session-affinity headers so multi-turn trajectories reuse KV cache on the same replica:
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "x-multi-turn-session-id: <trajectory_id>" \
  -H "x-session-affinity: <trajectory_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/<account_id>/models/<model_id>",
    "messages": [{"role": "user", "content": "..."}]
  }'
For rollout-time inference behavior such as session affinity, prompt-cache behavior during weight swaps, and MoE Router Replay, see Inference for RL rollouts.
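From a Python rollout worker, the headers from the curl above can be assembled once per trajectory, for example:

```python
def rollout_headers(api_key, account_id, model_id, deployment_id, trajectory_id):
    """Headers for an OpenAI-compatible rollout request with session
    affinity, so a multi-turn trajectory keeps hitting the same replica
    and reuses its KV cache."""
    return {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": f"accounts/{account_id}/models/{model_id}",
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        "x-multi-turn-session-id": trajectory_id,
        "x-session-affinity": trajectory_id,
        "Content-Type": "application/json",
    }
```

You would then POST the same JSON body as the curl example to /inference/v1/chat/completions with any HTTP client, reusing these headers for every turn of the trajectory.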

4. Upload and hot-load incremental snapshots

For most intermediate training steps, publish an incremental snapshot against the currently loaded snapshot instead of another full snapshot. Fireworks supports the public ARC2 format (arc_v2) for this flow. Upload the next snapshot under a new checkpoint_id, then signal it with incremental_snapshot_metadata:
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "identity": "version_002",
    "incremental_snapshot_metadata": {
      "previous_snapshot_identity": "version_001",
      "compression_format": "arc_v2"
    }
  }'
Then poll the same status endpoint until every replica reports readiness: true and current_snapshot_identity == "version_002".
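The request body for both signal variants can be built with one small helper (a sketch; the function name is illustrative, the field names are those shown in the curl examples above):

```python
def hot_load_body(identity, previous_identity=None, compression_format="arc_v2"):
    """Build the JSON body for the hot_load signal.

    With no `previous_identity` this signals a full snapshot; with one, it
    requests an incremental load in the public ARC2 (arc_v2) format.
    """
    body = {"identity": identity}
    if previous_identity is not None:
        body["incremental_snapshot_metadata"] = {
            "previous_snapshot_identity": previous_identity,
            "compression_format": compression_format,
        }
    return body
```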

5. Repeat the loop

  • Use a new full snapshot for the first step and then every 20th or 30th step after that.
  • Use an incremental snapshot for the intermediate steps.
  • If an incremental hot-load fails or the chain gets into a bad state, fall back to a new full snapshot.
  • If you need lower-level recovery steps, see Ledger & debugging for RL rollouts.
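The fallback rule can be wrapped around your publish step like this (a sketch; `signal_incremental` and `signal_full` stand in for your own upload-and-signal routines and are assumed to raise on failure):

```python
def publish_step(checkpoint_id, prev_checkpoint_id, signal_incremental, signal_full):
    """Try an incremental hot-load; on failure re-anchor with a full snapshot.

    Returns which kind of snapshot was ultimately published.
    """
    try:
        signal_incremental(checkpoint_id, prev_checkpoint_id)
        return "incremental"
    except Exception:
        # Incremental chain is in a bad state: fall back to a full snapshot.
        signal_full(checkpoint_id)
        return "full"
```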

Next steps

Ledger & debugging

Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.

Inference for RL rollouts

Session affinity headers, behavior during weight swap, and MoE Router Replay (R3).

Fireworks-hosted trainer

The alternative path where Fireworks runs the trainer via Tinker-compatible SDK.