Early Access Feature. This page is part of the private-preview external-bucket hot-load workflow for RL rollouts. Contact Fireworks to enable this path on your account before using non-FW_HOSTED storage.
If you are using Fireworks-managed RLOR trainers with FW_HOSTED, the ledger and checkpoint-swap behavior here still matter, but you can usually ignore the external-bucket setup and manual upload/signaling details from the BYOT integration guide.
A hot-load deployment maintains a ledger of every snapshot it has loaded, along with which replica finished which snapshot at what time. The ledger is the fastest way to answer “what weights is my deployment serving right now?” and to recover from a stuck state.

Inspect snapshot history

Dump the ledger, sorted by most recent snapshot first:
firectl get ledger <deployment_id>
Each row shows the identity you signaled, whether it was a full or delta snapshot, the per-replica readiness transition timestamps, and any load error.

Inspect deployment status and failures

If the deployment is unhealthy (crashlooping after a bad snapshot, out-of-memory on merge, etc.), the reason is recorded on the deployment resource:
firectl deployment get <deployment_id>
Look at the status, latestStatus.reason, and the most recent ledger entry together to reason about whether the problem is load-side, weights-side, or infra-side.

Reset the ledger

If the delta chain is wedged or you want to force the deployment back to the base model, you can clear server-side ledger history. This preserves the deployment itself; it just forgets every hot-loaded snapshot.
curl -X DELETE \
  https://api.fireworks.ai/v1/accounts/<account_id>/deployments/<deployment_id>/ledger \
  -H "Authorization: Bearer <fireworks_api_key>"
After reset, your next signal must be a full snapshot (delta metadata will be rejected because there’s nothing to diff against).
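For trainers that drive the reset from code rather than a shell, the same call can be sketched in Python. This is a minimal sketch that mirrors the curl command above using only the standard library; the helper name build_ledger_reset_request and the API_BASE constant are illustrative assumptions, not part of any Fireworks SDK.

```python
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example above.
API_BASE = "https://api.fireworks.ai/v1"


def build_ledger_reset_request(
    account_id: str, deployment_id: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that clears the ledger.

    Hypothetical helper: it only reproduces the URL, method, and header
    shown in the curl command.
    """
    url = (
        f"{API_BASE}/accounts/{quote(account_id, safe='')}"
        f"/deployments/{quote(deployment_id, safe='')}/ledger"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# To actually send it (this performs a real network call):
# with urllib.request.urlopen(build_ledger_reset_request(acct, dep, key)) as resp:
#     print(resp.status)
```

Keeping request construction separate from sending makes it easy to log or review the exact URL before clearing history.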

Checkpoint-swap behavior

When you signal a new snapshot, Fireworks has to eventually swap weights on every replica. What happens to in-flight and new requests during the swap depends on which transition mode the deployment is configured with.
Both modes behave the same way for checkpoint download: it always starts immediately after the signal, in parallel with ongoing inference. The modes differ in how they handle the actual weight-swap moment.
Which mode you get is preconfigured on the deployment template and is not yet surfaced through firectl. Ask your Fireworks contact if you need to change it.
The first mode, which pauses traffic rather than rejecting it, is similar in spirit to PipelineRL:
  • In-flight requests: paused for the duration of the swap, then resumed on the same HTTP connection. The active turn keeps its current KV state, so the request continues streaming instead of restarting.
  • New requests: queued until the swap finishes. Clients observe this as elevated time-to-first-token (TTFT).
  • No 4xx or 5xx is returned for the swap itself.

Synchronous transition

  • In-flight requests: the server waits for them to complete on the old weights before swapping.
  • New requests arriving during the swap are rejected with HTTP 425 Too Early. Your rollout client should back off and retry, ideally using the same session-affinity key so it lands on a replica that has already finished the swap.
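The back-off-and-retry advice for HTTP 425 can be sketched as a small wrapper around whatever HTTP call your rollout client already makes. This is a sketch under assumptions: send_with_425_retry, its parameters, and the delay values are hypothetical, and the send callable stands in for your own request function. Because the same callable is re-invoked on every attempt, any session-affinity key it sets is reused automatically.

```python
import random
import time
from typing import Callable, Tuple


def send_with_425_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns something other than HTTP 425.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body). The delay values are illustrative
    defaults, not recommendations from Fireworks.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 425:
            # Success or an unrelated error: hand it back to the caller.
            return status, body
        if attempt < max_attempts - 1:
            # Exponential back-off capped at max_delay, with jitter so that
            # many rollout workers do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
    # Every attempt saw 425; return the last response.
    return status, body
```

Only 425 is retried here; other 4xx or 5xx responses are returned unchanged so the caller can decide how to handle them.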

Prompt cache reset behavior

reset_prompt_cache only affects what can be reused after the swap. It does not interrupt a turn that is already in flight.
  • all (current default): after the swap, later requests refill prompt cache broadly.
  • new_session: existing session IDs keep their current cache namespace, while new session IDs refill.
  • none: preserve prompt-cache state across the swap.
Configure this per snapshot by setting reset_prompt_cache in the POST /hot_load/v1/models/hot_load request body, for example { "identity": "version_002", "reset_prompt_cache": "new_session" }.
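A client-side check can catch a mistyped mode before the signal is sent. The sketch below builds only the two fields shown in the example body and validates reset_prompt_cache against the three values listed above; the helper name build_hot_load_body is an assumption, and a real signal may need additional fields that are not covered here.

```python
import json
from typing import Optional

# The three values documented in the list above.
RESET_PROMPT_CACHE_MODES = ("all", "new_session", "none")


def build_hot_load_body(
    identity: str, reset_prompt_cache: Optional[str] = None
) -> str:
    """Return a JSON body for POST /hot_load/v1/models/hot_load.

    Hypothetical helper. When reset_prompt_cache is None the field is
    omitted, leaving the choice to the server-side default.
    """
    body = {"identity": identity}
    if reset_prompt_cache is not None:
        if reset_prompt_cache not in RESET_PROMPT_CACHE_MODES:
            raise ValueError(
                f"reset_prompt_cache must be one of {RESET_PROMPT_CACHE_MODES}, "
                f"got {reset_prompt_cache!r}"
            )
        body["reset_prompt_cache"] = reset_prompt_cache
    return json.dumps(body)
```

For example, build_hot_load_body("version_002", "new_session") yields the same JSON object as the example above.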

Need help?

If the ledger stops advancing, a snapshot never becomes ready, or the deployment stays unhealthy after you fall back to a full snapshot, contact Fireworks. Include the account ID, deployment ID, snapshot identity you tried to load, and the latest ledger output.

Full integration guide

The end-to-end BYOT flow.

Inference for RL rollouts

Session affinity and MoE Router Replay used alongside the swap behavior above.