Early Access Feature. External-bucket hot-load for RL rollouts is a private preview. Contact Fireworks to enable this path on your account before you use S3, MINIO, NEBIUS, or similar non-FW_HOSTED storage.
Using a code agent? Follow these sections in order: Prerequisites, Quickstart checklist, Hot-load API. Required env: FIREWORKS_API_KEY. After your first full snapshot is serving, read Incremental snapshots before production training loops. For swap behavior and reset_prompt_cache, see Ledger & debugging.
This guide is for teams that already run their own RL trainer (PyTorch FSDP, Megatron, a custom Ray cluster, etc.) and want Fireworks for large-scale inference during rollouts.

Is this the right guide?

Path | You own | Fireworks owns
This guide (BYOT rollouts) | Trainer, rewards, environment, checkpoint upload cadence | Hot-load deployment, distributed weight swap, inference, KV cache across rollouts
Training API | Training logic (recipes or SDK) | GPUs, trainer lifecycle, often FW_HOSTED bucket
Managed RFT | Dataset and evaluator | End-to-end hosted RL

Why BYOT rollout inference?

  • Disaggregated: Your trainer and rollout cluster can run in different regions or clouds; deployments can span multiple regions to pool capacity.
  • Full-parameter scale: Full (non-LoRA) tuning for large models supported on Fireworks inference shapes.
  • Fast checkpoint transfer: Lossless compressed incremental snapshots (arc_v2, typically 20×+ compression) over standard object storage—no special RDMA networking between trainer and inference.
  • Async / off-policy friendly: Background download during rollouts; configurable swap semantics similar in spirit to PipelineRL—see checkpoint-swap behavior.
For Online RL (live user traffic as rollouts with rolling per-replica updates), the same hot-load infrastructure applies; contact Fireworks for production Online RL setup.

Placeholders

Reuse these values in every command below:
Placeholder | Example
<account_id> | my-team
<model_id> | qwen3-30b-a3b
<deployment_id> | rl-rollout-prod
<fireworks_api_key> | From API keys
<your_bucket> / <your_upload_path> | Parent prefix configured on the deployment (no trailing slash)
<checkpoint_id> | Snapshot directory name, e.g. version_001 (no slashes)

Prerequisites

Complete this checklist before creating a deployment:
  1. Fireworks account and API key: create a key and set export FIREWORKS_API_KEY="<key>".
  2. Account ID — In the dashboard, open your account settings or any resource URL; the account slug is the segment after /accounts/ (for example accounts/<account_id>/...).
  3. Feature enablement — Request external-bucket hot-load for RL rollouts on account <account_id>, including your bucket provider (S3, GCS/gs://, or NEBIUS).
  4. Object storage read access for Fireworks — Fireworks needs read-only access to the bucket prefix you will pass as --hot-load-bucket-url. At enablement, Fireworks shares the IAM principal to grant access. Typical setup:
    • Amazon S3: Grant the Fireworks principal s3:GetObject (and s3:ListBucket on the prefix) on s3://<your_bucket>/<your_upload_path>/*.
    • Google Cloud Storage: Grant roles/storage.objectViewer on the bucket or prefix to the Fireworks service account provided at onboarding.
    • Nebius / MinIO: Equivalent read-only credentials or access key scoped to the upload prefix.
  5. firectl installed — See firectl.
  6. Base model and deployment shape — An RL-capable shape for your model (GPU count, precision). If you omit --deployment-shape, firectl prompts you to pick one interactively.
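Some of the checklist items can be verified mechanically before you create anything. The following is a minimal sketch, not part of any Fireworks tooling: the function name preflight and its messages are hypothetical, and it only checks the rules stated in this guide (API key present, no trailing slash on the bucket prefix, checkpoint ID without slashes).

```python
import os


def preflight(bucket_url: str, checkpoint_id: str, env=None) -> list[str]:
    """Return a list of problems; an empty list means the checks passed.

    `preflight` is a hypothetical helper, not a Fireworks API.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("FIREWORKS_API_KEY"):
        problems.append("FIREWORKS_API_KEY is not set")
    if "://" not in bucket_url:
        problems.append("bucket URL needs a scheme such as s3:// or gs://")
    if bucket_url.endswith("/"):
        problems.append("bucket URL must not end with a trailing slash")
    if not checkpoint_id or "/" in checkpoint_id:
        problems.append("checkpoint_id must be a single path segment")
    return problems
```

Calling preflight("s3://<your_bucket>/<your_upload_path>", "version_001") before the first upload surfaces these mistakes locally. It does not check IAM access or feature enablement, which still need to be confirmed with Fireworks.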

Architecture

You own: trainer, reward shaping, checkpoint cadence, rollout orchestration. Fireworks owns: hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.

End-to-end loop

  1. Create a hot-load deployment.
  2. Upload and hot-load an initial full snapshot.
  3. Run rollouts against that snapshot.
  4. For each training step: upload and hot-load the next incremental snapshot (see Incremental snapshots).
  5. Run rollouts again.
  6. Every 20 to 30 steps, publish a full snapshot instead of an incremental one. If the incremental chain fails, fall back to a full snapshot.
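The cadence in the loop above can be expressed as a small scheduling function. This is a sketch under stated assumptions: the name snapshot_kind, the 0-based step numbering, and the default interval of 20 are illustrative choices, not values mandated by Fireworks.

```python
def snapshot_kind(step: int, full_every: int = 20, chain_broken: bool = False) -> str:
    """Return "full" or "incremental" for a 0-based training step.

    Hypothetical helper: step 0 is the initial full snapshot, every
    `full_every`-th step is full again, and a broken incremental chain
    forces a full snapshot regardless of the step number.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if full_every < 1:
        raise ValueError("full_every must be at least 1")
    if step == 0 or chain_broken or step % full_every == 0:
        return "full"
    return "incremental"
```

Keeping the decision in one pure function makes it easy to tune the interval (for example 20 versus 30) without touching the upload code.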

Quickstart checklist

Use this table for your first rollout end-to-end:
Step | Action | Done when
1 | Create hot-load deployment | firectl get deployment <deployment_id> shows a healthy deployment
2 | Upload full HF snapshot | All files exist under .../<checkpoint_id>/ in object storage
3 | POST signal snapshot | HTTP 200
4 | GET poll status | Every replica has readiness: true and current_snapshot_identity matches your identity
5 | Run rollouts | Chat/completions returns tokens

1. Create a hot-load deployment

Create the deployment that will serve rollouts. During the preview, --enable-hot-load and the other --hot-load-* flags may be hidden from the CLI help, but you can still pass them explicitly.
firectl create deployment accounts/<account_id>/models/<model_id> \
  --deployment-shape <shape_name> \
  --deployment-id <deployment_id> \
  --enable-hot-load \
  --hot-load-bucket-type S3 \
  --hot-load-bucket-url s3://<your_bucket>/<your_upload_path> \
  --hot-load-transition-type ASYNC \
  --region US_OHIO_1
Flags
  • --deployment-shape: Optional. If omitted, firectl prompts you to pick one.
  • --hot-load-bucket-type: MINIO, S3, NEBIUS, or FW_HOSTED. This guide focuses on external buckets (S3, gs://, etc.). FW_HOSTED is for Fireworks-managed trainers.
  • --hot-load-bucket-url: Required when --enable-hot-load is set. Examples: s3://mybucket/path, gs://mybucket/path. No trailing slash. This is the parent prefix; each snapshot is a subdirectory named by identity (see snapshot layout).
  • --hot-load-transition-type: ASYNC (recommended for RL) or SYNC. Defaults to ASYNC when hot load is enabled. See checkpoint-swap behavior.
  • --region: Where the deployment runs (for example US_OHIO_1, US_VIRGINIA_1). Keep the trainer upload path geographically close to the bucket and deployment.
Save the account ID, deployment ID, and model ID from the output for hot-load and rollout calls. If you do not set a shape, the CLI shows an interactive shape picker.
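If you launch deployments from a script, it can help to assemble the command from validated values. The sketch below only builds the argument list for the firectl create deployment command shown above; the function name create_deployment_argv is hypothetical, and the allowed values are the ones listed under Flags.

```python
BUCKET_TYPES = {"MINIO", "S3", "NEBIUS", "FW_HOSTED"}
TRANSITION_TYPES = {"ASYNC", "SYNC"}


def create_deployment_argv(account_id, model_id, deployment_id, bucket_url,
                           bucket_type="S3", transition_type="ASYNC",
                           shape=None, region=None):
    """Build the argv for `firectl create deployment` with hot load enabled.

    Hypothetical helper: it mirrors the flags documented in this guide and
    rejects values the guide says are invalid.
    """
    if bucket_type not in BUCKET_TYPES:
        raise ValueError(f"unknown bucket type: {bucket_type}")
    if transition_type not in TRANSITION_TYPES:
        raise ValueError(f"unknown transition type: {transition_type}")
    if bucket_url.endswith("/"):
        raise ValueError("bucket URL must not end with a trailing slash")
    argv = [
        "firectl", "create", "deployment",
        f"accounts/{account_id}/models/{model_id}",
        "--deployment-id", deployment_id,
        "--enable-hot-load",
        "--hot-load-bucket-type", bucket_type,
        "--hot-load-bucket-url", bucket_url,
        "--hot-load-transition-type", transition_type,
    ]
    if shape:  # omit to let firectl prompt for a shape
        argv += ["--deployment-shape", shape]
    if region:
        argv += ["--region", region]
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True), or rendered for logging with shlex.join(argv).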

2. Upload and hot-load an initial full snapshot

Upload a full HuggingFace-format checkpoint, then signal Fireworks to load it.

Snapshot layout

Place each snapshot under its own subdirectory. The identity you signal in the API must match the directory name (a single path segment—no slashes):
s3://<your_bucket>/<your_upload_path>/<checkpoint_id>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── model-00000.safetensors
├── model-00001.safetensors
└── ...
Example with the recommended path pattern:
s3://<your_bucket>/<account_id>/<account_id>-<deployment_id>/version_001/
  • identity / <checkpoint_id> — Any opaque string (for example version_001 or step_00100).
  • Format — Same layout as the base model on HuggingFace: config.json, tokenizer files, and safetensors weights. No tensor-parallel sharding in uploaded files.
  • File size — Split weights into multiple .safetensors files, each under about 5 GB. Group weights by layer when possible; putting one layer per file minimizes load time.
Optional: call the per-file hint API as each file lands to speed up loading on large models.
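Before uploading, you may want to sanity-check the local snapshot directory against the layout rules above. This is a minimal sketch: check_snapshot_dir is a hypothetical helper, the list of required files is taken from the example tree (your model may ship different tokenizer files), and the 5 GiB limit is an interpretation of "about 5 GB".

```python
from pathlib import Path

# Assumption: "about 5 GB" is treated as 5 GiB; adjust if Fireworks gives an exact limit.
MAX_SHARD_BYTES = 5 * 1024**3
# Assumption: file names copied from the example tree in this guide.
REQUIRED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def check_snapshot_dir(path, max_shard_bytes=MAX_SHARD_BYTES):
    """Return a list of layout problems for a local snapshot directory."""
    root = Path(path)
    problems = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    shards = sorted(root.glob("*.safetensors"))
    if not shards:
        problems.append("no .safetensors files found")
    for shard in shards:
        if shard.stat().st_size > max_shard_bytes:
            problems.append(f"{shard.name} exceeds {max_shard_bytes} bytes")
    return problems
```

An empty list means the directory matches the layout described here; it does not validate tensor names or shapes.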

Signal and poll

Use the Hot-load API below with { "identity": "<checkpoint_id>" } and poll until all replicas are ready.

Hot-load API

All hot-load requests use these headers:
Header | Value
Authorization | Bearer <fireworks_api_key>
fireworks-model | accounts/<account_id>/models/<model_id>
fireworks-deployment | accounts/<account_id>/deployments/<deployment_id>
Content-Type | application/json

Operation | Method | URL
Signal snapshot ready | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load
Poll load status | GET | https://api.fireworks.ai/hot_load/v1/models/hot_load
Per-file hint (optional) | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load/hint

Signal snapshot ready

Full snapshot body:
{ "identity": "version_001" }
Incremental snapshot bodies, compression, hints, and checksum_format are documented in Incremental snapshots.
Request body fields:
  • identity (string, required): Snapshot directory name under the configured bucket prefix. Must not contain /.
  • incremental_snapshot_metadata (object): Required for incremental snapshots. Includes previous_snapshot_identity, compression_format (arc_v2), and checksum_format (alder32). See the incremental snapshots guide.
  • reset_prompt_cache (string): Prompt-cache policy after the swap: all (default), none, or new_session. See prompt cache reset behavior.
  • validation.extra_fields_ignore (string[]): Top-level config.json fields to ignore during snapshot validation. Only use for known-safe metadata fields.
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{ "identity": "version_001" }'
import os
import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]
ACCOUNT = "<account_id>"
MODEL = f"accounts/{ACCOUNT}/models/<model_id>"
DEPLOYMENT = f"accounts/{ACCOUNT}/deployments/<deployment_id>"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "fireworks-model": MODEL,
    "fireworks-deployment": DEPLOYMENT,
    "Content-Type": "application/json",
}

resp = requests.post(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    json={"identity": "version_001"},
    timeout=60,
)
resp.raise_for_status()
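To avoid hand-writing JSON for each signal, the body can be built and validated in one place. The sketch below uses only the fields documented above; build_signal_body is a hypothetical helper, and placing reset_prompt_cache at the top level of the body is an assumption based on how the fields are listed.

```python
ALLOWED_RESET = {"all", "none", "new_session"}


def build_signal_body(identity, previous_identity=None, reset_prompt_cache=None):
    """Build the JSON body for the signal POST (hypothetical helper).

    Pass `previous_identity` to produce an incremental body; leave it as
    None for a full snapshot.
    """
    if not identity or "/" in identity:
        raise ValueError("identity must be a non-empty name without '/'")
    body = {"identity": identity}
    if previous_identity is not None:
        body["incremental_snapshot_metadata"] = {
            "previous_snapshot_identity": previous_identity,
            "compression_format": "arc_v2",
            "checksum_format": "alder32",
        }
    if reset_prompt_cache is not None:
        if reset_prompt_cache not in ALLOWED_RESET:
            raise ValueError(f"reset_prompt_cache must be one of {sorted(ALLOWED_RESET)}")
        body["reset_prompt_cache"] = reset_prompt_cache
    return body
```

The result can be passed as json=build_signal_body("version_001") in the requests.post call shown earlier.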

Poll load status

Poll until every replica has readiness: true and current_snapshot_identity equals the identity you signaled.
curl https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>"
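# Reuses `requests` and HEADERS from the signal example above.
resp = requests.get(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()
status = resp.json()

# Ready only when at least one replica is listed and every replica
# reports readiness on the identity that was signaled.
replicas = status.get("replicas", [])
ready = bool(replicas) and all(
    r.get("readiness") and r.get("current_snapshot_identity") == "version_001"
    for r in replicas
)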
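A single GET is rarely enough; in practice you poll on an interval with a deadline. The following is a sketch, not Fireworks client code: wait_until_ready is a hypothetical helper, the default timeout and interval are arbitrary, and fetch_status is any callable that returns the parsed GET response (for example a lambda wrapping the requests.get call above).

```python
import time


def wait_until_ready(fetch_status, identity, timeout_s=1800.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until every replica is ready on `identity`, or raise TimeoutError.

    `fetch_status` is a caller-supplied callable returning the status dict;
    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        replicas = fetch_status().get("replicas", [])
        if replicas and all(
            r.get("readiness") and r.get("current_snapshot_identity") == identity
            for r in replicas
        ):
            return replicas
        if clock() >= deadline:
            raise TimeoutError(f"replicas not ready on {identity!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting the fetch callable keeps the loop independent of the HTTP library and makes the timeout path easy to exercise in unit tests.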

When to start rollouts

  • Default (on-policy): Wait until all replicas report readiness on the new identity.
  • Off-policy / higher utilization: You may start sending rollouts when a subset of replicas is ready—inspect each entry in replicas in the GET response. Stale-policy rollouts are expected; use async transition mode and monitor policy version in streaming responses (see Policy version in responses).
Per-file hints are optional but recommended for large checkpoints—see Incremental snapshots.
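The two policies above differ only in how many replicas must be on the new identity before rollouts start. A minimal sketch of that decision, assuming the replicas list from the GET response; the function names and the min_fraction threshold are hypothetical and should be tuned to how much staleness your algorithm tolerates.

```python
def ready_fraction(replicas, identity):
    """Fraction of replicas that are ready and serving `identity`."""
    if not replicas:
        return 0.0
    ready = sum(
        1 for r in replicas
        if r.get("readiness") and r.get("current_snapshot_identity") == identity
    )
    return ready / len(replicas)


def can_start_rollouts(replicas, identity, min_fraction=1.0):
    """On-policy default is 1.0; lower values allow off-policy rollouts."""
    return bool(replicas) and ready_fraction(replicas, identity) >= min_fraction
```

With min_fraction=1.0 this reproduces the on-policy rule; a value such as 0.5 starts rollouts once half the replicas have swapped.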

3. Run rollouts

Call the OpenAI-compatible inference API. For multi-turn RL, set session headers so KV cache stays on one replica:
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "x-multi-turn-session-id: <trajectory_id>" \
  -H "x-session-affinity: <trajectory_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/<account_id>/models/<model_id>",
    "messages": [{"role": "user", "content": "..."}]
  }'
See Inference for RL rollouts for session affinity, weight-swap behavior, MoE Router Replay (R3), and policy-version fields.
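The same request can be prepared from Python. This sketch only assembles the headers and payload from the curl example above; rollout_request is a hypothetical helper, and reusing the trajectory ID for both session headers follows that example rather than a documented requirement.

```python
def rollout_request(api_key, account_id, model_id, deployment_id,
                    trajectory_id, messages):
    """Return (headers, payload) for one chat-completions rollout call."""
    model = f"accounts/{account_id}/models/{model_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": model,
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        # Same ID on both headers keeps a trajectory's turns on one replica.
        "x-multi-turn-session-id": trajectory_id,
        "x-session-affinity": trajectory_id,
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": messages}
    return headers, payload


# Usage (requires the `requests` package and network access):
# headers, payload = rollout_request(key, "my-team", "qwen3-30b-a3b",
#                                    "rl-rollout-prod", "traj-42",
#                                    [{"role": "user", "content": "..."}])
# requests.post("https://api.fireworks.ai/inference/v1/chat/completions",
#               headers=headers, json=payload, timeout=120)
```

Using one stable trajectory ID per episode is what lets later turns reuse the KV cache built by earlier ones.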

Steady-state training loop

After the first full snapshot:
  1. Intermediate steps — Build and upload an incremental snapshot (arc_v2), signal with incremental_snapshot_metadata, poll until ready, then run rollouts.
  2. Every 20 to 30 steps: Publish a new full snapshot for faster recovery and chain reset.
  3. On failure — Fall back to a full snapshot; see Ledger & debugging.
Brief incremental signal example (full details on the incremental page):
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "identity": "version_002",
    "incremental_snapshot_metadata": {
      "previous_snapshot_identity": "version_001",
      "compression_format": "arc_v2",
      "checksum_format": "alder32"
    }
  }'
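The "on failure, fall back to a full snapshot" rule can be wrapped around whatever upload and signal code you already have. This is a sketch under stated assumptions: publish_step and both callbacks are hypothetical, each callback is expected to upload, signal, and wait for readiness, and to raise on any failure. How the full snapshot is named after a failed incremental is left to the callback, since this guide does not specify it.

```python
def publish_step(step, publish_incremental, publish_full):
    """Try an incremental snapshot for `step`; fall back to a full one.

    Returns "incremental" or "full" to indicate which path succeeded.
    Errors from `publish_full` propagate to the caller.
    """
    try:
        publish_incremental(step)
        return "incremental"
    except Exception as exc:  # assumption: any failure breaks the chain
        print(f"incremental snapshot for step {step} failed ({exc}); publishing full")
        publish_full(step)
        return "full"
```

After a fallback, the next incremental snapshot should name the new full snapshot as its previous_snapshot_identity so the chain restarts from a known-good base.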

Numerics alignment

For best training–inference alignment:
  • Match quantization / precision between trainer checkpoints and the deployment shape (work with Fireworks if you need a custom shape).
  • Measure logprob divergence between trainer forward passes and rollout inference on the same tokens.
  • For MoE models, use Router Replay (R3) during rollouts—see MoE Router Replay.
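One way to measure logprob divergence is to compare per-token logprobs for the same sampled tokens from both systems. The sketch below is illustrative only: logprob_divergence is a hypothetical helper, it assumes you have already aligned the two lists token by token, and acceptable thresholds depend on your model and precision.

```python
def logprob_divergence(trainer_logprobs, rollout_logprobs):
    """Summarize per-token logprob differences for one sequence.

    Both inputs are lists of natural-log probabilities for the same tokens.
    """
    if len(trainer_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob lists must have the same length")
    if not trainer_logprobs:
        raise ValueError("logprob lists must not be empty")
    # Positive values mean the rollout engine assigned the token a higher
    # probability than the trainer did.
    diffs = [r - t for t, r in zip(trainer_logprobs, rollout_logprobs)]
    n = len(diffs)
    return {
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
        "mean_signed_diff": sum(diffs) / n,
    }
```

Tracking these numbers per training step makes a precision or quantization mismatch visible early, before it shows up as unstable rewards.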

Next steps

Incremental snapshots

Build arc_v2 deltas, per-file hints, and incremental signal bodies.

Ledger & debugging

Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.

Inference for RL rollouts

Session affinity headers, policy version in streams, weight-swap behavior, and MoE Router Replay (R3).

Fireworks-hosted trainer

The alternative path where Fireworks runs the trainer through the Training API.