
# RL Rollouts with Your Own Trainer

> Integrate an external RL trainer with Fireworks inference: hot-load new checkpoints from your bucket and run rollouts via the OpenAI-compatible API.

<Warning>
  **Early Access Feature.** External-bucket hot-load for RL rollouts is a
  private preview. Contact Fireworks to enable this path on your account before
  you use `S3`, `MINIO`, `NEBIUS`, or similar non-`FW_HOSTED` storage.
</Warning>

This guide is for teams that already run their own RL trainer (PyTorch FSDP, Megatron, a custom Ray cluster, etc.) and want Fireworks for large-scale inference during rollouts.

If you are using Fireworks-hosted trainers, start from the [Training API](/fine-tuning/training-api/introduction) instead. Fireworks manages the bucket plumbing in that path.

## Architecture

```mermaid
flowchart LR
  trainer["Your RL Trainer"] -->|"1. Upload checkpoint"| bucket[("External bucket")]
  trainer -->|"2. Signal snapshot ready"| api["Fireworks Hot-Load API"]
  api -->|"3. Load weights"| deployment["Inference Deployment"]
  trainer -->|"4. Rollout via /v1/completions"| deployment
  deployment -->|"Tokens + optional routing_matrix"| trainer
```

**You own:** trainer, reward shaping, checkpoint cadence, rollout orchestration.

**Fireworks owns:** hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.

## End-to-end loop

1. Create a hot-load deployment.
2. Upload and hot-load an initial **full** snapshot.
3. Run rollouts against that snapshot.
4. Upload and hot-load the next **incremental** snapshot.
5. Run rollouts again.
6. Every 20 to 30 steps, publish another **full** snapshot instead of an incremental one. Otherwise, repeat from step 4.
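
The cadence in step 6 can be expressed as a small trainer-side helper. This is a minimal sketch, not part of any Fireworks SDK: `snapshot_kind` and `full_every` are hypothetical names, the default of 20 is taken from the cadence suggested above, and steps are assumed to be counted from 0 so that the first snapshot is always full.

```python
def snapshot_kind(step: int, full_every: int = 20) -> str:
    """Return "full" or "incremental" for a 0-based training step.

    Hypothetical helper: step 0 and every `full_every`-th step after it
    publish a full snapshot; every other step publishes an incremental one.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if full_every < 1:
        raise ValueError("full_every must be at least 1")
    return "full" if step % full_every == 0 else "incremental"


# Example: a short schedule with a full snapshot every 3 steps.
plan = [snapshot_kind(step, full_every=3) for step in range(7)]
print(plan)
```

Keeping the decision in one function makes it easy to tune the interval without touching the upload or hot-load code.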

## 1. Create a hot-load deployment

Create the deployment that will serve rollouts. The `--enable-hot-load` family of flags is currently hidden during preview, so you may need to pass them explicitly.

```bash
firectl create deployment <base_model> \
    --deployment-shape <shape_name> \
    --deployment-id <deployment_id> \
    --enable-hot-load \
    --hot-load-bucket-type S3 \
    --hot-load-bucket-url s3://<your_bucket>/<your_upload_path> \
    --region US_OHIO_1
```

<Note>
  * `--deployment-shape` is optional. If omitted, `firectl` will prompt you to
    pick one interactively.
  * `--hot-load-bucket-type` currently accepts `MINIO`, `S3`, `NEBIUS`, or
    `FW_HOSTED`.
  * `FW_HOSTED` is the Fireworks-managed trainer path. This guide focuses on
    external-bucket BYOT integrations.
  * `--hot-load-bucket-url` is required for external-bucket flows when
    `--enable-hot-load` is set. Format examples: `s3://mybucket/path`,
    `gs://mybucket/path`. No trailing slash.
  * `--region` picks where the deployment runs (for example `US_OHIO_1`,
    `US_VIRGINIA_1`). Keep the trainer and bucket geographically close for
    upload speed.
</Note>

Take note of the account ID, deployment ID, and model ID from the output. You will use them in the hot-load and rollout calls below.

If you do not set a shape, the CLI will show a shape picker:

<Frame>
  <img src="https://mintcdn.com/fireworksai/IioBV4ELl3VCyNkN/images/rl-rollout/deployment-shape-selector.png?fit=max&auto=format&n=IioBV4ELl3VCyNkN&q=85&s=c391df153f05650973568b0756051a82" alt="firectl deployment shape picker" width="2048" height="239" data-path="images/rl-rollout/deployment-shape-selector.png" />
</Frame>

## 2. Upload and hot-load an initial full snapshot

For the first step, upload a full HuggingFace-format checkpoint and then signal Fireworks to load it.

### Snapshot layout

Place each snapshot under its own subdirectory keyed by an opaque `checkpoint_id`:

```
s3://<your_bucket>/<account_id>/<account_id>-<deployment_id>/<checkpoint_id>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── model-00000.safetensors
├── model-00001.safetensors
└── ...
```

* `checkpoint_id` is any string you pick (for example `version_001` or `step_00100`).
* The checkpoint must look like the base model on HuggingFace: `config.json`, tokenizer, and safetensors weights.
* Split weights into multiple safetensors files, each under about 5 GB.
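
Before uploading, it can help to check a local snapshot against the layout and rules above. The following is a minimal sketch under stated assumptions: `snapshot_prefix`, `snapshot_problems`, `REQUIRED_FILES`, and `MAX_SHARD_BYTES` are hypothetical names, "about 5 GB" is interpreted as 5 × 10⁹ bytes, and the required tokenizer files are the ones shown in the example layout (your base model may ship different tokenizer files).

```python
from typing import Dict, List

# ASSUMPTION: "about 5 GB" is treated as 5 * 10**9 bytes; adjust as needed.
MAX_SHARD_BYTES = 5 * 10**9

# ASSUMPTION: the files shown in the example layout; other base models may
# use different tokenizer files.
REQUIRED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def snapshot_prefix(bucket: str, account_id: str, deployment_id: str,
                    checkpoint_id: str) -> str:
    """Build the snapshot directory URL following the layout above."""
    return f"s3://{bucket}/{account_id}/{account_id}-{deployment_id}/{checkpoint_id}/"


def snapshot_problems(file_sizes: Dict[str, int]) -> List[str]:
    """Return a list of problems for a snapshot (empty if none were found).

    `file_sizes` maps each file name in the snapshot directory to its size
    in bytes.
    """
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if name not in file_sizes]
    shards = {name: size for name, size in file_sizes.items()
              if name.endswith(".safetensors")}
    if not shards:
        problems.append("no .safetensors weight files")
    problems += [f"{name} exceeds shard limit"
                 for name, size in sorted(shards.items())
                 if size > MAX_SHARD_BYTES]
    return problems
```

A trainer could call `snapshot_problems` on the local checkpoint directory listing and refuse to upload while the returned list is non-empty.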

### Signal the snapshot is ready

Once all files for the snapshot are uploaded, signal Fireworks to begin loading. Set `identity` to the snapshot's `checkpoint_id` (here `version_001`):

```bash
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{ "identity": "version_001" }'
```
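
If your trainer is written in Python, the same signal can be built with the standard library. This is a sketch that mirrors the `curl` call above; `build_hot_load_signal` is a hypothetical helper name, and the request is constructed but not sent so you can inspect it first.

```python
import json
import urllib.request

HOT_LOAD_URL = "https://api.fireworks.ai/hot_load/v1/models/hot_load"


def build_hot_load_signal(api_key: str, account_id: str, model_id: str,
                          deployment_id: str, identity: str) -> urllib.request.Request:
    """Build (but do not send) the POST that signals a full snapshot."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": f"accounts/{account_id}/models/{model_id}",
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"identity": identity}).encode("utf-8")
    # urllib normalizes header capitalization; HTTP header names are
    # case-insensitive, so this does not change the request's meaning.
    return urllib.request.Request(HOT_LOAD_URL, data=body, headers=headers,
                                  method="POST")


request = build_hot_load_signal("<fireworks_api_key>", "<account_id>",
                                "<model_id>", "<deployment_id>", "version_001")
# To send it: urllib.request.urlopen(request, timeout=30)
```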

### Wait until replicas are ready

Poll the system state until every replica reports readiness on the new snapshot:

```bash
curl https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>"
```

Wait until:

* every replica has `readiness: true`, and
* every replica's `current_snapshot_identity` equals the `identity` you just signaled.
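
The two conditions above translate into a simple polling loop. The sketch below makes one loud assumption: the status response is taken to be a JSON object with a top-level `replicas` list whose entries carry the `readiness` and `current_snapshot_identity` fields. Only those two field names come from this guide; the `replicas` key, the function names, and the timeout defaults are hypothetical, so inspect a real response and adjust. `fetch_status` is any callable you supply that performs the GET shown above and returns the parsed JSON.

```python
import time
from typing import Any, Callable, Dict


def all_replicas_ready(status: Dict[str, Any], identity: str) -> bool:
    """True only if every replica is ready on the given snapshot identity."""
    # ASSUMPTION: replicas are listed under a top-level "replicas" key.
    replicas = status.get("replicas") or []
    # An empty replica list is treated as "not ready" rather than vacuously true.
    return bool(replicas) and all(
        replica.get("readiness") is True
        and replica.get("current_snapshot_identity") == identity
        for replica in replicas
    )


def wait_until_ready(fetch_status: Callable[[], Dict[str, Any]], identity: str,
                     timeout_s: float = 600.0, interval_s: float = 5.0,
                     sleep: Callable[[float], Any] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> None:
    """Poll until all replicas serve `identity`, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while not all_replicas_ready(fetch_status(), identity):
        if clock() >= deadline:
            raise TimeoutError(
                f"replicas not ready on {identity!r} after {timeout_s}s")
        sleep(interval_s)
```

Checking both fields matters: a replica can report `readiness: true` while still serving the previous snapshot, so readiness alone does not prove the new weights are live.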

## 3. Run rollouts

Once replicas are ready, call the regular OpenAI-compatible inference API. For RL rollouts you'll usually want session-affinity headers so multi-turn trajectories reuse KV cache on the same replica:

```bash
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "x-multi-turn-session-id: <trajectory_id>" \
  -H "x-session-affinity: <trajectory_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/<account_id>/models/<model_id>",
    "messages": [{"role": "user", "content": "..."}]
  }'
```

For rollout-time inference behavior such as session affinity, prompt-cache behavior during weight swaps, and MoE Router Replay, see [Inference for RL rollouts](/guides/rollout-inference).
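
To keep every turn of a trajectory on the same replica, reuse one trajectory ID for both session headers across all requests in that trajectory. The sketch below builds the headers and JSON payload for the call shown above without sending anything; `new_trajectory_id` and `rollout_request` are hypothetical helper names, and using a random UUID as the trajectory ID is an assumption (any stable string per trajectory should serve the same purpose).

```python
import uuid
from typing import Any, Dict, List, Tuple


def new_trajectory_id() -> str:
    """Generate an opaque ID for one trajectory (assumed format: UUID hex)."""
    return uuid.uuid4().hex


def rollout_request(api_key: str, account_id: str, model_id: str,
                    deployment_id: str, trajectory_id: str,
                    messages: List[Dict[str, str]]
                    ) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (headers, payload) for one chat-completions rollout turn."""
    model = f"accounts/{account_id}/models/{model_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": model,
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        # Same value in both headers, and the same value on every turn.
        "x-multi-turn-session-id": trajectory_id,
        "x-session-affinity": trajectory_id,
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": list(messages)}
    return headers, payload


# Two turns of one trajectory share the same session headers.
trajectory_id = new_trajectory_id()
history = [{"role": "user", "content": "..."}]
turn1_headers, turn1_payload = rollout_request(
    "<fireworks_api_key>", "<account_id>", "<model_id>", "<deployment_id>",
    trajectory_id, history)
history = history + [{"role": "assistant", "content": "..."},
                     {"role": "user", "content": "..."}]
turn2_headers, turn2_payload = rollout_request(
    "<fireworks_api_key>", "<account_id>", "<model_id>", "<deployment_id>",
    trajectory_id, history)
```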

## 4. Upload and hot-load incremental snapshots

For most intermediate training steps, publish an incremental snapshot against the currently loaded snapshot instead of another full snapshot.

Fireworks supports the public ARC2 format (`arc_v2`) for this flow.

Upload the next snapshot under a new `checkpoint_id`, then signal it with `incremental_snapshot_metadata`:

```bash
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "identity": "version_002",
    "incremental_snapshot_metadata": {
      "previous_snapshot_identity": "version_001",
      "compression_format": "arc_v2"
    }
  }'
```

Then poll the same status endpoint until every replica reports `readiness: true` and `current_snapshot_identity == "version_002"`.
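
The full and incremental signals differ only in the request body, so a single builder can produce both. This is a minimal sketch: `hot_load_body` is a hypothetical helper name, and the field names are copied from the request shown above.

```python
from typing import Any, Dict, Optional


def hot_load_body(identity: str, previous_identity: Optional[str] = None,
                  compression_format: str = "arc_v2") -> Dict[str, Any]:
    """Build the hot-load request body.

    With `previous_identity` unset, the body signals a full snapshot.
    With it set, the body signals an incremental snapshot against it.
    """
    body: Dict[str, Any] = {"identity": identity}
    if previous_identity is not None:
        if previous_identity == identity:
            raise ValueError("an incremental snapshot needs a new identity")
        body["incremental_snapshot_metadata"] = {
            "previous_snapshot_identity": previous_identity,
            "compression_format": compression_format,
        }
    return body


full_body = hot_load_body("version_001")
incremental_body = hot_load_body("version_002", previous_identity="version_001")
```

Serialize the result with `json.dumps` and send it as the POST body in place of the inline JSON.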

## 5. Repeat the loop

* Use a **full** snapshot for the first step and again every 20 to 30 steps after that.
* Use an **incremental** snapshot for the intermediate steps.
* If an incremental hot-load fails or the chain gets into a bad state, fall back to a new full snapshot.
* If you need lower-level recovery steps, see [Ledger & debugging for RL rollouts](/fine-tuning/rl-rollout-debugging).
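
The fallback rule above can be sketched as a wrapper around whatever function uploads and hot-loads a snapshot in your trainer. Everything here is hypothetical scaffolding: `HotLoadError`, `publish_with_fallback`, and the `publish(identity, previous_identity)` callable are illustrative names, and publishing the fallback full snapshot under a fresh identity (suffix `_full`) is an assumption made to avoid reusing the identity of the failed attempt.

```python
from typing import Callable, Optional


class HotLoadError(RuntimeError):
    """Raised by your publish function when a hot-load does not complete."""


def publish_with_fallback(
    publish: Callable[[str, Optional[str]], None],
    identity: str,
    previous_identity: Optional[str],
) -> str:
    """Publish a snapshot and return the identity that ended up loaded.

    `publish(identity, previous_identity)` uploads and hot-loads a snapshot:
    incremental when `previous_identity` is set, full when it is None.
    """
    if previous_identity is None:
        publish(identity, None)  # Full snapshot; let failures propagate.
        return identity
    try:
        publish(identity, previous_identity)
        return identity
    except HotLoadError:
        # ASSUMPTION: use a distinct identity for the recovery snapshot.
        fallback_identity = f"{identity}_full"
        publish(fallback_identity, None)
        return fallback_identity
```

Use the returned identity as `previous_snapshot_identity` for the next incremental step, so the chain restarts from the snapshot that actually loaded.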

## Next steps

<CardGroup cols={2}>
  <Card title="Ledger & debugging" icon="bug" href="/fine-tuning/rl-rollout-debugging">
    Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.
  </Card>

  <Card title="Inference for RL rollouts" icon="bolt" href="/guides/rollout-inference">
    Session affinity headers, behavior during weight swap, and MoE Router Replay
    (R3).
  </Card>

  <Card title="Fireworks-hosted trainer" icon="flask" href="/fine-tuning/training-api/introduction">
    The alternative path where Fireworks runs the trainer through the Training API.
  </Card>
</CardGroup>
