InfraConfig
GPU, region, and training shape settings. Wraps TrainerJobConfig fields:
Set training_shape_id for every launched trainer in the cookbook. In normal usage this is the only shape-specific value you set: pass the full shared path accounts/fireworks/trainingShapes/<shape>, where the fireworks account is the public shared shape catalog. Add ref_training_shape_id when the recipe also launches a reference trainer.
| Field | Type | Default | Description |
|---|---|---|---|
| training_shape_id | str \| None | None | Required full training-shape ID for the policy trainer, typically accounts/fireworks/trainingShapes/<shape>. The cookbook resolves the versioned reference for you and auto-populates shape-owned infra. |
| ref_training_shape_id | str \| None | None | Optional full training-shape ID for the reference trainer, also typically under accounts/fireworks/trainingShapes/<shape>. When unset, rl_loop skips reference-model provisioning. |
| region | str \| None | None | Region override. Usually resolved from the training shape in normal launches. |
| accelerator_type | str \| None | None | Accelerator-type override. Shape-owned in normal shape-based launches; do not override it manually. |
| accelerator_count | int \| None | None | Accelerator-count override. Shape-owned in normal shape-based launches; do not override it manually. |
| node_count | int \| None | None | Number of trainer nodes. Shape-owned in normal shape-based launches; do not override it manually. |
| custom_image_tag | str \| None | None | Trainer image tag override. Shape-owned in normal shape-based launches; do not override it manually. |
| trainer_timeout_s | float | 3600 | Timeout for trainer provisioning / readiness waits. |
| extra_args | list[str] \| None | None | Extra trainer arguments passed to TrainerJobManager. |
DeployConfig
Deployment settings for sampling and weight sync. Wraps DeploymentConfig fields:
When deployment_shape is set, treat it as the source of truth for deployment hardware and serving configuration. In normal shape-based flows, do not combine it with manual hardware overrides.
| Field | Type | Default | Description |
|---|---|---|---|
| deployment_id | str \| None | None | Deployment identifier. If unset, the cookbook auto-derives one from the base model name. |
| tokenizer_model | str \| None | None | HuggingFace model name for client-side tokenization. Required for RL sampling. |
| deployment_shape | str \| None | None | Deployment shape resource name. In normal shape-based flows, this owns the deployment hardware and serving config. |
| deployment_region | str \| None | None | Region override for the deployment. Usually leave unset when the deployment shape already determines placement. |
| deployment_accelerator_type | str \| None | None | Accelerator-type override when not using a deployment shape. Leave unset in normal shape-based flows. |
| hot_load_bucket_type | str | "FW_HOSTED" | Weight-sync storage backend. |
| deployment_timeout_s | float | 5400 | Timeout for deployment provisioning / readiness waits. |
| deployment_extra_args | list[str] \| None | None | Extra serving arguments. |
| sample_timeout | int | 600 | HTTP read timeout for sampling completions. |
| disable_speculative_decoding | bool | True | Disable speculative decoding for hotload compatibility. |
| extra_values | dict[str, str] \| None | None | Extra deployment Helm values. |
WeightSyncConfig
Checkpoint and weight-sync intervals:

| Field | Type | Default | Description |
|---|---|---|---|
| weight_sync_interval | int | 1 | Save + sync sampler weights every N optimizer steps. 0 disables weight sync. |
| dcp_save_interval | int | 0 | Save DCP checkpoints for resume every N steps. 0 disables DCP saves. |
| dcp_timeout | int | 2700 | Timeout for DCP save/load operations. |
| first_checkpoint_type | str | "base" | First sampler checkpoint type passed to WeightSyncer. |
| weight_sync_before_training | bool | False | Save a base checkpoint and hotload it before the first training step. |
| weight_sync_timeout | int | 600 | Timeout for hotload_and_wait. |
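The interval semantics above (sync every N optimizer steps, 0 disables) can be sketched as a predicate; the function name is hypothetical:

```python
def should_sync(step: int, weight_sync_interval: int) -> bool:
    """Decide whether to save + sync sampler weights after this
    optimizer step. An interval of 0 disables weight sync entirely."""
    if weight_sync_interval <= 0:
        return False
    return step % weight_sync_interval == 0
```

The same pattern applies to dcp_save_interval for periodic DCP saves.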
WandBConfig
Weights & Biases logging settings:

| Field | Type | Default | Description |
|---|---|---|---|
| entity | str \| None | None | W&B team or user name. |
| project | str \| None | None | W&B project name. |
| run_name | str \| None | None | Run name (auto-generated if omitted). |
ReconnectableClient
Blocking convenience wrapper around FiretitanTrainingClient.
Checkpoint utilities
save_checkpoint
Saves a DCP checkpoint and appends a record to checkpoints.jsonl:
| Parameter | Type | Description |
|---|---|---|
| client | training client | The training client to save state from |
| name | str | Checkpoint name |
| log_path | str | Directory where checkpoints.jsonl is written |
| loop_state | dict | Arbitrary metadata persisted alongside the checkpoint |
| kind | str | "state" (optimizer + weights), "sampler" (serving only), or "both" |
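The checkpoints.jsonl bookkeeping can be sketched as an append of one JSON record per checkpoint. The record layout below is an assumption, and the real save_checkpoint also saves the DCP state itself through the training client:

```python
import json
import os
import time

def append_checkpoint_record(log_path: str, name: str,
                             loop_state: dict, kind: str = "state") -> None:
    """Append one JSON line per checkpoint to checkpoints.jsonl.
    The record schema here is illustrative, not the cookbook's exact one."""
    record = {
        "name": name,
        "kind": kind,            # "state", "sampler", or "both"
        "loop_state": loop_state,
        "saved_at": time.time(),
    }
    os.makedirs(log_path, exist_ok=True)
    with open(os.path.join(log_path, "checkpoints.jsonl"), "a") as f:
        f.write(json.dumps(record) + "\n")
```

Appending one line per save keeps the file cheap to write and makes "find the latest checkpoint" a matter of reading the last line.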
resolve_resume
On startup, reads checkpoints.jsonl and loads the last checkpoint. Returns None for a fresh start. When a checkpoint exists, it loads DCP weights + optimizer state before returning.
ResumeInfo fields:
| Field | Type | Description |
|---|---|---|
| step | int | Last completed step |
| data_consumed | int | Number of data examples consumed |
| source_job_id | str \| None | Originating trainer job ID |
init_from_checkpoint
Load pretrained DCP weights on a fresh dataset (step resets to 0).
Config dataclasses expose init_from_checkpoint as a field.
Gradient accumulation normalization
Recipe configs expose grad_accumulation_normalization, which is passed to optim_step(...).
Recipe defaults
| Recipe | Default | Rationale |
|---|---|---|
| SFT | None | The SFT loss is already normalized client-side. |
| GRPO / RL | "num_loss_tokens" | RL losses use server-side per-token normalization by default. |
| DPO | None | The DPO loss is already normalized client-side. |
| ORPO | None | The ORPO loss is already normalized client-side. |
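A small worked example of why the defaults differ, using made-up per-token losses. With client-side normalization each micro-batch contributes its own mean, while "num_loss_tokens" divides one global sum by the total token count across the accumulation window:

```python
# Two micro-batches with different numbers of loss tokens (made-up values).
mb = [[2.0, 4.0], [6.0]]

# Client-side normalization (the SFT/DPO/ORPO default of None): each
# micro-batch contributes its mean, then the means are averaged, so a
# short sequence weighs as much as a long one.
client_side = sum(sum(t) / len(t) for t in mb) / len(mb)  # (3.0 + 6.0) / 2 = 4.5

# "num_loss_tokens" (the GRPO/RL default): one global sum divided by the
# total number of loss tokens across the accumulation window, so every
# token weighs equally.
total = sum(sum(t) for t in mb)
num_loss_tokens = sum(len(t) for t in mb)
server_side = total / num_loss_tokens                     # 12.0 / 3 = 4.0
```

The two results differ whenever micro-batches have unequal token counts, which is why the choice of normalization matters for variable-length RL rollouts.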
The cookbook reference documents the config surface and defaults. The conceptual guidance for loss reduction vs. server-side normalization now lives in Loss Functions.