InfraConfig

GPU, region, and training shape settings. Wraps TrainerJobConfig fields:

```python
from training.utils import InfraConfig

infra = InfraConfig(
    training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ref_training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200-forward",
)
```
Use training_shape_id for every trainer the cookbook launches; in normal usage it is the only shape-specific value you set. In most cases, pass the full shared path accounts/fireworks/trainingShapes/<shape> (the fireworks account is the public shared shape catalog). Add ref_training_shape_id when the recipe also launches a reference trainer.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| training_shape_id | str \| None | None | Required full training-shape ID for the policy trainer, typically accounts/fireworks/trainingShapes/<shape>. The cookbook resolves the versioned reference for you and auto-populates shape-owned infra. |
| ref_training_shape_id | str \| None | None | Optional full training-shape ID for the reference trainer, also typically under accounts/fireworks/trainingShapes/<shape>. When unset, rl_loop skips reference-model provisioning. |
| region | str \| None | None | Region override. Usually resolved from the training shape in normal launches. |
| accelerator_type | str \| None | None | Accelerator type override. Shape-owned in normal shape-based launches; do not override manually. |
| accelerator_count | int \| None | None | Accelerator count override. Shape-owned in normal shape-based launches; do not override manually. |
| node_count | int \| None | None | Number of trainer nodes. Shape-owned in normal shape-based launches; do not override manually. |
| custom_image_tag | str \| None | None | Trainer image tag override. Shape-owned in normal shape-based launches; do not override manually. |
| trainer_timeout_s | float | 3600 | Timeout for trainer provisioning / readiness waits. |
| extra_args | list[str] \| None | None | Extra trainer arguments. |
For the complete shape contract, including what the shape contains and which fields are locked, see Training Shapes and TrainerJobManager.
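Since both shape IDs share the accounts/fireworks/trainingShapes/<shape> prefix, the full paths can be built from bare shape names. A minimal sketch; the shape_path helper is hypothetical, not part of the cookbook:

```python
# Hypothetical helper: build the full shared-catalog path from a bare
# shape name. The "fireworks" account is the public shared shape catalog.
def shape_path(shape: str) -> str:
    return f"accounts/fireworks/trainingShapes/{shape}"

policy_shape = shape_path("qwen3-8b-128k-h200")
reference_shape = shape_path("qwen3-8b-128k-h200-forward")
```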

DeployConfig

Deployment settings for sampling and weight sync. Wraps DeploymentConfig fields:

```python
from training.utils import DeployConfig

deploy_cfg = DeployConfig(
    deployment_id="grpo-serving",
    tokenizer_model="Qwen/Qwen3-8B",
)
```
When deployment_shape is set, treat it as the source of truth for deployment hardware and serving configuration. In normal shape-based flows, do not combine it with manual hardware overrides.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| deployment_id | str \| None | None | Deployment identifier. If unset, the cookbook auto-derives one from the base model name. |
| tokenizer_model | str \| None | None | Hugging Face model name for client-side tokenization. Required for RL sampling. |
| deployment_shape | str \| None | None | Deployment shape resource name. In normal shape-based flows, this owns the deployment hardware and serving config. |
| deployment_region | str \| None | None | Region override for the deployment. Usually leave unset when the deployment shape already determines placement. |
| deployment_accelerator_type | str \| None | None | Accelerator-type override when not using a deployment shape. Leave unset in normal shape-based flows. |
| hot_load_bucket_type | str | "FW_HOSTED" | Weight-sync storage backend. |
| deployment_timeout_s | float | 5400 | Timeout for deployment provisioning / readiness waits. |
| deployment_extra_args | list[str] \| None | None | Extra serving arguments. |
| sample_timeout | int | 600 | HTTP read timeout for sampling completions. |
| disable_speculative_decoding | bool | True | Disable speculative decoding for hotload compatibility. |
| extra_values | dict[str, str] \| None | None | Extra deployment Helm values. |
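The "deployment_shape owns the hardware" rule above can be sketched as a small pre-flight check. This is a hedged illustration only; validate_deploy is hypothetical and the cookbook may enforce the rule differently:

```python
# Hypothetical check: when a deployment shape is set, it owns the hardware
# config, so manual accelerator overrides should be left unset.
def validate_deploy(deployment_shape, deployment_accelerator_type=None):
    if deployment_shape and deployment_accelerator_type:
        raise ValueError(
            "deployment_shape owns the accelerator type; "
            "do not also set deployment_accelerator_type"
        )
    return True
```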

WeightSyncConfig

Checkpoint and weight-sync intervals:

```python
from training.utils import WeightSyncConfig

weight_sync = WeightSyncConfig(
    weight_sync_interval=1,
    dcp_save_interval=10,
)
```
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| weight_sync_interval | int | 1 | Save + sync sampler weights every N optimizer steps. 0 disables weight sync. |
| dcp_save_interval | int | 0 | Save DCP checkpoints for resume every N steps. 0 disables DCP saves. |
| dcp_timeout | int | 2700 | Timeout for DCP save/load operations. |
| first_checkpoint_type | str | "base" | First sampler checkpoint type passed to WeightSyncer. |
| weight_sync_before_training | bool | False | Save a base checkpoint and hotload it before the first training step. |
| weight_sync_timeout | int | 600 | Timeout for hotload_and_wait. |
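The interval semantics ("every N steps, 0 disables") can be sketched as a tiny scheduling helper. The due function is hypothetical, shown only to make the 0-disables convention concrete:

```python
# Hypothetical scheduling helper illustrating the interval semantics:
# an interval of 0 disables the action entirely.
def due(step: int, interval: int) -> bool:
    return interval > 0 and step % interval == 0

# With weight_sync_interval=1 and dcp_save_interval=10 over 20 steps:
sync_steps = [s for s in range(1, 21) if due(s, 1)]   # every step
dcp_steps = [s for s in range(1, 21) if due(s, 10)]   # steps 10 and 20
```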

WandBConfig

Weights & Biases logging settings:

```python
from training.utils import WandBConfig

wandb = WandBConfig(
    entity="my-team",
    project="grpo-experiment",
    run_name="qwen3-8b-v1",
)
```
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| entity | str \| None | None | W&B team or user name. |
| project | str \| None | None | W&B project name. |
| run_name | str \| None | None | Run name (auto-generated if omitted). |

ReconnectableClient

Blocking convenience wrapper around FiretitanTrainingClient:

```python
from training.utils import ReconnectableClient

client = ReconnectableClient(
    rlor_mgr=rlor_mgr,
    job_id=endpoint.job_id,
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,
    fw_api_key=api_key,
)

result = client.forward_backward_custom(datums, loss_fn)
client.optim_step(tinker.AdamParams(...))
```
All methods (forward, forward_backward_custom, optim_step, save_state, etc.) dispatch once and block until the result is ready or the timeout expires. The wrapper does not auto-reconnect or retry failed calls; failures propagate to the caller so the loop can resume from the last checkpoint.
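Because failures propagate rather than being retried, the outer loop is responsible for catching them and resuming. A hedged sketch of that caller-side pattern; run_steps and restore_latest are hypothetical stand-ins for the recipe's training loop and checkpoint restore:

```python
# Sketch of the resume-on-failure pattern implied above: the wrapper does
# not retry, so the loop catches the error, restores the last checkpoint,
# and continues from there.
def train_with_resume(run_steps, restore_latest, max_attempts=3):
    for attempt in range(max_attempts):
        start_step = restore_latest()  # 0 on a fresh start
        try:
            return run_steps(start_step)
        except ConnectionError:
            continue  # reconnect and resume from the last checkpoint
    raise RuntimeError("training failed after repeated resumes")
```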

Checkpoint utilities

save_checkpoint

Saves a DCP checkpoint and appends a record to checkpoints.jsonl:

```python
from training.utils.checkpoint_utils import save_checkpoint

save_checkpoint(client, f"step-{step}", log_path, {
    "step": step,
    "data_consumed": data_consumed,
    "source_job_id": job_id,
})
```
| Parameter | Type | Description |
| --- | --- | --- |
| client | training client | The training client to save state from. |
| name | str | Checkpoint name. |
| log_path | str | Directory where checkpoints.jsonl is written. |
| loop_state | dict | Arbitrary metadata persisted alongside the checkpoint. |
| kind | str | "state" (optimizer + weights), "sampler" (serving only), or "both". |

resolve_resume

On startup, reads checkpoints.jsonl and loads the last checkpoint:

```python
from training.utils.checkpoint_utils import resolve_resume

resume_info = resolve_resume(client, log_path, init_from_checkpoint=None)
step = resume_info.step if resume_info else 0
data_consumed = resume_info.data_consumed if resume_info else 0
```
Returns None for a fresh start. When a checkpoint exists, it loads DCP weights + optimizer state before returning. ResumeInfo fields:
| Field | Type | Description |
| --- | --- | --- |
| step | int | Last completed step. |
| data_consumed | int | Number of data examples consumed. |
| source_job_id | str \| None | Originating trainer job ID. |
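One way data_consumed can be used on resume is to skip examples the run has already seen. A sketch under the assumption of a simple in-order dataset; resume_dataset is hypothetical:

```python
import itertools

# Skip examples consumed before the last checkpoint so the resumed run
# continues from the same position in the data stream.
def resume_dataset(dataset, data_consumed: int):
    return itertools.islice(iter(dataset), data_consumed, None)

remaining = list(resume_dataset(range(10), 4))  # → [4, 5, 6, 7, 8, 9]
```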

init_from_checkpoint

Load pretrained DCP weights on a fresh dataset (step resets to 0):

```python
resume_info = resolve_resume(client, log_path, init_from_checkpoint="prev-job-id:step-100")
```
All recipe Config dataclasses expose init_from_checkpoint as a field.

Gradient accumulation normalization

Recipe configs expose grad_accumulation_normalization, which is passed to optim_step(...):

```python
client.optim_step(adam_params, grad_accumulation_normalization="num_loss_tokens")
```
See Loss Functions for how to choose the mode and avoid double-normalization.
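A hedged numeric sketch of what "num_loss_tokens" normalization amounts to (the per-microbatch loss sums and token counts below are made up for illustration):

```python
# With grad_accumulation_normalization="num_loss_tokens", summed per-token
# losses are divided by the total loss-token count across the accumulation
# window, not by the number of microbatches.
token_loss_sums = [12.0, 8.0, 20.0]  # summed (unreduced) loss per microbatch
loss_tokens = [3, 2, 5]              # loss-token count per microbatch

normalized = sum(token_loss_sums) / sum(loss_tokens)  # 40.0 / 10 -> 4.0
```

Dividing by microbatch count instead would weight short sequences the same as long ones, which is why client-side pre-normalized losses (as in SFT) should not use this mode on top.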

Recipe defaults

| Recipe | Default | Rationale |
| --- | --- | --- |
| SFT | None | The SFT loss is already normalized client-side. |
| GRPO / RL | "num_loss_tokens" | RL losses use server-side per-token normalization by default. |
| DPO | None | The DPO loss is already normalized client-side. |
| ORPO | None | The ORPO loss is already normalized client-side. |
The cookbook reference documents the config surface and defaults. The conceptual guidance for loss reduction vs. server-side normalization now lives in Loss Functions.