
TrainerConfig

Training-client launch settings: which training shape to use, the optional reference trainer, region, and run-level knobs. Recipes take it as Config.trainer:
from training.utils import TrainerConfig

trainer = TrainerConfig(
    training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    reference_training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200-forward",
)
Use training_shape_id for explicit shape selection — this is the primary shape-specific value you set. Pass the full shared path accounts/fireworks/trainingShapes/<shape> (the fireworks account is the public shared shape catalog). If you leave it unset, supported recipes auto-select a validated shape from the control plane based on base_model, lora_rank, and max_seq_len.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| training_shape_id | str \| None | None | Optional full training-shape ID for the policy trainer, typically accounts/fireworks/trainingShapes/<shape>. When unset, supported recipes auto-select a validated shape. |
| reference_training_shape_id | str \| None | None | Optional full training-shape ID for a separate reference trainer. For full-parameter runs that need a reference, leave unset to auto-select a validated forward-only shape; for LoRA runs, leave unset to use the shared-session reference on the policy trainer. |
| job_id | str \| None | None | Attach to an existing trainer job (resume / reattach) instead of creating a new one. |
| reference_job_id | str \| None | None | Attach to an existing forward-only reference trainer job. |
| cleanup_reference_on_close | bool | True | Delete the SDK-managed reference trainer when the service closes. |
| region | str \| None | None | Region override (drives trainer + deployment colocation). |
| timeout_s | float | 3600 | Timeout for trainer provisioning / readiness waits. |
| extra_args | list[str] \| None | None | Extra trainer arguments. |
| replica_count | int \| None | None | Data-parallel HSDP replica count for policy trainer launches. This is a run-level knob, not part of the validated training shape; reference trainers are launched without it. |
| skip_validations | bool | False | Skip server-side shape validation. Requires elevated permissions. |
| purpose | str \| None | None | Optional platform purpose enum name, such as "PURPOSE_PILOT". |
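The full-path convention for training_shape_id can be checked locally before a launch. The sketch below is a hypothetical helper, not a cookbook API: full_shape_id and its regular expression are assumptions made for illustration, and the control plane may accept or reject IDs differently.

```python
import re

# Assumed pattern for a full training-shape ID; the real service may differ.
_SHAPE_ID = re.compile(r"^accounts/[^/]+/trainingShapes/[^/]+$")


def full_shape_id(shape: str, account: str = "fireworks") -> str:
    """Return a full training-shape ID (hypothetical helper, not a cookbook API)."""
    if "/" not in shape:
        # Bare shape name: expand it against the given account.
        shape = f"accounts/{account}/trainingShapes/{shape}"
    if not _SHAPE_ID.match(shape):
        raise ValueError(f"not a full training-shape ID: {shape!r}")
    return shape
```

Calling full_shape_id("qwen3-8b-128k-h200") yields the same string used in the example at the top of this section, which could then be passed as training_shape_id.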
To request replicated HSDP for a run:
trainer = TrainerConfig(
    training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    replica_count=2,
)
On the shape path (training_shape_id set or auto-selected), accelerator_type, accelerator_count, node_count, and custom_image_tag are derived from the training shape. TrainerConfig still exposes those fields for the advanced manual path (training_shape_id=None), where they are sent directly and shape validation is skipped.
Migrating from InfraConfig? See Deprecated managed infra (InfraConfig) for the field-rename table.

DeployConfig

Deployment settings for sampling and weight sync. Wraps DeploymentConfig fields:
from training.utils import DeployConfig

deploy_cfg = DeployConfig(
    deployment_id="grpo-serving",
    tokenizer_model="Qwen/Qwen3-8B",
)
When deployment_shape is set (the recommended path), the shape owns deployment hardware and serving configuration.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| weight_sync_scope | WeightSyncScope | WeightSyncScope.PER_TRAINER | Controls whether the trainer bucket or deployment bucket owns weight sync state. See Weight sync. |
| deployment_id | str \| None | None | Deployment identifier. If unset, the cookbook auto-derives one from the base model name. |
| tokenizer_model | str \| None | None | HuggingFace model name for client-side tokenization. Required for RL sampling. |
| tokenizer_revision | str \| None | None | Optional HuggingFace tokenizer revision. |
| deployment_shape | str \| None | None | Deployment shape resource name. When set, the shape owns GPU type and serving config. |
| deployment_region | str \| None | None | Region override for the deployment. |
| hot_load_bucket_type | str | "FW_HOSTED" | Weight-sync storage backend. |
| hot_load_trainer_job | str \| None | None | Trainer job name whose weight-sync bucket this deployment should use. Format: accounts/{account}/rlorTrainerJobs/{job_id}. |
| deployment_timeout_s | float | 5400 | Timeout for deployment provisioning / readiness waits. |
| reattach_settle_timeout_s | int | 600 | Timeout for the serving pod to settle after re-attaching a deployment to a new trainer bucket. |
| deployment_extra_args | list[str] \| None | None | Extra serving arguments. |
| sample_timeout | int | 600 | HTTP read timeout for sampling completions. |
| disable_speculative_decoding | bool | True | Disable speculative decoding for weight-sync compatibility. |
| extra_values | dict[str, str] \| None | None | Extra deployment Helm values. |
| replica_count | int \| None | None | If set, pin the deployment to a fixed replica count (sets both min and max). |
| deployment_accelerator_type | str \| None | None | Manual-path deployment GPU type used only when no deployment_shape is set. |
When deployment_shape is set, the deployment shape owns GPU type and serving configuration. Use deployment_accelerator_type only for advanced manual deployments without a deployment shape.
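The hot_load_trainer_job field expects a resource name in the form accounts/{account}/rlorTrainerJobs/{job_id}. As a minimal sketch, the two helpers below build and parse that form; trainer_job_name and parse_trainer_job_name are hypothetical names, and the pattern's character classes are an assumption rather than the platform's validation rule.

```python
import re

# Format taken from the hot_load_trainer_job description; character classes are assumed.
_JOB_NAME = re.compile(
    r"^accounts/(?P<account>[^/]+)/rlorTrainerJobs/(?P<job_id>[^/]+)$"
)


def trainer_job_name(account: str, job_id: str) -> str:
    """Build a trainer job resource name (hypothetical helper)."""
    return f"accounts/{account}/rlorTrainerJobs/{job_id}"


def parse_trainer_job_name(name: str) -> tuple[str, str]:
    """Split a trainer job resource name into (account, job_id)."""
    match = _JOB_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected trainer job name: {name!r}")
    return match["account"], match["job_id"]
```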

ConcurrencyConfig

Rollout sampling concurrency settings used by RL-family recipes:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | str \| None | "adaptive" | Concurrency mode. RL recipes currently use adaptive concurrency. |
| initial_window | int \| None | None | Starting adaptive concurrency window. When unset, recipes derive it from deployment capacity. |
| min_window | int | 1 | Minimum adaptive concurrency window. |
| max_window | int | 256 | Maximum adaptive concurrency window. |
| prefill_queue_target | float | 0.5 | Target prefill queue duration in seconds for AIMD adjustment. |
| max_concurrency | int \| None | None | Deprecated fixed-concurrency compatibility field. |
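To make the AIMD (additive-increase, multiplicative-decrease) adjustment concrete, here is a minimal sketch of a window controller driven by the prefill queue duration. It is not the cookbook's implementation: the AimdWindow class, the additive step of 1, and the decrease factor of 0.5 are assumptions; only min_window, max_window, and prefill_queue_target mirror the fields above.

```python
from dataclasses import dataclass


@dataclass
class AimdWindow:
    """Illustrative AIMD controller; not the cookbook's implementation."""

    window: int
    min_window: int = 1
    max_window: int = 256
    prefill_queue_target: float = 0.5  # seconds
    increase: int = 1      # assumed additive step
    decrease: float = 0.5  # assumed multiplicative factor

    def update(self, prefill_queue_s: float) -> int:
        if prefill_queue_s > self.prefill_queue_target:
            # Queue is longer than the target: back off multiplicatively.
            proposed = int(self.window * self.decrease)
        else:
            # Queue is within the target: probe for more capacity additively.
            proposed = self.window + self.increase
        # Clamp to the configured bounds.
        self.window = max(self.min_window, min(self.max_window, proposed))
        return self.window
```

With these assumed constants, a window of 8 grows to 9 after a 0.1 s queue observation and drops to 4 after a 0.9 s one.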

Checkpoint & weight-sync fields

Weight-sync and checkpoint cadence are top-level fields on the recipe Config (no nested config object). rl_loop and igpo_loop expose the full weight-sync cadence knobs; async_rl_loop pins sampler sync to every optimizer step and exposes only pre-training sync and timeout. Every recipe exposes dcp_save_interval:
cfg = Config(
    # ... base_model, dataset, trainer, deployment ...
    weight_sync_interval=1,               # rl_loop/igpo_loop: sync weights every N steps
    weight_sync_before_training=False,    # RL: sync a base checkpoint before step 1
    weight_sync_timeout=600,              # RL: per weight-sync timeout (seconds)
    dcp_save_interval=10,                  # all recipes: save resumable DCP checkpoints every N steps
)
dcp_save_interval defaults to 0 (off). Unless you set it to a positive value, no DCP checkpoints are saved and training cannot be resumed. If you need checkpoint-based resume, set it explicitly (e.g. dcp_save_interval=50).
| Field | Recipes | Type | Default | Description |
| --- | --- | --- | --- | --- |
| dcp_save_interval | All | int | 0 | Save resumable DCP checkpoints every N steps. 0 disables DCP saves. Set to a positive value to enable resume. |
| weight_sync_interval | rl_loop, igpo_loop | int | 1 | Save + sync weights to the deployment every N optimizer steps. 0 disables weight sync. async_rl_loop pins this internally to 1. |
| weight_sync_before_training | RL family | bool | False | Save a base checkpoint and sync it to the deployment before the first training step. |
| weight_sync_timeout | RL family | int | 600 | Timeout for each weight sync (seconds). |
The old nested WeightSyncConfig recipe field is gone. Recipe Config objects set the fields above directly, and the SDK-managed service owns the underlying save and weight-sync state.
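The "every N steps, 0 disables" semantics of weight_sync_interval and dcp_save_interval can be sketched as a small schedule function. This is an illustration only: fires and plan are hypothetical names, and the assumption that steps are counted from 1 and an action runs when the step is a multiple of the interval is not confirmed by the recipes.

```python
def fires(step: int, interval: int) -> bool:
    """True when a periodic action with this interval runs at `step` (1-indexed, assumed)."""
    return interval > 0 and step % interval == 0


def plan(total_steps: int, weight_sync_interval: int, dcp_save_interval: int):
    """Return (step, sync_weights, save_dcp) for every optimizer step."""
    return [
        (step, fires(step, weight_sync_interval), fires(step, dcp_save_interval))
        for step in range(1, total_steps + 1)
    ]
```

Under these assumptions, plan(4, 1, 2) syncs weights on every step and saves a DCP checkpoint on steps 2 and 4, while an interval of 0 never fires.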

WandBConfig

Weights & Biases logging settings:
from training.utils import WandBConfig

wandb = WandBConfig(
    entity="my-team",
    project="grpo-experiment",
    run_name="qwen3-8b-v1",
)
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| entity | str \| None | None | W&B team or user name. |
| project | str \| None | None | W&B project name. |
| run_name | str \| None | None | Run name (auto-generated if omitted). |

ReconnectableClient

Blocking convenience wrapper around FiretitanTrainingClient. All cookbook recipes use this as their training client — it dispatches each call and blocks until the result is ready or the timeout expires. Failures propagate to the caller so the training loop can crash cleanly and resume from the last DCP checkpoint.
This is a recipe-internal wrapper. User code should not construct it with trainer managers. Recipes build it from the FiretitanTrainingClient returned by the SDK-managed service client.
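import tinker
from training.utils import ReconnectableClient

# training_client comes from service.create_training_client(...); service is the
# SDK-managed service client that owns the trainer lifecycle.
client = ReconnectableClient.from_training_client(
    training_client,
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,  # 0 selects full-parameter training
    job_id=service.trainer_job_id,
    service=service,
)

# Each call blocks until the result is ready or the timeout expires.
result = client.forward_backward_custom(datums, loss_fn)
client.optim_step(tinker.AdamParams(...))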
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| client | FiretitanTrainingClient | | Training client returned by service.create_training_client(...) |
| job_id | str | | RLOR trainer job ID |
| base_model | str | | Base model name |
| lora_rank | int | 0 | LoRA rank (0 for full-parameter) |
| service | FiretitanServiceClient \| None | None | Managed service that owns the trainer lifecycle |
| default_timeout | int | 3600 | Timeout in seconds for forward/backward/optim calls |
Properties:
| Property | Type | Description |
| --- | --- | --- |
| job_id | str | The trainer job ID |
Methods:
| Method | Description |
| --- | --- |
| forward(data, loss_fn) | Forward pass, blocks until complete |
| forward_backward(data, loss_fn, loss_fn_config) | Forward + backward pass |
| forward_backward_custom(data, loss_fn) | Forward + backward with custom loss function |
| optim_step(params, grad_accumulation_normalization) | Optimizer step |
| save_state(name, timeout) | Save DCP checkpoint (default timeout: 2700s) |
| load_state_with_optimizer(path, timeout) | Load DCP checkpoint (default timeout: 2700s) |
| save_weights_for_sampler_ext(name, checkpoint_type, timeout) | Save sampler checkpoint for promotion |
| resolve_checkpoint_path(name, source_job_id) | Resolve cross-job checkpoint path |
| list_checkpoints() | List available DCP checkpoints |
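The dispatch-then-block behavior described for ReconnectableClient can be illustrated with the standard library alone. The sketch below is a generic stand-in, not the wrapper's actual code: BlockingCalls and its call method are hypothetical, and it only shows how a submitted call can block until a result arrives, time out, or propagate the underlying failure to the caller.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout


class BlockingCalls:
    """Hypothetical stand-in showing the dispatch-then-block pattern."""

    def __init__(self, default_timeout: float = 3600):
        self.default_timeout = default_timeout
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args, timeout=None):
        limit = self.default_timeout if timeout is None else timeout
        future = self._pool.submit(fn, *args)
        try:
            # result() blocks and re-raises any exception thrown inside fn.
            return future.result(timeout=limit)
        except FuturesTimeout:
            # Surface the expiry as the built-in TimeoutError on every Python version.
            raise TimeoutError(f"call did not finish within {limit}s") from None
```

Because exceptions are re-raised rather than swallowed, a training loop built on this pattern fails fast and can restart from its last saved checkpoint.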

Checkpoint utilities

For checkpointing, resume, and promote — see the dedicated Checkpoints and Resume page.

Gradient accumulation normalization

Recipe configs expose grad_accumulation_normalization, which is passed to optim_step(...):
from fireworks.training.sdk import GradAccNormalization

client.optim_step(
    adam_params,
    grad_accumulation_normalization=GradAccNormalization.NUM_LOSS_TOKENS,
)
See Loss Functions for how to choose the mode and avoid double-normalization.

Recipe defaults

| Recipe | Default | Rationale |
| --- | --- | --- |
| SFT | None | The SFT loss is already normalized client-side. |
| GRPO / RL | GradAccNormalization.NUM_LOSS_TOKENS | RL losses use server-side per-token normalization by default. |
| DPO | None | The DPO loss is already normalized client-side. |
| ORPO | None | The ORPO loss is already normalized client-side. |
The cookbook reference documents the config surface and defaults. The conceptual guidance for loss reduction vs. server-side normalization now lives in Loss Functions.
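A small arithmetic example shows why the defaults above pair client-side normalization with None. It assumes, for illustration only, that NUM_LOSS_TOKENS divides the accumulated value by the number of loss tokens in the step; the numbers are made up and the exact server-side behavior is described in Loss Functions.

```python
# Per-token losses for two micro-batches accumulated into one optimizer step.
micro_batches = [[2.0, 4.0], [6.0]]
num_loss_tokens = sum(len(mb) for mb in micro_batches)

# Option 1: the client sums the losses and the server divides by the token count.
summed = sum(sum(mb) for mb in micro_batches)
server_normalized = summed / num_loss_tokens

# Option 2: the client already returns the per-token mean and the server mode is None.
client_normalized = summed / num_loss_tokens

# Mistake: a client-side mean plus server-side division shrinks the value a second time.
double_normalized = client_normalized / num_loss_tokens
```

Options 1 and 2 agree on the per-token mean, while the double-normalized value is smaller by a further factor of num_loss_tokens.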

Deprecated managed infra (InfraConfig)

Earlier cookbook releases provisioned trainers and deployments from the recipe layer using InfraConfig, WeightSyncConfig, and the standalone helpers setup_infra / ResourceCleanup / make_reference_client / create_base_reference. Provisioning now lives entirely behind the SDK-managed service client (build_service_client(...), then service.create_*), and recipes take trainer=TrainerConfig(...) plus deployment=DeployConfig(...).
This is a breaking change to the recipe-facing interface. The recipe Config no longer accepts infra= or weight_sync=, and setup_infra / ResourceCleanup have been removed. If you are not ready to migrate, simply do not upgrade the SDK + cookbook — pin your current versions and existing code keeps working. Upgrading is recommended (cleaner config, one provisioning path, SDK-owned lifecycle), but it is opt-in: the old and new surfaces do not coexist in one install.

What to change

| Before (deprecated) | After (current) |
| --- | --- |
| Config(infra=InfraConfig(...)) | Config(trainer=TrainerConfig(...)) |
| InfraConfig.ref_training_shape_id | TrainerConfig.reference_training_shape_id |
| InfraConfig.trainer_timeout_s | TrainerConfig.timeout_s |
| InfraConfig.trainer_replica_count | TrainerConfig.replica_count |
| Config(weight_sync=WeightSyncConfig(weight_sync_interval=N)) | Config(weight_sync_interval=N) (top-level, rl_loop / igpo_loop; async_rl_loop pins this to 1) |
| weight_sync.dcp_save_interval=N | Config(dcp_save_interval=N) (top-level, all recipes) |
| top-level policy_job_id=... | TrainerConfig(job_id=...) |
| setup_infra(rlor_mgr, deploy_mgr, ...) | build_service_client(...) (see the DPO API-level example) |
| create_base_reference() / make_reference_client() | service.create_reference_client(...) |
| with ResourceCleanup(...) | cleanup_trainer_on_close=True + service.close() (see Cleanup) |
The InfraConfig dataclass is still importable for backward compatibility and now emits a DeprecationWarning when constructed; it is no longer accepted by recipe Config objects.
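For scripts that build their settings as keyword dictionaries, the field renames in the table can be applied mechanically. The sketch below is a hypothetical helper, not something the cookbook ships: migrate_infra_kwargs is an invented name, and passing unlisted keys through unchanged is an assumption that should be checked against the TrainerConfig field list.

```python
# Renames taken from the migration table; everything else is illustrative.
FIELD_RENAMES = {
    "ref_training_shape_id": "reference_training_shape_id",
    "trainer_timeout_s": "timeout_s",
    "trainer_replica_count": "replica_count",
}


def migrate_infra_kwargs(infra_kwargs: dict) -> dict:
    """Map old InfraConfig keyword names to their TrainerConfig equivalents."""
    # Keys without a listed rename are passed through unchanged (assumption).
    return {FIELD_RENAMES.get(key, key): value for key, value in infra_kwargs.items()}
```

The resulting dictionary could then be reviewed and unpacked into TrainerConfig(...) once every key has been confirmed to exist on the new class.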

Get help migrating

The cookbook ships a debug-and-migrate skill at skills/dev/ that walks an agent through porting old InfraConfig / setup_infra scripts to the new TrainerConfig + build_service_client surface (in addition to its day-to-day debugging guidance for weight sync and checkpoint promotion). Point your coding agent at that skill to automate the migration.