
What this is

Fireworks Training SDK gives teams a flexibility ladder from managed jobs to fully custom training loops. Start managed for standard objectives, then move to Tinker-compatible loops when you need custom losses, full-parameter updates, and tighter experiment control.
| Mode | Best for | Objective control | Infrastructure |
|---|---|---|---|
| Managed jobs (SFT, DPO, RFT) | Standard objectives, fast iteration | Platform-defined | Fully managed |
| Service-mode SDK loops | Custom losses, algorithm research | Full Python control | You drive the loop, platform runs GPUs |
Full-parameter RFT via service-mode loops is currently in private preview.
New to custom training loops? Start with Core Concepts for an introduction to the architecture, then follow the Quickstart to run a minimal training loop.

Why this approach

  • One platform, two modes: Move from managed baselines to frontier research without rebuilding infrastructure.
  • When to use managed jobs: Standard objectives (SFT, DPO, managed GRPO) with minimal custom code.
  • When to use Training SDK loops: Custom objectives, algorithm research, or fine-grained control over training beyond built-in methods.
  • Production inference in the loop: Hotload checkpoints onto serving deployments for realistic evaluation during training.

System architecture

A control-plane API provisions trainer and deployment resources. Your local Python loop connects to the trainer service, runs custom train steps, and periodically exports checkpoints to a serving deployment for sampling and evaluation.
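
The shape of that loop can be sketched with stand-in stubs. Everything below is illustrative: no Fireworks APIs are called, and the class and method names are placeholders for the SDK objects introduced later in this page.

```python
# Illustrative control flow only: stub classes stand in for the trainer
# service and the hotload-enabled deployment; nothing here touches the
# real control plane.

class StubTrainer:
    """Stands in for the trainer service reached via the control plane."""
    def train_step(self, batch):
        # Pretend loss that depends only on batch size.
        return {"loss": 1.0 / (1 + len(batch))}

    def export_checkpoint(self, name):
        # Pretend session-scoped snapshot name.
        return f"{name}-snapshot"

class StubDeployment:
    """Stands in for a hotload-enabled serving deployment."""
    def __init__(self):
        self.loaded = None

    def hotload(self, snapshot):
        self.loaded = snapshot

    def sample(self, prompt):
        return f"sample from {self.loaded}"

trainer, deployment = StubTrainer(), StubDeployment()
for step in range(3):
    metrics = trainer.train_step(batch=["datum-a", "datum-b"])
    if step % 2 == 0:  # periodic checkpoint + eval, as in the real loop
        snap = trainer.export_checkpoint(f"step_{step:04d}")
        deployment.hotload(snap)
        sampled = deployment.sample("eval prompt")
```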

Installation

SDK (custom loops)

```bash
pip install --pre "fireworks-ai[training]" tinker-cookbook
```

Standalone cookbook recipes (optional)

The cookbook provides ready-to-run training recipes (GRPO, DPO, SFT, ORPO). Install it as a package:
```bash
git clone https://github.com/fw-ai/cookbook.git
cd cookbook
pip install -e .
```
The Training SDK provides FiretitanServiceClient (extends tinker’s ServiceClient with checkpoint_type and session-scoped snapshot naming) and DeploymentSampler (client-side tokenization for training-compatible sampling):
```python
from fireworks.training.sdk import (
    FiretitanServiceClient,
    DeploymentSampler,
    DeploymentManager,
    TrainerJobManager,
    WeightSyncer,
)
```

Key APIs

Resource setup & teardown (Fireworks Training SDK)

| API | Purpose |
|---|---|
| `TrainerJobManager.create_and_wait(config)` | Create a service-mode trainer and poll until healthy |
| `TrainerJobManager.wait_for_existing(job_id)` | Wait for an already-existing trainer job to reach RUNNING |
| `TrainerJobManager.resume_and_wait(job_id)` | Resume a failed/cancelled/paused job and wait |
| `TrainerJobManager.reconnect_and_wait(job_id)` | Reconnect to a preempted/failed job (handles transitional states) |
| `TrainerJobManager.resolve_training_profile(shape_id)` | Fetch training shape config from the control plane |
| `TrainerJobManager.delete(job_id)` | Delete a trainer job |
| `DeploymentManager.create_or_get(config)` | Create or reuse an inference deployment for sampling/hotload |
| `DeploymentManager.wait_for_ready(deployment_id)` | Poll until deployment is READY |
| `DeploymentManager.scale_to_zero(deployment_id)` | Scale to zero replicas without deleting |
| `DeploymentManager.delete(deployment_id)` | Delete a deployment |

Training loop (Fireworks Training SDK + Tinker)

| API | Purpose |
|---|---|
| `FiretitanServiceClient(base_url, api_key)` | Connect to a trainer endpoint (extends tinker ServiceClient) |
| `service.create_training_client(base_model, lora_rank)` | Create a FiretitanTrainingClient with checkpoint extensions |
| `client.forward(datums, loss_type)` | Forward pass only (e.g. for reference logprobs) |
| `client.forward_backward_custom(datums, loss_fn)` | Forward + backward with your custom loss |
| `client.optim_step(tinker.AdamParams(...))` | Apply optimizer update |
| `client.save_weights_for_sampler_ext(name, checkpoint_type)` | Export serving-compatible checkpoint with session-scoped naming |
| `client.list_checkpoints()` | List available DCP checkpoints from the trainer |
| `client.resolve_checkpoint_path(name, source_job_id)` | Resolve checkpoint input for cross-job resume |
| `client.save_state(name, ttl_seconds)` | Save full train state (weights + optimizer) for resume |
| `client.load_state_with_optimizer(name)` | Restore train state for resume |
| `DeploymentSampler(inference_url, model, api_key, tokenizer)` | Client-side tokenized sampling from a deployment |
| `WeightSyncer(policy_client, deploy_mgr, ...)` | Manages checkpoint + hotload lifecycle with delta chaining |
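
The loss callable passed to forward_backward_custom follows a simple contract: it receives the batch and the per-datum logprobs and returns a (loss, metrics) pair. A toy float-based sketch of that contract follows; a real objective must return a differentiable tensor so the backward pass can flow through it, and all names here are illustrative.

```python
# Toy illustration of the (data, logprobs_list) -> (loss, metrics)
# contract used by forward_backward_custom. Real objectives operate on
# tensors; plain floats are used here only to show the return shape.

def toy_objective(data, logprobs_list):
    # A simple negative-log-likelihood-style average over datums.
    loss = -sum(sum(lps) for lps in logprobs_list) / len(logprobs_list)
    metrics = {"loss": float(loss), "num_datums": len(data)}
    return loss, metrics

loss, metrics = toy_objective(
    data=["datum-a", "datum-b"],
    logprobs_list=[[-0.5, -0.25], [-1.0]],
)
```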

Workflow

  1. Create resources: Provision a trainer job and (optionally) a hotload-enabled deployment.
  2. Connect a training client: Use FiretitanServiceClient to connect to the trainer endpoint.
  3. Build batches and compute objectives: Construct tinker.Datum objects and implement your loss function in Python.
  4. Iterate: Run forward_backward_custom + optim_step in a loop.
  5. Checkpoint and evaluate: Save checkpoints, hotload onto deployment, sample, and evaluate.

End-to-end example

Bootstrap trainer and deployment

```python
import os
from fireworks.training.sdk import (
    FiretitanServiceClient,
    TrainerJobManager,
    TrainerJobConfig,
    DeploymentManager,
    DeploymentConfig,
    DeploymentSampler,
    WeightSyncer,
)

api_key = os.environ["FIREWORKS_API_KEY"]
account_id = os.environ.get("FIREWORKS_ACCOUNT_ID", "")
base_url = os.environ.get("FIREWORKS_BASE_URL", "https://api.fireworks.ai")

# Create SDK managers
rlor_mgr = TrainerJobManager(api_key=api_key, account_id=account_id, base_url=base_url)
deploy_mgr = DeploymentManager(api_key=api_key, account_id=account_id, base_url=base_url)

# Create a service-mode RLOR trainer job (polls until healthy)
endpoint = rlor_mgr.create_and_wait(TrainerJobConfig(
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,
    max_context_length=4096,
    learning_rate=1e-5,
    gradient_accumulation_steps=4,
    display_name="my-experiment",
    hot_load_deployment_id="research-loop-serving",
))

# Connect training client (FiretitanServiceClient adds checkpoint_type + session ID support)
service = FiretitanServiceClient(
    base_url=endpoint.base_url,
    api_key=api_key,
)
training_client = service.create_training_client(
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,
)

# Create a hotload-enabled inference deployment for sampling
dep_info = deploy_mgr.create_or_get(DeploymentConfig(
    deployment_id="research-loop-serving",
    base_model="accounts/fireworks/models/qwen3-8b",
    min_replica_count=0,
    max_replica_count=1,
))
deploy_mgr.wait_for_ready("research-loop-serving")
```

Run a custom update and checkpoint

```python
import tinker

# `batch` is a list of tinker.Datum objects built earlier;
# compute_custom_objective is a placeholder for your own loss function.
def objective(data, logprobs_list):
    loss = compute_custom_objective(logprobs_list)
    metrics = {"loss": float(loss.item())}
    return loss, metrics

# Train step
training_client.forward_backward_custom(batch, objective).result()
training_client.optim_step(
    tinker.AdamParams(
        learning_rate=1e-5,
        beta1=0.9,
        beta2=0.999,
        eps=1e-8,
        weight_decay=0.01,
    )
).result()

# Export checkpoint and hotload onto deployment
# save_weights_for_sampler_ext adds checkpoint_type support and session-scoped naming
result = training_client.save_weights_for_sampler_ext(
    "step_0010",
    checkpoint_type="base",  # "base" for full, "delta" for incremental
)
# result.snapshot_name is the session-qualified name to use for hotloading
```

TrainerJobConfig reference

TrainerJobManager.create_and_wait(...) accepts a TrainerJobConfig with these fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `base_model` | str | (required) | Base model name (e.g. "accounts/fireworks/models/qwen3-8b") |
| `lora_rank` | int | 0 | LoRA rank. 0 for full-parameter tuning, or a positive integer (e.g. 16, 64) for LoRA |
| `max_context_length` | int | 4096 | Maximum sequence length |
| `learning_rate` | float | 1e-5 | Learning rate for the optimizer |
| `gradient_accumulation_steps` | int | 1 | Number of micro-batches before an optimizer step |
| `node_count` | int | 1 | Number of trainer nodes |
| `display_name` | str or None | None | Human-readable trainer name |
| `hot_load_deployment_id` | str or None | None | Deployment ID used for checkpoint hotload workflows |
| `region` | str or None | None | Region for the job (e.g. "US_VIRGINIA_1"). Auto-resolved when using training shapes. |
| `custom_image_tag` | str or None | None | Override trainer image tag |
| `extra_args` | list[str] or None | None | Extra trainer arguments |
| `accelerator_type` | str or None | None | Accelerator type override |
| `accelerator_count` | int or None | None | Accelerator count override |
| `skip_validations` | bool | False | Bypass control-plane validation checks |
| `forward_only` | bool | False | Create a forward-only trainer (reference model pattern) |
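
Note that lora_rank appears both here and in create_training_client, and the two must agree. A small illustrative guard (pure Python; the helper name is hypothetical, not part of the SDK):

```python
# Hypothetical guard: ensure the trainer job and training client agree
# on LoRA rank (0 = full-parameter). Illustrative only, not an SDK API.
def check_rank_consistency(job_rank, client_rank):
    if job_rank != client_rank:
        raise ValueError(
            f"lora_rank mismatch: trainer={job_rank}, client={client_rank}"
        )
    return "full-parameter" if job_rank == 0 else f"lora-r{job_rank}"

mode = check_rank_consistency(0, 0)
```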

DeploymentConfig reference

DeploymentManager.create_or_get(...) accepts a DeploymentConfig with these fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `deployment_id` | str | (required) | Stable deployment identifier |
| `base_model` | str | (required) | Base model name. Must match the trainer’s base model for hotload compatibility. |
| `deployment_shape` | str or None | None | Deployment shape resource name (overrides accelerator/region) |
| `region` | str | "US_VIRGINIA_1" | Region for the deployment |
| `min_replica_count` | int | 0 | Minimum replicas (set 0 to scale to zero when idle) |
| `max_replica_count` | int | 1 | Maximum replicas for autoscaling |
| `accelerator_type` | str | "NVIDIA_H200_141GB" | Accelerator type |
| `hot_load_bucket_type` | str or None | "FW_HOSTED" | Hotload storage backend |
| `skip_shape_validation` | bool | False | Bypass deployment shape validation |
| `extra_args` | list[str] or None | None | Extra serving arguments |

W&B integration

For SDK/cookbook loops, configure W&B via cookbook config (WandBConfig) rather than TrainerJobConfig:
```python
from training.recipes.rl_loop import Config, main
from training.utils import WandBConfig

cfg = Config(
    base_model="accounts/fireworks/models/qwen3-8b",
    max_rows=20,
    epochs=1,
    wandb=WandBConfig(
        entity="my-team",
        project="grpo-experiment",
        run_name="grpo-run-001",
    ),
)

main(cfg)
```

DeploymentManager constructor

DeploymentManager supports separate URLs for control-plane, inference, and hotload operations:
```python
deploy_mgr = DeploymentManager(
    api_key=api_key,
    account_id=account_id,
    base_url="https://api.fireworks.ai",         # Control-plane URL (deployment CRUD)
    inference_url="https://api.fireworks.ai",    # Gateway URL for inference completions (defaults to base_url)
    hotload_api_url="https://api.fireworks.ai",  # Gateway URL for hotload operations (defaults to base_url)
)
```
For most users the three URLs are identical: inference_url and hotload_api_url can be omitted and default to base_url. Separate URLs are useful when the control plane and gateway have different endpoints (e.g. personal dev gateways).

Operational guidance

  • Service mode supports both full-parameter and LoRA tuning. Set lora_rank=0 for full-parameter or a positive integer (e.g. 16, 64) for LoRA, and match create_training_client(lora_rank=...) accordingly.
  • Starter loops: Use training.recipes.rl_loop (GRPO/DAPO/GSPO/CISPO), dpo_loop, orpo_loop, and sft_loop from the standalone cookbook repo as the current reference implementations.
  • Training shapes: Use TrainerJobManager.resolve_training_profile(shape_id) to auto-populate infra config (region, accelerator, image tag, node count) from the control plane instead of setting them manually.
  • Preemption handling: Use reconnect_and_wait(job_id) to resume preempted trainer jobs — it handles transitional states (CREATING, DELETING) by polling until the job reaches a resumable state.
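
The reconnect semantics in the last bullet amount to a poll-until-resumable loop. An illustrative sketch (this is not the SDK's internal code; the status function is injected so the example is self-contained):

```python
# Illustrative only: the poll-until-resumable pattern that a
# reconnect_and_wait-style helper implements. `get_status` is injected;
# the real SDK queries the control plane and sleeps with backoff.

TRANSITIONAL = {"CREATING", "DELETING"}
RESUMABLE = {"RUNNING", "FAILED", "CANCELLED", "PAUSED"}

def wait_for_resumable(get_status, max_polls=100):
    for _ in range(max_polls):
        status = get_status()
        if status in RESUMABLE:
            return status
        if status not in TRANSITIONAL:
            raise RuntimeError(f"unexpected status: {status}")
        # real code would time.sleep() with backoff here
    raise TimeoutError("job never left a transitional state")

# Simulated status sequence: transitional, transitional, then resumable.
statuses = iter(["CREATING", "CREATING", "RUNNING"])
final = wait_for_resumable(lambda: next(statuses))
```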

Common pitfalls

  • Evaluating against stale deployments can hide regressions — always verify the hotloaded checkpoint identity.
  • Under-specified checkpoint metadata makes successful runs hard to reproduce — log step numbers, checkpoint names, and deployment revisions together.
  • Mixing managed-job fields (for example epochs, batch_size) into TrainerJobConfig — these are separate APIs and are ignored by the training SDK manager layer.
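
The reproducibility pitfall above is easy to guard against by emitting one consolidated record per checkpoint. A minimal sketch (field names are illustrative, not an SDK schema):

```python
import json

# Minimal reproducibility record: tie step number, checkpoint name, and
# deployment revision together in a single log line. Field names are
# illustrative.
def checkpoint_record(step, snapshot_name, deployment_id, deployment_revision):
    return {
        "step": step,
        "snapshot": snapshot_name,
        "deployment": deployment_id,
        "deployment_revision": deployment_revision,
    }

record = checkpoint_record(10, "step_0010", "research-loop-serving", "rev-3")
line = json.dumps(record, sort_keys=True)
```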