
What this is

Fireworks Training SDK gives teams a flexibility ladder from managed jobs to fully custom training loops. Start managed for standard objectives, then move to Tinker-compatible loops when you need custom losses, full-parameter updates, and tighter experiment control.
| Mode | Best for | Objective control | Infrastructure |
|---|---|---|---|
| Managed jobs (SFT, DPO, RFT) | Standard objectives, fast iteration | Platform-defined | Fully managed |
| Service-mode SDK loops | Custom losses, algorithm research | Full Python control | You drive the loop, platform runs GPUs |
Full-parameter RFT via service-mode loops is currently in private preview.
New to custom training loops? Start with Core Concepts for an introduction to the architecture, then follow the Quickstart to run a minimal training loop.

Why this approach

  • One platform, two modes: Move from managed baselines to frontier research without rebuilding infrastructure.
  • When to use managed jobs: Standard objectives (SFT, DPO, managed GRPO) with minimal custom code.
  • When to use Training SDK loops: Custom objectives, algorithm research, or fine-grained control over training beyond built-in methods.
  • Production inference in the loop: Hotload checkpoints onto serving deployments for realistic evaluation during training.

System architecture

A control-plane API provisions trainer and deployment resources. Your local Python loop connects to the trainer service, runs custom train steps, and periodically exports checkpoints to a serving deployment for sampling and evaluation.
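
The shape of that loop can be sketched with stand-in stubs. Everything below is illustrative: no Fireworks APIs are called, and the class and method names are placeholders for the SDK objects introduced later in this page.

```python
# Illustrative control flow only: stub classes stand in for the trainer
# service and the hotload-enabled deployment; nothing here touches the
# real control plane.

class StubTrainer:
    """Stands in for the trainer service reached via the control plane."""
    def train_step(self, batch):
        # Pretend loss that depends only on batch size.
        return {"loss": 1.0 / (1 + len(batch))}

    def export_checkpoint(self, name):
        # Pretend session-scoped snapshot name.
        return f"{name}-snapshot"

class StubDeployment:
    """Stands in for a hotload-enabled serving deployment."""
    def __init__(self):
        self.loaded = None

    def hotload(self, snapshot):
        self.loaded = snapshot

    def sample(self, prompt):
        return f"sample from {self.loaded}"

trainer, deployment = StubTrainer(), StubDeployment()
for step in range(3):
    metrics = trainer.train_step(batch=["datum-a", "datum-b"])
    if step % 2 == 0:  # periodic checkpoint + eval, as in the real loop
        snap = trainer.export_checkpoint(f"step_{step:04d}")
        deployment.hotload(snap)
        sampled = deployment.sample("eval prompt")
```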

Installation

SDK (custom loops)

```bash
pip install --pre "fireworks-ai[training]" tinker-cookbook
```

Standalone cookbook recipes (optional)

The cookbook provides ready-to-run training recipes (GRPO, DPO, SFT, ORPO). Install it as a package:
```bash
git clone https://github.com/fw-ai/cookbook.git
cd cookbook
pip install -e .
```
The Training SDK provides FiretitanServiceClient (extends tinker’s ServiceClient with checkpoint_type and session-scoped snapshot naming) and DeploymentSampler (client-side tokenization for training-compatible sampling):
```python
from fireworks.training.sdk import (
    FiretitanServiceClient,
    DeploymentSampler,
    DeploymentManager,
    TrainerJobManager,
    WeightSyncer,
)
```

Key APIs

Resource setup & teardown (Fireworks Training SDK)

| API | Purpose |
|---|---|
| `TrainerJobManager.create_and_wait(config)` | Create a service-mode trainer and poll until healthy |
| `TrainerJobManager.wait_for_existing(job_id)` | Wait for an already-existing trainer job to reach RUNNING |
| `TrainerJobManager.resume_and_wait(job_id)` | Resume a failed/cancelled/paused job and wait |
| `TrainerJobManager.reconnect_and_wait(job_id)` | Reconnect to a preempted/failed job (handles transitional states) |
| `TrainerJobManager.resolve_training_profile(shape_id)` | Fetch training shape config from the control plane |
| `TrainerJobManager.delete(job_id)` | Delete a trainer job |
| `DeploymentManager.create_or_get(config)` | Create or reuse an inference deployment for sampling/hotload |
| `DeploymentManager.wait_for_ready(deployment_id)` | Poll until deployment is READY |
| `DeploymentManager.scale_to_zero(deployment_id)` | Scale to zero replicas without deleting |
| `DeploymentManager.delete(deployment_id)` | Delete a deployment |

Training loop (Fireworks Training SDK + Tinker)

| API | Purpose |
|---|---|
| `FiretitanServiceClient(base_url, api_key)` | Connect to a trainer endpoint (extends tinker ServiceClient) |
| `service.create_training_client(base_model, lora_rank)` | Create a FiretitanTrainingClient with checkpoint extensions |
| `client.forward(datums, loss_type)` | Forward pass only (e.g. for reference logprobs) |
| `client.forward_backward_custom(datums, loss_fn)` | Forward + backward with your custom loss |
| `client.optim_step(tinker.AdamParams(...))` | Apply optimizer update |
| `client.save_weights_for_sampler_ext(name, checkpoint_type)` | Export serving-compatible checkpoint with session-scoped naming |
| `client.list_checkpoints()` | List available DCP checkpoints from the trainer |
| `client.resolve_checkpoint_path(name, source_job_id)` | Resolve checkpoint input for cross-job resume |
| `client.save_state(name, ttl_seconds)` | Save full train state (weights + optimizer) for resume |
| `client.load_state_with_optimizer(name)` | Restore train state for resume |
| `DeploymentSampler(inference_url, model, api_key, tokenizer)` | Client-side tokenized sampling from a deployment |
| `WeightSyncer(policy_client, deploy_mgr, ...)` | Manages checkpoint + hotload lifecycle with delta chaining |
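
The loss callable passed to forward_backward_custom follows a simple contract: it receives the batch and the per-datum logprobs and returns a (loss, metrics) pair. A toy float-based sketch of that contract follows; a real objective must return a differentiable tensor so the backward pass can flow through it, and all names here are illustrative.

```python
# Toy illustration of the (data, logprobs_list) -> (loss, metrics)
# contract used by forward_backward_custom. Real objectives operate on
# tensors; plain floats are used here only to show the return shape.

def toy_objective(data, logprobs_list):
    # A simple negative-log-likelihood-style average over datums.
    loss = -sum(sum(lps) for lps in logprobs_list) / len(logprobs_list)
    metrics = {"loss": float(loss), "num_datums": len(data)}
    return loss, metrics

loss, metrics = toy_objective(
    data=["datum-a", "datum-b"],
    logprobs_list=[[-0.5, -0.25], [-1.0]],
)
```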

Workflow

  1. Create resources: Provision a trainer job and (optionally) a hotload-enabled deployment.
  2. Connect a training client: Use FiretitanServiceClient to connect to the trainer endpoint.
  3. Build batches and compute objectives: Construct tinker.Datum objects and implement your loss function in Python.
  4. Iterate: Run forward_backward_custom + optim_step in a loop.
  5. Checkpoint and evaluate: Save checkpoints, hotload onto deployment, sample, and evaluate.

End-to-end example

Bootstrap trainer and deployment

```python
import os
from fireworks.training.sdk import (
    FiretitanServiceClient,
    TrainerJobManager,
    TrainerJobConfig,
    DeploymentManager,
    DeploymentConfig,
    DeploymentSampler,
    WeightSyncer,
)

api_key = os.environ["FIREWORKS_API_KEY"]
account_id = os.environ.get("FIREWORKS_ACCOUNT_ID", "")
base_url = os.environ.get("FIREWORKS_BASE_URL", "https://api.fireworks.ai")

# Create SDK managers
rlor_mgr = TrainerJobManager(api_key=api_key, account_id=account_id, base_url=base_url)
deploy_mgr = DeploymentManager(api_key=api_key, account_id=account_id, base_url=base_url)

# Create a service-mode RLOR trainer job (polls until healthy)
endpoint = rlor_mgr.create_and_wait(TrainerJobConfig(
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,
    max_context_length=4096,
    learning_rate=1e-5,
    gradient_accumulation_steps=4,
    display_name="my-experiment",
    hot_load_deployment_id="research-loop-serving",
))

# Connect training client (FiretitanServiceClient adds checkpoint_type + session ID support)
service = FiretitanServiceClient(
    base_url=endpoint.base_url,
    api_key=api_key,
)
training_client = service.create_training_client(
    base_model="accounts/fireworks/models/qwen3-8b",
    lora_rank=0,
)

# Create a hotload-enabled inference deployment for sampling
dep_info = deploy_mgr.create_or_get(DeploymentConfig(
    deployment_id="research-loop-serving",
    base_model="accounts/fireworks/models/qwen3-8b",
    min_replica_count=0,
    max_replica_count=1,
))
deploy_mgr.wait_for_ready("research-loop-serving")
```

Run a custom update and checkpoint

```python
import tinker

# `batch` is a list of tinker.Datum objects built earlier;
# compute_custom_objective is a placeholder for your own loss function.
def objective(data, logprobs_list):
    loss = compute_custom_objective(logprobs_list)
    metrics = {"loss": float(loss.item())}
    return loss, metrics

# Train step
training_client.forward_backward_custom(batch, objective).result()
training_client.optim_step(
    tinker.AdamParams(
        learning_rate=1e-5,
        beta1=0.9,
        beta2=0.999,
        eps=1e-8,
        weight_decay=0.01,
    )
).result()

# Export checkpoint and hotload onto deployment
# save_weights_for_sampler_ext adds checkpoint_type support and session-scoped naming
result = training_client.save_weights_for_sampler_ext(
    "step_0010",
    checkpoint_type="base",  # "base" for full, "delta" for incremental
)
# result.snapshot_name is the session-qualified name to use for hotloading
```

TrainerJobConfig reference

TrainerJobManager.create_and_wait(...) accepts a TrainerJobConfig with these fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `base_model` | str | (required) | Base model name (e.g. "accounts/fireworks/models/qwen3-8b") |
| `lora_rank` | int | 0 | LoRA rank. 0 for full-parameter tuning, or a positive integer (e.g. 16, 64) for LoRA |
| `max_context_length` | int | 4096 | Maximum sequence length |
| `learning_rate` | float | 1e-5 | Learning rate for the optimizer |
| `gradient_accumulation_steps` | int | 1 | Number of micro-batches before an optimizer step |
| `node_count` | int | 1 | Number of trainer nodes |
| `display_name` | str or None | None | Human-readable trainer name |
| `hot_load_deployment_id` | str or None | None | Deployment ID used for checkpoint hotload workflows |
| `region` | str or None | None | Region for the job (e.g. "US_VIRGINIA_1"). Auto-resolved when using training shapes. |
| `custom_image_tag` | str or None | None | Override trainer image tag |
| `extra_args` | list[str] or None | None | Extra trainer arguments |
| `accelerator_type` | str or None | None | Accelerator type override |
| `accelerator_count` | int or None | None | Accelerator count override |
| `skip_validations` | bool | False | Bypass control-plane validation checks |
| `forward_only` | bool | False | Create a forward-only trainer (reference model pattern) |
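
Note that lora_rank appears both here and in create_training_client, and the two must agree. A small illustrative guard (pure Python; the helper name is hypothetical, not part of the SDK):

```python
# Hypothetical guard: ensure the trainer job and training client agree
# on LoRA rank (0 = full-parameter). Illustrative only, not an SDK API.
def check_rank_consistency(job_rank, client_rank):
    if job_rank != client_rank:
        raise ValueError(
            f"lora_rank mismatch: trainer={job_rank}, client={client_rank}"
        )
    return "full-parameter" if job_rank == 0 else f"lora-r{job_rank}"

mode = check_rank_consistency(0, 0)
```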

DeploymentConfig reference

DeploymentManager.create_or_get(...) accepts a DeploymentConfig with these fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `deployment_id` | str | (required) | Stable deployment identifier |
| `base_model` | str | (required) | Base model name. Must match the trainer’s base model for hotload compatibility. |
| `deployment_shape` | str or None | None | Deployment shape resource name (overrides accelerator/region) |
| `region` | str | "US_VIRGINIA_1" | Region for the deployment |
| `min_replica_count` | int | 0 | Minimum replicas (set 0 to scale to zero when idle) |
| `max_replica_count` | int | 1 | Maximum replicas for autoscaling |
| `accelerator_type` | str | "NVIDIA_H200_141GB" | Accelerator type |
| `hot_load_bucket_type` | str or None | "FW_HOSTED" | Hotload storage backend |
| `skip_shape_validation` | bool | False | Bypass deployment shape validation |
| `extra_args` | list[str] or None | None | Extra serving arguments |

W&B integration

For SDK/cookbook loops, configure W&B via cookbook config (WandBConfig) rather than TrainerJobConfig:
```python
from training.recipes.rl_loop import Config, main
from training.utils import WandBConfig

cfg = Config(
    base_model="accounts/fireworks/models/qwen3-8b",
    max_rows=20,
    epochs=1,
    wandb=WandBConfig(
        entity="my-team",
        project="grpo-experiment",
        run_name="grpo-run-001",
    ),
)

main(cfg)
```

DeploymentManager constructor

DeploymentManager supports separate URLs for control-plane, inference, and hotload operations:
```python
deploy_mgr = DeploymentManager(
    api_key=api_key,
    account_id=account_id,
    base_url="https://api.fireworks.ai",         # Control-plane URL (deployment CRUD)
    inference_url="https://api.fireworks.ai",    # Gateway URL for inference completions (defaults to base_url)
    hotload_api_url="https://api.fireworks.ai",  # Gateway URL for hotload operations (defaults to base_url)
)
```
For most users the three URLs are identical: inference_url and hotload_api_url can be omitted and default to base_url. Separate URLs are useful when the control plane and gateway have different endpoints (e.g. personal dev gateways).

Operational guidance

  • Service mode supports both full-parameter and LoRA tuning. Set lora_rank=0 for full-parameter or a positive integer (e.g. 16, 64) for LoRA, and match create_training_client(lora_rank=...) accordingly.
  • Starter loops: Use training.recipes.rl_loop (GRPO/DAPO/GSPO/CISPO), dpo_loop, orpo_loop, and sft_loop from the standalone cookbook repo as the current reference implementations.
  • Training shapes: Use TrainerJobManager.resolve_training_profile(shape_id) to auto-populate infra config (region, accelerator, image tag, node count) from the control plane instead of setting them manually.
  • Preemption handling: Use reconnect_and_wait(job_id) to resume preempted trainer jobs — it handles transitional states (CREATING, DELETING) by polling until the job reaches a resumable state.
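
The reconnect semantics in the last bullet amount to a poll-until-resumable loop. An illustrative sketch (this is not the SDK's internal code; the status function is injected so the example is self-contained):

```python
# Illustrative only: the poll-until-resumable pattern that a
# reconnect_and_wait-style helper implements. `get_status` is injected;
# the real SDK queries the control plane and sleeps with backoff.

TRANSITIONAL = {"CREATING", "DELETING"}
RESUMABLE = {"RUNNING", "FAILED", "CANCELLED", "PAUSED"}

def wait_for_resumable(get_status, max_polls=100):
    for _ in range(max_polls):
        status = get_status()
        if status in RESUMABLE:
            return status
        if status not in TRANSITIONAL:
            raise RuntimeError(f"unexpected status: {status}")
        # real code would time.sleep() with backoff here
    raise TimeoutError("job never left a transitional state")

# Simulated status sequence: transitional, transitional, then resumable.
statuses = iter(["CREATING", "CREATING", "RUNNING"])
final = wait_for_resumable(lambda: next(statuses))
```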

Common pitfalls

  • Evaluating against stale deployments can hide regressions — always verify the hotloaded checkpoint identity.
  • Under-specified checkpoint metadata makes successful runs hard to reproduce — log step numbers, checkpoint names, and deployment revisions together.
  • Mixing managed-job fields (for example epochs, batch_size) into TrainerJobConfig — these are separate APIs and are ignored by the training SDK manager layer.
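
The reproducibility pitfall above is easy to guard against by emitting one consolidated record per checkpoint. A minimal sketch (field names are illustrative, not an SDK schema):

```python
import json

# Minimal reproducibility record: tie step number, checkpoint name, and
# deployment revision together in a single log line. Field names are
# illustrative.
def checkpoint_record(step, snapshot_name, deployment_id, deployment_revision):
    return {
        "step": step,
        "snapshot": snapshot_name,
        "deployment": deployment_id,
        "deployment_revision": deployment_revision,
    }

record = checkpoint_record(10, "step_0010", "research-loop-serving", "rev-3")
line = json.dumps(record, sort_keys=True)
```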