What this is

Managed jobs let the platform handle the training lifecycle — scheduling, execution, checkpointing, and output model materialization. Use them when your objective fits a supported method and your priority is reliability and operational simplicity.

Managed job types

| Resource | Objective | API |
| --- | --- | --- |
| Supervised Fine-Tuning (SFT) | Cross-entropy on instruction/response pairs | fw.supervised_fine_tuning_jobs.* |
| DPO | Direct preference optimization on chosen/rejected pairs | fw.dpo_jobs.* |
| Managed RFT | Reinforcement fine-tuning with built-in RL losses | fw.reinforcement_fine_tuning_jobs.* |
| Service-mode RLOR | Custom objectives via Training SDK or Cookbook loops | TrainerJobManager + TrainerJobConfig (SDK) or cookbook recipes |

Listing jobs

```python
from fireworks import Fireworks

fw = Fireworks(api_key="<FIREWORKS_API_KEY>", account_id="<ACCOUNT_ID>")

# List jobs of each managed type
sft_jobs = fw.supervised_fine_tuning_jobs.list()
rft_jobs = fw.reinforcement_fine_tuning_jobs.list()
dpo_jobs = fw.dpo_jobs.list()

for job in sft_jobs:
    print(f"SFT: {job.display_name} ({job.state})")
```

Creating managed jobs

SFT (flat keyword arguments)

```python
job = fw.supervised_fine_tuning_jobs.create(
    dataset="accounts/<ACCOUNT_ID>/datasets/my-dataset",
    base_model="accounts/fireworks/models/qwen3-8b",
    learning_rate=2e-5,
    epochs=3,
    max_context_length=4096,
    lora_rank=16,
    wandb_config={"entity": "my-team", "project": "sft"},
)
```

DPO (flat keyword arguments)

```python
job = fw.dpo_jobs.create(
    dataset="accounts/<ACCOUNT_ID>/datasets/preference-data",
    base_model="accounts/fireworks/models/qwen3-8b",
    learning_rate=1e-5,
    max_context_length=4096,
)
```
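DPO trains on a preference dataset of chosen/rejected pairs. As a rough sketch of what one JSONL record might look like (the field names here are illustrative assumptions, not the confirmed schema; consult the dataset format reference):

```python
import json

# Hypothetical record shape for a preference dataset: one JSON object per
# line, each pairing a prompt with a preferred and a dispreferred response.
record = {
    "prompt": [{"role": "user", "content": "Explain gradient descent briefly."}],
    "chosen": [{"role": "assistant", "content": "Gradient descent iteratively steps against the loss gradient."}],
    "rejected": [{"role": "assistant", "content": "Gradient descent is a database indexing technique."}],
}
line = json.dumps(record)  # append this line to preference-data.jsonl
```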

Managed RFT (with training_config and loss_config)

```python
job = fw.reinforcement_fine_tuning_jobs.create(
    dataset="accounts/<ACCOUNT_ID>/datasets/rl-data",
    training_config={
        "base_model": "accounts/fireworks/models/qwen3-8b",
        "max_context_length": 4096,
        "learning_rate": 1e-5,
    },
    loss_config={
        "method": "GRPO",
        "kl_beta": 0.01,
    },
)
```
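For intuition on what the GRPO objective optimizes: each prompt gets a group of sampled completions, and every completion's reward is converted into an advantage relative to that group by normalizing with the group mean and standard deviation. A minimal illustrative sketch of that normalization (background math only, not the platform's implementation):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group's rewards to zero mean and unit scale.

    Completions scored above the group average get positive advantages
    (reinforced); those below get negative advantages (discouraged).
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

The kl_beta parameter in loss_config then weights a KL penalty that keeps the updated policy close to the base model while these advantages are applied.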

Available loss_config methods

| Method | Description |
| --- | --- |
| GRPO | Group Relative Policy Optimization (default for RFT) |
| DAPO | Dynamic Advantage Policy Optimization |
| DPO | Direct Preference Optimization (default for the DPO API) |
| ORPO | Odds Ratio Preference Optimization |
| GSPO_TOKEN | Token-level GSPO |
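When building loss_config dicts programmatically, it can help to fail fast on a typo in the method name rather than at job submission. A small sketch, assuming methods are passed as the plain strings listed above (the helper name is illustrative, not part of the SDK):

```python
# Valid values for loss_config["method"], per the table above.
SUPPORTED_METHODS = {"GRPO", "DAPO", "DPO", "ORPO", "GSPO_TOKEN"}

def make_loss_config(method="GRPO", **kwargs):
    """Build a loss_config dict, rejecting unsupported method names early."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported loss method: {method!r}")
    return {"method": method, **kwargs}
```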

W&B integration

All managed job types support wandb_config for native Weights & Biases logging:
```python
wandb_config = {
    "entity": "my-team",
    "project": "training-experiment",
}

# Works with SFT, DPO, and RFT jobs
fw.supervised_fine_tuning_jobs.create(
    ...,
    wandb_config=wandb_config,
)
```

When to switch to service-mode loops

Move from managed jobs to service-mode RLOR loops when you need:
  • Custom loss functions (e.g. hybrid GRPO + DPO, custom reward shaping)
  • Full-parameter tuning with per-step metrics
  • Inference-in-the-loop evaluation via hotloading during training
  • Algorithm research beyond the built-in methods
For custom service-mode loops, prefer fireworks.training.sdk.TrainerJobManager + TrainerJobConfig (see Training SDK Overview).

Operational guidance

  • Use managed jobs when your objective fits supported methods and you want minimal code.
  • Monitor job state by polling fw.<job_type>.get(...) until the job reaches a terminal state.
  • Cancel stuck jobs with fw.<job_type>.cancel(...) to release resources.
  • Delete completed jobs when you no longer need them.
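The polling and cancellation advice above can be sketched as one small helper. The terminal state names below are assumptions (verify them against the job state enum in the API reference), and get(...)/cancel(...) are assumed to accept the job name:

```python
import time

# Assumed terminal state names; check the actual job state enum.
TERMINAL_STATES = {"JOB_STATE_COMPLETED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def wait_for_job(jobs_api, job_name, poll_interval=30.0, timeout=3600.0):
    """Poll jobs_api.get(job_name) until the job reaches a terminal state.

    jobs_api can be any of fw.supervised_fine_tuning_jobs, fw.dpo_jobs, or
    fw.reinforcement_fine_tuning_jobs, assuming they share a get/cancel surface.
    On timeout, the job is cancelled so it does not hold resources.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = jobs_api.get(job_name)
        if job.state in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            jobs_api.cancel(job_name)
            raise TimeoutError(f"{job_name} did not finish within {timeout}s")
        time.sleep(poll_interval)
```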