What this is

Research teams move faster when they can iterate on objective functions in plain Python and validate each checkpoint in production-like serving conditions. Fireworks Training SDK is built for this workflow.

Why this approach

  • Full-parameter updates maximize headroom for difficult reasoning and alignment tasks where LoRA may underfit.
  • Custom losses in Python eliminate waiting for vendor-specific algorithm implementations — implement GRPO, DPO, or any hybrid objective directly.
  • Serving-integrated evaluation via checkpoint hotloading avoids divergence between offline metrics and user-facing behavior.
  • One platform, two modes: Start with managed jobs for standard objectives, then move to service-mode loops when you need per-step control — without rebuilding infrastructure.
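The second bullet above can be made concrete: a preference objective such as DPO is a few lines of plain Python. Below is a minimal sketch of the per-pair DPO loss, written against raw log-probabilities and independent of any SDK; the function name and the β default are illustrative, not part of the Fireworks API.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x)); log1p keeps this numerically stable
    return math.log1p(math.exp(-beta * margin))
```

At zero margin the loss is log 2; it shrinks as the policy separates chosen from rejected completions more strongly than the reference model does. A hybrid objective is just ordinary Python arithmetic on top of terms like this.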

Workflow

  1. Define objective and reward logic in your Python loop.
  2. Run short controlled experiments with frequent checkpoints.
  3. Hotload checkpoints into serving and evaluate with production-style prompts.
  4. Promote only checkpoints that pass both offline and serving evaluations.
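The four steps above can be sketched as one loop. Everything here is a hypothetical placeholder, not a Fireworks SDK call: `run_train_steps`, `offline_eval`, `hotload_and_eval`, and the score thresholds stand in for whatever your objective and evaluation suite actually are.

```python
def should_promote(offline_score: float, serving_score: float,
                   offline_min: float = 0.80, serving_min: float = 0.75) -> bool:
    """Step 4: promote only when both evaluations clear their bars."""
    return offline_score >= offline_min and serving_score >= serving_min

def training_loop(num_checkpoints: int,
                  run_train_steps,      # steps 1-2: your objective, short run, checkpoint
                  offline_eval,         # offline metric on held-out data
                  hotload_and_eval):    # step 3: serving eval on production-style prompts
    promoted = []
    for ckpt_id in range(num_checkpoints):
        checkpoint = run_train_steps(ckpt_id)
        if should_promote(offline_eval(checkpoint), hotload_and_eval(checkpoint)):
            promoted.append(checkpoint)
    return promoted

# Stub callables so the skeleton runs end to end
promoted = training_loop(
    num_checkpoints=3,
    run_train_steps=lambda i: f"ckpt-{i}",
    offline_eval=lambda c: 0.9,
    hotload_and_eval=lambda c: 0.5 if c == "ckpt-1" else 0.8,
)
```

The gate is deliberately a pure function: it makes the promotion criterion versionable and testable on its own, separate from the training and serving plumbing.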

Who this is for

  • Research teams doing alignment, RLHF, and reasoning improvement with custom reward functions.
  • ML engineers who want to iterate on training algorithms without managing GPU clusters.
  • Teams transitioning from managed fine-tuning to custom training loops as their requirements grow.

Operational guidance

  • Treat train-state checkpoints, sampler checkpoints, and deployment revisions as a single experiment bundle.
  • Run small regression suites on every hotload candidate before promoting.
  • Version your objective functions alongside training data for reproducibility.
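One way to hold those pieces together is a single record per experiment. The field names and the config-fingerprinting scheme below are illustrative assumptions, not an SDK feature; the point is that the checkpoint identifiers, the serving revision, and the objective/data versions travel as one unit.

```python
import hashlib
import json
from dataclasses import dataclass

def fingerprint(config: dict) -> str:
    """Deterministic short hash of an objective or data config, for versioning."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

@dataclass(frozen=True)
class ExperimentBundle:
    """Everything needed to reproduce, evaluate, and promote one checkpoint."""
    train_state_ckpt: str      # optimizer + model state for resuming training
    sampler_ckpt: str          # weights hotloaded into serving
    deployment_revision: str   # serving revision the checkpoint was evaluated on
    objective_version: str     # fingerprint of the loss/reward configuration
    data_version: str          # fingerprint or tag of the training-data snapshot

bundle = ExperimentBundle(
    train_state_ckpt="train-state-0042",
    sampler_ckpt="sampler-0042",
    deployment_revision="rev-7",
    objective_version=fingerprint({"loss": "dpo", "beta": 0.1}),
    data_version="prefs-snapshot-a",
)
```

Because `fingerprint` is deterministic over a sorted serialization, two runs with the same objective config get the same version string, and any change to the config changes it.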