What this is
Research teams move faster when they can iterate on objective functions in plain Python and validate each checkpoint in production-like serving conditions. The Fireworks Training SDK is built for this workflow.
Why this approach
- Full-parameter updates maximize headroom for difficult reasoning and alignment tasks where LoRA may underfit.
- Custom losses in Python eliminate waiting for vendor-specific algorithm implementations — implement GRPO, DPO, or any hybrid objective directly.
- Serving-integrated evaluation via checkpoint hotloading avoids divergence between offline metrics and user-facing behavior.
- One platform, two modes: Start with managed jobs for standard objectives, then move to service-mode loops when you need per-step control — without rebuilding infrastructure.
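As an illustration of what "custom losses in plain Python" looks like, here is a minimal sketch of the standard DPO objective for a single preference pair. This is a generic formulation in pure Python, not Fireworks SDK API; the function name and signature are illustrative.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin)),
    where margin is the policy's log-ratio advantage on the chosen completion
    over the rejected one, relative to a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(x)), i.e. softplus(-x)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In a real training loop the per-token log-probabilities would come from the model's forward pass and the loss would be averaged over a batch before backpropagation; swapping in a GRPO-style group-relative advantage or a hybrid objective is a change to this one function, not to the surrounding infrastructure.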
Workflow
- Define objective and reward logic in your Python loop.
- Run short controlled experiments with frequent checkpoints.
- Hotload checkpoints into serving and evaluate with production-style prompts.
- Promote only checkpoints that pass both offline and serving evaluations.
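The promotion gate in the last step can be expressed as a small pure-Python filter. This is a sketch under assumed inputs (checkpoint ids mapped to scalar eval scores); the names and thresholds are illustrative, not part of the SDK.

```python
def select_promotable(checkpoints: list[str],
                      offline_scores: dict[str, float],
                      serving_scores: dict[str, float],
                      offline_threshold: float = 0.8,
                      serving_threshold: float = 0.8) -> list[str]:
    """Return only the checkpoints that pass BOTH evaluation gates.

    A checkpoint missing from either score map is treated as failing,
    so unevaluated candidates are never promoted by accident.
    """
    promotable = []
    for ckpt in checkpoints:
        offline_ok = offline_scores.get(ckpt, 0.0) >= offline_threshold
        serving_ok = serving_scores.get(ckpt, 0.0) >= serving_threshold
        if offline_ok and serving_ok:
            promotable.append(ckpt)
    return promotable
```

Requiring both gates is the point: a checkpoint that scores well offline but regresses on production-style prompts (or vice versa) is filtered out rather than promoted.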
Who this is for
- Research teams doing alignment, RLHF, and reasoning improvement with custom reward functions.
- ML engineers who want to iterate on training algorithms without managing GPU clusters.
- Teams transitioning from managed fine-tuning to custom training loops as their requirements grow.
Operational guidance
- Treat train-state checkpoints, sampler checkpoints, and deployment revisions as a single experiment bundle.
- Run small regression suites on every hotload candidate before promoting.
- Version your objective functions alongside training data for reproducibility.
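The guidance above can be made concrete with a small bundle record that ties the checkpoints, deployment revision, objective source, and data version together under one fingerprint. This is a minimal sketch; all field names are hypothetical, not SDK API.

```python
import hashlib
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ExperimentBundle:
    """One experiment's artifacts, versioned as a single unit.

    Field names are illustrative: adapt them to however your team
    identifies checkpoints, deployments, and objective code.
    """
    train_state_checkpoint: str
    sampler_checkpoint: str
    deployment_revision: str
    objective_source: str  # e.g. the objective function's source text or git blob
    data_version: str

    def fingerprint(self) -> str:
        # Deterministic hash over all fields: identical bundles share an id,
        # and changing any artifact (including the objective) changes it.
        payload = repr(sorted(asdict(self).items())).encode()
        return hashlib.sha256(payload).hexdigest()[:12]
```

Logging the fingerprint with every regression-suite run makes it unambiguous which objective, data, and checkpoint combination produced a given evaluation result.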