What is the Cookbook?
The Fireworks Cookbook is a collection of training recipes and utilities built on top of the Training API. It provides config-driven training loops that handle trainer provisioning, data loading, tokenization, gradient accumulation, checkpointing, and cleanup automatically. The cookbook is optional — everything it does can be done with the API directly. Use the cookbook when you want a working training loop quickly; use the API when you need full control.Installation
Available recipes
| Recipe | Module | Use case |
|---|---|---|
| RL (primary, experimental) | training.recipes.async_rl_loop | Reinforcement learning — you write a rollout function, the recipe owns the loop. Async rollout/training overlap by default; fully synchronous on-policy with synchronous_training=True. GRPO, importance sampling, DAPO, DRO, GSPO, CISPO. See Cookbook RL. No backward-compatibility guarantee. |
| RL (simpler, synchronous) | training.recipes.rl_loop | Synchronous on-policy GRPO scaffold — reach for it when you want the server-side fast loss path or don’t need rollout/train overlap |
| IGPO | training.recipes.igpo_loop | Information Gain-based Policy Optimization — turn-level IG rewards for multi-turn agent trajectories (extends GRPO) |
| DPO | training.recipes.dpo_loop | Direct preference optimization from chosen/rejected pairs |
| SFT | training.recipes.sft_loop | Supervised fine-tuning with cross-entropy loss |
| Distillation | training.recipes.distillation_loop | On-policy sampled-token distillation with one teacher or routed multi-teacher MOPD |
| ORPO | training.recipes.orpo_loop | Odds ratio preference optimization |
Config and main, set your config, and call main(cfg). Trainer and deployment provisioning is handled internally by the recipe — you describe what you want with TrainerConfig / DeployConfig, and the SDK attaches or creates the resources.
All launch examples below use trainer=TrainerConfig(training_shape_id=...) for explicit shape selection. Cookbook recipes can also auto-select validated shapes when training_shape_id is unset. The main run-level trainer knob you may set alongside a shape is replica_count for replicated HSDP launches; reference shapes can usually be left unset because the cookbook auto-selects or uses a shared-session reference when appropriate.
If you want field-level details about what a training shape controls and what stays configurable, see Training Shapes and the Cookbook Reference.
InfraConfig and the standalone setup_infra / ResourceCleanup helpers are deprecated and removed from the recipe surface. Recipes now take trainer=TrainerConfig(...) (and deployment=DeployConfig(...) for RL). See Migrating from the deprecated managed infra.Quick example: SFT
Quick example: GRPO
W&B logging
All cookbook recipes accept aWandBConfig to stream metrics to Weights & Biases:
Vision-language model support
All cookbook recipes support VLM fine-tuning. Use a VLM training shape and tokenizer, and provide multimodal datasets withimage_url content. See Vision Inputs for dataset format and examples.
Next steps
- Cookbook SFT — supervised fine-tuning
- Cookbook DPO — preference optimization with pairwise data
- Cookbook RL (GRPO) — full GRPO walkthrough with reward functions
- Cookbook Distillation — OPD and routed MOPD dataset format
- Vision Inputs — fine-tune VLMs with image and text data
- Cookbook Reference — all config classes and parameters