New to the training SDK? Read Core Concepts first for background on how the architecture works.
## Prerequisites
## Step 1: Provision a trainer
Create a service-mode RLOR trainer job. This allocates GPUs and loads the base model.

## Step 2: Connect the training client
`FiretitanServiceClient` connects your local Python process to the remote trainer.
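Steps 1 and 2 might look like the following sketch. Only `FiretitanServiceClient` appears in these docs; the `firetitan` module path, the `create_rlor_trainer` function, and its parameters are assumptions for illustration, not the SDK's documented API.

```python
# Hypothetical sketch — module path, function name, and parameters
# are assumptions, not the documented Firetitan API.
import firetitan

# Step 1: provision a service-mode RLOR trainer job.
# This allocates GPUs and loads the base model on the remote side.
trainer_job = firetitan.create_rlor_trainer(
    base_model="my-base-model",  # placeholder model name
    mode="service",
)

# Step 2: connect the local Python process to the remote trainer.
client = firetitan.FiretitanServiceClient(trainer_job)
```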
## Step 3: Build training data
Each training example is wrapped in a `Datum` — a tokenized sequence with per-token weights that tell the loss function which tokens to train on.

## Step 4: Write a loss function
The loss function receives the datums and per-token logprobs (as autograd tensors from the GPU). For SFT, we compute negative log-likelihood over response tokens.

- `logprobs_list[i]` has `requires_grad=True` — your loss must be differentiable through it
- Use `torch.dot()` for weighted sums — it correctly propagates gradients
- Return `(scalar_loss, metrics_dict)`
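To make the per-token weighting concrete, here is a pure-Python illustration of the weighted NLL the loss computes. In the real loss function the logprobs are autograd tensors and the sum of products would be `torch.dot(weights, logprobs)` so gradients propagate; the flat-list layout below is an assumption for illustration only.

```python
def weighted_nll(logprobs, weights):
    """Negative log-likelihood over tokens with nonzero weight.

    Weight 0.0 masks prompt tokens; weight 1.0 trains response tokens.
    In the real loss this would be -torch.dot(weights, logprobs) on the
    autograd tensors; plain floats here just show the arithmetic.
    """
    assert len(logprobs) == len(weights)
    return -sum(w * lp for w, lp in zip(weights, logprobs))

# Two prompt tokens masked out, two response tokens trained.
logprobs = [-0.5, -1.0, -2.0, -0.25]
weights  = [ 0.0,  0.0,  1.0,  1.0]

loss = weighted_nll(logprobs, weights)  # -(-2.0 + -0.25) = 2.25
metrics = {"nll": loss, "num_trained_tokens": sum(1 for w in weights if w)}
```

Returning `(loss, metrics)` matches the `(scalar_loss, metrics_dict)` contract above.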
## Step 5: Train
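A training loop might look like the following sketch. `forward_backward_custom` is named elsewhere in these docs, but its exact signature, the `optim_step` call, and the `batches`/`datums`/`loss_fn` helpers are assumptions for illustration.

```python
# Hypothetical sketch — signatures and helper names are assumptions.
for epoch in range(num_epochs):
    for batch in batches(datums, batch_size=8):
        # Runs the forward pass remotely, calls our loss function on the
        # returned per-token logprobs, and backpropagates on the trainer.
        loss, metrics = client.forward_backward_custom(batch, loss_fn)
        client.optim_step()  # apply the accumulated gradients
        print(f"epoch={epoch} loss={loss:.4f}", metrics)
```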
## Step 6: Save a checkpoint
Export the trained weights for serving.

## Step 7: Clean up
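Steps 6 and 7 might be sketched together as follows; the method names are assumptions, not the documented API.

```python
# Hypothetical sketch — method names are assumptions.
# Step 6: export the trained weights so a serving stack can load them.
checkpoint_path = client.save_checkpoint(name="quickstart-sft")

# Step 7: release the GPUs once training is done.
client.shutdown()
```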
## Full script
Complete quickstart script
## Next steps
- Core Concepts — deeper explanation of the architecture and abstractions
- Custom Train Step — detailed API for `forward_backward_custom`, datum construction, and gradient accumulation
- GRPO Example — on-policy and off-policy reinforcement learning
- DPO Example — preference optimization with pairwise data
- Checkpointing and Hotload — base/delta checkpoints and live deployment updates