## What this is

This is the default lifecycle for research loops: bootstrap a trainer, run iterative updates, export checkpoints, then sample through deployment endpoints for realistic evaluation.

## Key APIs
| API | Purpose |
|---|---|
| `FiretitanServiceClient` | Connect your local loop to the trainer service |
| `TrainingClient.forward_backward_custom` | Compute gradients with your custom objective |
| `TrainingClient.forward` | Forward-only pass (e.g. for reference logprobs) |
| `TrainingClient.optim_step` | Apply an optimization update |
| `FiretitanTrainingClient.save_weights_for_sampler_ext` | Export a serving-compatible checkpoint (returns `SaveSamplerResult`) |
| `DeploymentManager` + `WeightSyncer` | Hotload checkpoints and track base/delta state |
| `DeploymentSampler` | Client-side tokenized sampling from deployment endpoints |
## Workflow

- Create resources: a deployment (`DeploymentManager`) and a service-mode trainer (`TrainerJobManager`).
- Connect a Tinker training client from your Python loop.
- Run train steps: `forward_backward_custom` + `optim_step` in a loop.
- Save checkpoints at regular intervals using the base/delta pattern.
- Hotload the checkpoint onto your serving deployment.
- Sample and evaluate through the deployment endpoint (typically via `DeploymentSampler`).
- Record metrics and decide whether to continue or branch experiments.
## End-to-end example
### 1. Bootstrap
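A minimal sketch of the bootstrap step, written against the names this page uses (`DeploymentManager`, `TrainerJobManager`, `create_training_client`). The constructor and method signatures below (`create_deployment`, `create_trainer`) are assumptions for illustration, not a verified API:

```python
# Hypothetical sketch: create_deployment / create_trainer signatures are
# assumptions; only the class and concept names come from this page.

def bootstrap(deployment_manager, job_manager, model_name, lora_rank=0):
    """Create a deployment plus a service-mode trainer, and connect a client."""
    # Serving side: the endpoint that checkpoints will later be hotloaded onto.
    deployment = deployment_manager.create_deployment(model=model_name)
    # Training side: a service-mode trainer job (lora_rank=0 means full-parameter).
    trainer = job_manager.create_trainer(model=model_name, lora_rank=lora_rank)
    # Local handle used to drive forward_backward_custom / optim_step calls.
    training_client = trainer.create_training_client(lora_rank=lora_rank)
    return deployment, training_client
```

Keeping resource creation in one helper makes it easy to tear down and recreate both sides of the loop together when branching experiments.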
### 2. Train step with custom objective
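One inner-loop iteration can be sketched as a helper. `forward_backward_custom` and `optim_step` are this page's API names; the loss-callback shape and the `learning_rate` keyword are illustrative assumptions:

```python
# Sketch only: the loss_fn callback shape and the optim_step keyword
# are assumptions, not a verified signature.

def train_step(training_client, batch, custom_loss_fn, learning_rate=1e-5):
    """Compute gradients under a custom objective, then apply the update."""
    # Gradient computation with the user-supplied objective.
    result = training_client.forward_backward_custom(batch, loss_fn=custom_loss_fn)
    # Apply the optimizer update on the service side.
    training_client.optim_step(learning_rate=learning_rate)
    return result  # typically carries the scalar loss for logging
```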
### 3. Checkpoint, hotload, and evaluate
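The checkpoint-and-evaluate step, sketched with this page's names (`save_weights_for_sampler_ext`, `WeightSyncer`, `sample_with_tokens`); the `hotload` method and the keyword arguments shown are assumptions:

```python
# Sketch: hotload() and the keyword arguments below are illustrative
# assumptions layered on the API names this page documents.

def checkpoint_and_eval(training_client, weight_syncer, sampler, prompts,
                        step, checkpoint_type):
    """Export a serving checkpoint, hotload it, then evaluate via the deployment."""
    # Export a serving-compatible checkpoint ("base" first, "delta" afterwards).
    save_result = training_client.save_weights_for_sampler_ext(
        name=f"ckpt-{step}", checkpoint_type=checkpoint_type)
    # Push the checkpoint onto the serving deployment.
    weight_syncer.hotload(save_result)
    # Evaluate through the serving path, not trainer internals.
    return sampler.sample_with_tokens(prompts, max_tokens=256)
```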
## Sampling with token IDs (for training)
For training scripts that need token IDs and logprobs (e.g. GRPO, DPO), use `DeploymentSampler`, which handles client-side tokenization via a Hugging Face tokenizer and returns structured `SampledCompletion` objects.
### `SampledCompletion` fields
| Field | Type | Description |
|---|---|---|
| `text` | `str` | Decoded completion text |
| `full_tokens` | `List[int]` | Prompt + completion token IDs |
| `prompt_len` | `int` | Number of prompt tokens |
| `finish_reason` | `str` | `"stop"`, `"length"`, etc. |
| `completion_len` | `int` | Number of completion tokens |
| `inference_logprobs` | `List[float] \| None` | Per-token logprobs (when `logprobs=True` is passed) |
| `logprobs_echoed` | `bool` | `True` when `echo=True` was used; logprobs are training-aligned (P+C-1 entries) |
| `routing_matrices` | `List[str] \| None` | Base64-encoded per-token routing matrices for MoE Router Replay (R3) |
Pass `logprobs=True` when sampling to populate `inference_logprobs` on each completion.
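For example, summing per-token logprobs from a completion object with the fields above (only the field name `inference_logprobs` comes from this page; the surrounding helper is an illustrative sketch):

```python
# Field name (inference_logprobs) follows the SampledCompletion table above;
# the helper itself is an illustrative sketch.

def completion_logprob_sum(completion):
    """Total logprob of the sampled tokens, when logprobs=True was passed."""
    if completion.inference_logprobs is None:
        # Field is None unless logprobs were requested at sampling time.
        raise ValueError("sample with logprobs=True to populate inference_logprobs")
    return sum(completion.inference_logprobs)
```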
### Sequence length filtering
`sample_with_tokens` supports `max_seq_len` for automatic filtering:

- Prompt pre-filter: if the tokenized prompt already meets or exceeds `max_seq_len`, the method returns an empty list immediately; no inference call is made.
- Completion post-filter: after sampling, any completion whose full token sequence (prompt + completion) exceeds `max_seq_len` is silently dropped.
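The two rules can be restated as a small stand-alone filter (a re-implementation for clarity; per this page, `sample_with_tokens` applies the equivalent checks internally):

```python
# Pure-Python restatement of the max_seq_len rules described above.

def filter_by_seq_len(prompt_tokens, completion_token_lists, max_seq_len):
    """Apply the prompt pre-filter and completion post-filter."""
    # Prompt pre-filter: prompt alone meets/exceeds the limit -> empty list,
    # mirroring the "no inference call" short-circuit.
    if len(prompt_tokens) >= max_seq_len:
        return []
    # Completion post-filter: drop sequences where prompt + completion overflow.
    return [c for c in completion_token_lists
            if len(prompt_tokens) + len(c) <= max_seq_len]
```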
## Operational guidance
- Service mode supports both full-parameter and LoRA tuning. Set `lora_rank=0` for full-parameter tuning or a positive integer (e.g. `16`, `64`) for LoRA, and match `create_training_client(lora_rank=...)` accordingly.
- Use `checkpoint_type="base"` for the first checkpoint, then `"delta"` for subsequent ones to reduce save/transfer time.
- Keep checkpoint intervals predictable so evaluation comparisons are stable.
- Store the exact prompt set used for each evaluation sweep for reproducibility.
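The base/delta guidance above can be encoded in a tiny helper (the fixed-interval checkpoint cadence is an assumption for illustration):

```python
# Sketch: assumes checkpoints are taken at fixed step intervals, with the
# first one exported as "base" and the rest as cheaper "delta" saves.

def checkpoint_type_for(step, interval):
    """Return "base" for the first checkpoint, "delta" afterwards, else None."""
    if step == 0 or step % interval != 0:
        return None  # not a checkpoint step
    return "base" if step == interval else "delta"
```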
## Common pitfalls
- Sampling from trainer internals instead of deployment endpoints can skew results — always evaluate through the serving path.
- Missing checkpoint-to-deployment traceability makes rollback risky — log checkpoint names alongside metrics.
- Stale deployments: Always verify the hotloaded checkpoint identity matches what you expect before sampling.
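A guard for the stale-deployment pitfall might look like this; `current_checkpoint_name()` is a hypothetical accessor named here only for illustration — substitute whatever identity/status call your deployment actually exposes:

```python
# current_checkpoint_name() is hypothetical; substitute your deployment's
# real identity/status accessor before relying on this check.

def assert_deployment_fresh(deployment, expected_checkpoint):
    """Fail fast if the deployment is not serving the checkpoint we just loaded."""
    actual = deployment.current_checkpoint_name()
    if actual != expected_checkpoint:
        raise RuntimeError(
            f"stale deployment: serving {actual!r}, expected {expected_checkpoint!r}")
```

Running this immediately after every hotload, and logging the checkpoint name alongside metrics, covers both the rollback-traceability and staleness pitfalls above.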