What this is
Once a service-mode RLOR trainer job reaches the RUNNING state, connect a FiretitanServiceClient to the trainer endpoint and create a FiretitanTrainingClient for train steps, checkpointing, and state management.
Setup
Use FiretitanServiceClient from the Training SDK instead of tinker.ServiceClient. It returns a FiretitanTrainingClient, which adds save_weights_for_sampler_ext() (checkpoint_type support) and session-scoped snapshot naming.
Creating the training client
Parameters
| Parameter | Description |
|---|---|
| base_url | Trainer endpoint URL (endpoint.base_url or TrainerJobManager.get(job_id)["directRouteHandle"]) |
| api_key | Your Fireworks API key |
| base_model | Must match the trainer job’s base_model (from TrainerJobConfig) |
| lora_rank | Must match trainer creation config (0 for full-parameter tuning) |
| user_metadata | Optional dict[str, str] of run metadata |
A ValueError is raised if you attempt to create a second training client with the same (base_model, lora_rank) on the same FiretitanServiceClient instance. To connect to a separate trainer, create a new FiretitanServiceClient.
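The duplicate-client rule can be illustrated with a small stand-in. This is only a sketch of the behavior described above, not the SDK's implementation: DemoServiceClient, its create_training_client method, and the model name are hypothetical and exist only for this example.

```python
class DemoServiceClient:
    """Hypothetical stand-in; NOT the real FiretitanServiceClient."""

    def __init__(self):
        # Remember which (base_model, lora_rank) pairs already have a client.
        self._seen = set()

    def create_training_client(self, base_model, lora_rank=0):
        key = (base_model, lora_rank)
        if key in self._seen:
            # Mirrors the documented rule: one training client per pair
            # on a given service client instance.
            raise ValueError(f"training client already created for {key}")
        self._seen.add(key)
        # A plain dict stands in for the training client object.
        return {"base_model": base_model, "lora_rank": lora_rank}


if __name__ == "__main__":
    service = DemoServiceClient()
    service.create_training_client("example/base-model", lora_rank=0)
    try:
        service.create_training_client("example/base-model", lora_rank=0)
    except ValueError as err:
        print("second client rejected:", err)
    # A fresh service client instance accepts the same pair again.
    DemoServiceClient().create_training_client("example/base-model", lora_rank=0)
```

In practice this means keeping one service client per trainer and reusing the training client it returned, rather than calling the factory again.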
What you can do with the client
Forward pass (get logprobs without training)
Custom forward-backward (train step)
Optimizer step
Save checkpoint for serving
List available checkpoints
Save and restore train state
save_state also accepts an optional ttl_seconds parameter for auto-expiring checkpoints.
Resolve cross-job checkpoint path
Cookbook users: If you are using cookbook recipes, prefer checkpoint_utils.save_checkpoint and checkpoint_utils.resolve_resume, which wrap these methods with structured persistence. See Checkpointing and Hotload.
Connecting to an existing trainer
If you already have a running trainer (e.g. from a previous session), connect directly by URL. The URL is available from TrainerJobManager.get(job_id)["directRouteHandle"].
Operational guidance
- Service mode supports both full-parameter and LoRA tuning. Set lora_rank=0 for full-parameter or a positive integer for LoRA.
- Use FiretitanServiceClient instead of tinker.ServiceClient to get FiretitanTrainingClient with save_weights_for_sampler_ext().
- Retry client creation if the trainer is still warming up. Poll the job state first.
- All Tinker API calls return futures. Call .result() to wait for completion.
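The last two points, polling the job state before creating a client and blocking on a future with .result(), can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: wait_until_running and the get_state callable are hypothetical helpers, the state string "RUNNING" is assumed from the description at the top of this page, and concurrent.futures stands in for the futures the Tinker API returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_until_running(get_state, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Poll get_state() until it returns "RUNNING" or the timeout elapses.

    get_state is any zero-argument callable returning the job state as a
    string (hypothetical; wire it to however you read the job state).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "RUNNING":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake state source standing in for a real job-state lookup.
    states = iter(["PENDING", "RUNNING"])
    wait_until_running(lambda: next(states), interval_s=0.0)

    # Stand-in for a future-returning call: submit work, then block on it.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(sum, [1, 2, 3])
        print(future.result())
```

Passing sleep as a parameter keeps the helper easy to test; a fixed interval is used here for simplicity, and a backoff schedule could be substituted.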