What this is
This is the default lifecycle for research loops that need serving-quality evaluation during training: create an SDK-managed trainer and deployment, run iterative updates, save sampler weights, sync those weights to the deployment, then sample through the deployment. For production RL, prefer the cookbook recipes. They wrap this same SDK-managed service path and handle batching, reference clients, checkpoints, reconnect, and cleanup.
Workflow
- Create the managed service with FiretitanServiceClient.from_firetitan_config(...).
- Create a training client with service.create_training_client(...).
- Create a deployment sampler with service.create_deployment_sampler(...).
- Run train steps: forward_backward_custom(...) + optim_step(...).
- Save sampler weights with training_client.save_weights_for_sampler(...).result().
- Refresh the sampler with service.create_deployment_sampler(model_path=saved.path, ...).
- Sample and evaluate through the deployment endpoint.
You do not need TrainerJobManager, DeploymentManager, or WeightSyncer for the normal SDK flow.
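The order of the steps above can be sketched with in-memory stand-ins. This is only an illustration of the control flow, not the SDK itself: StubService, StubTrainingClient, StubSampler, the eval_every parameter, and every argument shape are hypothetical. Only the method names are taken from the workflow list, and the stub service replaces the object that FiretitanServiceClient.from_firetitan_config(...) would return.

```python
from dataclasses import dataclass


@dataclass
class SavedSnapshot:
    path: str  # hypothetical stand-in for the sampler snapshot identity


class ImmediateFuture:
    """Minimal future whose result is already available."""

    def __init__(self, value):
        self._value = value

    def result(self):
        return self._value


class StubTrainingClient:
    """Hypothetical stand-in; only the method names come from the guide."""

    def __init__(self):
        self.steps = 0

    def forward_backward_custom(self, batch, loss_fn):
        return loss_fn(batch)  # a real trainer would also compute gradients

    def optim_step(self):
        self.steps += 1

    def save_weights_for_sampler(self, name):
        return ImmediateFuture(SavedSnapshot(path=f"stub-snapshot/{name}"))


class StubSampler:
    def __init__(self, model_path=None):
        self.model_path = model_path


class StubService:
    def create_training_client(self):
        return StubTrainingClient()

    def create_deployment_sampler(self, model_path=None):
        return StubSampler(model_path=model_path)


def mean_loss(batch):
    return sum(batch) / len(batch)


def run_loop(service, batches, eval_every=2):
    trainer = service.create_training_client()
    sampler = service.create_deployment_sampler()
    synced_paths = []
    for step, batch in enumerate(batches, start=1):
        trainer.forward_backward_custom(batch, mean_loss)
        trainer.optim_step()
        if step % eval_every == 0:
            # Save sampler weights, then refresh the sampler from the snapshot.
            saved = trainer.save_weights_for_sampler(f"step-{step}").result()
            sampler = service.create_deployment_sampler(model_path=saved.path)
            synced_paths.append(sampler.model_path)
            # Sample and evaluate through `sampler` here.
    return trainer.steps, synced_paths


steps, synced_paths = run_loop(StubService(), [[1.0, 2.0]] * 4)
```

With four batches and eval_every=2, the stub loop takes four optimizer steps and refreshes the sampler twice. How often to refresh in a real loop is a cost trade-off that this sketch does not model.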
End-to-end example
The only training-shape input you choose below is the shape ID. The SDK resolves the versioned trainer shape and linked deployment shape before launch.
1. Bootstrap trainer and deployment
2. Train step with custom objective
3. Save, sync, sample, evaluate
save_weights_for_sampler(...) returns a future whose .result().path is a public sampler snapshot identity, not a raw storage URI. create_deployment_sampler(model_path=...) consumes that identity, syncs it to the deployment, and returns the FireTitan-native deployment sampler. Use service.create_sampling_client(model_path=...) instead if you need the Tinker-shaped sampling client wrapper.
Concurrency control
sample_with_tokens(n=K) fans out K concurrent requests. A concurrency controller prevents overloading the deployment:
- AdaptiveConcurrencyController (recommended): automatically adjusts the concurrency window based on the server's prefill queue latency. Starts at initial_window and grows or shrinks between steps using AIMD.
- FixedConcurrencyController: a static semaphore with a fixed maximum. Use when you already know the right concurrency for your deployment.
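AIMD (additive increase, multiplicative decrease) grows the window by a small constant while latency stays under a target and cuts it by a factor when latency exceeds the target. The class below is a minimal sketch of that rule, not the SDK's AdaptiveConcurrencyController. The name AimdWindow and the parameters other than initial_window (target_latency_s, backoff, min_window, max_window) are assumptions made for illustration.

```python
class AimdWindow:
    """Illustrative AIMD window; not the SDK controller."""

    def __init__(self, initial_window=8, min_window=1, max_window=64,
                 target_latency_s=0.5, backoff=0.5):
        self.window = initial_window
        self.min_window = min_window
        self.max_window = max_window
        self.target_latency_s = target_latency_s
        self.backoff = backoff

    def update(self, observed_latency_s):
        """Adjust the window once per step from an observed queue latency."""
        if observed_latency_s > self.target_latency_s:
            # Multiplicative decrease: back off quickly under load.
            self.window = max(self.min_window, int(self.window * self.backoff))
        else:
            # Additive increase: probe for more capacity one slot at a time.
            self.window = min(self.max_window, self.window + 1)
        return self.window


controller = AimdWindow(initial_window=8)
history = [controller.update(latency) for latency in (0.1, 0.2, 0.9, 0.1)]
```

In this run the window goes 9, 10, 5, 6: two healthy steps add one slot each, the slow step halves the window, and the next healthy step starts growing again.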
Reference clients
For DPO, GRPO with KL, or any objective that needs frozen-reference logprobs, ask the service for a reference client:
- LoRA policy with no explicit reference_training_shape_id reuses the policy trainer session with adapters disabled.
- Full-parameter policy, or any explicit reference_training_shape_id, uses a separate forward-only reference trainer owned by the service.
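The two rules above amount to a small decision function. The sketch below only restates them. The function name reference_backing and its return strings are hypothetical, and it assumes lora_rank > 0 means a LoRA policy, as described under Operational guidance.

```python
from typing import Optional


def reference_backing(lora_rank: int,
                      reference_training_shape_id: Optional[str] = None) -> str:
    """Return which backing the rules above imply for the reference client."""
    is_lora = lora_rank > 0
    if is_lora and reference_training_shape_id is None:
        # Reuse the policy trainer session with adapters disabled.
        return "policy_session_adapters_disabled"
    # Full-parameter policy, or any explicit reference shape.
    return "separate_forward_only_trainer"


lora_default = reference_backing(lora_rank=16)
full_param = reference_backing(lora_rank=0)
lora_explicit = reference_backing(lora_rank=16,
                                  reference_training_shape_id="example-shape")
```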
Reconnecting to a running trainer
If your client disconnects, re-create the service with the existing trainer job ID. The SDK waits for the trainer, reconnects the training client, and can reuse or reattach the deployment.
Cleanup
Close the service when the loop exits.
cleanup_trainer_on_close=True deletes SDK-managed trainers. cleanup_deployment_on_close="scale_to_zero" releases deployment GPUs while keeping the deployment resource around for later reuse; use "delete" only when you want to remove the deployment entirely.
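One way to make sure the close always runs is a try/finally around the loop. The sketch below uses a stand-in service: StubService, its close() method, and the recorded event strings are hypothetical. Only the two cleanup option names and their values come from the text above.

```python
class StubService:
    """Hypothetical stand-in that records what a close would clean up."""

    def __init__(self, cleanup_trainer_on_close=True,
                 cleanup_deployment_on_close="scale_to_zero"):
        self.cleanup_trainer_on_close = cleanup_trainer_on_close
        self.cleanup_deployment_on_close = cleanup_deployment_on_close
        self.events = []

    def close(self):
        if self.cleanup_trainer_on_close:
            self.events.append("trainer:deleted")
        if self.cleanup_deployment_on_close == "scale_to_zero":
            self.events.append("deployment:scaled_to_zero")
        elif self.cleanup_deployment_on_close == "delete":
            self.events.append("deployment:deleted")


def run_with_cleanup(service, loop):
    try:
        loop()
    finally:
        service.close()  # runs even if the loop raises


def failing_loop():
    raise RuntimeError("training loop failed")


service = StubService()
try:
    run_with_cleanup(service, failing_loop)
except RuntimeError:
    pass  # the error still propagates; cleanup has already run
```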
Operational guidance
- Start from cookbook recipes for SFT, DPO, ORPO, GRPO, IGPO, and async RL; fork them when you need custom loop behavior.
- Use the managed service as the provisioning boundary in direct SDK code. Manager classes are documented only for compatibility and advanced lifecycle debugging.
- Service mode supports both full-parameter and LoRA tuning. Set lora_rank=0 for full-parameter or a positive integer for LoRA.
- Use save_weights_for_sampler(...) for normal sampler refresh. The SDK tracks the base/delta chain and performs weight sync through create_sampling_client(model_path=...) or create_deployment_sampler(model_path=...).
- Use save_state(...) for DCP resume checkpoints. Sampler checkpoints are for serving/evaluation and promotion; DCP checkpoints restore training state.
- Store the exact prompt set and sampler snapshot path for every evaluation sweep.
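For the last point, one possible record format keeps the prompts, a hash of them, and the snapshot path together. This is a sketch using only the Python standard library. The function name eval_record, the field names, and the example snapshot value are all hypothetical and not part of the SDK.

```python
import hashlib
import json


def eval_record(prompts, snapshot_path):
    """Build a JSON-serializable record tying a sweep to its exact inputs."""
    payload = json.dumps(list(prompts), ensure_ascii=False).encode("utf-8")
    return {
        "snapshot_path": snapshot_path,
        "num_prompts": len(prompts),
        # The hash makes it cheap to check that two sweeps used the same prompts.
        "prompts_sha256": hashlib.sha256(payload).hexdigest(),
        "prompts": list(prompts),
    }


record = eval_record(["2+2=?", "Capital of France?"], "example-snapshot-id")
line = json.dumps(record, sort_keys=True)  # one line per sweep, e.g. for a JSONL log
```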
Related guides
- Loss Functions — built-in and custom loss function patterns
- Vision Inputs — fine-tune VLMs with image and text data
- Saving and Loading — checkpoint types and weight sync details
- DeploymentSampler reference — sampling API details
- Cleanup and Teardown — managed service cleanup