Train frontier models with full-parameter reinforcement learning using Tinker-compatible APIs
**Early Access Feature**: Full-parameter RL tuning is currently in private preview and available to select customers.
Join the waitlist to request access.
Full-parameter RL tuning is designed for teams that need maximum control over reinforcement learning updates. Unlike LoRA-based RFT, this mode updates all model weights (`loraRank=0`) while keeping a familiar Tinker-style training loop.
The current preview scope is reinforcement training via RLOR trainer jobs.
Service-mode RLOR trainers currently support full-parameter tuning only. If `serviceMode=true`, set `trainingConfig.loraRank` (or SDK `lora_rank`) to 0; values greater than 0 are rejected.
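A minimal sketch of this constraint as a client-side check (the helper below is illustrative only and not part of the Fireworks SDK; the actual validation is enforced by the API):

```python
def validate_trainer_config(service_mode: bool, lora_rank: int) -> None:
    """Mirror the server-side rule: service-mode RLOR trainers must be full-parameter.

    Illustrative helper only; the real check happens when the trainer job is created.
    """
    if service_mode and lora_rank != 0:
        raise ValueError(
            "serviceMode=true requires trainingConfig.loraRank=0 "
            f"(got loraRank={lora_rank}); LoRA ranks > 0 are rejected."
        )

# Full-parameter service-mode config passes; a LoRA rank would raise ValueError.
validate_trainer_config(service_mode=True, lora_rank=0)
```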
- **Custom RL objectives**: Implement GRPO, DPO, PPO, or custom reward-shaping logic in Python
- **Tinker-compatible primitives**: Use `forward()`, `forward_backward_custom()`, and `optim_step()` directly
- **Service-mode trainers**: Run the trainer as an API service and iterate quickly from your own script
- **Checkpoint-to-serving path**: Save checkpoints and optionally hot-load them into inference deployments
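As an example of the custom-objective point above, the group-relative advantage at the core of GRPO can be computed in a few lines of plain Python. This is a sketch of reward-shaping logic you would write yourself (the function name is illustrative, not a trainer API):

```python
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's sample group (GRPO-style).

    Each sampled completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four completions for one prompt, scored by a custom reward function:
# above-average completions get positive advantages, below-average negative.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

These per-sample advantages would then be passed into your loss computation via `forward_backward_custom()`.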
If LoRA-based RFT already meets your quality and latency targets, start there first. Use full-parameter tuning when LoRA quality saturates or your use case requires full-weight updates.
The workflow below reflects the latest cookbook-style setup: create serving infrastructure first, create the RLOR trainer, then connect with FiretitanServiceClient.
If you want the same flow in one file, jump to Single-file starter script.
1) Create an inference deployment (Fireworks SDK)
```python
from fireworks.client import LLM

base_model = "accounts/fireworks/models/kimi-k2-5-instruct"
deployment_id = "fp-rft-serving"

# This deployment can be used as the optional hot-load target.
serving_llm = LLM(
    model=base_model,
    id=deployment_id,
    deployment_type="on-demand",
    min_replica_count=0,
    max_replica_count=1,
)
serving_llm.apply()
```
**You control**: Data prep, reward/loss logic, sampling strategy, and experiment tracking.

**Fireworks handles**: Distributed trainer orchestration, service endpoint management, checkpoint persistence, and deployment integration.