Train frontier models with full-parameter reinforcement learning using Tinker-compatible APIs
Early Access Feature: Full parameter RL tuning is currently in private preview and available to select customers.
Join the waitlist to request access.
Full parameter RL tuning is designed for teams that need maximum control over reinforcement learning updates. Unlike LoRA-based RFT, this mode updates all model weights (loraRank=0) while keeping a familiar Tinker-style training loop.
Current preview scope is reinforcement training via RLOR trainer jobs.
Service-mode RLOR trainers currently support full-parameter tuning only. If
serviceMode=true, set trainingConfig.loraRank (or SDK lora_rank) to 0;
values greater than 0 are rejected.
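A minimal sketch of that validation rule, assuming a config dict shaped like the field names above (`serviceMode`, `trainingConfig.loraRank`); the helper itself is illustrative, not part of the SDK:

```python
def validate_trainer_config(config: dict) -> None:
    """Illustrative check: service-mode RLOR trainers require
    full-parameter tuning, i.e. trainingConfig.loraRank must be 0."""
    if config.get("serviceMode"):
        lora_rank = config.get("trainingConfig", {}).get("loraRank", 0)
        if lora_rank > 0:
            # Mirrors the server-side rejection described above.
            raise ValueError(
                f"serviceMode trainers require loraRank=0, got {lora_rank}"
            )

# Full-parameter service-mode config passes validation:
validate_trainer_config({"serviceMode": True, "trainingConfig": {"loraRank": 0}})
```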
If you use hotLoadDeploymentId with Fireworks-hosted RLOR trainers, the
serving deployment typically uses FW_HOSTED hot-load plumbing under the
hood. Fireworks manages the bucket URL/path and trainer-side snapshot uploads,
so you usually do not need the manual external-bucket setup, upload layout, or
hot-load signaling steps from RL Rollouts with Your Own Trainer unless you are
bringing your own trainer. If you want the non-FW_HOSTED, external-bucket BYOT
path instead, treat it as a private-preview integration and coordinate
enablement with Fireworks first.
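For orientation, the FW_HOSTED path amounts to pointing the trainer at your serving deployment. A hedged sketch of the relevant fields: hotLoadDeploymentId and the loraRank rule come from the text above, while the surrounding dict shape is an assumption, not a documented payload:

```python
# Hypothetical trainer-config fragment. hotLoadDeploymentId is the documented
# knob; the overall field layout is an assumption for illustration. Fireworks
# wires up the FW_HOSTED bucket URL/path and snapshot uploads behind it.
trainer_config = {
    "serviceMode": True,
    "trainingConfig": {"loraRank": 0},          # full-parameter only in service mode
    "hotLoadDeploymentId": "fp-rft-serving",    # serving deployment to hot-load into
}
```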
- Custom RL objectives: Implement GRPO, DPO, PPO, or custom reward shaping logic in Python
- Tinker-compatible primitives: Use forward(), forward_backward_custom(), and optim_step() directly
- Service-mode trainers: Run the trainer as an API service and iterate quickly from your own script
- Checkpoint-to-serving path: Save checkpoints and optionally hot-load them into inference deployments
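The primitives above compose into a short training loop. A sketch with a stand-in client class: the real service client's call signatures may differ, and everything here beyond the method names forward_backward_custom() and optim_step() is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class MockTrainerClient:
    """Stand-in for a service-mode trainer client; the real API may differ."""
    step: int = 0
    losses: list = field(default_factory=list)

    def forward_backward_custom(self, batch, loss_fn):
        # In the real service this runs forward + backward on the trainer;
        # here we just evaluate the custom loss on the batch.
        loss = loss_fn(batch)
        self.losses.append(loss)
        return loss

    def optim_step(self):
        # Applies the accumulated gradients (a no-op in this mock).
        self.step += 1

def reward_weighted_loss(batch):
    # Toy custom objective: negative mean reward, standing in for
    # GRPO/PPO-style loss logic you would implement yourself.
    rewards = [sample["reward"] for sample in batch]
    return -sum(rewards) / len(rewards)

client = MockTrainerClient()
for _ in range(3):  # three optimizer steps
    batch = [{"reward": 1.0}, {"reward": 0.5}]
    client.forward_backward_custom(batch, reward_weighted_loss)
    client.optim_step()
print(client.step)  # 3
```

The shape to notice is the split: the service owns the forward/backward and optimizer mechanics, while your script owns the loss function passed in.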
If LoRA-based RFT already meets your quality and latency targets, start there
first. Use full parameter tuning when LoRA quality saturates or you need
full-weight updates for your use case.
The workflow below reflects the latest cookbook-style setup: create serving infrastructure first, create the RLOR trainer, then connect with FiretitanServiceClient.
If you want the same flow in one file, jump to Single-file starter script.
1) Create an inference deployment (Fireworks SDK)
```python
from fireworks.client import LLM

base_model = "accounts/fireworks/models/kimi-k2-5-instruct"
deployment_id = "fp-rft-serving"

# This deployment can be used as the optional hot-load target.
serving_llm = LLM(
    model=base_model,
    id=deployment_id,
    deployment_type="on-demand",
    min_replica_count=0,
    max_replica_count=1,
)
serving_llm.apply()
```
You control: Data prep, reward/loss logic, sampling strategy, and experiment tracking.
Fireworks handles: Distributed trainer orchestration, service endpoint management, checkpoint persistence, and deployment integration.
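As an example of the reward/loss logic you own in this split, here is a minimal GRPO-style group-normalized advantage in pure Python; this is a sketch of the standard formulation, not Fireworks code:

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Normalize rewards within one rollout group: (r - mean) / (std + eps).
    This group-relative advantage is the core of GRPO-style objectives."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Four rollouts of the same prompt with scalar rewards:
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)  # roughly [1.0, -1.0, 1.0, -1.0]
```

Because advantages are computed relative to the group, rollouts that beat their siblings get positive sign regardless of the reward scale, which is why GRPO needs no learned value baseline.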