What this is
RLOR trainer jobs and hotload-enabled deployments hold GPU resources. Always clean up after experiments, especially if jobs terminate unexpectedly.

Cleaning up RLOR trainer jobs
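A minimal, hedged sketch of best-effort trainer deletion; `delete_fn` and the IDs shown are hypothetical stand-ins for your SDK's actual delete call and real job IDs:

```python
# Hypothetical sketch: best-effort deletion of RLOR trainer jobs.
# `delete_fn` stands in for the SDK's real trainer-delete call.

def cleanup_trainers(trainer_ids, delete_fn):
    """Attempt to delete every trainer; one failure must not skip the rest."""
    failed = []
    for trainer_id in trainer_ids:
        try:
            delete_fn(trainer_id)
        except Exception:
            failed.append(trainer_id)  # report leaks instead of raising mid-cleanup
    return failed

# GRPO uses two RLOR jobs, so pass both the policy and reference trainer IDs:
# cleanup_trainers(["policy-trainer-id", "ref-trainer-id"], delete_fn)
```

Collecting failures instead of raising means one unreachable trainer does not leave the other one leaked.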
Cleaning up deployments
Scale a deployment down by setting `minReplicaCount` and `maxReplicaCount` to 0, releasing all accelerators while keeping the deployment available for future scale-up.
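As a sketch, scaling to zero just means writing 0 into both replica bounds; the dict below is a hypothetical stand-in for the deployment's real configuration object:

```python
def scale_to_zero(deployment: dict) -> dict:
    """Release all accelerators but keep the deployment around for later scale-up."""
    updated = dict(deployment)  # leave the caller's copy untouched
    updated["minReplicaCount"] = 0
    updated["maxReplicaCount"] = 0
    return updated

# Example: a deployment currently holding replicas
config = {"minReplicaCount": 1, "maxReplicaCount": 4}
print(scale_to_zero(config))
```

Because the deployment itself is not deleted, scaling back up later is just the reverse update.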
Automatic cleanup in training scripts
Use `try/finally` (or `atexit`) so cleanup runs on Ctrl+C and exceptions.
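The pattern can be sketched in plain Python; the cleanup body is a placeholder for your actual delete and scale-to-zero calls:

```python
import atexit

cleaned_up = False

def cleanup() -> None:
    """Delete trainer jobs / scale deployments to zero; safe to call twice."""
    global cleaned_up
    if cleaned_up:  # atexit may fire after the finally block already ran
        return
    cleaned_up = True
    # ... your SDK's delete and scale-to-zero calls go here ...

atexit.register(cleanup)  # runs on normal exit, including after Ctrl+C unwinds

def run_experiment() -> None:
    try:
        pass  # training loop goes here
    finally:
        cleanup()  # runs on exceptions raised inside the loop

run_experiment()
```

Making `cleanup` idempotent matters because both hooks can fire on the same run: Ctrl+C raises `KeyboardInterrupt`, which triggers the `finally` block, and then `atexit` fires as the interpreter exits.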
Checking for leaked resources
Track the IDs you create (trainer job IDs and the deployment ID) and clean those up explicitly. For broad account-wide discovery, use the Fireworks console or the `fw.*.list()` APIs for each managed resource type.
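One way to keep that bookkeeping, as a hypothetical sketch (the SDK delete calls themselves are elided):

```python
# Record every resource ID at creation time so cleanup can target exactly
# those resources instead of scanning the whole account.
created_ids = {"trainers": [], "deployments": []}

def track(kind: str, resource_id: str) -> str:
    """Record a newly created resource ID and return it unchanged."""
    created_ids[kind].append(resource_id)
    return resource_id

def cleanup_tracked() -> list:
    """Return (and clear) every tracked ID; call your SDK's delete for each."""
    released = []
    for kind, ids in created_ids.items():
        for resource_id in ids:
            # ... the SDK's delete / scale-to-zero call would go here ...
            released.append(resource_id)
        ids.clear()
    return released
```

Wrapping creation in `track(...)` means cleanup never depends on remembering which experiment made what.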
Operational guidance
- Delete both the policy and reference trainers when running GRPO, which uses two RLOR jobs.
- Register cleanup with `atexit` in your training scripts so it runs automatically on Ctrl+C or exceptions.
- Don't delete a trainer while a `save_weights_for_sampler_ext` operation is in progress; wait for it to complete first.
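The last point can be enforced with a simple polling helper; `get_state` is a hypothetical stand-in for whatever operation-status call your SDK exposes:

```python
import time

def wait_for_operation(get_state, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll until the operation reports "DONE"; True if it finished in time."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == "DONE":
            return True  # safe to delete the trainer now
        if time.monotonic() >= deadline:
            return False  # timed out; do NOT delete yet
        time.sleep(poll_s)
```

Only delete the trainer when this returns `True`; on `False`, investigate the stuck operation rather than deleting underneath it.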