What this is

RLOR trainer jobs and weight-sync-enabled deployments hold GPU resources. Always clean up after experiments, especially if jobs terminate unexpectedly. In new SDK and cookbook code, cleanup is owned by the SDK-managed service client.

Automatic cleanup via the SDK-managed service

Create the service with cleanup options, then close it in finally:
from fireworks.training.sdk import FiretitanServiceClient

service = FiretitanServiceClient.from_firetitan_config(
    api_key=api_key,
    base_url=base_url,
    base_model="accounts/fireworks/models/qwen3-8b",
    tokenizer_model="Qwen/Qwen3-8B",
    lora_rank=0,
    training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    deployment_id="research-serving",
    cleanup_trainer_on_close=True,
    cleanup_deployment_on_close="scale_to_zero",
)

try:
    run_training_loop()
finally:
    service.close()
cleanup_trainer_on_close=True deletes SDK-managed trainers. Separate reference trainers are governed by cleanup_reference_trainer_on_close (default True). cleanup_deployment_on_close="scale_to_zero" releases deployment GPUs while keeping the deployment resource around; use "delete" only when you want to remove the deployment entirely. Cookbook recipes use the same service-client lifecycle internally and close the service through an ExitStack.
The standalone ResourceCleanup context manager and setup_infra helper have been removed from the cookbook. Provisioning and teardown now live behind the SDK-managed service client. See Migrating from the deprecated managed infra.
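
The ExitStack pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the cookbook's actual code: StubService and run_with_cleanup are hypothetical names, and the only assumption about the real service client is that it exposes close(), as in the example earlier in this section.

```python
from contextlib import ExitStack


class StubService:
    """Hypothetical stand-in for the service client; only close() matters here."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_with_cleanup(service, training_loop) -> None:
    """Run training_loop and close the service when the block exits."""
    with ExitStack() as stack:
        # Callbacks registered on the stack run on normal exit,
        # on exceptions, and on KeyboardInterrupt (Ctrl+C).
        stack.callback(service.close)
        training_loop()


service = StubService()
run_with_cleanup(service, lambda: None)
print(service.closed)
```

Registering close() on the stack before the loop starts behaves like try/finally, and further callbacks can be added to the same stack as more resources are created.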

Trainer inactivity cleanup

Long-running RLOR trainer jobs are automatically stopped after 60 minutes with no tracked activity. The trainer reports this activity to the control plane, and tracked activity includes trainer API operations and active-session heartbeats. When creating a trainer through the REST API (POST /v1/accounts/{account_id}/rlorTrainerJobs), set inactivityTimeout to a positive protobuf JSON duration to choose a different timeout:
{
  "inactivityTimeout": "1800s"
}
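
If the request body is built in Python, the duration string can be derived from a timedelta. The helper below is a sketch, and to_protobuf_json_duration is a hypothetical name rather than part of any Fireworks SDK. It relies only on the protobuf JSON convention that a duration is written as seconds followed by "s", with an optional fractional part.

```python
from datetime import timedelta


def to_protobuf_json_duration(delta: timedelta) -> str:
    """Format a positive timedelta as a protobuf JSON duration, e.g. "1800s"."""
    if delta <= timedelta(0):
        raise ValueError("inactivityTimeout must be a positive duration")
    # Work in integer microseconds to avoid floating-point rounding.
    total_us = delta // timedelta(microseconds=1)
    seconds, micros = divmod(total_us, 1_000_000)
    if micros == 0:
        return f"{seconds}s"
    # Keep the fractional part but trim trailing zeros.
    return f"{seconds}.{micros:06d}".rstrip("0") + "s"


payload = {"inactivityTimeout": to_protobuf_json_duration(timedelta(minutes=30))}
print(payload)
```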
When creating a trainer through the legacy manager API, set TrainerJobConfig.inactivity_timeout and pass the config to TrainerJobManager.create(...) or TrainerJobManager.create_and_wait(...):
from datetime import timedelta
from fireworks.training.sdk import TrainerJobConfig

config = TrainerJobConfig(
    base_model="accounts/fireworks/models/qwen3-8b",
    training_shape_ref="accounts/fireworks/trainingShapes/<shape>/versions/<version>",
    inactivity_timeout=timedelta(minutes=30),
)
With firectl, use --inactivity-timeout 30m or --inactivity-timeout 2h. When the value is omitted or set to 0, Fireworks uses the 60-minute default.

To disable automatic inactivity cleanup, set disableInactivityCleanup in the REST API, set TrainerJobConfig.disable_inactivity_cleanup=True in the Training SDK, or pass --disable-inactivity-cleanup in firectl. With cleanup disabled, the trainer is never stopped for inactivity and GPU usage keeps accruing while it runs, so delete the trainer when you no longer need it.
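
The timeout rules above can be modeled as a small pure function, which is handy for sanity-checking configuration before a job is submitted. This is an illustrative sketch only: effective_inactivity_timeout is a hypothetical helper, not an SDK function, and it assumes that disabling cleanup takes precedence over any timeout value that is also set.

```python
from datetime import timedelta
from typing import Optional

# Default documented above: 60 minutes with no tracked activity.
DEFAULT_INACTIVITY_TIMEOUT = timedelta(minutes=60)


def effective_inactivity_timeout(
    requested: Optional[timedelta] = None,
    disable_cleanup: bool = False,
) -> Optional[timedelta]:
    """Return the timeout that applies, or None when cleanup is disabled."""
    if disable_cleanup:
        # Assumption: the disable flag wins over an explicit timeout.
        return None
    if requested is None or requested == timedelta(0):
        # Omitted or zero falls back to the default.
        return DEFAULT_INACTIVITY_TIMEOUT
    return requested


print(effective_inactivity_timeout(timedelta(minutes=30)))
```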

Manual compatibility cleanup

If you provisioned resources yourself with TrainerJobManager / DeploymentManager instead of the managed service, delete them directly.

Cleaning up RLOR trainer jobs

import os
from fireworks.training.sdk import TrainerJobManager, DeploymentManager

api_key = os.environ["FIREWORKS_API_KEY"]
base_url = os.environ.get("FIREWORKS_BASE_URL", "https://api.fireworks.ai")

rlor_mgr = TrainerJobManager(api_key=api_key, base_url=base_url)
deploy_mgr = DeploymentManager(api_key=api_key, base_url=base_url)

# Delete known trainer jobs from this run
for job_id in ["<policy-job-id>", "<reference-job-id>"]:
    rlor_mgr.delete(job_id=job_id)

Cleaning up deployments

deploy_mgr.delete(deployment_id="<deployment-id>")
If you want to keep the deployment resource but release GPUs (lighter alternative to delete):
deploy_mgr.scale_to_zero(deployment_id="<deployment-id>")
This sets both minReplicaCount and maxReplicaCount to 0, releasing all accelerators while keeping the deployment available for future scale-up.

Manual cleanup with try/finally

policy_job_id = "<policy-job-id>"
reference_job_id = "<reference-job-id>"
deployment_id = "research-loop-serving"

try:
    run_training_loop()
finally:
    # Pass IDs by keyword, matching the delete calls shown earlier.
    rlor_mgr.delete(job_id=policy_job_id)
    rlor_mgr.delete(job_id=reference_job_id)
    deploy_mgr.delete(deployment_id=deployment_id)
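
In a finally block with several delete calls in sequence, an exception from the first call would skip the rest. One way to avoid that is to run each step independently and collect the failures. The sketch below assumes nothing about the manager APIs beyond being callable; run_cleanup_steps is a hypothetical helper, not part of the SDK.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("cleanup")


def run_cleanup_steps(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run every cleanup step and return the labels of the steps that failed."""
    failed: List[str] = []
    for label, step in steps:
        try:
            step()
        except Exception:
            # Log and continue so one failure does not leak the other resources.
            logger.exception("cleanup step failed: %s", label)
            failed.append(label)
    return failed


# Hypothetical usage with the managers from this section:
# failed = run_cleanup_steps([
#     ("policy trainer", lambda: rlor_mgr.delete(job_id=policy_job_id)),
#     ("reference trainer", lambda: rlor_mgr.delete(job_id=reference_job_id)),
#     ("deployment", lambda: deploy_mgr.delete(deployment_id=deployment_id)),
# ])
```

Any labels returned point at resources that still need to be removed by hand, for example from the Fireworks console.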

Checking for leaked resources

Track the IDs you create (trainer job IDs and the deployment ID) and clean up those resources explicitly. For broad account-wide discovery, use the Fireworks console or the managed fw.*.list() APIs.
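
One lightweight way to track IDs is a small record that is serialized as soon as each resource is created, so the IDs survive a crashed process. This is a sketch under stated assumptions: RunResources and its methods are hypothetical names, and where the JSON is stored (a file next to the run logs, for example) is left to the caller.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RunResources:
    """IDs of the resources created by one experiment run."""

    trainer_job_ids: List[str] = field(default_factory=list)
    deployment_id: Optional[str] = None

    def dumps(self) -> str:
        """Serialize to JSON so the record can be written to disk."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def loads(cls, text: str) -> "RunResources":
        """Rebuild the record from JSON produced by dumps()."""
        return cls(**json.loads(text))


resources = RunResources()
resources.trainer_job_ids.append("<policy-job-id>")
resources.trainer_job_ids.append("<reference-job-id>")
resources.deployment_id = "<deployment-id>"
restored = RunResources.loads(resources.dumps())
print(restored)
```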

Operational guidance

  • Delete both the policy and reference trainers when running GRPO, which uses two RLOR jobs.
  • Close the managed service in finally so trainer/reference/deployment cleanup runs on Ctrl+C or exceptions.
  • Don’t delete a trainer while a save_weights_for_sampler operation is in progress; wait for it to complete first.