What this is
Service-mode RLOR jobs provision GPU-backed trainer endpoints that your custom Python loop connects to via the Tinker SDK. In the current training SDK, job lifecycle is handled by `TrainerJobManager`.
Creating a service-mode trainer job
Use `TrainerJobManager.create_and_wait(...)` with a `TrainerJobConfig`:
Create parameters reference
| Parameter | Type | Description |
|---|---|---|
| `base_model` | `str` | Required. Base model for the trainer. |
| `lora_rank` | `int` | `0` for full-parameter training, `>0` for LoRA. |
| `max_context_length` | `int` | Max sequence length. |
| `learning_rate` | `float` | Learning rate for trainer-side optimizer state. |
| `gradient_accumulation_steps` | `int` | Number of micro-batches before each optimizer step. |
| `node_count` | `int` | Number of trainer nodes. |
| `hot_load_deployment_id` | `str` | Link this trainer to a deployment for checkpoint uploads and hotloading. |
| `display_name` | `str` | Human-readable name for the job. |
| `region` | `str` | Region override (optional). |
| `custom_image_tag` | `str` | Trainer image tag override (optional). |
| `extra_args` | `list[str]` | Extra trainer args (for example `--forward-only`). |
| `accelerator_type` / `accelerator_count` | `str` / `int` | Accelerator overrides (optional). |
| `skip_validations` | `bool` | Bypass control-plane validation checks. |
| `forward_only` | `bool` | Mark the job as forward-only. |
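As a concrete illustration of the parameters above, here is a minimal sketch. The `TrainerJobConfig` stand-in below is a hypothetical dataclass mirroring the table (the real SDK class may differ in import path, defaults, and field set), and the `create_and_wait` call is shown only in a comment:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in mirroring the TrainerJobConfig fields from the
# table above; the real SDK class may differ.
@dataclass
class TrainerJobConfig:
    base_model: str                       # required
    lora_rank: int = 0                    # 0 = full-parameter, >0 = LoRA
    max_context_length: int = 8192
    learning_rate: float = 1e-5
    gradient_accumulation_steps: int = 1
    node_count: int = 1
    hot_load_deployment_id: Optional[str] = None
    display_name: Optional[str] = None
    extra_args: list = field(default_factory=list)

# Example: a LoRA job linked to a deployment for checkpoint hotloading
# (model and deployment names are hypothetical).
config = TrainerJobConfig(
    base_model="my-org/my-base-model",
    lora_rank=32,
    display_name="rlor-exp-001",
    hot_load_deployment_id="my-deployment",
)

# The real call would then be roughly:
#   job = TrainerJobManager.create_and_wait(config)
# which blocks until the endpoint is healthy (JOB_STATE_RUNNING).
```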
Inspecting a job
Job states
| State | Meaning |
|---|---|
| `JOB_STATE_CREATING` | Resources being provisioned |
| `JOB_STATE_PENDING` | Queued, waiting for GPU availability |
| `JOB_STATE_RUNNING` | Trainer is ready; you can connect a Tinker client |
| `JOB_STATE_IDLE` | Service-mode job is idle (no active training) |
| `JOB_STATE_COMPLETED` | Job finished successfully |
| `JOB_STATE_FAILED` | Job failed |
| `JOB_STATE_CANCELLED` | Job was cancelled |
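One way to read the table: only `JOB_STATE_RUNNING` is connectable, `CREATING`/`PENDING` are transitional, and the last three are terminal. The small helper below is an illustration of that grouping, not part of the SDK:

```python
# Illustrative grouping of the job states above; treat this as
# documentation of the table, not the real SDK API.
TRANSITIONAL = {"JOB_STATE_CREATING", "JOB_STATE_PENDING"}
READY = {"JOB_STATE_RUNNING", "JOB_STATE_IDLE"}
TERMINAL = {"JOB_STATE_COMPLETED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def can_connect(state: str) -> bool:
    """A Tinker client should only connect once the trainer is RUNNING."""
    return state == "JOB_STATE_RUNNING"
```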
Waiting for readiness
`create_and_wait(...)` and `wait_for_existing(...)` already block until the endpoint is healthy. For an existing job:
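For intuition, the blocking behavior amounts to a poll loop like the sketch below. Here `get_state` is a hypothetical callable standing in for the control-plane query, not a real SDK function:

```python
import time

def wait_for_running(get_state, poll_interval_s=5.0, timeout_s=600.0):
    """Sketch of the wait loop performed by the blocking helpers: poll the
    job state until RUNNING, fail fast on terminal states, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state == "JOB_STATE_RUNNING":
            return state
        if state in ("JOB_STATE_FAILED", "JOB_STATE_CANCELLED"):
            raise RuntimeError(f"job ended in terminal state {state}")
        time.sleep(poll_interval_s)
    raise TimeoutError("job did not become RUNNING in time")

# Simulated control plane: PENDING twice, then RUNNING.
states = iter(["JOB_STATE_PENDING", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"])
final_state = wait_for_running(lambda: next(states), poll_interval_s=0.0)
```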
Resuming a job
Reconnecting after preemption
`reconnect_and_wait` handles pod preemption and transient failures. It waits for the job to reach a resumable state (tolerating transitional states like `CREATING` or `DELETING`), resumes it, then polls until the endpoint is healthy:
Prefer it over calling `resume_and_wait()` directly because it retries when the job is in a transitional state (e.g. the control plane is still processing the pod death).
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `job_id` | `str` | — | The RLOR job ID to reconnect to |
| `poll_interval_s` | `float` | `5.0` | Seconds between health checks after resume |
| `timeout_s` | `float` | `600` | Overall timeout for the job to become `RUNNING` |
| `max_wait_for_resumable_s` | `float` | `120` | Max seconds to wait for a resumable state (`FAILED`/`CANCELLED`/`PAUSED`/`COMPLETED`) |
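Under the hood, the flow can be pictured as two phases: wait for a resumable state (tolerating transitional ones), then resume and poll for health. The sketch below is an illustration with hypothetical `get_state`/`resume` callables, not the SDK implementation:

```python
import time

RESUMABLE = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED",
             "JOB_STATE_PAUSED", "JOB_STATE_COMPLETED"}

def reconnect_sketch(get_state, resume, poll_interval_s=5.0,
                     timeout_s=600.0, max_wait_for_resumable_s=120.0):
    """Illustration of the reconnect-and-wait flow; get_state/resume are
    hypothetical callables standing in for control-plane operations."""
    # Phase 1: tolerate transitional states (e.g. CREATING, DELETING)
    # until the job settles into a resumable state.
    deadline = time.monotonic() + max_wait_for_resumable_s
    while get_state() not in RESUMABLE:
        if time.monotonic() > deadline:
            raise TimeoutError("job never reached a resumable state")
        time.sleep(poll_interval_s)
    # Phase 2: resume the job, then poll until the endpoint is healthy.
    resume()
    deadline = time.monotonic() + timeout_s
    while get_state() != "JOB_STATE_RUNNING":
        if time.monotonic() > deadline:
            raise TimeoutError("job did not become RUNNING after resume")
        time.sleep(poll_interval_s)
    return "JOB_STATE_RUNNING"

# Simulated control plane: briefly transitional after pod death, then
# FAILED; resume() brings the job back to RUNNING.
box = {"state": "JOB_STATE_CREATING", "calls": 0}

def get_state():
    box["calls"] += 1
    if box["state"] == "JOB_STATE_CREATING" and box["calls"] >= 2:
        box["state"] = "JOB_STATE_FAILED"
    return box["state"]

def resume():
    box["state"] = "JOB_STATE_RUNNING"

result = reconnect_sketch(get_state, resume, poll_interval_s=0.0)
```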
Deleting a job
Always clean up trainer jobs when done to release GPU resources.
Resolving training shapes
Training shapes bundle region, accelerator, image tag, node count, and sharding config into a single resolved profile. `TrainingShapeProfile` contains: `training_shape_version`, `trainer_image_tag`, `max_supported_context_length`, `node_count`, `deployment_shape_version`, `deployment_image_tag`, `accelerator_type`, `accelerator_count`, `base_model_weight_precision`, `pipeline_parallelism`.
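A stand-in sketch of the resolved profile using the field names listed above; the real `TrainingShapeProfile` lives in the SDK, and every value below is hypothetical:

```python
from dataclasses import dataclass

# Stand-in with the fields listed in the text; the real SDK class may
# carry additional metadata.
@dataclass(frozen=True)
class TrainingShapeProfile:
    training_shape_version: str
    trainer_image_tag: str
    max_supported_context_length: int
    node_count: int
    deployment_shape_version: str
    deployment_image_tag: str
    accelerator_type: str
    accelerator_count: int
    base_model_weight_precision: str
    pipeline_parallelism: int

# A resolved profile can then seed the job config instead of setting
# region, accelerator, and image tag by hand (all values hypothetical):
profile = TrainingShapeProfile(
    training_shape_version="v1",
    trainer_image_tag="trainer:abc123",
    max_supported_context_length=32768,
    node_count=2,
    deployment_shape_version="v1",
    deployment_image_tag="deploy:abc123",
    accelerator_type="H100",
    accelerator_count=8,
    base_model_weight_precision="bf16",
    pipeline_parallelism=1,
)
```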
Operational guidance
- Service mode supports both full-parameter and LoRA tuning. Set `lora_rank=0` for full-parameter training or a positive integer for LoRA.
- Set `hot_load_deployment_id` when you plan to hotload checkpoints onto a deployment. This configures the checkpoint upload path.
- Clean up jobs when your experiment is done; trainer jobs hold GPU resources.
- Use `display_name` to identify jobs in logs and in the Fireworks console.
- Use `reconnect_and_wait` for long-running experiments where pod preemption is possible. It handles transitional states and auto-resumes.
- Use training shapes (`resolve_training_profile`) to auto-populate infra config instead of manually setting region, accelerator, and image tag.
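The cleanup guidance can be captured in a try/finally pattern so GPU resources are released even when the training loop raises. Everything below is hypothetical scaffolding (including the fake manager) to show the shape; substitute the real `TrainerJobManager`:

```python
# Hypothetical lifecycle sketch: always delete trainer jobs, even on
# failure.  `manager` stands in for the real job manager; only
# create_and_wait/delete are assumed here.
def run_experiment(manager, config, train_fn):
    job = manager.create_and_wait(config)
    try:
        return train_fn(job)        # your custom Tinker training loop
    finally:
        manager.delete(job.job_id)  # always release GPU resources

# Minimal fake manager to demonstrate the pattern:
class FakeJob:
    job_id = "job-123"

class FakeManager:
    def __init__(self):
        self.deleted = []
    def create_and_wait(self, config):
        return FakeJob()
    def delete(self, job_id):
        self.deleted.append(job_id)

mgr = FakeManager()
outcome = run_experiment(mgr, config={}, train_fn=lambda job: "done")
```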