## What this is
Training loops commonly pair with dedicated deployments that act as sampling and evaluation endpoints. For on-policy training (GRPO), the deployment is hotloaded with the latest policy weights so sampled completions come from the current model.

## Creating a hotload-enabled deployment

### DeploymentConfig parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `deployment_id` | `str` | — | Stable ID per experiment family |
| `base_model` | `str` | — | Must match the trainer base model for hotload compatibility |
| `deployment_shape` | `str \| None` | `None` | Deployment shape resource name (overrides accelerator/region) |
| `region` | `str` | `"US_VIRGINIA_1"` | Region for the deployment |
| `min_replica_count` | `int` | `0` | Minimum replicas; set 0 to scale to zero when idle |
| `max_replica_count` | `int` | `1` | Maximum replicas for autoscaling |
| `accelerator_type` | `str` | `"NVIDIA_H200_141GB"` | Accelerator type |
| `hot_load_bucket_type` | `str \| None` | `"FW_HOSTED"` | Hotload storage backend |
| `skip_shape_validation` | `bool` | `False` | Bypass deployment shape validation |
| `extra_args` | `list[str] \| None` | `None` | Extra serving arguments |
## DeploymentManager constructor
DeploymentManager supports separate URLs for control-plane, inference, and hotload traffic:
## Inspecting deployment status
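A common pattern is to poll the deployment until it reports ready before starting training. The sketch below assumes a `get_deployment` call returning a dict with a `"state"` field and a `"READY"` state string; the real status API may expose different names:

```python
import time

# Hypothetical polling helper; get_deployment() and the "READY" state
# string are assumptions about the control-plane API.
def wait_until_ready(
    manager,
    deployment_id: str,
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> None:
    """Block until the deployment reports READY, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = manager.get_deployment(deployment_id)["state"]
        if state == "READY":
            return
        time.sleep(poll_s)  # back off between status checks
    raise TimeoutError(f"{deployment_id} not READY within {timeout_s}s")
```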
## Linking deployment to RLOR trainer
When creating an RLOR trainer job, set `hot_load_deployment_id` so the trainer knows where to upload checkpoints:
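A minimal sketch of the linkage, assuming the trainer job takes its configuration as a simple record; only `hot_load_deployment_id` is named by the text above, and the other field and class names here are illustrative:

```python
from dataclasses import dataclass

# Hypothetical trainer-job config; only hot_load_deployment_id is
# confirmed by the docs, the rest is a sketch.
@dataclass
class TrainerJobConfig:
    base_model: str
    hot_load_deployment_id: str  # tells the trainer where to upload checkpoints

# The ID must match the deployment created earlier, and base_model must
# match the deployment's base model for hotload compatibility.
job = TrainerJobConfig(
    base_model="meta-llama/Llama-3.1-8B-Instruct",
    hot_load_deployment_id="grpo-llama-8b-exp1",
)
```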
## Sampling from the deployment
For training/eval loops that need token IDs and logprobs, use `DeploymentSampler`:
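The sketch below is a stand-in, not the real `DeploymentSampler`: it only illustrates the shape of the result a GRPO-style loop consumes, namely sampled token IDs paired with per-token logprobs. The stub generates random values in place of a real inference call:

```python
import math
import random
from dataclasses import dataclass

# Hypothetical result shape; the real DeploymentSampler queries the
# inference endpoint and its return type may differ.
@dataclass
class SampleResult:
    token_ids: list[int]    # sampled completion tokens
    logprobs: list[float]   # one logprob per sampled token

def sample_stub(prompt_ids: list[int], max_tokens: int = 4) -> SampleResult:
    """Stand-in sampler: returns random token IDs and valid logprobs."""
    rng = random.Random(0)
    ids = [rng.randrange(32000) for _ in range(max_tokens)]
    lps = [math.log(rng.uniform(0.1, 1.0)) for _ in ids]  # logprobs are <= 0
    return SampleResult(token_ids=ids, logprobs=lps)
```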
## Scaling to zero
Release GPU resources without deleting the deployment. Scaling to zero sets `minReplicaCount` and `maxReplicaCount` to 0, releasing all accelerators while keeping the deployment resource available for future scale-up.
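The contrast with `delete` can be sketched in a few lines. This is an in-memory illustration of the replica-bound behavior described above, with method names that are assumptions rather than the SDK's real API:

```python
# Hypothetical sketch: scale_to_zero clears both replica bounds but keeps
# the deployment record, so a later scale-up needs no recreation.
class Deployment:
    def __init__(self, deployment_id: str, min_replicas: int, max_replicas: int):
        self.deployment_id = deployment_id
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas

    def scale_to_zero(self) -> None:
        self.min_replicas = 0
        self.max_replicas = 0  # releases all accelerators

    def scale_up(self, max_replicas: int = 1) -> None:
        self.max_replicas = max_replicas  # deployment still exists; no recreation

d = Deployment("grpo-llama-8b-exp1", min_replicas=0, max_replicas=1)
d.scale_to_zero()   # idle: no GPUs held
d.scale_up(1)       # resume without recreating the deployment
```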
## Operational guidance
- Keep deployment IDs stable per experiment family for easier rollbacks and metric comparisons.
- Use `min_replica_count=0` for development to avoid idle GPU costs.
- Use `scale_to_zero` after training completes as a lighter alternative to `delete`; the deployment can be scaled back up without recreation.
- Create the deployment before the trainer so the trainer can be linked at creation time.
- Use `deployment_shape` when the control plane has a pre-validated shape for your model: it auto-configures accelerator type, world size, and serving args.
- Delete deployments when experiments are done (see Cleanup).