Create Deployment
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
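As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name `bearer_headers` and the placeholder token are illustrative assumptions, not part of the API.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the auth token as a Bearer credential."""
    if not token:
        raise ValueError("token must be a non-empty string")
    # Header form taken from the text above: "Bearer <token>"
    return {"Authorization": f"Bearer {token}"}


headers = bearer_headers("my-secret-token")  # placeholder token, not a real credential
print(headers["Authorization"])
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client you use.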
Path Parameters
The Account Id
Query Parameters
By default, creating a deployment for a base model that is not currently deployed also deploys that base model to the new deployment. If true, this auto-deploy behavior is disabled.
By default, a deployment will use the speculative decoding settings from the base model. If true, this will disable speculative decoding.
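The two query flags above could be attached to a request URL roughly as follows. This page does not show the parameter names or the endpoint path, so `disableAutoDeploy`, `disableSpeculativeDecoding`, the host `api.example.com`, and the path layout are all assumptions made for illustration; check the actual reference before relying on them.

```python
from urllib.parse import quote, urlencode


def build_create_url(base_url, account_id,
                     disable_auto_deploy=False,
                     disable_speculative_decoding=False):
    """Build a create-deployment URL, adding only the flags that are set."""
    params = {}
    if disable_auto_deploy:
        params["disableAutoDeploy"] = "true"  # assumed parameter name
    if disable_speculative_decoding:
        params["disableSpeculativeDecoding"] = "true"  # assumed parameter name
    # Assumed path layout; the account ID is percent-encoded to be safe.
    url = f"{base_url}/accounts/{quote(account_id, safe='')}/deployments"
    return f"{url}?{urlencode(params)}" if params else url


print(build_create_url("https://api.example.com", "my-account",
                       disable_auto_deploy=True))
```

Leaving both flags at their defaults produces a URL with no query string, which corresponds to the default behavior described above.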
Body
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
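A client could check the length limit before sending the request. The sketch below is one way to do that; the helper name `validate_display_name` is hypothetical, and the server may apply additional rules not stated here.

```python
MAX_DISPLAY_NAME_LENGTH = 63  # "fewer than 64 characters" means at most 63


def validate_display_name(name: str) -> str:
    """Return the name unchanged, or raise if it is too long."""
    if len(name) > MAX_DISPLAY_NAME_LENGTH:
        raise ValueError(
            f"display name has {len(name)} characters; "
            f"the limit is {MAX_DISPLAY_NAME_LENGTH}"
        )
    return name


print(validate_display_name("My Deployment"))
```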
Description of the deployment.
The time at which this deployment will automatically be deleted.
The state of the deployment.
STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED
Contains a detailed message when the last deployment operation fails.
The minimum number of replicas. If not specified, the default is 0.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1).
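The two replica defaults interact: the maximum falls back to max(min_replica_count, 1), so it depends on the resolved minimum. The sketch below mirrors that rule on the client side. The function name and the sanity check on the range are assumptions added for illustration; the server is the authority on the actual values.

```python
def resolve_replica_counts(min_replica_count=None, max_replica_count=None):
    """Apply the documented defaults and return (minimum, maximum)."""
    # Documented default: minimum is 0 when not specified.
    minimum = 0 if min_replica_count is None else min_replica_count
    # Documented default: maximum is max(min_replica_count, 1) when not specified.
    maximum = max(minimum, 1) if max_replica_count is None else max_replica_count
    # Assumed sanity check, not stated in the reference.
    if minimum < 0 or maximum < minimum:
        raise ValueError("expected 0 <= min_replica_count <= max_replica_count")
    return minimum, maximum


print(resolve_replica_counts())                      # neither value given
print(resolve_replica_counts(min_replica_count=3))   # only the minimum given
```

With neither value given, the deployment can scale between 0 and 1 replicas; with only a minimum of 3, the maximum also resolves to 3.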
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB
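To catch typos in the accelerator type before a request is sent, a client could validate against the values listed above and fall back to the documented default. This is a sketch only; the helper name `resolve_accelerator_type` is hypothetical, and the set is copied from this page rather than fetched from the API.

```python
ACCELERATOR_TYPES = frozenset({
    "ACCELERATOR_TYPE_UNSPECIFIED",
    "NVIDIA_A100_80GB",
    "NVIDIA_H100_80GB",
    "AMD_MI300X_192GB",
    "NVIDIA_A10G_24GB",
    "NVIDIA_A100_40GB",
    "NVIDIA_L4_24GB",
})
DEFAULT_ACCELERATOR_TYPE = "NVIDIA_A100_80GB"  # documented default


def resolve_accelerator_type(value=None):
    """Return the documented default when omitted, else a validated value."""
    if value is None:
        return DEFAULT_ACCELERATOR_TYPE
    if value not in ACCELERATOR_TYPES:
        raise ValueError(f"unknown accelerator type: {value!r}")
    return value


print(resolve_accelerator_type())
print(resolve_accelerator_type("NVIDIA_H100_80GB"))
```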
The precision with which the model should be served. TODO: Make the default value FP16 once legacy models are fixed.
PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV
If true, PEFT addons are enabled for this deployment.
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
The performance profile to use for this deployment.
The geographic region where the deployment is located.
REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_VIRGINIA_2, US_ILLINOIS_1, AP_TOKYO_1, EU_LONDON_1, US_ARIZONA_1
Response
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
Description of the deployment.
The creation time of the deployment.
The time at which this deployment will automatically be deleted.
The time at which the resource will be hard deleted.
The email address of the user who created this deployment.
The state of the deployment.
STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED
Contains a detailed message when the last deployment operation fails.
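A caller will usually want to read the state from the response and surface the failure message when the state is FAILED. The sketch below assumes the parsed response is a dictionary with hypothetical keys `state` and `status_message`; this page does not show the real field names, so adjust them to match the actual schema.

```python
DEPLOYMENT_STATES = frozenset({
    "STATE_UNSPECIFIED", "CREATING", "READY", "DELETING",
    "FAILED", "UPDATING", "DELETED",
})


def check_deployment(deployment: dict) -> str:
    """Return the deployment state, raising if the last operation failed."""
    # "state" and "status_message" are assumed key names, not confirmed by this page.
    state = deployment.get("state", "STATE_UNSPECIFIED")
    if state not in DEPLOYMENT_STATES:
        raise ValueError(f"unexpected deployment state: {state!r}")
    if state == "FAILED":
        detail = deployment.get("status_message") or "no detail provided"
        raise RuntimeError(f"deployment failed: {detail}")
    return state


print(check_deployment({"state": "CREATING"}))
```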
The minimum number of replicas. If not specified, the default is 0.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1).
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB
The precision with which the model should be served. TODO: Make the default value FP16 once legacy models are fixed.
PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV
If set, this deployment is deployed to a cloud-premise cluster.
If true, PEFT addons are enabled for this deployment.
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
The performance profile to use for this deployment.
The geographic region where the deployment is located.
REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_VIRGINIA_2, US_ILLINOIS_1, AP_TOKYO_1, EU_LONDON_1, US_ARIZONA_1