POST /v1/accounts/{account_id}/deployments

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

account_id
string
required

The account ID.
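As a minimal sketch of how the path parameter and the bearer header fit together, the following builds (but does not send) the request using only Python's standard library. The host `https://api.example.com`, the helper name `build_create_deployment_request`, and the sample values are placeholders, not part of this reference; substitute the real API host and your own token.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host; this page does not state the API base URL.
BASE_URL = "https://api.example.com"


def build_create_deployment_request(account_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (without sending) a POST /v1/accounts/{account_id}/deployments request."""
    # Percent-encode the path parameter so unusual characters cannot alter the path.
    url = f"{BASE_URL}/v1/accounts/{quote(account_id, safe='')}/deployments"
    headers = {
        "Authorization": f"Bearer {token}",  # form: Bearer <token>
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Hypothetical account ID, token, and base model name, for illustration only.
req = build_create_deployment_request("my-account", "MY_TOKEN", {"baseModel": "my-base-model"})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.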

Query Parameters

disableAutoDeploy
boolean

By default, a deployment created with a currently undeployed base model will be deployed to this deployment. If true, this auto-deploy function is disabled.

disableSpeculativeDecoding
boolean

By default, a deployment will use the speculative decoding settings from the base model. If true, this will disable speculative decoding.
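The two flags above are opt-outs, so a client only needs to send them when it wants to change the default behavior. A small sketch of assembling the query string follows; the helper name is hypothetical, and encoding booleans as lowercase `true` is an assumption this page does not confirm.

```python
from urllib.parse import urlencode


def deployment_query_string(disable_auto_deploy: bool = False,
                            disable_speculative_decoding: bool = False) -> str:
    """Return the query string for the two optional boolean flags.

    Flags left at False are omitted, so the server-side defaults apply.
    Assumption: booleans are sent as lowercase "true".
    """
    params = {}
    if disable_auto_deploy:
        params["disableAutoDeploy"] = "true"
    if disable_speculative_decoding:
        params["disableSpeculativeDecoding"] = "true"
    return urlencode(params)


print(deployment_query_string(disable_auto_deploy=True))
# prints: disableAutoDeploy=true
```

Append the result to the endpoint path after a `?` when it is non-empty.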

Body

application/json
displayName
string

Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.

description
string

Description of the deployment.

expireTime
string

The time at which this deployment will automatically be deleted.

state
enum<string>
default: STATE_UNSPECIFIED

The state of the deployment.

Available options:
STATE_UNSPECIFIED,
CREATING,
READY,
DELETING,
FAILED,
UPDATING,
DELETED
status
object

Contains detailed message when the last deployment operation fails.

minReplicaCount
integer

The minimum number of replicas. If not specified, the default is 0.

maxReplicaCount
integer

The maximum number of replicas. If not specified, the default is max(minReplicaCount, 1).

autoscalingPolicy
object
baseModel
string
required
acceleratorCount
integer

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default: ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB
precision
enum<string>
default: PRECISION_UNSPECIFIED

The precision with which the model should be served.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2
enableAddons
boolean

If true, PEFT addons are enabled for this deployment.

deploymentTemplate
string

The name of the deployment template to use for this deployment. Only available to enterprise accounts.

autoTune
object

The performance profile to use for this deployment.

region
enum<string>
default: REGION_UNSPECIFIED

The geographic region where the deployment is located.

Available options:
REGION_UNSPECIFIED,
US_IOWA_1,
US_VIRGINIA_1,
US_VIRGINIA_2,
US_ILLINOIS_1,
AP_TOKYO_1,
EU_LONDON_1,
US_ARIZONA_1,
US_TEXAS_1,
US_ILLINOIS_2,
EU_FRANKFURT_1
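The body constraints described above (required `baseModel`, a `displayName` shorter than 64 characters, the listed accelerator types, and the replica-count defaults) can be checked client-side before sending. The sketch below is one possible way to do that; the function names and the sample model name are hypothetical, and the server remains the authority on validation.

```python
from typing import Optional

# Accelerator options copied from the list in this section.
ACCELERATOR_TYPES = {
    "ACCELERATOR_TYPE_UNSPECIFIED", "NVIDIA_A100_80GB", "NVIDIA_H100_80GB",
    "AMD_MI300X_192GB", "NVIDIA_A10G_24GB", "NVIDIA_A100_40GB", "NVIDIA_L4_24GB",
}


def effective_max_replicas(min_replicas: Optional[int], max_replicas: Optional[int]) -> int:
    """Mirror the documented defaults: min defaults to 0, max to max(min, 1)."""
    lo = 0 if min_replicas is None else min_replicas
    return max(lo, 1) if max_replicas is None else max_replicas


def build_deployment_body(base_model: str,
                          display_name: Optional[str] = None,
                          min_replica_count: Optional[int] = None,
                          max_replica_count: Optional[int] = None,
                          accelerator_type: Optional[str] = None) -> dict:
    """Assemble the JSON body, leaving out unset fields so server defaults apply."""
    if not base_model:
        raise ValueError("baseModel is required")
    if display_name is not None and len(display_name) >= 64:
        raise ValueError("displayName must be fewer than 64 characters")
    if accelerator_type is not None and accelerator_type not in ACCELERATOR_TYPES:
        raise ValueError(f"unknown acceleratorType: {accelerator_type}")
    optional = {
        "displayName": display_name,
        "minReplicaCount": min_replica_count,
        "maxReplicaCount": max_replica_count,
        "acceleratorType": accelerator_type,
    }
    body = {"baseModel": base_model}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# "my-base-model" is a placeholder; this page does not show the baseModel name format.
body = build_deployment_body("my-base-model", display_name="My Deployment", min_replica_count=2)
print(body, effective_max_replicas(2, None))
```

Omitting unset fields, rather than sending the `*_UNSPECIFIED` enum values explicitly, keeps the request small and lets the documented defaults take effect.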

Response

200 - application/json
name
string
displayName
string

Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.

description
string

Description of the deployment.

createTime
string

The creation time of the deployment.

expireTime
string

The time at which this deployment will automatically be deleted.

purgeTime
string

The time at which the resource will be hard deleted.

createdBy
string

The email address of the user who created this deployment.

state
enum<string>
default: STATE_UNSPECIFIED

The state of the deployment.

Available options:
STATE_UNSPECIFIED,
CREATING,
READY,
DELETING,
FAILED,
UPDATING,
DELETED
status
object

Contains detailed message when the last deployment operation fails.

minReplicaCount
integer

The minimum number of replicas. If not specified, the default is 0.

maxReplicaCount
integer

The maximum number of replicas. If not specified, the default is max(minReplicaCount, 1).

replicaCount
integer
autoscalingPolicy
object
baseModel
string
required
acceleratorCount
integer

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default: ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB
precision
enum<string>
default: PRECISION_UNSPECIFIED

The precision with which the model should be served.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2
cluster
string

If set, this deployment is deployed to a cloud-premise cluster.

enableAddons
boolean

If true, PEFT addons are enabled for this deployment.

numPeftDeviceCached
integer
deploymentTemplate
string

The name of the deployment template to use for this deployment. Only available to enterprise accounts.

autoTune
object

The performance profile to use for this deployment.

region
enum<string>
default: REGION_UNSPECIFIED

The geographic region where the deployment is located.

Available options:
REGION_UNSPECIFIED,
US_IOWA_1,
US_VIRGINIA_1,
US_VIRGINIA_2,
US_ILLINOIS_1,
AP_TOKYO_1,
EU_LONDON_1,
US_ARIZONA_1,
US_TEXAS_1,
US_ILLINOIS_2,
EU_FRANKFURT_1
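A caller will usually inspect `state` in the 200 response, and `status` when the state is `FAILED`. The sketch below shows one way to summarize a decoded response; the helper name and the sample payload are invented for illustration, and the internal shape of the `status` object is not described on this page, so it is passed through untouched.

```python
def summarize_deployment(response: dict) -> dict:
    """Pull the fields a caller typically checks out of a decoded 200 response."""
    # The documented default for state is STATE_UNSPECIFIED.
    state = response.get("state", "STATE_UNSPECIFIED")
    return {
        "name": response.get("name"),
        "state": state,
        "ready": state == "READY",
        "failed": state == "FAILED",
        # status carries the failure detail; its structure is not specified here.
        "status": response.get("status") if state == "FAILED" else None,
        "replicaCount": response.get("replicaCount"),
    }


# Hypothetical response payload, for illustration only.
sample = {"name": "example-deployment", "state": "CREATING", "baseModel": "my-base-model"}
summary = summarize_deployment(sample)
print(summary["state"], summary["ready"])
# prints: CREATING False
```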