Get Deployment
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
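As a minimal sketch of the header format described above, the following builds a GET request carrying a Bearer token. The base URL, the path layout (`/accounts/{account_id}/deployments/{deployment_id}`), and the helper names are assumptions for illustration; none of them are stated in this section, so check them against your account's API reference before use.

```python
import json
import urllib.request

# Assumed base URL and path layout; neither is given in this section.
API_BASE = "https://api.fireworks.ai/v1"


def build_get_deployment_request(
    account_id: str, deployment_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request with a Bearer authorization header."""
    url = f"{API_BASE}/accounts/{account_id}/deployments/{deployment_id}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def get_deployment(account_id: str, deployment_id: str, token: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_get_deployment_request(account_id, deployment_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the header easy to inspect without a network call.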
Response
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB
The performance profile to use for this deployment.
If set, this deployment is deployed to a cloud-premise cluster.
The email address of the user who created this deployment.
The creation time of the deployment.
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
Description of the deployment.
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
If true, PEFT addons are enabled for this deployment.
The time at which this deployment will automatically be deleted.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0.
The minimum number of replicas. If not specified, the default is 0.
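The two replica defaults above interact: an unset minimum becomes 0, and an unset maximum becomes max(min_replica_count, 1). A small sketch of that resolution follows. The function name is hypothetical, and `max_replica_count` is an assumed field name chosen by analogy with `min_replica_count`, which is the only one named in this section.

```python
from typing import Optional, Tuple


def resolve_replica_bounds(
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults to unset replica counts."""
    # Minimum defaults to 0 when not specified.
    minimum = 0 if min_replica_count is None else min_replica_count
    # Maximum defaults to max(min_replica_count, 1) when not specified.
    # An explicit 0 is kept as-is, since 0 scales the deployment down to zero.
    maximum = max(minimum, 1) if max_replica_count is None else max_replica_count
    return minimum, maximum
```

The `is None` checks matter here: testing truthiness instead would treat an explicit maximum of 0 as unset and silently replace it with 1.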
The length of the previous input sequence to consider for N-gram speculation.
The precision with which the model should be served.
PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2
The time at which the resource will be hard deleted.
The geographic region where the deployment is located.
REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_VIRGINIA_2, US_ILLINOIS_1, AP_TOKYO_1, EU_LONDON_1, US_ARIZONA_1, US_TEXAS_1, US_ILLINOIS_2, EU_FRANKFURT_1, US_TEXAS_2, EU_PARIS_1
The state of the deployment.
STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED
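A common use of the state value is to poll until a deployment becomes READY. The sketch below shows one way to do that. The state names come from the list above, but the choice of which states count as failures, the polling interval, and the `fetch_state` callback (a hypothetical function returning the current state string) are all assumptions, not documented behavior.

```python
import time
from typing import Callable

# Assumption: these states mean the deployment will not become READY.
FAILURE_STATES = {"FAILED", "DELETING", "DELETED"}


def wait_until_ready(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it returns READY, a failure state, or polls run out."""
    for attempt in range(max_polls):
        state = fetch_state()
        if state == "READY":
            return state
        if state in FAILURE_STATES:
            raise RuntimeError(f"deployment entered state {state}")
        # CREATING, UPDATING, or unspecified: wait, then check again.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"deployment not READY after {max_polls} polls")
```

Passing `sleep` as a parameter lets callers substitute a no-op in tests, so the loop can be exercised without real delays.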
Detailed status information regarding the most recent operation.