curl --request POST \
  --url https://api.fireworks.ai/v1/accounts/{account_id}/deployments/{deployment_id}:undelete \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{}'

{
  "baseModel": "<string>",
  "name": "<string>",
  "displayName": "<string>",
  "description": "<string>",
  "createTime": "2023-11-07T05:31:56Z",
  "expireTime": "2023-11-07T05:31:56Z",
  "purgeTime": "2023-11-07T05:31:56Z",
  "deleteTime": "2023-11-07T05:31:56Z",
  "state": "STATE_UNSPECIFIED",
  "status": {
    "code": "OK",
    "message": "<string>"
  },
  "minReplicaCount": 123,
  "maxReplicaCount": 123,
  "desiredReplicaCount": 123,
  "replicaCount": 123,
  "autoscalingPolicy": {
    "scaleUpWindow": "<string>",
    "scaleDownWindow": "<string>",
    "scaleToZeroWindow": "<string>",
    "loadTargets": {}
  },
  "acceleratorCount": 123,
  "acceleratorType": "ACCELERATOR_TYPE_UNSPECIFIED",
  "precision": "PRECISION_UNSPECIFIED",
  "cluster": "<string>",
  "enableAddons": true,
  "draftTokenCount": 123,
  "draftModel": "<string>",
  "ngramSpeculationLength": 123,
  "enableSessionAffinity": true,
  "directRouteApiKeys": [
    "<string>"
  ],
  "numPeftDeviceCached": 123,
  "directRouteType": "DIRECT_ROUTE_TYPE_UNSPECIFIED",
  "directRouteHandle": "<string>",
  "deploymentTemplate": "<string>",
  "autoTune": {
    "longPrompt": true
  },
  "placement": {
    "region": "REGION_UNSPECIFIED",
    "multiRegion": "MULTI_REGION_UNSPECIFIED",
    "regions": [
      "REGION_UNSPECIFIED"
    ]
  },
  "region": "REGION_UNSPECIFIED",
  "updateTime": "2023-11-07T05:31:56Z",
  "disableDeploymentSizeValidation": true,
  "enableMtp": true,
  "enableHotLoad": true,
  "hotLoadBucketType": "BUCKET_TYPE_UNSPECIFIED",
  "enableHotReloadLatestAddon": true,
  "deploymentShape": "<string>",
  "activeModelVersion": "<string>",
  "targetModelVersion": "<string>"
}

Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>
The body is of type object.
A successful response.
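The same call can be sketched in Python using only the standard library. The endpoint URL, headers, and empty JSON body are taken from the curl sample; the function names `build_undelete_request` and `undelete_deployment` are illustrative and not part of any Fireworks SDK.

```python
import json
import urllib.request

API_BASE = "https://api.fireworks.ai/v1"


def build_undelete_request(account_id: str, deployment_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the :undelete endpoint."""
    url = f"{API_BASE}/accounts/{account_id}/deployments/{deployment_id}:undelete"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # the documented body is an empty JSON object
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def undelete_deployment(account_id: str, deployment_id: str, api_key: str) -> dict:
    """Send the request and decode the deployment object returned on success."""
    request = build_undelete_request(account_id, deployment_id, api_key)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without touching the network.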
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
Description of the deployment.
The creation time of the deployment.
The time at which this deployment will automatically be deleted.
The time at which the resource will be hard deleted.
The time at which the resource will be soft deleted.
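The timestamps in the response sample use the form `2023-11-07T05:31:56Z`. A minimal sketch for parsing them and computing how long remains before the hard-delete time follows; the helper names are hypothetical, and it assumes timestamps carry at most microsecond precision.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2023-11-07T05:31:56Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def time_until_purge(deployment: dict, now: datetime) -> timedelta:
    """Return the time remaining until purgeTime (negative if already past)."""
    return parse_timestamp(deployment["purgeTime"]) - now


# Usage with a fixed "now" so the result is reproducible.
sample = {"purgeTime": "2023-11-07T05:31:56Z"}
remaining = time_until_purge(sample, datetime(2023, 11, 6, 5, 31, 56, tzinfo=timezone.utc))
```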
The state of the deployment. Possible values: STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED.
Detailed status information regarding the most recent operation.
The status code. Possible values: OK, CANCELLED, UNKNOWN, INVALID_ARGUMENT, DEADLINE_EXCEEDED, NOT_FOUND, ALREADY_EXISTS, PERMISSION_DENIED, UNAUTHENTICATED, RESOURCE_EXHAUSTED, FAILED_PRECONDITION, ABORTED, OUT_OF_RANGE, UNIMPLEMENTED, INTERNAL, UNAVAILABLE, DATA_LOSS.
A developer-facing error message in English.
The minimum number of replicas. If not specified, the default is 0.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0.
The desired number of replicas for this deployment. This represents the target replica count that the system is trying to achieve.
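The replica defaults described above (minimum defaults to 0, maximum defaults to max(min_replica_count, 1)) can be written out as a small helper. This is a client-side illustration of the documented rule, not server code; the function name is hypothetical.

```python
from typing import Optional, Tuple


def effective_replica_bounds(
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults to unspecified replica bounds."""
    minimum = 0 if min_replica_count is None else min_replica_count
    # An explicit maximum of 0 is allowed and must not be replaced by the default.
    maximum = max(minimum, 1) if max_replica_count is None else max_replica_count
    return minimum, maximum
```

Checking `is None` rather than truthiness matters here, because 0 is a meaningful value for both bounds.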
The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s.
The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m.
The duration without requests after which the deployment is scaled down to zero replicas, if min_replica_count==0. Default is 1h. Must be at least 5 minutes.
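The 5-minute lower bound on the scale-to-zero window can be checked before a value is sent. The sketch below assumes windows are written like the documented defaults (`30s`, `10m`, `1h`); the format the API actually accepts may differ, and the helper names are hypothetical.

```python
import re

# Assumption: a window is an integer followed by one unit suffix (s, m, or h).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
MIN_SCALE_TO_ZERO_SECONDS = 5 * 60


def parse_window(text: str) -> int:
    """Convert a window such as '30s', '10m', or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unrecognized window: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def validate_scale_to_zero(window: str = "1h") -> int:
    """Return the window in seconds, rejecting values under 5 minutes."""
    seconds = parse_window(window)
    if seconds < MIN_SCALE_TO_ZERO_SECONDS:
        raise ValueError("scale-to-zero window must be at least 5 minutes")
    return seconds
```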
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. Possible values: ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB, NVIDIA_B200_180GB, AMD_MI325X_256GB.
The precision with which the model should be served. Possible values: PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2, NF4, FP4, BF16, FP4_BLOCKSCALED_MM, FP4_MX_MOE.
If set, this deployment is deployed to a cloud-premise cluster.
If true, PEFT addons are enabled for this deployment.
The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
The length of previous input sequence to be considered for N-gram speculation.
Whether to apply sticky routing based on the user field.
The set of API keys used to access the direct route deployment. If direct routing is not enabled, this field is unused.
If set, this deployment will expose an endpoint that bypasses the Fireworks API gateway. Possible values: DIRECT_ROUTE_TYPE_UNSPECIFIED, INTERNET, GCP_PRIVATE_SERVICE_CONNECT, AWS_PRIVATELINK.
The handle for calling a direct route. The meaning of the handle depends on the direct route type of the deployment:
  INTERNET: the host name for accessing the deployment.
  GCP_PRIVATE_SERVICE_CONNECT: the service attachment name used to create the PSC endpoint.
  AWS_PRIVATELINK: the service name used to create the VPC endpoint.
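Because the meaning of the handle depends on the direct route type, client code usually needs to branch on that type. A small lookup, using only the three meanings listed above, might look like this; the names `HANDLE_MEANING` and `describe_direct_route` are illustrative.

```python
from typing import Optional

# Meaning of directRouteHandle for each documented direct route type.
HANDLE_MEANING = {
    "INTERNET": "host name for accessing the deployment",
    "GCP_PRIVATE_SERVICE_CONNECT": "service attachment name used to create the PSC endpoint",
    "AWS_PRIVATELINK": "service name used to create the VPC endpoint",
}


def describe_direct_route(deployment: dict) -> Optional[str]:
    """Return '<handle> (<meaning>)', or None when no direct route type is set."""
    route_type = deployment.get("directRouteType", "DIRECT_ROUTE_TYPE_UNSPECIFIED")
    meaning = HANDLE_MEANING.get(route_type)
    if meaning is None:
        return None
    return f"{deployment.get('directRouteHandle', '')} ({meaning})"
```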
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
The desired geographic region where the deployment must be placed. If unspecified, the default is the GLOBAL multi-region.
The region where the deployment must be placed. Possible values: REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_ILLINOIS_1, AP_TOKYO_1, US_ARIZONA_1, US_TEXAS_1, US_ILLINOIS_2, EU_FRANKFURT_1, US_TEXAS_2, EU_ICELAND_1, EU_ICELAND_2, US_WASHINGTON_1, US_WASHINGTON_2, US_WASHINGTON_3, AP_TOKYO_2, US_CALIFORNIA_1, US_UTAH_1, US_TEXAS_3, US_GEORGIA_1, US_GEORGIA_2, US_WASHINGTON_4, US_GEORGIA_3.
The multi-region where the deployment must be placed. Possible values: MULTI_REGION_UNSPECIFIED, GLOBAL, US.
Possible values for each entry in regions: the same region values as listed above.
The geographic region where the deployment is presently located. This region may change over time, but within the placement constraint. Possible values: the same region values as listed above.
The update time for the deployment.
Whether the deployment size validation is disabled.
If true, MTP is enabled for this deployment.
Whether to use hot load for this deployment.
Possible values for hotLoadBucketType: BUCKET_TYPE_UNSPECIFIED, MINIO, S3, NEBIUS.
Allows up to one addon at a time to be loaded, and merges it into the base model.
The name of the deployment shape that this deployment is using. On the server side, this will be replaced with the deployment shape version name.
The model version that is currently active and applied to running replicas of a deployment.
The target model version that is being rolled out to the deployment. In a ready steady state, the target model version is the same as the active model version.
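Since the target and active model versions match in a ready steady state, comparing the two fields is a simple way to tell whether a rollout is still under way. A minimal sketch, with a hypothetical helper name and made-up version strings:

```python
def rollout_in_progress(deployment: dict) -> bool:
    """True when the target model version has not yet become the active one."""
    return deployment.get("activeModelVersion") != deployment.get("targetModelVersion")


# Usage with illustrative values.
steady = {"activeModelVersion": "v2", "targetModelVersion": "v2"}
rolling = {"activeModelVersion": "v1", "targetModelVersion": "v2"}
```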