List Deployments
curl --request GET \
--url https://api.fireworks.ai/v1/accounts/{account_id}/deployments \
--header 'Authorization: Bearer <token>'
{
"deployments": [
{
"name": "<string>",
"displayName": "<string>",
"description": "<string>",
"createTime": "2023-11-07T05:31:56Z",
"expireTime": "2023-11-07T05:31:56Z",
"purgeTime": "2023-11-07T05:31:56Z",
"deleteTime": "2023-11-07T05:31:56Z",
"createdBy": "<string>",
"state": "STATE_UNSPECIFIED",
"status": {
"code": "OK",
"message": "<string>"
},
"minReplicaCount": 123,
"maxReplicaCount": 123,
"replicaCount": 123,
"autoscalingPolicy": {
"scaleUpWindow": "<string>",
"scaleDownWindow": "<string>",
"scaleToZeroWindow": "<string>",
"loadTargets": {}
},
"baseModel": "<string>",
"acceleratorCount": 123,
"acceleratorType": "ACCELERATOR_TYPE_UNSPECIFIED",
"precision": "PRECISION_UNSPECIFIED",
"cluster": "<string>",
"enableAddons": true,
"draftTokenCount": 123,
"draftModel": "<string>",
"ngramSpeculationLength": 123,
"numPeftDeviceCached": 123,
"deploymentTemplate": "<string>",
"autoTune": {
"longPrompt": true
},
"region": "REGION_UNSPECIFIED",
"isNim": true,
"updateTime": "2023-11-07T05:31:56Z"
}
],
"nextPageToken": "<string>",
"totalSize": 123
}
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
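As a minimal sketch of the header described above, the request can be prepared in Python with only the standard library. The helper name `build_list_request` and the placeholder account ID and token are illustrative assumptions, not part of the API.

```python
import urllib.request

API_BASE = "https://api.fireworks.ai/v1"

def build_list_request(account_id: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for an account's deployments."""
    url = f"{API_BASE}/accounts/{account_id}/deployments"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

# Placeholder values (hypothetical); substitute a real account ID and auth token.
req = build_list_request("my-account", "MY_TOKEN")
# Sending it would look like: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the URL and header easy to inspect before any network call is made.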
Path Parameters
The account ID, substituted for {account_id} in the request URL.
Query Parameters
The maximum number of deployments to return. The maximum page_size is 200; values above 200 will be coerced to 200. If unspecified, the default is 50.
A page token, received from a previous ListDeployments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDeployments must match the call that provided the page token.
Only deployments satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar.
A comma-separated list of fields to order by, e.g. "foo,bar". The default sort order is ascending. To specify descending order for a field, append a " desc" suffix, e.g. "foo desc,bar". Subfields are specified with a "." character, e.g. "foo.bar". If not specified, the default order is by "create_time".
If set, DELETED deployments will be included.
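The query parameters above could be assembled into a query string roughly as follows. This is a sketch under assumptions: the parameter spellings `page_size`, `page_token`, `filter`, and `order_by` are inferred from the names used in the descriptions and may differ from what the API actually accepts, and `build_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

MAX_PAGE_SIZE = 200  # documented ceiling; larger values are coerced to 200

def build_query(page_size=None, page_token=None, filter_expr=None, order_by=None):
    """Return a URL query string, omitting parameters that were not set."""
    params = {}
    if page_size is not None:
        # Apply the documented coercion locally so the URL shows the effective size.
        params["page_size"] = min(page_size, MAX_PAGE_SIZE)
    if page_token:
        params["page_token"] = page_token
    if filter_expr:
        params["filter"] = filter_expr  # AIP-160 filter expression
    if order_by:
        params["order_by"] = order_by
    return urlencode(params)

# Request the largest page, newest deployments first.
query = build_query(page_size=500, order_by="create_time desc")
```

The resulting string is appended to the endpoint URL after a "?". Leaving every argument unset yields an empty string, in which case the server-side defaults apply.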
Response
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
Description of the deployment.
The creation time of the deployment.
The time at which this deployment will automatically be deleted.
The time at which the resource will be hard deleted.
The time at which the resource will be soft deleted.
The email address of the user who created this deployment.
The state of the deployment.
STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED
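One way to use the state values listed above is to tally a decoded response by state. The sketch below runs on hand-written sample data (the names are hypothetical), and treating a missing "state" key as STATE_UNSPECIFIED is an assumption.

```python
from collections import Counter

def count_by_state(deployments):
    """Count deployments per value of their "state" field."""
    # Assumption: an absent "state" key is treated as STATE_UNSPECIFIED.
    return Counter(d.get("state", "STATE_UNSPECIFIED") for d in deployments)

# Hypothetical sample shaped like the "deployments" array in the response.
sample = [
    {"name": "a", "state": "READY"},
    {"name": "b", "state": "CREATING"},
    {"name": "c", "state": "READY"},
]
counts = count_by_state(sample)
```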
Detailed status information regarding the most recent operation.
The status code.
OK, CANCELLED, UNKNOWN, INVALID_ARGUMENT, DEADLINE_EXCEEDED, NOT_FOUND, ALREADY_EXISTS, PERMISSION_DENIED, UNAUTHENTICATED, RESOURCE_EXHAUSTED, FAILED_PRECONDITION, ABORTED, OUT_OF_RANGE, UNIMPLEMENTED, INTERNAL, UNAVAILABLE, DATA_LOSS
A developer-facing error message in English.
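The status code and message can be used to surface deployments whose most recent operation did not succeed. A minimal sketch over hand-written sample data follows; the helper name is hypothetical, and treating a missing code as OK is an assumption.

```python
def non_ok_statuses(deployments):
    """Return (name, code, message) tuples for deployments whose status code is not OK."""
    problems = []
    for d in deployments:
        status = d.get("status") or {}
        code = status.get("code", "OK")  # assumption: a missing code means OK
        if code != "OK":
            problems.append((d.get("name"), code, status.get("message", "")))
    return problems

# Hypothetical sample shaped like the "deployments" array in the response.
sample = [
    {"name": "a", "status": {"code": "OK", "message": ""}},
    {"name": "b", "status": {"code": "RESOURCE_EXHAUSTED", "message": "no capacity"}},
]
problems = non_ok_statuses(sample)
```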
The minimum number of replicas. If not specified, the default is 0.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0.
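The two replica defaults interact: an unspecified maximum depends on the minimum. The following sketch works through the documented rule max(min_replica_count, 1); the function name is hypothetical and the logic only mirrors the description above, not the server's implementation.

```python
def effective_replica_bounds(min_replica_count=None, max_replica_count=None):
    """Apply the documented defaults and return (min, max) replica counts."""
    # Unspecified minimum defaults to 0.
    min_count = 0 if min_replica_count is None else min_replica_count
    # Unspecified maximum defaults to max(min_replica_count, 1).
    if max_replica_count is None:
        max_count = max(min_count, 1)
    else:
        max_count = max_replica_count
    return min_count, max_count

nothing_set = effective_replica_bounds()
only_min_set = effective_replica_bounds(min_replica_count=3)
scaled_to_zero = effective_replica_bounds(min_replica_count=0, max_replica_count=0)
```

With nothing set the bounds are (0, 1); with only a minimum of 3 they are (3, 3); an explicit maximum of 0 gives (0, 0), which corresponds to downscaling the deployment to zero.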
The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s.
The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m.
The duration without any requests after which the deployment will be scaled down to zero replicas, if min_replica_count==0. Default is 1h.
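To compare or compute with the three autoscaling windows, their string values can be converted to `timedelta` objects. The sketch below assumes the values use a single number followed by s, m, or h, as the documented defaults (30s, 10m, 1h) do; the actual wire format is not specified here and may differ. `parse_window` is a hypothetical helper.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}

def parse_window(text: str) -> timedelta:
    """Parse a duration such as "30s", "10m", or "1h" into a timedelta."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration format: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: float(value)})

# The documented defaults for the three windows.
DEFAULT_WINDOWS = {
    "scaleUpWindow": parse_window("30s"),
    "scaleDownWindow": parse_window("10m"),
    "scaleToZeroWindow": parse_window("1h"),
}
```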
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB, NVIDIA_B200_180GB
The precision with which the model should be served.
PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2, NF4
If set, this deployment is deployed to a cloud-premise cluster.
If true, PEFT addons are enabled for this deployment.
The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior.
The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to false to disable this behavior.
The length of previous input sequence to be considered for N-gram speculation.
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
The performance profile to use for this deployment.
If true, this deployment is optimized for long prompt lengths.
The geographic region where the deployment is located.
REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_VIRGINIA_2, US_ILLINOIS_1, AP_TOKYO_1, EU_LONDON_1, US_ARIZONA_1, US_TEXAS_1, US_ILLINOIS_2, EU_FRANKFURT_1, US_TEXAS_2, EU_PARIS_1, EU_HELSINKI_1, US_NEVADA_1, EU_ICELAND_1, EU_ICELAND_2
Whether this deployment should be created with NIM.
The update time for the deployment.
A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.
The total number of deployments.
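The pagination contract (send the previous nextPageToken as the page token and stop when the field is omitted) can be sketched as a generator. Here `fetch_page` is a hypothetical callable standing in for the HTTP request, so the loop can be exercised without network access; the canned pages and names are invented sample data.

```python
def iter_deployments(fetch_page):
    """Yield every deployment, following nextPageToken until it is absent.

    fetch_page is a hypothetical callable: it takes a page token (None for
    the first page) and returns the decoded JSON response as a dict.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("deployments", [])
        token = page.get("nextPageToken")
        if not token:
            # An omitted nextPageToken means there are no subsequent pages.
            break

# Stand-in for real HTTP calls: two canned pages keyed by page token.
_pages = {
    None: {"deployments": [{"name": "d1"}, {"name": "d2"}],
           "nextPageToken": "t1", "totalSize": 3},
    "t1": {"deployments": [{"name": "d3"}], "totalSize": 3},
}
names = [d["name"] for d in iter_deployments(lambda token: _pages[token])]
```

In real use, `fetch_page` would issue the GET request with the same filter, ordering, and page size on every call, since all other parameters must match the call that provided the page token.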