curl --request GET \
--url https://api.fireworks.ai/v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}/versions \
--header 'Authorization: Bearer <token>'
{
"deploymentShapeVersions": [
{
"name": "<string>",
"createTime": "2023-11-07T05:31:56Z",
"snapshot": {
"baseModel": "<string>",
"name": "<string>",
"displayName": "<string>",
"description": "<string>",
"createTime": "2023-11-07T05:31:56Z",
"updateTime": "2023-11-07T05:31:56Z",
"modelType": "<string>",
"parameterCount": "<string>",
"acceleratorCount": 123,
"acceleratorType": "ACCELERATOR_TYPE_UNSPECIFIED",
"precision": "PRECISION_UNSPECIFIED",
"disableDeploymentSizeValidation": true,
"enableAddons": true,
"draftTokenCount": 123,
"draftModel": "<string>",
"ngramSpeculationLength": 123,
"enableSessionAffinity": true,
"numLoraDeviceCached": 123,
"presetType": "PRESET_TYPE_UNSPECIFIED"
},
"validated": true,
"public": true,
"latestValidated": true
}
],
"nextPageToken": "<string>",
"totalSize": 123
}
Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>
The Account Id
The Deployment Shape Id
The maximum number of deployment shape versions to return. The maximum page_size is 200; values above 200 will be coerced to 200. If unspecified, the default is 50.
A page token, received from a previous ListDeploymentShapeVersions call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDeploymentShapeVersions must match the call that provided the page token.
Only deployment shape versions satisfying the provided filter (if specified) will be returned. See https://google.aip.dev/160 for the filter grammar.
A comma-separated list of fields to order by, e.g. "foo,bar". The default sort order is ascending. To specify descending order for a field, append a " desc" suffix, e.g. "foo desc,bar". Subfields are specified with a "." character, e.g. "foo.bar". If not specified, the default order is by "create_time".
The fields to be returned in the response. If empty or "*", all fields will be returned.
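The request parameters above can be assembled into a URL as in the following minimal sketch. The helper name `build_list_versions_url` is hypothetical, and the camelCase query keys (`pageSize`, `pageToken`, `orderBy`, `readMask`) are an assumption, since this page only shows the names page_size and page_token in prose; check them against the live API before relying on them.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.fireworks.ai/v1"
MAX_PAGE_SIZE = 200  # documented cap; the server coerces larger values to 200


def build_list_versions_url(account_id, deployment_shape_id, page_size=None,
                            page_token=None, filter_expr=None, order_by=None,
                            read_mask=None):
    """Build the GET URL for listing deployment shape versions.

    Query key spellings are assumed, not confirmed by this page.
    """
    params = {}
    if page_size is not None:
        # Clamp locally so the URL reflects what the server would use anyway.
        params["pageSize"] = min(page_size, MAX_PAGE_SIZE)
    if page_token:
        params["pageToken"] = page_token
    if filter_expr:
        params["filter"] = filter_expr
    if order_by:
        params["orderBy"] = order_by
    if read_mask:
        params["readMask"] = read_mask
    path = (
        f"{BASE_URL}/accounts/{quote(account_id, safe='')}"
        f"/deploymentShapes/{quote(deployment_shape_id, safe='')}/versions"
    )
    return f"{path}?{urlencode(params)}" if params else path
```

For example, `build_list_versions_url("my-account", "my-shape", page_size=500, order_by="create_time desc")` produces a URL whose page size is clamped to 200 and whose ordering is newest first.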
A successful response.
The creation time of the deployment shape version. Lists will be ordered by this field.
Full snapshot of the Deployment Shape at this version.
Human-readable display name of the deployment shape, e.g. "My Deployment Shape". Must be fewer than 64 characters long.
The description of the deployment shape. Must be fewer than 1000 characters long.
The creation time of the deployment shape.
The update time for the deployment shape.
The model type of the base model.
The parameter count of the base model.
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
Available options: ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB, NVIDIA_B200_180GB, AMD_MI325X_256GB
The precision with which the model should be served.
Available options: PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2, NF4, FP4, BF16, FP4_BLOCKSCALED_MM, FP4_MX_MOE
If true, the deployment size validation is disabled.
If true, LoRA addons are enabled for deployments created from this shape.
The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count.
The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model.
The length of previous input sequence to be considered for N-gram speculation.
Whether to apply sticky routing based on the user field.
Type of deployment shape for different deployment configurations.
Available options: PRESET_TYPE_UNSPECIFIED, MINIMAL, FAST, THROUGHPUT, FULL_PRECISION
If true, this version has been validated.
If true, this version will be publicly readable.
If true, this version is the latest validated version. Only one version of the shape can be the latest validated version.
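Because only one version of a shape can be the latest validated version, a client can pick it out of a parsed response with a simple scan. The sketch below is illustrative: the function name `latest_validated_version` is hypothetical, and it inspects a single page of results, so a paginated listing would need to be searched page by page.

```python
def latest_validated_version(response):
    """Return the version flagged latestValidated in one response page, or None."""
    for version in response.get("deploymentShapeVersions", []):
        # A missing flag is treated the same as false.
        if version.get("latestValidated"):
            return version
    return None
```

Given a response whose second entry has `"latestValidated": true`, the function returns that entry; if no entry carries the flag, it returns `None`.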
A token, which can be sent as page_token to retrieve the next page.
If this field is omitted, there are no subsequent pages.
The total number of deployment shape versions.
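Retrieving every version means following nextPageToken until it is absent. The following is a minimal sketch of that loop using only the Python standard library. The function names are hypothetical, and the `pageSize` and `pageToken` query keys are assumed spellings of the page_size and page_token parameters described on this page. The `fetch` argument is injectable so the loop can be exercised without network access.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_page(url, api_key):
    """GET one page and decode the JSON body (performs a real network call)."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_all_versions(account_id, deployment_shape_id, api_key,
                      fetch=fetch_page, page_size=50):
    """Collect all deployment shape versions by following nextPageToken."""
    base = (
        "https://api.fireworks.ai/v1/accounts/"
        f"{account_id}/deploymentShapes/{deployment_shape_id}/versions"
    )
    versions = []
    page_token = None
    while True:
        # Keep every parameter except the token identical across calls,
        # as the page token description requires.
        params = {"pageSize": page_size}  # assumed query key spelling
        if page_token:
            params["pageToken"] = page_token  # assumed query key spelling
        page = fetch(f"{base}?{urlencode(params)}", api_key)
        versions.extend(page.get("deploymentShapeVersions", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            # An omitted token means there are no subsequent pages.
            return versions
```

A call such as `list_all_versions("my-account", "my-shape", api_key)` would return a flat list of version objects; passing a stub as `fetch` makes the loop testable offline.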