GET /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}
Get Deployment Shape
curl --request GET \
  --url https://api.fireworks.ai/v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id} \
  --header 'Authorization: Bearer <token>'
{
  "baseModel": "<string>",
  "name": "<string>",
  "displayName": "<string>",
  "description": "<string>",
  "createTime": "2023-11-07T05:31:56Z",
  "updateTime": "2023-11-07T05:31:56Z",
  "modelType": "<string>",
  "parameterCount": "<string>",
  "acceleratorCount": 123,
  "acceleratorType": "ACCELERATOR_TYPE_UNSPECIFIED",
  "precision": "PRECISION_UNSPECIFIED",
  "disableDeploymentSizeValidation": true,
  "enableAddons": true,
  "draftTokenCount": 123,
  "draftModel": "<string>",
  "ngramSpeculationLength": 123,
  "enableSessionAffinity": true,
  "numLoraDeviceCached": 123,
  "presetType": "PRESET_TYPE_UNSPECIFIED"
}
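The same request can be sketched in Python with only the standard library. `build_request` and `get_deployment_shape` are hypothetical helper names, not part of any Fireworks SDK. The 30-second timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.fireworks.ai/v1"


def build_request(
    account_id: str,
    deployment_shape_id: str,
    api_key: str,
    read_mask: Optional[str] = None,
) -> urllib.request.Request:
    """Build the GET request for a single deployment shape."""
    # Percent-encode each path segment so an unusual ID cannot alter the path.
    path = "/accounts/{}/deploymentShapes/{}".format(
        urllib.parse.quote(account_id, safe=""),
        urllib.parse.quote(deployment_shape_id, safe=""),
    )
    url = BASE_URL + path
    if read_mask:
        # Optional query parameter; omitting it returns all fields.
        url += "?" + urllib.parse.urlencode({"readMask": read_mask})
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def get_deployment_shape(
    account_id: str,
    deployment_shape_id: str,
    api_key: str,
    read_mask: Optional[str] = None,
) -> dict:
    """Send the request and decode the JSON body of a 200 response."""
    req = build_request(account_id, deployment_shape_id, api_key, read_mask)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_deployment_shape("my-account", "my-shape", api_key)` would return the decoded JSON object. The IDs here are placeholders.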

Authorizations

Authorization
string
header
required

Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>

Path Parameters

account_id
string
required

The account ID.

deployment_shape_id
string
required

The deployment shape ID.

Query Parameters

readMask
string

The fields to be returned in the response. If empty or "*", all fields will be returned.
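This page does not say how several fields are combined in one `readMask` value. The sketch below assumes the usual `google.protobuf.FieldMask` JSON convention of comma-separated field names. `read_mask_query` is a hypothetical helper.

```python
import urllib.parse
from typing import Iterable, Optional


def read_mask_query(fields: Optional[Iterable[str]] = None) -> str:
    """Return a readMask query string, or "" when all fields are wanted.

    Assumption: multiple fields are joined with commas (FieldMask JSON convention).
    """
    field_list = [f.strip() for f in (fields or []) if f.strip()]
    if not field_list or field_list == ["*"]:
        # Empty or "*" returns every field, so the parameter can be omitted.
        return ""
    return urllib.parse.urlencode({"readMask": ",".join(field_list)})
```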

Response

200 - application/json

A successful response.

baseModel
string
required
name
string
displayName
string

Human-readable display name of the deployment shape, e.g. "My Deployment Shape". Must be fewer than 64 characters long.

description
string

The description of the deployment shape. Must be fewer than 1000 characters long.
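The two documented length limits can be checked client-side, for example before submitting a shape through another endpoint. This is a hypothetical sketch. It counts Unicode code points with `len()`, and it is an assumption that the server counts characters the same way.

```python
from typing import List

# Documented limits: each field must be strictly *fewer* than this many characters.
MAX_DISPLAY_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1000


def validate_shape_text(display_name: str = "", description: str = "") -> List[str]:
    """Return a list of limit violations; an empty list means both fields pass."""
    problems: List[str] = []
    # "Fewer than N characters" means the length must be strictly below N.
    if len(display_name) >= MAX_DISPLAY_NAME_LEN:
        problems.append(
            f"displayName is {len(display_name)} characters; "
            f"must be fewer than {MAX_DISPLAY_NAME_LEN}"
        )
    if len(description) >= MAX_DESCRIPTION_LEN:
        problems.append(
            f"description is {len(description)} characters; "
            f"must be fewer than {MAX_DESCRIPTION_LEN}"
        )
    return problems
```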

createTime
string<date-time>

The creation time of the deployment shape.

updateTime
string<date-time>

The last update time of the deployment shape.

modelType
string

The model type of the base model.

parameterCount
string<int64>

The parameter count of the base model.
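`createTime` and `updateTime` arrive as RFC 3339 strings. `parameterCount` arrives as a JSON string, which is consistent with the protobuf JSON mapping for int64 values. The sketch below converts them to native Python types. The helper names are hypothetical. Timestamps with fractional seconds may need extra handling on Python versions before 3.11.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as "2023-11-07T05:31:56Z"."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_parameter_count(value: Optional[str]) -> Optional[int]:
    """Convert the string-encoded int64 parameterCount to an int."""
    return int(value) if value else None
```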

acceleratorCount
integer<int32>

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default: ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB,
NVIDIA_H200_141GB,
NVIDIA_B200_180GB,
AMD_MI325X_256GB,
AMD_MI350X_288GB
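Each enum name ends with the accelerator's memory size. A rough per-replica memory figure is therefore `acceleratorCount` multiplied by that size. The sketch below is a hypothetical client-side helper that applies the documented NVIDIA_A100_80GB fallback for an unspecified type. It cannot reproduce the server's estimated default for `acceleratorCount`, so the count must be passed explicitly.

```python
import re
from typing import Optional

# Documented fallback when acceleratorType is not specified.
DEFAULT_ACCELERATOR = "NVIDIA_A100_80GB"


def effective_accelerator(accelerator_type: Optional[str]) -> str:
    """Resolve a missing or ACCELERATOR_TYPE_UNSPECIFIED value to the default."""
    if not accelerator_type or accelerator_type == "ACCELERATOR_TYPE_UNSPECIFIED":
        return DEFAULT_ACCELERATOR
    return accelerator_type


def memory_gb_per_replica(accelerator_type: Optional[str], accelerator_count: int) -> int:
    """Total accelerator memory per replica, read from the "_<N>GB" name suffix."""
    name = effective_accelerator(accelerator_type)
    match = re.search(r"_(\d+)GB$", name)
    if match is None:
        raise ValueError(f"no memory size in accelerator name: {name}")
    return int(match.group(1)) * accelerator_count
```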
precision
enum<string>
default: PRECISION_UNSPECIFIED

The precision with which the model should be served.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2,
NF4,
FP4,
BF16,
FP4_BLOCKSCALED_MM,
FP4_MX_MOE
disableDeploymentSizeValidation
boolean

If true, deployment size validation is disabled.

enableAddons
boolean

If true, LoRA addons are enabled for deployments created from this shape.

draftTokenCount
integer<int32>

The number of candidate tokens to generate per step for speculative decoding. Defaults to the base model's draft_token_count.

draftModel
string

The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Defaults to the base model's default_draft_model.
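The defaulting rules for the two speculative-decoding fields can be illustrated with a small resolver. This is a hypothetical client-side sketch. It assumes that a key missing from the shape falls back to the base model's default, while an explicit empty `draftModel` disables draft-model speculation. `base_model` is a hypothetical dict keyed by the `default_draft_model` and `draft_token_count` names this page mentions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculationConfig:
    draft_model: Optional[str]  # None means draft-model speculation is off
    draft_token_count: Optional[int]


def resolve_speculation(shape: dict, base_model: dict) -> SpeculationConfig:
    """Apply the documented base-model defaults to a deployment shape dict."""
    # Assumption: absent key -> base model default; explicit "" -> disabled.
    draft_model = shape.get("draftModel", base_model.get("default_draft_model", ""))
    token_count = shape.get("draftTokenCount", base_model.get("draft_token_count"))
    return SpeculationConfig(draft_model=draft_model or None, draft_token_count=token_count)
```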

ngramSpeculationLength
integer<int32>

The length of the previous input sequence to consider for n-gram speculation.

enableSessionAffinity
boolean

Whether to apply sticky routing based on the user field.
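This page does not say how Fireworks implements sticky routing. The sketch below only illustrates the general idea with a plain hash-modulo scheme: requests with the same user value always map to the same replica index. `pick_replica` is a hypothetical name.

```python
import hashlib


def pick_replica(user: str, num_replicas: int) -> int:
    """Deterministically map a request's user value to a replica index."""
    if num_replicas <= 0:
        raise ValueError("num_replicas must be positive")
    # SHA-256 is stable across processes, unlike Python's salted hash().
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the replica count.
    return int.from_bytes(digest[:8], "big") % num_replicas
```

Because this uses a plain modulo, changing `num_replicas` remaps most users. Routers that need stability under scaling often use consistent hashing instead.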

numLoraDeviceCached
integer<int32>
presetType
enum<string>
default: PRESET_TYPE_UNSPECIFIED

The preset type of the deployment shape. Each preset corresponds to a different deployment configuration.

Available options:
PRESET_TYPE_UNSPECIFIED,
MINIMAL,
FAST,
THROUGHPUT,
FULL_PRECISION
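Shape objects that have already been fetched, for example with this endpoint, can be filtered by preset. `shapes_with_preset` is a hypothetical helper. It treats a missing `presetType` as the documented default, PRESET_TYPE_UNSPECIFIED.

```python
from typing import Iterable, List

PRESET_TYPES = {"PRESET_TYPE_UNSPECIFIED", "MINIMAL", "FAST", "THROUGHPUT", "FULL_PRECISION"}


def shapes_with_preset(shapes: Iterable[dict], preset: str) -> List[dict]:
    """Return the deployment shape dicts whose presetType equals `preset`."""
    if preset not in PRESET_TYPES:
        raise ValueError(f"unknown presetType: {preset}")
    # A missing presetType falls back to the documented default.
    return [s for s in shapes if s.get("presetType", "PRESET_TYPE_UNSPECIFIED") == preset]
```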