Get Deployment Shape

curl --request GET \
  --url https://api.fireworks.ai/v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id} \
  --header 'Authorization: Bearer <token>'

{
  "baseModel": "<string>",
  "name": "<string>",
  "displayName": "<string>",
  "description": "<string>",
  "createTime": "2023-11-07T05:31:56Z",
  "updateTime": "2023-11-07T05:31:56Z",
  "modelType": "<string>",
  "parameterCount": "<string>",
  "acceleratorCount": 123,
  "acceleratorType": "ACCELERATOR_TYPE_UNSPECIFIED",
  "precision": "PRECISION_UNSPECIFIED",
  "disableDeploymentSizeValidation": true,
  "enableAddons": true,
  "draftTokenCount": 123,
  "draftModel": "<string>",
  "ngramSpeculationLength": 123,
  "enableSessionAffinity": true,
  "numLoraDeviceCached": 123,
  "maxContextLength": 123,
  "presetType": "PRESET_TYPE_UNSPECIFIED"
}

GET /v1/accounts/{account_id}/deploymentShapes/{deployment_shape_id}


Authorizations

Authorization

string

header

required

Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>

Path Parameters

account_id

string

required

The account ID.

deployment_shape_id

string

required

The deployment shape ID.
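
As a rough illustration of how the path parameters and the Authorization header fit together, the sketch below builds the request with Python's standard library without sending it. The helper name build_get_shape_request and the values "my-account", "my-shape", and "MY_API_KEY" are placeholders invented for this example, not part of the API.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.fireworks.ai/v1"


def build_get_shape_request(account_id: str, deployment_shape_id: str,
                            api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one deployment shape.

    Hypothetical helper for illustration only.
    """
    # Percent-encode each path segment so unexpected characters
    # cannot change the structure of the URL.
    url = (
        f"{API_BASE}/accounts/{quote(account_id, safe='')}"
        f"/deploymentShapes/{quote(deployment_shape_id, safe='')}"
    )
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder values; substitute your own account, shape ID, and API key.
req = build_get_shape_request("my-account", "my-shape", "MY_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would perform the call; that step is left out here so the sketch runs without credentials.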

Query Parameters

readMask

string

The fields to be returned in the response. If empty or "*", all fields will be returned.
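
The sketch below shows one way to attach readMask to the request URL. It assumes the mask is a comma-separated list of response field names, which is a common field-mask convention but is not stated on this page. The helper name shape_url_with_read_mask and the sample IDs are placeholders.

```python
from urllib.parse import urlencode


def shape_url_with_read_mask(account_id, deployment_shape_id, fields=None):
    """Return the endpoint URL, optionally restricted to selected fields.

    Assumption: readMask accepts comma-separated field names.
    """
    base = (
        "https://api.fireworks.ai/v1"
        f"/accounts/{account_id}/deploymentShapes/{deployment_shape_id}"
    )
    if not fields:
        # An empty mask (or "*") returns all fields, so omit the parameter.
        return base
    # urlencode percent-encodes the comma separator as %2C.
    return f"{base}?{urlencode({'readMask': ','.join(fields)})}"


masked_url = shape_url_with_read_mask(
    "my-account", "my-shape", ["baseModel", "acceleratorType"]
)
full_url = shape_url_with_read_mask("my-account", "my-shape")
print(masked_url)
```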

Response

200 - application/json

A successful response.

baseModel

string

required

name

string

displayName

string

Human-readable display name of the deployment shape, e.g. "My Deployment Shape". Must be fewer than 64 characters long.

description

string

The description of the deployment shape. Must be fewer than 1000 characters long.

createTime

string<date-time>

The creation time of the deployment shape.

updateTime

string<date-time>

The update time for the deployment shape.

modelType

string

The model type of the base model.

parameterCount

string<int64>

The parameter count of the base model.
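
Because parameterCount is typed string<int64> and the timestamps are string<date-time>, a client usually converts them after decoding the JSON. The following is a minimal sketch of that conversion. The helper name parse_shape and the sample values (including the baseModel name) are made up for illustration, and it assumes timestamps use whole seconds with a trailing "Z", as in the sample response.

```python
import json
from datetime import datetime


def parse_shape(raw: str) -> dict:
    """Decode a deployment shape response and convert typed string fields."""
    shape = json.loads(raw)
    # int64 values arrive as JSON strings so they survive clients
    # whose native numbers are floating point.
    if "parameterCount" in shape:
        shape["parameterCount"] = int(shape["parameterCount"])
    for key in ("createTime", "updateTime"):
        if key in shape:
            # Replace the "Z" suffix so fromisoformat also works on
            # Python versions older than 3.11.
            shape[key] = datetime.fromisoformat(
                shape[key].replace("Z", "+00:00")
            )
    return shape


# Hypothetical payload; field values are invented for the example.
sample = (
    '{"baseModel": "accounts/fireworks/models/example-model",'
    ' "parameterCount": "8000000000",'
    ' "createTime": "2023-11-07T05:31:56Z"}'
)
parsed = parse_shape(sample)
print(parsed["parameterCount"], parsed["createTime"].isoformat())
```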

acceleratorCount

integer<int32>

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType

enum<string>

default:ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options: ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB, NVIDIA_B200_180GB, AMD_MI325X_256GB, AMD_MI350X_288GB

precision

enum<string>

default:PRECISION_UNSPECIFIED

The precision with which the model should be served.

Available options: PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2, NF4, FP4, BF16, FP4_BLOCKSCALED_MM, FP4_MX_MOE

disableDeploymentSizeValidation

boolean

If true, the deployment size validation is disabled.

enableAddons

boolean

If true, LoRA addons are enabled for deployments created from this shape.

draftTokenCount

integer<int32>

The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count.

draftModel

string

The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model.

ngramSpeculationLength

integer<int32>

The length of the previous input sequence considered for N-gram speculation.

enableSessionAffinity

boolean

Whether to apply sticky routing based on the user field.

numLoraDeviceCached

integer<int32>

maxContextLength

integer<int32>

The maximum context length supported by the model (context window). If set to 0 or not specified, the model's default maximum context length will be used.
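
The fallback rule above can be sketched on the client side as follows. The function name effective_context_length and the model_default argument are hypothetical: this page does not say where a client would obtain the model's default, so the caller supplies it here.

```python
def effective_context_length(shape: dict, model_default: int) -> int:
    """Resolve the context length that applies to a deployment shape.

    A value of 0, or a missing field, means the model's default is used.
    model_default is a caller-supplied assumption, not an API field.
    """
    value = shape.get("maxContextLength", 0)
    return value if value > 0 else model_default


# Illustrative values only.
print(effective_context_length({"maxContextLength": 32768}, 8192))
print(effective_context_length({"maxContextLength": 0}, 8192))
print(effective_context_length({}, 8192))
```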

presetType

enum<string>

default:PRESET_TYPE_UNSPECIFIED

Type of deployment shape for different deployment configurations.

Available options: PRESET_TYPE_UNSPECIFIED, MINIMAL, FAST, THROUGHPUT, FULL_PRECISION, AGENTIC_CODING, CHAT, SUMMARIZATION
