PATCH /v1/accounts/{account_id}/deployments/{deployment_id}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

account_id
string
required

The account ID.

deployment_id
string
required

The deployment ID.
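As a minimal sketch of how the method, path parameters, and Bearer header fit together, the Python snippet below builds the PATCH request without sending it. `BASE_URL` is a placeholder host and `build_update_request` is a hypothetical helper; neither comes from this reference, and the account, deployment, and model names are illustrative only.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host. Substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_update_request(account_id: str, deployment_id: str,
                         token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for one deployment."""
    # Percent-encode the path parameters so unusual characters cannot alter the path.
    path = (
        f"/v1/accounts/{quote(account_id, safe='')}"
        f"/deployments/{quote(deployment_id, safe='')}"
    )
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # Bearer <token>
            "Content-Type": "application/json",
        },
    )


# Hypothetical identifiers, for illustration only.
req = build_update_request(
    "my-account",
    "my-deployment",
    "TOKEN",
    {"baseModel": "accounts/my-account/models/my-model"},
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without network access.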

Body

application/json
baseModel
string
required
displayName
string

Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.

description
string

Description of the deployment.

expireTime
string

The time at which this deployment will automatically be deleted.

state
enum<string>
default: STATE_UNSPECIFIED

The state of the deployment.

Available options:
STATE_UNSPECIFIED,
CREATING,
READY,
DELETING,
FAILED,
UPDATING,
DELETED
status
object

Contains detailed message when the last deployment operation fails.

minReplicaCount
integer

The minimum number of replicas. If not specified, the default is 0.

maxReplicaCount
integer

The maximum number of replicas. If not specified, the default is max(min_replica_count, 1).

autoscalingPolicy
object
acceleratorCount
integer

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default: ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB
precision
enum<string>
default: PRECISION_UNSPECIFIED

The precision with which the model should be served. The default is planned to change to FP16 once legacy models are fixed.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2
enableAddons
boolean

If true, PEFT addons are enabled for this deployment.

draftTokenCount
integer

The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

draftModel
string

The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

ngramSpeculationLength
integer

The length of previous input sequence to be considered for N-gram speculation.

deploymentTemplate
string

The name of the deployment template to use for this deployment. Only available to enterprise accounts.

autoTune
object

The performance profile to use for this deployment.

region
enum<string>
default: REGION_UNSPECIFIED

The geographic region where the deployment is located.

Available options:
REGION_UNSPECIFIED,
US_IOWA_1,
US_VIRGINIA_1,
US_VIRGINIA_2,
US_ILLINOIS_1,
AP_TOKYO_1,
EU_LONDON_1,
US_ARIZONA_1,
US_TEXAS_1,
US_ILLINOIS_2,
EU_FRANKFURT_1
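The body fields above can be assembled into a JSON payload along these lines. This is only a sketch: `build_update_body` is a hypothetical helper, and the client-side checks (required baseModel, display name length, enum membership) mirror the constraints listed in this reference rather than any confirmed server behavior. Only a subset of fields is covered.

```python
from typing import Optional

# Enum values copied from the field descriptions above.
ACCELERATOR_TYPES = {
    "ACCELERATOR_TYPE_UNSPECIFIED", "NVIDIA_A100_80GB", "NVIDIA_H100_80GB",
    "AMD_MI300X_192GB", "NVIDIA_A10G_24GB", "NVIDIA_A100_40GB", "NVIDIA_L4_24GB",
}
REGIONS = {
    "REGION_UNSPECIFIED", "US_IOWA_1", "US_VIRGINIA_1", "US_VIRGINIA_2",
    "US_ILLINOIS_1", "AP_TOKYO_1", "EU_LONDON_1", "US_ARIZONA_1",
    "US_TEXAS_1", "US_ILLINOIS_2", "EU_FRANKFURT_1",
}


def build_update_body(base_model: str,
                      display_name: Optional[str] = None,
                      min_replica_count: Optional[int] = None,
                      max_replica_count: Optional[int] = None,
                      accelerator_type: Optional[str] = None,
                      region: Optional[str] = None) -> dict:
    """Assemble a request body, omitting every optional field left unset."""
    if not base_model:
        raise ValueError("baseModel is required")
    body = {"baseModel": base_model}
    if display_name is not None:
        # The reference says the name must be fewer than 64 characters long.
        if len(display_name) >= 64:
            raise ValueError("displayName must be fewer than 64 characters")
        body["displayName"] = display_name
    if min_replica_count is not None:
        body["minReplicaCount"] = min_replica_count
    if max_replica_count is not None:
        body["maxReplicaCount"] = max_replica_count
    if accelerator_type is not None:
        if accelerator_type not in ACCELERATOR_TYPES:
            raise ValueError(f"unknown acceleratorType: {accelerator_type}")
        body["acceleratorType"] = accelerator_type
    if region is not None:
        if region not in REGIONS:
            raise ValueError(f"unknown region: {region}")
        body["region"] = region
    return body


# Hypothetical model name, for illustration only.
print(build_update_body("accounts/my-account/models/my-model",
                        display_name="My Deployment",
                        min_replica_count=1,
                        accelerator_type="NVIDIA_H100_80GB"))
```

Leaving unset fields out of the payload, rather than sending nulls, lets the documented defaults apply; whether the server treats the two cases differently is not stated in this reference.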

Response

200 - application/json
baseModel
string
required
name
string
displayName
string

Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.

description
string

Description of the deployment.

createTime
string

The creation time of the deployment.

expireTime
string

The time at which this deployment will automatically be deleted.

purgeTime
string

The time at which the resource will be hard deleted.

createdBy
string

The email address of the user who created this deployment.

state
enum<string>
default: STATE_UNSPECIFIED

The state of the deployment.

Available options:
STATE_UNSPECIFIED,
CREATING,
READY,
DELETING,
FAILED,
UPDATING,
DELETED
status
object

Contains detailed message when the last deployment operation fails.

minReplicaCount
integer

The minimum number of replicas. If not specified, the default is 0.

maxReplicaCount
integer

The maximum number of replicas. If not specified, the default is max(min_replica_count, 1).

replicaCount
integer
autoscalingPolicy
object
acceleratorCount
integer

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default: ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB
precision
enum<string>
default: PRECISION_UNSPECIFIED

The precision with which the model should be served. The default is planned to change to FP16 once legacy models are fixed.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2
cluster
string

If set, this deployment is deployed to a cloud-premise cluster.

enableAddons
boolean

If true, PEFT addons are enabled for this deployment.

draftTokenCount
integer

The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

draftModel
string

The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

ngramSpeculationLength
integer

The length of previous input sequence to be considered for N-gram speculation.

numPeftDeviceCached
integer
deploymentTemplate
string

The name of the deployment template to use for this deployment. Only available to enterprise accounts.

autoTune
object

The performance profile to use for this deployment.

region
enum<string>
default: REGION_UNSPECIFIED

The geographic region where the deployment is located.

Available options:
REGION_UNSPECIFIED,
US_IOWA_1,
US_VIRGINIA_1,
US_VIRGINIA_2,
US_ILLINOIS_1,
AP_TOKYO_1,
EU_LONDON_1,
US_ARIZONA_1,
US_TEXAS_1,
US_ILLINOIS_2,
EU_FRANKFURT_1
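One way to read a 200 response is sketched below. `summarize_deployment` is a hypothetical helper and the sample response is invented for illustration. The sketch assumes that fields may be absent from the JSON when they hold their defaults, and fills in the documented defaults in that case (0 for minReplicaCount, max(min_replica_count, 1) for maxReplicaCount, STATE_UNSPECIFIED for state).

```python
def summarize_deployment(resp: dict) -> dict:
    """Reduce a deployment response to the fields a caller usually checks."""
    # Documented defaults, applied when a field is missing (an assumption
    # about how the response is serialized, not confirmed by the reference).
    min_replicas = resp.get("minReplicaCount", 0)
    max_replicas = resp.get("maxReplicaCount", max(min_replicas, 1))
    state = resp.get("state", "STATE_UNSPECIFIED")
    return {
        "name": resp.get("name"),
        "state": state,
        "ready": state == "READY",
        "failed": state == "FAILED",
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
        # status carries the detailed message when the last operation failed.
        "status": resp.get("status") if state == "FAILED" else None,
    }


# Hypothetical response payload, for illustration only.
sample = {
    "baseModel": "accounts/my-account/models/my-model",
    "name": "accounts/my-account/deployments/my-deployment",
    "state": "UPDATING",
    "minReplicaCount": 2,
}
summary = summarize_deployment(sample)
print(summary["state"], summary["min_replicas"], summary["max_replicas"])
```

Since UPDATING is a distinct state from READY, a caller that needs the new configuration to be live would presumably re-fetch the deployment until `ready` is true or `failed` is set.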