PATCH /v1/accounts/{account_id}/deployments/{deployment_id}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

account_id
string
required

The account ID.

deployment_id
string
required

The deployment ID.
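
As a minimal sketch of how the path parameters and the Authorization header fit together, the helpers below build the endpoint URL and the request headers. The host `https://api.example.com` is a placeholder assumption (the base URL is not stated in this reference), and the function names are hypothetical.

```python
from urllib.parse import quote

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def deployment_url(account_id: str, deployment_id: str, base_url: str = BASE_URL) -> str:
    """Fill the {account_id} and {deployment_id} path parameters."""
    # Percent-encode each segment so unusual characters cannot alter the path.
    account = quote(account_id, safe="")
    deployment = quote(deployment_id, safe="")
    return f"{base_url}/v1/accounts/{account}/deployments/{deployment}"


def auth_headers(token: str) -> dict:
    """Build the 'Bearer <token>' Authorization header plus the JSON content type."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```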

Body

application/json
baseModel
string
required
acceleratorCount
integer

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default:
ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB,
NVIDIA_H200_141GB
autoscalingPolicy
object
autoTune
object

The performance profile to use for this deployment.

deploymentTemplate
string

The name of the deployment template to use for this deployment. Only available to enterprise accounts.

description
string

Description of the deployment.

displayName
string

Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.

draftModel
string

The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

draftTokenCount
integer

The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

enableAddons
boolean

If true, PEFT addons are enabled for this deployment.

expireTime
string

The time at which this deployment will automatically be deleted.

maxReplicaCount
integer

The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to scale the deployment down to zero replicas.

minReplicaCount
integer

The minimum number of replicas. If not specified, the default is 0.
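
The documented defaults for the two replica bounds can be illustrated with a small client-side sketch. It only mirrors the rules stated above (minimum defaults to 0, maximum defaults to max(min_replica_count, 1)); the function name is hypothetical, and the values the server actually applies remain authoritative.

```python
from typing import Optional, Tuple


def effective_replica_bounds(
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (min, max) after applying the documented defaults."""
    # minReplicaCount: defaults to 0 when not specified.
    min_count = 0 if min_replica_count is None else min_replica_count
    # maxReplicaCount: defaults to max(min_replica_count, 1) when not specified.
    # An explicit 0 is kept as-is, since 0 is a permitted value.
    max_count = max(min_count, 1) if max_replica_count is None else max_replica_count
    return min_count, max_count
```

With neither field set the bounds come out as (0, 1); with only a minimum of 3 they come out as (3, 3).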

ngramSpeculationLength
integer

The length of the previous input sequence to consider for N-gram speculation.

precision
enum<string>
default:
PRECISION_UNSPECIFIED

The precision with which the model should be served.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2
region
enum<string>
default:
REGION_UNSPECIFIED

The geographic region where the deployment is located.

Available options:
REGION_UNSPECIFIED,
US_IOWA_1,
US_VIRGINIA_1,
US_VIRGINIA_2,
US_ILLINOIS_1,
AP_TOKYO_1,
EU_LONDON_1,
US_ARIZONA_1,
US_TEXAS_1,
US_ILLINOIS_2,
EU_FRANKFURT_1,
US_TEXAS_2,
EU_PARIS_1
state
enum<string>
default:
STATE_UNSPECIFIED

The state of the deployment.

Available options:
STATE_UNSPECIFIED,
CREATING,
READY,
DELETING,
FAILED,
UPDATING,
DELETED
status
object

Detailed status information regarding the most recent operation.
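
To show how the body fields above might be assembled into a request, here is a sketch using only the Python standard library. It builds the PATCH request without sending it. The function name and the example values in the usage note are hypothetical, and the client-side displayName check simply mirrors the "fewer than 64 characters" rule stated above.

```python
import json
import urllib.request


def build_update_request(url: str, token: str, base_model: str, **fields) -> urllib.request.Request:
    """Assemble (but do not send) a PATCH request for this endpoint."""
    display_name = fields.get("displayName")
    # Mirror the documented limit: displayName must be fewer than 64 characters.
    if display_name is not None and len(display_name) >= 64:
        raise ValueError("displayName must be fewer than 64 characters long")

    # baseModel is the only field marked required in the body.
    body = {"baseModel": base_model, **fields}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

A call such as `build_update_request(url, token, "accounts/my-account/models/my-base-model", maxReplicaCount=2)` returns a request object that can then be passed to `urllib.request.urlopen`; the model name here is a made-up placeholder.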

Response

200 - application/json
baseModel
string
required
acceleratorCount
integer

The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.

acceleratorType
enum<string>
default:
ACCELERATOR_TYPE_UNSPECIFIED

The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.

Available options:
ACCELERATOR_TYPE_UNSPECIFIED,
NVIDIA_A100_80GB,
NVIDIA_H100_80GB,
AMD_MI300X_192GB,
NVIDIA_A10G_24GB,
NVIDIA_A100_40GB,
NVIDIA_L4_24GB,
NVIDIA_H200_141GB
autoscalingPolicy
object
autoTune
object

The performance profile to use for this deployment.

cluster
string

If set, this deployment is deployed to a cloud-premise cluster.

createdBy
string

The email address of the user who created this deployment.

createTime
string

The creation time of the deployment.

deploymentTemplate
string

The name of the deployment template to use for this deployment. Only available to enterprise accounts.

description
string

Description of the deployment.

displayName
string

Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.

draftModel
string

The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

draftTokenCount
integer

The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.

enableAddons
boolean

If true, PEFT addons are enabled for this deployment.

expireTime
string

The time at which this deployment will automatically be deleted.

maxReplicaCount
integer

The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to scale the deployment down to zero replicas.

minReplicaCount
integer

The minimum number of replicas. If not specified, the default is 0.

name
string
ngramSpeculationLength
integer

The length of the previous input sequence to consider for N-gram speculation.

numPeftDeviceCached
integer
precision
enum<string>
default:
PRECISION_UNSPECIFIED

The precision with which the model should be served.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2
purgeTime
string

The time at which the resource will be hard deleted.

region
enum<string>
default:
REGION_UNSPECIFIED

The geographic region where the deployment is located.

Available options:
REGION_UNSPECIFIED,
US_IOWA_1,
US_VIRGINIA_1,
US_VIRGINIA_2,
US_ILLINOIS_1,
AP_TOKYO_1,
EU_LONDON_1,
US_ARIZONA_1,
US_TEXAS_1,
US_ILLINOIS_2,
EU_FRANKFURT_1,
US_TEXAS_2,
EU_PARIS_1
replicaCount
integer
state
enum<string>
default:
STATE_UNSPECIFIED

The state of the deployment.

Available options:
STATE_UNSPECIFIED,
CREATING,
READY,
DELETING,
FAILED,
UPDATING,
DELETED
status
object

Detailed status information regarding the most recent operation.
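
As an illustration of reading the 200 response, the sketch below extracts the state field from a JSON response body and checks it against the READY value listed above. The helper names are hypothetical, and falling back to STATE_UNSPECIFIED when the field is absent is an assumption based on the documented default.

```python
import json


def deployment_state(response_text: str) -> str:
    """Return the deployment state from a JSON response body."""
    payload = json.loads(response_text)
    # Assumption: treat a missing field as the documented default value.
    return payload.get("state", "STATE_UNSPECIFIED")


def is_ready(response_text: str) -> bool:
    """True when the response reports the READY state."""
    return deployment_state(response_text) == "READY"
```

After an update, a caller would typically see UPDATING first, so `is_ready` can serve as the condition in a polling loop around a read of the same deployment.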