Firectl
Create a deployment
Create a deployment on the Fireworks AI platform
Creates a new deployment.
firectl create deployment [flags]
Example
firectl create deployment falcon-7b
Flags
--description string Description of the deployment.
--disable-speculative-decoding If true, speculative decoding is disabled.
--display-name string Human-readable name of the deployment. Must be fewer than 64 characters long.
--max-peft-batch-size int32 Maximum batch size for concurrent PEFT requests on the server.
--max-replica-count int32 Maximum number of replicas for the deployment. If min-replica-count > 0 defaults to 0, otherwise defaults to 1.
--min-replica-count int32 Minimum number of replicas for the deployment. If min-replica-count < max-replica-count, the deployment automatically scales between the two replica counts based on load.
--model-id string The ID of a model that should be deployed when the deployment is created.
--scale-down-window duration The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m.
--scale-to-zero-window duration The duration the deployment must go without requests before it is scaled down to zero replicas. Applies only if min-replica-count is 0. Default is 1h.
--scale-up-window duration The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s.
--unused-auto-delete-duration duration If the deployment receives no requests for this duration, it is automatically deleted. If 0, auto-deletion is disabled. (default 168h0m0s)
--wait Wait until the deployment is ready.
--world-size int32 The number of GPUs the base model is served with.
-h, --help help for deployment
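The replica and window flags above can be combined to set up autoscaling. The following is a minimal sketch, not an official recipe: the replica counts are assumed values chosen for illustration, the model name is reused from the example on this page, and the window values simply restate the documented defaults. The command is assembled and printed first, and only executed (with --dry-run, so nothing is created) if firectl is found on the PATH.

```shell
# Assumed values for illustration; adjust to your own model and load profile.
MODEL="falcon-7b"     # model name reused from the example on this page
MIN_REPLICAS=1        # hypothetical lower bound
MAX_REPLICAS=3        # hypothetical upper bound; min < max enables autoscaling

# Assemble the command using only flags documented on this page.
CMD="firectl create deployment $MODEL --min-replica-count $MIN_REPLICAS --max-replica-count $MAX_REPLICAS --scale-up-window 30s --scale-down-window 10m"

echo "$CMD"

# Run only when firectl is installed. --dry-run prints the request without running it.
# The values contain no spaces, so unquoted word splitting is safe here.
if command -v firectl >/dev/null 2>&1; then
  $CMD --dry-run
fi
```

Because min-replica-count is lower than max-replica-count, the deployment would scale between 1 and 3 replicas based on load once created without --dry-run.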
Flags inherited from parent commands
--dry-run Print the request proto without running it.
-o, --output Output Set the output format to "text" or "json". (default text)
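As a rough illustration of the inherited flags, the sketch below builds two command lines: one that previews the request with --dry-run, and one that asks for JSON output. The model name is reused from the example on this page, and pairing --wait with -o json is an assumption made for illustration. Only the preview command is executed, and only if firectl is installed.

```shell
MODEL="falcon-7b"   # model name reused from the example on this page

# Preview the request without running it.
PREVIEW="firectl create deployment $MODEL --dry-run"

# Create the deployment, wait until it is ready, and print the result as JSON.
# Shown for reference only; this sketch does not execute it.
CREATE_JSON="firectl create deployment $MODEL --wait -o json"

echo "$PREVIEW"
echo "$CREATE_JSON"

# Execute only the side-effect-free preview, and only when firectl is available.
if command -v firectl >/dev/null 2>&1; then
  $PREVIEW
fi
```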