Creates a new deployment.

firectl create deployment [flags]

Example

firectl create deployment falcon-7b

Flags

      --description string                     Description of the deployment.
      --disable-speculative-decoding           If true, speculative decoding is disabled.
      --display-name string                    Human-readable name of the deployment. Must be fewer than 64 characters long.
      --max-peft-batch-size int32              Maximum batch size for concurrent PEFT requests on the server.
      --max-replica-count int32                Maximum number of replicas for the deployment. If min-replica-count > 0, defaults to 0; otherwise defaults to 1.
      --min-replica-count int32                Minimum number of replicas for the deployment. If min-replica-count < max-replica-count, the deployment automatically scales between the two replica counts based on load.
      --model-id string                        The ID of a model that should be deployed when the deployment is created.
      --scale-down-window duration             The duration the autoscaler will wait before scaling down a deployment after observing decreased load. Default is 10m.
      --scale-to-zero-window duration          The duration without requests after which the deployment is scaled down to zero replicas, if min-replica-count is 0. Default is 1h.
      --scale-up-window duration               The duration the autoscaler will wait before scaling up a deployment after observing increased load. Default is 30s.
      --unused-auto-delete-duration duration   The duration without requests after which the deployment is automatically deleted. If 0, auto-deletion is disabled. (default 168h0m0s)
      --wait                                   Wait until the deployment is ready.
      --world-size int32                       The number of GPUs the base model is served with.
  -h, --help                                   help for deployment

Flags inherited from parent commands

      --dry-run         Print the request proto without running it.
  -o, --output Output   Set the output format to "text" or "json". (default text)
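
A minimal sketch of how the autoscaling flags above can be combined, assuming the falcon-7b model from the example. The display name, replica counts, and window values are illustrative assumptions, not recommendations. Setting min-replica-count below max-replica-count enables autoscaling, and a minimum of 0 allows the deployment to scale to zero after scale-to-zero-window. The --dry-run flag prints the request instead of creating anything.

```shell
#!/usr/bin/env bash
# Sketch only: "falcon-7b" and every value below are illustrative assumptions.
deploy_cmd=(
  firectl create deployment falcon-7b
  --display-name "falcon-7b autoscaled"   # must be fewer than 64 characters
  --min-replica-count 0                   # 0 permits scaling down to zero replicas
  --max-replica-count 3                   # min < max turns on autoscaling
  --scale-up-window 30s
  --scale-down-window 10m
  --scale-to-zero-window 1h
  --dry-run                               # print the request proto without running it
)

# Show the exact command line, then run it only if firectl is installed.
printf '%q ' "${deploy_cmd[@]}"; echo
if command -v firectl >/dev/null 2>&1; then
  "${deploy_cmd[@]}"
fi
```

Dropping --dry-run and adding --wait would create the deployment and block until it is ready.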