AWS EKS

Fireworks currently only supports Bring-your-own-cloud (BYOC) with AWS. We are working to support other cloud providers.

This page will guide you through setting up your Fireworks cluster on your own AWS cloud account using EKS, and then creating a model and deployment on that cluster.

Set up your cluster

Create required resources in your account

The first step is to create the resources necessary for running a Fireworks cluster in your own AWS account. Fireworks provides a Terraform module for this. You can set the module's input variables to suit your existing cloud setup. For example, the module can use an existing VPC or subnet, or create nodes in several availability zones. The Fireworks team can help with this configuration process.

The following is an example configuration that creates a new VPC and new subnets, and creates a cluster with one H100 node (p5.48xlarge) in each of two availability zones:

module "fireworks_cluster" {
  source  = "fw-ai-external/aws-cluster/fireworksai"
  version = "0.1.2"

  vpc = {
    cidr = "172.19.0.0/16"  # An IP range that does not conflict with any existing VPCs
  }
  availability_zones = {
    "us-east-1a" = {
      public_cidr   = "172.19.0.0/20"  # must be within the VPC range
      private_cidr  = "172.19.16.0/20" # must be within the VPC range
      node_count    = "1"
      instance_type = "p5.48xlarge" # p4d.24xlarge and p4de.24xlarge are also supported
    }
    "us-east-1b" = {
      public_cidr   = "172.19.32.0/20" # must be within the VPC range
      private_cidr  = "172.19.48.0/20" # must be within the VPC range
      node_count    = "1"
      instance_type = "p5.48xlarge" # p4d.24xlarge and p4de.24xlarge are also supported
    }
  }
  cluster_name = "my-cluster"
}

output "fireworks_cluster" {
  value = module.fireworks_cluster
}
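
The CIDR comments in the configuration above (each subnet must be within the VPC range) can be checked before running Terraform. The following is a minimal sketch using Python's standard ipaddress module. The function name check_subnets is hypothetical and not part of the Fireworks module, and the overlap check reflects the general AWS rule that subnets in one VPC cannot overlap.

```python
import ipaddress
from itertools import combinations


def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    # Every subnet must sit inside the VPC range.
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside VPC range {vpc}")
    # No two subnets may share addresses.
    for first, second in combinations(subnets, 2):
        if first.overlaps(second):
            problems.append(f"{first} overlaps {second}")
    return problems


# Values copied from the example configuration above.
problems = check_subnets(
    "172.19.0.0/16",
    ["172.19.0.0/20", "172.19.16.0/20", "172.19.32.0/20", "172.19.48.0/20"],
)
print(problems or "CIDR layout looks consistent")
```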

The module will create the following resources:

A VPC network (or reuse an existing one in your account)
Subnets for the VPC (or reuse existing ones in your account)
Various roles for operating the cluster
An EKS cluster
A storage bucket for model artifacts
Container registries to hold images used by the cluster

Applying this module will output the following data about your resources:

fireworks_cluster = {
  "cluster_node_role_arn" = "arn:aws:iam::<account-id>:role/FireworksClusterNodeRole"
  "eks_cluster_autoscaler_role_arn" = "arn:aws:iam::<account-id>:role/FireworksEKSClusterAutoscalerRole"
  "eks_cluster_role_arn" = "arn:aws:iam::<account-id>:role/FireworksEKSClusterRole"
  "eks_load_balancer_controller_role_arn" = "arn:aws:iam::<account-id>:role/FireworksEKSLoadBalancerControllerRole-<cluster-name>"
  "fireworks_manager_role_arn" = "arn:aws:iam::<account-id>:role/FireworksManagerRole"
  "inference_role_arn" = "arn:aws:iam::<account-id>:role/FireworksInferenceRole-<cluster-name>"
  "llm_downloader_ecr_repo_uri" = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/fireworks/llm-downloader"
  "metrics_writer_role_arn" = "arn:aws:iam::<account-id>:role/FireworksMetricWriterRole-<cluster-name>"
  "s3_bucket_arn" = "arn:aws:s3:::fireworks-<cluster-name>-<unique-suffix>"
  "text_completion_ecr_repo_uri" = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/fireworks/text-completion"
  "vpc_id" = "vpc-12345678901234567"
}
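
The two role ARNs needed in the next step can be pulled out of this output programmatically. The sketch below assumes you have captured the output as JSON, for example with terraform output -json fireworks_cluster. The helper name extract_role_arns and the account ID in the sample are placeholders for illustration only.

```python
import json


def extract_role_arns(output_json):
    """Pick out the manager and node role ARNs from the module output JSON."""
    cluster = json.loads(output_json)
    return {
        "manager": cluster["fireworks_manager_role_arn"],
        "node": cluster["cluster_node_role_arn"],
    }


# Sample input shaped like the output above; 123456789012 is a placeholder account ID.
sample_output = json.dumps(
    {
        "cluster_node_role_arn": "arn:aws:iam::123456789012:role/FireworksClusterNodeRole",
        "fireworks_manager_role_arn": "arn:aws:iam::123456789012:role/FireworksManagerRole",
        "vpc_id": "vpc-12345678901234567",
    }
)
arns = extract_role_arns(sample_output)
print(arns["manager"])
print(arns["node"])
```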

The Terraform module assumes that you or an administrator will have permission to upload to the created S3 bucket. If that is not the case, you will need to add Terraform code that grants this permission.

Grant Fireworks permission to access your cluster

Most of the cluster setup can be done automatically by Fireworks, but before we can access your cluster you will need to apply the following Kubernetes manifest. Fill in the rolearn fields with values taken from the fireworks_cluster output of the previous step.

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: kube-system
  name: aws-auth
data:
  mapRoles: |
    - groups:
      - system:masters
      rolearn: <YOUR_FIREWORKS_MANAGER_ROLE_ARN_HERE>
      username: inference-cluster-manager
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: <YOUR_FIREWORKS_CLUSTER_NODE_ROLE_ARN_HERE>
      username: system:node:{{EC2PrivateDNSName}}

kubectl apply -f /path/to/manifest
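
If you prefer not to edit the manifest by hand, the placeholders can be filled in with a small script. This is only a sketch: render_aws_auth is a hypothetical helper, and the ARNs passed to it below use a placeholder account ID. string.Template is used so that the literal {{EC2PrivateDNSName}} text is left untouched.

```python
from string import Template

# Same manifest as above, with $-style placeholders for the two role ARNs.
AWS_AUTH_TEMPLATE = Template("""\
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: kube-system
  name: aws-auth
data:
  mapRoles: |
    - groups:
      - system:masters
      rolearn: $manager_role_arn
      username: inference-cluster-manager
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: $node_role_arn
      username: system:node:{{EC2PrivateDNSName}}
""")


def render_aws_auth(manager_role_arn, node_role_arn):
    """Return the aws-auth manifest text with both role ARNs filled in."""
    return AWS_AUTH_TEMPLATE.substitute(
        manager_role_arn=manager_role_arn,
        node_role_arn=node_role_arn,
    )


manifest = render_aws_auth(
    "arn:aws:iam::123456789012:role/FireworksManagerRole",
    "arn:aws:iam::123456789012:role/FireworksClusterNodeRole",
)
print(manifest)
```

Write the printed text to a file and pass that file to kubectl apply -f as shown above.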

Send Fireworks your cluster's metadata

To continue the setup of your cluster, send Fireworks the Terraform output from step one, along with the ID you’d like to give this cluster (you can use the same name as the created EKS cluster). After Fireworks finishes setting up your cluster, you should be able to see the cluster in firectl.

$ firectl list clusters
NAME         REGION     STATE  STATUS MESSAGE
my-cluster   us-east-1  READY

Total size: 1

Upload a model to your cluster

Now you can create a model in your new cluster. These steps are mostly the same as in Custom base models, with the following differences:

Add the --cluster-id=<YOUR_CLUSTER_ID> argument to your firectl create model command.
By default, firectl will attempt to upload the model files to the S3 bucket associated with your cluster. If your default AWS profile has permission to do this, you don’t need to do anything else. If you need to use a different AWS profile, set the AWS_PROFILE environment variable, which firectl respects. If neither option works, you can pass the --manual-upload flag to firectl create model. This will output a series of aws s3 cp commands that you can run to upload the model files manually.

Once your model is in the READY state and you’ve uploaded the model’s files to S3, you can continue to the next section.

$ firectl list models
NAME      CREATE TIME          KIND           CHAT   PUBLIC  STATE  STATUS MESSAGE
my-model  2024-01-01 00:00:00  HF_BASE_MODEL  false  false   READY
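
To wait for the READY state in a script, the table can be parsed. The sketch below assumes the column layout matches the sample output above (columns separated by two or more spaces); that layout is an assumption and may differ between firectl versions, and an empty column in the middle of a row would misalign the result. The function names are hypothetical.

```python
import re


def parse_table(text):
    """Split a firectl-style table into a list of {column: value} dicts."""
    lines = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("Total size:")
    ]
    header = re.split(r"\s{2,}", lines[0].strip())
    # zip stops at the shorter sequence, so a blank trailing column is simply omitted.
    return [dict(zip(header, re.split(r"\s{2,}", line.strip()))) for line in lines[1:]]


def model_state(text, name):
    """Return the STATE column for the named model, or None if it is not listed."""
    for row in parse_table(text):
        if row.get("NAME") == name:
            return row.get("STATE")
    return None


sample = (
    "NAME      CREATE TIME          KIND           CHAT   PUBLIC  STATE  STATUS MESSAGE\n"
    "my-model  2024-01-01 00:00:00  HF_BASE_MODEL  false  false   READY\n"
)
print(model_state(sample, "my-model"))
```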

Create a deployment in your cluster

Creating a deployment in your cluster is similar to creating an on-demand deployment:

firectl create deployment my-model --cluster-id=my-cluster --min-replica-count=1 --accelerator-type=NVIDIA_A100_40GB --wait

After a few minutes, your deployment should be READY:

$ firectl list deployments
NAME      BASE MODEL                             REPLICAS  GPUS/REPLICA  TYPE              CREATE TIME          STATE  STATUS MESSAGE
12345678  accounts/my-account/models/my-model    1 [1,1]   1             NVIDIA_A100_40GB  2024-01-01 00:00:00  READY

Some on-demand deployment features (like autoscaling) are not yet available on BYOC clusters.

Query your deployment

The final step is to test your deployment by querying it. To do this, you will need to retrieve the hostname of the load balancer that was automatically created for your cluster, using either the AWS Console or the CLI.

AWS Console

Navigate to the “Load Balancer” section in the EC2 console. The created load balancer will have a name that starts with k8s-default-gateway. Click this row, and you’ll find the host name under “DNS name”.

CLI

kubectl get gateway gateway -o json | jq ".status.addresses[0].value"
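
The same lookup can be done without jq. The sketch below reads the JSON printed by kubectl get gateway gateway -o json and returns the first address, mirroring the jq path above. The function name gateway_hostname and the hostname in the sample are made up for illustration.

```python
import json


def gateway_hostname(gateway_json):
    """Return status.addresses[0].value, or None if no address is assigned yet."""
    gateway = json.loads(gateway_json)
    addresses = gateway.get("status", {}).get("addresses") or []
    if not addresses:
        return None
    return addresses[0]["value"]


# Trimmed sample input; a real Gateway object contains many more fields.
sample = json.dumps(
    {
        "kind": "Gateway",
        "status": {
            "addresses": [
                {"value": "k8s-default-gateway-0123456789.us-east-1.elb.amazonaws.com"}
            ]
        },
    }
)
print(gateway_hostname(sample))
```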

Now you can query your deployment by passing the header Fireworks-Deployment to the load balancer:

Python (Fireworks)

from fireworks import Fireworks

client = Fireworks(
  # The Fireworks SDK does not require the /v1 suffix
  base_url="http://<YOUR-LOAD-BALANCER-HOSTNAME>",
  # API key field is not used for BYOC deployments but cannot be blank
  api_key="unused",
)
response = client.completions.create(
  model="accounts/my-account/models/my-model",
  prompt="The sky is",
  extra_headers={
    "Fireworks-Deployment": "accounts/my-account/deployments/12345678",
  },
)
print(response.choices[0].text)

Python (OpenAI)

from openai import OpenAI

client = OpenAI(
  # The OpenAI SDK requires the /v1 suffix in the base URL
  base_url="http://<YOUR-LOAD-BALANCER-HOSTNAME>/v1",
  # API key field is not used for BYOC deployments but cannot be blank
  api_key="unused",
)
response = client.completions.create(
  model="accounts/my-account/models/my-model",
  prompt="The sky is",
  extra_headers={
    "Fireworks-Deployment": "accounts/my-account/deployments/12345678",
  },
)
print(response.choices[0].text)

cURL

curl http://<YOUR-LOAD-BALANCER-HOSTNAME>/v1/completions \
  -H 'Fireworks-Deployment: accounts/my-account/deployments/12345678' \
  -H 'Content-Type: application/json' \
  --data '{
    "model": "accounts/my-account/models/my-model",
    "prompt": "The sky is"
  }'

You will need to run this from an environment that has access to the subnet associated with the cluster. An easy way to enter such an environment is to spin up a curl pod in the cluster:

kubectl run curl-test --image=radial/busyboxplus:curl -i --tty --rm
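
If neither SDK is installed in that environment, the same request can be sketched with Python's standard library alone. This mirrors the cURL example above; build_completion_request is a hypothetical helper, and the hostname below is a stand-in for your load balancer's DNS name.

```python
import json
import urllib.request


def build_completion_request(hostname, model, deployment, prompt):
    """Build the POST request for /v1/completions without sending it."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{hostname}/v1/completions",
        data=body,
        headers={
            # Routes the request to a specific deployment on the cluster.
            "Fireworks-Deployment": deployment,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completion_request(
    hostname="load-balancer.example.invalid",  # placeholder hostname
    model="accounts/my-account/models/my-model",
    deployment="accounts/my-account/deployments/12345678",
    prompt="The sky is",
)
print(request.get_method(), request.full_url)

# To send it from a machine that can reach the load balancer:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response)["choices"][0]["text"])
```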
