Fireworks AI on Amazon EKS (Airgapped)
What is an Airgapped Deployment?
An airgapped deployment ensures your AI inference workloads and model data remain completely isolated within your private infrastructure.
This setup is ideal for organizations with strict security, compliance, or regulatory requirements where inference operations must run without external dependencies.
Unlike the EKS setup with the Fireworks Control Plane, this setup sends no metadata to Fireworks, but it cannot be managed via the Fireworks web app or the firectl CLI.
Why Airgapped on EKS?
Maximum Security & Compliance
Deploy Fireworks inference in your VPC with all container images, models, and dependencies set up directly in your infrastructure. Meet the strictest regulatory requirements for healthcare, financial services, or other sensitive workloads.
Complete Infrastructure Control
You control the entire deployment stack using your private container registry, internal S3 buckets, and Helm charts for deployment management.
Access Fireworks’ fast inference capabilities, running entirely within your VPC environment. Deploy on the latest GPU instance types such as NVIDIA H100, H200, or B200.
Main Deployment Steps
Deploying Fireworks AI in an airgapped EKS environment is a four-phase process:
Phase 1: Foundation Infrastructure (Automated)
- Run a shell script to create the S3 bucket, security groups, and IAM roles via Terraform
- Establish secure foundation for isolated inference workloads
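The Phase 1 foundation can be pictured as a minimal Terraform sketch like the one below. The bucket, security-group, and role names are illustrative placeholders, not the names used by the Fireworks scripts:

```hcl
# Hedged sketch: minimal foundation resources (all names are placeholders).
resource "aws_s3_bucket" "model_store" {
  bucket = "my-org-fireworks-models" # internal bucket for model weights
}

resource "aws_security_group" "inference" {
  name   = "fireworks-inference-sg"
  vpc_id = var.vpc_id
  # No ingress from outside the VPC; egress restricted as your policy requires.
}

resource "aws_iam_role" "node_role" {
  name = "fireworks-eks-node-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}
```

The actual script provisions these resources for you; the sketch is only meant to show the scope of what Phase 1 creates.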
Phase 2: Resource Preparation (Manual)
- Upload your model to the internal S3 bucket
- Push Fireworks container image to your private Amazon ECR registry
- Store metering key in AWS Secrets Manager
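The three Phase 2 steps can be sketched with the AWS CLI and Docker. The bucket, repository, and secret names below are assumptions for illustration, and every command requires credentials for your own account:

```shell
# Placeholders throughout: adjust names and region for your environment.
AWS_REGION=us-east-1
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# 1. Upload model weights to the internal S3 bucket (placeholder paths).
aws s3 cp ./my-model/ s3://my-org-fireworks-models/my-model/ --recursive

# 2. Push the Fireworks container image to your private ECR registry.
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin \
      "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"
docker tag fireworks-inference:latest \
  "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/fireworks-inference:latest"
docker push "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/fireworks-inference:latest"

# 3. Store the metering key in AWS Secrets Manager (placeholder secret name).
aws secretsmanager create-secret \
  --name fireworks/metering-key \
  --secret-string "file://metering-key.json"
```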
Phase 3: Cluster Deployment (Automated)
- Run a shell script to deploy the EKS cluster with GPU node groups
- Install NVIDIA device plugin and AWS Load Balancer Controller
- Results in production-ready Kubernetes cluster for isolated inference workloads
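The cluster the Phase 3 script produces could be approximated with an eksctl config like the sketch below; the cluster name, instance type, and node counts are assumptions:

```yaml
# Hedged sketch of a GPU-enabled private EKS cluster (names/sizes illustrative).
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: fireworks-airgapped   # placeholder cluster name
  region: us-east-1
privateCluster:
  enabled: true               # keep the API endpoint inside the VPC
nodeGroups:
  - name: gpu-nodes
    instanceType: p5.48xlarge # H100 example; H200/B200 instance types differ
    desiredCapacity: 2
    privateNetworking: true
```

After cluster creation, the script installs the NVIDIA device plugin (a DaemonSet applied with kubectl, pulled from your private registry) and the AWS Load Balancer Controller.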
Phase 4: Helm Deployment (Automated)
- Deploy Fireworks model using Helm charts
- Configure deployments with replicas, GPU allocation, and scaling
- Creates internal-only inference endpoint accessible within your VPC
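A Helm release for a model deployment might look like the following sketch. The chart name and values keys are illustrative assumptions, not the actual Fireworks chart schema:

```shell
# Hedged sketch: chart name and values keys are assumptions, not the real schema.
cat > values.yaml <<'EOF'
image:
  repository: 123456789012.dkr.ecr.us-east-1.amazonaws.com/fireworks-inference  # placeholder
model:
  s3Uri: s3://my-org-fireworks-models/my-model/  # placeholder bucket/path
replicas: 2
resources:
  limits:
    nvidia.com/gpu: 8   # GPUs allocated per replica
service:
  type: ClusterIP       # internal-only endpoint, reachable within the VPC
EOF

helm upgrade --install my-model ./fireworks-model-chart -f values.yaml
```

Replicas, GPU allocation, and scaling behavior are set through the values file, so they can be adjusted without changing the chart itself.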
Detailed Deployment Guide
For step-by-step instructions, Fireworks container images and scripts, Helm deployment templates, and troubleshooting guidance, contact Fireworks AI to request access.