> ## Documentation Index
> Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Bring Your Own Cluster

> Run Fireworks inference in your own Kubernetes cluster, cloud account or data center, and network boundary.

Bring Your Own Cluster (BYOC) lets Enterprise customers run Fireworks inference inside their own Kubernetes cluster. Your inference compute runs in your cloud account or data center boundary, while Fireworks installs and operates the serving stack for you.

<Warning>
  Bring Your Own Cluster is in Private Preview for Enterprise customers. Contact [sales@fireworks.ai](mailto:sales@fireworks.ai) to discuss whether BYOC is a fit and to participate in the preview.
</Warning>

## What BYOC provides

With BYOC, Fireworks deploys the managed serving software stack into Kubernetes infrastructure that you own. Fireworks operates model deployment, performance optimization, autoscaling, GPU node health and reliability, load balancing and routing, and observability for the cluster.

You continue using the Fireworks product surface: the same APIs, SDKs, model deployment workflows, and performance work available across the Fireworks platform. The main difference is where inference runs: the model serving workload runs in your environment instead of Fireworks-managed cloud infrastructure.

<Note>
  Architecture diagram coming soon. During Private Preview, Fireworks reviews the exact deployment architecture, networking boundaries, and request flow with each customer during onboarding.
</Note>

## Why choose BYOC

BYOC is designed for organizations that need more control over where inference runs without taking on the operational burden of self-hosting raw open-source serving infrastructure.

Common reasons to consider BYOC include:

* **Data residency and compliance:** Inference request and response handling runs within your Kubernetes environment, aligned to your cloud account, data center, and network requirements.
* **Existing GPU capacity:** Use GPU capacity you already own or procure in your preferred cloud or data center environment.
* **Network boundary control:** Keep inference workloads inside your cloud account or data center network architecture.
* **Managed Fireworks experience:** Fireworks runs the serving stack, applies performance optimizations, manages model deployment, and operates the cluster day to day.
* **Consistent developer interface:** Use Fireworks APIs and SDKs across serverless, dedicated Fireworks-hosted deployments, and BYOC deployments.

## When BYOC fits

BYOC is usually a fit when you need Fireworks-managed inference but have requirements that place compute or data in your own environment:

* You have data residency, compliance, or internal policy requirements for inference traffic.
* You want Fireworks to operate model serving on GPU capacity in your cloud account or data center.
* You need the same Fireworks API and managed operations model across multiple deployment environments.
* You are an Enterprise customer planning a production deployment with Fireworks support.

For workloads where Fireworks-managed infrastructure already satisfies your compliance and operational needs, serverless or dedicated deployments on Fireworks cloud may be simpler to start with.

## Benefits of Fireworks-managed operations

BYOC is not a raw self-hosting kit. Fireworks brings the managed serving experience to infrastructure you own, so your team can focus on applications instead of rebuilding model serving operations.

<CardGroup cols={2}>
  <Card title="Inference performance focus" icon="gauge-high">
    Fireworks continuously optimizes serving performance across model configuration, deployment shape, scheduling, quantization, speculative decoding, and workload-specific tuning such as FireOptimizer where enabled.
  </Card>

  <Card title="Managed operations and upgrades" icon="gears">
    Fireworks handles serving-stack installation, rollout, maintenance, upgrades, and day-to-day operations so your team does not have to rebuild a self-hosted inference platform.
  </Card>

  <Card title="GPU reliability operations" icon="heart-pulse">
    Fireworks monitors GPU and node health, detects unhealthy capacity, and safely remediates issues to reduce the operational burden of running inference on large GPU fleets.
  </Card>

  <Card title="Current models and GPU generations" icon="microchip">
    Fireworks tracks new model releases and new GPU generations, including day-0 enablement for supported models and kernel-level optimizations for supported hardware, so your BYOC environment can adopt supported updates without your team rebuilding the serving stack.
  </Card>
</CardGroup>

<Note>
  When elected during onboarding and supported for the workload, Fireworks can also help coordinate overflow scheduling onto Fireworks-managed capacity while preserving the same Fireworks API surface.
</Note>

## Known gaps during Private Preview

Fine-tuning is not supported in BYOC during Private Preview. If you need fine-tuning and BYOC together, contact [sales@fireworks.ai](mailto:sales@fireworks.ai) so the team can review your requirements and roadmap fit.

## Supported environments

Fireworks supports BYOC on major cloud providers and their managed Kubernetes offerings, select GPU cloud providers, and on-premises environments that provide a reachable Kubernetes endpoint, supported NVIDIA GPU nodes, and the required network setup. During preview onboarding, Fireworks confirms whether your target environment, GPU capacity, and networking model are supported.

At a high level, BYOC requires:

* A Kubernetes cluster with NVIDIA GPU nodes
* Outbound network access that allows Fireworks to manage the cluster
* A deployment and support plan agreed with Fireworks during Enterprise onboarding

## Next steps

If BYOC may be a fit, contact [sales@fireworks.ai](mailto:sales@fireworks.ai) to review your requirements and preview eligibility.

<CardGroup cols={2}>
  <Card title="How setup works" icon="diagram-project" href="/ecosystem/integrations/byoc/how-setup-works">
    Understand prerequisites, setup access, installation, and validation.
  </Card>

  <Card title="Operational model" icon="gears" href="/ecosystem/integrations/byoc/operational-model">
    Learn how Fireworks operates BYOC clusters after onboarding.
  </Card>
</CardGroup>