Fireworks AI is a first-party inference provider inside Microsoft Foundry. You can access frontier open models through your existing Azure account, with usage billed through Azure and counting toward your Microsoft Azure Consumption Commitment (MACC). This page covers the Fireworks side of the integration. For Azure portal setup steps, see the Microsoft Learn guide.Documentation Index
Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt
Use this file to discover all available pages before exploring further.
New to Fireworks? Foundry users get the same OpenAI-compatible API and model catalog as direct Fireworks customers. Start with the PayGo quickstart below — you can be making requests in about 10 minutes.
Prerequisites
- An active Azure subscription
- The Fireworks integration enabled at the subscription level (see below)
- A Microsoft Foundry project with the Azure AI Developer role assigned
Opt-in
Fireworks on Foundry requires a one-time opt-in per Azure subscription before you can create deployments. Follow the steps in the Microsoft Learn guide.Deployment modes
Fireworks on Foundry supports three deployment modes.| Mode | Also called | Pricing | Regions | Right for |
|---|---|---|---|---|
| PayGo | Serverless, Data Zone Standard | Per token, MACC-eligible | US Data Zone only | Prototyping, low-volume workloads |
| PTU | Provisioned Throughput | Per PTU-hour, ACD + MACC eligible | Global | Production workloads with consistent traffic |
| Custom Models | Bring Your Own Model | PTU pricing | Global (PTU regions) | Fine-tuned model deployment |
Available models
All models use the OpenAI-compatible chat completions API and are added to the catalog on a rolling basis. For the current list of available models, see the Microsoft Learn catalog. Chat completions only. Embeddings, image generation, and audio modalities are not available through Foundry.PayGo quickstart
PayGo (Data Zone Standard) is available in: East US, East US 2, Central US, North Central US, West US, West US 3. The throughput limit for PayGo deployments is 250,000 tokens per minute (TPM).Make your first request
Foundry deployments use an OpenAI-compatible endpoint. Use your Foundry project endpoint and Azure API key.PTU (Provisioned Throughput)
PTU deployments provide dedicated GPU capacity reserved for your workload, with consistent throughput and global region availability.- Dedicated capacity, not shared with other tenants
- Available globally, not limited to US Data Zone
- ACD-eligible and MACC-eligible
Custom Models
Fine-tune on Fireworks and deploy on Foundry, or bring your own weights from wherever you post-train to deploy on Foundry. Your model is served on Fireworks infrastructure within Azure, billed through your Azure account.Supported base architectures
For the list of supported custom model architectures, see the Microsoft Learn guide.Deployment
To import and deploy a custom model, follow the Import custom models into Foundry guide.Billing
All Fireworks on Foundry usage is billed through Azure. You do not need a separate Fireworks billing account or contract.- PayGo and PTU usage is MACC-eligible
- PTU deployments are ACD-eligible and qualify for quota retirement
- Direct Fireworks usage at fireworks.ai is billed separately and does not count toward MACC
Troubleshooting
| Issue | Resolution |
|---|---|
| Quota exceeded error | Request a limit increase at aka.ms/fireworks-quota |
| Access denied on deployment | Verify you have the Azure AI Developer role on the project |
| Opt-in not propagating | Allow up to 30 minutes after registering Fireworks.EnableDeploy |
| Custom Model deployment failing | Confirm weights are full-weight (not LoRA adapters) and the architecture is in the supported list |
| PTU provisioning questions | Contact sales@fireworks.ai |
Additional resources
- Enable Fireworks on Foundry (Microsoft Learn)
- Microsoft Foundry portal
- Fireworks fine-tuning docs
- Fireworks Trust Center
- sales@fireworks.ai for PTU provisioning and Custom Model support