Fireworks AI is a first-party inference provider inside Microsoft Foundry. You can access frontier open models through your existing Azure account, with usage billed through Azure and counting toward your Microsoft Azure Consumption Commitment (MACC). This page covers the Fireworks side of the integration. For Azure portal setup steps, see the Microsoft Learn guide.
New to Fireworks? Foundry users get the same OpenAI-compatible API and model catalog as direct Fireworks customers. Start with the PayGo quickstart below; you can make your first request in about 10 minutes.

Prerequisites

  • An active Azure subscription
  • The Fireworks integration enabled at the subscription level (see Opt-in below)
  • A Microsoft Foundry project with the Azure AI Developer role assigned

Opt-in

Fireworks on Foundry requires a one-time opt-in per Azure subscription before you can create deployments. Follow the steps in the Microsoft Learn guide.

Deployment modes

Fireworks on Foundry supports three deployment modes.
| Mode | Also called | Pricing | Regions | Best for |
|---|---|---|---|---|
| PayGo | Serverless, Data Zone Standard | Per token, MACC-eligible | US Data Zone only | Prototyping, low-volume workloads |
| PTU | Provisioned Throughput | Per PTU-hour, ACD + MACC eligible | Global | Production workloads with consistent traffic |
| Custom Models | Bring Your Own Model | PTU pricing | Global (PTU regions) | Fine-tuned model deployment |
PTU deployments can be created directly in the Azure portal. For help with PTU sizing on Fireworks models, contact sales@fireworks.ai.
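The table above can be read as a simple decision rule. The sketch below encodes one reading of it, using the PayGo regions and the 250,000 TPM PayGo limit stated in the PayGo quickstart. The `Workload` type, the `choose_deployment_mode` function, and the order of the checks are illustrative assumptions, not part of any Fireworks or Azure API; for actual PTU sizing, contact sales@fireworks.ai.

```python
from dataclasses import dataclass

# PayGo (Data Zone Standard) regions as listed on this page, written as Azure
# region names. Check the Microsoft Learn guide for the current list.
PAYGO_REGIONS = {
    "eastus", "eastus2", "centralus", "northcentralus", "westus", "westus3",
}
PAYGO_TPM_LIMIT = 250_000  # PayGo throughput limit stated on this page


@dataclass
class Workload:
    """Hypothetical workload description; not an Azure or Fireworks type."""
    region: str               # Azure region name, e.g. "eastus2"
    peak_tpm: int             # expected peak tokens per minute
    steady_production: bool   # consistent production traffic?
    custom_weights: bool      # deploying your own fine-tuned weights?


def choose_deployment_mode(w: Workload) -> str:
    """Rough mapping of the deployment-modes table to one recommendation."""
    if w.custom_weights:
        # Custom Models use PTU pricing in PTU regions.
        return "Custom Models"
    if w.region.lower() not in PAYGO_REGIONS:
        # PayGo is US Data Zone only; PTU is available globally.
        return "PTU"
    if w.steady_production or w.peak_tpm > PAYGO_TPM_LIMIT:
        # Consistent traffic or more than the PayGo TPM limit suggests PTU.
        return "PTU"
    return "PayGo"
```

A prototype in `eastus2` with light traffic maps to PayGo, while the same workload in a non-US region maps to PTU because PayGo is US Data Zone only.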

Available models

All models use the OpenAI-compatible chat completions API and are added to the catalog on a rolling basis. For the current list of available models, see the Microsoft Learn catalog. Only chat completions are supported; embeddings, image generation, and audio modalities are not available through Foundry.

PayGo quickstart

PayGo (Data Zone Standard) is available in: East US, East US 2, Central US, North Central US, West US, West US 3. The throughput limit for PayGo deployments is 250,000 tokens per minute (TPM).
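If one process sends many requests to a PayGo deployment, you may want to stay under the 250,000 TPM limit on the client side rather than rely on throttling errors. The following is a minimal sketch of a sliding-window token budget. `TokenBudget` is a hypothetical helper; it only knows about the token counts you record, and the limit enforced by the service remains authoritative.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenBudget:
    """Client-side sliding-window tokens-per-minute limiter (hypothetical helper).

    Tracks only the token counts reported to it; it helps avoid the service
    limit but does not guarantee you stay under it.
    """

    def __init__(self, tpm_limit: int = 250_000, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._used = 0

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the sliding window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait until `tokens` more fit in the window (0.0 if they fit now)."""
        if tokens > self.tpm_limit:
            raise ValueError("request exceeds the per-minute budget on its own")
        now = self.clock()
        self._evict(now)
        excess = self._used + tokens - self.tpm_limit
        if excess <= 0:
            return 0.0
        # Walk the oldest events until enough tokens will have expired.
        freed = 0
        for ts, t in self._events:
            freed += t
            if freed >= excess:
                return ts + self.window_s - now
        return self.window_s  # not reached: recorded tokens always cover the excess

    def record(self, tokens: int) -> None:
        """Record tokens actually consumed by a completed request."""
        now = self.clock()
        self._evict(now)
        self._events.append((now, tokens))
        self._used += tokens
```

Before each request, call `wait_time` with an estimate of the tokens it will use and sleep for the returned number of seconds. After the response arrives, call `record` with the actual count, for example `response.usage.total_tokens`. The budget is per process; several workers sharing one deployment would need shared state instead.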

Make your first request

Foundry deployments use an OpenAI-compatible endpoint. Use your Foundry project endpoint and Azure API key.
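```python
from openai import OpenAI

# Point the OpenAI client at your Foundry project endpoint and authenticate
# with your Azure API key (not a Fireworks API key).
client = OpenAI(
    base_url="https://<your-project>.services.ai.azure.com/models",
    api_key="<your-azure-api-key>",
)

# Send a single-turn chat completion to a Fireworks model on Foundry.
response = client.chat.completions.create(
    model="fireworks-ai/FW-GLM-5.1",
    messages=[{"role": "user", "content": "Hello"}],
)

# The generated reply is in the first choice.
print(response.choices[0].message.content)
```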
Find your project endpoint in the Microsoft Foundry portal under Project settings.
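To keep the endpoint and key out of source code, you can read them from the environment. This is a minimal sketch that assumes the endpoint you copy from Project settings has the same host as the quickstart's `base_url`. The variable names `FOUNDRY_PROJECT_ENDPOINT` and `AZURE_API_KEY` and the helper `foundry_client_kwargs` are hypothetical choices for illustration.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import urlparse


def foundry_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for openai.OpenAI from environment variables.

    FOUNDRY_PROJECT_ENDPOINT and AZURE_API_KEY are hypothetical names chosen
    for this sketch; use whatever your secret store or CI system provides.
    """
    env = os.environ if env is None else env
    endpoint = env.get("FOUNDRY_PROJECT_ENDPOINT", "")
    api_key = env.get("AZURE_API_KEY", "")
    if not endpoint or not api_key:
        raise RuntimeError("set FOUNDRY_PROJECT_ENDPOINT and AZURE_API_KEY")
    # Keep only the host, so a pasted endpoint with a path still works.
    host = urlparse(endpoint).hostname
    if not host:
        raise ValueError(f"could not parse a host from {endpoint!r}")
    # Same shape as the quickstart: https://<your-project>.services.ai.azure.com/models
    return {"base_url": f"https://{host}/models", "api_key": api_key}
```

With the variables set, the client from the quickstart becomes `client = OpenAI(**foundry_client_kwargs())`.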

PTU (Provisioned Throughput)

PTU deployments provide dedicated GPU capacity reserved for your workload, with consistent throughput and global region availability.
  • Dedicated capacity, not shared with other tenants
  • Available globally, not limited to US Data Zone
  • ACD-eligible and MACC-eligible
You can create a PTU deployment directly in the Azure portal. For more on provisioned throughput, see the Microsoft Learn guide. For help with PTU sizing on Fireworks models, contact sales@fireworks.ai.

Custom Models

Fine-tune a model on Fireworks, or bring your own weights from wherever you post-train, and deploy it on Foundry. Your model is served on Fireworks infrastructure within Azure and billed through your Azure account.

Supported base architectures

For the list of supported custom model architectures, see the Microsoft Learn guide.

Deployment

To import and deploy a custom model, follow the Import custom models into Foundry guide.

Billing

All Fireworks on Foundry usage is billed through Azure. You do not need a separate Fireworks billing account or contract.
  • PayGo and PTU usage is MACC-eligible
  • PTU deployments are ACD-eligible and qualify for quota retirement
  • Direct Fireworks usage at fireworks.ai is billed separately and does not count toward MACC

Troubleshooting

| Issue | Resolution |
|---|---|
| Quota exceeded error | Request a limit increase at aka.ms/fireworks-quota |
| Access denied on deployment | Verify you have the Azure AI Developer role on the project |
| Opt-in not propagating | Allow up to 30 minutes after registering `Fireworks.EnableDeploy` |
| Custom Model deployment failing | Confirm weights are full-weight (not LoRA adapters) and the architecture is in the supported list |
| PTU provisioning questions | Contact sales@fireworks.ai |
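For the "Custom Model deployment failing" row, a local check before import can catch both causes the table names. This sketch assumes a Hugging Face-style checkpoint directory (a `config.json` with an `architectures` field, plus `*.safetensors` or `*.bin` weight files) and treats PEFT's `adapter_config.json` and `adapter_model.*` files as a sign of a LoRA adapter. `preflight_custom_model` is a hypothetical heuristic, not a Foundry tool, and you must supply the supported architecture names yourself from the Microsoft Learn guide.

```python
import json
from pathlib import Path
from typing import Iterable, List

# File names Hugging Face PEFT writes for LoRA adapters; their presence
# suggests an adapter rather than full weights.
ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors", "adapter_model.bin")


def preflight_custom_model(model_dir: str, supported_architectures: Iterable[str]) -> List[str]:
    """Return problems found in a local checkpoint directory (empty list if none)."""
    root = Path(model_dir)
    problems: List[str] = []
    for name in ADAPTER_FILES:
        if (root / name).exists():
            problems.append(f"found {name}: looks like a LoRA adapter, not full weights")
    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
        return problems
    config = json.loads(config_path.read_text())
    archs = config.get("architectures") or []
    supported = set(supported_architectures)
    if not any(a in supported for a in archs):
        problems.append(f"architectures {archs} not in supported list")
    # Full checkpoints ship weight files next to config.json; ignore adapter files.
    weights = [p for pattern in ("*.safetensors", "*.bin")
               for p in root.glob(pattern) if not p.name.startswith("adapter_")]
    if not weights:
        problems.append("no *.safetensors or *.bin weight files found")
    return problems
```

An empty result does not guarantee the import will succeed; it only rules out the two most common causes listed above.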

Additional resources