Direct routing is deprecated. Our API gateway is now as fast as direct routing and offers additional benefits. Please follow the migration guide below to switch your requests to the API gateway.

Why migrate?

Direct routing used to be the go-to option for ultra-low latency because it bypassed the global API gateway and hit your deployment directly. We have since made major upgrades to the API gateway so it is now as fast as direct routing. By migrating to the API gateway you get:
  • Multi-region reliability — opt in to multi-region deployments for automatic failover across regions
  • Region flexibility — move your deployment to other regions without changing your client code
  • Sub-10 ms overhead — negligible added latency for most users across the globe
  • Automatic retries — the gateway retries many classes of transient errors for you
  • One URL for everything — no need to manage a different URL per deployment

How to migrate

The migration is a two-line change: update the URL and swap your direct-route API key for your standard Fireworks API key.

cURL

curl https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIRECT_ROUTE_API_KEY" \
  -d '{
    "model": "accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ]
  }'
What to change:
  1. URL — replace https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai/v1 with https://api.fireworks.ai/inference/v1
  2. API key — replace your direct-route API key with your Fireworks API key
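For clients that call the API without an SDK, the two changes above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_gateway_request` and `send` are hypothetical, and the response shape is assumed to match the OpenAI-compatible format used by the SDK examples in this guide.

```python
import json
import urllib.request

# Gateway endpoint from the migration notes above (replaces the direct-route URL).
GATEWAY_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_gateway_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request aimed at the API gateway.

    `api_key` must be the standard Fireworks API key, not the direct-route key.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-compatible response body (choices[0].message.content).
    """
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To use it, pass the same deployment path as before, for example `send(build_gateway_request("accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>", "Explain quantum computing in simple terms", os.environ["FIREWORKS_API_KEY"]))` after importing `os`. Only the URL and the key differ from the direct-route call; the request body is unchanged.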

Python (Fireworks SDK)

from fireworks import Fireworks

client = Fireworks(
    base_url="https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai",
    api_key="<YOUR_DIRECT_ROUTE_API_KEY>"
)

response = client.chat.completions.create(
    model="accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}]
)

print(response.choices[0].message.content)
What to change:
  1. base_url — remove it; the SDK defaults to api.fireworks.ai
  2. api_key — remove it; the SDK reads the FIREWORKS_API_KEY environment variable automatically
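Because the key is no longer passed explicitly, a missing environment variable may only surface when the first request fails. A small preflight check can catch that earlier. This is a sketch: `require_fireworks_api_key` is a hypothetical helper, not part of the Fireworks SDK, and it relies only on the `FIREWORKS_API_KEY` variable name given above.

```python
import os


def require_fireworks_api_key() -> str:
    """Return the Fireworks API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; the SDK itself reads the same variable.
    """
    key = os.environ.get("FIREWORKS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIREWORKS_API_KEY is not set. Export your standard Fireworks API key "
            "(not the direct-route key) before creating the client."
        )
    return key
```

Calling `require_fireworks_api_key()` at startup, before constructing the client, keeps a configuration mistake from looking like a request failure.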

Python (OpenAI SDK)

import os
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_DIRECT_ROUTE_API_KEY>",
    base_url="https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai/v1",
)

response = client.chat.completions.create(
    model="accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}]
)

print(response.choices[0].message.content)
What to change:
  1. base_url — replace the direct-route URL with https://api.fireworks.ai/inference/v1
  2. api_key — use your Fireworks API key, for example by reading the FIREWORKS_API_KEY environment variable with os.environ
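If direct-route URLs are stored in configuration files or environment settings, the URL replacement used throughout these examples can be scripted. The sketch below uses a hypothetical helper, `to_gateway_url`. It assumes every direct-route host ends in `.direct.fireworks.ai` (as in the sample URL) and that a direct-route path `/v1/...` corresponds to `/inference/v1/...` on the gateway. Treating a URL with no path as `/v1` is a further assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: all direct-route hosts share this suffix, as in the sample URL.
DIRECT_SUFFIX = ".direct.fireworks.ai"
GATEWAY_HOST = "api.fireworks.ai"


def to_gateway_url(url: str) -> str:
    """Rewrite a direct-route URL to the equivalent API gateway URL.

    URLs that do not point at a direct-route host are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(DIRECT_SUFFIX):
        return url

    # Assumption: a bare host (no path) is treated as the /v1 root.
    path = parts.path if parts.path not in ("", "/") else "/v1"

    # The gateway serves the same routes under the /inference prefix.
    return urlunsplit(("https", GATEWAY_HOST, "/inference" + path, parts.query, parts.fragment))
```

For instance, `to_gateway_url("https://my-account-abcd1234.us-arizona-1.direct.fireworks.ai/v1")` yields `https://api.fireworks.ai/inference/v1`. The API key still has to be swapped separately, since the direct-route key is not the standard Fireworks API key.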

Cloud-specific endpoints

The examples above use api.fireworks.ai, the default endpoint, which gives the lowest latency for most users. For advanced use cases, two additional cloud-specific endpoints are available alongside the default:
  • api.fireworks.ai — Default, lowest latency for most users
  • aws.api.fireworks.ai — Use if you are on AWS and want traffic to stay on AWS as much as possible
  • gcp.api.fireworks.ai — Use if you are on GCP and want traffic to stay on GCP as much as possible
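One way to keep the endpoint choice out of client code is a single lookup keyed by cloud. The sketch below is illustrative: `gateway_base_url` is a hypothetical helper, and it assumes the cloud-specific hosts serve the same `/inference/v1` path as the default host, which this guide does not state explicitly.

```python
# Hosts listed in this guide, keyed by the cloud the client runs on.
GATEWAY_HOSTS = {
    "default": "api.fireworks.ai",
    "aws": "aws.api.fireworks.ai",
    "gcp": "gcp.api.fireworks.ai",
}


def gateway_base_url(cloud: str = "default") -> str:
    """Return the gateway base URL for the given cloud ("default", "aws", or "gcp").

    Assumption: every host exposes the same /inference/v1 path as api.fireworks.ai.
    """
    try:
        host = GATEWAY_HOSTS[cloud.lower()]
    except KeyError:
        valid = ", ".join(sorted(GATEWAY_HOSTS))
        raise ValueError(f"Unknown cloud {cloud!r}; expected one of: {valid}") from None
    return f"https://{host}/inference/v1"
```

The returned string can be passed as `base_url` to the OpenAI SDK client shown earlier, so switching clouds becomes a configuration change rather than a code change.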

Private connectivity

GCP Private Service Connect (PSC)

Contact your Fireworks representative to set up GCP Private Service Connect to your deployment.

AWS PrivateLink

Contact your Fireworks representative to set up AWS PrivateLink to your deployment.