Internet direct routing
Internet direct routing bypasses our global API load balancer and routes your request directly to the machines where your deployment is running. This can save tens or even hundreds of milliseconds of time-to-first-token (TTFT) latency.

When creating a deployment with direct routing:

- The --region parameter is required to specify the deployment region.
- Specify one or more direct-route API keys, e.g. --direct-route-api-keys=<API_KEY_1> --direct-route-api-keys=<API_KEY_2>. These keys can be any alphanumeric string and are a distinct concept from the API keys provisioned via the Fireworks console. A key provisioned in the console but not included in this list will be rejected when querying the model via direct routing.
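Putting the flags together, deployment creation might look like the following sketch. The model ID and key values are placeholders, and the command syntax assumes the Fireworks CLI; only the --region and --direct-route-api-keys flags are taken from the description above.

```bash
# Create a deployment with direct routing enabled (sketch; substitute real values).
firectl create deployment accounts/fireworks/models/<MODEL_ID> \
  --region US_IOWA_1 \
  --direct-route-api-keys=<API_KEY_1> \
  --direct-route-api-keys=<API_KEY_2>
```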
Take note of the Direct Route Handle to get the inference endpoint. This is what you will use to access the deployment
instead of the global https://api.fireworks.ai/inference/ endpoint. For example:
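As an illustration, querying the deployment through its direct route handle might look like the following. The handle URL and model ID below are hypothetical placeholders; substitute the handle shown for your deployment and one of the keys you passed via --direct-route-api-keys.

```bash
# Query the deployment directly, bypassing the global load balancer (sketch).
curl https://<DIRECT_ROUTE_HANDLE>/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY_1>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/<MODEL_ID>",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```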
Use the OpenAI SDK with direct routing
Set the direct route handle (with the /v1 suffix) as the base_url when you initialize the OpenAI SDK so your calls go straight to the regional deployment endpoint.
The direct route handle replaces the standard https://api.fireworks.ai/inference/v1 endpoint. Keep the
/v1 suffix so the OpenAI SDK routes requests correctly while bypassing the global load balancer to reduce latency.

Supported Regions for Direct Routing

Direct routing is currently supported in the following regions:

- US_IOWA_1
- US_VIRGINIA_1
- US_ARIZONA_1
- US_ILLINOIS_1
- US_TEXAS_1
- US_ILLINOIS_2
- EU_FRANKFURT_1
- US_WASHINGTON_3
- US_WASHINGTON_1
- AP_TOKYO_1