Direct routing
Direct routing enables enterprise users to reduce latency to their deployments.
Internet direct routing
Internet direct routing bypasses our global API load balancer and directly routes your request to the machines where your deployment is running. This can save several tens or even hundreds of milliseconds of time-to-first-token (TTFT) latency.
To create a deployment using Internet direct routing:
If you have multiple API keys, repeat the field once per key, such as: --direct-route-api-keys=<API_KEY_1> --direct-route-api-keys=<API_KEY_2>. These keys can be any alphanumeric string and are a distinct concept from the API keys provisioned via the Fireworks console. A key provisioned in the console but not specified in the list here will not be allowed when querying the model via direct routing.
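Because direct-route keys can be any alphanumeric string, one option is to generate them locally and then pass each one with its own repeated flag. The following is a minimal sketch under that assumption; the helper names `generate_direct_route_key` and `direct_route_key_flags` and the 32-character length are illustrative choices, not part of any Fireworks tooling.

```python
import secrets
import string

# Direct-route keys may be any alphanumeric string, so draw from letters and digits only.
ALPHANUMERIC = string.ascii_letters + string.digits


def generate_direct_route_key(length=32):
    """Return a random alphanumeric key (hypothetical helper; length is an arbitrary choice)."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


def direct_route_key_flags(keys):
    """Format one --direct-route-api-keys flag per key, matching the repeated-field form."""
    return [f"--direct-route-api-keys={key}" for key in keys]


if __name__ == "__main__":
    keys = [generate_direct_route_key() for _ in range(2)]
    # Append these flags to the deployment creation command.
    print(" ".join(direct_route_key_flags(keys)))
```

Store the generated keys somewhere safe: only keys in this list will be accepted when querying the deployment via direct routing.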
Take note of the Direct Route Handle to get the inference endpoint. This is what you will use to access the deployment instead of the global https://api.fireworks.ai/inference/ endpoint. For example:
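A request to the direct-route endpoint might be assembled as in the sketch below. Several details are assumptions rather than documented behavior: the host `direct-route.example.invalid` is a placeholder for the endpoint derived from your Direct Route Handle, the `/v1/completions` path and bearer-token header are assumed to mirror the global API, and `build_direct_route_request` is a hypothetical helper. The request is built but not sent.

```python
import json
import urllib.request


def build_direct_route_request(base_url, api_key, model, prompt, max_tokens=32):
    """Build (but do not send) a completion request for a direct-route endpoint.

    Assumptions: the path and auth header mirror the global API, and api_key
    is one of the keys passed via --direct-route-api-keys.
    """
    url = base_url.rstrip("/") + "/v1/completions"  # assumed path
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_direct_route_request(
        "https://direct-route.example.invalid",  # placeholder, not a real endpoint
        "<API_KEY_1>",
        "<MODEL_NAME>",
        "Hello",
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```

Replace the placeholder host, key, and model name with the values from your own deployment before sending the request.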
Private Service Connect (PSC)
Contact your Fireworks representative to set up GCP Private Service Connect to your deployment.
AWS PrivateLink
Contact your Fireworks representative to set up AWS PrivateLink to your deployment.