Priority tier and Turbo mode are in Preview. Features, pricing, and availability may change. We welcome your feedback!
Fireworks offers a Priority tier for workloads that require higher reliability, as well as a Turbo mode for workloads that require higher speeds.
Priority tier
Priority tier is for workloads that need higher reliability during peak traffic periods, at a higher price. Priority tier requests are prioritized over Standard traffic and are less likely to be rate limited.
To use Priority tier, set `service_tier` to `"priority"` in the request body (supported for OpenAI-compatible chat completions only):
```bash
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/kimi-k2p5",
    "service_tier": "priority",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Priority tier is available on select models. Models and pricing are listed on the Pricing page.
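If you want to opt individual requests into Priority tier without editing the JSON by hand, the request body can be built by a small helper. This is a minimal sketch, not an official client: the function names `fw_payload` and `fw_chat` and the `FW_PRIORITY` flag are hypothetical, and the helper assumes the prompt contains no characters that need JSON escaping.

```shell
# Hypothetical helpers (not part of the Fireworks API or any SDK).

# Print a chat-completions JSON body. The "service_tier": "priority"
# field is added only when FW_PRIORITY=1; otherwise the request uses
# the default (Standard) tier.
# Assumption: the prompt has no double quotes, backslashes, or newlines,
# because it is inserted into the JSON without escaping.
fw_payload() {
  local model="$1" prompt="$2" tier_field=""
  if [ "${FW_PRIORITY:-0}" = "1" ]; then
    tier_field='"service_tier": "priority", '
  fi
  printf '{"model": "%s", %s"messages": [{"role": "user", "content": "%s"}]}\n' \
    "$model" "$tier_field" "$prompt"
}

# Send the request; curl reads the JSON body from stdin via -d @-.
fw_chat() {
  fw_payload "$1" "$2" | curl -sS https://api.fireworks.ai/inference/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $FIREWORKS_API_KEY" \
    -d @-
}
```

For example, `FW_PRIORITY=1 fw_chat accounts/fireworks/models/kimi-k2p5 "Hello"` sends the same request as the curl command above, while omitting `FW_PRIORITY=1` sends it without the `service_tier` field.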
Turbo mode
Turbo mode is a high-speed configuration for interactive applications that need fast responses, at a higher price. It is not a different model, and model quality is unchanged.
Turbo mode is available for select models. To use it, set the model ID to the Turbo model ID listed below.
| Model | Model ID |
|---|---|
| Kimi K2.6 Turbo | `accounts/fireworks/routers/kimi-k2p6-turbo` |
```bash
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/routers/kimi-k2p6-turbo",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Pricing is listed on the Pricing page.
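Because Turbo mode is selected only by the model ID, switching between standard and Turbo can be reduced to a lookup plus an unchanged request. The sketch below is illustrative only: `fw_model_id` and `fw_time_request` are hypothetical helpers, and the standard (non-Turbo) ID `accounts/fireworks/models/kimi-k2p6` is an assumption; only the Turbo ID comes from the table above.

```shell
# Hypothetical helper: map a short model name and a Turbo flag to a model ID.
# $1 = short model name, $2 = "1" for Turbo mode (defaults to "0").
fw_model_id() {
  case "$1:${2:-0}" in
    kimi-k2p6:1) echo "accounts/fireworks/routers/kimi-k2p6-turbo" ;;
    kimi-k2p6:0) echo "accounts/fireworks/models/kimi-k2p6" ;;  # ASSUMED standard ID
    *) echo "unsupported model/mode: $1 turbo=${2:-0}" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the total request time in seconds
# (curl's %{time_total}) for one model ID, discarding the response body.
fw_time_request() {
  curl -sS -o /dev/null -w '%{time_total}\n' \
    https://api.fireworks.ai/inference/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $FIREWORKS_API_KEY" \
    -d "{\"model\": \"$1\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
}
```

Running `fw_time_request "$(fw_model_id kimi-k2p6 0)"` and `fw_time_request "$(fw_model_id kimi-k2p6 1)"` gives a rough side-by-side timing. A single short request is noisy, so averaging several runs with a realistic prompt is a better basis for comparison.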