When using Serverless, you may experience 429 Too Many Requests or 503 Service Overloaded errors. To avoid 429s, you need to stay below our adaptive rate limits. To reduce the likelihood of 503s, you can upgrade to Priority tier.
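One common client-side way to cope with occasional 429 and 503 responses is to retry with capped exponential backoff and random jitter. The sketch below shows this generic pattern. It is not part of any Fireworks SDK. `send_with_backoff` and the `send` callable are hypothetical names, and `send` is assumed to perform one request and return its HTTP status code. The retry count and delays are arbitrary defaults that you should tune for your workload.

```python
import random
import time
from typing import Callable

# Status codes described above: 429 (rate limited) and 503 (overloaded).
RETRYABLE_STATUSES = {429, 503}


def send_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` (hypothetical: performs one request, returns its status code),
    retrying on 429/503 with capped exponential backoff and full jitter."""
    attempt = 0
    while True:
        status = send()
        # Return on success, on a non-retryable error, or when retries run out.
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status
        # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
        delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
        attempt += 1
```

Passing `sleep` as a parameter keeps the helper easy to test. Otherwise the default `time.sleep` applies. Backing off only spreads out retries. Staying below the rate limits is still what prevents 429s.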
What are your rate limits?
There are three metrics we use to rate limit accounts:
- Total Prompt TPM: input tokens per minute (cached + uncached).
- Uncached Prompt TPM: uncached input tokens per minute.
- Generated TPM: output tokens per minute.
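The sketch below shows how the three metrics relate by tallying them on the client over a sliding 60-second window. It is only an estimate, and it rests on several assumptions. The `Usage` record and `TpmTracker` class are hypothetical. You must supply each request's cached, uncached, and generated token counts yourself. The server may also window or count tokens differently.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Usage:
    timestamp: float      # seconds, e.g. from time.monotonic()
    cached_prompt: int    # prompt tokens served from cache
    uncached_prompt: int  # prompt tokens not served from cache
    generated: int        # output tokens


class TpmTracker:
    """Hypothetical client-side estimate of the three per-minute metrics."""

    def __init__(self, window: float = 60.0) -> None:
        self.window = window
        self._events: deque[Usage] = deque()

    def record(self, usage: Usage) -> None:
        self._events.append(usage)

    def _prune(self, now: float) -> None:
        # Drop events that fall outside the sliding window ending at `now`.
        while self._events and self._events[0].timestamp <= now - self.window:
            self._events.popleft()

    def metrics(self, now: float) -> dict[str, int]:
        self._prune(now)
        cached = sum(e.cached_prompt for e in self._events)
        uncached = sum(e.uncached_prompt for e in self._events)
        generated = sum(e.generated for e in self._events)
        return {
            "total_prompt_tpm": cached + uncached,  # cached + uncached input
            "uncached_prompt_tpm": uncached,        # uncached input only
            "generated_tpm": generated,             # output tokens
        }
```

Cached prompt tokens count toward Total Prompt TPM but not toward Uncached Prompt TPM. A high cache hit rate therefore lowers only the second figure.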
Your current prompt and generated token limits are reported in the X-Ratelimit-Limit-Tokens-Prompt and X-Ratelimit-Limit-Tokens-Generated response headers.
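The following is a minimal sketch for reading the X-Ratelimit-Limit-Tokens-Prompt and X-Ratelimit-Limit-Tokens-Generated values. It assumes they arrive as HTTP response headers whose values are plain integers. `parse_limit_headers` and `TokenLimits` are hypothetical names. Header names are matched case-insensitively, as HTTP requires, and a missing or non-integer value is returned as `None`.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenLimits:
    prompt: Optional[int]     # from X-Ratelimit-Limit-Tokens-Prompt
    generated: Optional[int]  # from X-Ratelimit-Limit-Tokens-Generated


def parse_limit_headers(headers: Mapping[str, str]) -> TokenLimits:
    """Hypothetical helper: read both limit headers, assuming integer values."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name.lower())
        if value is None:
            return None  # header absent
        try:
            return int(value)
        except ValueError:
            return None  # unexpected format: treat as unknown

    return TokenLimits(
        prompt=as_int("X-Ratelimit-Limit-Tokens-Prompt"),
        generated=as_int("X-Ratelimit-Limit-Tokens-Generated"),
    )
```

Any mapping of header names to values should work here, for example the `headers` attribute of a `requests` or `httpx` response.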
Adaptive rate limits move between a lower and an upper bound. A higher account Spending Tier comes with a higher upper bound.
How do I get higher limits sooner?
Reach out to inquiries@fireworks.ai for a custom solution if either of these applies:
- You need higher limits than the defaults from day one. Your launch traffic exceeds the starting limit and you can’t wait for the adaptive ramp.
- You’re ramping past the highest upper bound. You are already at the highest account Spending Tier and your adaptive rate limits have stopped growing.