# Serverless Rate Limits

> Adaptive rate limits grow and shrink with your usage

When using Serverless, you may experience `429 Too Many Requests` or `503 Service Overloaded`. To avoid 429s, you need to stay below our adaptive rate limits. To reduce the likelihood of 503s, you can upgrade to [Priority tier](/serverless/serving-paths).

## What are your rate limits?

There are three metrics we use to rate limit accounts:

* **Total Prompt TPM**: input tokens per minute (cached + uncached).
* **Uncached Prompt TPM**: uncached input tokens per minute.
* **Generated TPM**: output tokens per minute.

**Starting limits:** 3.6M Total Prompt TPM, 900k Uncached Prompt TPM, 36k Generated TPM (\~60k / \~15k / \~600 TPS). **Enforcement uses TPM**, not TPS.

Based on your usage, your adaptive limits will grow and shrink. If your traffic ramps up too quickly, you will get 429s.

*Figure: kimi-k2p6 usage and rate limits.*
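The relationship between the per-minute starting limits and their approximate per-second equivalents is plain division by 60. The following is a minimal sketch of that arithmetic; the dictionary keys and the `tpm_to_tps` helper are illustrative names, not part of any Fireworks API, and the result is only an average rate, since enforcement is stated in TPM.

```python
# Starting limits quoted above, in tokens per minute (TPM).
# The key names are illustrative, not official identifiers.
STARTING_LIMITS_TPM = {
    "total_prompt": 3_600_000,
    "uncached_prompt": 900_000,
    "generated": 36_000,
}


def tpm_to_tps(tpm: float) -> float:
    """Convert a per-minute token budget to an average per-second rate."""
    return tpm / 60


# Average tokens per second implied by each starting limit.
starting_limits_tps = {
    name: tpm_to_tps(tpm) for name, tpm in STARTING_LIMITS_TPM.items()
}

for name, tps in starting_limits_tps.items():
    print(f"{name}: {tps:,.0f} TPS")
```

Because the limit is enforced per minute, a short burst above the per-second average is not necessarily rejected, but the text above notes that ramping traffic too quickly can still produce 429s.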

Your current effective rate limits (described in tokens per second) are in the response headers `X-Ratelimit-Limit-Tokens-Prompt`, `X-Ratelimit-Limit-Tokens-Cache-Adjusted-Prompt`, and `X-Ratelimit-Limit-Tokens-Generated`.

Adaptive rate limits have an upper and lower bound. A higher account [Spending Tier](/guides/quotas_usage/account-quotas#spending-tiers) correlates with higher upper-bound rate limits; **enterprise accounts** get higher upper bounds automatically.

## FAQ

Staying within your rate limits does **not** guarantee that every request succeeds. When a deployment is busy, your traffic can still be **load shed**, and those responses are **`503 Service Overloaded`**. To **decrease the chance** of being load shed, you can use [Priority tier](/serverless/serving-paths), which is prioritized during high load.

Rate limits are scoped **per account** and **per model**. **Fast** and **regular** model variants have **separate** limits. **Priority tier** and **regular** requests share the **same** rate limits for a given model.

If you are hitting rate limits, first try **exponential backoff** when retrying. Reach out to [inquiries@fireworks.ai](mailto:inquiries@fireworks.ai) for a custom solution if either of these applies:

* **You need higher than the defaults from day one.** Your launch traffic exceeds the starting limit and you can't wait for the adaptive ramp.
* **You're ramping past the highest upper bound.** You are already at the highest account Spending Tier and the adaptive rate limits are not growing.
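The exponential backoff mentioned in the FAQ could look roughly like the sketch below. It is not an official client: `send` is a hypothetical callable standing in for whatever function issues your request and returns a `(status_code, body)` pair, and the base delay, cap, retry count, and the choice to retry `503` as well as `429` are all assumptions to tune for your workload. Random jitter is added so that many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: retry both status codes discussed on this page.
RETRYABLE_STATUSES = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound (seconds) on the wait before retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],  # hypothetical request function
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or retries run out."""
    status, body = send()
    for attempt in range(max_retries):
        if status not in RETRYABLE_STATUSES:
            break
        # Full jitter: sleep a random fraction of the exponential delay.
        sleep(rng() * backoff_delay(attempt, base, cap))
        status, body = send()
    return status, body


if __name__ == "__main__":
    # Simulated responses: two rate-limit rejections, then success.
    responses = iter([(429, "slow down"), (429, "slow down"), (200, "ok")])
    print(call_with_backoff(lambda: next(responses), base=0.01))
```

Injecting `sleep` and `rng` keeps the helper easy to test without real waiting. If retries keep failing after the delay reaches the cap, that suggests sustained traffic above your current limits rather than a short burst, which is the situation the contact address above is meant for.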