Rate limits, spend limits and quotas for serverless inference and on-demand deployments
Limits | Self-Serve |
---|---|
Requests per minute | 6,000 |
Audio min per minute, Whisper-v3-large | 200 |
Audio min per minute, Whisper-v3-turbo | 400 |
Concurrent connections, streaming speech transcription | 10 |
# LoRAs | 100 |
x-ratelimit-over-limit: yes
Metric | Minimum Guaranteed Limit | 10 Minutes | 1 Hour | 2 Hours |
---|---|---|---|---|
Requests per minute | 60 | 120 | 720 | 1,440 |
Input tokens per minute | 60,000 | 120,000 | 720,000 | 1,440,000 |
Output tokens per minute | 6,000 | 12,000 | 72,000 | 144,000 |
Header | Description |
---|---|
x-ratelimit-limit-requests, x-ratelimit-limit-tokens-prompt, x-ratelimit-limit-tokens-generated | The maximum number of requests or tokens that are permitted per minute before the limit is exhausted and future requests are de-prioritized. requests refers to the number of completions (n > 1 counts as several requests). tokens-prompt and tokens-generated refer to the number of input and output tokens respectively. |
x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens-prompt, x-ratelimit-remaining-tokens-generated | The remaining number of requests or tokens that are permitted before exhausting the rate limit. Note that the limit is replenished continuously. If your usage is sustainably below the rate limit, this number will hover near its maximum value. |
x-ratelimit-over-limit | Contains “yes” or “no”. The value “yes” means that at least one of the limits is exhausted and this request was executed with lower priority. |
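
The headers in the table above can be read from any HTTP response as plain key/value pairs. The following is a minimal sketch, not an official client: the function name `parse_rate_limit_headers` and the helper `_to_int` are hypothetical, and the sketch assumes the limit and remaining values arrive as numeric strings and that header names should be matched case-insensitively.

```python
def _to_int(value):
    """Convert a header value to int, or return None if absent or non-numeric."""
    if value is None:
        return None
    try:
        return int(float(value))
    except ValueError:
        return None


def parse_rate_limit_headers(headers):
    """Collect the x-ratelimit-* values from a mapping of response headers.

    Assumption: limit/remaining values are numeric strings.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    over_limit = lowered.get("x-ratelimit-over-limit", "no").strip().lower() == "yes"
    info = {"over_limit": over_limit}

    for kind in ("requests", "tokens-prompt", "tokens-generated"):
        info[kind] = {
            "limit": _to_int(lowered.get(f"x-ratelimit-limit-{kind}")),
            "remaining": _to_int(lowered.get(f"x-ratelimit-remaining-{kind}")),
        }
    return info
```

A caller might check `info["over_limit"]` after each response and slow down when it is true, since a "yes" value means the request was executed with lower priority.
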
Limits | Self-Serve |
---|---|
Tokens per day, models < 40B | 2.5B |
Tokens per day, models between 40B and 100B | 1.25B |
Tokens per day, models > 100B (incl. large MoE like Deepseek R1) | 600M |
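
As an illustration, the daily token limits above can be written as a lookup on model size. This is only a sketch: the function name `daily_token_limit` is hypothetical, and it assumes that models of exactly 40B and exactly 100B parameters fall into the middle bucket and that size means total parameter count (the table lists large MoE models in the top bucket).

```python
def daily_token_limit(model_params_billions):
    """Return the self-serve tokens-per-day limit for a model size in billions.

    Assumption: the 40B and 100B boundaries belong to the middle bucket.
    """
    if model_params_billions < 40:
        return 2_500_000_000   # 2.5B tokens per day
    if model_params_billions <= 100:
        return 1_250_000_000   # 1.25B tokens per day
    return 600_000_000         # 600M tokens per day
```
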
Quota Name | Default Value |
---|---|
# Nvidia A100 | 8 |
# Nvidia H100 | 8 |
# Nvidia H200 | 8 |
# AMD MI300X | 8 |
Total GPU Hours per month | 2000 |
# LoRAs | 100 |
Note that the limit on # LoRAs is a total limit across Serverless and On-Demand.
For example, with $0 in historical spend, you can purchase $100 in prepaid credits and become a Tier 2 user.
Tier | Qualification | Spending Limit |
---|---|---|
Tier 1 | Valid payment method added | $50/mo |
Tier 2 | $50 spent in payments or credits added | $500/mo |
Tier 3 | $500 spent in payments or credits added | $5,000/mo |
Tier 4 | $5,000 spent in payments or credits added | $50,000/mo |
Unlimited | Contact us at inquiries@fireworks.ai | Unlimited |
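
The tier table can be read as a threshold lookup on the total amount spent or added as credits. The sketch below encodes it under stated assumptions: the names `TIERS` and `spending_tier` are hypothetical, thresholds are treated as inclusive, and an account with no payment method and no spend is treated as having no tier. The Unlimited tier is omitted because it is arranged by contacting the team rather than by spend.

```python
# (tier name, qualifying spend in USD, monthly spending limit in USD),
# ordered from highest to lowest threshold.
TIERS = [
    ("Tier 4", 5000, 50000),
    ("Tier 3", 500, 5000),
    ("Tier 2", 50, 500),
]


def spending_tier(total_spent_usd, has_payment_method=True):
    """Return (tier name, monthly limit in USD), or None if no tier applies.

    Assumption: thresholds are inclusive, so exactly $50 qualifies for Tier 2.
    """
    for name, threshold, monthly_limit in TIERS:
        if total_spent_usd >= threshold:
            return name, monthly_limit
    if has_payment_method:
        # Tier 1 only requires a valid payment method.
        return "Tier 1", 50
    return None
```

With these assumptions, `spending_tier(100)` yields Tier 2 with a $500 monthly limit.
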