Fireworks uses different controls for serverless and on-demand deployments. This page is the canonical reference for spending tiers, budget controls, on-demand GPU quotas, and account-wide request limits. For serverless TPM and adaptive limits, see Serverless rate limits.
Check your current limits
View your account’s current quotas and limits:
Spending tiers
Your account tier determines the maximum budget you can set:

| Tier | Criteria | Max Monthly Budget |
|---|---|---|
| Tier 1 | Valid payment method | $50 |
| Tier 2 | Spend or add $50 in credits | $500 |
| Tier 3 | Spend or add $500 in credits | $5,000 |
| Tier 4 | Spend or add $5,000 in credits | $50,000 |
| Unlimited | Contact us | Unlimited |
These spending tiers control both your maximum monthly budget and the maximum serverless TPM upper bounds your account can reach.
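
The tier table above can be read as a simple lookup. The following is a minimal sketch, not Fireworks code: the function name `spending_tier` and its parameters are hypothetical, and it assumes that the thresholds are inclusive and that a valid payment method is required before any tier applies. The Unlimited tier is left out because it is granted by contacting Fireworks rather than by a spend threshold.

```python
from typing import Optional, Tuple

# Thresholds copied from the spending-tier table:
# (minimum USD spent or added as credits, tier name, max monthly budget in USD).
TIERS = [
    (5_000, "Tier 4", 50_000),
    (500, "Tier 3", 5_000),
    (50, "Tier 2", 500),
    (0, "Tier 1", 50),
]


def spending_tier(has_payment_method: bool, credits_usd: float) -> Optional[Tuple[str, int]]:
    """Return (tier name, max monthly budget), or None if no tier applies.

    Assumptions (not confirmed by the table): thresholds are inclusive, and
    a valid payment method is a prerequisite for every tier.
    """
    if not has_payment_method:
        return None
    # TIERS is ordered from highest threshold to lowest, so the first match wins.
    for minimum, name, max_budget in TIERS:
        if credits_usd >= minimum:
            return name, max_budget
    return None  # only reached for negative amounts
```

For example, under these assumptions an account with a payment method and $500 in credits maps to Tier 3 with a $5,000 maximum monthly budget.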
Manage your quotas
Account-wide request limits
All API usage on your account shares a single request-throughput envelope:

| Account state | Request-rate limit |
|---|---|
| No payment method | 10 RPM |
| Payment method on file | 6,000 RPM (maximum) |
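
Because every request on the account draws from the same envelope, a client that sends bursts may want to pace itself before the server rejects anything. The sketch below is one way to do that on the client side. `RequestPacer` is a hypothetical helper, and the 60-second sliding window is an assumption; this page does not say how the server measures RPM.

```python
import time
from collections import deque


class RequestPacer:
    """Client-side sliding-window limiter: at most `rpm` requests per 60 s."""

    def __init__(self, rpm, clock=time.monotonic):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self.clock = clock  # injectable so the pacer can be tested without sleeping
        self.sent = deque()  # timestamps of requests sent in the last 60 seconds

    def wait_time(self):
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # The window is full; the next slot opens when the oldest entry expires.
        return 60.0 - (now - self.sent[0])

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

A caller would sleep for `pacer.wait_time()` seconds, send the request, then call `pacer.record()`. With the limits in the table, `RequestPacer(10)` models an account without a payment method and `RequestPacer(6000)` models the maximum with one on file.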
Budget control
Control your monthly spending with flexible budget limits. Set a limit that fits your needs and adjust it anytime.
View and adjust your spend limit
Check your current spend limit:
When you reach your budget
When you reach your spending limit, all API requests pause automatically across serverless inference, deployments, and fine-tuning. To resume, add credits to increase your tier and set a higher budget.
On-demand deployment quotas
On-demand deployments have GPU quotas instead of rate limits:

| GPU Type | Default Quota |
|---|---|
| Nvidia A100 | 16 GPUs |
| Nvidia H100 | 16 GPUs |
| Nvidia H200 | 16 GPUs |
| Nvidia B200 | 16 GPUs |
| LoRAs (on-demand) | 100 |
On-demand and dedicated deployments are not limited by adaptive serverless TPM upper bounds. If you receive HTTP 429 on those endpoints, it typically means deployment saturation (GPUs busy) rather than hitting a TPM tier cap. Requests still count toward account-wide request limits. See understanding 429 errors for details and resolution steps.
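
Since a 429 on a deployment endpoint usually signals temporary saturation, retrying after a short, growing delay is a common client-side response. The following is a generic sketch of exponential backoff with jitter, not a Fireworks-recommended policy. `call_with_backoff` and the `send` callable are hypothetical, and the delay values are illustrative defaults.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call `send()` until it returns a status other than 429.

    `send` is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body). The last response is returned if every
    attempt is rejected with 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return status, body
```

The jitter spreads retries from many clients over time so they do not all hit a busy deployment at the same instant. If 429s persist after several attempts, backoff alone will not help, and the resolution steps referenced above are the place to look.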
Account recovery
If your account is suspended due to failed payment:

- Go to Billing → Invoices
- Pay any outstanding invoices
- Your account reactivates automatically within an hour