Rate limits

Serverless rate limits ensure fair platform usage and reasonable performance for all users. The current limit is 600 queries per minute (QPM), and it applies across all serverless models.

We are actively working on increasing rate limits on Serverless. Check back in a few weeks for updates!

If you need higher rate limits, consider switching to on-demand deployments, where there are no hard rate limits and you can scale the size of your deployment to meet your needs.
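If you prefer to stay under the 600 QPM serverless limit on the client side rather than have requests rejected, one option is a small sliding-window throttle. The following is a minimal sketch, not part of any Fireworks SDK: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and it assumes all of your requests pass through a single process.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: allow at most `max_calls` per `period` seconds.

    Hypothetical helper for illustration; the defaults mirror the
    600 queries per minute serverless limit described above.
    """

    def __init__(self, max_calls=600, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._sent = deque()   # timestamps of calls inside the current window

    def acquire(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.period:
                self._sent.popleft()
            if len(self._sent) < self.max_calls:
                self._sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._sent[0]))
```

Call `limiter.acquire()` immediately before each serverless request. Because the limit applies across all serverless models, a single limiter instance should be shared by every model you call.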

Spend limits

To prevent fraud, Fireworks imposes a monthly spending limit on your account. Once you hit the spending limit, your account automatically enters a suspended state: API requests are rejected and all Fireworks usage stops. This includes serverless inference, dedicated deployments, and fine-tuning jobs.

Your spending limit increases automatically over time as you spend more on the platform. You can also raise it at any time by purchasing prepaid credits to meet the historical spend required for a higher tier. For instance, if you are a new Tier 1 user with $0 of historical spend, you can purchase $100 of prepaid credits and become a Tier 2 user.

| Tier | Spending Limit | Qualification |
| --- | --- | --- |
| Tier 1 | $50/mo | Valid payment method added |
| Tier 2 | $500/mo | Total historical spend of $100+ |
| Tier 3 | $5,000/mo | Total historical spend of $1,000+ |
| Tier 4 | $50,000/mo | Total historical spend of $10,000+ |
| Unlimited | Unlimited | Contact us at inquiries@fireworks.ai |

There may be a propagation delay after you prepay for credits. You may still see a “monthly usage exceeded” error for a few minutes after a top-up.

Credits count against your spending limit, so it is possible to hit the spending limit before all of your current credits are depleted.
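The tier thresholds above can be expressed as a simple lookup. This is an illustrative sketch only: `TIERS` and `tier_for_spend` are hypothetical names, the account is assumed to already have a valid payment method (the Tier 1 qualification), and the Unlimited tier is left out because it is granted by contacting Fireworks rather than by spend.

```python
# (tier name, monthly spending limit in USD, minimum total historical spend in USD)
# Values are copied from the table above.
TIERS = [
    ("Tier 1", 50, 0),
    ("Tier 2", 500, 100),
    ("Tier 3", 5_000, 1_000),
    ("Tier 4", 50_000, 10_000),
]


def tier_for_spend(historical_spend_usd):
    """Return (tier name, monthly limit) for a total historical spend.

    Assumes a valid payment method is already on file.
    """
    if historical_spend_usd < 0:
        raise ValueError("historical spend cannot be negative")
    name, limit = TIERS[0][0], TIERS[0][1]
    # Tiers are sorted by threshold, so the last match is the highest tier reached.
    for tier_name, monthly_limit, min_spend in TIERS:
        if historical_spend_usd >= min_spend:
            name, limit = tier_name, monthly_limit
    return name, limit
```

For example, `tier_for_spend(0)` returns `("Tier 1", 50)`, and after the $100 prepaid purchase described above, `tier_for_spend(100)` returns `("Tier 2", 500)`.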

Other quotas

We impose limits on the number of custom models and LoRA adapters you can have in your account, as well as the number of A100 and H100 GPUs you can deploy in your on-demand deployments. Higher quotas are available for enterprise accounts; contact the Fireworks team at inquiries@fireworks.ai.

| Quota Name | Default Value | Can be raised? |
| --- | --- | --- |
| # deployed models | 100 | Yes |
| # A100 GPUs | 8 | Yes |
| # H100 GPUs | 8 | Yes |

Viewing quotas

You can view your current quota capacity by running:

firectl list quotas

Account suspension

Your account is suspended when your spending limit is reached, when no payment method is on file after your credits are depleted, or when a past invoice payment fails. If you have a failed payment, go to the Invoices section at https://fireworks.ai/billing and pay all failed invoices; your account will then be unsuspended automatically. If your account is still suspended after 1 hour, contact the Fireworks team on Discord or by email.