What’s the supported throughput?
Throughput capacity typically depends on several factors:
- Deployment type (serverless or on-demand)
- Traffic patterns and request patterns
- Hardware configuration
- Model size and complexity
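
Because throughput depends on the factors listed above, the most reliable figure for a given workload is usually one you measure yourself. The following is a minimal sketch of such a measurement, not an official benchmarking tool: `measure_throughput`, `send_request`, and `fake_request` are hypothetical names introduced here for illustration, and `fake_request` only sleeps in place of a real inference call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(send_request, num_requests, concurrency):
    """Return completed requests per second at a given concurrency level.

    send_request is a hypothetical zero-argument callable that performs one
    request against your deployment and returns once the response is complete.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_request) for _ in range(num_requests)]
        for future in futures:
            future.result()  # re-raises any error from the request
    elapsed = time.perf_counter() - start
    return num_requests / elapsed


def fake_request():
    # Assumption: stand-in for a real inference call; swap in your own client code.
    time.sleep(0.01)


if __name__ == "__main__":
    for concurrency in (1, 4, 16):
        rps = measure_throughput(fake_request, num_requests=32, concurrency=concurrency)
        print(f"concurrency={concurrency}: {rps:.1f} requests/s")
```

Running the same measurement at several concurrency levels, with prompts and output lengths that resemble your real traffic, shows where throughput stops improving for your particular deployment type, hardware, and model.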