Deployment & Infrastructure
What factors affect the number of simultaneous requests that can be handled?
The number of simultaneous requests a deployment can handle is influenced by multiple factors:

- Model size and type
- Number of GPUs allocated to the deployment
- GPU type (e.g., A100 vs. H100)
- Prompt size and generation token length
- Deployment type (serverless vs. on-demand)
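To see how these factors interact, here is a hedged back-of-envelope sketch: concurrent capacity is often bounded by the GPU memory left over for KV cache after loading model weights, which in turn depends on GPU count, GPU type (memory per GPU), model size, and total tokens per request. All numbers and the function below are illustrative assumptions, not Fireworks limits or APIs.

```python
def estimate_max_concurrent_requests(
    num_gpus: int,
    gpu_memory_gb: float,              # e.g. 80 for an A100/H100 80 GB card
    model_weights_gb: float,           # memory consumed by model weights
    kv_cache_gb_per_1k_tokens: float,  # assumed KV-cache cost per 1k tokens
    tokens_per_request: int,           # prompt size + generation token length
) -> int:
    """Rough estimate of how many requests fit in the KV-cache budget."""
    total_memory = num_gpus * gpu_memory_gb
    free_for_kv = total_memory - model_weights_gb
    if free_for_kv <= 0:
        return 0  # model doesn't fit; no capacity at this GPU count/type
    kv_per_request = kv_cache_gb_per_1k_tokens * tokens_per_request / 1000
    return int(free_for_kv // kv_per_request)

# Illustrative: 2 GPUs x 80 GB, 140 GB of weights, 0.5 GB of KV cache
# per 1k tokens, 4k tokens per request:
# (160 - 140) / (0.5 * 4) = 10 concurrent requests
print(estimate_max_concurrent_requests(2, 80, 140, 0.5, 4000))
```

Under this simplified model, adding GPUs or shortening prompts/generations raises concurrency, while a larger model lowers it, matching the factors listed above.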