A deployment's request-handling capacity depends on several factors:

  • Model size and type
  • Number of GPUs allocated to the deployment
  • GPU type (e.g., A100 vs. H100)
  • Prompt size and generation token length
  • Deployment type (serverless vs. on-demand)
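The interaction of these factors can be illustrated with a back-of-the-envelope estimate: total token throughput scales with GPU count and per-GPU speed, while each request consumes prompt plus generation tokens. The sketch below is purely illustrative; the function name and all numeric values are assumptions, not measured figures for any particular model or GPU.

```python
def estimate_requests_per_second(
    tokens_per_sec_per_gpu: float,  # assumed aggregate throughput of one GPU
    num_gpus: int,                  # GPUs allocated to the deployment
    prompt_tokens: int,             # average prompt size per request
    generation_tokens: int,         # average tokens generated per request
) -> float:
    """Rough capacity: total token throughput divided by tokens per request."""
    tokens_per_request = prompt_tokens + generation_tokens
    total_throughput = tokens_per_sec_per_gpu * num_gpus
    return total_throughput / tokens_per_request

# Example: 4 GPUs at an assumed 2,000 tokens/sec each, with 500-token
# prompts and 300 generated tokens per request.
print(estimate_requests_per_second(2000, 4, 500, 300))  # 10.0
```

In practice, real capacity also depends on batching behavior, KV-cache memory limits, and whether the deployment is serverless (shared) or on-demand (dedicated), so load testing with representative traffic is the only reliable way to size a deployment.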