Models & Inference
What factors affect the number of simultaneous requests that can be handled?
Request handling capacity depends on several factors:
- Model size and type
- Number of GPUs allocated to the deployment
- GPU type (e.g., A100, H100)
- Prompt length (number of input tokens)
- Generation length (number of output tokens)
- Deployment type (serverless vs. on-demand)
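Several of these factors interact through GPU memory: once the model weights are loaded, the remaining memory holds the KV cache, and each in-flight request consumes cache proportional to its prompt plus generated tokens. The sketch below is a rough back-of-the-envelope estimate, not a guarantee; the function name and all example values (layer count, KV heads, head dim, memory figures) are illustrative assumptions, and real serving stacks add overhead this ignores.

```python
def max_concurrent_requests(
    num_gpus: int,
    gpu_mem_gib: float,        # per-GPU memory, e.g. 80 for an A100/H100
    weight_mem_gib: float,     # memory used by model weights across all GPUs
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    prompt_tokens: int,
    gen_tokens: int,
    bytes_per_elem: int = 2,   # fp16/bf16 KV cache
) -> int:
    """Rough estimate of concurrent requests from KV-cache memory alone."""
    # KV cache bytes per token: a key and a value vector in every layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    # Each request holds cache for its prompt plus everything it generates.
    per_request = (prompt_tokens + gen_tokens) * per_token
    available = (num_gpus * gpu_mem_gib - weight_mem_gib) * 1024**3
    return max(0, int(available // per_request))

# Illustrative numbers: a 70B-class model (80 layers, 8 KV heads,
# head_dim 128) with ~140 GiB of fp16 weights on 4 x 80 GiB GPUs,
# 2000-token prompts and 500-token generations.
print(max_concurrent_requests(4, 80, 140, 80, 8, 128, 2000, 500))  # → 235
```

Doubling the prompt or generation length roughly halves the estimate, and a larger model both raises `weight_mem_gib` and (usually) the per-token cache cost, which is why each bullet above moves capacity up or down.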