System scaling
Q: How does the system scale?
Our system is horizontally scalable, meaning it:
- Scales linearly with additional replicas of the deployment
- Automatically allocates resources based on demand
- Manages distributed load handling efficiently
Auto scaling
Q: Do you support auto scaling?
Yes, our system supports auto scaling with the following features:
- Scale-to-zero capability for resource efficiency
- Controllable scale-up and scale-down velocity
- Custom scaling rules and thresholds to match your specific needs
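The features above can be pictured with a small sketch. This is an illustrative model only, not Fireworks' actual autoscaling API: the `ScalingPolicy` class, its field names, and all numeric defaults are hypothetical. It shows how scale-to-zero, a custom load threshold, and a velocity cap interact when choosing the next replica count.

```python
import math
from dataclasses import dataclass

# Hypothetical policy object; field names and defaults are illustrative,
# not part of any real Fireworks API.
@dataclass
class ScalingPolicy:
    min_replicas: int = 0        # 0 enables scale-to-zero
    max_replicas: int = 8
    target_load: float = 0.7     # custom threshold: desired load per replica
    max_step: int = 2            # velocity cap: replicas changed per cycle

def next_replica_count(policy: ScalingPolicy, current: int, load: float) -> int:
    """Pick the next replica count for an observed aggregate load
    (expressed in replica-equivalents of work)."""
    if load == 0:
        # No traffic: fall back to the floor, which may be zero.
        desired = policy.min_replicas
    else:
        # Enough replicas that each stays at or below the target load.
        desired = max(1, math.ceil(load / policy.target_load))
    # Clamp to the policy's bounds.
    desired = min(max(desired, policy.min_replicas), policy.max_replicas)
    # Limit scale-up/scale-down velocity to max_step replicas per cycle.
    step = max(-policy.max_step, min(policy.max_step, desired - current))
    return current + step
```

For example, with the defaults above, an idle deployment at 3 replicas steps down gradually (3 → 1 → 0) rather than dropping to zero at once, because the velocity cap limits each cycle to 2 replicas.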
Throughput capacity
Q: What's the supported throughput?
Throughput capacity typically depends on several factors:
- Deployment type (serverless or on-demand)
- Traffic and request patterns
- Hardware configuration
- Model size and complexity
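To see why hardware configuration and model size dominate, a common back-of-envelope estimate treats token generation as memory-bandwidth bound: at batch size 1, each generated token must stream the full model weights from GPU memory once. The function below is a rough sketch under that assumption; the example numbers (a 70B-parameter model in 16-bit precision on a GPU with ~3350 GB/s of bandwidth) are illustrative, and real throughput also depends on batching, traffic patterns, and deployment type as listed above.

```python
def decode_tokens_per_sec(params_b: float, bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Memory-bandwidth-bound estimate of single-stream decode speed.

    Assumes each generated token reads all weights from GPU memory once
    (batch size 1, no compute/transfer overlap) -- an upper-bound sketch,
    not a measured figure.
    """
    model_gb = params_b * bytes_per_param     # model size in GB
    return bandwidth_gb_s / model_gb          # tokens per second

# Illustrative numbers: 70B params at 2 bytes each (~140 GB of weights)
# on a GPU with ~3350 GB/s of memory bandwidth.
print(decode_tokens_per_sec(70, 2, 3350))
```

Batching many requests together amortizes the weight reads, which is why aggregate throughput can be far higher than this single-stream estimate.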
Request handling
Q: What factors affect the number of simultaneous requests that can be handled?
Request handling capacity is influenced by multiple factors:
- Model size and type
- Number of GPUs allocated to the deployment
- GPU type (e.g., A100 vs. H100)
- Prompt size and generation token length
- Deployment type (serverless vs. on-demand)
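The factors above interact through GPU memory: the weights take a fixed share, and each in-flight request needs its own KV cache sized by prompt plus generation length. The sketch below uses the standard transformer KV-cache formula (K and V values per token per layer per KV head); all concrete numbers passed in are hypothetical, and real serving stacks add overheads (activations, fragmentation, scheduler headroom) that this ignores.

```python
def kv_cache_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size for one request (prompt + generated tokens).

    Per token, K and V each store n_layers * n_kv_heads * head_dim
    elements; the factor of 2 below counts both K and V.
    """
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return seq_len * per_token_bytes / 1e9

def max_concurrent_requests(gpu_mem_gb: float, n_gpus: int,
                            model_gb: float, kv_gb_per_req: float) -> int:
    """Rough ceiling on simultaneous requests: memory left after the
    weights, divided by per-request KV cache. Ignores real-world overheads."""
    free_gb = gpu_mem_gb * n_gpus - model_gb
    if kv_gb_per_req <= 0 or free_gb <= 0:
        return 0
    return int(free_gb // kv_gb_per_req)
```

This is why longer prompts and generations, larger models, and fewer or smaller GPUs all cut concurrency, matching the factor list above.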
Additional resources
- Discord Community: discord.gg/fireworks-ai
- Email Support: inquiries@fireworks.ai
- Documentation: Fireworks.ai docs