Hardware options
Understanding hardware choices for Fireworks.ai on-demand deployments.
Hardware selection
Q: Which accelerator/GPU should I use?
It depends on your specific needs. Fireworks offers two groups of accelerators: smaller (A100) and larger (H100 and MI300X). Smaller accelerators are less expensive (see the pricing page), so they’re more cost-effective for low-volume use cases. However, if you have enough volume to fully utilize a larger accelerator, we find that larger accelerators tend to be both faster and more cost-effective per token, as the sketch below illustrates.
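As a rough illustration of that break-even logic, here is a minimal Python sketch. The hourly rates and throughput figures are hypothetical placeholders, not Fireworks quotes; substitute real numbers from the pricing page and your own benchmarks.

```python
# Minimal break-even sketch. Hourly rates and throughputs below are
# placeholders -- substitute real numbers from the pricing page.
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Dollars per 1M tokens for a GPU kept fully busy."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# A cheaper, slower accelerator vs. a pricier, faster one (hypothetical numbers):
print(cost_per_million_tokens(3.00, 1000))  # smaller GPU: ~$0.83 / 1M tokens
print(cost_per_million_tokens(6.00, 3000))  # larger GPU:  ~$0.56 / 1M tokens
# The larger GPU wins per token -- but only if your volume keeps it saturated.
```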
Choosing between the larger accelerators depends on your use case:
- The MI300X has the highest memory capacity (192 GB) and sometimes lets large models be deployed on comparatively few GPUs. For example, unquantized Llama 3.1 70B fits on one MI300X, and FP8 Llama 3.1 405B fits on four MI300Xs (see the sizing sketch after this list). Higher memory may also enable better throughput for longer prompts and less sharded deployments. The MI300X is also priced more affordably than the H100.
- The H100 offers blazing-fast inference and often provides the highest throughput, especially for high-volume use cases.
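To see why memory capacity drives GPU count, here is a back-of-the-envelope sizing sketch. It counts only weight memory (parameters × bytes per parameter); KV cache, activations, and runtime overhead add more, so treat the result as a lower bound. The memory capacities are the published figures for these accelerators.

```python
import math

# Published memory capacities (GB) for the accelerators discussed above.
GPU_MEMORY_GB = {"A100": 80, "H100": 80, "MI300X": 192}

def min_gpus_by_weights(params_b: float, bytes_per_param: float, gpu: str) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    weights_gb = params_b * bytes_per_param  # e.g. 70B * 2 bytes (BF16) = 140 GB
    return math.ceil(weights_gb / GPU_MEMORY_GB[gpu])

# Llama 3.1 70B unquantized (BF16, 2 bytes/param): 140 GB of weights
print(min_gpus_by_weights(70, 2, "MI300X"))  # 1 -- fits on a single MI300X
print(min_gpus_by_weights(70, 2, "H100"))    # 2 -- must shard across H100s

# Llama 3.1 405B in FP8 (1 byte/param): 405 GB of weights
print(min_gpus_by_weights(405, 1, "MI300X"))  # 3 by weights alone; in practice
                                              # 4 leave headroom for KV cache
```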
Best practices for selection
- Analyze your workload (model size, quantization, and typical prompt lengths) to determine which GPU fits your processing needs.
- Consider your throughput needs and the scale of your deployment.
- Calculate the cost-performance ratio for each hardware option (see the sketch after this list).
- Factor in future scaling needs to ensure the selected GPU can support growth.
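Putting the checklist together, here is a hedged sketch that ranks accelerators by estimated cost per token once both the memory and throughput constraints are satisfied. Every price and per-GPU throughput figure below is an illustrative assumption, not a Fireworks quote or benchmark.

```python
import math

# (memory GB, $/hr, tokens/sec per GPU) -- every number here is an
# illustrative assumption; use real pricing and your own benchmarks.
OPTIONS = {
    "A100":   (80,  3.00, 1000),
    "H100":   (80,  6.00, 3000),
    "MI300X": (192, 5.00, 2500),
}

def rank_options(params_b: float, bytes_per_param: float, target_tps: float):
    """Rank accelerators by estimated $ per 1M tokens at a target throughput."""
    ranked = []
    for name, (mem_gb, rate, tps) in OPTIONS.items():
        n_for_memory = math.ceil(params_b * bytes_per_param / mem_gb)
        n_for_throughput = math.ceil(target_tps / tps)
        n = max(n_for_memory, n_for_throughput)  # must satisfy both constraints
        usd_per_mtok = (n * rate) / (target_tps * 3600) * 1_000_000
        ranked.append((usd_per_mtok, name, n))
    return sorted(ranked)

# Example: 70B BF16 model serving 5,000 tokens/sec of sustained traffic
for cost, name, n in rank_options(70, 2, 5000):
    print(f"{name}: {n} GPU(s), ~${cost:.2f} per 1M tokens")
```

Note how the ranking flips with volume: at low target throughput the cheaper accelerator often wins, while at high sustained volume the larger accelerators pull ahead per token.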
Additional resources
- Discord Community: discord.gg/fireworks-ai
- Email Support: inquiries@fireworks.ai
- Contact our sales team for custom pricing options