- MI300X has the highest memory capacity (192 GB of HBM3), which lets large models be deployed on comparatively few GPUs. For example, unquantized Llama 3.1 70B fits on a single MI300X, and FP8 Llama 3.1 405B fits on 4 MI300Xs. The larger memory can also improve throughput for long prompts and allow less-sharded deployments, and the MI300X is typically priced below the H100. (A rough sizing sketch follows this list.)
- H100 offers blazingly fast inference and often delivers the highest throughput, especially for high-volume use cases.
- H200 is recommended for very large models such as DeepSeek V3 and DeepSeek R1; the minimum configuration for either model is 8 H200s.
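
As a back-of-the-envelope check on these sizing claims, the sketch below estimates weight memory from parameter count and bytes per parameter, adds a flat allowance for KV cache and runtime overhead, and rounds the GPU count up to a power of two because tensor parallelism typically requires it. The 20% overhead figure and the rounding rule are simplifying assumptions, not measurements; validate against your serving framework's actual memory use before committing to a configuration.

```python
import math

# Published HBM capacities in GB; usable memory in practice is somewhat lower.
GPU_MEMORY_GB = {"MI300X": 192, "H200": 141, "H100": 80}

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone: 1B params at 1 byte/param is roughly 1 GB."""
    return params_billions * bytes_per_param

def min_gpus(params_billions: float, bytes_per_param: float, gpu: str,
             overhead: float = 0.2) -> int:
    """Rough GPU count: weights plus a flat allowance for KV cache and runtime,
    rounded up to the next power of two for tensor parallelism.
    The 20% overhead is an assumption, not a measured value."""
    needed_gb = weight_memory_gb(params_billions, bytes_per_param) * (1 + overhead)
    raw = math.ceil(needed_gb / GPU_MEMORY_GB[gpu])
    return 1 if raw <= 1 else 2 ** math.ceil(math.log2(raw))

# Llama 3.1 70B, BF16 (2 bytes/param): ~140 GB of weights -> 1x MI300X.
print(min_gpus(70, 2, "MI300X"))    # 1
# Llama 3.1 405B, FP8 (1 byte/param): ~405 GB of weights -> 4x MI300X.
print(min_gpus(405, 1, "MI300X"))   # 4
# DeepSeek V3 / R1, ~671B params, FP8 -> 8x H200.
print(min_gpus(671, 1, "H200"))     # 8
```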
Best Practices for Selection
- Analyze your workload (model size, quantization, prompt and output lengths) to determine which GPU meets your memory and compute needs.
- Consider your throughput needs and the scale of your deployment.
- Calculate the cost-performance ratio for each hardware option (see the sketch after this list).
- Factor in future scaling needs to ensure the selected GPU can support growth.
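
A simple way to compare cost-performance across options is dollars per million generated tokens. The sketch below assumes you know each option's hourly price and a sustained throughput number from your own benchmarks; the figures shown are placeholders, not real prices or measured results.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Placeholder numbers -- substitute your provider's pricing and your own benchmarks.
candidates = {
    "MI300X": {"hourly_rate_usd": 3.00, "tokens_per_second": 1500},
    "H100":   {"hourly_rate_usd": 4.00, "tokens_per_second": 2000},
}
for gpu, c in candidates.items():
    print(f"{gpu}: ${cost_per_million_tokens(**c):.2f} per 1M tokens")
```

Comparing on this metric rather than raw hourly price keeps the trade-off honest: a faster but more expensive GPU can still be the cheaper option per token served.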