Models & Inference
What quantization format is used for the Llama 3.1 405B model?
The Llama 3.1 405B model uses the FP8 quantization format, which closely matches Meta's own reference implementation.
- Further details are available in the model description at fireworks.ai/models/fireworks/llama-v3p1-405b-instruct
- Our general quantization methodology is documented in the Quantization blog
Note: BF16 precision will be available soon for on-demand deployments.
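To make the trade-off concrete, the sketch below models the core idea of per-tensor FP8 (E4M3) quantization: weights are scaled into the E4M3 range, clamped, and rounded to a 3-bit-mantissa grid. This is an illustrative sketch only, not Fireworks' actual implementation; the helper names and the per-tensor scaling scheme are assumptions.

```python
import math

# Illustrative FP8 (E4M3) quantization sketch -- not Fireworks' actual kernel.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def round_to_e4m3_grid(x: float) -> float:
    """Round x to the nearest value with a 3-bit mantissa, as in E4M3.
    Subnormals and exponent-range limits are ignored for brevity."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16    # keep the implicit bit plus 3 mantissa bits
    return math.ldexp(m, e)

def quantize_fp8(values: list[float]) -> tuple[list[float], float]:
    """Per-tensor quantization: scale into the E4M3 range, clamp,
    then round to the E4M3 mantissa grid. Returns (quantized, scale)."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    quantized = [
        round_to_e4m3_grid(max(-E4M3_MAX, min(E4M3_MAX, v / scale)))
        for v in values
    ]
    return quantized, scale

def dequantize(quantized: list[float], scale: float) -> list[float]:
    """Recover approximate original values by multiplying the scale back."""
    return [q * scale for q in quantized]
```

Because the scale is chosen so the largest weight maps exactly to the top of the E4M3 range, the maximum-magnitude weight round-trips exactly, while other weights incur at most a small relative rounding error from the 3-bit mantissa.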