Cost structure

Platform costs

Q: How much does Fireworks cost?

Fireworks AI operates on a pay-as-you-go model for all non-Enterprise usage, and new users automatically receive free credits. You pay based on:

Per token for serverless inference
Per GPU usage time for on-demand deployments
Per token of training data for fine-tuning

For customers needing enterprise-grade security and reliability, please reach out to us at inquiries@fireworks.ai to discuss options. Find out more about our current pricing on our Pricing page.
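As a rough illustration of how the three billing dimensions above add up, here is a minimal sketch in Python. The `Rates` class, the `estimate_bill` function, and every rate value are hypothetical placeholders, not actual Fireworks prices or an official API; check the Pricing page for current figures. The sketch also assumes per-token prices are quoted per million tokens and GPU time per hour.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical placeholder rates in USD; NOT actual Fireworks prices."""
    serverless_per_million_tokens: float
    gpu_per_hour: float
    finetune_per_million_tokens: float


def estimate_bill(rates, serverless_tokens=0, gpu_hours=0.0, training_tokens=0):
    """Sum the three pay-as-you-go components into an estimated bill."""
    # Serverless inference: billed per token.
    serverless = serverless_tokens / 1_000_000 * rates.serverless_per_million_tokens
    # On-demand deployments: billed per GPU usage time.
    on_demand = gpu_hours * rates.gpu_per_hour
    # Fine-tuning: billed per token of training data.
    fine_tuning = training_tokens / 1_000_000 * rates.finetune_per_million_tokens
    return {
        "serverless": serverless,
        "on_demand": on_demand,
        "fine_tuning": fine_tuning,
        "total": serverless + on_demand + fine_tuning,
    }


if __name__ == "__main__":
    # Placeholder numbers chosen only to make the arithmetic easy to follow.
    example_rates = Rates(
        serverless_per_million_tokens=1.0,
        gpu_per_hour=2.0,
        finetune_per_million_tokens=0.5,
    )
    print(estimate_bill(example_rates, serverless_tokens=2_000_000,
                        gpu_hours=3, training_tokens=4_000_000))
```

Free credits for new users are not modeled here; they would simply offset the total.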

Fine-tuning fees

Q: Are there extra fees for serving fine-tuned models?

Fine-tuned (LoRA) models require a dedicated deployment. Here’s what you need to know:

What you pay for:

Deployment costs on a per-GPU-second basis for hosting the model
Usage costs on a per-token basis when the model is used for inference
The fine-tuning process itself, if applicable
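The serving components listed above can be sketched as a small calculation. This is an illustrative sketch only: the function name `fine_tuned_serving_cost`, its parameters, and the rates used are hypothetical, and the actual billing granularity and prices should be taken from the Pricing page.

```python
def fine_tuned_serving_cost(num_gpus, seconds_running, rate_per_gpu_second,
                            tokens_served=0, rate_per_million_tokens=0.0):
    """Estimate the cost of serving a fine-tuned model on a dedicated deployment.

    All rates are hypothetical placeholders, not actual Fireworks prices.
    """
    # Deployment cost: billed per GPU-second while the deployment is hosted.
    deployment = num_gpus * seconds_running * rate_per_gpu_second
    # Usage cost: billed per token when the model is used for inference.
    usage = tokens_served / 1_000_000 * rate_per_million_tokens
    return {"deployment": deployment, "usage": usage, "total": deployment + usage}


if __name__ == "__main__":
    # Two GPUs running for one hour, serving 4 million tokens,
    # with made-up rates chosen for easy arithmetic.
    print(fine_tuned_serving_cost(num_gpus=2, seconds_running=3600,
                                  rate_per_gpu_second=0.25,
                                  tokens_served=4_000_000,
                                  rate_per_million_tokens=0.5))
```

The cost of the fine-tuning job itself, where applicable, is separate and is charged per token of training data.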

Deployment options:

Live-merge deployment: Deploy your LoRA model with weights merged into the base model for optimal performance
Multi-LoRA deployment: Deploy up to 100 LoRA models as addons on a single base model deployment

For more details, see the Deploying Fine Tuned Models guide.

Additional resources

Discord Community: discord.gg/fireworks-ai
Email Support: inquiries@fireworks.ai
Documentation: Fireworks.ai docs
