
Platform costs

Q: How much does Fireworks cost?
Fireworks AI uses a usage-based, prepaid billing model for new self-serve accounts. You purchase credits up front, and usage is deducted from them based on:
  • Per token for serverless inference
  • Per GPU usage time for on-demand deployments
  • Per token of training data for fine-tuning
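To make the three billing units above concrete, here is a minimal sketch of how usage might be deducted from a prepaid credit balance. All rates, function names, and the `PrepaidAccount` class are hypothetical illustrations, not Fireworks prices or a Fireworks API. In particular, rejecting a charge that exceeds the balance is an assumption of this sketch. Real prices are on the Pricing page.

```python
from dataclasses import dataclass

# Hypothetical rates, for illustration only; real prices are on the Pricing page.
SERVERLESS_USD_PER_1M_TOKENS = 0.20
GPU_USD_PER_HOUR = 3.00
FINE_TUNE_USD_PER_1M_TRAINING_TOKENS = 0.50


def serverless_cost(tokens: int) -> float:
    """Serverless inference: billed per token."""
    return tokens / 1_000_000 * SERVERLESS_USD_PER_1M_TOKENS


def on_demand_cost(gpu_count: int, seconds: float) -> float:
    """On-demand deployment: billed per GPU for the time it runs."""
    return gpu_count * seconds / 3600 * GPU_USD_PER_HOUR


def fine_tune_cost(training_tokens: int) -> float:
    """Fine-tuning: billed per token of training data."""
    return training_tokens / 1_000_000 * FINE_TUNE_USD_PER_1M_TRAINING_TOKENS


@dataclass
class PrepaidAccount:
    """Hypothetical prepaid balance that usage charges are deducted from."""
    credit_usd: float

    def charge(self, amount_usd: float) -> float:
        # Assumption: this sketch simply rejects a charge larger than the balance.
        if amount_usd > self.credit_usd:
            raise ValueError("insufficient credits")
        self.credit_usd -= amount_usd
        return self.credit_usd


if __name__ == "__main__":
    account = PrepaidAccount(credit_usd=100.0)
    account.charge(serverless_cost(2_000_000))  # 2M serverless tokens
    account.charge(on_demand_cost(2, 1800))     # 2 GPUs for 30 minutes
    account.charge(fine_tune_cost(10_000_000))  # 10M training tokens
    print(f"remaining credits: ${account.credit_usd:.2f}")
```

Each product draws from the same balance, so the order of charges does not change the remaining credit. Only the total usage matters.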
Billing model by account type:
  • Accounts created before June 1 keep their existing postpaid terms (grandfathered).
  • Enterprise accounts can be configured for postpaid billing on request.
For customers needing enterprise-grade security and reliability, please reach out to us at inquiries@fireworks.ai to discuss options. Find out more about our current pricing on our Pricing page.

Fine-tuning fees

Q: Are there extra fees for serving fine-tuned models?
Fine-tuned (LoRA) models require a dedicated deployment. What you pay for:
  • Deployment costs on a per-GPU-second basis for hosting the model
  • Usage costs on a per-token basis when the model is used for inference
  • The fine-tuning process itself, if applicable
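The three cost components listed above add up as follows. This is a minimal sketch: the function name and every default rate are hypothetical placeholders, not published Fireworks prices. Training tokens default to zero to reflect that the fine-tuning charge applies only "if applicable".

```python
def fine_tuned_model_cost(
    gpu_count: int,
    deployed_seconds: float,
    inference_tokens: int,
    training_tokens: int = 0,                   # 0 if you did not fine-tune on Fireworks
    usd_per_gpu_second: float = 0.001,          # hypothetical rate
    usd_per_1m_inference_tokens: float = 0.20,  # hypothetical rate
    usd_per_1m_training_tokens: float = 0.50,   # hypothetical rate
) -> dict[str, float]:
    """Break the cost of a fine-tuned model into its three components."""
    # Hosting: every GPU in the dedicated deployment is billed per second.
    deployment = gpu_count * deployed_seconds * usd_per_gpu_second
    # Inference: billed per token processed by the deployed model.
    usage = inference_tokens / 1_000_000 * usd_per_1m_inference_tokens
    # Fine-tuning: billed per token of training data.
    fine_tuning = training_tokens / 1_000_000 * usd_per_1m_training_tokens
    return {
        "deployment": deployment,
        "usage": usage,
        "fine_tuning": fine_tuning,
        "total": deployment + usage + fine_tuning,
    }


if __name__ == "__main__":
    # One GPU for an hour, 1M inference tokens, 2M training tokens.
    print(fine_tuned_model_cost(1, 3600, 1_000_000, 2_000_000))
```

Because the deployment is billed for as long as it runs, the hosting term grows with uptime even when `inference_tokens` is zero.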
Deployment options:
  • Live-merge deployment: Deploy your LoRA model with weights merged into the base model for optimal performance
  • Multi-LoRA deployment: Deploy up to 100 LoRA models as add-ons on a single base model deployment
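With multi-LoRA deployment, up to 100 adapters share one base deployment, so its GPU cost can be spread across them. The sketch below is internal cost-accounting arithmetic of our own, not a Fireworks API. It assumes you split the deployment cost evenly between adapters. The function name and the even-split policy are hypothetical. Only the limit of 100 add-ons comes from the text above.

```python
MAX_LORA_ADDONS = 100  # limit per base model deployment, as stated above


def per_adapter_hosting_cost(
    deployment_cost_usd: float, adapter_names: list[str]
) -> dict[str, float]:
    """Evenly attribute one base deployment's cost to the LoRA add-ons it serves."""
    if not adapter_names:
        raise ValueError("at least one adapter is required")
    if len(set(adapter_names)) != len(adapter_names):
        raise ValueError("adapter names must be unique")
    if len(adapter_names) > MAX_LORA_ADDONS:
        raise ValueError(f"at most {MAX_LORA_ADDONS} LoRA add-ons per base deployment")
    share = deployment_cost_usd / len(adapter_names)
    return {name: share for name in adapter_names}


if __name__ == "__main__":
    print(per_adapter_hosting_cost(30.0, ["support-bot", "summarizer", "classifier"]))
```

Compared with giving each adapter its own deployment, sharing a base deployment means the per-adapter hosting share shrinks as more adapters are added, up to the 100-adapter limit.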
For more details, see the Deploying Fine Tuned Models guide.

Additional resources