What we offer

Fireworks offers a variety of generative AI services. All services are pay-as-you-go, with postpaid, developer-friendly pricing.

  • Serverless Models - Run different generative AI models on Fireworks-hosted infrastructure with our optimized FireAttention software stack. This is the easiest way to get started. We’ve set up the hardware, so you only pay per token/image and don’t wait for boot-ups. We offer:
  • On-demand Deployments - Run text models on your own private GPU and pay per second of GPU usage. This is a great option if you (a) have high volume, (b) need guaranteed latency, or (c) need models that aren’t offered serverless (see blog overview).
  • Fine-tuning - Fine-tune text models for use with either serverless or on-demand inference. Fireworks charges only for the tokens used during tuning, not for deploying the result: up to 100 fine-tuned models can be kept simultaneously ready for serverless or on-demand inference at no extra cost.
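Whether "high volume" justifies an on-demand deployment comes down to simple arithmetic: per-token billing grows with traffic, while per-second GPU billing grows only with time. The sketch below compares the two under stated assumptions. The prices are hypothetical placeholders, not Fireworks' actual rates, and the helper names are illustrative only, not part of any Fireworks API. It also assumes the deployment can actually serve the given token volume in the billed GPU time.

```python
# Hypothetical cost model: serverless billed per token vs. on-demand billed
# per second of GPU usage. Prices passed in are placeholders, not real rates.

def serverless_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost of serving `tokens` on serverless, billed per token."""
    return tokens / 1_000_000 * price_per_million_tokens


def on_demand_cost(gpu_seconds: float, price_per_gpu_hour: float) -> float:
    """Cost of an on-demand deployment, billed per second of GPU usage."""
    return gpu_seconds * price_per_gpu_hour / 3600


def break_even_tokens(gpu_seconds: float, price_per_gpu_hour: float,
                      price_per_million_tokens: float) -> float:
    """Token volume at which serverless costs as much as the GPU time.

    Above this volume (assuming the GPU can keep up), on-demand is cheaper.
    """
    gpu_cost = on_demand_cost(gpu_seconds, price_per_gpu_hour)
    return gpu_cost / price_per_million_tokens * 1_000_000


def cheaper_option(tokens: int, gpu_seconds: float, price_per_gpu_hour: float,
                   price_per_million_tokens: float) -> str:
    """Return which billing model costs less for this workload."""
    serverless = serverless_cost(tokens, price_per_million_tokens)
    on_demand = on_demand_cost(gpu_seconds, price_per_gpu_hour)
    return "serverless" if serverless <= on_demand else "on-demand"


if __name__ == "__main__":
    # One hour of GPU time at a hypothetical $4.00/hour vs. a hypothetical
    # serverless price of $0.20 per million tokens.
    tokens = break_even_tokens(3600, 4.00, 0.20)
    print(f"Break-even: {tokens:,.0f} tokens per GPU-hour")  # 20,000,000
    print(cheaper_option(5_000_000, 3600, 4.00, 0.20))       # serverless
    print(cheaper_option(50_000_000, 3600, 4.00, 0.20))      # on-demand
```

In this toy example, below roughly 20 million tokens per GPU-hour the per-token model wins; above it, paying for the GPU by the second is cheaper, which is the "high volume" case described above.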