Start building with open source AI models
Fireworks AI is the best platform for building AI product experiences with open source AI models. You can run and customize AI models with just a few lines of code!
Make your first API call with Fireworks Serverless Inference
View 100s of supported models across text, vision, audio, image and more
Get the best speed, reliability, & scalability
Customize a model for your specific use case
Query vision language models
Convert speech to text asynchronously or in real time
Get responses in your specified JSON schema
Customize and deploy a model on Fireworks
Get support and discuss with other developers
Code examples, tutorials and guides
Technical analysis, features and customer stories
Check status of Fireworks AI services
Security and compliance resources
Contact Sales or reach out to our team
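To give a concrete picture of what a first API call looks like, here is a minimal sketch against Fireworks' OpenAI-compatible chat completions endpoint. The model identifier is illustrative (pick any serverless model from the model library), and the request is only sent when a `FIREWORKS_API_KEY` environment variable is set:

```python
import json
import os
import urllib.request

# Fireworks exposes an OpenAI-compatible chat completions endpoint.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
# Illustrative model ID; substitute any serverless model from the library.
MODEL = "accounts/fireworks/models/llama-v3p1-8b-instruct"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble the HTTP request for a single chat completion."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "Say hello in one sentence.",
    os.environ.get("FIREWORKS_API_KEY", "YOUR_KEY"),
)

# Only send the request when a real key is configured.
if os.environ.get("FIREWORKS_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same payload works with any OpenAI-compatible client library by pointing its base URL at the Fireworks endpoint.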
The Fireworks platform empowers developers to create generative AI systems with the best quality, cost, and speed. All publicly available services are pay-as-you-go with developer-friendly pricing. See the list below for offerings and docs links, and scroll further for more detailed descriptions and blog links.
Fireworks offers three options for running generative AI models, each combining industry-leading speed with cost efficiency.
Property | Serverless | On-demand | Enterprise reserved
---|---|---|---
Performance | Industry-leading speed on a Fireworks-curated setup. Performance may vary with other users' traffic. | Speed depends on your GPU configuration and private usage. Per-GPU latency should be significantly lower than with vLLM. | Tailor-made setup by Fireworks AI experts for the best possible latency
Getting started | Self-serve: use serverless immediately with 1 line of code | Self-serve: configure GPUs, then use them with 1 line of code | Chat with Fireworks
Scaling and management | Scale up and down freely within rate limits | Optional auto-scaling of GPUs with traffic. GPUs scale to zero automatically, so there is no charge for unused GPUs or for boot-up time. | Chat with Fireworks
Pricing | Fixed price per token | Pay per GPU-second with no commitments. Per-GPU throughput should be significantly higher than with options like vLLM. | Customized pricing based on reserved GPU capacity
Commitment | None | None | Arrange plan length with Fireworks
Rate limits | Yes, see quotas | No rate limits; quotas on the number of GPUs | None
Model selection | Collection of popular models, curated by Fireworks | Use 100s of pre-uploaded models or upload your own custom model within supported architectures | Use 100s of pre-uploaded models or upload any model
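For the JSON-schema responses mentioned above, here is a sketch of what a structured-output request payload can look like. The schema itself is a hypothetical example, and the exact `response_format` shape is an assumption based on Fireworks' OpenAI-style API:

```python
import json

# Hypothetical schema for illustration: extract a name and age from text.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    # Illustrative model ID; any serverless chat model works here.
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [
        {"role": "user", "content": "Alice is 29 years old. Return JSON."}
    ],
    # Constrains the model's reply to conform to person_schema.
    "response_format": {"type": "json_object", "schema": person_schema},
}

print(json.dumps(payload, indent=2))
```

Sending this payload to the chat completions endpoint returns a message whose content parses as JSON matching the schema, so downstream code can consume it without brittle string parsing.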
FireOptimizer: Through FireOptimizer, Fireworks optimizes inference for your workload and use case and performs fine-tuning. FireOptimizer includes several optimization techniques; publicly available features are:
Fireworks makes it easy to use multiple models and modalities together in one compound AI system. Features include: