Introduction
Welcome to the Fireworks onboarding guide! This guide is designed to help you quickly and effectively get started with the Fireworks platform, whether you’re a developer, researcher, or AI enthusiast. By following this step-by-step resource, you’ll learn how to explore and experiment with state-of-the-art AI models, prototype your ideas using Fireworks’ serverless infrastructure, and scale your projects with advanced on-demand deployments.
Who this guide is for
This guide is designed for new Fireworks users who are exploring the platform for the first time. It provides a hands-on introduction to the core features of Fireworks, including the model library, playgrounds, and on-demand deployments, all accessible through the web app. For experienced users, this guide serves as a starting point, with future resources planned to dive deeper into advanced tools like firectl and other intermediate features to enhance your workflow.
Objectives of the guide
- Explore the Fireworks model library: Navigate and select generative AI models for text, image, and audio tasks.
- Experiment with the playground: Test prompts, tweak parameters, and generate outputs in real time.
- Prototype effortlessly: Use Fireworks’ serverless infrastructure to deploy and iterate without managing servers.
- Scale your AI: Learn how on-demand deployments offer predictable performance and advanced customization.
- Develop complex systems: Unlock advanced capabilities like Compound AI, function calling, and retrieval-augmented generation to create production-ready applications.
Step 1. Explore our model library
Fireworks provides a range of leading open-source models for tasks like text generation, code generation, and image understanding. With the Fireworks model library, you can choose from our wide range of popular LLMs, VLMs, image models, and audio models, such as:
- LLMs: Llama 3.3 70B, DeepSeek V3, and Qwen2.5 Coder 32B Instruct.
- VLMs: Llama 3.2 90B Vision Instruct.
- Image models: BFL’s FLUX.1 [dev] FP8 and Stability.ai’s Stable Diffusion 3.5 Large Turbo.
- Audio models: Whisper V3 and the (blazing fast) Whisper V3 Turbo.
🎥 Part 1: Introducing the Model Library
In this video, we introduce the Fireworks Model Library, your gateway to a diverse range of open-source and proprietary models designed for tasks like text generation, image understanding, and audio processing. Whether you’re a developer or a creative, Fireworks makes it easy to find and integrate the right tools for your generative AI needs.
What you’ll learn:
1️⃣ Navigating the model library: Browse popular models, filter by deployment type, and search for specific tools like Llama, Whisper, and Flux.
2️⃣ Customizing your experience: Use filters like “Serverless Models” to find models that fit your specific needs.
3️⃣ Seamless integration: Discover how Fireworks simplifies the process of discovering and managing AI models.
Developers building generative AI applications can interact with Fireworks in multiple ways:
- 🌐 Via the web app: Access the Fireworks platform directly in your browser for easy model management.
- 🐍 Through our Python SDK: Programmatically integrate and manage models within your codebase (see the sketch after this list).
- 🔗 With external providers: Pass your Fireworks API key to third-party tools for seamless workflow integration.
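For the SDK route, here is a minimal sketch of a chat completion call using the fireworks-ai Python package. The model ID below is an example that follows the library’s naming pattern; copy the exact ID from the model’s page before running it.

```python
# pip install fireworks-ai
import os

from fireworks.client import Fireworks

# Reads the API key you create in the web app (Step 3 below covers this).
client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])

response = client.chat.completions.create(
    # Example ID for Llama 3.3 70B Instruct; confirm the exact string in the model library.
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Give me a one-sentence overview of Fireworks AI."}],
)
print(response.choices[0].message.content)
```

The same call also works through any OpenAI-compatible client pointed at the Fireworks base URL (https://api.fireworks.ai/inference/v1), which is how most third-party tools integrate.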
Action items
- 👀 Browse the model library: Explore our open and closed-source models.
- 📚 Read real-world use cases: See how customers are building production systems with Fireworks.
- 👋 Join our Discord community: Connect and share your projects.
Step 2. Experiment using the model playground
The easiest way to get started with Fireworks and test models with minimal setup is through the Model Playground. Here, you can experiment with prompts, adjust parameters, and get immediate feedback on results before moving to more advanced steps. Take a closer look at how the LLM Playground lets you experiment with text-based models.
🎥 Part 2A: Introducing the LLM playground
In this video, we explore the Fireworks Model Playground, the easiest way to experiment with LLMs, adjust parameters, and get instant feedback. Whether you’re crafting creative prompts, refining outputs, or testing model performance, the Playground is your go-to tool for seamless experimentation.
✨ What you’ll learn:
- 🔍 Getting started: Access the Playground from the Model Library by selecting models like Llama 3.3 70B Instruct.
- 📋 Model details: Discover key information, including starter code in Python, TypeScript, Java, Go, and Shell for Chat and Completion modes.
- 🎭 Running prompts: Test creative prompts like “Write a synopsis of the modern 2020 version of the Cats musical” and see instant results.
- 🎛️ Parameter controls: Adjust settings like temperature and max tokens to refine outputs to your liking.
- ⚡ Completion mode: Explore latency and tokens-per-second metrics with prompts like “Write a synopsis of the modern 2020 Tarzan movie with Brendan Fraser.”
- 💻 Code integration: Generate ready-to-use code snippets directly from the Playground for effortless integration into your projects.
Discover how the Image Playground transforms visual AI experimentation into an intuitive process.
🎥 Part 2B: Introducing the Image Playground
In this video, we dive into the Fireworks Image Playground, where you can create stunning visuals, refine parameters, and explore the possibilities of AI-driven image generation. Perfect for developers, designers, and creators, the Image Playground is your gateway to experimenting with prompts and parameters for artistic and practical outputs.
✨ What you’ll learn:
- ☑️ Getting started: Navigate the Model Library to find image models like FLUX.1 schnell FP8 and open them in the Model Playground.
- ☑️ Crafting prompts: Use creative prompts like “Movie poster for a film set in a world where gravity doesn’t exist” and watch the model bring your vision to life.
- ☑️ Adjusting parameters: Experiment with settings like Guidance Scale, Inference Steps, and Seed to refine and perfect your results.
- ☑️ Exploring variants: Test different models, such as FLUX.1 dev FP8, for varied image quality and creative flexibility.
- ☑️ Integrating code: Generate and view sample code in Python, TypeScript, or Shell, complete with request parameters and response codes for seamless integration.
Experience how the Audio Playground empowers advanced audio transcription and translation tasks.
🎥 Part 2C: Introducing the Audio Playground
Welcome to Part 2C of our onboarding series! In this video, we explore the Fireworks Audio Playground, showcasing the incredible speed and accuracy of the Whisper Turbo models. Whether you’re transcribing, translating, or analyzing audio, Fireworks makes it easy to experiment and unlock the potential of advanced audio models.
✨ What you’ll learn:
- 🎵 Real-world test case: Using the song Do You Hear the People Sing? from Les Misérables, featuring nine distinct languages and various English accents, to demonstrate transcription and translation capabilities.
- 🔍 Navigating the model library: Find Whisper v3 Turbo and access its playground.
- 📂 Uploading audio: Test the model with screen-recorded audio to ensure unbiased results without metadata influence.
- ⚡ Fast and accurate transcription: Observe Whisper Turbo’s ability to transcribe multilingual content at lightning speed and compare its output to the original lyrics.
🔑 Key features of the Audio Playground:
- 🌍 Multilingual capabilities: Whisper Turbo excels in recognizing and transcribing multiple languages and dialects.
- ⚡ Incredible speed: Experience near-instant transcriptions for even complex audio files.
- 🎛️ Interactive testing: Upload audio, tweak parameters, and explore transcription and translation features in real time (a hedged API sketch follows this list).
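Outside the Playground, the same transcription flow can be scripted. The sketch below assumes an OpenAI-style audio transcription endpoint and the model name whisper-v3-turbo; treat both as assumptions and copy the exact URL and model ID from the Playground’s generated sample code or the API reference.

```python
# Hedged sketch: the endpoint path and model name are assumptions based on the
# OpenAI-compatible pattern; copy the exact values from the Playground's sample code.
import os
import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]
URL = "https://api.fireworks.ai/inference/v1/audio/transcriptions"  # assumed endpoint

with open("les_mis_clip.mp3", "rb") as audio_file:  # any local audio file
    response = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": audio_file},
        data={"model": "whisper-v3-turbo"},  # assumed model ID
    )

response.raise_for_status()
print(response.json()["text"])  # transcribed text, per the OpenAI-style response shape
```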
Each model in the Playground includes the following features, designed to enhance your experimentation and streamline your workflow:
- 🎛️ Parameter controls: Adjust settings like temperature and max tokens for LLMs or image-specific parameters (e.g., Guidance Scale) for image generation models. These controls allow you to fine-tune the behavior and outputs of the models, helping you achieve the desired results for different use cases.
- 🧩 Code samples: Copy-paste ready-to-use code in Python, TypeScript, or Shell to integrate models directly into your applications. This eliminates the guesswork of API implementation and speeds up development, so you can focus on building impactful solutions.
- 🎨 Additional UI elements: Leverage interactive features like file upload buttons for image or audio inputs, making it easy to test multimodal capabilities without any additional setup. This ensures a smooth, hands-on testing experience, even for complex workflows.
- 🔍 Model ID: Clearly displayed in the format `accounts/fireworks/models/<model_name>`, allowing you to switch between models effortlessly with a single line of code, making experimentation and integration faster and more efficient (see the sketch after this list).
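Because every model shares the same ID format, swapping models really is a one-line change. The sketch below varies only the model ID and a couple of sampling parameters; the IDs shown are examples following the library’s naming pattern, so confirm the exact strings on each model’s page.

```python
import os

from fireworks.client import Fireworks

client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])

# Example IDs following the accounts/fireworks/models/<model_name> pattern;
# copy the exact strings from each model's page in the library.
model_ids = [
    "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "accounts/fireworks/models/qwen2p5-coder-32b-instruct",
]

for model_id in model_ids:
    response = client.chat.completions.create(
        model=model_id,   # the only line that changes between models
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        temperature=0.7,  # same controls as the Playground's parameter sliders
        max_tokens=128,
    )
    print(model_id, "->", response.choices[0].message.content)
```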
Action items
- 💻 🖱️ Sign into your account and explore various models, including:
- LLMs and VLMs: Llama 3.3 70B, Llama 3.2 90B Vision Instruct
- Image models: FLUX.1 [dev] FP8
- Audio models: Whisper V3 Turbo
- ❓ Have questions, comments, or feedback? Head over to Discord and post in:
- #feature-requests
- #questions
- #bug-reports
- 📚 Check out sampling options: Review the sampling options for text models to see the parameters we currently support.
Step 3. Prototyping with serverless
Fireworks’ serverless infrastructure lets you quickly prototype AI models without managing servers or committing to long-term contracts. This setup supports fast experimentation and seamless scaling for your projects.
Why use Fireworks serverless?
- 🚀 Launch instantly: Deploy apps with no setup or configuration required.
- 🎯 Focus on prompt engineering: Design and refine your prompts without worrying about infrastructure.
- ⚙️ Adjust parameters easily: Modify settings like temperature and max tokens to customize model outputs.
- 💰 Pay-as-you-go: Only pay for what you use, with pricing based on parameter size buckets, making it cost-effective for projects of any size.
🎥 Part 3A: Generating Your API Key
In this video, we’ll guide you through generating your Fireworks API key, the first step to leveraging Fireworks’ serverless infrastructure. Prototype AI models with ease, scale seamlessly, and focus on building without worrying about managing servers.
✨ Why use Fireworks serverless?
- 🚀 Launch instantly: Deploy apps with no setup or configuration required.
- 🎯 Focus on prompt engineering: Refine your prompts without infrastructure headaches.
- ⚙️ Adjust parameters easily: Tweak settings like temperature and max tokens to customize outputs.
- 💰 Pay-as-you-go: Cost-effective pricing based on usage, perfect for projects of any size.
🛠️ How to get your API key:
1️⃣ Navigate to User Settings: Log in to your Fireworks account and click the profile icon.
2️⃣ Generate your key: Select ‘API Keys’ and click ‘Create API Key’ to generate your unique key.
3️⃣ Copy and secure: Save your API key securely; it’s essential for authentication (the sketch below shows one way to load it from an environment variable).
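A common way to keep the key out of your code is to store it in an environment variable and read it at runtime. A minimal sketch, using FIREWORKS_API_KEY as the variable name (a common convention rather than a requirement; check the docs of whichever tool you use):

```python
# In your shell (not Python), set the variable once per session, e.g.:
#   export FIREWORKS_API_KEY="your-key-here"   # value copied from the web app
import os

api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key is None:
    raise RuntimeError("Set FIREWORKS_API_KEY before calling the Fireworks API.")
```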
Using your API key
Your API key is essential for securely accessing and managing your serverless deployments. Here’s how to use it:
- Via the API: Include your API key in the headers of your RESTful API requests to integrate Fireworks’ models into your applications (a sketch follows this list).
- Using our SDK: Configure the Fireworks Python library with your API key to manage and deploy models programmatically.
- Through third-party tools: Pass your API key to third-party clients (like LangChain) to incorporate Fireworks into your existing workflows, enabling you to use serverless models seamlessly.
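For the direct REST route, here is a minimal sketch using Python’s requests library, hitting the chat completions endpoint and passing the key as a Bearer token. The model ID is an example; copy the exact string from the model library.

```python
import os

import requests

response = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        # Example model ID; confirm the exact string in the model library.
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
        "messages": [{"role": "user", "content": "Name three uses for a serverless LLM."}],
        "max_tokens": 256,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Third-party clients such as LangChain typically need the same key (and sometimes the base URL https://api.fireworks.ai/inference/v1) in their Fireworks or OpenAI-compatible integration settings.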
🎥 Part 3B: Calling an LLM
In this video, we’ll show you how to use your Fireworks API key to call serverless LLMs and effortlessly prototype with Fireworks’ serverless infrastructure. Whether you’re creating structured datasets or testing model outputs, Fireworks makes scaling your ideas simple—no servers required!
✨ What you’ll learn:
- 📖 Accessing the Cookbook: Explore Fireworks’ GitHub repo and open example notebooks like “Llama 3.1 Synthetic Data Generation” in Colab.
- 🔑 Using your API key: Learn how to securely generate and use your Fireworks API key for authentication.
- 🤖 Interacting with models: Call Llama 3.1 models to generate structured synthetic data and customize outputs.
- 🎯 Prompt engineering in action: See how to craft prompts to generate JSON-structured quiz questions with context, responses, and metadata.
🌟 Featured example:
Watch as we:
- 📍 Generate geography quiz questions: Using Llama 3.1 405B for structured outputs.
- 💾 Save data: Store structured data in JSONL format for project use.
- ⚡ Showcase flexibility: Highlight how Fireworks supports dataset creation, testing, and more (a hedged prompting sketch follows this list).
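Here is a hedged sketch of the kind of prompt-driven structured generation shown in the notebook: it asks the model for a quiz question as strict JSON and parses the reply. The model ID is an example for Llama 3.1 405B Instruct; copy the exact ID from the model library.

```python
import json
import os

from fireworks.client import Fireworks

client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])

prompt = (
    "Write one geography quiz question as JSON with the keys "
    '"question", "choices" (a list of four strings), and "answer". '
    "Return only the JSON object, with no extra text."
)

response = client.chat.completions.create(
    # Example ID for Llama 3.1 405B Instruct; confirm it in the model library.
    model="accounts/fireworks/models/llama-v3p1-405b-instruct",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,  # low temperature keeps the JSON structure stable
)

question = json.loads(response.choices[0].message.content)
print(question["question"], question["choices"], question["answer"])
```

To build a dataset like the one in the video, append each parsed object to a .jsonl file, one JSON object per line. For stricter guarantees, Fireworks also offers structured response modes on supported models; see the docs on structured outputs.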
Action items
- 🔑 Get your API key: Navigate to your account settings and generate your API key to authenticate your requests.
- 📓 Call a serverless model: See how you can call a serverless model using a sample notebook.
- 🔖 Read the API usage guide: Understand the different endpoints and parameters available for use in your projects.
- 📚 Read the serverless deployment guides: Access our docs on serverless usage, pricing, and rate limits.
- 💻 Try out additional sample notebooks: Use your Fireworks API key to explore more sample notebooks in our cookbook.
Step 4. Scale out with on-demand deployments
Fireworks’ on-demand deployments provide you with dedicated GPU instances, ensuring predictable performance and advanced customization options for your AI workloads. These deployments allow you to scale efficiently, optimize costs, and access exclusive models that aren’t available on serverless infrastructure. Once a deployment is live, you call it through the same API you already used for serverless models (a hedged sketch follows the feature lists below).
Why choose on-demand deployments?
- 🏎️ Predictable performance: Enjoy consistent performance unaffected by other users’ workloads.
- 📈 Flexible scaling: Adjust replicas or GPU resources to handle varying workloads efficiently.
- ⚙️ Customization: Choose GPU types, enable features like long-context support, and apply quantization to optimize costs.
- 🔓 Expanded access: Deploy larger models or custom models from Hugging Face files.
- 💰 Cost optimization: Save more with reserved capacity when you have high utilization needs.
Key features of on-demand deployments
- 🔄 Replica scaling: Automatically adjust replicas to handle workload changes.
- 🖥️ Hardware options: Choose GPUs like NVIDIA H100, NVIDIA A100, or AMD MI300X to match your performance and budget needs. Check the Regions Guide for availability.
- ⚡ Quantization: Use FP8 or other precision settings to improve speed and reduce costs while keeping accuracy high. See the Quantization Guide.
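Calling an on-demand deployment looks just like calling a serverless model; only the model identifier changes. The identifier below is a hypothetical placeholder: copy the real value from your deployment’s page in the web app.

```python
import os

from fireworks.client import Fireworks

client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])

# Hypothetical identifier for illustration only; use the exact model/deployment
# string shown on your deployment's page in the web app.
MY_DEPLOYED_MODEL = "accounts/your-account/models/your-deployed-model"

response = client.chat.completions.create(
    model=MY_DEPLOYED_MODEL,
    messages=[{"role": "user", "content": "Run a quick smoke test on the new deployment."}],
)
print(response.choices[0].message.content)
```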
Action items
- 🔖 Understand the benefits of on-demand versus serverless: Learn about the full range of deployment options and how to customize them to your needs.
- 📚 Explore optimization techniques: Learn how caching, quantization, and speculative decoding can improve performance and reduce costs.
- ❓ Check out our FAQs: Find answers to common questions about account management, support services, and on-demand deployment infrastructure.
Step 5. Building Compound AI systems
Expand your AI capabilities by incorporating advanced features like Compound AI, function calling, or retrieval-augmented generation (RAG). These tools enable you to build sophisticated applications that integrate seamlessly with external systems. For greater control, consider on-prem or BYOC deployments.
With Fireworks, you can:
- 🛠️ Leverage advanced features: Build Compound AI systems with function calling, RAG, and agents (Advanced Features); a hedged function-calling sketch follows this list.
- 🔗 Integrate external tools: Connect models with APIs, databases, or other services to enhance functionality.
- 🔍 Optimize workflows: Use Fireworks’ advanced tools to streamline AI development, enhance system efficiency, and scale complex applications with ease.
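As a taste of function calling, here is a hedged sketch using the OpenAI-style tools parameter on a function-calling-capable model. The model ID and the weather tool are illustrative assumptions; see the function calling docs for supported models and the full loop of executing the tool and returning its result to the model.

```python
import json
import os

from fireworks.client import Fireworks

client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])

# A hypothetical tool definition for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    # Example ID for a function-calling-capable model; confirm in the model library.
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# If the model decided to call the tool, inspect the structured call it produced.
tool_calls = response.choices[0].message.tool_calls or []
for call in tool_calls:
    print(call.function.name, json.loads(call.function.arguments))
```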
Action items
- 📚 Learn about Compound AI and Advanced Features: Explore richer functionality to create more sophisticated applications.
- Fireworks Compound AI System: With f1, experience how specialized models work together to deliver groundbreaking performance, efficiency, and advanced reasoning capabilities.
- Multimodal enterprise: See how Fireworks integrates text, image, and audio models to power enterprise-grade multimodal AI solutions.
- Multi-LoRA fine-tuning: Learn how Multi-LoRA fine-tuning enables precise model customization across diverse datasets.
- Audio transcription launch: Explore Fireworks’ state-of-the-art audio transcription models for fast and accurate speech-to-text applications.
- 📞 Contact us for enterprise solutions: Have complex requirements or need reserved capacity? Reach out to our team to discuss tailored solutions for your organization.