2025-12-22
Playground Categories, New User Roles, Fine-Tuning Improvements, and New Models
Playground Categories
The Playground now features category tabs (LLM, Image, TTS, STT) in the header for easier switching between model types. The playground automatically detects the appropriate category based on the selected model and provides smart defaults for each category.User Roles: Contributor and Inference
New user roles provide more granular access control for team collaboration:- Contributor: Read and write access to resources without administrative privileges
- Inference: Read-only access with the ability to run inference on deployments
Fine-Tuning Improvements
Fine-tuning workflows have been enhanced with several new capabilities:- Stop and Resume Jobs: Stop running fine-tuning jobs and resume them later from where they left off. Available for Supervised Fine-Tuning and Reinforcement Fine-Tuning jobs.
- Clone Jobs: Quickly create new fine-tuning jobs based on existing job configurations using the Clone action.
- Download Output Datasets: Download output datasets from Reinforcement Fine-Tuning jobs, including individual files or bulk download as a ZIP archive.
- Download Rollout Logs: Download rollout logs from Reinforcement Fine-Tuning jobs for offline analysis.
✨ New Models
- Gemma 3 12B Instruct is now available in the Model Library
- Gemma 3 4B Instruct is now available in the Model Library
- Qwen3 Omni 30B A3B Instruct is now available in the Model Library
📚 Documentation Updates
- Deployment Shapes API: Added List Deployment Shapes and Get Deployment Shape endpoints for querying available deployment shapes
- Evaluator APIs: Added Create Evaluator, Update Evaluator, and helper endpoints for evaluator source code, build logs, and upload validation
- Fine-Tuning APIs: Added Resume DPO Job, Resume Reinforcement Fine-Tuning Step, Execute Reinforcement Fine-Tuning Step, and Get Evaluation Job Log Endpoint
- SDK Examples: Added Python SDK example links for direct routing and supervised fine-tuning workflows
Bug Fixes & Minor Improvements
Bug Fixes & Minor Improvements
- Deployment Progress Display: Deployment details page now shows live deployment progress with replica status (pending, downloading, initializing, ready) and error banners (Web App)
- Multi-LoRA Display: Deployment details page now shows all deployed models with expandable sections, not just the default model (Web App)
- Prompt Cache Usage Chart: Added Cached Prompt Tokens chart to the Serverless Usage page for visibility into prompt caching savings (Web App)
- Audio Usage Charts: Audio usage metrics are now displayed in a dedicated Voice tab on the Usage page with filtering support (Web App)
- Deployment Shape Search: Improved deployment shape discovery for models where exact base model matches aren’t found, using parameter count bucketing (Web App)
- Vision Model Auto-Detection: Vision-language models (Qwen VL, LLaVA, Phi-3 Vision, etc.) now automatically have image input support enabled when uploaded (API)
- Dataset Loading UX: Output dataset tables now stream results with a progress indicator for faster perceived loading (Web App)
- File Size Limit: Dataset uploads now enforce a clear file size limit with improved error messaging (Web App)
- Number Input Fields: Improved number input validation across forms (Web App)
- Combobox Responsiveness: Improved combobox dropdown height on smaller screens (Web App)
- Evaluator Editor Scroll: Prevented accidental page navigation when scrolling inside the evaluator code editor (Web App)
- Evaluator Save Dialog: Fixed overflow issues in the save evaluator dialog (Web App)
- Evaluator Selector Labels: Fixed label rendering in async evaluator select components (Web App)
- Deploy Button Validation: Deploy button is now disabled when the model is not ready (Web App)
- Model Metadata: Fixed missing model metadata display on deployment pages (Web App)
- Invoice Display: Invoice list now shows “paid” status for contract payments (Web App)
- Color Palette Update: Updated UI color palette for improved visual consistency (Web App)
2025-12-15
Reasoning Guide, Prompt Caching Updates, New Models and CLI Updates
Reasoning Guide
A new Reasoning guide is now available in the documentation. This comprehensive guide covers:- Accessing
reasoning_contentfrom thinking/reasoning models - Controlling reasoning effort with the
reasoning_effortparameter - Streaming with reasoning content
- Interleaved thinking for multi-step tool-calling workflows
Prompt Caching Updates
Prompt caching documentation has been updated with expanded guidance:- Cached prompt tokens on serverless now cost 50% less than uncached tokens
- Session affinity routing via the
userfield orx-session-affinityheader for improved cache hit rates - Prompt optimization techniques for maximizing cache efficiency
✨ New Models
- Devstral Small 2 24B Instruct 2512 is now available in the Model Library
- NVIDIA Nemotron Nano 3 30B A3B is now available in the Model Library
📚 Documentation Updates
- Reasoning Guide: New documentation for working with reasoning models, including
reasoning_content,reasoning_effort, streaming, and interleaved thinking (Reasoning) - Recommended Models: Updated recommendations to include DeepSeek V3.2 for code generation and Kimi K2 Thinking as a GPT-5 alternative (Recommended Models)
- OpenAI Compatibility: Removed
stopsequence documentation as Fireworks is now 1:1 compatible with OpenAI’s behavior (OpenAI Compatibility) - Evaluator APIs: Added REST API documentation for Evaluator and Evaluation Job CRUD operations (Evals API Reference)
- firectl CLI Reference: Updated with new commands including
cancel dpo-job,cancel supervised-fine-tuning-job,set-api-key,redeem-credit-code, and evaluator revision management
Bug Fixes & Minor Improvements
Bug Fixes & Minor Improvements
- Audio Usage Charts: Added a Voice modality tab to the Usage page for viewing audio-specific usage metrics and charts (Web App)
- Deployment Page Redesign: Redesigned deployment details page with inline action buttons (Edit, Delete, Enable/Disable), reordered metadata sections, and collapsible API examples (Web App)
- Deployment API Examples: API code examples now use the deployment route as the canonical model identifier for clearer usage patterns (Web App)
- LoRA Addons Tab: Renamed “Serverless LoRA” tab to “LoRA Addons” on the deployments dashboard for clarity (Web App)
- Evaluator Selector: Improved evaluator selector UI to display both display name and evaluator ID for easier identification (Web App)
- Evaluator Delete Confirmation: Added confirmation modal with success/error feedback when deleting evaluators (Web App)
- Evaluator Code Viewer: Evaluator source files now load asynchronously, preventing browser freezes with large files (Web App)
- Dataset Image Preview: Dataset preview now properly renders image content in message bubbles and comparison views (Web App)
- Model Search: Improved model search accuracy by restricting results to display name and model ID matches only (Web App)
- Fine-Tuning Progress Status: Fine-tuning job detail pages now show the initial job status immediately on page load (Web App)
- Evaluator Test Controls: Dataset selection and pagination are now disabled while an evaluator test is running to prevent conflicts (Web App)
- Repository Name Validation: Evaluator repository names now validate against GitHub naming conventions (Web App)
- Billing Contracts Tab: Added a Contracts tab to the billing page alongside Invoices for viewing contract details (Web App)
- Dataset Size Limit: Enforced 1GB maximum file size for dataset uploads with clear error messaging (Web App)
- Evaluator Documentation Link: Fixed evaluator documentation links to point to the correct location (Web App)
- Session Update Fix: Fixed an issue with session state updates in the web app (Web App)
- Popover Fix: Fixed popover components nested in dialogs not displaying correctly (Web App)
- Per-Replica Status: Deployment status now shows per-replica counts (pending, downloading, initializing, ready) for better visibility into deployment progress (CLI, API)
- set-api-key Command: The
set-api-keycommand is now visible in firectl help output (CLI) - DPO Resume: Added support for resuming cancelled DPO fine-tuning jobs (API)
- Reasoning Effort Normalization:
reasoning_effortparameter now accepts boolean values in addition to strings and integers (API)
2025-12-08
DeepSeek V3.2 on Serverless, Cached Token Pricing, and New Models
☁️ Serverless
- DeepSeek V3.2 is now available on serverless
Cached Token Pricing Display
The Model Library and model detail pages now display cached and uncached input token pricing for serverless models that support prompt caching. This gives you better visibility into potential cost savings when using prompt caching with supported models.Evaluations Dashboard Improvements
The Evaluations dashboard has been enhanced with new filtering and status tracking capabilities:- Status column showing evaluator build state (Active, Building, Failed)
- Quick filters to filter evaluators and evaluation jobs by status
- Improved table layout with actions integrated into the status column
✨ New Models
- DeepSeek V3.2 is now available in the Model Library
- Ministral 3 14B Instruct 2512 is now available in the Model Library
- Ministral 3 8B Instruct 2512 is now available in the Model Library
- Ministral 3 3B Instruct 2512 is now available in the Model Library
- Mistral Large 3 675B Instruct is now available in the Model Library
- Qwen3-VL-32B-Instruct is now available in the Model Library
- Qwen3-VL-8B-Instruct is now available in the Model Library
📚 Documentation Updates
- Reranking Guide: Added documentation for using the
/rerankendpoint and/embeddingsendpoint withreturn_logitsfor reranking, including parallel batching examples (Querying Embeddings Models)
Bug Fixes & Minor Improvements
Bug Fixes & Minor Improvements
- Deployment Page Enhancements: Redesigned deployment detail page with new model header, quick actions (Playground, Go to Model), tabbed API/Info interface, collapsible code examples, and improved GPU count display (Web App)
- Login Page Redesign: New marketing panel with customer testimonials carousel, highlighted platform capabilities, and improved visual design (Web App)
- Playground CTA: Added “Deploy on Demand” button next to “Try the API” for eligible models, making it easier to deploy models directly from the playground (Web App)
- Console Navigation Icons: Updated sidebar icons with a refreshed icon set for improved visual consistency (Web App)
- Reinforcement Fine-Tuning Defaults: Changed default epochs to 1 and increased maximum inference N from 8 to 32 for rollout configuration (Web App)
- Failed Job Visibility: Training progress and loss curves now display for failed and cancelled fine-tuning jobs, helping with debugging (Web App)
- Large Dataset Uploads: Improved upload handling for large JSONL files with progress tracking and direct-to-storage uploads (Web App)
- Dataset Preview: Fixed page freeze issue when previewing datasets with very long text content (Web App)
- Loss Chart Y-Axis: Fixed y-axis scaling on loss charts to properly display the full range of values (Web App)
- Model Deletion Dialog: Improved custom model delete confirmation dialog with better validation and feedback (Web App)
- Billing Date Range: Fixed date range calculation errors on the first day of the month in billing usage views (Web App)
- Login Session: Fixed an issue where expired sessions required manual cookie clearing to log in again (Web App)
- Rollout Detail Panel: Improved rollout log viewing with resizable split panels and better log formatting (Web App)
- Checkpoint Promotion: Added validation and error messages when promoting checkpoints with missing target modules or base model (Web App)
- Model Validation: Added validation before deploying fine-tuned models to ensure the model ID is valid (Web App)
- Quota Error Message: Improved error message clarity when request quota is exceeded in the playground (Web App)
- Safari Layout: Fixed extra spacing in login page marketing panel on Safari browsers (Web App)
2025-12-01
Audit Logs, Dataset Download, Weighted Training for Reinforcement Fine-Tuning, and New Model
Audit Logs in Web App
You can now view and search audit logs directly from the Fireworks web app. The new Audit Logs page provides:- Search and filter logs by status and timeframe
- Detailed view panel for individual log entries
- Easy navigation from the console sidebar under Account settings
Dataset Download
You can now download datasets directly from the Fireworks web app. The new download functionality allows you to:- Download individual files from a dataset
- Download all files at once with “Download All”
- Access downloads from the Datasets table in the dashboard
Weighted Training for Reinforcement Fine-Tuning
Reinforcement Fine-Tuning now supports per-token and per-example weighting, giving you more control over which samples have greater influence during training. This feature mirrors the weighted training functionality already available in Supervised Fine-Tuning.See the Supervised Fine-Tuning documentation for details on the weight field format.✨ New Models
- KAT Coder is now available in the Model Library
Bug Fixes & Minor Improvements
Bug Fixes & Minor Improvements
- Console Navigation: Redesigned sidebar with organized groups (CREATE, EXPLORE, MANAGE) for easier navigation (Web App)
- Fine-Tuning Progress Display: Training progress (progress/epoch) now displays inline in the job title while Supervised Fine-Tuning jobs are running (Web App)
- Reinforcement Fine-Tuning Evaluator Selection: Evaluators that are still building are now disabled in the selector with a tooltip and status badge (Web App)
- Loss Chart Smoothing: Large datasets (>1000 metrics) now show EMA smoothing by default for improved visibility (Web App)
- Checkpoint Restore: Fixed base model resolution when promoting checkpoints from fine-tuning jobs (Web App)
- Deployment Usage Charts: Fixed usage graph display on deployment details page (Web App)
- Evaluation Job Share Links: Fixed incorrect output dataset share links and improved deep-link behavior for evaluation jobs (Web App)
- Embedding Model Deployments: Embedding models can now be deployed directly from the UI (Web App)
- Dataset Download State: Download option is now disabled for datasets that are still uploading (Web App)
- GPU Hints: Removed invalid GPU hints from region selector in deployment form (Web App)
- PEFT Model Shapes: Fixed deployment shape lookup for PEFT Addon and Live Merge models (Web App)
- LoRA Validation: Improved error messages when LoRA checkpoints are missing the required
language_model.prefix, with actionable conversion instructions (API) - Reinforcement Fine-Tuning Timeout: Extended maximum job timeout from 4 to 7 days for longer training runs (API)
- Training Job Cancellation: Added ability to cancel Supervised Fine-Tuning, DPO, and Reinforcement Fine-Tuning jobs via API (API)
- Resource Errors: Improved error messages for capacity-related issues during training and deployment (API)
2025-11-24
Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints
Improved Evaluator Creation Experience
The evaluator creation workflow has been significantly enhanced with GitHub template integration. You can now:- Fork evaluator templates directly from GitHub repositories
- Browse and preview templates before using them
- Create evaluators with a streamlined save dialog
- View evaluators in a new sortable and paginated table
MLOps & Observability Integrations
New documentation for integrating Fireworks with MLOps and observability tools:- Weights & Biases (W&B) integration for experiment tracking during fine-tuning
- MLflow integration for model management and experiment logging
✨ New Models
- Kimi K2 Thinking is now available in the Model Library
- KAT Dev 32B is now available in the Model Library
- KAT Dev 72B Exp is now available in the Model Library
☁️ Serverless
- Kimi K2 Thinking is now available on serverless
📚 New REST API Endpoints
New REST API endpoints are now available for managing Reinforcement Fine-Tuning Steps and deployments:- Create Reinforcement Fine-Tuning Step
- List Reinforcement Fine-Tuning Steps
- Get Reinforcement Fine-Tuning Step
- Delete Reinforcement Fine-Tuning Step
- Scale Deployment
- List Deployment Shape Versions
- Get Deployment Shape Version
- Get Dataset Download Endpoint
Bug Fixes & Minor Improvements
Bug Fixes & Minor Improvements
- Deployment Region Selector: Added GPU accelerator hints to the region selector, with Global set as default for optimal availability (Web App)
- Preference Fine-Tuning (DPO): Added to the Fine-Tuning page for training models with human preference data (Web App)
- Redeem Credits: Credit code redemption is now available to all users from the Billing page (Web App)
- Model Library Search: Improved fuzzy search with hybrid matching for better model discovery (Web App)
- Cogito Models: Added Cogito namespace to the Model Library for easier discovery (Web App)
- Custom Model Editing: You can now edit display name and description inline on custom model detail pages (Web App)
- Loss Curve Charts: Fixed an issue where loss curves were not updating in real-time during fine-tuning jobs (Web App)
- Deployment Shapes: Fixed deployment shape selection for fine-tuned models (PEFT and live-merge) (Web App)
- Usage Charts: Fixed replica calculation in multi-series usage charts (Web App)
- Session Management: Removed auto-logout on inactivity for improved user experience (Web App)
- Onboarding: Updated onboarding survey with improved profile and questionnaire flow (Web App)
- Fine-Tuning Form: Max context length now defaults to and is capped by the selected base model’s context length (Web App)
- Secrets for Evaluators: Added documentation for using secrets in evaluators to securely call external services (Docs)
- Region Selection: Deprecated regions are now filtered from deployment options (Web App)
- Playground: Embedding and reranker models are now filtered from playground model selection (Web App)
- LoRA Rank: Updated valid LoRA rank range to 4-32 in documentation (Docs)
- SFT Documentation: Added documentation for batch size, learning rate warmup, and gradient accumulation settings (Docs)
- Direct Routing: Added OpenAI SDK code examples for direct routing (Docs)
- Recommended Models: Updated model recommendations with migration guidance from Claude, GPT, and Gemini (Docs)
2025-11-12
☀️ Sunsetting Build SDK
The Build SDK is being deprecated in favor of a new Python SDK generated directly from our REST API. The new SDK is more up-to-date, flexible, and continuously synchronized with our REST API. Please note that the last version of the Build SDK will be0.19.20, and the new SDK will start at 1.0.0.
Python package managers will not automatically update to the new SDK, so you
will need to manually update your dependencies and refactor your code.Existing codebases using the Build SDK will continue to function as before and
will not be affected unless you choose to upgrade to the new SDK version.The new SDK replaces the Build SDK’s LLM and Dataset classes with REST
API-aligned methods. If you upgrade to version 1.0.0 or later, you will need
to migrate your code.🚀 Improved RFT Experience
We’ve drastically improved the RFT experience with better reliability, developer-friendly SDK for hooking up your existing agents, support for multi-turn training, better observability in our Web App, and better overall developer experience.See Reinforcement Fine-Tuning for more details.2025-08-22
Supervised Fine-Tuning
We now support supervised fine tuning with separate thinking traces for reasoning models (e.g. DeepSeek R1, GPT OSS, Qwen3 Thinking etc) that ensures training-inference consistency. An example including thinking traces would look like:2025-08-10
Supervised Fine-Tuning
We now support Qwen3 MoE model (Qwen3 dense models are already supported) and GPT OSS models for supervised fine-tuning. GPT OSS model fine tunning support is single-turn without thinking traces at the moment.2025-07-29
🎨 Vision-Language Model Fine-Tuning
You can now fine-tune Vision-Language Models (VLMs) on Fireworks AI using the Qwen 2.5 VL model family. This extends our Supervised Fine-tuning V2 platform to support multimodal training with both images and text data.Supported models:- Qwen 2.5 VL 3B Instruct
- Qwen 2.5 VL 7B Instruct
- Qwen 2.5 VL 32B Instruct
- Qwen 2.5 VL 72B Instruct
- Fine-tune on datasets containing both images and text in JSONL format with base64-encoded images
- Support for up to 64K context length during training
- Built on the same Supervised Fine-tuning V2 infrastructure as text models
🔧 Build SDK: Deployment Configuration Application Requirement
The Build SDK now requires you to call.apply() to apply any deployment configurations to Fireworks when using deployment_type="on-demand" or deployment_type="on-demand-lora". This change ensures explicit control over when deployments are created and helps prevent accidental deployment creation.Key changes:.apply()is now required for on-demand and on-demand-lora deployments- Serverless deployments do not require
.apply()calls - If you do not call
.apply(), you are expected to set up the deployment through the deployment page at https://app.fireworks.ai/dashboard/deployments
- Add
llm.apply()after creating LLM instances withdeployment_type="on-demand"ordeployment_type="on-demand-lora" - No changes needed for serverless deployments
- See updated documentation for examples and best practices
This applies to Python SDK version
>=0.19.14.2025-07-23
🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning
You can now develop your own custom rollout and reward functionality while using Fireworks to manage the training and deployment of your models. This gives you full control over your reinforcement learning workflows while leveraging Fireworks’ infrastructure for model training and deployment.See the new LLM.reinforcement_step() method and ReinforcementStep class for usage examples and details.2025-07-16
Supervised Fine-Tuning V2
We now support Llama 4 MoE model supervised fine-tuning (Llama 4 Scout, Llama 4 Maverick, Text only).2025-07-10
🏗️ Build SDK LLM Deployment Logic Refactor
Based on early feedback from users and internal testing, we’ve refactored the
LLM class deployment logic in the Build SDK to make it easier to understand.Key changes:- The
idparameter is now required whendeployment_typeis"on-demand" - The
base_idparameter is now required whendeployment_typeis"on-demand-lora" - The
deployment_display_nameparameter is now optional and defaults to the filename where the LLM was instantiated
id does not
exist. Otherwise, the existing deployment will be reused.2025-07-02
🚀 Support for Responses API in Python SDK
You can now use the Responses API in the Python SDK. This is useful if you want to use the Responses API in your own applications.See the Responses API guide for usage examples and details.2025-07-01
Support for LinkedIn authentication
You can now log in to Fireworks using your LinkedIn account. This is useful if you already have a LinkedIn account and want to use it to log in to Fireworks.To log in with LinkedIn, go to the Fireworks login page and click the “Continue with LinkedIn” button.You can also log in with LinkedIn from the CLI using thefirectl login
command.How it works:- Fireworks uses your LinkedIn primary email address for account identification
- You can switch between different Fireworks accounts by changing your LinkedIn primary email
- See our LinkedIn authentication FAQ for detailed instructions on managing email addresses
2025-06-30
Support for GitHub authentication
You can now log in to Fireworks using your GitHub account. This is useful if you already have a GitHub account and want to use it to log in to Fireworks.To log in with GitHub, go to the Fireworks login page and click the “Continue with GitHub” button.You can also log in with GitHub from the CLI using thefirectl login
command.🚨 Document Inlining Deprecation
Document Inlining has been deprecated and is no longer available on the Fireworks platform. This feature allowed LLMs to process images and PDFs through the chat completions API by appending#transform=inline to document URLs.Migration recommendations:- For image processing: Use Vision Language Models (VLMs) like Qwen2.5-VL 32B Instruct
- For PDF processing: Use dedicated PDF processing libraries combined with text-based LLMs
- For structured extraction: Leverage our structured responses capabilities
2025-06-24
🎯 Build SDK: Reward-kit integration for evaluator development
The Build SDK now natively integrates with reward-kit to simplify evaluator development for Reinforcement Fine-Tuning (RFT). You can now create custom evaluators in Python with automatic dependency management and seamless deployment to Fireworks infrastructure.Key features:- Native reward-kit integration for evaluator development
- Automatic packaging of dependencies from
pyproject.tomlorrequirements.txt - Local testing capabilities before deployment
- Direct integration with Fireworks datasets and evaluation jobs
- Support for third-party libraries and complex evaluation logic
Added new Responses API for advanced conversational workflows and integrations
- Continue conversations across multiple turns using the
previous_response_idparameter to maintain context without resending full history - Stream responses in real time as they are generated for responsive applications
- Control response storage with the
storeparameter—choose whether responses are retrievable by ID or ephemeral
2025-06-13
Supervised Fine-Tuning V2
Supervised Fine-Tuning V2 released.Key features:- Supports Qwen 2/2.5/3 series, Phi 4, Gemma 3, the Llama 3 family, Deepseek V2, V3, R1
- Longer context window up to full context length of the supported models
- Multi-turn function calling fine-tuning
- Quantization aware training
Reinforcement Fine-Tuning (RFT)
Reinforcement Fine-Tuning released. Train expert models to surpass closed source frontier models through verifiable reward. More details in blospost.2025-05-20
2025-05-19
🚀 Easier & faster LoRA fine-tune deployments on Fireworks
You can now deploy a LoRA fine-tune with a single command and get speeds that approximately match the base model:- Create a deployment using
firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons - Then deploy the addon to the deployment:
firectl load-lora <MODEL_ID> --deployment <DEPLOYMENT_ID>
This change is for dedicated deployments with a single LoRA. You can still deploy multiple LoRAs on a deployment or deploy LoRA(s) on some Serverless models as described in the documentation.