2025-12-22

Playground Categories, New User Roles, Fine-Tuning Improvements, and New Models

Playground Categories

The Playground now features category tabs (LLM, Image, TTS, STT) in the header for easier switching between model types. The Playground automatically detects the appropriate category based on the selected model and provides smart defaults for each category.

User Roles: Contributor and Inference

New user roles provide more granular access control for team collaboration:
  • Contributor: Read and write access to resources without administrative privileges
  • Inference: Read-only access with the ability to run inference on deployments
Assign these roles when inviting team members to provide appropriate access levels.

Fine-Tuning Improvements

Fine-tuning workflows have been enhanced with several new capabilities:
  • Stop and Resume Jobs: Stop running fine-tuning jobs and resume them later from where they left off. Available for Supervised Fine-Tuning and Reinforcement Fine-Tuning jobs.
  • Clone Jobs: Quickly create new fine-tuning jobs based on existing job configurations using the Clone action.
  • Download Output Datasets: Download output datasets from Reinforcement Fine-Tuning jobs, including individual files or bulk download as a ZIP archive.
  • Download Rollout Logs: Download rollout logs from Reinforcement Fine-Tuning jobs for offline analysis.

✨ New Models

📚 Documentation Updates

  • Deployment Progress Display: Deployment details page now shows live deployment progress with replica status (pending, downloading, initializing, ready) and error banners (Web App)
  • Multi-LoRA Display: Deployment details page now shows all deployed models with expandable sections, not just the default model (Web App)
  • Prompt Cache Usage Chart: Added Cached Prompt Tokens chart to the Serverless Usage page for visibility into prompt caching savings (Web App)
  • Audio Usage Charts: Audio usage metrics are now displayed in a dedicated Voice tab on the Usage page with filtering support (Web App)
  • Deployment Shape Search: Improved deployment shape discovery for models where exact base model matches aren’t found, using parameter count bucketing (Web App)
  • Vision Model Auto-Detection: Vision-language models (Qwen VL, LLaVA, Phi-3 Vision, etc.) now automatically have image input support enabled when uploaded (API)
  • Dataset Loading UX: Output dataset tables now stream results with a progress indicator for faster perceived loading (Web App)
  • File Size Limit: Dataset uploads now enforce a clear file size limit with improved error messaging (Web App)
  • Number Input Fields: Improved number input validation across forms (Web App)
  • Combobox Responsiveness: Improved combobox dropdown height on smaller screens (Web App)
  • Evaluator Editor Scroll: Prevented accidental page navigation when scrolling inside the evaluator code editor (Web App)
  • Evaluator Save Dialog: Fixed overflow issues in the save evaluator dialog (Web App)
  • Evaluator Selector Labels: Fixed label rendering in async evaluator select components (Web App)
  • Deploy Button Validation: Deploy button is now disabled when the model is not ready (Web App)
  • Model Metadata: Fixed missing model metadata display on deployment pages (Web App)
  • Invoice Display: Invoice list now shows “paid” status for contract payments (Web App)
  • Color Palette Update: Updated UI color palette for improved visual consistency (Web App)
2025-12-15

Reasoning Guide, Prompt Caching Updates, New Models and CLI Updates

Reasoning Guide

A new Reasoning guide is now available in the documentation. This comprehensive guide covers:
  • Accessing reasoning_content from thinking/reasoning models
  • Controlling reasoning effort with the reasoning_effort parameter
  • Streaming with reasoning content
  • Interleaved thinking for multi-step tool-calling workflows
The guide provides code examples using the Fireworks Python SDK and explains how to work with models that support extended reasoning capabilities.
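
While the guide uses the Fireworks Python SDK, a minimal sketch against Fireworks’ OpenAI-compatible endpoint looks like the following; the model ID is a placeholder, and the exact response fields are defined in the guide:
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.fireworks.ai/inference/v1",
      api_key="<FIREWORKS_API_KEY>",
  )

  resp = client.chat.completions.create(
      model="accounts/fireworks/models/deepseek-r1",  # placeholder reasoning model
      messages=[{"role": "user", "content": "What is 17 * 24?"}],
      extra_body={"reasoning_effort": "low"},  # "low" | "medium" | "high"
  )

  message = resp.choices[0].message
  print(getattr(message, "reasoning_content", None))  # thinking trace, if returned
  print(message.content)  # final answer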

Prompt Caching Updates

Prompt caching documentation has been updated with expanded guidance:
  • Cached prompt tokens on serverless now cost 50% less than uncached tokens
  • Session affinity routing via the user field or x-session-affinity header for improved cache hit rates
  • Prompt optimization techniques for maximizing cache efficiency
See the Prompt Caching guide for details.
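
As a rough sketch of session affinity (the user field and x-session-affinity header come from this update; the model ID and session value are placeholders):
  import requests

  resp = requests.post(
      "https://api.fireworks.ai/inference/v1/chat/completions",
      headers={
          "Authorization": "Bearer <FIREWORKS_API_KEY>",
          "x-session-affinity": "session-123",  # route repeat requests to the same replica
      },
      json={
          "model": "accounts/fireworks/models/deepseek-v3",  # placeholder model ID
          "messages": [
              {"role": "system", "content": "<long shared prefix that benefits from caching>"},
              {"role": "user", "content": "First question"},
          ],
          "user": "session-123",  # alternative to the header for session affinity
      },
  )
  print(resp.json()["choices"][0]["message"]["content"])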

✨ New Models

📚 Documentation Updates

  • Reasoning Guide: New documentation for working with reasoning models, including reasoning_content, reasoning_effort, streaming, and interleaved thinking (Reasoning)
  • Recommended Models: Updated recommendations to include DeepSeek V3.2 for code generation and Kimi K2 Thinking as a GPT-5 alternative (Recommended Models)
  • OpenAI Compatibility: Removed stop sequence documentation as Fireworks is now 1:1 compatible with OpenAI’s behavior (OpenAI Compatibility)
  • Evaluator APIs: Added REST API documentation for Evaluator and Evaluation Job CRUD operations (Evals API Reference)
  • firectl CLI Reference: Updated with new commands including cancel dpo-job, cancel supervised-fine-tuning-job, set-api-key, redeem-credit-code, and evaluator revision management
  • Audio Usage Charts: Added a Voice modality tab to the Usage page for viewing audio-specific usage metrics and charts (Web App)
  • Deployment Page Redesign: Redesigned deployment details page with inline action buttons (Edit, Delete, Enable/Disable), reordered metadata sections, and collapsible API examples (Web App)
  • Deployment API Examples: API code examples now use the deployment route as the canonical model identifier for clearer usage patterns (Web App)
  • LoRA Addons Tab: Renamed “Serverless LoRA” tab to “LoRA Addons” on the deployments dashboard for clarity (Web App)
  • Evaluator Selector: Improved evaluator selector UI to display both display name and evaluator ID for easier identification (Web App)
  • Evaluator Delete Confirmation: Added confirmation modal with success/error feedback when deleting evaluators (Web App)
  • Evaluator Code Viewer: Evaluator source files now load asynchronously, preventing browser freezes with large files (Web App)
  • Dataset Image Preview: Dataset preview now properly renders image content in message bubbles and comparison views (Web App)
  • Model Search: Improved model search accuracy by restricting results to display name and model ID matches only (Web App)
  • Fine-Tuning Progress Status: Fine-tuning job detail pages now show the initial job status immediately on page load (Web App)
  • Evaluator Test Controls: Dataset selection and pagination are now disabled while an evaluator test is running to prevent conflicts (Web App)
  • Repository Name Validation: Evaluator repository names now validate against GitHub naming conventions (Web App)
  • Billing Contracts Tab: Added a Contracts tab to the billing page alongside Invoices for viewing contract details (Web App)
  • Dataset Size Limit: Enforced 1GB maximum file size for dataset uploads with clear error messaging (Web App)
  • Evaluator Documentation Link: Fixed evaluator documentation links to point to the correct location (Web App)
  • Session Update Fix: Fixed an issue with session state updates in the web app (Web App)
  • Popover Fix: Fixed popover components nested in dialogs not displaying correctly (Web App)
  • Per-Replica Status: Deployment status now shows per-replica counts (pending, downloading, initializing, ready) for better visibility into deployment progress (CLI, API)
  • set-api-key Command: The set-api-key command is now visible in firectl help output (CLI)
  • DPO Resume: Added support for resuming cancelled DPO fine-tuning jobs (API)
  • Reasoning Effort Normalization: reasoning_effort parameter now accepts boolean values in addition to strings and integers (API)
2025-12-08

DeepSeek V3.2 on Serverless, Cached Token Pricing, and New Models

☁️ Serverless

Cached Token Pricing Display

The Model Library and model detail pages now display cached and uncached input token pricing for serverless models that support prompt caching. This gives you better visibility into potential cost savings when using prompt caching with supported models.

Evaluations Dashboard Improvements

The Evaluations dashboard has been enhanced with new filtering and status tracking capabilities:
  • Status column showing evaluator build state (Active, Building, Failed)
  • Quick filters to filter evaluators and evaluation jobs by status
  • Improved table layout with actions integrated into the status column

✨ New Models

📚 Documentation Updates

  • Reranking Guide: Added documentation for using the /rerank endpoint and /embeddings endpoint with return_logits for reranking, including parallel batching examples (Querying Embeddings Models)
  • Deployment Page Enhancements: Redesigned deployment detail page with new model header, quick actions (Playground, Go to Model), tabbed API/Info interface, collapsible code examples, and improved GPU count display (Web App)
  • Login Page Redesign: New marketing panel with customer testimonials carousel, highlighted platform capabilities, and improved visual design (Web App)
  • Playground CTA: Added “Deploy on Demand” button next to “Try the API” for eligible models, making it easier to deploy models directly from the playground (Web App)
  • Console Navigation Icons: Updated sidebar icons with a refreshed icon set for improved visual consistency (Web App)
  • Reinforcement Fine-Tuning Defaults: Changed default epochs to 1 and increased maximum inference N from 8 to 32 for rollout configuration (Web App)
  • Failed Job Visibility: Training progress and loss curves now display for failed and cancelled fine-tuning jobs, helping with debugging (Web App)
  • Large Dataset Uploads: Improved upload handling for large JSONL files with progress tracking and direct-to-storage uploads (Web App)
  • Dataset Preview: Fixed page freeze issue when previewing datasets with very long text content (Web App)
  • Loss Chart Y-Axis: Fixed y-axis scaling on loss charts to properly display the full range of values (Web App)
  • Model Deletion Dialog: Improved custom model delete confirmation dialog with better validation and feedback (Web App)
  • Billing Date Range: Fixed date range calculation errors on the first day of the month in billing usage views (Web App)
  • Login Session: Fixed an issue where expired sessions required manual cookie clearing to log in again (Web App)
  • Rollout Detail Panel: Improved rollout log viewing with resizable split panels and better log formatting (Web App)
  • Checkpoint Promotion: Added validation and error messages when promoting checkpoints with missing target modules or base model (Web App)
  • Model Validation: Added validation before deploying fine-tuned models to ensure the model ID is valid (Web App)
  • Quota Error Message: Improved error message clarity when request quota is exceeded in the playground (Web App)
  • Safari Layout: Fixed extra spacing in login page marketing panel on Safari browsers (Web App)
2025-12-01

Audit Logs, Dataset Download, Weighted Training for Reinforcement Fine-Tuning, and New Model

Audit Logs in Web App

You can now view and search audit logs directly from the Fireworks web app. The new Audit Logs page provides:
  • Search and filter logs by status and timeframe
  • Detailed view panel for individual log entries
  • Easy navigation from the console sidebar under Account settings
See the Audit Logs documentation for more information.

Dataset Download

You can now download datasets directly from the Fireworks web app. The new download functionality allows you to:
  • Download individual files from a dataset
  • Download all files at once with “Download All”
  • Access downloads from the Datasets table in the dashboard

Weighted Training for Reinforcement Fine-Tuning

Reinforcement Fine-Tuning now supports per-token and per-example weighting, giving you more control over which samples have greater influence during training. This feature mirrors the weighted training functionality already available in Supervised Fine-Tuning. See the Supervised Fine-Tuning documentation for details on the weight field format.
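
For illustration, a minimal JSONL sketch of the weight field as it already works in Supervised Fine-Tuning (an assistant message with weight: 0 is excluded from the loss; see the documentation for the exact Reinforcement Fine-Tuning format):
  {"messages": [{"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2", "weight": 0}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4"}]}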

✨ New Models

  • KAT Coder is now available in the Model Library
  • Console Navigation: Redesigned sidebar with organized groups (CREATE, EXPLORE, MANAGE) for easier navigation (Web App)
  • Fine-Tuning Progress Display: Training progress (progress/epoch) now displays inline in the job title while Supervised Fine-Tuning jobs are running (Web App)
  • Reinforcement Fine-Tuning Evaluator Selection: Evaluators that are still building are now disabled in the selector with a tooltip and status badge (Web App)
  • Loss Chart Smoothing: Large datasets (>1000 metrics) now show EMA smoothing by default for improved visibility (Web App)
  • Checkpoint Restore: Fixed base model resolution when promoting checkpoints from fine-tuning jobs (Web App)
  • Deployment Usage Charts: Fixed usage graph display on deployment details page (Web App)
  • Evaluation Job Share Links: Fixed incorrect output dataset share links and improved deep-link behavior for evaluation jobs (Web App)
  • Embedding Model Deployments: Embedding models can now be deployed directly from the UI (Web App)
  • Dataset Download State: Download option is now disabled for datasets that are still uploading (Web App)
  • GPU Hints: Removed invalid GPU hints from region selector in deployment form (Web App)
  • PEFT Model Shapes: Fixed deployment shape lookup for PEFT Addon and Live Merge models (Web App)
  • LoRA Validation: Improved error messages when LoRA checkpoints are missing the required language_model. prefix, with actionable conversion instructions (API)
  • Reinforcement Fine-Tuning Timeout: Extended maximum job timeout from 4 to 7 days for longer training runs (API)
  • Training Job Cancellation: Added ability to cancel Supervised Fine-Tuning, DPO, and Reinforcement Fine-Tuning jobs via API (API)
  • Resource Errors: Improved error messages for capacity-related issues during training and deployment (API)
2025-11-24

Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints

Improved Evaluator Creation Experience

The evaluator creation workflow has been significantly enhanced with GitHub template integration. You can now:
  • Fork evaluator templates directly from GitHub repositories
  • Browse and preview templates before using them
  • Create evaluators with a streamlined save dialog
  • View evaluators in a new sortable and paginated table

MLOps & Observability Integrations

New documentation for integrating Fireworks with MLOps and observability tools:
  • Weights & Biases (W&B) integration for experiment tracking during fine-tuning
  • MLflow integration for model management and experiment logging

✨ New Models

☁️ Serverless

📚 New REST API Endpoints

New REST API endpoints are now available for managing Reinforcement Fine-Tuning Steps and deployments.
  • Deployment Region Selector: Added GPU accelerator hints to the region selector, with Global set as default for optimal availability (Web App)
  • Preference Fine-Tuning (DPO): Added to the Fine-Tuning page for training models with human preference data (Web App)
  • Redeem Credits: Credit code redemption is now available to all users from the Billing page (Web App)
  • Model Library Search: Improved fuzzy search with hybrid matching for better model discovery (Web App)
  • Cogito Models: Added Cogito namespace to the Model Library for easier discovery (Web App)
  • Custom Model Editing: You can now edit display name and description inline on custom model detail pages (Web App)
  • Loss Curve Charts: Fixed an issue where loss curves were not updating in real-time during fine-tuning jobs (Web App)
  • Deployment Shapes: Fixed deployment shape selection for fine-tuned models (PEFT and live-merge) (Web App)
  • Usage Charts: Fixed replica calculation in multi-series usage charts (Web App)
  • Session Management: Removed auto-logout on inactivity for improved user experience (Web App)
  • Onboarding: Updated onboarding survey with improved profile and questionnaire flow (Web App)
  • Fine-Tuning Form: Max context length now defaults to and is capped by the selected base model’s context length (Web App)
  • Secrets for Evaluators: Added documentation for using secrets in evaluators to securely call external services (Docs)
  • Region Selection: Deprecated regions are now filtered from deployment options (Web App)
  • Playground: Embedding and reranker models are now filtered from playground model selection (Web App)
  • LoRA Rank: Updated valid LoRA rank range to 4-32 in documentation (Docs)
  • SFT Documentation: Added documentation for batch size, learning rate warmup, and gradient accumulation settings (Docs)
  • Direct Routing: Added OpenAI SDK code examples for direct routing (Docs)
  • Recommended Models: Updated model recommendations with migration guidance from Claude, GPT, and Gemini (Docs)
2025-11-12

☀️ Sunsetting Build SDK

The Build SDK is being deprecated in favor of a new Python SDK generated directly from our REST API. The new SDK is more up-to-date, flexible, and continuously synchronized with our REST API. Please note that the last version of the Build SDK will be 0.19.20, and the new SDK will start at 1.0.0. Python package managers will not automatically update to the new SDK, so you will need to manually update your dependencies and refactor your code. Existing codebases using the Build SDK will continue to function as before and will not be affected unless you choose to upgrade to the new SDK version. The new SDK replaces the Build SDK’s LLM and Dataset classes with REST API-aligned methods. If you upgrade to version 1.0.0 or later, you will need to migrate your code.
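
For example, assuming the SDK continues to ship as the fireworks-ai package on PyPI, you can pin the final Build SDK release or opt in to the new SDK explicitly:
  # Stay on the final Build SDK release:
  pip install "fireworks-ai==0.19.20"

  # Opt in to the new REST-generated SDK (requires code migration):
  pip install "fireworks-ai>=1.0.0"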

🚀 Improved RFT Experience

We’ve drastically improved the RFT experience with better reliability, a developer-friendly SDK for hooking up your existing agents, support for multi-turn training, better observability in our Web App, and a better overall developer experience. See Reinforcement Fine-Tuning for more details.
2025-08-22

Supervised Fine-Tuning

We now support supervised fine-tuning with separate thinking traces for reasoning models (e.g., DeepSeek R1, GPT OSS, Qwen3 Thinking), ensuring training-inference consistency. An example including thinking traces looks like:
  {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}, 
      {"role": "assistant", "content": "Paris.", "reasoning_content": "The user is asking about the capital city of France, it should be Paris."}
    ]
  }
  {
    "messages": [
      {"role": "user", "content": "What is 1+1?"},
      {"role": "assistant", "content": "2", "weight": 0, "reasoning_content": "The user is asking about the result of 1+1, the answer is 2."},
      {"role": "user", "content": "Now what is 2+2?"},
      {"role": "assistant", "content": "4", "reasoning_content": "The user is asking about the result of 2+2, the answer should be 4."}
    ]
  }
Multi-turn fine-tuning (with or without thinking traces) is now also properly supported for the GPT OSS model family, ensuring training-inference consistency.
2025-08-10

Supervised Fine-Tuning

We now support Qwen3 MoE models (Qwen3 dense models are already supported) and GPT OSS models for supervised fine-tuning. GPT OSS fine-tuning support is currently single-turn without thinking traces.
2025-07-29

🎨 Vision-Language Model Fine-Tuning

You can now fine-tune Vision-Language Models (VLMs) on Fireworks AI using the Qwen 2.5 VL model family. This extends our Supervised Fine-Tuning V2 platform to support multimodal training with both images and text data. Supported models:
  • Qwen 2.5 VL 3B Instruct
  • Qwen 2.5 VL 7B Instruct
  • Qwen 2.5 VL 32B Instruct
  • Qwen 2.5 VL 72B Instruct
Features:
  • Fine-tune on datasets containing both images and text in JSONL format with base64-encoded images
  • Support for up to 64K context length during training
  • Built on the same Supervised Fine-tuning V2 infrastructure as text models
See the VLM fine-tuning documentation for setup instructions and dataset formatting requirements.
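
As an illustrative sketch of the dataset shape (the content-part layout follows the OpenAI-compatible message format; the VLM fine-tuning documentation has the authoritative schema):
  {"messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "What is shown in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    ]},
    {"role": "assistant", "content": "A red bicycle leaning against a wall."}
  ]}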

🔧 Build SDK: Deployment Configuration Application Requirement

The Build SDK now requires you to call .apply() to apply any deployment configurations to Fireworks when using deployment_type="on-demand" or deployment_type="on-demand-lora". This change ensures explicit control over when deployments are created and helps prevent accidental deployment creation. Key changes:
  • .apply() is now required for on-demand and on-demand-lora deployments
  • Serverless deployments do not require .apply() calls
  • If you do not call .apply(), you are expected to set up the deployment through the deployment page at https://app.fireworks.ai/dashboard/deployments
Migration guide:
  • Add llm.apply() after creating LLM instances with deployment_type="on-demand" or deployment_type="on-demand-lora"
  • No changes needed for serverless deployments
  • See updated documentation for examples and best practices
This change improves deployment management and provides better control over resource creation.
This applies to Python SDK version >=0.19.14.
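
A minimal sketch of the new flow (the model and deployment IDs are placeholders; the id parameter follows the requirements described in the 2025-07-10 entry below):
  from fireworks import LLM

  llm = LLM(
      model="qwen2p5-72b-instruct",  # placeholder model ID
      deployment_type="on-demand",
      id="my-on-demand-deployment",  # required for on-demand deployments
  )
  llm.apply()  # explicitly creates (or reuses) the deployment

  # Serverless deployments are unchanged and need no .apply() call.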
2025-07-23

🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning

You can now develop your own custom rollout and reward functionality while using Fireworks to manage the training and deployment of your models. This gives you full control over your reinforcement learning workflows while leveraging Fireworks’ infrastructure for model training and deployment. See the new LLM.reinforcement_step() method and ReinforcementStep class for usage examples and details.
2025-07-16

Supervised Fine-Tuning V2

We now support supervised fine-tuning for Llama 4 MoE models (Llama 4 Scout and Llama 4 Maverick, text only).
2025-07-10

🏗️ Build SDK LLM Deployment Logic Refactor

Based on early feedback from users and internal testing, we’ve refactored the LLM class deployment logic in the Build SDK to make it easier to understand. Key changes:
  • The id parameter is now required when deployment_type is "on-demand"
  • The base_id parameter is now required when deployment_type is "on-demand-lora"
  • The deployment_display_name parameter is now optional and defaults to the filename where the LLM was instantiated
A new deployment will be created if a deployment with the same id does not exist. Otherwise, the existing deployment will be reused.
2025-07-02

🚀 Support for Responses API in Python SDK

You can now use the Responses API in the Python SDK, making it easier to build conversational applications on Fireworks. See the Responses API guide for usage examples and details.
2025-07-01

Support for LinkedIn authentication

You can now log in to Fireworks using your LinkedIn account. To log in with LinkedIn, go to the Fireworks login page and click the “Continue with LinkedIn” button. You can also log in with LinkedIn from the CLI using the firectl login command. How it works:
  • Fireworks uses your LinkedIn primary email address for account identification
  • You can switch between different Fireworks accounts by changing your LinkedIn primary email
  • See our LinkedIn authentication FAQ for detailed instructions on managing email addresses
2025-06-30

Support for GitHub authentication

You can now log in to Fireworks using your GitHub account. To log in with GitHub, go to the Fireworks login page and click the “Continue with GitHub” button. You can also log in with GitHub from the CLI using the firectl login command.

🚨 Document Inlining Deprecation

Document Inlining has been deprecated and is no longer available on the Fireworks platform. This feature allowed LLMs to process images and PDFs through the chat completions API by appending #transform=inline to document URLs. Migration recommendations:
  • For image processing: Use Vision Language Models (VLMs) like Qwen2.5-VL 32B Instruct
  • For PDF processing: Use dedicated PDF processing libraries combined with text-based LLMs
  • For structured extraction: Leverage our structured responses capabilities
For assistance with migration, please contact our support team or visit our Discord community.
2025-06-24

🎯 Build SDK: Reward-kit integration for evaluator development

The Build SDK now natively integrates with reward-kit to simplify evaluator development for Reinforcement Fine-Tuning (RFT). You can now create custom evaluators in Python with automatic dependency management and seamless deployment to Fireworks infrastructure. Key features:
  • Native reward-kit integration for evaluator development
  • Automatic packaging of dependencies from pyproject.toml or requirements.txt
  • Local testing capabilities before deployment
  • Direct integration with Fireworks datasets and evaluation jobs
  • Support for third-party libraries and complex evaluation logic
See our Developing Evaluators guide to get started with your first evaluator in minutes.
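
As a rough sketch of what an evaluator can look like (based on reward-kit’s published examples; exact import paths and result types may differ between versions):
  from typing import List

  from reward_kit import EvaluateResult, MetricResult, reward_function
  from reward_kit.models import Message

  @reward_function
  def evaluate(messages: List[Message], **kwargs) -> EvaluateResult:
      # Score the final assistant message: 1.0 if non-empty and concise.
      answer = messages[-1].content or ""
      score = 1.0 if 0 < len(answer) <= 500 else 0.0
      return EvaluateResult(
          score=score,
          reason="Answer within length budget" if score else "Empty or too long",
          metrics={
              "length_ok": MetricResult(
                  score=score, success=bool(score), reason=f"{len(answer)} chars"
              ),
          },
      )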

Added new Responses API for advanced conversational workflows and integrations

  • Continue conversations across multiple turns using the previous_response_id parameter to maintain context without resending full history
  • Stream responses in real time as they are generated for responsive applications
  • Control response storage with the store parameter—choose whether responses are retrievable by ID or ephemeral
See the Responses API guide for usage examples and details.
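
For illustration, a minimal multi-turn sketch using the OpenAI SDK against Fireworks’ OpenAI-compatible endpoint (the model ID is a placeholder):
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.fireworks.ai/inference/v1",
      api_key="<FIREWORKS_API_KEY>",
  )

  first = client.responses.create(
      model="accounts/fireworks/models/llama-v3p1-70b-instruct",  # placeholder
      input="Name three uses of LoRA adapters.",
      store=True,  # keep the response retrievable by ID
  )

  follow_up = client.responses.create(
      model="accounts/fireworks/models/llama-v3p1-70b-instruct",  # placeholder
      previous_response_id=first.id,  # continue without resending full history
      input="Expand on the second one.",
  )
  print(follow_up.output_text)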
2025-06-13

Supervised Fine-Tuning V2

Supervised Fine-Tuning V2 has been released. Key features:
  • Supports the Qwen 2/2.5/3 series, Phi 4, Gemma 3, the Llama 3 family, and DeepSeek V2, V3, and R1
  • Longer context windows, up to the full context length of the supported models
  • Multi-turn function calling fine-tuning
  • Quantization aware training
More details in the blog post.

Reinforcement Fine-Tuning (RFT)

Reinforcement Fine-Tuning has been released. Train expert models to surpass closed-source frontier models through verifiable rewards. More details in the blog post.
2025-05-20

Diarization and batch processing support added to audio inference

See our blog post for details.
2025-05-19

🚀 Easier & faster LoRA fine-tune deployments on Fireworks

You can now deploy a LoRA fine-tune with a single command and get speeds that approximately match the base model:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of lora model>"
Previously, this involved two distinct steps, and the resulting deployment was slower than the base model:
  1. Create a deployment using firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
  2. Then deploy the addon to the deployment: firectl load-lora <MODEL_ID> --deployment <DEPLOYMENT_ID>
For more information, see our deployment documentation.
This change is for dedicated deployments with a single LoRA. You can still deploy multiple LoRAs on a deployment or deploy LoRA(s) on some Serverless models as described in the documentation.