> ## Documentation Index
> Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Changelog

<Update label="2026-05-14">
  # Serverless deprecation: legacy models removed May 14, 2026

  Several legacy serverless models will be decommissioned on **May 14, 2026** to make room for newer, higher-performance releases. This applies **only to serverless** usage; **dedicated deployments are unaffected**.

  ## **Action required**

  If you use any of the models below on serverless, migrate to a recommended replacement **before May 14, 2026**. After that date, they will no longer be available via serverless endpoints.

  ## **Serverless migration guide**

  * **DeepSeek V3.1** — migrate to **[Kimi K2.6](https://app.fireworks.ai/models/fireworks/kimi-k2p6)** or **[GLM 5.1](https://app.fireworks.ai/models/fireworks/glm-5p1)**
  * **DeepSeek V3.2** — migrate to **[Kimi K2.6](https://app.fireworks.ai/models/fireworks/kimi-k2p6)** or **[GLM 5.1](https://app.fireworks.ai/models/fireworks/glm-5p1)**
  * **GLM 4.7** — migrate to **[GLM 5.1](https://app.fireworks.ai/models/fireworks/glm-5p1)**
  * **GLM 5** — migrate to **[GLM 5.1](https://app.fireworks.ai/models/fireworks/glm-5p1)**
  * **Qwen3 8B** — migrate to **[GPT-OSS 20B](https://app.fireworks.ai/models/fireworks/gpt-oss-20b)**
  * **Qwen3 VL 30B A3B Thinking** — migrate to **[Kimi K2.6](https://app.fireworks.ai/models/fireworks/kimi-k2p6)**
  * **Qwen3 VL 30B A3B Instruct** — migrate to **[Kimi K2.6](https://app.fireworks.ai/models/fireworks/kimi-k2p6)**
  * **Llama 3.3 70B Instruct** — migrate to **[GPT-OSS 120B](https://app.fireworks.ai/models/fireworks/gpt-oss-120b)**

  We are continuing to invest in faster, more capable models and appreciate your support during this transition.
</Update>

<Update label="2026-02-05">
  # Video & Audio Models, AWS S3 Training Integration, and SSO Improvements

  ## **Video & Audio Input Models**

  You can now query multimodal models with video and audio inputs for video captioning, scene analysis, and multimodal question answering. Deploy models like Qwen3 Omni and Molmo2 to process video and audio content directly using the Chat Completions API.

  See the [Video & Audio Inputs guide](/guides/video-audio-inputs) for deployment instructions and code examples.

  ## **AWS S3 Integration for Training Datasets**

  Training datasets can now be stored in your own AWS S3 buckets using GCP-to-AWS OIDC federation. This Bring Your Own Bucket (BYOB) approach keeps your data private while enabling secure access during Supervised Fine-Tuning and Reinforcement Fine-Tuning jobs—no long-lived credentials required.

  See the [Secure Training (BYOB) documentation](/fine-tuning/secure-fine-tuning) for IAM role setup and usage examples.

  ## **Just-In-Time (JIT) User Provisioning for SSO** (Enterprise)

  JIT user provisioning automatically creates user accounts when users sign in through SSO for the first time. Enable this when configuring your identity provider to eliminate manual user creation.

  See the [SSO documentation](/accounts/sso) for setup instructions.

  ## 📚 Documentation Updates

  * **Video & Audio Inputs:** New guide for processing video and audio with Qwen3 Omni and Molmo2 models ([Video & Audio Inputs](/guides/video-audio-inputs))
  * **AWS S3 Bucket Integration:** BYOB dataset storage for training via OIDC federation ([Secure Training](/fine-tuning/secure-fine-tuning))
  * **Rate Limits Clarification:** Expanded documentation on adaptive serverless limits, upper bounds, and monitoring ([Serverless rate limits](/serverless/rate-limits))
  * **Anthropic-Compatible Thinking Parameter:** Control reasoning with `thinking` parameter alongside `reasoning_effort` ([Reasoning](/guides/reasoning))
  * **Scaling from Zero Behavior:** Deployments scaled to zero return `503` immediately with retry guidance ([Autoscaling](/deployments/autoscaling))
  * **Custom Model Configuration:** Generation defaults via `fireworks.json` for LoRA adapters ([Uploading Custom Models](/models/uploading-custom-models))
  * **Service Account Roles:** Role assignment and management for service accounts ([Service Accounts](/accounts/service-accounts))
  * **Free RFT:** Documentation highlighting free Reinforcement Fine-Tuning for models under 16B parameters
  * **Serverless Model Retrieval:** Python SDK and curl examples for programmatically listing serverless models ([FAQ](/faq-new/models-inference/how-to-check-if-a-model-is-available-on-serverless))
  * **Function Tools in Responses API:** Client-executed function tools following OpenAI-compatible format ([Responses API](/guides/response-api))
  * **Autoscaling Metric:** New `prompt_tokens_per_second` load target for prefill-heavy workloads ([Autoscaling](/deployments/autoscaling))
  * **firectl CLI Syntax:** All documentation updated to `firectl <noun> <verb>` syntax

  <Accordion title="Bug Fixes & Minor Improvements">
    - **Model Library Enhancements:** Skeleton loader for faster perceived loading, new filter categories for Embeddings, Reranks, and Agents (Web App)
    - **Service Account Roles:** Role column and role assignment in the Users management page (Web App)
    - **JIT User Provisioning Toggle:** Enable JIT provisioning when configuring SSO identity providers (Web App, Enterprise)
    - **Autoscaling Configuration:** Autoscaling metric and threshold settings in deployment creation form (Web App)
    - **RFT Chunk Size:** Configurable chunk size setting with tooltips for Reinforcement Fine-Tuning creation (Web App)
    - **Batch Inference Form:** Redesigned step-based form flow for batch inference job creation (Web App)
    - **Full Precision Shape:** "Full Precision" deployment shape option displayed in deployment configuration (Web App)
    - **Batch Inference Error Banner:** Error banners now display on batch inference job detail pages (Web App)
    - **Audit Log Resource Column:** Improved readability with tooltip for long resource names (Web App)
    - **API Key Error Messages:** Clearer permission error messages when users lack API key management access (Web App)
    - **Fine-Tuning Cost Display:** Fixed estimated cost display for fine-tuning jobs (Web App)
    - **Batch Inference Max Tokens:** Fallback max value increased to 128,000 tokens (Web App)
    - **Playground Model Select:** Improved performance for model selection dropdown (Web App)
    - **Embedding Model Shapes:** Deployment shapes correctly filtered by embedding model type (Web App)
    - **Audit Logs Command:** New `firectl audit-logs list` command for querying audit logs (CLI)
  </Accordion>
</Update>

<Update label="2026-01-20">
  # Warm-Start Training and Azure Model Uploads

  ## **Warm-Start Training for Reinforcement Fine-Tuning**

  You can now warm-start Reinforcement Fine-Tuning jobs from previously supervised fine-tuned checkpoints using the `--warm-start-from` flag. This enables a streamlined SFT-to-RFT workflow where you first train a model with supervised fine-tuning, then continue training with reinforcement learning.

  See the [Warm-Start Training guide](/fine-tuning/warm-start) for details.

  ## **Azure Federated Identity for Model Uploads**

  Model uploads from Azure Blob Storage now support Azure AD federated identity authentication as an alternative to SAS tokens. This eliminates the need for credential rotation and enables secure, credential-less authentication.

  See the [Uploading Custom Models documentation](/models/uploading-custom-models) for setup instructions.

  ## 📚 Documentation Updates

  * **Warm-Start Training:** New guide for SFT-to-RFT workflows ([Warm-Start Training](/fine-tuning/warm-start))
  * **Azure Federated Identity:** Setup instructions for Azure AD authentication ([Uploading Custom Models](/models/uploading-custom-models))
  * **Preserved Thinking:** Multi-turn reasoning with preserved thinking context ([Reasoning](/guides/reasoning))
  * **GLM 4.7:** Added to models supporting `reasoning_effort` parameter

  <Accordion title="Bug Fixes & Minor Improvements">
    - **RFT Cost Display:** Reinforcement Fine-Tuning job pages now show approximate final cost (Web App)
    - **GPU Information:** Deployments table displays GPU type and count (Web App)
    - **DPO Job Resume:** Preference Fine-Tuning jobs can now be resumed after stopping (Web App, API)
    - **Free Tuning Filter:** New filter in RFT model selector for free-to-fine-tune models (Web App)
    - **Playground Inputs:** Editable number inputs for temperature, top\_p, and other parameters (Web App)
    - **Clone Fine-Tuning Jobs:** Fixed field population when cloning jobs (Web App)
    - **Evaluation Job Errors:** Errors now display with alert banners (Web App)
    - **Invoice CSV Export:** Improved download experience (Web App)
    - **Playground Validation:** Prevents querying deployments with 0 replicas (Web App)
    - **List Models Filter:** `firectl model list` supports `--name` and `--public-only` flags (CLI)
    - **RFT Concurrency:** New `--max-concurrent-rollouts` and `--max-concurrent-evaluations` flags (CLI)
  </Accordion>
</Update>

<Update label="2025-12-22">
  # Playground Categories, New User Roles, Fine-Tuning Improvements, and New Models

  ## **Playground Categories**

  The Playground now features category tabs (LLM, Image, TTS, STT) in the header for easier switching between model types. The playground automatically detects the appropriate category based on the selected model and provides smart defaults for each category.

  ## **User Roles: Contributor and Inference**

  New user roles provide more granular access control for team collaboration:

  * **Contributor**: Read and write access to resources without administrative privileges
  * **Inference**: Read-only access with the ability to run inference on deployments

  Assign these roles when inviting team members to provide appropriate access levels.

  ## **Fine-Tuning Improvements**

  Fine-tuning workflows have been enhanced with several new capabilities:

  * **Stop and Resume Jobs**: Stop running fine-tuning jobs and resume them later from where they left off. Available for Supervised Fine-Tuning and Reinforcement Fine-Tuning jobs.
  * **Clone Jobs**: Quickly create new fine-tuning jobs based on existing job configurations using the Clone action.
  * **Download Output Datasets**: Download output datasets from Reinforcement Fine-Tuning jobs, including individual files or bulk download as a ZIP archive.
  * **Download Rollout Logs**: Download rollout logs from Reinforcement Fine-Tuning jobs for offline analysis.

  ## ✨ New Models

  * **[Gemma 3 12B Instruct](https://app.fireworks.ai/models/fireworks/gemma-3-12b-it)** is now available in the Model Library
  * **[Gemma 3 4B Instruct](https://app.fireworks.ai/models/fireworks/gemma-3-4b-it)** is now available in the Model Library
  * **[Qwen3 Omni 30B A3B Instruct](https://app.fireworks.ai/models/fireworks/qwen3-omni-30b-a3b-instruct)** is now available in the Model Library

  ## 📚 Documentation Updates

  * **Deployment Shapes API:** Added [List Deployment Shape Versions](/api-reference/list-deployment-shape-versions) and [Get Deployment Shape](/api-reference/get-deployment-shape) endpoints for querying available deployment shapes
  * **Evaluator APIs:** Added [Create Evaluator](/api-reference/create-evaluator), [Update Evaluator](/api-reference/update-evaluator), and helper endpoints for evaluator source code, build logs, and upload validation
  * **Fine-Tuning APIs:** Added [Resume DPO Job](/api-reference/resume-dpo-job), [Resume Reinforcement Fine-Tuning Step](/api-reference/resume-reinforcement-fine-tuning-step), [Execute Reinforcement Fine-Tuning Step](/api-reference/execute-reinforcement-fine-tuning-step), and [Get Evaluation Job Log Endpoint](/api-reference/get-evaluation-job-log-endpoint)
  * **SDK Examples:** Added Python SDK example links for direct routing and [supervised fine-tuning](/fine-tuning/fine-tuning-models) workflows

  <Accordion title="Bug Fixes & Minor Improvements">
    - **Deployment Progress Display:** Deployment details page now shows live deployment progress with replica status (pending, downloading, initializing, ready) and error banners (Web App)
    - **Multi-LoRA Display:** Deployment details page now shows all deployed models with expandable sections, not just the default model (Web App)
    - **Audio Usage Charts:** Audio usage metrics are now displayed in a dedicated Voice tab on the Usage page with filtering support (Web App)
    - **Deployment Shape Search:** Improved deployment shape discovery for models where exact base model matches aren't found, using parameter count bucketing (Web App)
    - **Vision Model Auto-Detection:** Vision-language models (Qwen VL, LLaVA, Phi-3 Vision, etc.) now automatically have image input support enabled when uploaded (API)
    - **Dataset Loading UX:** Output dataset tables now stream results with a progress indicator for faster perceived loading (Web App)
    - **File Size Limit:** Dataset uploads now enforce a clear file size limit with improved error messaging (Web App)
    - **Number Input Fields:** Improved number input validation across forms (Web App)
    - **Combobox Responsiveness:** Improved combobox dropdown height on smaller screens (Web App)
    - **Evaluator Editor Scroll:** Prevented accidental page navigation when scrolling inside the evaluator code editor (Web App)
    - **Evaluator Save Dialog:** Fixed overflow issues in the save evaluator dialog (Web App)
    - **Evaluator Selector Labels:** Fixed label rendering in async evaluator select components (Web App)
    - **Deploy Button Validation:** Deploy button is now disabled when the model is not ready (Web App)
    - **Model Metadata:** Fixed missing model metadata display on deployment pages (Web App)
    - **Invoice Display:** Invoice list now shows "paid" status for contract payments (Web App)
    - **Color Palette Update:** Updated UI color palette for improved visual consistency (Web App)
  </Accordion>
</Update>

<Update label="2025-12-15">
  # Reasoning Guide, Prompt Caching Updates, New Models and CLI Updates

  ## **Reasoning Guide**

  A new [Reasoning guide](/guides/reasoning) is now available in the documentation. This comprehensive guide covers:

  * Accessing `reasoning_content` from thinking/reasoning models
  * Controlling reasoning effort with the `reasoning_effort` parameter
  * Streaming with reasoning content
  * Interleaved thinking for multi-step tool-calling workflows

  The guide provides code examples using the Fireworks Python SDK and explains how to work with models that support extended reasoning capabilities.

  ## **Prompt Caching Updates**

  Prompt caching documentation has been updated with expanded guidance:

  * Cached prompt tokens on serverless now cost 50% less than uncached tokens
  * Session affinity routing via the `user` field or `x-session-affinity` header for improved cache hit rates
  * Prompt optimization techniques for maximizing cache efficiency

  See the [Prompt Caching guide](/guides/prompt-caching) for details.

  ## ✨ New Models

  * **[Devstral Small 2 24B Instruct 2512](https://app.fireworks.ai/models/fireworks/devstral-small-2-24b-instruct-2512)** is now available in the Model Library
  * **[NVIDIA Nemotron Nano 3 30B A3B](https://app.fireworks.ai/models/fireworks/nemotron-nano-3-30b-a3b)** is now available in the Model Library

  ## 📚 Documentation Updates

  * **Reasoning Guide:** New documentation for working with reasoning models, including `reasoning_content`, `reasoning_effort`, streaming, and interleaved thinking ([Reasoning](/guides/reasoning))
  * **Recommended Models:** Updated recommendations to include DeepSeek V3.2 for code generation and Kimi K2 Thinking as a GPT-5 alternative ([Recommended Models](/guides/recommended-models))
  * **OpenAI Compatibility:** Removed `stop` sequence documentation as Fireworks is now 1:1 compatible with OpenAI's behavior ([OpenAI Compatibility](/tools-sdks/openai-compatibility))
  * **Evaluator APIs:** Added REST API documentation for Evaluator and Evaluation Job CRUD operations ([Evals API Reference](/api-reference/list-evaluators))
  * **firectl CLI Reference:** Updated with new commands including `cancel dpo-job`, `cancel supervised-fine-tuning-job`, `set-api-key`, `redeem-credit-code`, and evaluator revision management

  <Accordion title="Bug Fixes & Minor Improvements">
    - **Audio Usage Charts:** Added a Voice modality tab to the Usage page for viewing audio-specific usage metrics and charts (Web App)
    - **Deployment Page Redesign:** Redesigned deployment details page with inline action buttons (Edit, Delete, Enable/Disable), reordered metadata sections, and collapsible API examples (Web App)
    - **Deployment API Examples:** API code examples now use the deployment route as the canonical model identifier for clearer usage patterns (Web App)
    - **LoRA Addons Tab:** Renamed "Serverless LoRA" tab to "LoRA Addons" on the deployments dashboard for clarity (Web App)
    - **Evaluator Selector:** Improved evaluator selector UI to display both display name and evaluator ID for easier identification (Web App)
    - **Evaluator Delete Confirmation:** Added confirmation modal with success/error feedback when deleting evaluators (Web App)
    - **Evaluator Code Viewer:** Evaluator source files now load asynchronously, preventing browser freezes with large files (Web App)
    - **Dataset Image Preview:** Dataset preview now properly renders image content in message bubbles and comparison views (Web App)
    - **Model Search:** Improved model search accuracy by restricting results to display name and model ID matches only (Web App)
    - **Fine-Tuning Progress Status:** Fine-tuning job detail pages now show the initial job status immediately on page load (Web App)
    - **Evaluator Test Controls:** Dataset selection and pagination are now disabled while an evaluator test is running to prevent conflicts (Web App)
    - **Repository Name Validation:** Evaluator repository names now validate against GitHub naming conventions (Web App)
    - **Billing Contracts Tab:** Added a Contracts tab to the billing page alongside Invoices for viewing contract details (Web App)
    - **Dataset Size Limit:** Enforced 1GB maximum file size for dataset uploads with clear error messaging (Web App)
    - **Evaluator Documentation Link:** Fixed evaluator documentation links to point to the correct location (Web App)
    - **Session Update Fix:** Fixed an issue with session state updates in the web app (Web App)
    - **Popover Fix:** Fixed popover components nested in dialogs not displaying correctly (Web App)
    - **Per-Replica Status:** Deployment status now shows per-replica counts (pending, downloading, initializing, ready) for better visibility into deployment progress (CLI, API)
    - **set-api-key Command:** The `set-api-key` command is now visible in firectl help output (CLI)
    - **DPO Resume:** Added support for resuming cancelled DPO fine-tuning jobs (API)
    - **Reasoning Effort Normalization:** `reasoning_effort` parameter now accepts boolean values in addition to strings and integers (API)
  </Accordion>
</Update>

<Update label="2025-12-08">
  # DeepSeek V3.2 on Serverless, Cached Token Pricing, and New Models

  ## ☁️ Serverless

  * **[DeepSeek V3.2](https://app.fireworks.ai/models/fireworks/deepseek-v3p2)** is now available on serverless

  ## **Cached Token Pricing Display**

  The Model Library and model detail pages now display cached and uncached input
  token pricing for serverless models that support prompt caching. This gives you
  better visibility into potential cost savings when using prompt caching with
  supported models.

  ## **Evaluations Dashboard Improvements**

  The Evaluations dashboard has been enhanced with new filtering and status tracking capabilities:

  * Status column showing evaluator build state (Active, Building, Failed)
  * Quick filters to filter evaluators and evaluation jobs by status
  * Improved table layout with actions integrated into the status column

  ## ✨ New Models

  * **[DeepSeek V3.2](https://app.fireworks.ai/models/fireworks/deepseek-v3p2)** is now available in the Model Library
  * **[Ministral 3 14B Instruct 2512](https://app.fireworks.ai/models/fireworks/ministral-3-14b-instruct-2512)** is now available in the Model Library
  * **[Ministral 3 8B Instruct 2512](https://app.fireworks.ai/models/fireworks/ministral-3-8b-instruct-2512)** is now available in the Model Library
  * **[Ministral 3 3B Instruct 2512](https://app.fireworks.ai/models/fireworks/ministral-3-3b-instruct-2512)** is now available in the Model Library
  * **[Mistral Large 3 675B Instruct](https://app.fireworks.ai/models/fireworks/mistral-large-3-fp8)** is now available in the Model Library
  * **[Qwen3-VL-32B-Instruct](https://app.fireworks.ai/models/fireworks/qwen3-vl-32b-instruct)** is now available in the Model Library
  * **[Qwen3-VL-8B-Instruct](https://app.fireworks.ai/models/fireworks/qwen3-vl-8b-instruct)** is now available in the Model Library

  ## 📚 Documentation Updates

  * **Reranking Guide:** Added documentation for using the `/rerank` endpoint and `/embeddings` endpoint with `return_logits` for reranking, including parallel batching examples ([Querying Embeddings Models](/guides/querying-embeddings-models))

  <Accordion title="Bug Fixes & Minor Improvements">
    - **Deployment Page Enhancements:** Redesigned deployment detail page with new model header, quick actions (Playground, Go to Model), tabbed API/Info interface, collapsible code examples, and improved GPU count display (Web App)
    - **Login Page Redesign:** New marketing panel with customer testimonials carousel, highlighted platform capabilities, and improved visual design (Web App)
    - **Playground CTA:** Added "Deploy on Demand" button next to "Try the API" for eligible models, making it easier to deploy models directly from the playground (Web App)
    - **Console Navigation Icons:** Updated sidebar icons with a refreshed icon set for improved visual consistency (Web App)
    - **Reinforcement Fine-Tuning Defaults:** Changed default epochs to 1 and increased maximum inference N from 8 to 32 for rollout configuration (Web App)
    - **Failed Job Visibility:** Training progress and loss curves now display for failed and cancelled fine-tuning jobs, helping with debugging (Web App)
    - **Large Dataset Uploads:** Improved upload handling for large JSONL files with progress tracking and direct-to-storage uploads (Web App)
    - **Dataset Preview:** Fixed page freeze issue when previewing datasets with very long text content (Web App)
    - **Loss Chart Y-Axis:** Fixed y-axis scaling on loss charts to properly display the full range of values (Web App)
    - **Model Deletion Dialog:** Improved custom model delete confirmation dialog with better validation and feedback (Web App)
    - **Billing Date Range:** Fixed date range calculation errors on the first day of the month in billing usage views (Web App)
    - **Login Session:** Fixed an issue where expired sessions required manual cookie clearing to log in again (Web App)
    - **Rollout Detail Panel:** Improved rollout log viewing with resizable split panels and better log formatting (Web App)
    - **Checkpoint Promotion:** Added validation and error messages when promoting checkpoints with missing target modules or base model (Web App)
    - **Model Validation:** Added validation before deploying fine-tuned models to ensure the model ID is valid (Web App)
    - **Quota Error Message:** Improved error message clarity when request quota is exceeded in the playground (Web App)
    - **Safari Layout:** Fixed extra spacing in login page marketing panel on Safari browsers (Web App)
  </Accordion>
</Update>

<Update label="2025-12-01">
  # Audit Logs, Dataset Download, Weighted Training for Reinforcement Fine-Tuning, and New Model

  ## **Audit Logs in Web App**

  You can now view and search audit logs directly from the Fireworks web app. The new Audit Logs page provides:

  * Search and filter logs by status and timeframe
  * Detailed view panel for individual log entries
  * Easy navigation from the console sidebar under Account settings

  See the [Audit Logs documentation](/guides/security_compliance/audit_logs) for more information.

  ## **Dataset Download**

  You can now download datasets directly from the Fireworks web app. The new download functionality allows you to:

  * Download individual files from a dataset
  * Download all files at once with "Download All"
  * Access downloads from the Datasets table in the dashboard

  ## **Weighted Training for Reinforcement Fine-Tuning**

  Reinforcement Fine-Tuning now supports per-example weighting, giving you more control over which samples have greater influence during training. This feature mirrors the weighted training functionality already available in Supervised Fine-Tuning.

  See the [Weighted Training documentation](/fine-tuning/weighted-training) for details on the weight field format.

  ## ✨ New Models

  * **[KAT Coder](https://app.fireworks.ai/models/fireworks/kat-coder)** is now available in the Model Library

  <Accordion title="Bug Fixes & Minor Improvements">
    - **Console Navigation:** Redesigned sidebar with organized groups (CREATE, EXPLORE, MANAGE) for easier navigation (Web App)
    - **Fine-Tuning Progress Display:** Training progress (progress/epoch) now displays inline in the job title while Supervised Fine-Tuning jobs are running (Web App)
    - **Reinforcement Fine-Tuning Evaluator Selection:** Evaluators that are still building are now disabled in the selector with a tooltip and status badge (Web App)
    - **Loss Chart Smoothing:** Large datasets (>1000 metrics) now show EMA smoothing by default for improved visibility (Web App)
    - **Checkpoint Restore:** Fixed base model resolution when promoting checkpoints from fine-tuning jobs (Web App)
    - **Deployment Usage Charts:** Fixed usage graph display on deployment details page (Web App)
    - **Evaluation Job Share Links:** Fixed incorrect output dataset share links and improved deep-link behavior for evaluation jobs (Web App)
    - **Embedding Model Deployments:** Embedding models can now be deployed directly from the UI (Web App)
    - **Dataset Download State:** Download option is now disabled for datasets that are still uploading (Web App)
    - **GPU Hints:** Removed invalid GPU hints from region selector in deployment form (Web App)
    - **PEFT Model Shapes:** Fixed deployment shape lookup for PEFT Addon and Live Merge models (Web App)
    - **LoRA Validation:** Improved error messages when LoRA checkpoints are missing the required `language_model.` prefix, with actionable conversion instructions (API)
    - **Reinforcement Fine-Tuning Timeout:** Extended maximum job timeout from 4 to 7 days for longer training runs (API)
    - **Training Job Cancellation:** Added ability to cancel Supervised Fine-Tuning, DPO, and Reinforcement Fine-Tuning jobs via API (API)
    - **Resource Errors:** Improved error messages for capacity-related issues during training and deployment (API)
  </Accordion>
</Update>

<Update label="2025-11-24">
  # Evaluator Improvements, Kimi K2 Thinking on Serverless, and New API Endpoints

  ## **Improved Evaluator Creation Experience**

  The evaluator creation workflow has been significantly enhanced with GitHub template integration. You can now:

  * Fork evaluator templates directly from GitHub repositories
  * Browse and preview templates before using them
  * Create evaluators with a streamlined save dialog
  * View evaluators in a new sortable and paginated table

  ## **MLOps & Observability Integrations**

  New documentation for integrating Fireworks with MLOps and observability tools:

  * [Weights & Biases (W\&B)](/ecosystem/integrations/wandb) integration for experiment tracking during fine-tuning
  * MLflow integration for model management and experiment logging

  ## ✨ New Models

  * **[Kimi K2 Thinking](https://app.fireworks.ai/models/fireworks/kimi-k2-thinking)** is now available in the Model Library
  * **[KAT Dev 32B](https://app.fireworks.ai/models/fireworks/kat-dev-32b)** is now available in the Model Library
  * **[KAT Dev 72B Exp](https://app.fireworks.ai/models/fireworks/kat-dev-72b-exp)** is now available in the Model Library

  ## ☁️ Serverless

  * **[Kimi K2 Thinking](https://app.fireworks.ai/models/fireworks/kimi-k2-thinking)** is now available on serverless

  ## 📚 New REST API Endpoints

  New REST API endpoints are now available for managing Reinforcement Fine-Tuning Steps and deployments:

  * [Create Reinforcement Fine-Tuning Step](/api-reference/create-reinforcement-fine-tuning-step)
  * [List Reinforcement Fine-Tuning Steps](/api-reference/list-reinforcement-fine-tuning-steps)
  * [Get Reinforcement Fine-Tuning Step](/api-reference/get-reinforcement-fine-tuning-step)
  * [Delete Reinforcement Fine-Tuning Step](/api-reference/delete-reinforcement-fine-tuning-step)
  * [Scale Deployment](/api-reference/scale-deployment)
  * [List Deployment Shape Versions](/api-reference/list-deployment-shape-versions)
  * [Get Deployment Shape Version](/api-reference/get-deployment-shape-version)
  * [Get Dataset Download Endpoint](/api-reference/get-dataset-download-endpoint)

  <Accordion title="Bug Fixes & Minor Improvements">
    - **Deployment Region Selector:** Added GPU accelerator hints to the region selector, with Global set as default for optimal availability (Web App)
    - **Preference Fine-Tuning (DPO):** Added to the Fine-Tuning page for training models with human preference data (Web App)
    - **Redeem Credits:** Credit code redemption is now available to all users from the Billing page (Web App)
    - **Model Library Search:** Improved fuzzy search with hybrid matching for better model discovery (Web App)
    - **Cogito Models:** Added Cogito namespace to the Model Library for easier discovery (Web App)
    - **Custom Model Editing:** You can now edit display name and description inline on custom model detail pages (Web App)
    - **Loss Curve Charts:** Fixed an issue where loss curves were not updating in real-time during fine-tuning jobs (Web App)
    - **Deployment Shapes:** Fixed deployment shape selection for fine-tuned models (PEFT and live-merge) (Web App)
    - **Usage Charts:** Fixed replica calculation in multi-series usage charts (Web App)
    - **Session Management:** Removed auto-logout on inactivity for improved user experience (Web App)
    - **Onboarding:** Updated onboarding survey with improved profile and questionnaire flow (Web App)
    - **Fine-Tuning Form:** Max context length now defaults to and is capped by the selected base model's context length (Web App)
    - **Secrets for Evaluators:** Added documentation for using secrets in evaluators to securely call external services (Docs)
    - **Region Selection:** Deprecated regions are now filtered from deployment options (Web App)
    - **Playground:** Embedding and reranker models are now filtered from playground model selection (Web App)
    - **LoRA Rank:** Updated valid LoRA rank range to 4-32 in documentation (Docs)
    - **SFT Documentation:** Added documentation for batch size, learning rate warmup, and gradient accumulation settings (Docs)
    - **Direct Routing:** Added OpenAI SDK code examples for direct routing (Docs)
    - **Recommended Models:** Updated model recommendations with migration guidance from Claude, GPT, and Gemini (Docs)
  </Accordion>
</Update>

<Update label="2025-11-12">
  ## ☀️ Sunsetting Build SDK

  The Build SDK is being deprecated in favor of a new Python SDK generated
  directly from our REST API. The new SDK is more up-to-date, flexible, and
  continuously synchronized with our REST API. Please note that the last version
  of the Build SDK will be `0.19.20`, and the new SDK will start at `1.0.0`.
  Python package managers will not automatically update to the new SDK, so you
  will need to manually update your dependencies and refactor your code.

  Existing codebases using the Build SDK will continue to function as before and
  will not be affected unless you choose to upgrade to the new SDK version.

  The new SDK replaces the Build SDK's `LLM` and `Dataset` classes with REST
  API-aligned methods. If you upgrade to version `1.0.0` or later, you will need
  to migrate your code.

  ## 🚀 Improved RFT Experience

  We've drastically improved the RFT experience with better reliability,
  developer-friendly SDK for hooking up your existing agents, support for
  multi-turn training, better observability in our Web App, and better overall
  developer experience.

  See [Reinforcement Fine-Tuning](/fine-tuning/reinforcement-fine-tuning-models) for more details.
</Update>

<Update label="2025-08-22">
  ## Supervised Fine-Tuning

  We now support supervised fine tuning with separate thinking traces for reasoning models (e.g. DeepSeek R1, GPT OSS, Qwen3 Thinking etc) that ensures training-inference consistency. An example including thinking traces would look like:

  ```json theme={null}
    {
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}, 
        {"role": "assistant", "content": "Paris.", "reasoning_content": "The user is asking about the capital city of France, it should be Paris."}
      ]
    }
    {
      "messages": [
        {"role": "user", "content": "What is 1+1?"},
        {"role": "assistant", "content": "2", "weight": 0, "reasoning_content": "The user is asking about the result of 1+1, the answer is 2."},
        {"role": "user", "content": "Now what is 2+2?"},
        {"role": "assistant", "content": "4", "reasoning_content": "The user is asking about the result of 2+2, the answer should be 4."}
      ]
    }
  ```

  We are also properly supporting multi-turn fine tuning (with or without thinking traces) for GPT OSS model family that ensures training-inference consistency.
</Update>

<Update label="2025-08-10">
  ## Supervised Fine-Tuning

  We now support Qwen3 MoE model (Qwen3 dense models are already supported) and GPT OSS models for supervised fine-tuning. GPT OSS model fine tunning support is single-turn without thinking traces at the moment.
</Update>

<Update label="2025-07-29">
  ## 🎨 Vision-Language Model Fine-Tuning

  You can now fine-tune Vision-Language Models (VLMs) on Fireworks AI using the Qwen 2.5 VL model family.
  This extends our Supervised Fine-tuning V2 platform to support multimodal training with both images and text data.

  **Supported models:**

  * Qwen 2.5 VL 3B Instruct
  * Qwen 2.5 VL 7B Instruct
  * Qwen 2.5 VL 32B Instruct
  * Qwen 2.5 VL 72B Instruct

  **Features:**

  * Fine-tune on datasets containing both images and text in JSONL format with base64-encoded images
  * Support for up to 64K context length during training
  * Built on the same Supervised Fine-tuning V2 infrastructure as text models

  See the [VLM fine-tuning documentation](/fine-tuning/fine-tuning-vlm) for setup instructions and dataset formatting requirements.

  ## 🔧 Build SDK: Deployment Configuration Application Requirement

  The Build SDK now requires you to call `.apply()` to apply any deployment configurations to Fireworks when using `deployment_type="on-demand"` or `deployment_type="on-demand-lora"`. This change ensures explicit control over when deployments are created and helps prevent accidental deployment creation.

  **Key changes:**

  * `.apply()` is now required for on-demand and on-demand-lora deployments
  * Serverless deployments do not require `.apply()` calls
  * If you do not call `.apply()`, you are expected to set up the deployment through the deployment page at [https://app.fireworks.ai/dashboard/deployments](https://app.fireworks.ai/dashboard/deployments)

  **Migration guide:**

  * Add `llm.apply()` after creating LLM instances with `deployment_type="on-demand"` or `deployment_type="on-demand-lora"`
  * No changes needed for serverless deployments
  * See updated documentation for examples and best practices

  This change improves deployment management and provides better control over resource creation.

  <Note>
    This applies to Python SDK version `>=0.19.14`.
  </Note>
</Update>

<Update label="2025-07-23">
  ## 🚀 Bring Your Own Rollout and Reward Development for Reinforcement Learning

  You can now develop your own custom rollout and reward functionality while using
  Fireworks to manage the training and deployment of your models. This gives you
  full control over your reinforcement learning workflows while leveraging
  Fireworks' infrastructure for model training and deployment.

  See the new [LLM.reinforcement\_step()](/tools-sdks/python-client/sdk-reference#reinforcement_step) method and [ReinforcementStep](/tools-sdks/python-client/sdk-reference#reinforcementstep) class for usage examples and details.
</Update>

<Update label="2025-07-16">
  ## Supervised Fine-Tuning V2

  We now support Llama 4 MoE model supervised fine-tuning (Llama 4 Scout, Llama 4 Maverick, Text only).
</Update>

<Update label="2025-07-10">
  ## 🏗️ Build SDK `LLM` Deployment Logic Refactor

  Based on early feedback from users and internal testing, we've refactored the
  `LLM` class deployment logic in the Build SDK to make it easier to understand.

  **Key changes:**

  * The `id` parameter is now required when `deployment_type` is `"on-demand"`
  * The `base_id` parameter is now required when `deployment_type` is `"on-demand-lora"`
  * The `deployment_display_name` parameter is now optional and defaults to the filename where the LLM was instantiated

  A new deployment will be created if a deployment with the same `id` does not
  exist. Otherwise, the existing deployment will be reused.
</Update>

<Update label="2025-07-02">
  ## 🚀 Support for Responses API in Python SDK

  You can now use the Responses API in the Python SDK. This is useful if you want to use the Responses API in your own applications.

  See the [Responses API guide](/guides/response-api) for usage examples and details.
</Update>

<Update label="2025-07-01">
  ## Support for LinkedIn authentication

  You can now log in to Fireworks using your LinkedIn account. This is useful if
  you already have a LinkedIn account and want to use it to log in to Fireworks.

  To log in with LinkedIn, go to the [Fireworks login
  page](https://fireworks.ai/login) and click the "Continue with LinkedIn"
  button.

  You can also log in with LinkedIn from the CLI using the `firectl login`
  command.

  **How it works:**

  * Fireworks uses your LinkedIn primary email address for account identification
  * You can switch between different Fireworks accounts by changing your LinkedIn primary email
  * See our [LinkedIn authentication FAQ](/faq-new/account-access/what-email-does-linkedin-authentication-use) for detailed instructions on managing email addresses
</Update>

<Update label="2025-06-30">
  ## Support for GitHub authentication

  You can now log in to Fireworks using your GitHub account. This is useful if
  you already have a GitHub account and want to use it to log in to Fireworks.

  To log in with GitHub, go to the [Fireworks login
  page](https://fireworks.ai/login) and click the "Continue with GitHub"
  button.

  You can also log in with GitHub from the CLI using the `firectl login`
  command.

  ## 🚨 Document Inlining Deprecation

  Document Inlining has been deprecated and is no longer available on the Fireworks platform. This feature allowed LLMs to process images and PDFs through the chat completions API by appending `#transform=inline` to document URLs.

  **Migration recommendations:**

  * For image processing: Use Vision Language Models (VLMs) like [Qwen2.5-VL 32B Instruct](https://app.fireworks.ai/models/fireworks/qwen2p5-vl-32b-instruct)
  * For PDF processing: Use dedicated PDF processing libraries combined with text-based LLMs
  * For structured extraction: Leverage our [structured responses](/structured-responses/structured-response-formatting) capabilities

  For assistance with migration, please contact our support team or visit our [Discord community](https://discord.gg/fireworks-ai).
</Update>

<Update label="2025-06-24">
  ## 🎯 Build SDK: Reward-kit integration for evaluator development

  The Build SDK now natively integrates with [reward-kit](https://github.com/fw-ai-external/reward-kit) to simplify evaluator development for [Reinforcement Fine-Tuning (RFT)](/fine-tuning/reinforcement-fine-tuning-models). You can now create custom evaluators in Python with automatic dependency management and seamless deployment to Fireworks infrastructure.

  **Key features:**

  * Native reward-kit integration for evaluator development
  * Automatic packaging of dependencies from `pyproject.toml` or `requirements.txt`
  * Local testing capabilities before deployment
  * Direct integration with Fireworks datasets and evaluation jobs
  * Support for third-party libraries and complex evaluation logic

  See our [Developing Evaluators](/tools-sdks/python-client/developing-evaluators) guide to get started with your first evaluator in minutes.

  ## Added new Responses API for advanced conversational workflows and integrations

  * Continue conversations across multiple turns using the `previous_response_id` parameter to maintain context without resending full history
  * Stream responses in real time as they are generated for responsive applications
  * Control response storage with the `store` parameter—choose whether responses are retrievable by ID or ephemeral

  See the [Response API guide](/guides/response-api) for usage examples and details.
</Update>

<Update label="2025-06-13">
  ## Supervised Fine-Tuning V2

  Supervised Fine-Tuning V2 released.

  **Key features:**

  * Supports Qwen 2/2.5/3 series, Phi 4, Gemma 3, the Llama 3 family, Deepseek V2, V3, R1
  * Longer context window up to full context length of the supported models
  * Multi-turn function calling fine-tuning
  * Quantization aware training

  More details in the [blogpost](https://fireworks.ai/blog/supervised-finetuning-v2).

  ## Reinforcement Fine-Tuning (RFT)

  Reinforcement Fine-Tuning released. Train expert models to surpass closed source frontier models through verifiable reward. More details in [blospost](https://fireworks.ai/blog/reinforcement-fine-tuning-models).
</Update>

<Update label="2025-05-20">
  ## Diarization and batch processing support added to audio inference

  See our [blog post](https://fireworks.ai/blog/audio-summer-updates-and-new-features) for details.
</Update>

<Update label="2025-05-19">
  ## 🚀 Easier & faster LoRA fine-tune deployments on Fireworks

  You can now deploy a LoRA fine-tune with a single command and get speeds that approximately match the base model:

  ```bash theme={null}
  firectl deployment create "accounts/fireworks/models/<MODEL_ID of lora model>"
  ```

  Previously, this involved two distinct steps, and the resulting deployment was slower than the base model:

  1. Create a deployment using `firectl deployment create "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons`
  2. Then deploy the addon to the deployment: `firectl load-lora <MODEL_ID> --deployment <DEPLOYMENT_ID>`

  For more information, see our [deployment documentation](https://docs.fireworks.ai/models/deploying#deploying-to-on-demand).

  <Note>
    This change is for dedicated deployments with a single LoRA. You can still deploy multiple LoRAs on a deployment as described in the documentation.
  </Note>
</Update>