2025-07-16

Supervised Fine-Tuning V2

We now support supervised fine-tuning for Llama 4 MoE models (Llama 4 Scout and Llama 4 Maverick, text only).
2025-07-10

🏗️ Build SDK LLM Deployment Logic Refactor

Based on early feedback from users and internal testing, we’ve refactored the LLM class deployment logic in the Build SDK to make it easier to understand. Key changes:
  • The id parameter is now required when deployment_type is "on-demand"
  • The base_id parameter is now required when deployment_type is "on-demand-lora"
  • The deployment_display_name parameter is now optional and defaults to the filename where the LLM was instantiated
A new deployment will be created if a deployment with the same id does not exist. Otherwise, the existing deployment will be reused.
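The rules above can be sketched in plain Python. This is an illustrative re-implementation of the documented validation behavior, not the actual Build SDK code; the parameter names mirror the ones listed, and the fallback filename is a placeholder.

```python
# Illustrative sketch of the documented LLM argument rules -- not the
# actual Build SDK implementation.

def validate_llm_args(deployment_type, id=None, base_id=None,
                      deployment_display_name=None, default_name="main.py"):
    """Apply the documented requirements for LLM deployment arguments."""
    if deployment_type == "on-demand" and id is None:
        raise ValueError('id is required when deployment_type is "on-demand"')
    if deployment_type == "on-demand-lora" and base_id is None:
        raise ValueError(
            'base_id is required when deployment_type is "on-demand-lora"')
    # deployment_display_name defaults to the instantiating filename.
    return deployment_display_name or default_name
```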
2025-07-02

🚀 Support for Responses API in Python SDK

You can now use the Responses API in the Python SDK, making it easy to adopt the Responses API in your own applications. See the Responses API guide for usage examples and details.
2025-07-01

Support for LinkedIn authentication

You can now log in to Fireworks using your LinkedIn account. To log in with LinkedIn, go to the Fireworks login page and click the “Continue with LinkedIn” button, or use the firectl login command from the CLI. How it works:
  • Fireworks uses your LinkedIn primary email address for account identification
  • You can switch between different Fireworks accounts by changing your LinkedIn primary email
  • See our LinkedIn authentication FAQ for detailed instructions on managing email addresses
2025-06-30

Support for GitHub authentication

You can now log in to Fireworks using your GitHub account. To log in with GitHub, go to the Fireworks login page and click the “Continue with GitHub” button, or use the firectl login command from the CLI.

🚨 Document Inlining Deprecation

Document Inlining has been deprecated and is no longer available on the Fireworks platform. This feature allowed LLMs to process images and PDFs through the chat completions API by appending #transform=inline to document URLs. Migration recommendations:
  • For image processing: Use Vision Language Models (VLMs) like Qwen2.5-VL 32B Instruct
  • For PDF processing: Use dedicated PDF processing libraries combined with text-based LLMs
  • For structured extraction: Leverage our structured responses capabilities
For assistance with migration, please contact our support team or visit our Discord community.
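For the image-processing path, the migration amounts to passing the image directly to a vision model instead of appending #transform=inline to a document URL. The sketch below builds an OpenAI-compatible chat completions payload; the model identifier is an assumption, so check the model library for the exact id.

```python
# Minimal sketch of the VLM migration path: send the image to a vision
# model via the chat completions content-parts format. The model id is
# an assumed placeholder -- verify it against the model library.

def build_vlm_request(image_url, question,
                      model="accounts/fireworks/models/qwen2p5-vl-32b-instruct"):
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
```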
2025-06-24

🎯 Build SDK: Reward-kit integration for evaluator development

The Build SDK now natively integrates with reward-kit to simplify evaluator development for Reinforcement Fine-Tuning (RFT). You can now create custom evaluators in Python with automatic dependency management and seamless deployment to Fireworks infrastructure. Key features:
  • Native reward-kit integration for evaluator development
  • Automatic packaging of dependencies from pyproject.toml or requirements.txt
  • Local testing capabilities before deployment
  • Direct integration with Fireworks datasets and evaluation jobs
  • Support for third-party libraries and complex evaluation logic
See our Developing Evaluators guide to get started with your first evaluator in minutes.
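To make the idea concrete, an evaluator is essentially a Python function that scores a model response against a verifiable criterion. The plain-Python sketch below only illustrates that shape; the actual reward-kit decorator, signature, and return type are covered in the Developing Evaluators guide.

```python
# Hedged sketch of the general evaluator shape: a function mapping a
# conversation to a reward score. Not the reward-kit API itself.

def exact_match_reward(messages, expected_answer):
    """Return 1.0 if the last assistant message matches the expected answer."""
    final = next(m["content"] for m in reversed(messages)
                 if m["role"] == "assistant")
    return 1.0 if final.strip() == expected_answer.strip() else 0.0
```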

Added new Responses API for advanced conversational workflows and integrations

  • Continue conversations across multiple turns using the previous_response_id parameter to maintain context without resending full history
  • Stream responses in real time as they are generated for responsive applications
  • Control response storage with the store parameter—choose whether responses are retrievable by ID or ephemeral
See the Response API guide for usage examples and details.
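The parameters above can be combined in a single request payload. The sketch below assembles one; the field names follow the parameters listed here, while the model id and response id are placeholders, so consult the Responses API guide for the authoritative schema.

```python
# Sketch of a multi-turn Responses API payload using the parameters
# described above. Model id and response id are placeholders.

def build_followup_request(previous_response_id, user_input,
                           model="accounts/fireworks/models/llama-v3p1-8b-instruct",
                           store=True, stream=False):
    return {
        "model": model,
        "input": user_input,
        "previous_response_id": previous_response_id,  # continue prior turn
        "store": store,    # False makes the response ephemeral
        "stream": stream,  # True streams tokens as they are generated
    }
```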
2025-06-13

Supervised Fine-Tuning V2

Supervised Fine-Tuning V2 has been released. Key features:
  • Supports the Qwen 2/2.5/3 series, Phi 4, Gemma 3, the Llama 3 family, and DeepSeek V2, V3, and R1
  • Longer context windows, up to the full context length of the supported models
  • Multi-turn function calling fine-tuning
  • Quantization aware training
More details in the blog post.
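Multi-turn function calling fine-tuning implies training records that interleave user turns, assistant tool calls, and tool results. The sketch below builds one such record in conversational JSONL form; field names beyond role/content are assumptions, so consult the fine-tuning docs for the exact dataset schema.

```python
import json

# Hedged sketch of one multi-turn, function-calling training record in a
# conversational JSONL format. The tool_calls structure is an assumption;
# check the fine-tuning documentation for the exact schema.

record = {
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "content": None,
         "tool_calls": [{"type": "function",
                         "function": {"name": "get_weather",
                                      "arguments": '{"city": "Paris"}'}}]},
        {"role": "tool", "content": '{"temp_c": 18}'},
        {"role": "assistant", "content": "It is 18 °C in Paris right now."},
    ]
}
line = json.dumps(record)  # one JSON object per line in the dataset file
```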

Reinforcement Fine-Tuning (RFT)

Reinforcement Fine-Tuning has been released. Train expert models that surpass closed-source frontier models through verifiable rewards. More details in the blog post.
2025-05-20

Diarization and batch processing support added to audio inference

See our blog post for details.
2025-05-19

🚀 Easier & faster LoRA fine-tune deployments on Fireworks

You can now deploy a LoRA fine-tune with a single command and get speeds that approximately match the base model:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of lora model>"
Previously, this involved two distinct steps, and the resulting deployment was slower than the base model:
  1. Create a deployment using firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
  2. Then deploy the addon to the deployment: firectl load-lora <MODEL_ID> --deployment <DEPLOYMENT_ID>
For more information, see our deployment documentation.
This change applies to dedicated deployments with a single LoRA. You can still deploy multiple LoRAs on a deployment, or deploy LoRAs on some serverless models, as described in the documentation.