Troubleshooting and resolving common issues with on-demand deployments.

Deployment issues

FAQ

Custom model issues

1. Deployment hanging or crashing

2. LoRA adapters vs full models

3. Performance optimization factors

Autoscaling

Performance questions

Additional resources