After fine-tuning your model on Fireworks, deploy it to make it available for inference. Fireworks supports two deployment methods for LoRA fine-tuned models: live merge and multi-LoRA. Each method has different tradeoffs around performance, cost, and flexibility.
You can also upload and deploy LoRA models fine-tuned outside of Fireworks. See importing fine-tuned models for details.
## Choosing a deployment method
Fireworks offers two ways to deploy LoRA fine-tuned models. The right choice depends on how many fine-tuned variants you need to serve and your performance requirements.

| | Live merge | Multi-LoRA |
|---|---|---|
| How it works | LoRA weights are merged into the base model at deployment time, creating a single merged model | Base model is deployed with addon support; LoRA adapters are loaded dynamically at request time |
| Number of LoRAs | One per deployment | Multiple per deployment |
| Inference performance | Matches the base model (no overhead) | Some overhead per request due to dynamic adapter application |
| Throughput | Same as base model | Lower maximum throughput under high concurrency |
| Cost efficiency | One deployment per fine-tune | Share a single deployment across many fine-tunes |
| Best for | Production workloads requiring maximum performance | Experimentation, A/B testing, or serving many variants of the same base model |
## Live merge deployment
Live merge is the simplest way to deploy a fine-tuned model. Fireworks automatically merges the LoRA weights into the base model at deployment time, producing a model that performs identically to a natively fine-tuned model with no inference overhead.

### How it works
When you deploy a LoRA model directly, Fireworks:
- Takes your LoRA adapter weights and the base model
- Merges them into a single set of weights at deployment time
- Serves the merged model as a standalone deployment
### Deploy with live merge
Deploy your LoRA fine-tuned model with a single command (shown below). Your deployment will be ready to use once it completes, with performance that matches the base model.
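As an illustrative sketch, deploying a fine-tuned model with `firectl` looks like the following; the account and model IDs are placeholders:

```bash
# Deploy the fine-tuned model; Fireworks merges the LoRA weights
# into the base model at deployment time (live merge).
firectl create deployment accounts/<your-account>/models/<your-fine-tuned-model>
```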
### Sending requests
Send inference requests to your live-merge deployment by referencing the deployment directly:
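A minimal sketch using the Fireworks Python SDK; the model ID is a placeholder, and a curl request against the same OpenAI-compatible chat completions endpoint follows the same pattern:

```python
from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")

# Reference your deployed fine-tuned model; the resource name below
# is illustrative, not an actual model ID.
response = client.chat.completions.create(
    model="accounts/<your-account>/models/<your-fine-tuned-model>",
    messages=[{"role": "user", "content": "Summarize our Q3 results in one sentence."}],
)
print(response.choices[0].message.content)
```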
### When to use live merge
- You need maximum inference performance (latency and throughput matching the base model)
- You are serving a single fine-tuned model in production
- You want the simplest possible deployment workflow
## Multi-LoRA deployment
Multi-LoRA lets you load multiple LoRA adapters onto a single base model deployment. This is useful when you have several fine-tuned variants of the same base model and want to share GPU resources across them rather than creating a separate deployment for each.

### How it works
With multi-LoRA:
- You deploy the base model with addon support enabled
- You load one or more LoRA adapters onto the running deployment
- At inference time, the correct adapter is selected and applied dynamically based on the model specified in the request
### LoRA addon shape compatibility
Not all deployment shapes support LoRA addons. FP8 and FP4 quantized shapes do not support `--enable-addons`.
| Precision | `--enable-addons` supported? |
|---|---|
| BF16 | ✅ Yes |
| FP8 | ❌ No |
| FP4 | ❌ No |
If your base model's default deployment shape is quantized, you have two options: deploy the base model at a BF16 shape, which supports `--enable-addons` (Option 1), or upload a custom copy of the model with a BF16 default shape (Option 2). See Uploading custom models and firectl model create.

If enabling addons fails, the error message indicates the cause:
- "addons cannot be enabled with quantized precisions (FP8/FP4)" — your model's default shape is quantized; use Option 1 or 2 above.
- "the deployment shape version does not exist or you do not have access to it" — the shape you requested is not available on your account; contact support.

### Deploy with multi-LoRA
### Sending requests
To route inference requests to a specific LoRA adapter on a multi-LoRA deployment, set the `model` field to `<model_name>#<deployment_name>`. The `#` separator tells Fireworks to route the request to the specified adapter on the given deployment.
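A minimal sketch with the Fireworks Python SDK; the names are placeholders, and the OpenAI SDK, JavaScript, and curl clients follow the same `model#deployment` convention:

```python
from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")

# The "#" separator routes the request to the named LoRA adapter
# on the given multi-LoRA deployment (placeholder names shown).
response = client.chat.completions.create(
    model="<model_name>#<deployment_name>",
    messages=[{"role": "user", "content": "Classify this ticket: 'App crashes on login.'"}],
)
print(response.choices[0].message.content)
```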
### When to use multi-LoRA
- You need to serve multiple fine-tuned models based on the same base model
- You want to maximize GPU utilization by sharing a single deployment
- You are running experiments or A/B tests across multiple fine-tuned variants
- You can accept some performance overhead compared to live merge
## Performance considerations
Live merge eliminates all LoRA-related inference overhead because the adapter weights are baked into the model at deployment time. The resulting deployment behaves exactly like a natively fine-tuned base model. Multi-LoRA deployments incur overhead because adapters are applied dynamically:
- Time to first token (TTFT): Increases by roughly 10–30% due to adapter loading and prompt processing overhead
- Generation speed: Overhead grows with higher request concurrency
- Maximum throughput: Lower than a live-merge deployment under sustained load
## Next steps
- On-Demand Deployments: Learn about deployment configuration and optimization
- Import Fine-Tuned Models: Upload LoRA models fine-tuned outside of Fireworks
- LoRA Performance: Understand performance tradeoffs and optimization strategies