After fine-tuning your model on Fireworks, deploy it to make it available for inference.
Fine-tuned LoRA models, whether created on the Fireworks platform or imported, can only be deployed to on-demand (dedicated) deployments. Serverless deployment is not supported for LoRA addons.
Single-LoRA deployment
Deploy your LoRA fine-tuned model with a single command that delivers performance matching the base model. This streamlined approach, called live merge, eliminates the previous two-step process and provides better performance than multi-LoRA deployments.
Quick deployment
Deploy your LoRA fine-tuned model with one simple command:
```shell
firectl deployment create "accounts/<ACCOUNT_ID>/models/<LORA_MODEL_ID>"
```
Your deployment will be ready to use once it completes, with performance that matches the base model.
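The model identifier passed to firectl is a full resource name of the form `accounts/<ACCOUNT_ID>/models/<MODEL_ID>`. A minimal helper to assemble it (the helper itself is illustrative, not part of the Fireworks SDK; the account and model IDs below are made up):

```python
def model_resource_name(account_id: str, model_id: str) -> str:
    """Build the full Fireworks model resource name used by firectl and the API."""
    return f"accounts/{account_id}/models/{model_id}"

# Hypothetical IDs: this is the argument you would pass to `firectl deployment create`.
name = model_resource_name("my-account", "my-lora-model")
print(name)  # accounts/my-account/models/my-lora-model
```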
Multi-LoRA deployment
If you have multiple fine-tuned versions of the same base model (e.g., you’ve fine-tuned the same model for different use cases, applications, or prototyping), you can share a single base model deployment across these LoRA models to achieve higher utilization.
Multi-LoRA deployment comes with performance tradeoffs. We recommend using it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.
Deploy with CLI
Create base model deployment
Deploy the base model with addons enabled:

```shell
firectl deployment create "accounts/fireworks/models/<BASE_MODEL_ID>" --enable-addons
```
Load LoRA addons
Once the deployment is ready, load your LoRA models onto the deployment:

```shell
firectl load-lora <FINE_TUNED_MODEL_ID> --deployment <DEPLOYMENT_ID>
```
You can load multiple LoRA models onto the same deployment by repeating this command with different model IDs.
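Loading several addons is just the same load-lora command repeated with different model IDs. A sketch of a wrapper that builds (and can optionally run) those commands — the `load_loras` helper and the model/deployment IDs are hypothetical, not an official tool:

```python
import subprocess

def load_loras(deployment_id: str, model_ids: list[str], dry_run: bool = True) -> list[list[str]]:
    """Build one `firectl load-lora` command per fine-tuned model; run them unless dry_run."""
    commands = []
    for model_id in model_ids:
        cmd = ["firectl", "load-lora", model_id, "--deployment", deployment_id]
        commands.append(cmd)
        if not dry_run:
            # Requires firectl to be installed and authenticated.
            subprocess.run(cmd, check=True)
    return commands

# Hypothetical IDs, dry run only: print the commands without executing them.
cmds = load_loras("my-deployment", ["support-bot-lora", "summarizer-lora"])
for cmd in cmds:
    print(" ".join(cmd))
```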
When to use multi-LoRA deployment
Use multi-LoRA deployment when you:
- Need to serve multiple fine-tuned models based on the same base model
- Want to maximize deployment utilization
- Can accept some performance tradeoff compared to single-LoRA deployment
- Are managing multiple variants or experiments of the same model
Routing requests to LoRA addons
Deprecation notice: The `deployedModel` request key for routing to LoRA addons is deprecated and will not be supported for any new deployments. Please migrate to the `model` field with the `<model_name>#<deployment_name>` format shown below.
To send inference requests to a specific LoRA addon on a multi-LoRA deployment, set the `model` field in your request payload to `<model_name>#<deployment_name>`. The `#` separator tells Fireworks to route the request to the specified LoRA addon loaded on the given deployment.
Python (Fireworks SDK)

```python
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/<ACCOUNT_ID>/models/<FINE_TUNED_MODEL_ID>#accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Python (OpenAI SDK)

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/<ACCOUNT_ID>/models/<FINE_TUNED_MODEL_ID>#accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

const response = await client.chat.completions.create({
  model: "accounts/<ACCOUNT_ID>/models/<FINE_TUNED_MODEL_ID>#accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

curl

```shell
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/<ACCOUNT_ID>/models/<FINE_TUNED_MODEL_ID>#accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
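The model value in the examples above is simply the two resource names joined by `#`. A small helper to compose and split that string (hypothetical convenience functions, not part of any SDK; the resource names below are made up):

```python
def routed_model(model_name: str, deployment_name: str) -> str:
    """Compose the `<model_name>#<deployment_name>` value for the request's model field."""
    return f"{model_name}#{deployment_name}"

def split_routed_model(value: str) -> tuple[str, str]:
    """Recover the model and deployment resource names from a routed model string."""
    model_name, _, deployment_name = value.partition("#")
    return model_name, deployment_name

# Hypothetical account, model, and deployment IDs.
value = routed_model(
    "accounts/my-account/models/my-lora",
    "accounts/my-account/deployments/my-deployment",
)
print(value)
```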
Next steps