Deploy your LoRA fine-tuned model with a single command and get performance that matches the base model. This streamlined approach, called live merge, replaces the previous two-step process and performs better than a multi-LoRA deployment.

Quick deployment

Deploy your LoRA fine-tuned model with one simple command:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of lora model>"
Once creation completes, the deployment is ready to use, with performance that matches the base model.
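If you script deployments, the command above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `live_merge_command` and the `account` parameter are hypothetical, and the default account simply mirrors the `accounts/fireworks/models/...` path shown above.

```python
from __future__ import annotations

import shlex


def live_merge_command(lora_model_id: str, account: str = "fireworks") -> list[str]:
    """Build the argument list for the single-command (live merge) deployment.

    `account` is an illustrative parameter; the default mirrors the path
    used in the command shown in this guide.
    """
    if not lora_model_id:
        raise ValueError("lora_model_id must be non-empty")
    model_path = f"accounts/{account}/models/{lora_model_id}"
    return ["firectl", "create", "deployment", model_path]


if __name__ == "__main__":
    cmd = live_merge_command("my-lora-model")  # hypothetical model ID
    print(shlex.join(cmd))
    # To actually run it (requires firectl to be installed and authenticated):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string avoids shell-quoting problems when the list is passed to `subprocess.run`.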

Alternative deployment method

You can also deploy a single LoRA model using a two-step process. This is the standard approach for multi-LoRA deployments, where multiple LoRA models share the same base model. For a single LoRA model it is slower than live merge and is not recommended:
1. Create base model deployment

Deploy the base model with addons enabled:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons

2. Load LoRA addon

Once the deployment is ready, load the LoRA model onto the deployment:
firectl load-lora <MODEL_ID> --deployment <DEPLOYMENT_ID>
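The ordering of the two steps can be sketched as a small helper that returns both commands in sequence. This is an illustrative sketch under assumptions: `two_step_commands` and its parameters are hypothetical names, and the deployment ID is taken as an input because it only becomes known after the first command has created the deployment.

```python
from __future__ import annotations


def two_step_commands(
    base_model_id: str,
    lora_model_id: str,
    deployment_id: str,
    account: str = "fireworks",  # hypothetical parameter; mirrors the path above
) -> list[list[str]]:
    """Return the two firectl commands, in the order they must be run."""
    # Step 1: deploy the base model with addons enabled.
    create_base = [
        "firectl", "create", "deployment",
        f"accounts/{account}/models/{base_model_id}",
        "--enable-addons",
    ]
    # Step 2: load the LoRA model onto the deployment once it is ready.
    load_lora = [
        "firectl", "load-lora", lora_model_id,
        "--deployment", deployment_id,
    ]
    return [create_base, load_lora]


if __name__ == "__main__":
    # All three IDs below are placeholders.
    for cmd in two_step_commands("my-base-model", "my-lora-model", "my-deployment-id"):
        print(" ".join(cmd))
```

The second command should only be run after the deployment from the first is ready; this sketch does not poll for readiness.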

Deployment with the Build SDK

You can also deploy your LoRA fine-tuned model using the Build SDK:
from fireworks import LLM

# Deploy a fine-tuned model with on-demand deployment (live merge)
fine_tuned_llm = LLM(
    model="accounts/your-account/models/your-fine-tuned-model-id",
    deployment_type="on-demand",
    id="my-fine-tuned-deployment"  # Simple string identifier
)

# Apply the deployment to ensure it's ready
fine_tuned_llm.apply()

# Use the deployed model and print the assistant's reply
response = fine_tuned_llm.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Track deployment in web dashboard
print(f"Track at: {fine_tuned_llm.deployment_url}")
The id parameter can be any simple string; it does not need to follow the format "accounts/account_id/deployments/model_id".

When to use live merge

Use live merge deployment when you:
  • Have a single fine-tuned model to serve
  • Need optimal performance that matches the base model
  • Want the simplest deployment process
  • Don’t require sharing a base model across multiple LoRA models
The live merge deployment method is designed for dedicated deployments with a single LoRA model. For multiple LoRA models sharing the same base model, consider using multi-LoRA deployment.
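The guidance above reduces to a simple rule based on how many LoRA models share one base model. The sketch below encodes that rule; the function name and the returned labels `"live-merge"` and `"multi-lora"` are hypothetical and are not identifiers used by firectl or the Build SDK.

```python
def choose_deployment_method(num_lora_models: int) -> str:
    """Pick a deployment method from the number of LoRA models sharing a base model.

    The returned strings are illustrative labels only.
    """
    if num_lora_models < 1:
        raise ValueError("at least one LoRA model is required")
    # A single LoRA model: live merge gives base-model performance.
    if num_lora_models == 1:
        return "live-merge"
    # Several LoRA models on the same base model: use multi-LoRA deployment.
    return "multi-lora"


if __name__ == "__main__":
    print(choose_deployment_method(1))
    print(choose_deployment_method(3))
```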