After fine-tuning your model, you’ll need to deploy it to make it available for inference. Fireworks supports two deployment patterns depending on your use case:
  • Single-LoRA deployment: Optimal for serving one fine-tuned model with performance matching the base model
  • Multi-LoRA deployment: Share a single base model deployment across multiple LoRA models for higher utilization
You can also upload and deploy LoRA models fine-tuned outside of Fireworks. See importing fine-tuned models for details.

Single-LoRA deployment

Deploy your LoRA fine-tuned model with a single command that delivers performance matching the base model. This streamlined approach, called live merge, eliminates the previous two-step process and provides better performance compared to multi-LoRA deployments.

Quick deployment

Deploy your LoRA fine-tuned model with one simple command:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of lora model>"
Your deployment will be ready to use once it completes, with performance that matches the base model.
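Once the deployment is live, the model can also be queried outside the SDK through Fireworks' OpenAI-compatible API. A minimal sketch, assuming the openai Python package and placeholder account and model IDs:

# Minimal sketch: query the deployed fine-tuned model via Fireworks'
# OpenAI-compatible endpoint. Account and model IDs are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

response = client.chat.completions.create(
    model="accounts/your-account/models/your-fine-tuned-model-id",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)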

Deployment with the Build SDK

You can also deploy your LoRA fine-tuned model using the Build SDK:
from fireworks import LLM

# Deploy a fine-tuned model with on-demand deployment (live merge)
fine_tuned_llm = LLM(
    model="accounts/your-account/models/your-fine-tuned-model-id",
    deployment_type="on-demand",
    id="my-fine-tuned-deployment"  # Simple string identifier
)

# Apply the deployment to ensure it's ready
fine_tuned_llm.apply()

# Use the deployed model
response = fine_tuned_llm.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}]
)

# Track deployment in web dashboard
print(f"Track at: {fine_tuned_llm.deployment_url}")
The id parameter can be any simple string; it does not need to follow the full resource name format "accounts/<account_id>/deployments/<deployment_id>".

When to use single-LoRA deployment

Use single-LoRA deployment when you:
  • Have a single fine-tuned model to serve
  • Need optimal performance that matches the base model
  • Want the simplest deployment process
  • Don’t require sharing a base model across multiple LoRA models

Multi-LoRA deployment

If you have multiple fine-tuned versions of the same base model (e.g., you’ve fine-tuned the same model for different use cases, applications, or prototyping), you can share a single base model deployment across these LoRA models to achieve higher utilization.
Multi-LoRA deployment comes with performance tradeoffs. We recommend using it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.

Deploy with CLI

Step 1: Create base model deployment

Deploy the base model with addons enabled:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
Step 2: Load LoRA addons

Once the deployment is ready, load your LoRA models onto the deployment:
firectl load-lora <FINE_TUNED_MODEL_ID> --deployment <DEPLOYMENT_ID>
You can load multiple LoRA models onto the same deployment by repeating this command with different model IDs.
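For example, loading two fine-tunes of the same base onto one deployment (the model and deployment IDs here are hypothetical):
firectl load-lora my-support-lora --deployment my-shared-deployment
firectl load-lora my-summarizer-lora --deployment my-shared-deployment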

Deploy with the Build SDK

You can also use multi-LoRA deployment with the Build SDK:
from fireworks import LLM

# Create a base model deployment with addons enabled
base_model = LLM(
    model="accounts/fireworks/models/base-model-id",
    deployment_type="on-demand",
    id="shared-base-deployment",  # Simple string identifier
    enable_addons=True
)
base_model.apply()

# Deploy multiple fine-tuned models using the same base deployment
fine_tuned_model_1 = LLM(
    model="accounts/your-account/models/fine-tuned-model-1",
    deployment_type="on-demand-lora",
    base_id=base_model.deployment_id
)

fine_tuned_model_2 = LLM(
    model="accounts/your-account/models/fine-tuned-model-2", 
    deployment_type="on-demand-lora",
    base_id=base_model.deployment_id
)

# Apply deployments
fine_tuned_model_1.apply()
fine_tuned_model_2.apply()

# Use the deployed models
response_1 = fine_tuned_model_1.chat.completions.create(
    messages=[{"role": "user", "content": "Hello from model 1!"}]
)

response_2 = fine_tuned_model_2.chat.completions.create(
    messages=[{"role": "user", "content": "Hello from model 2!"}]
)
When using deployment_type="on-demand-lora", you must provide the base_id parameter, which references the deployment ID of your base model deployment.

When to use multi-LoRA deployment

Use multi-LoRA deployment when you:
  • Need to serve multiple fine-tuned models based on the same base model
  • Want to maximize deployment utilization
  • Can accept some performance tradeoff compared to single-LoRA deployment
  • Are managing multiple variants or experiments of the same model
