- Single-LoRA deployment: Optimal for serving one fine-tuned model with performance matching the base model
- Multi-LoRA deployment: Share a single base model deployment across multiple LoRA models for higher utilization
You can also upload and deploy LoRA models fine-tuned outside of Fireworks. See importing fine-tuned models for details.
Single-LoRA deployment
Deploy your LoRA fine-tuned model with a single command that delivers performance matching the base model. This streamlined approach, called live merge, eliminates the previous two-step process and provides better performance than multi-LoRA deployments.

Quick deployment
Deploy your LoRA fine-tuned model with one simple command. Your deployment will be ready to use once it completes, with performance that matches the base model.
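As a sketch, the command might look like the following (assuming firectl is installed and authenticated; the account and model IDs are placeholders, and exact flags may differ between firectl versions):

```shell
# Create an on-demand deployment of a LoRA fine-tuned model.
# Live merge happens automatically, so no extra merge step is needed.
firectl create deployment accounts/my-account/models/my-lora-model
```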
Deployment with the Build SDK
You can also deploy your LoRA fine-tuned model using the Build SDK.
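An illustrative sketch with the fireworks Python package (the model and deployment names are placeholders, and the apply() call reflects how the Build SDK typically materializes resources; verify against the current SDK reference):

```python
from fireworks import LLM

# Deploy a LoRA fine-tuned model on its own on-demand deployment.
# The id can be any simple string; it is not a full resource name.
llm = LLM(
    model="accounts/my-account/models/my-lora-model",  # placeholder LoRA model
    deployment_type="on-demand",
    id="my-lora-deployment",
)
llm.apply()  # creates the deployment and waits for it to become ready
```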
The id parameter can be any simple string; it does not need to follow the format "accounts/account_id/deployments/model_id".

When to use single-LoRA deployment
Use single-LoRA deployment when you:
- Have a single fine-tuned model to serve
- Need optimal performance that matches the base model
- Want the simplest deployment process
- Don’t require sharing a base model across multiple LoRA models
Multi-LoRA deployment
If you have multiple fine-tuned versions of the same base model (e.g., you’ve fine-tuned the same model for different use cases, applications, or prototyping), you can share a single base model deployment across these LoRA models to achieve higher utilization.

Multi-LoRA deployment comes with performance tradeoffs. We recommend using it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.
Deploy with CLI
1. Create base model deployment
Deploy the base model with addons enabled:
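For example (the base model name is a placeholder; check firectl create deployment --help for the current flag names):

```shell
# Create an on-demand base model deployment with LoRA addons enabled
firectl create deployment accounts/fireworks/models/llama-v3p1-8b-instruct \
  --enable-addons
```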
2. Load LoRA addons
Once the deployment is ready, load your LoRA models onto the deployment. You can load multiple LoRA models onto the same deployment by repeating this command with different model IDs.
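A hedged sketch of loading an addon (the model and deployment IDs are placeholders, and the exact subcommand and flag names may vary by firectl version):

```shell
# Load a LoRA model onto the existing base deployment as an addon
firectl load-lora my-lora-model --deployment my-base-deployment
```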
Deploy with the Build SDK
You can also use multi-LoRA deployment with the Build SDK.
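A sketch of the two-step flow with the fireworks Python package (model names and deployment IDs are placeholders; the apply() calls reflect how the Build SDK typically materializes resources, so verify against the current SDK reference):

```python
from fireworks import LLM

# Step 1: create the base model deployment that will host the LoRA addons
base = LLM(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # placeholder base model
    deployment_type="on-demand",
    id="my-base-deployment",
)
base.apply()

# Step 2: attach a LoRA model to that deployment via base_id.
# Repeat with different LoRA models to share the same base deployment.
lora = LLM(
    model="accounts/my-account/models/my-lora-model",  # placeholder LoRA model
    deployment_type="on-demand-lora",
    base_id="my-base-deployment",  # references the base deployment's ID
)
lora.apply()
```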
When using deployment_type="on-demand-lora", you need to provide the base_id parameter, which references the deployment ID of your base model deployment.

When to use multi-LoRA deployment
Use multi-LoRA deployment when you:
- Need to serve multiple fine-tuned models based on the same base model
- Want to maximize deployment utilization
- Can accept some performance tradeoff compared to single-LoRA deployment
- Are managing multiple variants or experiments of the same model