Merging LoRA adapters with base models on Fireworks
A guide for downloading base models, merging them with LoRA adapters, and deploying the result using Fireworks.
Prerequisites:
- Fireworks account and firectl installed
- Python environment with necessary packages
- Local LoRA adapter or access to HuggingFace
- Python 3.9 or later (< 3.13)
Follow the steps below to merge and deploy your models.
1. Access and download base model
1.1 List available models
View all models in your Fireworks account:
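```bash
firectl list models
```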
The output lists each model along with its type; LoRA adapters are denoted with the type HF_PEFT_ADDON (see section 2.1).
Recall the supported base models:
- Gemma
- Phi, Phi-3
- Llama 1, 2, 3, 3.1
- LLaVa
- Mistral & Mixtral
- Qwen2
- StableLM
- Starcoder (GPTBigCode) & Starcoder2
- DeepSeek V1 & V2
- GPT NeoX
1.2 Download base model
Download your chosen model to a local directory:
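The general form is shown below (exact syntax may vary by firectl version; run firectl download model -h to confirm):
```bash
firectl download model <MODEL_ID> <OUTPUT_PATH>
```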
Example:
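The model ID and output path below are illustrative; substitute your own:
```bash
firectl download model llama-v3p1-70b-instruct ./llama-3.1-70b-instruct
```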
Available flags:
- --quiet: Suppress progress bar
- -h, --help: Display help information
2. Obtain LoRA adapter
2.1 Download LoRA adapter from Fireworks
The easiest way to obtain a LoRA adapter is to download it directly from Fireworks. LoRA adapters are listed alongside models when using firectl list models and are denoted with the type HF_PEFT_ADDON. Download a LoRA adapter using the same command as downloading a model, as shown below.
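For instance, with an illustrative adapter ID:
```bash
firectl download model my-lora-adapter ./lora-adapter
```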
2.2 Download from HuggingFace (Optional)
If you need to download a LoRA adapter from HuggingFace, follow these steps:
Requirements
Install the required package:
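```bash
pip install huggingface_hub
```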
Download code
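A minimal sketch using huggingface_hub's snapshot_download; the adapter_id and output_path values are placeholders:
```python
from huggingface_hub import snapshot_download

adapter_id = "your-username/your-lora-adapter"  # placeholder: your adapter's HuggingFace repo ID
output_path = "./lora-adapter"                  # local directory to download into

# snapshot_download returns the local path where the files were downloaded
local_path = snapshot_download(repo_id=adapter_id, local_dir=output_path)
print(f"Adapter downloaded to: {local_path}")
```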
Important notes:
- Replace adapter_id with your desired LoRA adapter
- Ensure output_path is a valid directory path
- The function returns the local path where files are downloaded
3. Merging base model with LoRA adapter
3.1 Installation requirements
First, ensure you have the necessary libraries installed:
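Assuming the merge script below, you need PyTorch, transformers, and peft (plus accelerate if you use device_map="auto"):
```bash
pip install torch transformers peft accelerate
```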
3.2 Merging script
Create a Python script (merge_model.py) with the following code:
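The following is a sketch of a typical merge using peft's merge_and_unload; all paths are placeholders:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths; set these before running
base_model_path = "./llama-3.1-70b-instruct"  # downloaded base model
lora_adapter_path = "./lora-adapter"          # downloaded LoRA adapter
output_path = "./merged-model"                # directory for the merged model

# Load the base model; switch device_map to "cpu" if you hit GPU memory limits
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, lora_adapter_path)

# Fold the adapter weights into the base weights and drop the PEFT wrappers
merged_model = model.merge_and_unload()

# Save the merged weights and the tokenizer to the output directory
merged_model.save_pretrained(output_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.save_pretrained(output_path)
```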
3.3 Running the merge
Execute the script after setting your paths:
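```bash
python merge_model.py
```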
Important: After merging, verify that all necessary tokenizer files are present in the output directory. The merging process might skip some essential tokenizer files. You may need to manually copy these files from the base model:
tokenizer_config.json
tokenizer.json
special_tokens_map.json
These files can be found in the original base model directory or the model’s HuggingFace repository (e.g., meta-llama/Llama-3.1-70B-Instruct).
3.4 Important Notes
- Ensure sufficient disk and GPU memory for all models
- Check your cache directory (~/.cache/huggingface/hub) as models may already be downloaded there
- Verify LoRA adapter compatibility with base model
- All paths must exist and have proper permissions
- Memory issues can be resolved by setting device_map="cpu"
4. Uploading and deploying merged model
4.1 Create model in Fireworks
Upload your merged model to Fireworks:
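The general form (exact syntax may vary by firectl version; see firectl create model -h):
```bash
firectl create model <MODEL_ID> <PATH_TO_MERGED_MODEL>
```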
Example:
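The model ID and path below are illustrative:
```bash
firectl create model my-merged-model ./merged-model
```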
For additional options:
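```bash
firectl create model --help
```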
4.2 Create deployment
Deploy your uploaded model:
Basic deployment:
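The model ID placeholder below is the ID you chose when creating the model:
```bash
firectl create deployment <MODEL_ID>
```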
Using full model path:
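Fireworks model resources follow the accounts/<ACCOUNT_ID>/models/<MODEL_ID> pattern:
```bash
firectl create deployment accounts/<ACCOUNT_ID>/models/<MODEL_ID>
```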
Example:
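Account and model names below are illustrative:
```bash
firectl create deployment accounts/my-account/models/my-merged-model
```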
For additional deployment parameters and configuration options:
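```bash
firectl create deployment --help
```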
4.3 Verification
After deployment, you can verify the status using:
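The placeholder below is the deployment ID returned when the deployment was created:
```bash
firectl get deployment <DEPLOYMENT_ID>
```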
Complete workflow summary
- Download base model from Fireworks using firectl
- Download LoRA adapter to local device (e.g. using HuggingFace)
- Merge models using provided Python script
- Upload merged model to Fireworks
- Create deployment