Merging LoRA adapters with base models on Fireworks
Note: You can now merge a LoRA into the base model directly at deployment time, instead of merging it manually in advance as this guide describes.
A guide for downloading base models, merging them with LoRA adapters, and deploying the result using Fireworks.
Prerequisites:
- Fireworks account and firectl installed
- Python environment with necessary packages
- Local LoRA adapter or access to HuggingFace
- Python 3.9 or later (< 3.13)
Follow the steps below to merge and deploy your models.
1. Access and download base model
1.1 List available models
View all models in your Fireworks account:
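firectl list models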
Example output:
Code Llama 13B (code-llama-13b) 2024-02-29 20:36:24 HF_BASE_MODEL
CodeGemma 7B (codegemma-7b) 2024-06-19 22:57:22 HF_BASE_MODEL
... ... ...
For reference, the supported base models are:
- Gemma
- Phi, Phi-3
- Llama 1, 2, 3, 3.1
- LLaVa
- Mistral & Mixtral
- Qwen2
- StableLM
- Starcoder (GPTBigCode) & Starcoder2
- DeepSeek V1 & V2
- GPT NeoX
1.2 Download base model
Download your chosen model to a local directory:
firectl download model <model-id> <output-directory>
Example:
firectl download model code-llama-13b ./base_model
Available flags:
- --quiet: Suppress progress bar
- -h, --help: Display help information
2. Obtain LoRA adapter
2.1 Download LoRA adapter from Fireworks
The easiest way to obtain a LoRA adapter is to download it directly from Fireworks. LoRA adapters are listed alongside models when using firectl list models and are denoted with the type HF_PEFT_ADDON. Download a LoRA adapter using the same command as downloading a model.
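For example, if the adapter's model ID is my-lora-adapter (a hypothetical name):
firectl download model my-lora-adapter ./lora_adapter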
2.2 Download from HuggingFace (Optional)
If you need to download a LoRA adapter from HuggingFace, follow these steps:
Requirements
Install the required package:
pip install huggingface_hub
Download code
from huggingface_hub import snapshot_download

# Configure download parameters
adapter_id = "hf-account/adapter-name"  # Your HuggingFace adapter path
output_path = "./path/to/save/adapter"  # Local directory to save the adapter

# Download the adapter
local_path = snapshot_download(
    repo_id=adapter_id,
    local_dir=output_path
)
Important notes:
- Replace adapter_id with your desired LoRA adapter
- Ensure output_path is a valid directory path
- The function returns the local path where the files were downloaded
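Before merging, it's worth a quick sanity check that the downloaded directory contains the standard PEFT adapter files. A minimal sketch, reusing output_path from the snippet above:

from pathlib import Path

adapter_dir = Path(output_path)
# A PEFT adapter normally ships a config plus weights in one of these formats
has_config = (adapter_dir / "adapter_config.json").exists()
has_weights = any((adapter_dir / name).exists()
                  for name in ("adapter_model.safetensors", "adapter_model.bin"))
print(f"adapter config found: {has_config}, adapter weights found: {has_weights}")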
3. Merging base model with LoRA adapter
3.1 Installation requirements
First, ensure you have the necessary libraries installed:
pip install torch transformers peft
3.2 Merging script
Create a Python script (merge_model.py) with the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

def merge_lora_with_base_model(base_model_path: str, lora_path: str, output_path: str):
    """
    Merge a LoRA adapter with a base model and save the result.

    Args:
        base_model_path (str): Path to the base model directory
        lora_path (str): Path to the LoRA adapter directory
        output_path (str): Directory to save the merged model
    """
    # Load base model
    print(f"Loading base model from {base_model_path}")
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )

    # Load and apply LoRA adapter
    print(f"Loading LoRA adapter from {lora_path}")
    model = PeftModel.from_pretrained(
        base_model,
        lora_path
    )

    # Merge adapter with base model
    print("Merging LoRA adapter with base model...")
    merged_model = model.merge_and_unload()

    # Save merged model
    print(f"Saving merged model to {output_path}")
    merged_model.save_pretrained(output_path)

    # Save tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    tokenizer.save_pretrained(output_path)

    print("Merge completed successfully!")

if __name__ == "__main__":
    # Example usage
    merge_lora_with_base_model(
        base_model_path="./base_model",   # Directory containing the base model
        lora_path="./lora_adapter",       # Directory containing the LoRA adapter
        output_path="./merged_model"      # Output directory for merged model
    )
If you downloaded the base model from Fireworks AI, you might need to set base_model_path to ./base_model/hf, because required files such as config.json may be inside the hf subdirectory.
3.3 Running the merge
Execute the script after setting your paths:
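python merge_model.py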
Important: After merging, verify that all necessary tokenizer files are present in the output directory; the merging process might skip some of them. You may need to manually copy these files from the base model:
- tokenizer_config.json
- tokenizer.json
- special_tokens_map.json
These files can be found in the original base model directory or the model’s HuggingFace repository (e.g., meta-llama/Llama-3.1-70B-Instruct).
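A minimal sketch of that copy step, assuming the directory layout used in the examples above (base model files under ./base_model/hf, merged output in ./merged_model):

import shutil
from pathlib import Path

base_dir = Path("./base_model/hf")   # adjust if your base model files live elsewhere
merged_dir = Path("./merged_model")

for name in ("tokenizer_config.json", "tokenizer.json", "special_tokens_map.json"):
    src = base_dir / name
    dst = merged_dir / name
    if src.exists() and not dst.exists():
        shutil.copy(src, dst)
        print(f"Copied {name}")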
3.4 Important Notes
- Ensure sufficient disk and GPU memory for all models
- Check your cache directory (~/.cache/huggingface/hub) as models may already be downloaded there
- Verify LoRA adapter compatibility with base model
- All paths must exist and have proper permissions
- Memory issues can be resolved by setting device_map="cpu" (see the sketch after this list)
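For the last note: a minimal variation of the load call in merge_model.py that keeps the merge on CPU. This is slower, but the merge is a one-off operation and it avoids GPU out-of-memory errors; float32 is used here since it is the safer dtype for CPU kernels:

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float32,  # safer dtype on CPU; float16 support varies by op
    device_map="cpu"            # keep all weights on CPU to avoid GPU OOM
)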
4. Uploading and deploying merged model
4.1 Create model in Fireworks
Upload your merged model to Fireworks:
firectl create model <model-name> <path/to/merged/model>
Example:
firectl create model sql-enhanced-model ./merged_model
For additional options:
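firectl create model -h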
4.2 Create deployment
Deploy your uploaded model:
Basic deployment:
firectl create deployment <model-name>
Using full model path:
firectl create deployment accounts/<account-name>/models/<model-name>
Example:
firectl create deployment sql-enhanced-model
# OR
firectl create deployment accounts/myaccount/models/sql-enhanced-model
For additional deployment parameters and configuration options:
firectl create deployment -h
4.3 Verification
After deployment, you can verify its status, for example with:
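firectl list deployments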
Complete workflow summary
- Download base model from Fireworks using firectl
- Download LoRA adapter to local device (e.g. using HuggingFace)
- Merge models using provided Python script
- Upload merged model to Fireworks
- Create deployment