Merging LoRA adapters with base models on Fireworks
Fireworks can now merge a LoRA into the base model directly at deployment time, so you no longer need to merge manually in advance. If you still want to produce a standalone merged model, this guide walks through downloading a base model, merging it with a LoRA adapter, and deploying the result on Fireworks.
Prerequisites:
- Fireworks account and firectl installed
- Python environment with the necessary packages
- Local LoRA adapter or access to HuggingFace
- Python 3.9 or later (< 3.13)
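A quick way to sanity-check the environment before starting:
python --version   # expect a version >= 3.9 and < 3.13
firectl -h         # confirms firectl is installed and on your PATH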
Follow the steps below to merge and deploy your models.
1. Access and download base model
1.1 List available models
View all models in your Fireworks account:
firectl list models
Example output:
Code Llama 13B (code-llama-13b)    2024-02-29 20:36:24    HF_BASE_MODEL
CodeGemma 7B (codegemma-7b)        2024-06-19 22:57:22    HF_BASE_MODEL
...                                ...                    ...
 
Recall the supported base models:
- Gemma
- Phi, Phi-3
- Llama 1, 2, 3, 3.1
- LLaVa
- Mistral & Mixtral
- Qwen2
- StableLM
- Starcoder (GPTBigCode) & Starcoder2
- DeepSeek V1 & V2
- GPT NeoX
1.2 Download base model
Download your chosen model to a local directory:
firectl download model <model-id> <output-directory>
 
Example:
firectl download model code-llama-13b ./base_model
 
Available flags:
- --quiet: Suppress progress bar
- -h, --help: Display help information
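For example, to download without a progress bar (useful in scripts):
firectl download model code-llama-13b ./base_model --quiet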
2. Obtain LoRA adapter
2.1 Download LoRA adapter from Fireworks
The easiest way to obtain a LoRA adapter is to download it directly from Fireworks. LoRA adapters are listed alongside models when using firectl list models and are denoted by the type HF_PEFT_ADDON. Download a LoRA adapter with the same command used to download a model:
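firectl download model <adapter-id> <output-directory>

Example (the adapter ID here is a placeholder):
firectl download model my-sql-lora ./lora_adapter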
2.2 Download from HuggingFace (Optional)
If you need to download a LoRA adapter from HuggingFace, follow these steps:
Requirements
Install the required package:
pip install huggingface_hub
 
Download code
from huggingface_hub import snapshot_download
# Configure download parameters
adapter_id = "hf-account/adapter-name"       # Your HuggingFace adapter path
output_path = "./path/to/save/adapter"       # Local directory to save adapter
# Download the adapter
local_path = snapshot_download(
    repo_id=adapter_id,
    local_dir=output_path
)
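# Confirm where the adapter was saved (snapshot_download returns the local path)
print(f"Adapter downloaded to: {local_path}")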
 
Important notes:
- Replace adapter_id with your desired LoRA adapter
- Ensure output_path is a valid directory path
- The function returns the local path where files are downloaded
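If the adapter repository is gated or private, authenticate before calling snapshot_download; a minimal sketch (the token value is a placeholder):
from huggingface_hub import login

# Log in with your HuggingFace access token;
# alternatively, set the HF_TOKEN environment variable.
login(token="hf_xxx")  # placeholder token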
3. Merging base model with LoRA adapter
3.1 Installation requirements
First, ensure you have the necessary libraries installed:
pip install torch transformers peft
 
3.2 Merging script
Create a Python script (merge_model.py) with the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
def merge_lora_with_base_model(base_model_path: str, lora_path: str, output_path: str):
    """
    Merge a LoRA adapter with a base model and save the result.
    Args:
        base_model_path (str): Path to the base model directory
        lora_path (str): Path to the LoRA adapter directory
        output_path (str): Directory to save the merged model
    """
    # Load base model
    print(f"Loading base model from {base_model_path}")
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    # Load and apply LoRA adapter
    print(f"Loading LoRA adapter from {lora_path}")
    model = PeftModel.from_pretrained(
        base_model,
        lora_path
    )
    # Merge adapter with base model
    print("Merging LoRA adapter with base model...")
    merged_model = model.merge_and_unload()
    # Save merged model
    print(f"Saving merged model to {output_path}")
    merged_model.save_pretrained(output_path)
    # Save tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    tokenizer.save_pretrained(output_path)
  
    print("Merge completed successfully!")
if __name__ == "__main__":
    # Example usage
    merge_lora_with_base_model(
        base_model_path="./base_model",  # Directory containing the base model
        lora_path="./lora_adapter",      # Directory containing the LoRA adapter
        output_path="./merged_model"     # Output directory for merged model
    )
 
Note: if you downloaded the base model from Fireworks AI, you may need to set base_model_path to ./base_model/hf, since required files such as config.json can live inside the hf subdirectory.
3.3 Running the merge
Execute the script after setting your paths:
python merge_model.py
Important: After merging, verify that all necessary tokenizer files are present in the output directory. The merging process might skip some essential tokenizer files. You may need to manually copy these files from the base model:
- tokenizer_config.json
- tokenizer.json
- special_tokens_map.json
These files can be found in the original base model directory or the model's HuggingFace repository (e.g., meta-llama/Llama-3.1-70B-Instruct).
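A small helper to copy the tokenizer files over, assuming the directory names used earlier in this guide:
import shutil
from pathlib import Path

base = Path("./base_model")    # or ./base_model/hf for models downloaded from Fireworks
merged = Path("./merged_model")

for name in ("tokenizer_config.json", "tokenizer.json", "special_tokens_map.json"):
    src = base / name
    if src.exists() and not (merged / name).exists():
        print(f"Copying {name}")
        shutil.copy(src, merged / name)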
3.4 Important Notes
- Ensure sufficient disk space and GPU memory for all models
- Check your cache directory (~/.cache/huggingface/hub), as models may already be downloaded there
- Verify that the LoRA adapter is compatible with the base model
- All paths must exist and have proper permissions
- Memory issues can be resolved by setting device_map="cpu" (see the sketch below)
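If you hit GPU out-of-memory errors, a CPU-only variant of the load step in merge_model.py might look like this (a sketch; float32 is used because some fp16 ops are unsupported on CPU):
from transformers import AutoModelForCausalLM
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "./base_model",             # adjust to ./base_model/hf if needed
    torch_dtype=torch.float32,  # float32 avoids unsupported fp16 ops on CPU
    device_map="cpu",           # keep the entire model in system RAM
)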
4. Uploading and deploying merged model
4.1 Create model in Fireworks
Upload your merged model to Fireworks:
firectl create model <model-name> <path/to/merged/model>
 
Example:
firectl create model sql-enhanced-model ./merged_model
 
For additional options:
firectl create model -h
4.2 Create deployment
Deploy your uploaded model:
Basic deployment:
firectl create deployment <model-name>
 
Using full model path:
firectl create deployment accounts/<account-name>/models/<model-name>
 
Example:
firectl create deployment sql-enhanced-model
# OR
firectl create deployment accounts/myaccount/models/sql-enhanced-model
 
For additional deployment parameters and configuration options:
firectl create deployment -h
 
4.3 Verification
After deployment, you can verify its status by listing your deployments:
firectl list deployments
Complete workflow summary
- Download the base model from Fireworks using firectl
- Download the LoRA adapter to your local device (e.g., from HuggingFace)
- Merge the models using the provided Python script
- Upload the merged model to Fireworks
- Create a deployment
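Once the deployment is live, you can send a quick test request; a minimal sketch against the OpenAI-compatible chat completions endpoint (account name, model name, and API key are placeholders):
import os
import requests

response = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
    json={
        "model": "accounts/myaccount/models/sql-enhanced-model",  # placeholder model path
        "messages": [{"role": "user", "content": "Say hello!"}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(response.json())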