Merging LoRA adapters with base models on Fireworks

A guide for downloading base models, merging them with LoRA adapters, and deploying the result using Fireworks.

Prerequisites:

  • Fireworks account and firectl installed
  • Python environment with necessary packages
  • Local LoRA adapter or access to HuggingFace
  • Python 3.9 or later (< 3.13)

Follow the steps below to merge and deploy your models.

1. Access and download base model

1.1 List available models

View all models in your Fireworks account:

firectl list models

Example output:

Code Llama 13B (code-llama-13b)    2024-02-29 20:36:24    HF_BASE_MODEL
CodeGemma 7B (codegemma-7b)        2024-06-19 22:57:22    HF_BASE_MODEL
...                                ...                    ...

For reference, Fireworks supports the following base model architectures:

  • Gemma
  • Phi, Phi-3
  • Llama 1, 2, 3, 3.1
  • LLaVA
  • Mistral & Mixtral
  • Qwen2
  • StableLM
  • Starcoder (GPTBigCode) & Starcoder2
  • DeepSeek V1 & V2
  • GPT NeoX

1.2 Download base model

Download your chosen model to a local directory:

firectl download model <model-id> <output-directory>

Example:

firectl download model code-llama-13b ./base_model

Available flags:

  • --quiet: Suppress progress bar
  • -h, --help: Display help information

2. Obtain LoRA adapter

2.1 Download LoRA adapter from Fireworks

The easiest way to obtain a LoRA adapter is to download it directly from Fireworks. LoRA adapters are listed alongside models in the output of firectl list models and are denoted by the type HF_PEFT_ADDON. Download one with the same command used for base models:

firectl download model <adapter-id> ./lora_adapter

2.2 Download from HuggingFace (Optional)

If you need to download a LoRA adapter from HuggingFace, follow these steps:

Requirements

Install the required package:

pip install huggingface_hub

Download code

from huggingface_hub import snapshot_download

# Configure download parameters
adapter_id = "hf-account/adapter-name"       # Your HuggingFace adapter path
output_path = "./path/to/save/adapter"       # Local directory to save adapter

# Download the adapter
local_path = snapshot_download(
    repo_id=adapter_id,
    local_dir=output_path
)

Important notes:

  • Replace adapter_id with your desired LoRA adapter
  • Ensure output_path is a valid directory path
  • The function returns the local path where files are downloaded

3. Merging base model with LoRA adapter

3.1 Installation requirements

First, ensure you have the necessary libraries installed:

pip install torch transformers peft

3.2 Merging script

Create a Python script (merge_model.py) with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

def merge_lora_with_base_model(base_model_path: str, lora_path: str, output_path: str):
    """
    Merge a LoRA adapter with a base model and save the result.

    Args:
        base_model_path (str): Path to the base model directory
        lora_path (str): Path to the LoRA adapter directory
        output_path (str): Directory to save the merged model
    """
    # Load base model
    print(f"Loading base model from {base_model_path}")
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )

    # Load and apply LoRA adapter
    print(f"Loading LoRA adapter from {lora_path}")
    model = PeftModel.from_pretrained(
        base_model,
        lora_path
    )

    # Merge adapter with base model
    print("Merging LoRA adapter with base model...")
    merged_model = model.merge_and_unload()

    # Save merged model
    print(f"Saving merged model to {output_path}")
    merged_model.save_pretrained(output_path)

    # Save tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    tokenizer.save_pretrained(output_path)
  
    print("Merge completed successfully!")

if __name__ == "__main__":
    # Example usage
    merge_lora_with_base_model(
        base_model_path="./base_model",  # Directory containing the base model
        lora_path="./lora_adapter",      # Directory containing the LoRA adapter
        output_path="./merged_model"     # Output directory for merged model
    )

3.3 Running the merge

Execute the script after setting your paths:

python merge_model.py

Important: After merging, verify that all necessary tokenizer files are present in the output directory. The merging process might skip some essential tokenizer files. You may need to manually copy these files from the base model:

  • tokenizer_config.json
  • tokenizer.json
  • special_tokens_map.json

These files can be found in the original base model directory or the model’s HuggingFace repository (e.g., meta-llama/Llama-3.1-70B-Instruct).

3.4 Important Notes

  • Ensure sufficient disk and GPU memory for all models
  • Check your cache directory (~/.cache/huggingface/hub) as models may already be downloaded there
  • Verify LoRA adapter compatibility with base model
  • All paths must exist and have proper permissions
  • If you hit GPU out-of-memory errors, load the model with device_map="cpu" (slower, but avoids GPU memory limits)

4. Uploading and deploying merged model

4.1 Create model in Fireworks

Upload your merged model to Fireworks:

firectl create model <model-name> <path/to/merged/model>

Example:

firectl create model sql-enhanced-model ./merged_model

For additional options:

firectl create model -h

4.2 Create deployment

Deploy your uploaded model:

Basic deployment:

firectl create deployment <model-name>

Using full model path:

firectl create deployment accounts/<account-name>/models/<model-name>

Example:

firectl create deployment sql-enhanced-model
# OR
firectl create deployment accounts/myaccount/models/sql-enhanced-model

For additional deployment parameters and configuration options:

firectl create deployment -h

4.3 Verification

After deployment, you can verify the status using:

firectl list deployments

Complete workflow summary

  1. Download base model from Fireworks using firectl
  2. Download the LoRA adapter locally (from Fireworks or HuggingFace)
  3. Merge models using provided Python script
  4. Upload merged model to Fireworks
  5. Create deployment