Merging LoRA adapters with base models on Fireworks
Fireworks can now merge a LoRA into the base model directly at deployment time, so you no longer need to merge manually in advance. If you still want to produce a standalone merged model, this guide walks through downloading a base model, merging it with a LoRA adapter, and deploying the result on Fireworks.
Prerequisites:
- Fireworks account and firectl installed
- Python environment with the necessary packages
- Local LoRA adapter or access to HuggingFace
- Python 3.9 or later (< 3.13)
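A quick way to sanity-check the environment before starting:
python --version   # expect a version >= 3.9 and < 3.13
firectl -h         # confirms firectl is installed and on your PATH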
Follow the steps below to merge and deploy your models.
1. Access and download base model
1.1 List available models
View all models in your Fireworks account:
firectl list models
Example output:
Code Llama 13B (code-llama-13b)    2024-02-29 20:36:24    HF_BASE_MODEL
CodeGemma 7B (codegemma-7b)        2024-06-19 22:57:22    HF_BASE_MODEL
...                                ...                    ...
 
Recall the supported base models:
- Gemma
- Phi, Phi-3
- Llama 1, 2, 3, 3.1
- LLaVa
- Mistral & Mixtral
- Qwen2
- StableLM
- Starcoder (GPTBigCode) & Starcoder2
- DeepSeek V1 & V2
- GPT NeoX
1.2 Download base model
Download your chosen model to a local directory:
firectl download model <model-id> <output-directory>
 
Example:
firectl download model code-llama-13b ./base_model
 
Available flags:
- --quiet: Suppress progress bar
- -h, --help: Display help information
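For example, to download without a progress bar (useful in scripts):
firectl download model code-llama-13b ./base_model --quiet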
2. Obtain LoRA adapter
2.1 Download LoRA adapter from Fireworks
The easiest way to obtain a LoRA adapter is to download it directly from Fireworks. LoRA adapters are listed alongside models when using firectl list models and are denoted by the type HF_PEFT_ADDON. Download a LoRA adapter with the same command used to download a model:
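firectl download model <adapter-id> <output-directory>

Example (the adapter ID here is a placeholder):
firectl download model my-sql-lora ./lora_adapter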
2.2 Download from HuggingFace (Optional)
If you need to download a LoRA adapter from HuggingFace, follow these steps:
Requirements
Install the required package:
pip install huggingface_hub
 
Download code
from huggingface_hub import snapshot_download
# Configure download parameters
adapter_id = "hf-account/adapter-name"       # Your HuggingFace adapter path
output_path = "./path/to/save/adapter"       # Local directory to save adapter
# Download the adapter
local_path = snapshot_download(
    repo_id=adapter_id,
    local_dir=output_path
)
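# Confirm where the adapter was saved (snapshot_download returns the local path)
print(f"Adapter downloaded to: {local_path}")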
 
Important notes:
- Replace adapter_id with your desired LoRA adapter
- Ensure output_path is a valid directory path
- The function returns the local path where files are downloaded
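If the adapter repository is gated or private, authenticate before calling snapshot_download; a minimal sketch (the token value is a placeholder):
from huggingface_hub import login

# Log in with your HuggingFace access token;
# alternatively, set the HF_TOKEN environment variable.
login(token="hf_xxx")  # placeholder token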
3. Merging base model with LoRA adapter
3.1 Installation requirements
First, ensure you have the necessary libraries installed:
pip install torch transformers peft
 
3.2 Merging script
Create a Python script (merge_model.py) with the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
def merge_lora_with_base_model(base_model_path: str, lora_path: str, output_path: str):
    """
    Merge a LoRA adapter with a base model and save the result.
    Args:
        base_model_path (str): Path to the base model directory
        lora_path (str): Path to the LoRA adapter directory
        output_path (str): Directory to save the merged model
    """
    # Load base model
    print(f"Loading base model from {base_model_path}")
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    # Load and apply LoRA adapter
    print(f"Loading LoRA adapter from {lora_path}")
    model = PeftModel.from_pretrained(
        base_model,
        lora_path
    )
    # Merge adapter with base model
    print("Merging LoRA adapter with base model...")
    merged_model = model.merge_and_unload()
    # Save merged model
    print(f"Saving merged model to {output_path}")
    merged_model.save_pretrained(output_path)
    # Save tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    tokenizer.save_pretrained(output_path)
  
    print("Merge completed successfully!")
if __name__ == "__main__":
    # Example usage
    merge_lora_with_base_model(
        base_model_path="./base_model",  # Directory containing the base model
        lora_path="./lora_adapter",      # Directory containing the LoRA adapter
        output_path="./merged_model"     # Output directory for merged model
    )
 
Note: if you downloaded the base model from Fireworks AI, you may need to set base_model_path to ./base_model/hf, since required files such as config.json can live inside the hf subdirectory.
3.3 Running the merge
Execute the script after setting your paths:
python merge_model.py
Important: After merging, verify that all necessary tokenizer files are present in the output directory. The merging process might skip some essential tokenizer files. You may need to manually copy these files from the base model:
- tokenizer_config.json
- tokenizer.json
- special_tokens_map.json
These files can be found in the original base model directory or the model's HuggingFace repository (e.g., meta-llama/Llama-3.1-70B-Instruct).
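A small helper to copy the tokenizer files over, assuming the directory names used earlier in this guide:
import shutil
from pathlib import Path

base = Path("./base_model")    # or ./base_model/hf for models downloaded from Fireworks
merged = Path("./merged_model")

for name in ("tokenizer_config.json", "tokenizer.json", "special_tokens_map.json"):
    src = base / name
    if src.exists() and not (merged / name).exists():
        print(f"Copying {name}")
        shutil.copy(src, merged / name)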
3.4 Important Notes
- Ensure sufficient disk space and GPU memory for all models
- Check your cache directory (~/.cache/huggingface/hub), as models may already be downloaded there
- Verify that the LoRA adapter is compatible with the base model
- All paths must exist and have proper permissions
- Memory issues can be resolved by setting device_map="cpu" (see the sketch below)
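If you hit GPU out-of-memory errors, a CPU-only variant of the load step in merge_model.py might look like this (a sketch; float32 is used because some fp16 ops are unsupported on CPU):
from transformers import AutoModelForCausalLM
import torch

base_model = AutoModelForCausalLM.from_pretrained(
    "./base_model",             # adjust to ./base_model/hf if needed
    torch_dtype=torch.float32,  # float32 avoids unsupported fp16 ops on CPU
    device_map="cpu",           # keep the entire model in system RAM
)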
4. Uploading and deploying merged model
4.1 Create model in Fireworks
Upload your merged model to Fireworks:
firectl create model <model-name> <path/to/merged/model>
 
Example:
firectl create model sql-enhanced-model ./merged_model
 
For additional options:
firectl create model -h
4.2 Create deployment
Deploy your uploaded model:
Basic deployment:
firectl create deployment <model-name>
 
Using full model path:
firectl create deployment accounts/<account-name>/models/<model-name>
 
Example:
firectl create deployment sql-enhanced-model
# OR
firectl create deployment accounts/myaccount/models/sql-enhanced-model
 
For additional deployment parameters and configuration options:
firectl create deployment -h
 
4.3 Verification
After deployment, you can verify its status by listing your deployments:
firectl list deployments
Complete workflow summary
- Download the base model from Fireworks using firectl
- Download the LoRA adapter to your local device (e.g., from HuggingFace)
- Merge the models using the provided Python script
- Upload the merged model to Fireworks
- Create a deployment
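Once the deployment is live, you can send a quick test request; a minimal sketch against the OpenAI-compatible chat completions endpoint (account name, model name, and API key are placeholders):
import os
import requests

response = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
    json={
        "model": "accounts/myaccount/models/sql-enhanced-model",  # placeholder model path
        "messages": [{"role": "user", "content": "Say hello!"}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(response.json())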