Fine-tuning models

These docs describe Fireworks’ legacy fine-tuning service. We’ve created a new tuning service with with improved speed, usability and reliability, see docs.

Quick Links to Legacy Service Guides

Fine-tuning models via API
API reference for listing fine-tuning jobs
API reference for getting fine-tuning jobs
API reference for creating fine-tuning jobs
API reference for deleting fine-tuning jobs

Introduction

We utilize LoRA (Low-Rank Adaptation) for efficient and effective fine-tuning of large language models. LoRA is used for fine-tuning all models besides our 70B models, which uses qLoRA (quantized) to improve training speeds. Take advantage of this opportunity to enhance your models with our cutting-edge technology!

Fine-tuning a model with a dataset can be useful for several reasons:

Enhanced Precision: It allows the model to adapt to the unique attributes and trends within the dataset, leading to significantly improved precision and effectiveness.
Domain Adaptation: While many models are developed with general data, fine-tuning them with specialized, domain-specific datasets ensures they are finely attuned to the specific requirements of that field.
Bias Reduction: General models may carry inherent biases. Fine-tuning with a well-curated, diverse dataset aids in reducing these biases, fostering fairer and more balanced outcomes.
Contemporary Relevance: Information evolves rapidly, and fine-tuning with the latest data keeps the model current and relevant.
Customization for Specific Applications: This process allows for the tailoring of the model to meet unique objectives and needs, an aspect not achievable with standard models.

In essence, fine-tuning a model with a specific dataset is a pivotal step in ensuring its enhanced accuracy, relevance, and suitability for specific applications. Let’s hop on a journey of fine-tuning a model!

Fine-tuned model inference on Serverless is slower than base model inference on Serverless. For use cases that need low latency, we recommend using on-demand deployments. For on-demand deployements, fine-tuned model inference speeds are significant closer to base model speeds (but still slightly slower). If you are only using 1 LoRA on-demand, merging fine-tuned weights into the base model when using on-demand deployments will provide identical speed to base model inference. If you have an enterprise use case that needs fast fine-tuned models, please contact us!

Installing firectl

firectl is the command-line (CLI) utiliy to manage, and deploy various resources on the Fireworks AI Platform. Use firectl to manage fine-tuning jobs and their resulting models.

Please visit the Firectl Getting Started Guide on installing and using firectl.

Preparing your dataset

To fine-tune a model, we need to first upload a dataset. Once uploaded, this dataset can be used to create one or more fine-tuning jobs. A dataset consists of a single JSONL file, where each line is a separate training example.

Limits:

Minimum number of examples is 1.
Maximum number of examples is 3,000,000.

Format:

Each line of the file must be a valid JSON object.

For the rest of this tutorial, we will use the databricks/databricks-dolly-15k dataset as an example. Each record in this dataset has keys of category, instruction, an optional context, and the expected response, where these keys are expected across all examples. Here are a few sample records:

{"instruction": "When did Virgin Australia start operating?", "context": "Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.", "response": "Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.", "category": "closed_qa"}
{"instruction": "Which is a species of fish? Tope or Rope", "context": "", "response": "Tope", "category": "classification"}
{"instruction": "Why can camels survive for long without water?", "context": "", "response": "Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time.", "category": "open_qa"}
{"instruction": "Alice's parents have three daughters: Amy, Jessy, and what\u2019s the name of the third daughter?", "context": "", "response": "The name of the third daughter is Alice", "category": "open_qa"}
{"instruction": "When was Tomoaki Komorida born?", "context": "Komorida was born in Kumamoto Prefecture on July 10, 1981. After graduating from high school, he joined the J1 League club Avispa Fukuoka in 2000. Although he debuted as a midfielder in 2001, he did not play much and the club was relegated to the J2 League at the end of the 2001 season. In 2002, he moved to the J2 club Oita Trinita. He became a regular player as a defensive midfielder and the club won the championship in 2002 and was promoted in 2003. He played many matches until 2005. In September 2005, he moved to the J2 club Montedio Yamagata. In 2006, he moved to the J2 club Vissel Kobe. Although he became a regular player as a defensive midfielder, his gradually was played less during the summer. In 2007, he moved to the Japan Football League club Rosso Kumamoto (later Roasso Kumamoto) based in his local region. He played as a regular player and the club was promoted to J2 in 2008. Although he did not play as much, he still played in many matches. In 2010, he moved to Indonesia and joined Persela Lamongan. In July 2010, he returned to Japan and joined the J2 club Giravanz Kitakyushu. He played often as a defensive midfielder and center back until 2012 when he retired.", "response": "Tomoaki Komorida was born on July 10,1981.", "category": "closed_qa"}

To create a dataset, run:

firectl create dataset <DATASET_ID> path/to/dataset.jsonl

and you can check the dataset with:

firectl get dataset <DATASET_ID>

To use an existing Hugging Face dataset, please refer to the script below for conversion. Datasets are private and cannot be viewed by other accounts.

Starting your tuning job

Fireworks supports three types of fine-tuning depending on the modeling objective:

Text completion - used to train a text generation model
Text classification - used to train a text classification model
Conversation - used to train a chat/conversation model

There are two ways to specify settings for your tuning job. You can create a settings YAML file and/or specify them using command-line flags. If a setting is present in both, the command-line flag takes precedence.

To start a job, run:

firectl create fine-tuning-job --settings-file path/to/settings.yaml --display-name "My Job"

firectl will return the fine-tuning job ID.

Starting from a base model or a PEFT addon model

When creating a fine-tuning job, you can start tuning from a base model, or from a model you tuned earlier (PEFT addon):

Base model: Use the base_model parameter to start from a pre-trained base model.
PEFT addon model: Use the warm_start_from parameter to start from an existing PEFT addon model.

You must specify either base_model or warm_start_from in your settings file or command-line flags.

The following sections provide examples of a settings file for the given tasks.

Text completion

To train a text completion model, you need to define an input template and output template from your JSON fields. To directly use a field as inputs or outputs, simply set the input and output templates as the field names.

You can also add additional text to the input and output templates. For example, this example demonstrates training on context, instruction and the response fields with added text around the fields, values of these fields will be injected during training. We won’t use the category field at all.

# The ID of the dataset you created above.
dataset: my-dataset

text_completion:
  # How the fields of the JSON dataset should be formatted into the input text.
  input_template: "### GIVEN THE CONTEXT: {context}  ### INSTRUCTION: {instruction}  ### RESPONSE IS: "

  # How the fields of the JSON dataset should be formatted into the output text.
  output_template: "ANSWER: {response}"

# The Fireworks model name of the base model.
base_model: accounts/fireworks/models/llama-v3p1-8b-instruct

Conversation

To train a conversation model, the dataset must conform to the schema expected by the Chat Completions API. Each JSON object of the dataset must contain a single array field called messages. Each message is an object containing two fields:

role - one of “system”, “user”, or “assistant”.
content - the content of the message.

A message with the “system” role is optional, but if specified, must be the first message of the conversation. Subsequent messages start with “user” and alternate between “user” and “assistant”. For example:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "blue"}]}
{"messages": [{"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2"}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4"}]}

The settings file for tuning a conversation model looks like:

# The ID of the dataset you created above.
dataset: my-dataset

conversation: {}

# The Fireworks model name of the base model.
base_model: accounts/fireworks/models/llama-v3p1-8b-instruct

Or, you can optionally pass in a Jinja template to digest the messages, settings file look like:

# The ID of the dataset you created above.
dataset: my-dataset

conversation:
  jinja_template: <jinja template string>

# The Fireworks model name of the base model.
base_model: accounts/fireworks/models/llama-v3p1-8b-instruct

an example of template string will look like:

  {%- set _mode = mode | default('generate', true) -%}
  {%- set stop_token = '<|eot_id|>' -%}
  {%- set message_roles = ['USER', 'ASSISTANT'] -%}
  {%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
  {%- for message in ns.messages -%}
      {%- if loop.last and message['role'] | upper == 'ASSISTANT' -%}
          {%- set ns.last_assistant_index_for_eos = loop.index0 -%}
      {%- endif -%}
  {%- endfor -%}
  {%- if _mode == 'generate' -%}
      {{ bos_token }}
  {%- endif -%}
  {%- for message in ns.messages -%}
      {%- if message['role'] | upper == 'SYSTEM' and not ns.initial_system_message_handled -%}
          {%- set ns.initial_system_message_handled = true -%}
          {{ '<|start_header_id|>system<|end_header_id|>\n\n' + message['content'] + stop_token }}
      {%- elif message['role'] | upper != 'SYSTEM' -%}
          {%- if (message['role'] | upper == 'USER') != ((loop.index0 - (1 if ns.initial_system_message_handled else 0)) % 2 == 0) -%}
              {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
          {%- endif -%}
          {%- if message['role'] | upper == 'USER' -%}
              {{ '<|start_header_id|>user<|end_header_id|>\n\n' + message['content'] + stop_token }}
          {%- elif message['role'] | upper == 'ASSISTANT' -%}
              {%- if _mode == 'train' -%}
                  {{ '<|start_header_id|>assistant<|end_header_id|>\n\n' + unk_token + message['content'] + stop_token + unk_token }}
              {%- else -%}
                  {{ '<|start_header_id|>assistant<|end_header_id|>\n\n' + message['content'] + (stop_token if loop.index0 != ns.last_assistant_index_for_eos else '') }}
              {%- endif -%}
          {%- endif -%}
      {%- endif -%}
  {%- endfor -%}
  {%- if _mode == 'generate' and ns.last_assistant_index_for_eos == -1 -%}
      {{ '<|start_header_id|>assistant<|end_header_id|>' }}
  {%- endif -%}

Notice: To use conversation settings, default polished Jinja templates will be provided for models that are recommended for chat tuning to guarantee the quality, see the specs in the conversation recommended column at the model spec section. Otherwise, we will still provide a default generic template if no is template provided to overwrite, but the tuned model quality might not be optimal.

Text classification

In this example, we’ll only be training on the instruction and the category field. We won’t use the context and response field at all

# The ID of the dataset you created above.
dataset: my-dataset

text_classification:
  # The JSON field containing the input text to be classified.
  text: instruction

  # The JSON field containing the classification label.
  label: category

# The Hugging Face model name of the base model.
base_model: accounts/fireworks/models/llama-v3p1-8b-instruct

Checking the job status

You can monitor the progress of the tuning job by running

firectl get fine-tuning-job <JOB_ID>

Once the job successfully completes, a model will be created in your account. You can see a list of models by running:

firectl list models

Or if you specified a model ID when creating the fine-tuning job, you can get the model directly:

firectl get model <MODEL_ID>

Deploying and using a model

Before using your fine-tuned model for inference, you must deploy it. Please refer to our guides on Deploying a model and Querying text models for detailed instructions.

Some base models may not support serverless addons. To check:

Run firectl -a fireworks get <base-model-id>
Look under Deployed Model Refs to see if a fireworks-owned deployment exists, e.g. accounts/fireworks/deployments/3c7a68b0
If so, then it is supported

If the base model doesn’t support serverless addons, you will need use an on-demand deployment to deploy it.

Additional tuning options

Evaluation

By default, the fine-tuning job will not run any post-training evaluation. If enabled:

For classification tasks, we measure the number of examples that match the expected label.
For these conversation and text completion tasks, we use perplexity to measure how well the model generates responses.

You can enable model evaluation by specifying one of two options:

evaluation_split: The percentage of the dataset to use for evaluation.

Sample usage:

# ...
evaluation_split: 0.2

evaluation_dataset: The ID of a separate dataset to use for evaluation.

# ...
evaluation_dataset: my-evaluation-dataset

Epochs

Epochs is the number of epochs (i.e. passes over the training data) the job should train for. Non-integer values are supported. If not specified, a reasonable default number will be chosen for you.

notice: we have the max value of 3 millions of dataset examples * epochs

# ...
epochs: 2.0

firectl create fine-tuning-job \
  --epochs 2.0 \
  ...

Learning rate

The learning rate used in training can be configured. If not specified, a reasonable default value will be chosen.

# ...
learning_rate: 0.0001

firectl create fine-tuning-job \
  --learning-rate 0.0001 \
  ...

Warmup Steps

The number of steps to warm up the learning rate. If not specified, a reasonable default value will be chosen.

# ...
warmup_steps: 1000

firectl create fine-tuning-job \
  --warmup-steps 1000 \
  ...

LrScheduler Type(Enterprise accounts only)

The learning rate scheduler type can be configured. If not specified, a reasonable default value will be chosen.

Supported values: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup

# ...
lr_scheduler_type: cosine

firectl create fine-tuning-job \
  --lr-scheduler-type cosine \
  ...

Batch size

The batch size of dataset used in training can be configured with a positive integer less than 1024 and in power of 2. If not specified, a reasonable default value will be chosen.

# ...
batch_size: 32

firectl create fine-tuning-job \
  --batch-size 32 \
  ...

Micro Batch Size(Enterprise accounts only)

Micro batch size is the number of examples to process in each GPU instance. If not specified, a reasonable default value will be chosen.

# ...
micro_batch_size: 2

firectl create fine-tuning-job \
  --micro-batch-size 2 \
  ...

Lora Rank

LoRA rank refers to the dimensionality of trainable matrices in Low-Rank Adaptation fine-tuning, balancing model adaptability and computational efficiency in fine-tuning large language models. The LoRA rank used in training can be configured with a positive integer with a max value of 32. If not specified, a reasonable default value will be chosen.

# ...
lora_rank: 16

firectl create fine-tuning-job \
  --lora-rank 16 \
  ...

Lora Alpha(Enterprise accounts only)

The LoRA alpha parameter controls the effective learning rate of the LoRA updates by scaling the trainable matrices during fine-tuning. A higher alpha value increases the impact of the LoRA updates, while a lower value makes the updates more conservative. If not specified, the system will use an optimized default value.

# ...
lora_alpha: 16

firectl create fine-tuning-job \
  --lora-alpha 16 \
  ...

Lora Target Modules(Enterprise accounts only)

The LoRA target modules parameter specifies the layers of the model to apply LoRA to. If not specified, the system will use an optimized default value.

# ...
lora_target_modules:
  - q_proj
  - v_proj

firectl create fine-tuning-job \
  --lora-target-modules q_proj,v_proj \
  ...

Training progress and monitoring

The fine-tuning service integrates with Weights & Biases to provide observability into the tuning process. To use this feature, you must have a Weights & Biases account and have provisioned an API key.

wandb_entity: my-org
wandb_api_key: xxx
wandb_project: My Project

firectl create fine-tuning-job \
  --wandb-entity my-org \
  --wandb-api-key xxx \
  --wandb-project "My Project" \
  ...

Model ID

By default, the fine-tuning job will generate a random unique ID for the model. This ID is used to refer to the model at inference time. You can optionally specify a custom ID, within (ID constraints)[https://docs.fireworks.ai/getting-started/concepts#resource-names-and-ids].

model_id: my-model

firectl create fine-tuning-job \
  --model-id my-model \
  ...

Job ID

By default, the fine-tuning job will generate a random unique ID for the fine-tuning job. You can optionally choose a custom ID.

job_id: my-fine-tuning-job

firectl create fine-tuning-job \
  --job-id my-fine-tuning-job \
  ...

Downloading model weights

We are opening model weights download to everyone now! simply following the command below

firectl download model <model-id> <target local filepath>

Appendix

Supported base models

The following base models are supported for parameter-efficient fine-tuning (PEFT) and can be deployed as PEFT add-ons on Fireworks serverless and on-demand deployments, using the default parameters below. Serverless deployment is only available for a subset of fine-tuned models - run “get (<model id>)[https://docs.fireworks.ai/models/overview#introduction]” or check the models (page)[https://fireworks.ai/models] to see if there’s an active serverless deployment.

The cut-off length is the maximum limit on the sum of input tokens and generated output tokens.

Model	Batch Size	LoRA Rank	Epochs	Learning Rate	Cut-off Length	Conversation Recommended
accounts/fireworks/models/llama-v3p2-1b-instruct	32	4	1	1.00E-04	16384	Yes
accounts/fireworks/models/llama-v3p2-3b-instruct	32	4	1	1.00E-04	16384	Yes
accounts/fireworks/models/llama-v3p1-70b-instruct	8	4	1	2.00E-05	8192	Yes
accounts/fireworks/models/llama-v3p1-8b-instruct	16	8	1	1.00E-04	8192	Yes
accounts/fireworks/models/llama-v3-70b-instruct-hf	8	4	1	2.00E-05	8192	Yes
accounts/fireworks/models/llama-v3-8b-instruct-hf	16	8	1	1.00E-04	8192	Yes
accounts/fireworks/models/mixtral-8x7b-instruct-hf	16	8	1	1.00E-04	32768	Yes
accounts/fireworks/models/mixtral-8x22b-instruct-hf	16	8	1	1.00E-04	8192	Yes
accounts/fireworks/models/mixtral-8x22b-hf	8	8	1	1.00E-04	8192	No
accounts/fireworks/models/mixtral-8x7b	16	8	1	1.00E-04	8192	No
accounts/fireworks/models/mistral-7b-instruct-v0p2	16	8	1	1.00E-04	4096	Yes
accounts/fireworks/models/mistral-7b	16	8	1	1.00E-04	4096	Yes
accounts/fireworks/models/code-qwen-1p5-7b	16	8	1	3.00E-04	65536	No
accounts/fireworks/models/deepseek-coder-v2-lite-base	16	8	1	3.00E-04	16384	No
accounts/fireworks/models/deepseek-coder-7b-base	16	8	1	3.00E-04	16384	No
accounts/fireworks/models/deepseek-coder-1b-base	16	8	1	3.00E-04	16384	No
accounts/fireworks/models/codegemma-7b	16	8	1	3.00E-04	8192	No
accounts/fireworks/models/codegemma-2b	16	8	1	3.00E-04	8192	No
accounts/fireworks/models/starcoder2-15b	16	8	1	1.00E-04	16384	No
accounts/fireworks/models/starcoder2-7b	16	8	1	1.00E-04	16384	No
accounts/fireworks/models/starcoder2-3b	16	8	1	1.00E-04	16384	No
accounts/fireworks/models/stablecode-3b	16	8	1	3.00E-04	16384	No
accounts/fireworks/models/qwen2-72b-instruct	8	8	1	3.00E-04	16384	Yes

Hugging Face dataset to JSONL

To convert a Hugging Face dataset to the JSONL format supported by our fine-tuning service, you can use the following Python script:

import json
from datasets import load_dataset

dataset = load_dataset("<DATASET_NAME>")

# Replace 'dataset_split' with the appropriate split you want to export, e.g., 'train', 'test', etc.
split_data = dataset["<SPLIT_NAME>"]

counter = 0
with open("<OUTPUT_FILE>.jsonl", "w") as f:
    for item in split_data:
        json.dump(item, f)
        counter += f.write("\n")

print(f"{counter} lines converted")

Support

We’d love to hear what you think! Please connect with the team, ask questions, and share your feedback in the #fine-tuning Discord channel.

Pricing

We charge based on the total number of tokens processed (dataset tokens * number of epochs). Please see our Pricing page for details.

On this page

Quick Links to Legacy Service Guides
Introduction
Installing firectl
Preparing your dataset
Starting your tuning job
Starting from a base model or a PEFT addon model
Text completion
Conversation
Text classification
Checking the job status
Deploying and using a model
Additional tuning options
Evaluation
Epochs
Learning rate
Warmup Steps
LrScheduler Type(Enterprise accounts only)
Batch size
Micro Batch Size(Enterprise accounts only)
Lora Rank
Lora Alpha(Enterprise accounts only)
Lora Target Modules(Enterprise accounts only)
Training progress and monitoring
Model ID
Job ID
Downloading model weights
Appendix
Supported base models
Hugging Face dataset to JSONL
Support
Pricing

Get Started

Querying models

Dedicated Deployments

Fine-tuning

Integrations

Policies

Administration

Quick Links to Legacy Service Guides

Introduction

Installing firectl

Preparing your dataset

Starting your tuning job

Starting from a base model or a PEFT addon model

Text completion

Conversation

Text classification

Checking the job status

Deploying and using a model

Additional tuning options

Evaluation

Epochs

Learning rate

Warmup Steps

LrScheduler Type(Enterprise accounts only)

Batch size

Micro Batch Size(Enterprise accounts only)

Lora Rank

Lora Alpha(Enterprise accounts only)

Lora Target Modules(Enterprise accounts only)

Training progress and monitoring

Model ID

Job ID

Downloading model weights

Appendix

Supported base models

Hugging Face dataset to JSONL

Support

Pricing

Get Started

Querying models

Dedicated Deployments

Fine-tuning

Integrations

Policies

Administration

​Quick Links to Legacy Service Guides

​Introduction

​Installing firectl

​Preparing your dataset

​Starting your tuning job

​Starting from a base model or a PEFT addon model

​Text completion

​Conversation

​Text classification

​Checking the job status

​Deploying and using a model

​Additional tuning options

​Evaluation

​Epochs

​Learning rate

​Warmup Steps

​LrScheduler Type(Enterprise accounts only)

​Batch size

​Micro Batch Size(Enterprise accounts only)

​Lora Rank

​Lora Alpha(Enterprise accounts only)

​Lora Target Modules(Enterprise accounts only)

​Training progress and monitoring

​Model ID

​Job ID

​Downloading model weights

​Appendix

​Supported base models

​Hugging Face dataset to JSONL

​Support

​Pricing

Quick Links to Legacy Service Guides

Introduction

Installing firectl

Preparing your dataset

Starting your tuning job

Starting from a base model or a PEFT addon model

Text completion

Conversation

Text classification

Checking the job status

Deploying and using a model

Additional tuning options

Evaluation

Epochs

Learning rate

Warmup Steps

LrScheduler Type(Enterprise accounts only)

Batch size

Micro Batch Size(Enterprise accounts only)

Lora Rank

Lora Alpha(Enterprise accounts only)

Lora Target Modules(Enterprise accounts only)

Training progress and monitoring

Model ID

Job ID

Downloading model weights

Appendix

Supported base models

Hugging Face dataset to JSONL

Support

Pricing