This guide covers using supervised fine-tuning (SFT) to fine-tune a model and deploy it with on-demand or serverless hosting. To try reinforcement fine-tuning instead, refer to the reinforcement fine-tuning guide.

Fine-tuning a model using SFT

Step 1: Confirm model support for fine-tuning

You can confirm that a base model is available to fine-tune by looking for the Tunable tag in the model library or by using:

firectl get model -a fireworks <MODEL-ID>

And looking for Tunable: true.
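
For example, to check a specific base model (llama-v3p1-8b-instruct here is just a stand-in for the model you are interested in), you can filter the output:

firectl get model -a fireworks llama-v3p1-8b-instruct | grep -i tunable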

Some base models cannot be tuned on Fireworks (Tunable: false) but still list support for LoRA (Supports Lora: true). This means that users can tune a LoRA for this base model on a separate platform and upload it to Fireworks for inference. Consult importing fine-tuned models for more information.

Step 2: Prepare a dataset

Datasets must be in JSONL format, where each line represents a complete JSON-formatted training example. Make sure your data conforms to the following restrictions:

  • Minimum examples: 3
  • Maximum examples: 3 million per dataset
  • File format: .jsonl
  • Message schema: Each training sample must include a messages array, where each message is an object with two fields:
    • role: one of system, user, or assistant. A message with the system role is optional, but if specified, it must be the first message of the conversation
    • content: a string representing the message content

Here is an example conversation dataset:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris."}]}
{"messages": [{"role": "user", "content": "What is 1+1?"}, {"role": "assistant", "content": "2"}, {"role": "user", "content": "Now what is 2+2?"}, {"role": "assistant", "content": "4"}]}
Step 3: Create and upload a dataset

Use the following commands to upload your dataset to Fireworks and verify the upload. Ensure the dataset ID conforms to the resource ID restrictions.

firectl create dataset <DATASET_ID> path/to/training_dataset.jsonl
firectl get dataset <DATASET_ID>
Step 4: Launch a fine-tuning job

To launch the supervised fine-tuning job, run the command below; it will return a fine-tuning job ID. Ensure the fine-tuned model ID conforms to the resource ID restrictions. For a full explanation of the settings available to control the fine-tuning process, including learning rate and epochs, consult additional SFT job settings.

firectl create sftj --base-model <MODEL_ID> --dataset <DATASET_ID> --output-model <FINE_TUNED_MODEL_ID>

Instead of tuning a base model, you can also start tuning from a previously tuned LoRA model using:

firectl create sftj --warm-start-from <EXISTING_FINE_TUNED_MODEL_ID> --dataset <DATASET_ID> --output-model <NEW_FINE_TUNED_MODEL_ID>

Notice that we use --warm-start-from instead of --base-model when creating this job.

You can monitor the progress of the tuning job by running:

firectl get sftj <JOB_ID>

Once the job successfully completes, you will see the new LoRA model in your model list:

firectl list models

Deploying a fine-tuned model using an on-demand deployment

Use the following command to deploy your fine-tuned model using an on-demand deployment:

firectl create deployment <FINE_TUNED_MODEL_ID>

All parameters available to configure base model on-demand deployments, such as the autoscaling policy, are configurable for these deployments as well. For a full list of these options, see on-demand deployments.

If you have several fine-tuned versions of the same base model and want them to share the same deployment to increase utilization, you can enable multi-LoRA instead.
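
The typical flow is to create a single deployment with add-ons enabled and then load each LoRA onto it. A sketch, assuming your firectl version supports the --enable-addons flag on create deployment and the --deployment flag on load-lora:

firectl create deployment <BASE_MODEL_ID> --enable-addons
firectl load-lora <FINE_TUNED_MODEL_ID> --deployment <DEPLOYMENT_ID>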

Deploying a fine-tuned model serverlessly

For some base models, Fireworks offers support for serverless LoRA, allowing you to serverlessly host your fine-tuned model. To confirm if a base model is available serverlessly, look for the Serverless tag in the model library or use:

firectl -a fireworks get model <BASE_MODEL_ID>

and look under Deployed Model Refs for a Fireworks-owned deployment (accounts/fireworks/deployments/{SOME_DEPLOYMENT_ID}) and Supports LoRA: true.

If this is the case, then you can use

firectl load-lora <FINE_TUNED_MODEL_ID>   

to deploy the fine-tuned model serverlessly instead of creating your own on-demand deployment. Note that the same rate limits apply to serverless fine-tuned models as to serverless base models, and that a serverless fine-tuned model will be slower than both an on-demand deployment of the fine-tuned model (previous section) and serverless calls to the original base model.
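
Once loaded, the model can be queried through the OpenAI-compatible chat completions API like any other Fireworks model. For example, with curl (replace <ACCOUNT_ID> and <FINE_TUNED_MODEL_ID> with your own values, and set FIREWORKS_API_KEY to your API key):

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/<ACCOUNT_ID>/models/<FINE_TUNED_MODEL_ID>",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'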

Unused LoRA add-ons may be automatically unloaded after a week.

Additional SFT job settings

Additional tuning settings are available when starting a fine-tuning job. All of the settings below are optional and have reasonable defaults if not specified. For settings that affect tuning quality, like epochs and learning rate, we recommend keeping the defaults and only changing hyperparameters if results are not as desired. All tuning options must be specified via command-line flags, as shown in the example below:

firectl create sftj \
--base-model llama-v3p1-8b-instruct \
--dataset cancerset \
--output-model my-tuned-model \
--job-id my-fine-tuning-job \
--learning-rate 0.0001 \
--epochs 2 \
--early-stop \
--evaluation-dataset my-eval-set

Evaluation

By default, the fine-tuning job runs evaluation by testing the fine-tuned model against an evaluation set that is automatically carved out from a portion of your training set. You can optionally specify a separate evaluation dataset to use instead of carving out training data.

--evaluation-dataset: The ID of a separate dataset to use for evaluation. Must be pre-uploaded via firectl.

firectl create sftj \
 ...
  --evaluation-dataset my-eval-set \
  ...

Early stopping

Early stopping stops training early if the validation loss does not improve. It is off by default.

firectl create sftj \
 ...
  --early-stop \
  ...

Max Context Length

By default, fine-tuned models support a max context length of 8k. Increase the max context length if your use case requires more than 8k. The max context length can be raised up to the default context length of your selected model; for models with over 70B parameters, we only support a max context length of up to 65536.

firectl create sftj \
 ...
  --max-context-length 65536 \
  ...

Epochs

Epochs are the number of passes over the training data. Our default value is 1. If the model does not follow the training data as closely as expected, increase the number of epochs by 1 or 2. Non-integer values are supported.

Note: the total number of training examples processed (dataset examples × epochs) is capped at 3 million.

firectl create sftj \
 ...
  --epochs 2.0 \
  ...

Learning rate

Learning rate controls how fast the model updates from data. We generally do not recommend changing the learning rate. The default value is set automatically based on your selected model.

firectl create sftj \
  ...
  --learning-rate 0.0001 \
  ...

LoRA Rank

LoRA rank determines the number of trainable parameters in your LoRA add-on. A higher rank increases the amount of information that can be captured during tuning. LoRA rank must be a power of 2, up to 64. Our default value is 8.

firectl create sftj \
...
  --lora-rank 16 \
  ...

Training progress and monitoring

The fine-tuning service integrates with Weights & Biases to provide observability into the tuning process. To use this feature, you must have a Weights & Biases account and have provisioned an API key.

firectl create sftj \
 ...
  --wandb-entity my-org \
  --wandb-api-key xxx \
  --wandb-project "My Project" \
  ...

Model ID

By default, the fine-tuning job will generate a random unique ID for the model. This ID is used to refer to the model at inference time. You can optionally specify a custom ID, within ID constraints.

firectl create sftj \
  ...
  --output-model my-model \
  ...

Job ID

By default, the fine-tuning job will generate a random unique ID for the fine-tuning job. You can optionally choose a custom ID.

firectl create sftj \
  ...
  --job-id my-fine-tuning-job \
  ...

Turbo Mode

By default, the fine-tuning job uses a single GPU. You can optionally enable turbo mode to accelerate tuning with multiple GPUs (only for non-DeepSeek models).

firectl create sftj \
  ...
  --turbo \
  ...