Fine-tuning models
We’re introducing an upgraded tuning service with improved speed, usability, and reliability! The new service uses different commands and has different model coverage. The new service is offered for free while it is in public preview.
See these docs to use our legacy service instead.
Introduction
Fireworks offers a LoRA-based fine-tuning method designed for usability, reliability, and efficiency. LoRA is used for fine-tuning all models besides our 70B models, which use qLoRA (quantized LoRA) to improve training speeds.
The fine-tuning service provides hassle-free quality improvements through intelligent defaults and minimal configuration. Models fine-tuned with our service can be seamlessly deployed for inference on Fireworks or downloaded for local usage.
Fine-tuning a model with a dataset can be useful for several reasons:
- Enhanced Precision: It allows the model to adapt to the unique attributes and trends within the dataset, leading to significantly improved precision and effectiveness.
- Domain Adaptation: While many models are developed with general data, fine-tuning them with specialized, domain-specific datasets ensures they are finely attuned to the specific requirements of that field.
- Bias Reduction: General models may carry inherent biases. Fine-tuning with a well-curated, diverse dataset aids in reducing these biases, fostering fairer and more balanced outcomes.
- Contemporary Relevance: Information evolves rapidly, and fine-tuning with the latest data keeps the model current and relevant.
- Customization for Specific Applications: This process allows for the tailoring of the model to meet unique objectives and needs, an aspect not achievable with standard models.
In essence, fine-tuning a model with a specific dataset is a pivotal step in ensuring its enhanced accuracy, relevance, and suitability for specific applications. Let’s embark on the journey of fine-tuning a model!
Fine-tuned model inference on Serverless is slower than base model inference on Serverless. For use cases that need low latency, we recommend using on-demand deployments. For on-demand deployments, fine-tuned model inference speeds are significantly closer to base model speeds (but still slightly slower). If you are only using 1 LoRA on-demand, merging the fine-tuned weights into the base model when using on-demand deployments will provide identical speed to base model inference. If you have an enterprise use case that needs fast fine-tuned models, please contact us!
Pricing
Our new tuning service is currently free but will eventually be charged based on the total number of tokens processed (dataset tokens × number of epochs); for example, a 1M-token dataset trained for 2 epochs would count as 2M tokens. Running inference on fine-tuned models incurs no extra costs outside of base inference fees.
See our Pricing page for pricing details on our legacy tuning service.
Installing firectl
firectl is the command-line (CLI) utility used to manage and deploy various resources on the Fireworks AI Platform. Use firectl to manage fine-tuning jobs and their resulting models.

Please visit the Firectl Getting Started Guide for instructions on installing and using firectl.
Preparing your dataset
To fine-tune a model, we need to first upload a dataset. Once uploaded, this dataset can be used to create one or more fine-tuning jobs. A dataset consists of a single JSONL file, where each line is a separate training example.
Limits:
- Minimum number of examples is 3.
- Maximum number of examples is 3,000,000.
Format:
- Each line of the file must be a valid JSON object.

Each dataset must conform to the schema expected by our OpenAI-compatible Chat Completions API. Each JSON object of the dataset must contain a single array field called messages. Each message is an object containing two fields:

- role - one of “system”, “user”, or “assistant”.
- content - the content of the message.
A message with the “system” role is optional, but if specified, it must be the first message of the conversation. Subsequent messages start with “user” and alternate between “user” and “assistant”.
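Below is a minimal illustrative training example (a single JSONL line; the contents are placeholders):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
```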
Creating your dataset
To create a dataset, run:
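```bash
# "my-dataset" and the file path are placeholders for your own dataset ID and JSONL file
firectl create dataset my-dataset path/to/dataset.jsonl
```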
You can then check the dataset with:
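```bash
firectl get dataset my-dataset
```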
Starting your tuning job
To start a supervised fine-tuning job (SFTJ), run:
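```bash
# general form; a sketch - verify the exact flags with `firectl create sftj --help`
firectl create sftj \
  --base-model <base-model-id> \
  --dataset <dataset-id>
```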
For example:
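```bash
# the model and dataset names here are placeholders
firectl create sftj \
  --base-model llama-v3p1-8b-instruct \
  --dataset my-dataset
```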
firectl will return the fine-tuning job ID.
When creating a fine-tuning job, you can start tuning from a base model, or from a model you tuned earlier (LoRA add-on):

- Base model: Use the base-model parameter to start from a pre-trained base model.
- Existing LoRA add-on: Use the warm-start-from parameter to start from an existing LoRA add-on model, where the LoRA is specified in the format “accounts/<account-id>/models/<addon-model-id>”.

You must specify either base-model or warm-start-from in your command-line flags.
Checking the job status
You can monitor the progress of the tuning job by running:
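```bash
# use the job ID returned by `firectl create sftj`
firectl get sftj <job-id>
```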
Once the job successfully completes, a model will be created in your account. You can see a list of models by running:
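```bash
firectl list models
```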
Or if you specified a model ID when creating the fine-tuning job, you can get the model directly:
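```bash
firectl get model <model-id>
```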
Deploying and using a model
Before using your fine-tuned model for inference, you must deploy it. Please refer to our guides on Deploying a model and Querying text models for detailed instructions.
Some base models may not support serverless addons. To check:
- Run firectl -a fireworks get <base-model-id>
- Look under Deployed Model Refs to see if a fireworks-owned deployment exists, e.g. accounts/fireworks/deployments/3c7a68b0
- If so, then it is supported.
If the base model doesn’t support serverless addons, you will need to use an on-demand deployment to deploy it.
Additional tuning options
Tuning settings are specified when starting a fine-tuning job. All of the below settings are optional and will have reasonable defaults if not specified. For settings that affect tuning quality, like epochs and learning rate, we recommend using the default settings and only changing hyperparameters if results are not as desired. All tuning options must be specified via command-line flags, as shown in the below example command with multiple flags.
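A sketch combining several of the settings described below; the flag spellings are assumptions, so verify them with `firectl create sftj --help`:

```bash
firectl create sftj \
  --base-model llama-v3p1-8b-instruct \
  --dataset my-dataset \
  --output-model my-tuned-model \
  --epochs 2 \
  --learning-rate 0.0001 \
  --lora-rank 16 \
  --max-context-length 16384 \
  --early-stop
```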
Evaluation
By default, the fine-tuning job will run evaluation by testing the fine-tuned model against an evaluation set that is automatically carved out from a portion of your training set. You have the option to explicitly specify a separate evaluation dataset to use instead of carving out training data.
evaluation_dataset: The ID of a separate dataset to use for evaluation. Must be pre-uploaded via firectl.
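For example, a sketch assuming a pre-uploaded evaluation dataset named my-eval-dataset (the flag spelling is an assumption):

```bash
firectl create sftj \
  --base-model llama-v3p1-8b-instruct \
  --dataset my-dataset \
  --evaluation-dataset my-eval-dataset
```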
Early stopping
Early stopping stops training early if the validation loss does not improve. It is off by default.
Max Context Length
By default, fine-tuned models support a max context length of 8k. Increase max context length if your use case requires context above 8k. Maximum context length can be increased up to the default context length of your selected model. For models with over 70B parameters, we only support up to 32k max context length.
Epochs
Epochs are the number of passes over the training data. Our default value is 1. If the model does not follow the training data as much as expected, increase the number of epochs by 1 or 2. Non-integer values are supported.
Note: we set a maximum total training volume of 3 million (dataset examples × epochs).
Learning rate
Learning rate controls how fast the model updates from data. We generally do not recommend changing the learning rate. The default value is set automatically based on your selected model.
LoRA Rank
LoRA rank refers to the number of parameters that will be tuned in your LoRA add-on. Higher LoRA rank increases the amount of information that can be captured while tuning. LoRA rank must be a power of 2 up to 64. Our default value is 8.
Training progress and monitoring
The fine-tuning service integrates with Weights & Biases to provide observability into the tuning process. To use this feature, you must have a Weights & Biases account and have provisioned an API key.
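For example, a sketch with assumed W&B flag names (the project name and API key are placeholders; check `firectl create sftj --help` for the exact flags):

```bash
firectl create sftj \
  --base-model llama-v3p1-8b-instruct \
  --dataset my-dataset \
  --wandb-api-key $WANDB_API_KEY \
  --wandb-project my-project
```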
Model ID
By default, the fine-tuning job will generate a random unique ID for the model. This ID is used to refer to the model at inference time. You can optionally specify a custom ID, within [ID constraints](https://docs.fireworks.ai/getting-started/concepts#resource-names-and-ids).
Job ID
By default, the fine-tuning job will generate a random unique ID for the fine-tuning job. You can optionally choose a custom ID.
Downloading model weights
To download model weights, run:
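```bash
# the model ID and destination directory are placeholders
firectl download model <model-id> /path/to/output
```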
Appendix
Supported base models - tuning
The Fireworks tuning service is limited to select models where we’re confident in providing intelligent defaults for a hassle-free experience. Currently, we only support tuning models with the following architectures:

- Llama 1, 2, and 3.x architectures are supported. Llama vision models and Llama 405B are currently not supported.
- Qwen2 architectures are supported.
Supported base models - LoRAs on dedicated deployment
LoRAs can be deployed for inference on dedicated deployments (on-demand or enterprise reserved) for the following models:
- All models supported for tuning
- accounts/fireworks/models/mixtral-8x7b-instruct-hf
- accounts/fireworks/models/mixtral-8x22b-instruct-hf
- accounts/fireworks/models/mixtral-8x22b-hf
- accounts/fireworks/models/mixtral-8x7b
- accounts/fireworks/models/mistral-7b-instruct-v0p2
- accounts/fireworks/models/mistral-7b
- accounts/fireworks/models/code-qwen-1p5-7b
- accounts/fireworks/models/deepseek-coder-v2-lite-base
- accounts/fireworks/models/deepseek-coder-7b-base
- accounts/fireworks/models/deepseek-coder-1b-base
- accounts/fireworks/models/codegemma-7b
- accounts/fireworks/models/codegemma-2b
- accounts/fireworks/models/starcoder2-15b
- accounts/fireworks/models/starcoder2-7b
- accounts/fireworks/models/starcoder2-3b
- accounts/fireworks/models/stablecode-3b
This means that up to 100 LoRAs can be deployed to a dedicated instance at no extra cost compared to the base deployment costs.
Supported base models - LoRAs on serverless
The following base models are supported for low-rank adaptation (LoRA) and can be deployed as LoRA add-ons on Fireworks serverless and on-demand deployments, using the default parameters below. Serverless deployment is only available for a subset of fine-tuned models - run [get <model id>](https://docs.fireworks.ai/models/overview#introduction) or check the [models page](https://fireworks.ai/models) to see if there’s an active serverless deployment.
A limited number of models are available for serverless LoRA deployment, meaning that up to 100 LoRAs can be deployed to serverless and are constantly available on a pay-per-token basis.
- accounts/fireworks/models/llama-v3p1-8b-instruct
- accounts/fireworks/models/llama-v3p1-70b-instruct
- accounts/fireworks/models/llama-v3p2-3b-instruct
Support
We’d love to hear what you think! Please connect with the team, ask questions, and share your feedback in the #fine-tuning Discord channel.