Direct Preference Optimization (DPO) is a modern, efficient method for fine-tuning large language models (LLMs) to align with human preferences. In essence, DPO lets developers refine the behavior of a pre-trained LLM so that it produces more desirable and helpful responses while becoming less likely to generate disfavored or incorrect outputs. It is particularly useful in scenarios where subjective qualities like tone, style, or specific content preferences matter and there is no single clearly defined “correct” answer.

How DPO Works

DPO relies on pairwise comparisons to refine the model’s behavior. For a given prompt, each training example pairs two responses: one labeled as the “preferred” (positive) example and another labeled as “non-preferred” (negative). Training then increases the probability of generating the preferred response and decreases the probability of generating the non-preferred one, effectively teaching the model to replicate the preference patterns observed in the comparison data.
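Fireworks runs this training loop for you, but for intuition, here is a minimal sketch of the per-pair DPO objective from Rafailov et al. (2023), written in PyTorch. The function and its argument names are illustrative: each input is the summed token log-probability of a response under the model being tuned (the policy) or under a frozen reference model, and beta controls how strongly the preference margin is enforced.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of each response under the tuned policy vs. the frozen
    # reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # The loss pushes the preferred response's log-ratio above the
    # non-preferred one's; beta sharpens or softens that margin.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()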

Step-by-Step Guide to Fine-Tuning with Fireworks AI

Step 1: Prepare dataset

Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.

Minimum Requirements:
  • Minimum examples: 3
  • Maximum examples: 3 million per dataset
  • File format: JSONL (each line is a valid JSON object)
  • Dataset Schema: Each training sample must include the following fields:
    • An input field containing a messages array, where each message is an object with two fields:
      • role: one of system, user, or assistant
      • content: a string representing the message content
    • A preferred_output field containing an assistant message with an ideal response
    • A non_preferred_output field containing an assistant message with a suboptimal response
Here’s an example dataset (one training example, pretty-printed here for readability; in the actual JSONL file each example must occupy a single line):
einstein_dpo.jsonl
{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "What is Einstein famous for?"
      }
    ],
    "tools": []
  },
  "preferred_output": [
    {
      "role": "assistant",
      "content": "Einstein is renowned for his theory of relativity, especially the equation E=mc²."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "content": "He was a famous scientist."
    }
  ]
}
We currently only support single-turn conversations for each example: the preferred and non-preferred outputs must each be the final assistant message of the conversation.
Save the dataset locally as a JSONL file, for example einstein_dpo.jsonl.
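Before uploading, it can help to sanity-check the file. The script below is a minimal sketch (the script name and file name are just for this example): it verifies that every line parses as JSON, carries the three required top-level fields from the schema above, and uses only valid roles.
validate_dpo_dataset.py
import json

REQUIRED_FIELDS = {"input", "preferred_output", "non_preferred_output"}
VALID_ROLES = {"system", "user", "assistant"}

def validate(path: str) -> None:
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            example = json.loads(line)  # raises if a line is not valid JSON
            missing = REQUIRED_FIELDS - example.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing fields {missing}")
            for message in example["input"]["messages"]:
                if message["role"] not in VALID_ROLES:
                    raise ValueError(f"line {lineno}: bad role {message['role']!r}")
    print(f"{path} looks well-formed")

validate("einstein_dpo.jsonl")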
Step 2: Create and upload the dataset

There are several ways to upload a dataset to the Fireworks platform for fine-tuning: firectl, the RESTful API, the builder SDK, or the UI.
  • To use the UI, navigate to the Datasets tab, click Create Dataset, and follow the wizard.
While all of the above approaches work, the UI is best suited to smaller datasets (under 500 MB), while firectl tends to handle larger datasets better; a sketch of the firectl upload follows below. Ensure the dataset ID conforms to the resource ID restrictions.
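As a sketch of the firectl path, the snippet below shells out from Python. It assumes firectl is installed and signed in, and that create dataset takes a dataset ID followed by a local JSONL path (see the firectl references in the appendix for the authoritative syntax); the dataset ID einstein-dpo matches the example job in Step 3.
import subprocess

# Assumption: `firectl create dataset <dataset-id> <path>` uploads a local
# JSONL file; "einstein-dpo" must conform to the resource ID restrictions.
subprocess.run(
    ["firectl", "create", "dataset", "einstein-dpo", "einstein_dpo.jsonl"],
    check=True,  # raise CalledProcessError if the upload fails
)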
Step 3: Create a DPO Job

Simply use firectl to create a new DPO job:
firectl create dpoj \
  --base-model accounts/account-id/models/base-model-id \
  --dataset accounts/my-account-id/datasets/my-dataset-id \
  --output-model new-model-id
For our example, we might run the following command:
firectl create dpoj \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset accounts/pyroworks/datasets/einstein-dpo \
  --output-model einstein-dpo-model
to fine-tune a Llama 3.1 8B Instruct model with our Einstein dataset.
Step 4: Monitor the DPO Job

Use firectl to monitor progress of the DPO fine-tuning job, replacing dpo-job-id with the ID returned when the job was created:
firectl get dpoj dpo-job-id
Once the job is complete, the STATE will be set to JOB_STATE_COMPLETED, and the fine-tuned model can be deployed.
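To wait for completion programmatically, you can poll the same command. The sketch below assumes JOB_STATE_FAILED is the terminal failure state and that the placeholder job ID is replaced with your real one.
import subprocess
import time

JOB_ID = "dpo-job-id"  # placeholder: use the ID of your actual DPO job

while True:
    # Re-run the documented status command and inspect its output.
    status = subprocess.run(
        ["firectl", "get", "dpoj", JOB_ID],
        capture_output=True, text=True, check=True,
    ).stdout
    if "JOB_STATE_COMPLETED" in status:
        print("fine-tuning complete")
        break
    if "JOB_STATE_FAILED" in status:  # assumed terminal failure state
        raise RuntimeError(f"DPO job failed:\n{status}")
    time.sleep(60)  # poll once a minute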
Step 5: Deploy the DPO fine-tuned model

Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to deploying a fine-tuned model for more details.
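Once a deployment exists, the model can be queried like any other Fireworks model. The sketch below uses the Fireworks Python client; the model path reuses the account (pyroworks) and output model ID (einstein-dpo-model) from the example job, so substitute your own.
from fireworks.client import Fireworks

client = Fireworks(api_key="<FIREWORKS_API_KEY>")
response = client.chat.completions.create(
    # Model path from the example job in Step 3; replace with your own.
    model="accounts/pyroworks/models/einstein-dpo-model",
    messages=[{"role": "user", "content": "What is Einstein famous for?"}],
)
print(response.choices[0].message.content)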

Next Steps

Fireworks AI provides multiple options for fine-tuning models. Explore other fine-tuning methods to improve model output.

Appendix

  • Python builder SDK references
  • RESTful API references
  • firectl references