How it works
During training, each sample’s loss is multiplied by its weight before being used to update model parameters. Higher weights mean stronger learning signals: the model pays more attention to those examples. Lower weights (including negative ones) reduce or reverse a sample’s influence.
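Conceptually, the weighted objective can be sketched as a per-sample scaling of the loss before aggregation. The formula below is a simplified sketch that assumes a plain sum over samples; the exact aggregation and any normalization are internal to the training service.

$$\mathcal{L}_{\text{total}} = \sum_{i} w_i \, \mathcal{L}_i$$

where $w_i$ is the `weight` of sample $i$ and $\mathcal{L}_i$ is that sample’s unweighted loss.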
Dataset format
Add a `weight` field at the root level of each JSON object in your JSONL dataset:
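For example, a single sample with twice the default influence might look like the line below. This assumes the chat-style messages format described under Message filtering; the prompt and response are placeholders, and only the root-level `weight` field is the point of the example.

```jsonl
{"messages": [{"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}], "weight": 2.0}
```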
Weight values
The `weight` field accepts any floating-point number:
| Weight | Effect |
|---|---|
| > 1.0 | Increased importance; the model learns more from this sample |
| 1.0 | Default behavior (same as omitting `weight`) |
| Between 0.0 and 1.0 | Reduced importance; the sample has less influence |
| 0.0 | Sample is effectively ignored during training |
| < 0.0 | Negative weight; reverses the learning signal |
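For instance, setting `weight` to 0.0 keeps a sample in the file but removes its influence on training. The line below is an illustrative sketch with placeholder content:

```jsonl
{"messages": [{"role": "user", "content": "Tell me a joke about compilers."}, {"role": "assistant", "content": "I would, but it keeps getting optimized away."}], "weight": 0.0}
```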
Use cases
Upweight high-quality examples
When you have samples of varying quality, give more weight to your best examples:
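A sketch of what this might look like, with a carefully reviewed response upweighted relative to an unreviewed one (the prompts, responses, and weight values are illustrative):

```jsonl
{"messages": [{"role": "user", "content": "Explain gradient descent."}, {"role": "assistant", "content": "Gradient descent iteratively updates parameters in the direction that most reduces the loss, scaled by a learning rate."}], "weight": 2.0}
{"messages": [{"role": "user", "content": "Explain gradient descent."}, {"role": "assistant", "content": "It's a way to minimize a function by following its slope downhill."}], "weight": 1.0}
```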
Balance dataset distribution
If certain prompt types are underrepresented, upweight them to ensure the model learns them well:
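For instance, if code-related prompts are rare in your dataset, you could upweight them relative to more common prompt types (all content and weight values here are illustrative):

```jsonl
{"messages": [{"role": "user", "content": "Write a Python function that reverses a string."}, {"role": "assistant", "content": "def reverse_string(s):\n    return s[::-1]"}], "weight": 3.0}
{"messages": [{"role": "user", "content": "Summarize the following paragraph in one sentence."}, {"role": "assistant", "content": "The paragraph argues that remote work increases productivity for focused tasks."}], "weight": 1.0}
```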
De-emphasize noisy samples
If you have samples that may contain noise but can’t easily be filtered, reduce their weight:
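For example, a sample drawn from a less reliable source can stay in the dataset at a reduced weight rather than being dropped entirely (the 0.3 value and the content are illustrative):

```jsonl
{"messages": [{"role": "user", "content": "What year was the company founded?"}, {"role": "assistant", "content": "According to the scraped FAQ page, it was founded in 1998."}], "weight": 0.3}
```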
Message filtering
For multi-turn conversations, you can also control which assistant messages to include in training by adding a `weight` field to individual messages. This uses a binary format following the OpenAI fine-tuning specification.
A message-level `weight` must be 0 or 1:
- `1`: Include this assistant message in training (default)
- `0`: Exclude this assistant message from training
Message-level weights are for filtering which turns to train on, not for adjusting training influence. Use sample-level weights (at the root of the JSON object) for weighted importance.
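A sketch of a multi-turn sample that excludes one assistant turn from training while keeping the other (the conversation content is illustrative, and the explicit `"weight": 1` is optional since it is the default):

```jsonl
{"messages": [{"role": "user", "content": "Hi, can you help me debug my code?"}, {"role": "assistant", "content": "Of course! Please share the code and the error you're seeing.", "weight": 0}, {"role": "user", "content": "I get an IndexError on the last line."}, {"role": "assistant", "content": "An IndexError usually means you're indexing past the end of the list; check your loop bounds or iterate over the list directly.", "weight": 1}]}
```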
Example dataset
Here’s a complete example of a weighted RFT dataset (`dataset.jsonl`):
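The file below is a sketch of what such a dataset might contain; the prompts, responses, and specific weight values are illustrative. The last line pairs a deliberately incorrect response with a negative weight so that its learning signal is reversed:

```jsonl
{"messages": [{"role": "user", "content": "Classify the sentiment: \"The battery life is amazing.\""}, {"role": "assistant", "content": "Positive"}], "weight": 2.0}
{"messages": [{"role": "user", "content": "Classify the sentiment: \"The screen cracked after a week.\""}, {"role": "assistant", "content": "Negative"}], "weight": 1.0}
{"messages": [{"role": "user", "content": "Classify the sentiment: \"It arrived on Tuesday.\""}, {"role": "assistant", "content": "Neutral"}], "weight": 0.5}
{"messages": [{"role": "user", "content": "Classify the sentiment: \"Worst purchase ever.\""}, {"role": "assistant", "content": "Positive"}], "weight": -0.5}
```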