# Accuracy + Length Reward Examples
This directory contains examples demonstrating the use of combined accuracy and length-based reward functions.
## Overview
These examples show how to use the `cosine_scaled_accuracy_length_reward` function to evaluate model responses based on both:
- Accuracy (correctness of the answer)
- Length efficiency (brevity of the response)
This combined approach rewards responses that are both accurate and concise, penalizing verbosity in correct answers and providing a clear separation between correct and incorrect responses.
Note: The accuracy detection depends on specific text-extraction mechanisms that may need customization for different types of content via the `extract_fn` and `compare_fn` parameters.
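To make the scoring shape concrete, here is a minimal standalone sketch of cosine-scaled reward computation. It is illustrative only, not the library's implementation: the `max_length` cutoff and the four bound values are assumptions chosen to reproduce the ordering described in the example below.

```python
import math

def cosine_scaled_reward(
    is_correct: bool,
    length: int,
    max_length: int = 1000,    # assumed generation budget
    min_correct: float = 0.5,  # long correct answer (illustrative bound)
    max_correct: float = 1.0,  # short correct answer (illustrative bound)
    min_wrong: float = -1.0,   # short wrong answer (illustrative bound)
    max_wrong: float = -0.1,   # long wrong answer (illustrative bound)
) -> float:
    """Reward correct answers more when short, and penalize wrong
    answers more when short, with a cosine ramp between the bounds."""
    progress = min(length / max_length, 1.0)
    cosine = math.cos(progress * math.pi)  # 1.0 at length 0, -1.0 at max_length
    # Choose bounds so that cosine = 1 (a short response) maps to the
    # "short" end of the range for that correctness class.
    lo, hi = (min_correct, max_correct) if is_correct else (max_wrong, min_wrong)
    return lo + 0.5 * (hi - lo) * (1.0 + cosine)

# The four response types discussed in the example below:
for is_correct, length in [(True, 50), (True, 950), (False, 50), (False, 950)]:
    print(f"correct={is_correct!s:5} length={length:4} "
          f"reward={cosine_scaled_reward(is_correct, length):+.3f}")
```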
## Examples
### Cosine-Scaled Accuracy + Length Example
The `cosine_scaled_example.py` script demonstrates the reward function's behavior with different types of responses:
- Short correct answers (highest score)
- Long correct answers (moderate score)
- Short incorrect answers (very low score)
- Long incorrect answers (low score, but still penalized for being wrong)
It also shows how to customize the weighting between accuracy and length components.
## Running the Examples
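Assuming this README sits alongside the script and your environment already has the project's dependencies installed, the example can be run directly:

```bash
python cosine_scaled_example.py
```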
## Expected Output
The script prints a reward for each response type, reflecting the ordering above: short correct answers score highest, long correct answers score moderately, long incorrect answers score low, and short incorrect answers score lowest.
## Custom Configurations
You can customize the reward function with various parameters, including the weighting between the accuracy and length components and the `extract_fn`/`compare_fn` hooks described above; a sketch follows.
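As a concrete illustration of the extraction and comparison hooks (the hook bodies below are assumptions for a numeric-QA setup, not the library's defaults; only the `extract_fn` and `compare_fn` parameter names come from this project):

```python
import re

def extract_final_number(text: str) -> str:
    """Assumed extract_fn: pull the last number out of a response."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else text.strip()

def numeric_match(predicted: str, expected: str) -> bool:
    """Assumed compare_fn: tolerant numeric equality, falling back to
    exact string comparison for non-numeric answers."""
    try:
        return abs(float(predicted) - float(expected)) < 1e-6
    except ValueError:
        return predicted.strip() == expected.strip()

response = "40 + 2 = 42, so the answer is 42"
print(numeric_match(extract_final_number(response), "42"))  # True
```

Hooks like these would be passed as `extract_fn=extract_final_number` and `compare_fn=numeric_match` when calling `cosine_scaled_accuracy_length_reward`.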
## Use Cases
This reward function is particularly useful for:
- Factual QA tasks where concise, correct answers are preferred
- Text summarization evaluation
- Mathematical problem-solving with step-by-step reasoning
- Any task where both accuracy and brevity are important
## Further Reading
For more information, see: