Accuracy + Length Reward Examples

This directory contains examples demonstrating the use of combined accuracy and length-based reward functions.

Overview

These examples show how to use the cosine_scaled_accuracy_length_reward function to evaluate model responses based on both:

Accuracy (correctness of the answer)
Length efficiency (brevity of the response)

This combined approach rewards responses that are both accurate and concise, penalizing verbosity in correct answers and providing a clear separation between correct and incorrect responses. Note: The accuracy detection depends on specific text-extraction mechanisms that may need customization for different types of content using the extract_fn and compare_fn parameters.

Examples

Cosine-Scaled Accuracy + Length Example

The cosine_scaled_example.py script demonstrates the reward function’s behavior with different types of responses:

Short correct answers (highest score)
Long correct answers (moderate score)
Short incorrect answers (very low score)
Long incorrect answers (low score, but still penalized for being wrong)

It also shows how to customize the weighting between accuracy and length components.

Running the Examples

# Make sure you're in the reward-kit directory
cd /path/to/reward-kit

# Activate the virtual environment
source .venv/bin/activate

# Run the example
python examples/accuracy_length/cosine_scaled_example.py

Expected Output

===== Evaluating with Default Parameters =====

Short Correct Answer:
Response (1 words): "Paris..."
Combined Score: 1.00
Accuracy Score: 1.00
Length Score: 1.00

Long Correct Answer:
Response (69 words): "The capital of France is Paris. Paris is located i..."
Combined Score: 0.88
Accuracy Score: 1.00
Length Score: 0.61

Short Incorrect Answer:
Response (1 words): "Lyon..."
Combined Score: 0.00
Accuracy Score: 0.00
Length Score: 0.00

Long Incorrect Answer:
Response (46 words): "I need to identify the capital city of France. Fra..."
Combined Score: 0.04
Accuracy Score: 0.00
Length Score: 0.13

===== Evaluating with Custom Parameters =====

Short Correct Answer (80% accuracy weight, 20% length weight):
Response (1 words): "Paris..."
Combined Score: 1.00
Accuracy Score: 1.00
Length Score: 1.00

Custom Configurations

You can customize the reward function with various parameters:

from reward_kit.rewards.accuracy_length import cosine_scaled_accuracy_length_reward

result = cosine_scaled_accuracy_length_reward(
    messages=messages,
    ground_truth="Expected answer",
    max_length=500,                # Maximum ideal length
    correctness_weight=0.7,        # Weight for accuracy component
    length_weight=0.3,             # Weight for length component
    min_value_correct=0.5,         # Minimum score for correct answers
    max_value_correct=1.0,         # Maximum score for correct answers
    min_value_wrong=0.0,           # Minimum score for wrong answers
    max_value_wrong=0.3,           # Maximum score for wrong answers
    token_method="whitespace"      # Method to count tokens
)

Use Cases

This reward function is particularly useful for:

Factual QA tasks where concise, correct answers are preferred
Text summarization evaluation
Mathematical problem-solving with step-by-step reasoning
Any task where both accuracy and brevity are important

Evaluators

Accuracy length overview

Accuracy + Length Reward Examples

Overview

Examples

Cosine-Scaled Accuracy + Length Example

Running the Examples

Expected Output

Custom Configurations

Use Cases

Further Reading

Evaluators

​Accuracy + Length Reward Examples

​Overview

​Examples

​Cosine-Scaled Accuracy + Length Example

​Running the Examples

​Expected Output

​Custom Configurations

​Use Cases

​Further Reading

Accuracy + Length Reward Examples

Overview

Examples

Cosine-Scaled Accuracy + Length Example

Running the Examples

Expected Output

Custom Configurations

Use Cases

Further Reading