These examples show how to use the `cosine_scaled_accuracy_length_reward` function to evaluate model responses based on both:

- **Accuracy** (correctness of the answer)
- **Length efficiency** (brevity of the response)
This combined approach rewards responses that are both accurate and concise, penalizing verbosity in correct answers and providing a clear separation between correct and incorrect responses.

**Note:** The accuracy detection depends on specific text-extraction mechanisms that may need customization for different types of content using the `extract_fn` and `compare_fn` parameters.
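For instance, answers that need domain-specific matching can plug in custom hooks. The sketch below is a minimal illustration only: the import path, the `messages`/`ground_truth` calling convention, and the exact hook signatures are assumptions based on this example, so check `examples/accuracy_length/cosine_scaled_example.py` for the actual interface in your version of reward-kit.

```python
# Minimal sketch of customizing accuracy detection. The import path and the
# messages/ground_truth calling convention are assumptions; see the example
# script for the actual signature.
from reward_kit.rewards.accuracy_length import cosine_scaled_accuracy_length_reward

def extract_city(text: str) -> str:
    # Hypothetical extractor: take the last word and strip trailing
    # punctuation, so "The capital of France is Paris." yields "paris".
    return text.strip().rstrip(".!?").split()[-1].lower()

def compare_cities(extracted: str, expected: str) -> bool:
    # Hypothetical comparator: case-insensitive exact match. The real
    # compare_fn signature may differ.
    return extracted == expected.lower()

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

result = cosine_scaled_accuracy_length_reward(
    messages=messages,
    ground_truth="Paris",
    extract_fn=extract_city,    # documented hook for pulling the answer out
    compare_fn=compare_cities,  # documented hook for judging correctness
)
print(result)
```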
```bash
# Make sure you're in the reward-kit directory
cd /path/to/reward-kit

# Activate the virtual environment
source .venv/bin/activate

# Run the example
python examples/accuracy_length/cosine_scaled_example.py
```
Example output:

```
===== Evaluating with Default Parameters =====

Short Correct Answer:
Response (1 words): "Paris..."
Combined Score: 1.00
Accuracy Score: 1.00
Length Score: 1.00

Long Correct Answer:
Response (69 words): "The capital of France is Paris. Paris is located i..."
Combined Score: 0.88
Accuracy Score: 1.00
Length Score: 0.61

Short Incorrect Answer:
Response (1 words): "Lyon..."
Combined Score: 0.00
Accuracy Score: 0.00
Length Score: 0.00

Long Incorrect Answer:
Response (46 words): "I need to identify the capital city of France. Fra..."
Combined Score: 0.04
Accuracy Score: 0.00
Length Score: 0.13

===== Evaluating with Custom Parameters =====

Short Correct Answer (80% accuracy weight, 20% length weight):
Response (1 words): "Paris..."
Combined Score: 1.00
Accuracy Score: 1.00
Length Score: 1.00
```
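The custom-parameter run above re-weights the two components toward accuracy. A minimal sketch of such a call follows, assuming hypothetical `accuracy_weight`/`length_weight` keyword names; the real parameter names may differ, so consult the example script for what your version of reward-kit accepts.

```python
from reward_kit.rewards.accuracy_length import cosine_scaled_accuracy_length_reward

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris"},
]

# Hypothetical keyword names for the 80% accuracy / 20% length weighting
# shown in the output above; these names are assumptions, not confirmed API.
result = cosine_scaled_accuracy_length_reward(
    messages=messages,
    ground_truth="Paris",
    accuracy_weight=0.8,
    length_weight=0.2,
)
print(result)
```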