# Reward Functions Overview
This guide provides an overview of all out-of-the-box reward functions available in the Reward Kit library.
## Introduction
Reward Kit includes several pre-built reward functions for common evaluation tasks. These functions can be used directly or as building blocks for more complex evaluations.
## Available Reward Functions

### Format and Structure Rewards

These reward functions evaluate the format and structure of responses; a usage sketch follows the list.

- Format Reward: Evaluate responses against a regex pattern (e.g., `<think>...</think><answer>...</answer>`)
- Tag Count Reward: Check for exactly one of each specified tag
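As a quick illustration, a format check might be invoked like this. This is a minimal sketch: the import path, the `format_regex` parameter name, and the exact return type are assumptions, so consult the Format Reward documentation for the authoritative API.

```python
# Minimal sketch; the import path and parameter names are assumptions.
from reward_kit.rewards.format import format_reward

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant",
     "content": "<think>2 + 2 = 4</think><answer>4</answer>"},
]

# Scores 1.0 when the assistant message matches the pattern, 0.0 otherwise.
result = format_reward(
    messages=messages,
    format_regex=r"^<think>.*?</think><answer>.*?</answer>$",
)
print(result.score)
```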
### Accuracy and Correctness Rewards

These reward functions evaluate the accuracy of responses against expected answers; a usage sketch follows the list.

- Accuracy Reward: Compare answers to ground truth
- Math Reward: Compare numerical answers with expected values
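For example, a math comparison might look like the following; the import path and parameter names are assumptions, so check the Math Reward documentation for the real signature.

```python
# Minimal sketch; the import path and parameter names are assumptions.
from reward_kit.rewards.math import math_reward

messages = [
    {"role": "user", "content": "Compute 15% of 80."},
    {"role": "assistant", "content": "15% of 80 is 0.15 * 80 = 12."},
]

# Extracts the final numerical answer from the response and compares
# it to the expected value.
result = math_reward(messages=messages, ground_truth="12")
print(result.score)
```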
### Language and Style Rewards

These reward functions evaluate linguistic aspects of responses; a usage sketch follows the list.

- Language Consistency Reward: Ensure the response is in the target language
- Reasoning Steps Reward: Encourage step-by-step reasoning
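A language-consistency check might be invoked as below; again, the import path and parameter names are assumptions rather than the confirmed API.

```python
# Minimal sketch; the import path and parameter names are assumptions.
from reward_kit.rewards.language_consistency import language_consistency_reward

messages = [
    {"role": "user", "content": "Answer in French: what is the capital of France?"},
    {"role": "assistant", "content": "La capitale de la France est Paris."},
]

# Scores how much of the response is written in the target language.
result = language_consistency_reward(messages=messages, target_language="fr")
print(result.score)
```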
### Length and Verbosity Rewards

These reward functions evaluate the length and verbosity of responses; a sketch of the cosine schedule follows the list.

- Length Reward: Evaluate the response against length targets
- Cosine Length Reward: Scale rewards based on length using a cosine schedule
- Repetition Penalty Reward: Penalize repetitive content
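To make the cosine schedule concrete: the reward interpolates between a maximum and a minimum value as length grows, following a half-cosine curve, so short responses keep most of the reward and the penalty steepens in the middle of the range. The sketch below illustrates only that scheduling math; it is not the library's implementation, and the function name and parameters are invented for illustration.

```python
import math

def cosine_scaled_value(length: int, max_length: int,
                        min_value: float, max_value: float) -> float:
    """Illustrative cosine schedule: max_value at length 0, decaying
    along a half-cosine to min_value at max_length. The library's
    exact formula and parameter names may differ."""
    progress = min(length, max_length) / max_length
    cosine = math.cos(progress * math.pi)  # 1.0 -> -1.0 over [0, max_length]
    return min_value + 0.5 * (max_value - min_value) * (1.0 + cosine)

# Shorter responses keep more of the reward than longer ones.
print(cosine_scaled_value(100, 1000, min_value=0.1, max_value=1.0))  # ~0.98
print(cosine_scaled_value(900, 1000, min_value=0.1, max_value=1.0))  # ~0.12
```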
### Code Execution Rewards

These reward functions evaluate code by running it and comparing the output to expected results; a usage sketch follows the list.

- Binary Code Reward: Binary pass/fail for code execution
- Fractional Code Reward: Return the exact pass rate for code execution
- IOI C/C++ Code Reward: Evaluate C/C++ code using the Piston engine
- Binary C/C++ Code Reward: Binary pass/fail for C/C++ code
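As an illustration, a fractional code evaluation might be called like this; the import path and the test-case format are assumptions, so see the code reward documentation for the actual interface.

```python
# Minimal sketch; the import path and the test-case format are assumptions.
from reward_kit.rewards.code import fractional_code_reward

messages = [
    {"role": "user",
     "content": "Write a Python function add(a, b) that returns a + b."},
    {"role": "assistant",
     "content": "```python\ndef add(a, b):\n    return a + b\n```"},
]

# Executes the extracted code against each test case and returns the
# exact pass rate (e.g. 0.75 if 3 of 4 tests pass).
result = fractional_code_reward(
    messages=messages,
    test_cases=[{"input": "print(add(1, 2))", "expected_output": "3"}],
)
print(result.score)
```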
### Function Calling Rewards

These reward functions evaluate function calls in LLM responses against expected schemas and behaviors; a sketch of the schema comparison follows the list.

- Schema Jaccard Reward: Compare function calls to an expected schema
- LLM Judge Reward: Use an LLM to evaluate function call quality
- Composite Function Call Reward: Combine schema validation and LLM judgment
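To give intuition for the schema comparison, the core of a Jaccard-style check is the overlap between expected and produced argument names divided by their union. This standalone sketch shows only that idea; the library's actual reward presumably also handles nested properties and type checks.

```python
def schema_jaccard(expected_keys: set[str], actual_keys: set[str]) -> float:
    """Illustrative core of a schema Jaccard comparison: intersection
    over union of argument names. Not the library's implementation."""
    if not expected_keys and not actual_keys:
        return 1.0
    return len(expected_keys & actual_keys) / len(expected_keys | actual_keys)

expected = {"location", "unit"}
actual = {"location", "units"}           # misspelled argument name
print(schema_jaccard(expected, actual))  # 1 / 3 = 0.33...
```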
### JSON Schema Rewards

These reward functions validate JSON outputs against predefined schemas; a usage sketch follows the list.

- JSON Schema Reward: Validate JSON against a schema
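For instance, validating a structured output might look like the following; the schema uses standard JSON Schema keywords, but the import path and parameter names are assumptions.

```python
# Minimal sketch; the import path and parameter names are assumptions.
from reward_kit.rewards.json_schema import json_schema_reward

messages = [
    {"role": "user", "content": "Return the user as JSON with name and age."},
    {"role": "assistant", "content": '{"name": "Ada", "age": 36}'},
]

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

result = json_schema_reward(messages=messages, schema=schema)
print(result.score)
```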
### Combined Metrics Rewards

These reward functions combine multiple evaluation aspects into a single score.

- Cosine-Scaled Accuracy + Length Reward: Combine accuracy with length efficiency
## Choosing the Right Reward Function

Here's a guide to help you choose the appropriate reward function for your task:

| Task | Recommended Reward Function |
|---|---|
| Evaluating format adherence | `format_reward` |
| Checking tag usage and structure | `tag_count_reward` |
| Evaluating factual accuracy | `accuracy_reward` |
| Ensuring consistent language | `language_consistency_reward` |
| Encouraging step-by-step reasoning | `reasoning_steps_reward` |
| Controlling response length | `length_reward` |
| Optimizing for brevity and correctness | `cosine_scaled_accuracy_length_reward` |
| Reducing repetition | `repetition_penalty_reward` |
| Evaluating Python code | `fractional_code_reward` or `binary_code_reward` |
| Evaluating C/C++ code | `ioi_cpp_code_reward` or `binary_cpp_code_reward` |
| Validating tool use and function calls | `composite_function_call_reward` |
| Checking structured data outputs | `json_schema_reward` |
| Evaluating mathematical solutions | `math_reward` |
| Evaluating formal proofs in Lean | `lean_prover_reward`, `deepseek_prover_v2_reward` |
## Lean Theorem Prover Rewards

These reward functions evaluate formal proofs written in the Lean theorem prover language.

- Lean Prover Reward: Basic evaluation of Lean proofs
- DeepSeek Prover V2 Reward: Evaluate Lean proofs with a focus on subgoal decomposition
- DeepSeek HuggingFace Prover Benchmark: Evaluate proofs against the DeepSeek-ProverBench dataset
## Combining Reward Functions

You can combine multiple reward functions to create comprehensive evaluations, as in the sketch below.
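The following sketch blends a format check with an accuracy check. The `@reward_function` decorator and the `EvaluateResult`/`MetricResult` types follow Reward Kit's authoring pattern, but the import paths, parameter names, and result fields used here are assumptions; adapt them to the documented APIs.

```python
# Sketch of a custom combined reward. The decorator and result types
# follow Reward Kit's authoring pattern, but the import paths, parameter
# names, and result fields below are assumptions.
from reward_kit import reward_function, EvaluateResult, MetricResult
from reward_kit.rewards.format import format_reward      # assumed path
from reward_kit.rewards.accuracy import accuracy_reward  # assumed path

@reward_function
def format_and_accuracy(messages, ground_truth=None, **kwargs):
    """Weighted blend of a format check and an accuracy check."""
    fmt = format_reward(
        messages=messages,
        format_regex=r"^<think>.*?</think><answer>.*?</answer>$",
    )
    acc = accuracy_reward(messages=messages, ground_truth=ground_truth)

    # Weight accuracy more heavily than formatting.
    score = 0.3 * fmt.score + 0.7 * acc.score
    return EvaluateResult(
        score=score,
        reason="0.3 * format + 0.7 * accuracy",
        metrics={
            "format": MetricResult(score=fmt.score, success=fmt.score == 1.0,
                                   reason="regex format check"),
            "accuracy": MetricResult(score=acc.score, success=acc.score == 1.0,
                                     reason="ground-truth comparison"),
        },
    )
```

Exposing each component under `metrics` keeps the per-check scores visible alongside the combined score, which makes the blended reward easier to debug.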
### Pre-Built Combined Metrics

Reward Kit offers pre-built functions that combine multiple metrics:

- Cosine-Scaled Accuracy + Length: Combines accuracy with length using a cosine schedule
This function (see the usage sketch after this list):
- Evaluates response accuracy against ground truth
- Measures response length efficiency using a cosine schedule
- Rewards shorter correct answers more than longer ones
- Maintains a clear separation between correct and incorrect answers
- Allows customizable weighting between accuracy and length
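A call might look like this; the import path and parameter names are assumptions, so check the combined-metrics documentation for the actual options.

```python
# Minimal sketch; the import path and parameter names are assumptions.
from reward_kit.rewards.accuracy_length import cosine_scaled_accuracy_length_reward

messages = [
    {"role": "user", "content": "Compute 15% of 80."},
    {"role": "assistant", "content": "12"},
]

# A short correct answer scores near the maximum; a long correct answer
# is discounted by the cosine schedule but stays above any incorrect one.
result = cosine_scaled_accuracy_length_reward(
    messages=messages,
    ground_truth="12",
    max_length=1000,  # assumed: length at which the schedule bottoms out
)
print(result.score)
```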
## Next Steps

- Explore the documentation for each individual reward function
- Learn how to create your own reward functions
- Read best practices for effective evaluations
- See examples of common evaluation workflows