Reward Kit Examples
This page gives an overview of, and links to the documentation for, the examples that demonstrate the capabilities of the Reward Kit. All documentation for these examples is self-contained within the docs/ folder.
Many examples use Hydra for configuration. Please refer to the specific documentation page for each example for execution instructions.
Available Examples
- Accuracy Length Example:
  - Demonstrates combined accuracy and length rewards.
  - View Documentation
- APPS Coding Example:
  - Illustrates evaluation for coding problems from the APPS dataset.
  - View Documentation (New file to be created)
- E2B (Code Execution Sandbox) Examples:
  - Covers various E2B integration scenarios for sandboxed code execution.
  - View Documentation
- GCP Cloud Run Deployment Example:
  - Shows how to deploy a Reward Kit application on GCP Cloud Run.
  - View Documentation (New file to be created)
- Math Example (GSM8K):
  - Focuses on evaluating math word problems using the GSM8K dataset.
  - View Documentation
- Math with Formatting Example:
  - Extends math evaluation to handle specific formatting requirements.
  - View Documentation (New file to be created)
- Tool Calling Example:
  - Demonstrates evaluation for models that use tools or function calls.
  - View Documentation (New file to be created)
- TRL Integration Example:
  - Shows how to integrate reward-kit functions with the TRL library.
  - View Documentation
The examples/metrics/ and examples/test_tasks/ directories in the root examples/ folder contain supporting resources and are not documented here as standalone examples.
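To give a rough idea of what "combined accuracy and length rewards" can mean in the Accuracy Length Example listed above, here is a minimal sketch in plain Python. It deliberately does not use the Reward Kit API: the function names, the substring-match accuracy check, the linear length penalty, and the default weights are all assumptions made for illustration, and the actual example may combine the signals differently.

```python
def accuracy_reward(response: str, expected: str) -> float:
    """Hypothetical accuracy check: 1.0 if the expected answer appears in the response."""
    expected = expected.strip().lower()
    if not expected:
        # An empty reference answer cannot be matched meaningfully.
        return 0.0
    return 1.0 if expected in response.lower() else 0.0


def length_reward(response: str, target_words: int = 50) -> float:
    """Hypothetical length score: 1.0 up to target_words, then a linear
    decay that reaches 0.0 at twice the target."""
    n_words = len(response.split())
    if n_words <= target_words:
        return 1.0
    return max(0.0, 1.0 - (n_words - target_words) / target_words)


def combined_reward(
    response: str,
    expected: str,
    accuracy_weight: float = 0.8,
    target_words: int = 50,
) -> float:
    """Weighted sum of the two scores; the weights are illustrative only."""
    length_weight = 1.0 - accuracy_weight
    return (
        accuracy_weight * accuracy_reward(response, expected)
        + length_weight * length_reward(response, target_words)
    )


if __name__ == "__main__":
    print(combined_reward("The answer is 42.", "42"))
    print(combined_reward("I do not know.", "42"))
```

With these assumed weights, a short correct response scores near the maximum, while a short incorrect response earns only the length component. Refer to the Accuracy Length Example documentation for how the real example defines and weights each reward.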
General Guides for Examples
While the pages above cover specific examples, these general guides might also be useful:
- Reward Functions Overview (To be reviewed/updated)
- Basic Reward Function Concepts (To be reviewed/updated)
- Advanced Reward Function Concepts (To be reviewed/updated)
Next Steps
- See the Developer Guide for comprehensive information.
- Check the Tutorials for step-by-step guides.
- Refer to the API Reference for detailed documentation.