This page gives an overview of, and links to, the documentation for examples that demonstrate the capabilities of the Reward Kit. All documentation for these examples is self-contained within the docs/ folder.
Many examples use Hydra for configuration. For execution instructions, refer to each example's own documentation page.
Accuracy Length Example:
APPS Coding Example:
E2B (Code Execution Sandbox) Examples:
GCP Cloud Run Deployment Example:
Math Example (GSM8K):
Math with Formatting Example:
Tool Calling Example:
TRL Integration Example:
Note: The examples/metrics/ and examples/test_tasks/ directories in the root examples/ folder contain supporting resources and are not standalone documented examples here.
While the pages above cover specific examples, these general guides might also be useful: