Creating your first reward function
Creating Your First Reward Function
This step-by-step tutorial will guide you through the process of creating, testing, and deploying your first reward function using the Reward Kit.
Prerequisites
Before starting this tutorial, make sure you have:
- Python 3.8+ installed on your system
- Reward Kit installed:
pip install reward-kit
- Fireworks API credentials (for deployment)
Step 1: Set Up Your Project
First, let’s create a directory structure for our project:
Step 2: Create a Basic Reward Function
Let’s create a simple reward function that evaluates the relevance of a response to a user’s query.
Create a file at metrics/relevance/main.py
:
Step 3: Create Sample Conversations
Let’s create some sample conversations to test our reward function.
Create a file named samples.jsonl
:
Step 4: Create a Test Script
Let’s create a script to test our reward function locally.
Create a file named test_reward.py
:
Step 5: Run Local Tests
Run your test script to see how your reward function performs:
You should see output similar to:
Step 6: Preview Using the CLI
Now let’s use the Reward Kit CLI to preview our evaluation with the sample data:
You should see preview results from the Fireworks API.
Step 7: Deploy Your Reward Function
Once you’re satisfied with your reward function, deploy it to make it available for training workflows:
You should see output confirming that your evaluator was successfully deployed.
Step 8: Create a Deployment Script
For more control over deployment, create a deployment script:
Run the deployment script:
Step 9: Use Your Reward Function in Training
Finally, you can use your deployed reward function in an RL training job:
Improving Your Reward Function
Now that you have a basic reward function, consider these improvements:
- Better Keyword Matching: Use techniques like TF-IDF or word embeddings
- Context Understanding: Consider the full conversation context
- Question Understanding: Detect question types and verify answer formats
- Domain-Specific Knowledge: Add domain knowledge for specialized topics
- Multi-Component Scoring: Add metrics for informativeness, accuracy, etc.
Next Steps
You’ve successfully created your first reward function! To continue your journey:
- Learn about Advanced Reward Functions
- Explore Core Data Types for more flexibility
- Try integrating Multiple Metrics into a single evaluator