# Reward Kit Examples
This directory contains examples demonstrating how to use the Reward Kit library for evaluating and deploying reward functions for LLM fine-tuning.
## Prerequisites
Before running the examples, make sure you have:
- A Fireworks AI account and API key
- The Reward Kit package installed
## Setup
### 1. Create a Virtual Environment
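For example, using Python's built-in `venv` module:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```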
### 2. Install Reward Kit
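With the virtual environment active, install the package. The snippet below assumes the package is published as `reward-kit`; if you are working from a clone of the repository, an editable install works as well:

```bash
pip install reward-kit
# Or, from a checkout of the repository:
pip install -e .
```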
### 3. Configure API Access
For development, use these environment variables:
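A sketch of a typical development setup; the variable names below are assumptions, so check the Reward Kit documentation for the exact names it reads:

```bash
# Assumed variable name -- verify against the Reward Kit docs
export FIREWORKS_API_KEY="your_dev_api_key"
# If you target a non-production Fireworks endpoint, also point the base URL at it,
# e.g. export FIREWORKS_API_BASE="<dev endpoint URL>"
```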
For production, use:
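Again assuming the conventional variable name; confirm against the documentation:

```bash
export FIREWORKS_API_KEY="your_production_api_key"
```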
## Example Walkthroughs
### Combined Accuracy and Length Evaluation
The `accuracy_length/cosine_scaled_example.py` script demonstrates the `cosine_scaled_accuracy_length_reward` function, which evaluates responses based on both accuracy and length efficiency.
This example:
- Demonstrates evaluation of different response types (short correct, long correct, short incorrect, long incorrect)
- Shows how the combined reward function prioritizes short correct answers
- Illustrates customizing the weights between accuracy and length components
See the Accuracy + Length Overview for more details.
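To run it, assuming your environment is configured as described above and you execute from this examples directory:

```bash
python accuracy_length/cosine_scaled_example.py
```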
### Basic Evaluation Example
The `evaluation_preview_example.py` script demonstrates how to preview and create an evaluation using the Reward Kit.
#### Step 1: Understand the Metric
Examine the example metric in the `metrics/word_count` directory. This metric evaluates responses based on their word count:
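As a rough sketch of the idea only (not the actual code in `metrics/word_count`, and using a plain dict rather than Reward Kit's own result types), such a metric might look like:

```python
from typing import Dict, List


def evaluate(messages: List[Dict[str, str]], **kwargs) -> Dict[str, object]:
    """Score the last assistant message by its word count (illustrative only)."""
    # Take the final assistant turn as the response under evaluation.
    response = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "assistant"), ""
    )
    word_count = len(response.split())
    # Map the count into [0, 1]; here 100 or more words earns full credit.
    score = min(word_count / 100.0, 1.0)
    return {"score": score, "reason": f"Response contains {word_count} words"}
```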
#### Step 2: Prepare Sample Data
Review the sample conversations in `samples/samples.jsonl`. Each line contains a JSON object representing a conversation:
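An illustrative line (the authoritative schema is whatever `samples/samples.jsonl` actually contains) might look like:

```json
{"messages": [{"role": "user", "content": "What is machine learning?"}, {"role": "assistant", "content": "Machine learning is a field of AI in which models learn patterns from data rather than being explicitly programmed."}]}
```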
#### Step 3: Run the Preview
Execute the evaluation preview example:
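Assuming you run from the repository root (the example loads the `examples/...` paths listed below; adjust the path if you run from inside the examples directory):

```bash
python examples/evaluation_preview_example.py
```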
This will:
- Load the word count metric from `examples/metrics/word_count`
- Load sample conversations from `examples/samples/samples.jsonl`
- Preview the evaluator using the Fireworks API
- Display the evaluation results for each sample
- Create an evaluator named "word-count-eval"
### Deployment Example
The `deploy_example.py` script demonstrates how to deploy a reward function to the Fireworks platform.
#### Step 1: Examine the Reward Function
Review the informativeness reward function in the deploy example, which evaluates responses based on:
- Length
- Specificity markers
- Content density
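A simplified, hypothetical sketch of this style of reward function (not the exact code in `deploy_example.py`; the marker list and weights are illustrative):

```python
from typing import Dict, List

# Markers of specific, concrete content -- illustrative list only.
SPECIFICITY_MARKERS = ["for example", "specifically", "in particular", "such as"]


def informativeness(messages: List[Dict[str, str]], **kwargs) -> Dict[str, object]:
    """Blend length, specificity markers, and content density into one score (sketch)."""
    response = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "assistant"), ""
    )
    words = response.split()
    length_score = min(len(words) / 150.0, 1.0)
    marker_score = min(
        sum(response.lower().count(m) for m in SPECIFICITY_MARKERS) / 3.0, 1.0
    )
    # Content density: share of "long" words as a cheap proxy for substantive content.
    density_score = (
        sum(1 for w in words if len(w) > 6) / len(words) if words else 0.0
    )
    score = 0.4 * length_score + 0.3 * marker_score + 0.3 * density_score
    return {"score": score, "reason": "Weighted length/specificity/density (illustrative)"}
```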
#### Step 2: Run the Deployment
Execute the deployment example:
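As with the preview example, this assumes you run from the repository root:

```bash
python examples/deploy_example.py
```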
This will:
- Test the reward function locally with sample data
- Deploy the function to the Fireworks platform
- Display the deployed evaluator ID
## Using the CLI
The Reward Kit also provides a command-line interface for common operations.
### Preview an Evaluator Using CLI
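Assuming the CLI entry point is installed as `reward-kit`, a typical call points the command at a metrics folder and a samples file. The flag names below are assumptions and may differ between versions, so check the built-in help first:

```bash
reward-kit preview --help  # list the supported options for your installed version

# Illustrative invocation (flag names assumed; verify with --help):
reward-kit preview \
  --metrics-folders "word_count=examples/metrics/word_count" \
  --samples examples/samples/samples.jsonl
```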
### Deploy an Evaluator Using CLI
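Deployment follows the same pattern; again, the flag names are assumptions to be verified against the CLI help:

```bash
reward-kit deploy --help  # confirm the supported options

# Illustrative invocation (flag names assumed; verify with --help):
reward-kit deploy \
  --id word-count-eval \
  --metrics-folders "word_count=examples/metrics/word_count"
```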
## Creating Your Own Evaluators
Follow these steps to create your own custom evaluator:
- Create a directory for your metric (e.g., `my_metrics/coherence`)
- Create a `main.py` file with an `evaluate` function
- Test your evaluator using the preview functionality
- Deploy your evaluator when ready
### Example Custom Metric
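A hypothetical `my_metrics/coherence/main.py`, kept deliberately simple; the scoring heuristic, word list, and return shape are illustrative rather than part of Reward Kit:

```python
from typing import Dict, List

# Words that often signal logical connections between sentences (illustrative list).
TRANSITION_WORDS = ["therefore", "however", "because", "consequently", "thus"]


def evaluate(messages: List[Dict[str, str]], **kwargs) -> Dict[str, object]:
    """Score coherence by counting transitions between sentences (illustrative only)."""
    response = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "assistant"), ""
    )
    sentences = [s for s in response.split(".") if s.strip()]
    if len(sentences) < 2:
        return {"score": 0.5, "reason": "Too short to judge coherence"}
    transitions = sum(
        1 for s in sentences if any(t in s.lower() for t in TRANSITION_WORDS)
    )
    score = min(transitions / (len(sentences) - 1), 1.0)
    return {
        "score": score,
        "reason": f"{transitions} transition(s) across {len(sentences)} sentences",
    }
```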
## Next Steps
After exploring these examples, you can:
- Create your own custom metrics
- Integrate reward functions into model training workflows
- Use deployed evaluators to score model outputs
- Combine multiple metrics for comprehensive evaluation