- `reward-kit run`: Running evaluations locally using Hydra-based configurations to generate responses and score them.
- `reward-kit preview`: Inspecting or re-evaluating generated outputs.
Create dataset configuration files (in `conf/dataset/` or an example's `conf/dataset/` directory) to define how raw data is sourced, processed, and formatted (e.g., adding system prompts). Refer to the Dataset Configuration Guide for detailed instructions.
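For orientation only, a dataset configuration might look like the sketch below. Every field name shown is an illustrative assumption rather than the actual schema, which the Dataset Configuration Guide defines.

```yaml
# conf/dataset/my_dataset.yaml -- illustrative sketch only; field names are
# assumptions, not the official schema (see the Dataset Configuration Guide).
source_type: huggingface          # where the raw data comes from (assumed field)
path_or_name: openai/gsm8k        # dataset identifier (assumed field)
split: test                       # which split to evaluate (assumed field)
system_prompt: "Solve the problem and state only the final answer."  # assumed field
```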
Define your reward function using the `@reward_function` decorator, or by structuring your evaluation logic within a script that can be called by an evaluation configuration.
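As a minimal sketch, a decorated reward function might look like the following. It assumes a top-level `reward_function` decorator and an `EvaluateResult` model in `reward_kit.models`; verify the exact import paths and fields for your installed version.

```python
# Minimal sketch of a decorated reward function. Import paths and the
# EvaluateResult fields are assumptions; check your reward-kit version.
from reward_kit import reward_function
from reward_kit.models import EvaluateResult


@reward_function
def word_count_reward(messages, **kwargs) -> EvaluateResult:
    """Score the assistant's final message with a crude length heuristic."""
    last = messages[-1]
    # Messages may be plain dicts or typed message objects depending on the caller.
    content = last["content"] if isinstance(last, dict) else last.content
    score = min(len(content.split()) / 100.0, 1.0)
    return EvaluateResult(score=score, reason=f"{len(content.split())} words", metrics={})
```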
### Running Evaluations with `reward-kit run`

Run evaluations locally with the `reward-kit run` CLI command, which uses Hydra for configuration. This command handles generating model responses (if needed) and evaluating them according to your specified dataset and reward logic.
Create a run configuration file (e.g., `run_my_eval.yaml`) that specifies, among other things, which dataset configuration to use (from `conf/dataset/`). For a complete example, see `examples/math_example/conf/run_math_eval.yaml`.
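Invocation typically looks something like the sketch below. The Hydra-style flags shown (`--config-dir`, `--config-name`) are assumptions based on standard Hydra usage, so confirm them with `reward-kit run --help`.

```bash
# Run the math example's evaluation config. Flag names follow Hydra
# conventions and may differ between versions; see `reward-kit run --help`.
reward-kit run --config-dir examples/math_example/conf --config-name run_math_eval
```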
Results (e.g., `run_my_eval_results.jsonl`) and prompt/response pairs (e.g., `preview_input_output_pairs.jsonl`) are written to a timestamped output directory (usually under `outputs/`).
After `reward-kit run`, you can use `reward-kit preview` to inspect the generated `preview_input_output_pairs.jsonl` or re-evaluate them with different or updated metrics.
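A follow-up preview might be invoked roughly as below. Apart from `--metrics-folders`, which the deployment section also mentions, the flag names and value formats here are assumptions; verify them with `reward-kit preview --help`.

```bash
# Re-score previously generated prompt/response pairs with a (possibly updated)
# metric. Flag names other than --metrics-folders are assumptions, and the
# value format for --metrics-folders (bare path vs. name=path) may vary.
reward-kit preview \
  --samples outputs/<timestamped_run>/preview_input_output_pairs.jsonl \
  --metrics-folders "word_count=./my_metric"
```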
You can also load the `*.jsonl` result files programmatically (e.g., with Pandas) for custom analysis, plotting, or reporting.
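Because the result files are plain JSONL, loading them is straightforward; the columns you see depend on your run, so none are assumed below.

```python
# Load a results file produced by `reward-kit run` for ad-hoc analysis.
import pandas as pd

df = pd.read_json("outputs/<timestamped_run>/run_my_eval_results.jsonl", lines=True)
print(df.head())                    # inspect whatever columns your run produced
print(df.describe(include="all"))   # quick summary statistics
```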
### Deploying Your Reward Function

You can deploy a reward function using the `deploy()` method on a reward function object or the `reward-kit deploy` CLI command.
#### `deploy()` Method (Programmatic)

If your reward function is decorated with `@reward_function`, you can deploy it directly:
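A programmatic deployment might look roughly like this sketch. The module name and the `deploy()` keyword argument shown are assumptions; check the method's signature in your version.

```python
# Deploy a decorated reward function programmatically. The keyword argument
# and return value of deploy() are assumptions; check its signature.
from my_rewards import word_count_reward  # hypothetical module with the decorated function

evaluation_id = word_count_reward.deploy(name="word-count-eval")
print(f"Deployed evaluator: {evaluation_id}")
```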
#### `reward-kit deploy` (CLI)

The `reward-kit deploy` command is suitable for deploying reward functions defined in script files. The `--metrics-folders` argument should point to the directory containing your reward function script (e.g., a `main.py` with the `@reward_function` decorator).
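In practice the CLI invocation looks something like the sketch below. `--metrics-folders` comes from the text above, while the `--id` flag and the `name=path` value format are assumptions; check `reward-kit deploy --help`.

```bash
# Deploy the reward function defined in ./my_metric/main.py.
# --metrics-folders is described above; --id and the name=path value
# format are assumptions, so verify with `reward-kit deploy --help`.
reward-kit deploy --id my-word-count-eval --metrics-folders "word_count=./my_metric"
```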
For more details on `reward-kit deploy`, see the CLI Reference.
#### `create_evaluation` Function

Instead of using the `@reward_function` decorator's `deploy` method, you can use the `create_evaluation` function from `reward_kit.evaluation`. This is generally for more advanced use cases or internal tooling.
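Since the text only names the function, the call below is a heavily hedged sketch: the keyword arguments are assumptions about the signature, not documented parameters.

```python
# Advanced/internal path: create an evaluation without the decorator's
# deploy() helper. All keyword arguments here are assumptions.
from reward_kit.evaluation import create_evaluation

evaluation = create_evaluation(
    evaluator_id="my-word-count-eval",           # assumed parameter name
    metric_folders=["word_count=./my_metric"],   # assumed parameter name and format
)
print(evaluation)
```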
The examples in this repository use `reward-kit run` and `reward-kit preview` extensively.