Agent Evaluation Framework
The Agent Evaluation Framework lets you evaluate agent models that use tool-augmented reasoning. Evaluations are packaged as “Task Bundles”: self-contained directories that include all the components needed for testing and evaluation.
Task Bundle Structure
A task bundle is a self-contained directory with all the components needed to evaluate an agent:
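As a rough illustration of what "self-contained" can mean in practice, the sketch below checks that a directory holds every file a run expects before the evaluation starts. It is not part of Reward Kit: the helper `missing_components` and the file names in `EXAMPLE_REQUIRED` are hypothetical placeholders, so substitute the components your own bundle actually uses.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical component names, used only for illustration.
# They are assumptions, not the framework's documented bundle layout.
EXAMPLE_REQUIRED = ("task.jsonl", "tools.py", "reward.py")


def missing_components(bundle_dir: Path, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that do not exist inside `bundle_dir`.

    An empty list means every expected component is present.
    The order of the result follows the order of `required`.
    """
    return [name for name in required if not (bundle_dir / name).exists()]
```

Calling `missing_components(Path("./my_task"), EXAMPLE_REQUIRED)` before a run gives an early, readable error instead of a failure partway through an evaluation. Passing the required names as an argument keeps the check independent of any particular bundle layout.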
CLI Usage
The agent evaluation framework is integrated with the Reward Kit CLI through the agent-eval command.
Basic Usage
Environment Variables
Models can be specified using environment variables:
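The general pattern of reading a model identifier from the environment can be sketched as follows. This is a minimal example, not the CLI's own lookup logic: the function `resolve_model` and the variable name `EXAMPLE_AGENT_MODEL` are hypothetical, and the real variable names are whatever the CLI defines.

```python
import os
from typing import Mapping


def resolve_model(var_name: str, env: Mapping[str, str] = os.environ) -> str:
    """Read a model identifier from the environment variable `var_name`.

    Surrounding whitespace is stripped. A missing or blank variable
    raises ValueError so a misconfigured run fails before any work starts.
    """
    value = env.get(var_name, "").strip()
    if not value:
        raise ValueError(f"environment variable {var_name} is not set")
    return value


# Hypothetical usage; EXAMPLE_AGENT_MODEL is a placeholder name:
# model = resolve_model("EXAMPLE_AGENT_MODEL")
```

Accepting the environment as a mapping argument makes the function easy to exercise in tests with a plain dictionary instead of mutating the real process environment.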
Advanced Options
Testing & Debugging
The CLI provides several options for testing and debugging: