Command Line Interface Reference
The Reward Kit provides a command-line interface (CLI) for common operations like previewing evaluations, deploying reward functions, and running agent evaluations.
Installation
When you install the Reward Kit, the CLI is automatically installed:
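For example, if the package is distributed on PyPI under the name reward-kit, installation via pip would look like:

```bash
pip install reward-kit
```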
You can verify the installation by running:
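Assuming the CLI entry point is named reward-kit, printing the top-level help is a quick check:

```bash
reward-kit --help
```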
Authentication Setup
Before using the CLI, set up your authentication credentials:
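A minimal sketch using the environment variables described later on this page (the values are placeholders):

```bash
export FIREWORKS_API_KEY="your_api_key_here"
# Optional if already configured in auth.ini
export FIREWORKS_ACCOUNT_ID="your_account_id"
```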
Command Overview
The Reward Kit CLI supports the following main commands:
- preview: Preview an evaluation with sample data
- deploy: Deploy a reward function as an evaluator
- agent-eval: Run agent evaluations on task bundles
- list: List existing evaluators (coming soon)
- delete: Delete an evaluator (coming soon)
Preview Command
The preview command allows you to test an evaluation with sample data before deployment.
Syntax
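A sketch of the general form, assuming the reward-kit entry point and the options listed below:

```bash
reward-kit preview --metrics-folders "NAME=PATH" --samples SAMPLES.jsonl [--max-samples N] [--output FILE] [--verbose]
```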
Options
- --metrics-folders: Specify metrics to use in the format "name=path"
- --samples: Path to a JSONL file containing sample conversations
- --max-samples: Maximum number of samples to process (optional)
- --output: Path to save preview results (optional)
- --verbose: Enable verbose output (optional)
Examples
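An illustrative invocation; the metric name and paths are hypothetical:

```bash
# Preview a single metric against the first 10 samples and save the results
reward-kit preview \
  --metrics-folders "helpfulness=./metrics/helpfulness" \
  --samples ./samples/conversations.jsonl \
  --max-samples 10 \
  --output ./preview_results.json
```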
Sample File Format
The samples file should be a JSONL (JSON Lines) file with each line containing a conversation in the following format:
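As a sketch, assuming the common messages-list conversation shape (the field names below are illustrative, not confirmed by this page), one line might look like:

```json
{"messages": [{"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Open Settings > Account > Reset Password and follow the prompts."}]}
```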
Deploy Command
The deploy command deploys a reward function as an evaluator on the Fireworks platform.
Syntax
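A sketch of the general form, assuming the reward-kit entry point:

```bash
reward-kit deploy --id EVALUATOR_ID --metrics-folders "NAME=PATH" [--display-name NAME] [--description TEXT] [--force] [--providers ...] [--verbose]
```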
Options
- --id: ID for the deployed evaluator (required)
- --metrics-folders: Specify metrics to use in the format "name=path" (required)
- --display-name: Human-readable name for the evaluator (optional)
- --description: Description of the evaluator (optional)
- --force: Overwrite if an evaluator with the same ID already exists (optional)
- --providers: List of model providers to use (optional)
- --verbose: Enable verbose output (optional)
Examples
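An illustrative invocation; the evaluator ID and paths are hypothetical:

```bash
# Deploy a metric as an evaluator, overwriting any existing evaluator with the same ID
reward-kit deploy \
  --id helpfulness-v1 \
  --metrics-folders "helpfulness=./metrics/helpfulness" \
  --display-name "Helpfulness Evaluator" \
  --description "Scores assistant responses for helpfulness" \
  --force
```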
Common Workflows
Iterative Development Workflow
A typical development workflow might look like:
- Create a reward function
- Preview it with sample data
- Refine the function based on preview results
- Deploy when satisfied
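As a command-level sketch of the steps above (metric names and paths are hypothetical):

```bash
# 1. Preview the metric against sample conversations
reward-kit preview --metrics-folders "clarity=./metrics/clarity" --samples ./samples/conversations.jsonl

# 2. Edit ./metrics/clarity/main.py based on the preview results, then preview again

# 3. Deploy once the scores look right
reward-kit deploy --id clarity-v1 --metrics-folders "clarity=./metrics/clarity"
```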
Comparing Multiple Metrics
You can preview multiple metrics to compare their performance:
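For example, supplying more than one metric in a single preview run (the metric names are hypothetical, and whether the flag is repeated or given multiple values depends on the CLI's argument parsing):

```bash
reward-kit preview \
  --metrics-folders "clarity=./metrics/clarity" "helpfulness=./metrics/helpfulness" \
  --samples ./samples/conversations.jsonl
```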
Deployment with Custom Providers
You can deploy with specific model providers:
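A hedged sketch; the provider names and the exact value format accepted by --providers are assumptions:

```bash
reward-kit deploy \
  --id helpfulness-v1 \
  --metrics-folders "helpfulness=./metrics/helpfulness" \
  --providers fireworks openai
```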
Agent-Eval Command
The agent-eval command enables you to run agent evaluations using task bundles.
Syntax
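A sketch of the general form, assuming the reward-kit entry point:

```bash
reward-kit agent-eval --task-dir PATH/TO/TASK_BUNDLE [options]
```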
Options
Task Specification:
- --task-dir: Path to task bundle directory containing reward.py, tools.py, etc.
- --dataset or -d: Path to JSONL file containing task specifications.
Output and Models:
- --output-dir or -o: Directory to store evaluation runs (default: "./runs").
- --model: Override MODEL_AGENT environment variable.
- --sim-model: Override MODEL_SIM environment variable for simulated user.
Testing and Debugging:
- --no-sim-user: Disable simulated user (use static initial messages only).
- --test-mode: Run in test mode without requiring API keys.
- --mock-response: Use a mock agent response (works with --test-mode).
- --debug: Enable detailed debug logging.
- --validate-only: Validate task bundle structure without running evaluation.
- --export-tools: Export tool specifications to directory for manual testing.
Advanced Options:
- --task-ids: Comma-separated list of task IDs to run.
- --max-tasks: Maximum number of tasks to evaluate.
- --registries: Custom tool registries in format "name=path".
- --registry-override: Override all toolset paths with this registry path.
- --evaluator: Custom evaluator module path (overrides default).
Examples
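Illustrative invocations; the task bundle path is hypothetical, and the model ID reuses the example from the Environment Variables section below:

```bash
# Validate the bundle structure without running an evaluation
reward-kit agent-eval --task-dir ./tasks/flight_booking --validate-only

# Run up to 5 tasks with an explicit agent model, storing results under ./runs
reward-kit agent-eval \
  --task-dir ./tasks/flight_booking \
  --model openai/gpt-4o-mini \
  --max-tasks 5 \
  --output-dir ./runs
```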
Task Bundle Structure
A task bundle is a directory containing the following files:
- reward.py: Reward function with @reward_function decorator
- tools.py: Tool registry with tool definitions
- task.jsonl: Dataset rows with task specifications
- seed.sql (optional): Initial database state
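For example, a bundle might be laid out like this (the directory name is hypothetical):

```text
my_task_bundle/
├── reward.py    # reward function with @reward_function decorator
├── tools.py     # tool registry with tool definitions
├── task.jsonl   # dataset rows with task specifications
└── seed.sql     # optional initial database state
```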
See the Agent Evaluation guide for more details.
Environment Variables
The CLI recognizes the following environment variables:
- FIREWORKS_API_KEY: Your Fireworks API key (required for deployment operations)
- FIREWORKS_API_BASE: Base URL for the Fireworks API (defaults to https://api.fireworks.ai)
- FIREWORKS_ACCOUNT_ID: Your Fireworks account ID (optional, can be configured in auth.ini)
- MODEL_AGENT: Default agent model to use (e.g., "openai/gpt-4o-mini")
- MODEL_SIM: Default simulation model to use (e.g., "openai/gpt-3.5-turbo")
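For example, exporting the model defaults in a shell session, using the example model IDs above:

```bash
export MODEL_AGENT="openai/gpt-4o-mini"
export MODEL_SIM="openai/gpt-3.5-turbo"
```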
Troubleshooting
Common Issues
- Authentication Errors:
  Solution: Ensure FIREWORKS_API_KEY is correctly set.
- Metrics Folder Not Found:
  Solution: Check that the path exists and contains a valid main.py file.
- Invalid Sample File:
  Solution: Verify the sample file is in the correct JSONL format.
- Deployment Permission Issues:
  Solution: Use a production API key with deployment permissions or request additional permissions.
- Task Bundle Validation Errors:
  Solution: Ensure your task bundle has all required files.
- Model API Key Not Set:
  Solution: Set the MODEL_AGENT environment variable or use the --model parameter.
- Import Errors with Task Bundle:
  Solution: Check that the Python path is correct and the module can be imported.
Getting Help
For additional help, use the --help flag with any command:
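For example, assuming the reward-kit entry point:

```bash
reward-kit preview --help
reward-kit agent-eval --help
```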
Next Steps
- Explore the Developer Guide for conceptual understanding
- Try the Creating Your First Reward Function tutorial
- Learn about Agent Evaluation to create your own task bundles
- See Examples for practical implementations