Code Execution with E2B
This guide demonstrates how to use the E2B code execution reward function to evaluate code by running it in the E2B cloud sandbox.
Overview
The e2b_code_execution_reward function allows you to:
- Extract code blocks from LLM responses
- Execute the code securely in E2B’s cloud sandbox
- Compare the output with expected results
- Generate a score and detailed metrics
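The extraction and comparison steps above can be pictured with a minimal sketch. The helper names extract_code_block and output_matches are hypothetical and are not the library's internals; the sketch assumes that code arrives in a fenced block tagged with the language name and that outputs are compared after trimming whitespace.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this example nests cleanly in docs.
FENCE = "`" * 3


def extract_code_block(text: str, language: str = "python") -> Optional[str]:
    """Return the body of the first fenced block tagged with `language`, or None."""
    pattern = re.compile(
        re.escape(FENCE) + re.escape(language) + r"[ \t]*\n(.*?)" + re.escape(FENCE),
        re.DOTALL,
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def output_matches(actual: str, expected: str) -> bool:
    """Compare two outputs, ignoring leading and trailing whitespace."""
    return actual.strip() == expected.strip()


response = f"Here is the code:\n{FENCE}python\nprint(2 + 3)\n{FENCE}\nIt prints 5."
print(extract_code_block(response))   # print(2 + 3)
print(output_matches("5\n", "5"))     # True
```

The real reward function may use different extraction and matching rules; this only illustrates the shape of the pipeline.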
Prerequisites
To use the E2B code execution reward function, you need:
- An E2B API key from the E2B Dashboard
- The e2b_code_interpreter Python package, installed with: pip install e2b_code_interpreter
Note: The code will also work with the e2b package, but e2b_code_interpreter is recommended because it provides a more stable interface designed specifically for code execution.
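Before calling the reward function, it can help to check both prerequisites up front. The following is a minimal sketch using only the standard library; is_package_installed and resolve_api_key are hypothetical helper names, and the only facts taken from this guide are the package name e2b_code_interpreter and the E2B_API_KEY environment variable.

```python
import importlib.util
import os
from typing import Mapping, Optional


def is_package_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer an explicitly passed key, then fall back to E2B_API_KEY."""
    env = os.environ if env is None else env
    return explicit_key or env.get("E2B_API_KEY") or None


if not is_package_installed("e2b_code_interpreter"):
    print("e2b_code_interpreter is not installed")
if resolve_api_key() is None:
    print("No E2B API key found; set E2B_API_KEY or pass api_key")
```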
Basic Usage
Here’s a simple example of how to use the reward function:
    This function uses recursion to calculate the factorial. For n = 5, it computes 5 * 4 * 3 * 2 * 1 = 120."""
        }
    ]

    # Define expected output
    expected_output = "120"

    # Evaluate the code using E2B
    result = e2b_code_execution_reward(
        messages=messages,
        expected_output=expected_output,
        language="python",
        api_key="your_e2b_api_key",
        timeout=10,
    )

    # Use the results
    print(f"Score: {result.score}")
    for metric_name, metric in result.metrics.items():
        print(f"\n{metric_name}: {metric}")
Fallback to Local Execution
You can gracefully fall back to local execution when an E2B API key is not available:
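One way to structure such a fallback is sketched below. It is not the library's own implementation: evaluate_with_fallback is a hypothetical helper, remote_reward stands in for e2b_code_execution_reward, and local_reward stands in for whatever local executor you have available (assumed here to accept the same arguments minus api_key). The stub functions exist only so the sketch runs without network access.

```python
import os
from typing import Any, Callable, Dict, List, Mapping, Optional

Messages = List[Dict[str, str]]


def evaluate_with_fallback(
    messages: Messages,
    expected_output: str,
    remote_reward: Callable[..., Any],
    local_reward: Callable[..., Any],
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Use the E2B-backed reward when a key is present, otherwise run locally."""
    env = os.environ if env is None else env
    api_key = env.get("E2B_API_KEY")
    if api_key:
        return remote_reward(
            messages=messages,
            expected_output=expected_output,
            language="python",
            api_key=api_key,
        )
    # No key available: fall back to the local executor (assumed signature).
    return local_reward(
        messages=messages,
        expected_output=expected_output,
        language="python",
    )


# Stand-in reward functions, used only to demonstrate the control flow.
def fake_remote(**kwargs: Any) -> str:
    return "remote"


def fake_local(**kwargs: Any) -> str:
    return "local"


print(evaluate_with_fallback([], "120", fake_remote, fake_local, env={}))  # local
```

Passing the reward functions in as arguments keeps the choice of executor explicit and makes the branch easy to test without an API key.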
Parameters
The e2b_code_execution_reward function accepts the following parameters:
Parameter | Type | Description |
---|---|---|
messages | List[Dict[str, str]] | Generated conversation messages (required) |
original_messages | List[Dict[str, str]] | Original conversation context (optional) |
expected_output | str | Expected output from code execution (optional) |
language | str | Programming language of the code (default: "python") |
timeout | int | Maximum execution time in seconds (default: 30) |
api_key | str | E2B API key (default: None, uses E2B_API_KEY environment variable) |
Return Value
The reward function returns a RewardOutput object with:
- score: A float between 0.0 and 1.0 indicating how well the code performed
- metrics: A dictionary of MetricRewardOutput objects with detailed information about the execution
Key metrics include:
- extracted_code: The code that was extracted and executed
- expected_output: The expected output (if provided or extracted)
- execution_result: Details about the execution (success or failure)
- output_match: Comparison between actual and expected outputs
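A result of this shape can be summarized with a few lines of code. The sketch below relies only on the score and metrics attributes described above and treats each metric object as opaque; summarize_result and the pass_threshold parameter are hypothetical, and the SimpleNamespace object is a stand-in for a real RewardOutput.

```python
from types import SimpleNamespace
from typing import Any, List


def summarize_result(result: Any, pass_threshold: float = 1.0) -> List[str]:
    """Build one line for the overall score, then one line per metric."""
    passed = result.score >= pass_threshold
    lines = [f"score={result.score:.2f} passed={passed}"]
    for name in sorted(result.metrics):
        # Metric objects are printed as-is; their fields are not assumed here.
        lines.append(f"{name}: {result.metrics[name]}")
    return lines


# Stand-in for a RewardOutput, for illustration only.
fake_result = SimpleNamespace(
    score=1.0,
    metrics={"output_match": "match", "execution_result": "ok"},
)
for line in summarize_result(fake_result):
    print(line)
```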
Examples
See the examples/ directory for complete examples:
- e2b_reward_example.py: Basic Python example
- e2b_javascript_example.py: JavaScript example
- e2b_auto_extract_example.py: Automatic output extraction example
- e2b_fallback_example.py: Fallback to local execution example