| title | RL Code Review Agent |
|---|---|
| emoji | 🤖 |
| colorFrom | blue |
| colorTo | green |
| sdk | docker |
| pinned | false |
| app_port | 8000 |
This environment is a specialized Reinforcement Learning (RL) curriculum designed to evaluate and train agents in automated code review. The curriculum focuses on three distinct pillars of software engineering: PEP8 compliance, security best practices, and algorithmic complexity.
- Action Space: The agent provides a `string` containing the full, corrected Python source code.
- Observation Space:
  - `messy_code`: The current state of the source code requiring review.
  - `feedback`: Detailed strings containing unit test results and verbose Ruff linting errors (e.g., `F401`, `E302`).
  - `reward`: A scalar value from 0.0 to 1.0 based on functional correctness and code quality.
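The interface above can be sketched as plain Python types. This is a minimal illustration, not the environment's actual classes: `CodeReviewObservation` and `is_valid_action` are hypothetical names, and the syntax check with `ast.parse` is an assumed client-side convenience rather than documented behavior.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeReviewObservation:
    """Hypothetical container mirroring the three observation fields."""

    messy_code: str  # source code the agent must review
    feedback: str    # unit test results and Ruff messages (e.g. "F401")
    reward: float    # scalar score for the last action

    def __post_init__(self) -> None:
        # Enforce the documented 0.0 to 1.0 reward range.
        if not 0.0 <= self.reward <= 1.0:
            raise ValueError(f"reward must be in [0.0, 1.0], got {self.reward}")


def is_valid_action(action: str) -> bool:
    """Return True if the action string parses as a complete Python module.

    Hypothetical helper: since the action is the full corrected source,
    a client may want to reject syntactically broken submissions early.
    """
    try:
        ast.parse(action)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True
```

A dataclass keeps the sketch dependency-free; a real client would use whatever types the environment's server actually defines.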
The environment implements a 3-stage progression. Success in a lower stage (Reward
- EASY (Formatting & Cleanup): Focuses on PEP8 compliance, removing unused imports, and fixing basic syntax.
- MEDIUM (Security): Focuses on secure coding practices, specifically identifying and fixing vulnerabilities in `subprocess` calls (e.g., `shell=True`).
- HARD (Refactoring): Focuses on code maintainability and complexity. Agents must flatten deeply nested logic using guard clauses.
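To make the MEDIUM and HARD stages concrete, here is an illustrative before/after pair for each. The function names and the specific conditions in `check` are hypothetical, not the environment's actual task code. The security fix shown (splitting the command with `shlex.split` and passing an argument list so no shell is spawned) is one common remedy for `shell=True`, not necessarily the one the tests expect.

```python
import shlex
import subprocess


# MEDIUM (before): with shell=True, shell metacharacters in `cmd`
# (e.g. "ls; whoami") are interpreted by the shell.
def execute_command_unsafe(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


# MEDIUM (after): tokenize the command and pass an argument list, so no
# shell is spawned and metacharacters reach the program as literal text.
def execute_command(cmd: str) -> str:
    args = shlex.split(cmd)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# HARD (before): deeply nested conditionals (hypothetical rules).
def check_nested(x: int) -> bool:
    if x > 0:
        if x % 2 == 0:
            if x < 100:
                return True
    return False


# HARD (after): the same logic flattened with guard clauses;
# each failing condition returns early.
def check(x: int) -> bool:
    if x <= 0:
        return False
    if x % 2 != 0:
        return False
    if x >= 100:
        return False
    return True
```

The guard-clause version returns the same result as the nested one for every input, but each rule now sits at a single indentation level.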
The reward is a weighted average of two scores:
- Functional Score: Percentage of unit tests passed.
- Quality Score: Calculated via Ruff static analysis, with penalties applied for each linting violation.
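The weights and the per-violation penalty are not specified above, so the following is only a sketch under assumed values: `w_func=0.7`, `w_quality=0.3`, and a penalty of 0.1 per Ruff violation are placeholders, and `compute_reward` is a hypothetical name. It shows how a weighted average of the two scores stays within the documented 0.0 to 1.0 range.

```python
def compute_reward(
    tests_passed: int,
    tests_total: int,
    num_violations: int,
    w_func: float = 0.7,     # assumed weight for the functional score
    w_quality: float = 0.3,  # assumed weight for the quality score
    penalty: float = 0.1,    # assumed deduction per Ruff violation
) -> float:
    # Functional score: fraction of unit tests passed (0.0 if there are no tests).
    functional = tests_passed / tests_total if tests_total > 0 else 0.0
    # Quality score: start at 1.0, subtract a fixed penalty per violation, floor at 0.0.
    quality = max(0.0, 1.0 - penalty * num_violations)
    # Weighted average; dividing by the weight sum keeps the result in [0.0, 1.0].
    return (w_func * functional + w_quality * quality) / (w_func + w_quality)
```

With these placeholder values, passing all tests with two lint violations gives 0.7 × 1.0 + 0.3 × 0.8 = 0.94.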
The following log demonstrates the baseline agent (Qwen2.5-Coder-32B-Instruct) successfully navigating the curriculum:
```text
[START] task=curriculum_review env=CodeReviewEnvironment-v1 model=Qwen/Qwen2.5-Coder-32B-Instruct
[STEP] step=1 action=def add(a, b):\n return a + b\n\nimport os\n# R... reward=0.80 done=false error=null
[STEP] step=2 action=def add(a, b):\n return a + b\n# Removed unused... reward=0.00 done=false error=null
[STEP] step=3 action=import subprocess\n\n\ndef execute_command(cmd):\n... reward=0.00 done=false error=null
[STEP] step=4 action=def check(x):\n if x <= 0:\n return Fals... reward=0.95 done=true error=null
[END] success=true steps=4 score=0.950 rewards=0.80,0.00,0.00,0.95
```
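The log lines follow a simple `key=value` layout. Below is a minimal parser sketch, assuming the `[STEP]` action field is the only value that may contain spaces (as in the truncated actions above) and that `reward`, `done`, and `error` always follow it in that order. `parse_step` and `parse_end` are hypothetical helpers, not part of the environment.

```python
import re

# [STEP] lines: the action text may contain spaces, so `.*` consumes it and
# the regex backtracks to the trailing reward/done/error fields.
STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>.*) "
    r"reward=(?P<reward>\d+(?:\.\d+)?) done=(?P<done>true|false) error=(?P<error>\S+)$"
)


def parse_step(line: str) -> dict:
    m = STEP_RE.match(line)
    if m is None:
        raise ValueError(f"not a [STEP] line: {line!r}")
    return {
        "step": int(m["step"]),
        "action": m["action"],
        "reward": float(m["reward"]),
        "done": m["done"] == "true",
        "error": None if m["error"] == "null" else m["error"],
    }


def parse_end(line: str) -> dict:
    if not line.startswith("[END] "):
        raise ValueError(f"not an [END] line: {line!r}")
    # Every [END] field is a space-free key=value token.
    fields = dict(tok.split("=", 1) for tok in line[len("[END] "):].split())
    return {
        "success": fields["success"] == "true",
        "steps": int(fields["steps"]),
        "score": float(fields["score"]),
        "rewards": [float(r) for r in fields["rewards"].split(",")],
    }
```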
Prerequisites:

- Docker
- Python 3.11+
- uv (fast Python package manager)
```bash
# Build the container
docker build -t code-review-env .

# Run the environment
docker run -p 8000:8000 --env-file .env code-review-env
```