A lightweight, open-source framework that turns historical GitHub pull requests into reproducible, verifiable software-engineering tasks for training and evaluating coding agents.
python docker benchmark reinforcement-learning evaluation developer-tools software-engineering agents ai-agents rl-environments github-prs llm coding-agents swe-bench
Updated Jun 8, 2026 - Python