SWE-RL

🧐 About | 🚀 Quick Start | 🐣 Agentless Mini | 📝 Citation | 🙏 Acknowledgements

🧐 About

Official codebase for our paper: SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (arXiv:2502.18449).

SWE-RL is the first approach to scale reinforcement-learning-based LLM reasoning for real-world software engineering, leveraging open-source software evolution data and rule-based rewards.

Overview of SWE-RL

Note

We have undertaken significant code refactoring to enhance quality and accessibility. However, this may introduce inconsistencies with our internal implementation. If you encounter a bug, please file an issue. We are also gradually updating the repo with additional information.

🚀 Quick start

git clone https://github.com/facebookresearch/swe-rl && cd swe-rl
pip install -e ".[dev]"
pytest

The code currently provides our prompt templates and the implementation of the reward function based on sequence similarity. You can find them in src/swerl/core/prompts.py and src/swerl/core/reward.py respectively.
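
A quick way to see what the core package exposes is to import the two modules and list their public names (a minimal sketch; the exact template names may vary across versions):

# Inspect the prompt templates and reward helpers in swerl.core.
from swerl.core import prompts, reward

print([name for name in dir(prompts) if not name.startswith("_")])
print([name for name in dir(reward) if name.startswith("calculate_")])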

We provide three reward function APIs:

  1. calculate_search_replace_reward: calculates the similarity between search/replace changes and oracle changes (this is what we used in the paper);
  2. calculate_reward_unidiff: calculates the similarity between two sets of unified diffs;
  3. calculate_reward: a more general API that can be paired with any editing format.

Reward for search/replace changes

A toy example of how you can use the reward function in your own project:

import swerl

file = """
def sort_list(lst):
    return sorted(lst)
""".strip()

oracle_file = """
def sort_list(lst: list[int]) -> list[int]:
    return sorted(lst)
""".strip()

context = {"example.py": file}
oracle = {"example.py": oracle_file}

output = """
<think>
...thoughts by LLM
</think>
<solution>
```python
### example.py
<<<<<<< SEARCH
def sort_list(lst):
=======
def sort_list(lst: list[int]) -> list[int]:
>>>>>>> REPLACE
```
</solution>
""".strip()

reward, metadata = swerl.core.reward.calculate_search_replace_reward(context, oracle, output)
assert reward == 1.0
print(metadata)

Reward for unified diff

Check swerl.core.reward.calculate_reward_unidiff. Here is the signature:

def calculate_reward_unidiff(
    oracle_patches: list[str], pred_patches: list[str]
) -> tuple[float, dict]:
    """
    Compute the SWE-RL reward given two sets of unified diffs.

    The return value is always within the range of [0, 1].

    Args:
        oracle_patches: A list of oracle diffs.
        pred_patches: A list of predicted diffs.

    Returns:
        A float value representing the reward, and a dictionary containing some metadata.
    """

General version

Check swerl.core.reward.calculate_reward. Here is the signature:

def calculate_reward(
    code_context: dict[str, str],
    oracle_new_content: dict[str, str],
    pred_new_content: dict[str, str],
) -> tuple[float, dict]:
    """
    Compute the SWE-RL reward given the code context, oracle patch, and the model output.
    Note that this function is a general version of the reward calculation, which can be used
    for code changes in any form, not just search/replace edits. For search/replace edits, use
    `calculate_search_replace_reward`.

    The return value is always within the range of [0, 1].

    Args:
        code_context: path -> original content of the file. It doesn't need to
            contain the entire codebase, only the files that are affected by the oracle patch.
        oracle_new_content: path -> oracle new content of the file after change.
        pred_new_content: path -> predicted new content of the file after change.

    Returns:
        A float value representing the reward, and a dictionary containing some metadata.
    """

🐣 Agentless Mini

Agentless Mini

Agentless Mini builds on top of Agentless with the following key improvements and functionality changes:

  1. Fast async inference with openai-python.
  2. Code refactoring for better scalability, parallelization, and accessibility.
  3. Only file-level localization is performed, and the entire file content is used for repair.
  4. Support for using multiple reproduction tests for reranking.

Environment setup

To get started, run the following command to install the dependencies:

git clone https://github.com/facebookresearch/swe-rl && cd swe-rl
pip install -e ".[agentless]"

Agentless Mini works with any OpenAI-compatible endpoint. If you want to host your own Hugging Face models, popular choices are vLLM and SGLang. Taking vLLM as an example:

# Host Llama-3.3-70B-Instruct with vLLM
pip install vllm
vllm serve meta-llama/Llama-3.3-70B-Instruct --tensor-parallel-size 4 --port 8000
# The endpoint URL will be http://localhost:8000/v1
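
Before running the pipeline, you can sanity-check the endpoint with a minimal openai-python call (a sketch assuming the vLLM setup above; adjust the base URL and model name to your deployment):

from openai import OpenAI

# Point the client at the locally hosted endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=16,
)
print(response.choices[0].message.content)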

Finally, you need to set a few environment variables required by Agentless Mini:

# Assume you're doing the above vLLM setup
# Otherwise, just adjust them accordingly
export OPENAI_API_KEY="Empty"
export OPENAI_BASE_URL="http://localhost:8000/v1"

# Whether "thinking" is in model output (yes/no). If so, we need to extract the answer block during parsing
# and ignore the thinking. We assume the answer is enclosed with "<solution>" and "</solution>".
# Check src/swerl/agentless_mini/utils/envs.py to learn how to adjust them.
export THINKING=no
# A temporary directory used to process patches
export PLAYGROUND_DIR="tmp_agentless"
# Please download it from https://github.com/OpenAutoCoder/Agentless/releases/download/v1.5.0/swebench_repo_structure.txt
export PROJECT_FILE_LOC="/path/to/swebench/repo_structures"
# The tokenizer model. Can be either huggingface or tiktoken model name
export TOKENIZER_MODEL="meta-llama/Llama-3.3-70B-Instruct"
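
For intuition only, here is a hedged sketch of how a TOKENIZER_MODEL value like this is typically resolved (try tiktoken first, then fall back to a Hugging Face tokenizer); it is not the actual logic in src/swerl/agentless_mini/utils/envs.py:

import os

def count_tokens(text: str) -> int:
    # Illustrative only: treat TOKENIZER_MODEL as a tiktoken model name if possible,
    # otherwise load it as a Hugging Face tokenizer.
    name = os.environ["TOKENIZER_MODEL"]
    try:
        import tiktoken
        return len(tiktoken.encoding_for_model(name).encode(text))
    except Exception:
        from transformers import AutoTokenizer
        return len(AutoTokenizer.from_pretrained(name).encode(text))

print(count_tokens("def sort_list(lst):\n    return sorted(lst)"))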

Repair with oracle files

Once the environment variables are properly configured, you can run Agentless Mini. Below is the simplest setup, where oracle files are provided for repair. This can be a good proxy for the end-to-end result:

# Make sure you are in the root directory of swe-rl
#
# Agentless Mini supports sharding. If you are using a compute cluster, then you can run
# different shards with different compute nodes to parallelize the evaluation.
# Below, we set num_shards to 125, so each shard will contain 500 / 125 = 4 instances, where
# 500 is the number of problems in SWE-bench Verified.
python -m swerl.agentless_mini.repair \
    --loc_file resources/sweb_verified_gt_loc.jsonl \
    --output_folder demo_gt_repair \
    --shard 0 \
    --num_shards 125 \
    --num_samples 1 \
    --temperature 0.0 \
    --model "meta-llama/Llama-3.3-70B-Instruct"

# Get your all_preds.jsonl
python -m swerl.agentless_mini.rerank \
    --patch_folder demo_gt_repair \
    --num_samples 1 \
    --output_file demo_gt_repair/all_preds.jsonl \
    --deduplicate

Full pipeline

Localization + repair

You can also run the full pipeline. We show a greedy-decoding demo below:

NUM_SAMPLES=1
COMMON_ARGS=(
    --shard 0
    --num_shards 125
    --num_samples ${NUM_SAMPLES}
    --temperature 0.0
    --model "meta-llama/Llama-3.3-70B-Instruct"
    # See --max_concurrent_requests for how to control the concurrency
)

ROOT=demo_agentless
LOC_FILE=${ROOT}/loc.jsonl
REPAIR_DIR=${ROOT}/repair
PRED_FILE=${ROOT}/all_preds.jsonl

# Localization
python -m swerl.agentless_mini.localize \
    --output_file ${LOC_FILE} \
    ${COMMON_ARGS[@]}

# Optionally, check localization performance
python -m swerl.agentless_mini.tools.check_loc_perf --locfile ${LOC_FILE}

# Repair
python -m swerl.agentless_mini.repair \
    --loc_file ${LOC_FILE} \
    --output_folder ${REPAIR_DIR} \
    ${COMMON_ARGS[@]}

# Rerank
python -m swerl.agentless_mini.rerank \
    --patch_folder ${REPAIR_DIR} \
    --num_samples ${NUM_SAMPLES} \
    --output_file ${PRED_FILE} \
    --deduplicate

# Now the ${PRED_FILE} will be ready. If you get all empty outputs, it means
# the model isn't generating correctly formatted edits. In that case, consider
# changing your base model or sampling more localizations and repairs.

Reproduction test generation

Note

Reproduction test generation, regression test selection, and test execution are work in progress due to refactoring and infrastructure differences. They will be updated shortly.

πŸ“ Citation

@article{wei2025swerl,
  title={SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution}, 
  author={Yuxiang Wei and Olivier Duchenne and Jade Copet and Quentin Carbonneaux and Lingming Zhang and Daniel Fried and Gabriel Synnaeve and Rishabh Singh and Sida I. Wang},
  year={2025},
  journal={arXiv preprint arXiv:2502.18449}
}

πŸ™ Acknowledgements

Agentless, SWE-Gym, SWE-Fixer, SWE-bench, Moatless EvalTools, Nebius SWE-agent.

License

The majority of SWE-RL is licensed under CC BY-NC 4.0; however, portions of the project are available under separate license terms: Agentless Mini is licensed under the MIT license.
