This project was developed as part of the research lab "Interactive Learning" at the Karlsruhe Institute of Technology (KIT) in the summer of 2024.
The scope of this project is to develop an autonomous coding agent similar to Devin, SWE-Agent and AutoCodeRover.
All these agents have in common that they use GPT-4 or other high-end (and thus expensive) LLMs as a backbone. In this project, we want to test the feasibility of using smaller models as agents.
The ambitious goal of this project is to get onto the leaderboard of SWE-bench, a by-now well-established benchmark for coding agents.
This project has three distinct parts:
- SmolCoder, which can be found on the master branch, is loosely based on SWE-Agent.
- Agentless, which can be found on the agentless branch, is based on the paper of the same name, Agentless.
- InteractiveLearning, which can be found in the evaluation notebook.
This project is a work in progress:
- Writing an Eval Pipeline for SWE-bench ✅
- Creating the Coding Agent Framework ✅
- Defining and programming the Tools that the Agent will use ✅ (a hypothetical sketch follows below)
- Creating an interface between the Agent and the Computer ✅
Evaluating:
- Phi3 out-of-the-box
- Phi3 as coding agent
- Phi3 finetuned on code
- Phi3 as coding agent, finetuned on code/tool-use
- Phi3 as coding agent, finetuned on code/tool-use with human interaction
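To make the tools item above concrete, here is a minimal, purely hypothetical sketch of what a tool definition in an SWE-Agent-style framework could look like; the `Tool` class and the `list_files` tool are illustrations and do not mirror SmolCoder's actual code.

```python
# Hypothetical sketch of a tool interface for an SWE-Agent-style coding agent.
# Names and signatures are illustrative only, not SmolCoder's real interface.
import os
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # how the LLM refers to the tool in its output
    description: str            # short usage hint included in the system prompt
    run: Callable[[str], str]   # takes the tool argument, returns an observation

def list_files(path: str) -> str:
    """Return the entries of a directory as a single observation string."""
    return "\n".join(sorted(os.listdir(path or ".")))

TOOLS = {
    "list_files": Tool(
        name="list_files",
        description="List the files in a directory, e.g. list_files(src)",
        run=list_files,
    ),
}

# The agent loop would look up the tool the model requested and feed the
# returned observation back into the next prompt.
print(TOOLS["list_files"].run("."))
```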
To run the evaluation for agentless or interactive-learning, check out the evaluation notebook.
Python 3.11 or newer is required, and Docker needs to be installed and running as a daemon.
- Navigate inside the folder and make sure the requirements are installed:

```bash
cd Evaluation
cd SWE-bench
pip install -e .
```
- Test the installation:

```bash
python -m swebench.harness.run_evaluation \
    --predictions_path gold \
    --max_workers 1 \
    --instance_ids sympy__sympy-20590 \
    --run_id validate-gold
```
- Install the required Python packages; I would recommend doing it with conda:

```bash
# create the environment from requirements.txt and activate it
conda create --name <env> --file requirements.txt
conda activate <env>
```
- Get your predictions by running the appropriate part of `Evaluation.ipynb`; make sure to choose the correct dataset (either `swe-bench.json` for the full dataset or `swe-bench-lite.json` for a smaller version). Alternatively, you can also run `evaluation.py` inside the `Evaluation` folder. A sketch of the expected prediction format is shown below.
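The harness consumes a JSON list of predictions. The keys below (`instance_id`, `model_name_or_path`, `model_patch`) follow the upstream SWE-bench convention; the instance id, model name, and patch text are placeholders, and the actual file is produced by the notebook or `evaluation.py`.

```python
# Sketch: writing a predictions file in the shape the SWE-bench harness expects.
# The values are placeholders, not real output of this project.
import json

predictions = [
    {
        "instance_id": "sympy__sympy-20590",            # which benchmark instance is patched
        "model_name_or_path": "gemma2",                  # model name reported in the results
        "model_patch": "diff --git a/f.py b/f.py\n...",  # unified diff generated by the model
    }
]

with open("prediction.json", "w") as f:
    json.dump(predictions, f, indent=2)
```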
- To evaluate the predictions, navigate to the `SWE-bench` folder:
```bash
cd Evaluation
cd SWE-bench
```
- Run the evaluation with the following command; you may need to customize it:

```bash
python -m swebench.harness.run_evaluation \
    --predictions_path ../prediction.json \
    --max_workers 1 \
    --dataset_name ../swe-bench-lite.json \
    --run_id YOUR_ID
```
- You should find a `json` report listing the evaluation result of your predictions, with `YOUR_ID` in its file name, inside the `SWE-bench` directory. A short sketch for inspecting it is shown below.
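A quick way to sanity-check the run is to load the report and look at its fields. The file name and the `resolved_instances` / `total_instances` keys below are assumptions about the harness output; printing the keys first shows what your version actually wrote.

```python
# Sketch: inspecting the evaluation report written by run_evaluation.
# File name and key names are assumptions; adjust them to your actual report.
import json
from pathlib import Path

report_path = Path("gemma2.YOUR_ID.json")  # hypothetical report file name
report = json.loads(report_path.read_text())

print("report keys:", sorted(report.keys()))
print("resolved:", report.get("resolved_instances"), "of", report.get("total_instances"))
```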
- Connect and log in to your server.
- Clone this repository and put an `ollama` binary into the folder.
- Create a new file (`vi evaluate.sh`) with the following content:
```bash
#!/bin/bash
#SBATCH --job-name=evaluate_gemma_2
#SBATCH --mem 20000
#SBATCH --nodes 1
#SBATCH --time 5
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

module load devel/miniconda

# start the ollama server in the background (CPU-only build)
OLLAMA_LLM_LIBRARY="cpu_avx2" ./ollama-linux-amd64 serve &

# give the server time to start and download the model (see the notes below)
sleep 60

python evaluate.py --logging_enabled=True --model_name="gemma2" --output_file="prediction_gemma2B.json"
```
- Set the memory depending on the model, e.g. a 2B model needs memory <= 10 GB, an 8B model memory <= 20 GB.
- Remove the line setting `CUDA_VISIBLE_DEVICES` and remove `OLLAMA_LLM_LIBRARY="cpu_avx2"` if you want to use CUDA.
- Modify `job-name`, `model_name`, and `output_file` as needed.
- The `sleep 60` is there because downloading the model takes some time (a sketch of how `evaluate.py` might then query the served model is given at the end of this section).
- Queue the job with `sbatch -p single evaluate.sh`.
- To check the progress, run `squeue`.
For more on Slurm jobs, check this website out.
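`evaluate.py` itself is not reproduced in this README. As a purely illustrative sketch, a script run this way could talk to the ollama server started by `evaluate.sh` through Ollama's HTTP API on its default port 11434; the model name and prompt below are placeholders.

```python
# Hypothetical sketch: querying the locally running ollama server,
# roughly the way an evaluation script could. Model name and prompt
# are placeholders; only the /api/generate endpoint is Ollama's real API.
import json
import urllib.request

def generate(model: str, prompt: str) -> str:
    """Send a non-streaming generation request to the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("gemma2", "Summarize what a unified diff is in one sentence."))
```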
Very Relevant Papers:
Less Relevant Papers:
Misc: