In [1]:
import pandas as pd
from pandas import Series
import os
import validators
import shutil

df = pd.read_json("Evaluation/swe-bench-lite.json")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype              
---  ------                    --------------  -----              
 0   repo                      225 non-null    object             
 1   instance_id               225 non-null    object             
 2   base_commit               225 non-null    object             
 3   patch                     225 non-null    object             
 4   test_patch                225 non-null    object             
 5   problem_statement         225 non-null    object             
 6   hints_text                225 non-null    object             
 7   created_at                225 non-null    datetime64[ns, UTC]
 8   version                   225 non-null    float64            
 9   FAIL_TO_PASS              225 non-null    object             
 10  PASS_TO_PASS              225 non-null    object             
 11  environment_setup_c

# Evaluating the LLM-Agent on SWE-Benchmark

We have two datasets we can use for predictions: `swe-bench.json`, which has 2200 entries, and `swe-bench-lite.json`, which has 225 entries. Both come from [SWE-Bench](https://github.com/princeton-nlp/SWE-bench/tree/main).

In [2]:
df = pd.read_json("Evaluation/swe-bench-lite.json")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype              
---  ------                    --------------  -----              
 0   repo                      225 non-null    object             
 1   instance_id               225 non-null    object             
 2   base_commit               225 non-null    object             
 3   patch                     225 non-null    object             
 4   test_patch                225 non-null    object             
 5   problem_statement         225 non-null    object             
 6   hints_text                225 non-null    object             
 7   created_at                225 non-null    datetime64[ns, UTC]
 8   version                   225 non-null    float64            
 9   FAIL_TO_PASS              225 non-null    object             
 10  PASS_TO_PASS              225 non-null    object             
 11  environment_setup_c

In [3]:
df.iloc[1]

repo                                                        sqlfluff/sqlfluff
instance_id                                           sqlfluff__sqlfluff-2862
base_commit                          447ecf862a4d2b977d0add9f444655357b9c4f1f
patch                       diff --git a/src/sqlfluff/core/linter/common.p...
test_patch                  diff --git a/test/api/simple_test.py b/test/ap...
problem_statement           fix keep adding new line on wrong place \n### ...
hints_text                  > Version\r\n> sqlfluff, version 0.6.2\r\n\r\n...
created_at                                          2022-03-14 19:46:08+00:00
version                                                                   0.1
FAIL_TO_PASS                [test/api/simple_test.py::test__api__lint_stri...
PASS_TO_PASS                [test/api/simple_test.py::test__api__lint_stri...
environment_setup_commit             3d52e8270d82aeccf4c516d059a80a6947919aea
Name: 1, dtype: object

After running our LLM on the dataset to generate solutions to the problems, the output needs to be in the following format:
```
{
    "instance_id": "<Unique task instance ID>",
    "model_patch": "<.patch file content string>",
    "model_name_or_path": "<Model name here (e.g. SWE-Llama-13b)>"
}
```
Multiple predictions are collected in a list: `[<prediction 1>, <prediction 2>,... <prediction n>]`.

**Example:**
```
{
    "instance_id": "django__django-15127",
    "model_name_or_path": "test",
    "model_patch": "--- a/django/contrib/messages/storage/base.py\n+++ b/django/contrib/messages/storage/base.py\n@@ -52,6 +52,7 @@\n                 if self._loaded_data is None:\n                     self._loaded_data = self.load()\n                 level, message, extra_tags = self._loaded_data\n+                extra_tags.update(self.get_level_tags())\n                 return {\n                     'message': message,\n                     'level': level,\n"
}
```
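
The format above can be assembled and checked before writing it to disk. The following is a minimal sketch, not part of SWE-Bench itself: `make_prediction` and `validate_predictions` are hypothetical helper names, and the patch string is a shortened placeholder.

```python
import json

# The three keys every prediction entry needs, as described above.
REQUIRED_KEYS = {"instance_id", "model_patch", "model_name_or_path"}


def make_prediction(instance_id: str, model_patch: str, model_name: str) -> dict:
    """Build one prediction entry with the three required keys (hypothetical helper)."""
    return {
        "instance_id": instance_id,
        "model_patch": model_patch,
        "model_name_or_path": model_name,
    }


def validate_predictions(predictions: list) -> None:
    """Raise ValueError if an entry lacks a required key or holds a non-string value."""
    for i, entry in enumerate(predictions):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"prediction {i} is missing keys: {sorted(missing)}")
        for key in REQUIRED_KEYS:
            if not isinstance(entry[key], str):
                raise ValueError(f"prediction {i}: {key!r} must be a string")


predictions = [
    # Placeholder patch content, shortened for illustration.
    make_prediction("django__django-15127", "--- a/file.py\n+++ b/file.py\n", "test"),
]
validate_predictions(predictions)

# json.dumps never emits trailing commas, so the result is a valid JSON array.
serialized = json.dumps(predictions, indent=4)
print(serialized)
```

Serializing the whole list with `json.dumps` avoids hand-written commas and brackets, which are easy to get wrong.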

# Testing SmolCoder

This requires the `phi3:latest` model to be running in Ollama.
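
Before running the cells below, it can help to check that the Ollama server is reachable and that the model has been pulled. This is a rough sketch under several assumptions: that Ollama listens on its default address `http://localhost:11434`, and that its `/api/tags` endpoint returns a JSON object with a `models` list whose items carry a `name` field. `model_is_available` and `fetch_local_models` are hypothetical helper names.

```python
import json
import urllib.request


def model_is_available(tags_payload: dict, model_name: str) -> bool:
    """Return True if model_name appears in a parsed /api/tags response."""
    return any(m.get("name") == model_name for m in tags_payload.get("models", []))


def fetch_local_models(base_url: str = "http://localhost:11434") -> dict:
    """Ask the Ollama server for its local models (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


# Example payload in the shape the endpoint is assumed to return.
sample = {"models": [{"name": "phi3:latest"}, {"name": "llama3.1:latest"}]}
print(model_is_available(sample, "phi3:latest"))

# With a server running, the real check would be:
# print(model_is_available(fetch_local_models(), "phi3:latest"))
```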

In [4]:
import sys
import os

sys.path.append(str(os.path.abspath('SmolCoder')))
print(sys.path)

['/home/lupos/Agentless', '/home/lupos/miniconda3/envs/llm/lib/python311.zip', '/home/lupos/miniconda3/envs/llm/lib/python3.11', '/home/lupos/miniconda3/envs/llm/lib/python3.11/lib-dynload', '', '/home/lupos/miniconda3/envs/llm/lib/python3.11/site-packages', '/home/lupos/interactive-learning/SmolCoder']


In [5]:
from pathlib import Path
import pandas as pd

from SmolCoder.src.agent import SmolCoder
from SmolCoder.src.agent_wrapper import AgentWrapper
from SmolCoder.src.llm_wrapper import LLM
from SmolCoder.src.toolkit import Toolkit

from SmolCoder.src.tools.list_methods import ListMethods
from SmolCoder.src.tools.list_classes import ListClasses
from SmolCoder.src.tools.list_files import ListFiles
from SmolCoder.src.tools.replace_method import ReplaceMethod
from SmolCoder.src.tools.finish import Finish
from SmolCoder.src.tools.execute_python import ExecutePythonCode
from SmolCoder.src.tools.show_method import ShowMethodBody
from SmolCoder.src.tools.move_folder import MoveFolder
from SmolCoder.src.tools.human_interaction import HumanInteraction

In [6]:
# Tool Definition
class_sumary = ListMethods()
list_classes = ListClasses()
list_files = ListFiles()
replace_method = ReplaceMethod()
finish = Finish()
execute_python = ExecutePythonCode()
show_method = ShowMethodBody()
move_folder = MoveFolder()
human_interaction = HumanInteraction()

## Testing Execute Python Tool

In [7]:
tools = Toolkit([execute_python])

agent = AgentWrapper(agent_name="SmolCoder",
                     toolkit=tools,
                     mode=0,
                     model="phi3:latest",
                     working_directory="repos",
                     logging_enabled=True
                    )

prompt = df.iloc[0]

TypeError: AgentWrapper.__init__() missing 1 required positional argument: 'dummy_model'

In [8]:
#result = agent.predict(prompt)
#print("RESULT: " + str(result))

In [9]:
#print(smolCoder.inspect_history(n=5))

# SmolCoder on SWE

This tests SmolCoder on a single instance of the SWE-Benchmark.
The agent does not first try to reproduce the bug; it is a barebones ReAct loop with tools.
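
For readers unfamiliar with the pattern, a barebones ReAct loop alternates between a model reply (thought plus action) and a tool observation until the model calls a finishing action. The sketch below is a generic illustration, not SmolCoder's implementation: the `Action: tool[argument]` syntax, `react_loop`, and `scripted_llm` are all assumptions made for this example, and the model is replaced by canned replies.

```python
import re
from typing import Callable, Dict, List

# Matches lines such as: Action: list_files[src]  (assumed action syntax)
ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               task: str,
               max_steps: int = 5) -> str:
    """Alternate model replies and tool observations until 'finish' is called."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        match = ACTION_PATTERN.search(reply)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        tool_name, argument = match.group(1), match.group(2)
        if tool_name == "finish":
            return argument
        tool = tools.get(tool_name)
        observation = tool(argument) if tool else f"unknown tool {tool_name!r}"
        transcript += f"Observation: {observation}\n"
    return ""  # step budget exhausted without a finish action


def scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


llm = scripted_llm([
    "Thought: I should look at the files.\nAction: list_files[src]",
    "Thought: I found the file.\nAction: finish[patched src/app.py]",
])
tools = {"list_files": lambda path: f"{path}/app.py"}
result = react_loop(llm, tools, "Fix the bug")
print(result)
```

The step budget (`max_steps`) keeps a small model from looping forever when it never emits a finishing action.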

In [10]:
# toolkit = Toolkit([human_interaction, finish])
toolkit = Toolkit([list_classes, list_files, replace_method, show_method, move_folder, finish])

agent = AgentWrapper(
                     agent_name="SmolCoder",
                     toolkit=toolkit,
                     mode=0,
                     model="phi3:latest",
                     working_directory="repos",
                     logging_enabled=True
                    )

TypeError: AgentWrapper.__init__() missing 1 required positional argument: 'dummy_model'

In [11]:
print(agent.name)
print("----------------")
print(agent.predict(df.iloc[0]))

NameError: name 'agent' is not defined

In [12]:
# toolkit = Toolkit([human_interaction, finish])
toolkit = Toolkit([human_interaction, list_classes, list_files, replace_method, show_method, move_folder, finish])

agent = AgentWrapper(
                     agent_name="SmolCoder",
                     toolkit=toolkit,
                     mode=0,
                     model="phi3:latest",
                     working_directory="repos",
                     logging_enabled=True
                    )

print(agent.name)
print("----------------")
print(agent.predict(df.iloc[0]))

TypeError: AgentWrapper.__init__() missing 1 required positional argument: 'dummy_model'

## Generating all Predictions

When running this on a server, the process may crash or throw an error that is not caught. It is therefore important to write the results to disk after each entry in the dataset.


In [13]:
# This implementation uses checkpoints: if the program
# is interrupted, it can resume where it left off.

import tempfile
import json

#tools = Toolkit([class_sumary, list_classes, list_files, finish])
#model = LLM("phi3:latest")
#smol_coder = SmolCoder(model, Path("repos"), tools)
#agent = AgentWrapper(smol_coder, working_directory="repos", name="SmolCoder")

stub = AgentStub()
agent = AgentWrapper(stub, "repos")

checkpoint_file = 'checkpoint.txt'
resume_index = 0

activated = 1

if activated:
    # Check if checkpoint file exists and read the index of the next unprocessed row
    try:
        with open(checkpoint_file, 'r') as f:
            resume_index = int(f.read().strip())
    except FileNotFoundError:
        pass
    except Exception as e:
        print(f"Error reading checkpoint file: {e}")
    
    if resume_index < len(df):
        # Open a file to save predictions
        with open('predictions.json', 'a', encoding="utf-8-sig") as json_file:
            if resume_index == 0:
                json_file.write('[')  # Start of JSON array
                json_file.write('\n')
            # Generating our solution
            for index, row in df.iterrows():
                if index % 10 == 0: print("Current idx: " + str(index))
                # Skip rows that were already processed
                if index < resume_index:
                    continue
        
                predictions = {
                    "instance_id": row["instance_id"],
                    "model_patch": agent.predict(row),
                    "model_name_or_path": agent.name
                }
                # Convert the dictionary to a JSON formatted string and write to file
                json_data = json.dumps(predictions, indent=4)
                json_file.write(json_data)
                if index < len(df) - 1:
                    json_file.write(',')
                json_file.write('\n')
                # Make sure the prediction is on disk before the checkpoint advances
                json_file.flush()
        
                # Store the index of the next row, so a resumed run
                # does not process (and write) this row a second time
                with open(checkpoint_file, 'w') as f:
                    f.write(str(index + 1))
                    
            if index == len(df) - 1:
                json_file.write(']')  # End of JSON array

NameError: name 'AgentStub' is not defined
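
The checkpointing idea described above can also be sketched without a separate checkpoint file by writing one JSON object per line (JSON Lines) and treating the instance IDs already on disk as the checkpoint. This is an alternative sketch, not the notebook's implementation: `finished_ids`, `run_with_resume`, `load_predictions`, and the stub `predict` function are hypothetical names, and the stub stands in for `agent.predict`.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def finished_ids(path: Path) -> set:
    """Collect the instance IDs that already have a line in the JSONL file."""
    if not path.exists():
        return set()
    with path.open("r", encoding="utf-8") as f:
        return {json.loads(line)["instance_id"] for line in f if line.strip()}


def run_with_resume(rows: Iterable[Tuple[str, str]],
                    predict: Callable[[str], str],
                    model_name: str,
                    path: Path) -> int:
    """Append one prediction per line, skipping rows that are already on disk."""
    done = finished_ids(path)
    written = 0
    with path.open("a", encoding="utf-8") as f:
        for instance_id, problem in rows:
            if instance_id in done:
                continue
            entry = {
                "instance_id": instance_id,
                "model_patch": predict(problem),
                "model_name_or_path": model_name,
            }
            f.write(json.dumps(entry) + "\n")
            f.flush()  # push each entry out of the buffer right away
            written += 1
    return written


def load_predictions(path: Path) -> List[dict]:
    """Read the JSONL file back into the list format used for predictions."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "predictions.jsonl"
    rows = [("repo__a-1", "problem one"), ("repo__a-2", "problem two")]
    stub_predict = lambda problem: ""  # hypothetical stand-in for agent.predict
    first = run_with_resume(rows[:1], stub_predict, "stub", out)  # simulated interrupted run
    second = run_with_resume(rows, stub_predict, "stub", out)     # resumed run
    predictions = load_predictions(out)

print(first, second, len(predictions))
```

Because every line is a complete JSON object, an interrupted run never leaves an unclosed array behind. The sketch assumes each line was written whole; a line truncated by a crash would need to be removed before resuming.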

# Meta Tokenizer

In [8]:
from pathlib import Path

from SmolCoder.src.llm_wrapper import LLM
from SmolCoder.src.prompting_strategy import PromptingStrategy
from SmolCoder.src.toolkit import Toolkit
from SmolCoder.src.tools.list_methods import ListMethods
from SmolCoder.src.tools.list_files import ListFiles
from SmolCoder.src.tools.list_classes import ListClasses
from SmolCoder.src.tools.finish import Finish
from SmolCoder.src.meta_tokenizer import MetaTokenizer

from SmolCoder.src.agent import SmolCoder

import pandas as pd
from pandas import Series
import os
import validators
import shutil

df = pd.read_json("Evaluation/swe-bench-lite.json")
df.info()

list_methods = ListMethods()
list_classes = ListClasses()
list_files = ListFiles()
finish = Finish()

toolkit = Toolkit([list_methods, list_classes, list_files, finish])

smol = SmolCoder(phase=3, model=LLM("llama3.1", openai=[False, "None"], logger=None), codebase_dir= Path("test_codebase/"), logger=None)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype              
---  ------                    --------------  -----              
 0   repo                      225 non-null    object             
 1   instance_id               225 non-null    object             
 2   base_commit               225 non-null    object             
 3   patch                     225 non-null    object             
 4   test_patch                225 non-null    object             
 5   problem_statement         225 non-null    object             
 6   hints_text                225 non-null    object             
 7   created_at                225 non-null    datetime64[ns, UTC]
 8   version                   225 non-null    float64            
 9   FAIL_TO_PASS              225 non-null    object             
 10  PASS_TO_PASS              225 non-null    object             
 11  environment_setup_c

In [7]:
smol(df.iloc[0]["problem_statement"], start_cwd="./repos/sqlfluff")

SUS CODE SNIPPET PHASE: 


Error while extracting code for function '_iter_segments' in file './repos/sqlfluff/src/sqlfluff/core/parser/lexer.py'
Error while extracting code for function '_handle_zero_length_slice' in file './repos/sqlfluff/src/sqlfluff/core/parser/lexer.py'
Error while extracting code for class 'BlockTracker' in file './repos/sqlfluff/src/sqlfluff/core/parser/lexer.py'
You will be given a description of a `GitHub issue` and it's corresponding codebase and your task is, to solve this issue. First you will be given a tree structure of the codebase, your task is it based on the description of the issue to select relevant files of it for closer inspection. After this you will be provided with a skeleten for each of your slected file, this skeleton will consist out of class and method headers and your task will be to select the classes and methods that are relevant to the described issue. At the end you will be provided with the source code of your selected classes and met