# Project-Diary

# IR-based fault localization tool

## 2024-04-01 discussion of RQ1 evaluation

**input/output pairs**

input: bug report, buggy program

output: bug-id, suspicious file list, probability

**iFixR**

* ref: https://github.com/TruX-DTF/iFixR/tree/master
* https://arxiv.org/abs/1902.02703
* https://github.com/TruX-DTF/d-and-c

iFixR's IR-based file localization builds on [d&c](https://github.com/TruX-DTF/d-and-c), which uses LightGBM for the machine learning part. It trains a prediction approach (multiple classifiers that accept code and a bug report and calculate their similarity) and uses the predictions to rank the suspicious file list.

d-and-c offers few guidelines and is a little tricky to study, so we just followed the iFixR manual.

steps 1 to 12: clone -> collect -> fix -> bugPoints -> brDownload -> brParser -> brFeatures -> verify -> simi -> features -> predict -> eval

main idea: get code / get bug report -> translate to features -> calculate similarity -> rank all results (by probability) -> generate suspicious file list
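
The main idea can be illustrated with a minimal sketch. This is not the d&c implementation (which relies on LightGBM classifiers); it assumes a single bag-of-words cosine similarity as the only feature, and `tokenize`, `cosine`, and `rank_files` are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report, files):
    """Return (path, score) pairs sorted from most to least suspicious.

    `files` maps a file path to its source text.
    """
    report_vec = Counter(tokenize(bug_report))
    scored = [
        (path, cosine(report_vec, Counter(tokenize(source))))
        for path, source in files.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_files(report_text, {"A.java": "...", "B.java": "..."})` would return the files ordered by similarity score, which plays the role of the probability column in the real output.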

steps 3/4 require a `bugreport.pickle` file, which contains:

```text
bugID
summary
description
created
updated
resolved
reporterDN
reporterEmail
hasAttachment
attachmentTime
hasPR
codeElements
stackTraces
summaryHints
descHints
```
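
A small check like the following could confirm that a bug report table has every field listed above before running the pipeline. It is only a sketch: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and how the pickle is structured internally is not documented here.

```python
# Field names copied from the list above.
REQUIRED_FIELDS = [
    "bugID", "summary", "description", "created", "updated", "resolved",
    "reporterDN", "reporterEmail", "hasAttachment", "attachmentTime",
    "hasPR", "codeElements", "stackTraces", "summaryHints", "descHints",
]


def missing_fields(columns):
    """Return the required field names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_FIELDS if name not in present]
```

If the pickle holds a pandas DataFrame (an assumption), `missing_fields(pd.read_pickle("bugreport.pickle").columns)` would list whatever is missing.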

iFixR has already uploaded three `bugreport.pickle` files, so we can use them directly to generate the three expected outputs.

**evaluate**

* ref: https://github.com/rjust/defects4j
* https://bitbucket.org/rjust/fault-localization-data/src/d4j-2.0/analysis/pipeline-scripts/

From the last step, we got the suspicious file list: https://github.com/zehaowang00/LLMRepair/blob/feng/dataset/fault_localization/LANG-file-predict.zip. We send it to the LLM and get the statement-level prediction results. The evaluation metrics are recall and precision.
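
As a rough sketch of the two metrics, assuming each predicted or actual buggy statement is identified by a `(file, line)` pair (this representation and the name `precision_recall` are assumptions, not part of the evaluation scripts referenced above):

```python
def precision_recall(predicted, actual):
    """Compute precision and recall of predicted buggy statements.

    Both arguments are iterables of hashable statement identifiers,
    for example (file, line) tuples.
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

Precision is the share of predicted statements that are really buggy; recall is the share of really buggy statements that were predicted.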

## 2024-03-31 try run

* ref:
    * https://github.com/TruX-DTF/iFixR/tree/master
    * https://arxiv.org/abs/1902.02703
    * https://github.com/TruX-DTF/d-and-c

some notes:

* command: `bash startPy.sh . predict ALL`
* install failure:
    * lightgbm: https://github.com/microsoft/LightGBM/issues/6035, `brew install libomp`
    * pandas: `pip install "pandas<2.0.0"`
    * `commons.py`: from `cfg = yaml.load(ymlfile)` to `cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)`
    * `predict.py`: from `ddf = dd.concat([dd.from_array(c) for c in series], axis=1)` to `ddf = dd.concat([dd.from_pandas(c,chunksize=50000) for c in series], axis=1)`
    * required bug report fields:
```text
bugID
summary
description
created
updated
resolved
reporterDN
reporterEmail
hasAttachment
attachmentTime
hasPR
codeElements
stackTraces
summaryHints
descHints
```

One file was not found: `File /Users/watch/PycharmProjects/iFixR/data/singlePred/finalmultic_LANG-607.json not found!`
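
To see whether other predictions are missing as well, a sketch like this could scan the prediction directory. The file name pattern `finalmultic_<bugID>.json` is taken from the message above; the function name `missing_predictions` and the idea of passing an explicit list of bug IDs are assumptions.

```python
from pathlib import Path


def missing_predictions(pred_dir, bug_ids):
    """Return the bug IDs that have no finalmultic_<bugID>.json in pred_dir."""
    pred_dir = Path(pred_dir)
    return [
        bug_id
        for bug_id in bug_ids
        if not (pred_dir / f"finalmultic_{bug_id}.json").is_file()
    ]
```

For example, `missing_predictions("data/singlePred", ["LANG-607"])` would return `["LANG-607"]` in the situation described above.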

# before 2024-03-31 first try

## Step0: preparation

* venv: requirement.txt
* defects4j: download
* conf.json: api-key, defects4j_path
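
A minimal sketch of loading and checking `conf.json` before any step runs. The exact key spellings `api-key` and `defects4j_path` are assumed from the list above, and `load_conf` is a hypothetical helper, not part of the project scripts.

```python
import json
from pathlib import Path

# Key names assumed from the preparation list; adjust to the real file.
REQUIRED_KEYS = ("api-key", "defects4j_path")


def load_conf(path="conf.json"):
    """Load the JSON config and fail early if a required key is empty or absent."""
    conf = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not conf.get(key)]
    if missing:
        raise KeyError(f"conf.json is missing: {', '.join(missing)}")
    return conf
```

Failing early here gives a clearer message than a later `FileNotFoundError` deep inside a script.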

## Step1: fault localization

In [1]:
%cd script
%run fault_localization.py

/Users/watch/PycharmProjects/LLMRepair/script


FileNotFoundError: [Errno 2] No such file or directory: '/Users/wang/Documents/project/defects4j/Data/Lang/lang_57_b/src/java/org/apache/commons/lang/LocaleUtils.java'

[2024-03-26] Failed to run; waiting for the source code.
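
The failing path points into another user's home directory, so one option is to build it from the configured Defects4J location instead. The sketch below infers the `Data/<Project>/<project>_<id>_b/` layout from the error message alone; that layout and the name `buggy_source_path` are assumptions.

```python
from pathlib import Path


def buggy_source_path(defects4j_path, project, bug_number, relative_file):
    """Build the path of a buggy source file under a configurable Defects4J root."""
    checkout = f"{project.lower()}_{bug_number}_b"
    return Path(defects4j_path) / "Data" / project / checkout / relative_file
```

Checking `buggy_source_path(...).is_file()` before running the localization would show whether the buggy checkout is available on the current machine.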

## Step2: patch generation

In [8]:
%run patch_generation.py

{'Role': 'As a professional developers. You are responsible for fixing the bug in bug report and generating program repair patch.', 'Instruction': 'You should check the bug report information. The location of buggy code is provided. There are two type of information: method include bug, and suspicious buggy code statements. \n                    One is the method that includes a buggy code. Another is the suspicous buggy code that tirgger the bug in bug report. \n                    The suspicious buggy code statements may be not very accurate. \n                    You need to fix the bug in the bug report and provide the fix patch. Please provide the fix patch refer the example format. Output in json format.', 'Bug report description': "<p>FindBugs pointed out:</p>\n\n<p>   UwF: Field not initialized in constructor: org.apache.commons.lang.LocaleUtils.cAvailableLocaleSet</p>\n\n<p>cAvailableSet is used directly once in the source - and if availableLocaleSet() hasn't been called it wi

In [9]:
# %load ../analysis_result/GPT_response/patch_generation/Lang/patch/LANG_57.json
{
    "Fix Patch": "diff --git a/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java b/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java\nindex abcdef1..1234567 100644\n--- a/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java\n+++ b/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java\n@@ -1,5 +1,5 @@\n public static boolean isAvailableLocale(Locale locale) { \n-        return cAvailableLocaleSet.contains(locale); \n+        return cAvailableLocaleSet != null && cAvailableLocaleSet.contains(locale); \n     }"
}

In [10]:
# %load ../analysis_result/GPT_response/patch_generation/Lang/patch/LANG_57.txt
diff --git a/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java b/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java
index abcdef1..1234567 100644
--- a/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java
+++ b/src/main/java/org/apache/commons/lang3/text/translate/CharSequenceTranslator.java
@@ -1,5 +1,5 @@
 public static boolean isAvailableLocale(Locale locale) { 
-        return cAvailableLocaleSet.contains(locale); 
+        return cAvailableLocaleSet != null && cAvailableLocaleSet.contains(locale); 
     }
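
The `.txt` file above contains the same diff as the `"Fix Patch"` field of the `.json` response. A sketch of that extraction, assuming the response always carries a `"Fix Patch"` key (`export_patch` is a hypothetical helper, not the project's actual code):

```python
import json
from pathlib import Path


def export_patch(json_path, txt_path=None):
    """Write the "Fix Patch" field of a saved response to a plain-text diff file."""
    json_path = Path(json_path)
    patch = json.loads(json_path.read_text(encoding="utf-8"))["Fix Patch"]
    # Default to the same file name with a .txt extension.
    txt_path = Path(txt_path) if txt_path else json_path.with_suffix(".txt")
    txt_path.write_text(patch, encoding="utf-8")
    return txt_path
```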

## Step3: patch validation

In [11]:
# %load patch_validation.py
import json
import os

import pandas as pd


def get_completion(client, prompt):
    """Send one user message and return the model's JSON-formatted reply as a string."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        # model="gpt-4",
        messages=messages,
        response_format={"type": "json_object"},
        temperature=0.3,
    )
    return response.choices[0].message.content


def patch_validate(client, prompt, few_shots, save_file_path):
    """Query the model with the prompt and save the parsed JSON reply to a file."""
    # NOTE: few_shots is accepted but not used yet.
    response = get_completion(client, json.dumps(prompt))
    with open(save_file_path, "w") as file:
        json.dump(json.loads(response), file, indent=4)

prompt_validation = {
  "Role": "As a professional developers. You are responsible for generating program repair patch.",
  "Instruction": "Read ",
  "Question": """ Question1: 
              """
}

It seems the complete script has not been uploaded yet.