On the Reliability and Explainability of Language Models for Program Generation

Overview

This repository contains the materials needed to replicate our study, "On the Reliability and Explainability of Language Models for Program Generation." The study assesses the reliability and explainability of several language models on program generation tasks. It reveals significant flaws in model performance and uncovers severe data duplication that leads to over-optimistic results. These findings highlight the need for more rigorous evaluation methods and benchmarks to improve the reliability and explainability of these models in practical applications.
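To make the data duplication concern concrete, the sketch below measures how many test samples also occur in a training set after whitespace normalization. It is an illustration only, not the procedure used in the study: the function names `normalize` and `duplication_rate` and the toy samples are assumptions introduced here, and exact matching after whitespace collapsing is just one simple way to define a duplicate.

```python
from typing import List


def normalize(sample: str) -> str:
    """Collapse runs of whitespace so formatting differences do not hide duplicates."""
    return " ".join(sample.split())


def duplication_rate(train: List[str], test: List[str]) -> float:
    """Return the fraction of test samples that also appear in the training set."""
    if not test:
        return 0.0
    seen = {normalize(s) for s in train}
    hits = sum(1 for s in test if normalize(s) in seen)
    return hits / len(test)


# Toy data (hypothetical): the first test sample differs from a training
# sample only by an extra space, so it counts as a duplicate.
train = ["int add(int a, int b) { return a + b; }", "void noop() {}"]
test = [
    "int add(int a,  int b) { return a + b; }",
    "int sub(int a, int b) { return a - b; }",
]
print(duplication_rate(train, test))  # 0.5
```

A high rate on a real benchmark would suggest that test scores partly reflect memorized training samples rather than generalization.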

Datasets Used in the Study

Dataset Name	Task Types	Venue	Paper Link and Source Code
Tufano et al.	Code Review	ICSE'19	[data], [Paper]
Bugs2Fix	Code Repair	TOSEM'19	[data], [Paper]
CodeReview	Code Review	FSE'22	[data], [Paper]
CodeTrans-Dataset	Code Translation	NIPS'19	[data], [Paper]
CONCODE	Code Generation	NIPS'19	[data], [Paper]

Getting Started

To get started with replicating our study, please follow the steps below:

Prerequisite and Setup

Python 3.6+

Choose a directory and clone the repository:

git clone https://github.com/yueyueL/ProgramGen-LMs-Reliability.git

cd ProgramGen-LMs-Reliability/

Install the required packages:

pip install -r requirements.txt

Contribution

We welcome contributions and suggestions! Please open an issue or submit a pull request for any enhancements.

Citation

If you use the resources provided in this repository, please cite our paper:

@misc{liu2023reliability,
      title={On the Reliability and Explainability of Language Models for Program Generation}, 
      author={Yue Liu and Chakkrit Tantithamthavorn and Yonghui Liu and Li Li},
      year={2023},
      eprint={2302.09587},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
