DetectCodeGPT

Welcome to the repository for the research paper: "Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers." Here, we present DetectCodeGPT, a novel approach to distinguish between machine- and human-generated code snippets. This README will guide you through setting up and using the DetectCodeGPT framework.

Getting Started

Prerequisites

Experiments were conducted with Python 3.9.7 on an Ubuntu 22.01.1 server.

To install all required packages, navigate to the root directory of this project and run:

pip install -r requirements.txt

Data Preparation

To prepare the datasets used in our study:

Navigate to the code-generation directory.
Obtain the datasets from either of the following:
- CodeSearchNet
- The Vault (a preprocessed version of The Stack)
Update the data paths and model specifications in generate.py to reflect your local setup.
Execute the data generation script with:
```
python generate.py
```
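Before pointing generate.py at a downloaded dataset, it can help to confirm that the file actually contains code snippets. The sketch below is not part of this repository: it assumes the dataset is stored as JSON Lines with the function source under a field named `code` (check the schema of the file you downloaded), and `iter_code` is a hypothetical helper name.

```python
import json


def iter_code(lines, field="code"):
    """Yield the code string from each JSON Lines record.

    `field` is an assumed key name; adjust it to match your dataset.
    Blank lines and records without the field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        code = record.get(field)
        if code:
            yield code


if __name__ == "__main__":
    sample = [
        '{"code": "def add(a, b):\\n    return a + b"}',
        "",
        '{"docstring": "record without code"}',
    ]
    for snippet in iter_code(sample):
        print(snippet)
```

To read from disk, pass an open file handle instead of a list, for example `iter_code(open(path, encoding="utf-8"))`.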

Conducting the Empirical Study

After data preparation, you can proceed to the empirical analysis:

Navigate to the code-analysis directory.
Analyze code length by running:
```
python analyze_length.py
```
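As a rough illustration of what a length analysis measures, the following sketch counts non-blank lines and whitespace-separated tokens per snippet and summarizes them. It is a minimal example under our own assumptions (whitespace tokenization, hypothetical function names) and does not reproduce the internals of analyze_length.py.

```python
import statistics


def snippet_length(code):
    """Return the number of non-blank lines and whitespace-separated tokens."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    tokens = code.split()
    return {"lines": len(lines), "tokens": len(tokens)}


def summarize(values):
    """Return the mean and median of a list of lengths."""
    return {"mean": statistics.mean(values), "median": statistics.median(values)}


if __name__ == "__main__":
    snippets = ["a = 1\n\nb = 2\n", "def f(x):\n    return x\n"]
    token_counts = [snippet_length(s)["tokens"] for s in snippets]
    print(summarize(token_counts))
```

Comparing these summaries between a human-written set and a machine-generated set gives a first impression of how the two differ in length.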
Verify Zipf's and Heaps' laws, and compute token frequencies with:
```
python analyze_law_and_frequency.py
```
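For readers unfamiliar with the two laws: Zipf's law says token frequency falls off roughly as a power of frequency rank, and Heaps' law says vocabulary size grows roughly as a power of corpus length. Both appear as near-straight lines on log-log axes. The sketch below shows one simple way to compute the underlying curves and a log-log slope; the regex tokenizer and all function names are illustrative assumptions, not the ones used in analyze_law_and_frequency.py.

```python
import math
import re
from collections import Counter

# Crude tokenizer (an assumption): identifiers, integers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def simple_tokens(code):
    """Split code into a flat list of token strings."""
    return TOKEN_RE.findall(code)


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent token first (Zipf)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def vocabulary_growth(tokens):
    """Return the vocabulary size after each token is read (Heaps)."""
    seen, sizes = set(), []
    for tok in tokens:
        seen.add(tok)
        sizes.append(len(seen))
    return sizes


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    toks = simple_tokens("total = total + value\ncount = count + 1\n")
    ranks, freqs = zip(*rank_frequency(toks))
    print("Zipf slope:", loglog_slope(ranks, freqs))
    sizes = vocabulary_growth(toks)
    print("Heaps slope:", loglog_slope(range(1, len(sizes) + 1), sizes))
```

On a real corpus, a Zipf slope that is clearly negative and a Heaps slope between 0 and 1 are the expected shapes; the toy input above is too small to say anything meaningful.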
Analyze the proportion of different token categories by executing:
```
python analyze_proportion.py
```
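To make "token categories" concrete, here is a small sketch that buckets the tokens of a Python snippet into keywords, identifiers, literals, operators, and comments using the standard library `tokenize` module, then reports each bucket's share. The category names and the mapping are our own simplification and may differ from those used in analyze_proportion.py.

```python
import io
import keyword
import tokenize
from collections import Counter

# Layout-only tokens that carry no category of interest here.
SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def categorize(tok):
    """Map a tokenize.TokenInfo to a coarse category name."""
    if tok.type == tokenize.COMMENT:
        return "comment"
    if tok.type == tokenize.NAME:
        return "keyword" if keyword.iskeyword(tok.string) else "identifier"
    if tok.type in (tokenize.NUMBER, tokenize.STRING):
        return "literal"
    if tok.type == tokenize.OP:
        return "operator"
    return "other"


def category_proportions(code):
    """Return {category: share of tokens} for a Python source string."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type not in SKIPPED:
            counts[categorize(tok)] += 1
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


if __name__ == "__main__":
    print(category_proportions("def f(a):\n    return a + 1  # add one\n"))
```

This only handles syntactically tokenizable Python; other languages would need a different lexer.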
Study the naturalness of code snippets via:
```
python analyze_naturalness.py
```
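Naturalness is commonly quantified as how predictable a token sequence is under a language model, for example via cross-entropy. The toy sketch below uses a bigram model with add-one smoothing purely to illustrate the idea; it is a stand-in we chose for self-containment, not the model or metric used by analyze_naturalness.py, and every name in it is hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Collect bigram and context counts from a list of token lists."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] + list(tokens)
        contexts.update(padded[:-1])          # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
        vocab.update(tokens)
    return {"contexts": contexts, "bigrams": bigrams, "vocab": vocab}


def cross_entropy(tokens, model):
    """Average negative log2 probability per token (lower = more predictable)."""
    vocab_size = len(model["vocab"]) + 1      # +1 class for unseen tokens
    padded = ["<s>"] + list(tokens)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        count = model["bigrams"][(prev, cur)]
        prob = (count + 1) / (model["contexts"][prev] + vocab_size)
        total -= math.log2(prob)
    return total / max(len(tokens), 1)


if __name__ == "__main__":
    model = train_bigram([["x", "=", "x", "+", "1"], ["y", "=", "y", "+", "1"]])
    print(cross_entropy(["x", "=", "x", "+", "1"], model))
    print(cross_entropy(["1", "+", "=", "x", "x"], model))
```

A sequence that resembles the training data should score lower than a shuffled one, which is the intuition behind comparing naturalness across human-written and machine-generated code.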

Using DetectCodeGPT

To evaluate our DetectCodeGPT model:

Navigate to the code-detection directory.
Configure main.py with the appropriate model and dataset paths.
Run the model evaluation script with:
```
python main.py
```
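Detectors that output a score per snippet are often evaluated with the area under the ROC curve (AUROC): the probability that a randomly chosen machine-generated snippet receives a higher score than a randomly chosen human-written one. The sketch below computes it by direct pairwise comparison. Whether main.py reports exactly this metric is an assumption on our part, and the label convention (1 = machine, 0 = human) is chosen here for illustration.

```python
def auroc(scores, labels):
    """Pairwise AUROC; labels are 1 (positive) or 0 (negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.7, 0.4, 0.2]   # hypothetical detector outputs
    labels = [1, 0, 1, 0]           # 1 = machine-generated, 0 = human-written
    print(auroc(scores, labels))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sanity check but slow for large evaluation sets.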

Support

If you encounter any issues or have questions, please feel free to contact us!

We hope that our work will aid in advancing the field of machine learning in code generation and detection. Thank you for your interest in DetectCodeGPT!
