LLM4ModuleSum

This repository contains the code for the paper "Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization".

Overview

The goal of this project is to explore different approaches to generating comments for higher-level code units, such as modules or classes. It compares three strategies:

Full Code: Using the complete code for generating summaries.
Reduced Code: Leveraging compressed or abstracted versions of the code.
Hierarchical Code Summarization: Creating summaries at different levels of granularity.
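
As a rough illustration of the hierarchical strategy, the sketch below summarizes each function on its own, builds each file summary only from its function summaries, and builds the module summary only from the file summaries, so no single call sees the full module source. The `summarize` callable is a hypothetical stand-in for an LLM call, and the input layout (file name mapped to a list of function sources) is an assumption; this is not the repository's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes text, returns a summary.
Summarizer = Callable[[str], str]


def summarize_hierarchically(
    files: Dict[str, List[str]], summarize: Summarizer
) -> str:
    """Bottom-up summarization: functions -> files -> module.

    `files` maps each file name to the source code of its functions.
    """
    file_summaries: List[str] = []
    for file_name, functions in files.items():
        # Level 1: summarize every function on its own.
        function_summaries = [summarize(code) for code in functions]
        # Level 2: summarize the file from its function summaries only.
        file_input = f"File {file_name}:\n" + "\n".join(function_summaries)
        file_summaries.append(summarize(file_input))
    # Level 3: summarize the module from its file summaries only.
    return summarize("Module:\n" + "\n".join(file_summaries))


if __name__ == "__main__":
    calls: List[str] = []

    def last_line(text: str) -> str:
        """Toy summarizer that records its input and returns the last line."""
        calls.append(text)
        return text.splitlines()[-1]

    module = {
        "A.java": ["int f() { return 1; }", "int g() { return 2; }"],
        "B.java": ["void h() {}"],
    }
    print(summarize_hierarchically(module, last_line))  # void h() {}
    print(len(calls))  # 3 function calls + 2 file calls + 1 module call = 6
```

Swapping the toy summarizer for a real model call is where the full-code and reduced-code strategies differ: they would send the whole (or compressed) source in one request instead of walking the hierarchy.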

Installation

Clone the repository:

```
git clone https://github.com/wssun/LLM4ModuleSum.git
```

Navigate into the project directory:
```
cd LLM4ModuleSum
```
Install the required packages:
```
pip install -r requirements.txt
```

Usage

Data Processing

The scripts located at the root directory are primarily for data processing tasks, including:

caseselect_file.py: Selects specific cases from the dataset.
commpress.py: Compresses the Java code by removing unnecessary comments and whitespace.
slicebycommunity.py: Slices the code based on community detection.
slicebyfunction.py: Slices the code by individual functions.
understand.py: Analyzes and processes the code using SciTools Understand, a static code analysis tool.
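
The compression step described above (removing unnecessary comments and whitespace) can be sketched as a single regular-expression scan. This is a minimal illustration under stated assumptions, not the script's actual code: the scanner consumes a string or character literal as soon as it reaches the opening quote, so `//` inside a string survives; comments are dropped; and every whitespace run collapses to one space. Java text blocks are not handled specially.

```python
import re
from typing import List

# Everything the compressor must recognize. Because the scanner consumes a
# literal starting at its opening quote, comment markers inside strings are
# never treated as comments (and quotes inside comments are never literals).
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'   # string literal (escapes allowed)
    r"|'(?:\\.|[^'\\])*'"  # character literal
    r"|//[^\n]*"           # line comment
    r"|/\*.*?\*/"          # block or Javadoc comment
    r"|\s+",               # run of whitespace
    re.DOTALL,
)


def compress_java(source: str) -> str:
    """Drop comments and collapse whitespace, leaving literals untouched."""
    out: List[str] = []
    pos = 0
    for match in TOKEN_RE.finditer(source):
        if match.start() > pos:
            out.append(source[pos:match.start()])  # ordinary code between tokens
        token = match.group(0)
        if token[0] in "\"'":
            out.append(token)  # keep literals exactly as written
        elif out and out[-1] != " ":
            out.append(" ")  # a comment or whitespace run becomes one separator
        pos = match.end()
    out.append(source[pos:])
    return "".join(out).strip()


print(compress_java('int x = 1; // set x\n/* done */ String s = "a // b";'))
# prints: int x = 1; String s = "a // b";
```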

Comment Generation

The scripts used to interact with large language models (LLMs) for comment generation are organized into two levels:

File Level (Located in `file_level/`)

run_all.py: Runs the full process for generating comments on all files.
run_compression.py: Applies compression techniques before generating comments.
run_fuc.py: Focuses on function-level summarization.
run_fuc_summary.py: Generates summaries for each function within the code files.
run_point.py: Extracts key points from the code for summarization.
run_point_summary.py: Generates summaries based on key points extracted from the code.
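
Per-function summarization presupposes that each file has first been split into its methods. One naive way to do that, shown here as an assumption rather than the repository's approach, is to track brace depth and cut out every brace-delimited member declared directly in the class body. It assumes no braces appear inside comments or string literals, it also picks up nested classes and initializer blocks, and it mis-slices field initializers that contain braces (such as array initializers).

```python
from typing import List


def slice_members(class_source: str) -> List[str]:
    """Return each brace-delimited member declared directly in the class body.

    Assumes `class_source` holds one top-level class and that no braces
    appear inside comments or string/char literals.
    """
    members: List[str] = []
    depth = 0
    start = 0  # index where the current member's header begins
    for i, ch in enumerate(class_source):
        if ch == "{":
            depth += 1
            if depth == 1:
                start = i + 1  # class body opened; first member starts here
        elif ch == "}":
            if depth == 2:
                # A member body just closed: keep its header and body.
                members.append(class_source[start:i + 1].strip())
                start = i + 1
            depth -= 1
        elif ch == ";" and depth == 1:
            start = i + 1  # a field declaration ended; skip past it
    return members


if __name__ == "__main__":
    src = ("public class Counter { private int n; "
           "public int get() { return n; } "
           "public void add(int k) { if (k > 0) { n += k; } } }")
    for member in slice_members(src):
        print(member)
    # public int get() { return n; }
    # public void add(int k) { if (k > 0) { n += k; } }
```

Each returned member could then be sent to the model separately, and the resulting summaries combined into a file-level summary.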

Module Level (Located in `module_level/`)

find_package.py: Identifies and processes package-level information in the code.
module_summary.py: Generates summaries at the module level.
module_summary_prompt.py: Provides prompts for LLMs to generate module-level summaries.
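
For Java, one plausible reading of "package-level information" is grouping source files by their `package` declaration, so that each package can be summarized as one module. The sketch below assumes that interpretation; the input format (a mapping from file path to source text) is hypothetical and not taken from `find_package.py`.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches a declaration such as `package com.example.util;` at the start of a line.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)


def group_by_package(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each package name to the paths of the files that declare it.

    Files without a package declaration belong to Java's unnamed (default)
    package, represented here by the empty string. A commented-out
    `package` line would also match, so strip comments first if needed.
    """
    packages: Dict[str, List[str]] = defaultdict(list)
    for path, source in files.items():
        match = PACKAGE_RE.search(source)
        packages[match.group(1) if match else ""].append(path)
    return dict(packages)


if __name__ == "__main__":
    files = {
        "src/com/x/A.java": "package com.x;\nclass A {}",
        "src/com/x/B.java": "// header\npackage com.x;\nclass B {}",
        "Main.java": "class Main {}",
    }
    print(group_by_package(files))
    # {'com.x': ['src/com/x/A.java', 'src/com/x/B.java'], '': ['Main.java']}
```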

Example Command

To process files and generate comments:

```
python file_level/run_all.py
```

Project Structure

```
LLM4ModuleSum/
│
├── data/                              # Directory containing JSON and CSV files
├── file_level/                        # File-level comment generation scripts
│   ├── run_all.py
│   ├── run_compression.py
│   ├── run_fuc.py
│   ├── run_fuc_summary.py
│   ├── run_point.py
│   └── run_point_summary.py
├── module_level/                      # Module-level comment generation scripts
│   ├── find_package.py
│   ├── module_summary.py
│   └── module_summary_prompt.py
├── caseselect_file.py                 # Script to select specific cases
├── commpress.py                       # Script to compress Java code
├── slicebycommunity.py                # Script to slice code by community
├── slicebyfunction.py                 # Script to slice code by function
├── understand.py                      # Code analysis script using SciTools Understand
├── requirements.txt                   # Required Python packages
├── LICENSE                            # License file
└── README.md                          # Readme file
```

License

This project is licensed under the MIT License. See the LICENSE file for details.
