This repository contains the code for the paper "Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization".
The goal of this project is to explore different approaches to generating comments for higher-level code units such as modules or classes. It covers three ways of presenting the code to the model:
- Full Code: Using the complete code for generating summaries.
- Reduced Code: Leveraging compressed or abstracted versions of the code.
- Hierarchical Code Summarization: Creating summaries at different levels of granularity.
- Clone the repository:
git clone https://github.com/wssun/LLM4ModuleSum.git
- Navigate into the project directory:
cd LLM4ModuleSum
- Install the required packages:
pip install -r requirements.txt
The scripts in the root directory mainly handle data processing:
- caseslect_file.py: Selects specific cases from the dataset.
- compress.py: Compresses the Java code by removing unnecessary comments and whitespace.
- slicebycommunity.py: Slices the code based on community detection.
- slicebyfunction.py: Slices the code by individual functions.
- understand.py: Analyzes and processes the code using the 'understand' tool.
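As an illustration of the kind of reduction compress.py describes, the following is a minimal sketch, not the repository's implementation. The function names `strip_java_comments` and `compress_java` are hypothetical. It removes Java line and block comments while leaving string and character literals untouched, then trims each line and drops blank ones. It assumes no Java text blocks (`"""`) appear in the input.

```python
import re

# Alternatives are tried left to right at each position, so a literal that
# contains "//" or "/*" is consumed as a literal before it can be mistaken
# for a comment.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|/\*.*?\*/"             # block or Javadoc comment
    r"|//[^\n]*",             # line comment
    re.DOTALL,
)


def strip_java_comments(source: str) -> str:
    """Replace every comment with a single space; keep literals as they are."""
    def replace(match: re.Match) -> str:
        text = match.group(0)
        # Only comments start with "/"; literals start with a quote.
        return " " if text.startswith("/") else text
    return _TOKEN.sub(replace, source)


def compress_java(source: str) -> str:
    """Strip comments, trim each line, and drop lines that end up empty."""
    stripped = (line.strip() for line in strip_java_comments(source).splitlines())
    return "\n".join(line for line in stripped if line)


if __name__ == "__main__":
    example = (
        "class A {\n"
        "    // greeting\n"
        '    String s = "http://x"; /* note */\n'
        "\n"
        "    int y = 1;\n"
        "}\n"
    )
    print(compress_java(example))
```

Whitespace inside a line is deliberately left alone, because collapsing it with a plain regular expression would also alter the contents of string literals.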
The scripts that call large language models (LLMs) to generate comments are organized in two levels, file level (file_level/) and module level (module_level/):
- run_all.py: Runs the full process for generating comments on all files.
- run_compression.py: Applies compression techniques before generating comments.
- run_fuc.py: Focuses on function-level summarization.
- run_fuc_summary.py: Generates summaries for each function within the code files.
- run_point.py: Extracts key points from the code for summarization.
- run_point_summary.py: Generates summaries based on key points extracted from the code.
- find_package.py: Identifies and processes package-level information in the code.
- module_summary.py: Generates summaries at the module level.
- module_summary_prompt.py: Provides prompts for LLMs to generate module-level summaries.
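The relationship between function-level, file-level, and module-level summaries can be pictured with a short sketch. This is an assumption about how a hierarchical roll-up might work, not a description of what the scripts above actually do. The names `summarize_file`, `summarize_module`, and `fake_llm`, as well as the prompt wording, are hypothetical. The model is passed in as a plain callable so the example runs without any API access.

```python
from typing import Callable, Dict, List

# Any function that maps a prompt string to a response string.
Summarizer = Callable[[str], str]


def summarize_file(functions: List[str], llm: Summarizer) -> str:
    """Summarize each function, then summarize the file from those summaries."""
    function_summaries = [
        llm("Summarize this function:\n" + code) for code in functions
    ]
    bullets = "\n".join("- " + summary for summary in function_summaries)
    return llm("Summarize this file from its function summaries:\n" + bullets)


def summarize_module(files: Dict[str, List[str]], llm: Summarizer) -> str:
    """Summarize every file, then summarize the module from the file summaries."""
    file_summaries = {
        path: summarize_file(funcs, llm) for path, funcs in files.items()
    }
    bullets = "\n".join(
        f"- {path}: {summary}" for path, summary in sorted(file_summaries.items())
    )
    return llm("Summarize this module from its file summaries:\n" + bullets)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: reports how many lines the prompt had."""
    return f"summary({len(prompt.splitlines())} lines)"


if __name__ == "__main__":
    module = {
        "A.java": ["int f() { return 1; }", "int g() { return 2; }"],
        "B.java": ["void h() {}"],
    }
    print(summarize_module(module, fake_llm))
```

With this layout the model never sees the full module source at once; each call receives either one function or a short list of lower-level summaries, which is the main trade-off against the full-code approach.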
To process files and generate comments:
python file_level/run_all.py
LLM4ModuleSum/
│
├── data/ # Directory containing JSON and CSV files
├── file_level/ # File-level comment generation scripts
│ ├── run_all.py
│ ├── run_compression.py
│ ├── run_fuc.py
│ ├── run_fuc_summary.py
│ ├── run_point.py
│ └── run_point_summary.py
├── module_level/ # Module-level comment generation scripts
│ ├── find_package.py
│ ├── module_summary.py
│ └── module_summary_prompt.py
├── caseslect_file.py # Script to select specific cases
├── compress.py # Script to compress Java code
├── slicebycommunity.py # Script to slice code by community
├── slicebyfunction.py # Script to slice code by function
├── understand.py # Code analysis script using 'understand' tool
├── requirements.txt # Required Python packages
└── README.md # Readme file
This project is licensed under the MIT License. See the LICENSE file for details.