The source code of "Machine learning code snippets semantic classification" (Valeriy Berezovskiy, Anastasia Gorodilova, Ekaterina Trofimova, Andrey Ustyuzhanin) paper.

vorobeevich/ml-snippets-classification

Machine learning code snippets semantic classification

This repository contains the source code of experiments from the paper "Machine learning code snippets semantic classification" (Valeriy Berezovskiy, Anastasia Gorodilova, Ekaterina Trofimova, Andrey Ustyuzhanin).

Preparation

Start by cloning the repository:

git clone https://github.com/vorobeevich/ml-snippets-classification

We highly recommend using conda (Anaconda) to manage the environment for the experiments.

After installing conda, create and activate a new environment:

conda create --name cssc

conda activate cssc

Install the libraries listed in requirements.txt. The appropriate Torch version may differ depending on your GPU; see the PyTorch "Start Locally" guide.
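After installing, it can help to confirm that the environment actually contains what requirements.txt lists. The following is a minimal sketch, not part of the repository: the helper names `parse_requirement_names` and `installed_versions` are hypothetical, and the parser assumes simple lines such as `name==version` (it does not handle URLs or nested requirement files).

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Dict, List, Optional


def parse_requirement_names(text: str) -> List[str]:
    """Extract bare package names from requirements-style text (hypothetical helper)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like "-r"
            continue
        # The name ends at the first character that cannot be part of a package name
        # (version operator, extras bracket, marker separator, or space).
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    requirements = Path("requirements.txt")
    if requirements.exists():
        names = parse_requirement_names(requirements.read_text())
        for name, version in installed_versions(names).items():
            print(f"{name}: {version or 'NOT INSTALLED'}")
```

The check only reports whether a package is present and which version is installed; it does not verify that the installed Torch build matches your GPU.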

Data

Download the annotated data (7947 snippets) and the output of the partition algorithm from our Google Drive. From the repository root, make the download script executable and run it:

chmod +x src/scripts/load_data.sh

./src/scripts/load_data.sh

The full version of the Code4ML dataset (the annotated data, the complete set of 2.5 million snippets, and our model's predictions on all of the data) is available on Zenodo via its DOI.

The dataset is described in the paper "Code4ML: a Large-scale Dataset of annotated Machine Learning Code".

Usage

To reproduce any experiment from our paper, run the training script with the corresponding config. Note that results are not fully deterministic across platforms, even with a fixed random seed, because of the behavior of libraries such as torch.

python src/scripts/train.py --device [ID OF CUDA DEVICE] --config src/configs/[CHOOSE CONFIG TO RUN]
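To run several experiments in a row, the command above can be generated once per config. The sketch below is an illustration rather than part of the repository: `build_train_commands` is a hypothetical helper, it assumes the configs are plain files located directly in src/configs, and it assumes the device is given as a CUDA device ID such as `0`. Only the `--device` and `--config` flags shown above are used.

```python
import sys
from pathlib import Path
from typing import List


def build_train_commands(config_dir: str, device: str) -> List[List[str]]:
    """Return one train.py invocation per config file in config_dir (hypothetical helper)."""
    commands = []
    for config in sorted(Path(config_dir).iterdir()):
        if not config.is_file():
            continue  # ignore subdirectories
        commands.append([
            sys.executable, "src/scripts/train.py",
            "--device", device,
            "--config", config.as_posix(),
        ])
    return commands


if __name__ == "__main__":
    # Assumptions: run from the repository root, training on CUDA device 0.
    if Path("src/configs").is_dir():
        for command in build_train_commands("src/configs", device="0"):
            # Dry run: only print each command.
            print(" ".join(command))
```

Each returned command is a list of arguments, so it can be passed to `subprocess.run(command, check=True)` to launch the corresponding training run.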
