Skip to content
Smelling smells using Deep Learning
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Smelling smells using Deep Learning

We have two objectives to carry out this research. The first objective is to explore the feasibility of applying deep learning models to detect smells without extensive feature engineering, just by feeding the source code elements in tokenized form. Another goal of the work is to investigate the possibility of applying transfer-learning in the context of deep learning models for smell detection by investigating transferability of results between programming languages.

This repository contains supporting material of the experiment.


The figure below provides an overview of the experiment. We download 1,072 C# and 100 Java repositories from GitHub. We use Designite and DesigniteJava to analyze C# and Java code respectively. We use CodeSplit to extract each method and class definition into separate files from C# and Java programs. Then the learning data generator uses the detected smells to bifurcate code fragments into positive or negative samples for a smell - positive samples contains the smell while the negative samples are free from that smell. We apply preprocessing operations on these samples such as removing duplicates and feed the output to Tokenizer. Tokenizer takes a method or class definition and generates integer tokens for each token in the source code. The output of Tokenizer is ready to feed to neural networks.

Overview of the study

What this repository contains


  • Source code for data curation (analyzing source code and detect smells, split the code fragments (method and classes) into separate files, generating positive and negative samples using the detected smells, and tokenize them).
  • Python implementation of our CNN and RNN architecture that we used to detect smells including reading data, preprocess, and passing them to the deep learning models.


  • Detected smells data for both C# and Java
  • Code fragments for both C# and Java
  • Generated positive and negative samples for each considered code smell.
  • Tokenized samples in one and two dimensions
You can’t perform that action at this time.