Flattened Abstract Syntax Trees
You can of course run fast on your own machine as a Docker container, but here you don't even need that: all the binary and Python dependencies are provided, including the trained models and the pre-trained embeddings.
To reproduce the results, all you need to do is allow the GitPod app to access your GitHub account, so that the commands can run on a remote server of your own.
- Use of flattened Abstract Syntax Trees in Deep Learning on your own GitPod server
Usage of fAST in Deep Learning for Algorithm Classification
Example algorithm implementations in Java and C++ are provided for testing the deep-learning algorithm classification tool. Once your GitPod machine is running, it launches the following command:
You will see the predicted probabilistic distribution of the class labels: the correctly classified label will be shown in blue, and the misclassified label will be shown in red.
To understand why, click on the HTML file "datasets/github_java_10/4/1.html" and use the Preview button in the top-right corner of the tab to see the visualisation in a split pane. The colours on the tokens indicate which parts of the code received the most attention from the classification algorithm.
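The idea behind such a token-level view can be sketched as follows. This is a minimal illustration only, not the tool's actual renderer: the function name `attention_to_html`, the red colour scale, and the sample tokens and weights are all assumptions made for the example.

```python
import html


def attention_to_html(tokens, weights):
    """Wrap each token in a <span> whose background opacity tracks its weight."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    top = max(weights, default=0.0)
    spans = []
    for token, weight in zip(tokens, weights):
        # Scale to [0, 1] so the most-attended token gets the strongest colour.
        alpha = weight / top if top > 0 else 0.0
        spans.append(
            f'<span style="background: rgba(255, 0, 0, {alpha:.2f})">'
            f"{html.escape(token)}</span>"
        )
    return " ".join(spans)


# Hypothetical tokens and attention scores, for illustration only.
tokens = ["for", "(", "i", "<", "n", ")"]
weights = [0.40, 0.05, 0.10, 0.20, 0.20, 0.05]
print(attention_to_html(tokens, weights))
```

Normalising by the largest weight keeps the colour contrast visible even when the raw attention scores are all small.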
To run another example, type:
run.sh datasets/github_java_10/4/3.java
run.sh datasets/github_cs_10/4/1.cs
run.sh datasets/github_cpp_10/4/1.cpp
These examples show that even though the model was trained on Java programs, it normally also works well when applied to other programming languages such as C# or C++. We call this feature "Cross-Language Algorithm Classification" [Bui et al. SANER'19].
Usage of the fAST utility
cd datasets

# print the command line options and arguments
fast

# convert a C++ code into protobuffer representation
fast tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc.pb

# convert a Java code into flatbuffers representation
fast RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java.fbs

# convert a flatbuffers representation back to C#
fast corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs.fbs corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs

# slice a program
fast -S -G RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests-ggnn.fbs

# diff two programs
fast -D github_java_10/4/1.java github_java_10/4/3.java
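In the conversion examples above, the output file name is the input file name plus a ".pb" or ".fbs" suffix. A small helper can build such command lines for scripting. This is only a sketch: the helper name `fast_convert_command` is hypothetical, and treating the suffix as what selects the output format is an assumption drawn from the examples, not documented behaviour.

```python
import shlex
from pathlib import PurePosixPath

# Suffixes taken from the examples above; treating the suffix as the
# format selector is an assumption of this sketch.
SUFFIX = {"protobuf": ".pb", "flatbuffers": ".fbs"}


def fast_convert_command(source, fmt):
    """Return the argv list for `fast <source> <source + suffix>`."""
    if fmt not in SUFFIX:
        raise ValueError(f"unknown format: {fmt!r}")
    src = PurePosixPath(source)
    target = src.with_name(src.name + SUFFIX[fmt])
    return ["fast", str(src), str(target)]


# Only builds and prints the command; nothing is executed here.
cmd = fast_convert_command("github_java_10/4/1.java", "flatbuffers")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where `fast` is on the PATH.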
Usage of fAST in Bug Localisation
cd usr/bin
java -cp /workspace/demo/usr/config:/workspace/demo/usr/config/lic:/workspace/demo/usr/lib/ConCodeSe-1.0.0.jar com.concodese.ConCodeSeJettyServerStarter SERVER_PORT=8081
You can call fAST anywhere when you have docker installed:
alias fast='docker run -v $PWD:/e yijun/fast'
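When calling fast from a script rather than an interactive shell, the same Docker invocation can be assembled programmatically. The sketch below only mirrors the alias (mounting a host directory at /e in the yijun/fast image); the helper name `docker_fast_argv` and the sample file names are hypothetical.

```python
import os
import shlex


def docker_fast_argv(fast_args, workdir=None):
    """Build the `docker run` argv that mounts workdir at /e, as in the alias."""
    # Resolve the directory at call time so the mount follows the caller.
    host_dir = os.path.abspath(workdir or os.getcwd())
    return ["docker", "run", "-v", f"{host_dir}:/e", "yijun/fast", *fast_args]


# Only prints the command line; pass the list to subprocess.run to execute it.
print(shlex.join(docker_fast_argv(["-D", "1.java", "3.java"])))
```

Resolving the directory each time the function is called matches the intent of running fast from any working directory.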
Reference and Applications
Yijun Yu. "fAST: Flattening Abstract Syntax Trees for Efficiency". In: 41st ACM/IEEE International Conference on Software Engineering, 25-31 May 2019, Montreal, Canada, ACM and IEEE. demo, paper, poster
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Learning Cross-Language API Mappings with Little Knowledge", In the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Tallinn, Estonia, 26-30 August 2019.
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification", In the 26th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering, Research Track, Hangzhou, China, February 24-27, 2019. GGNN, DTBCNN
Nghi D. Q. Bui, Lingxiao Jiang, and Yijun Yu. "Cross-Language Learning for Program Classification Using Bilateral Tree-Based Convolutional Neural Networks", In the proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) Workshop on NLP for Software Engineering, New Orleans, Louisiana, USA, 2018. Bi-TBCNN
Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel. "Gated graph sequence neural networks", In: 4th International Conference on Learning Representations (ICLR), 2016.
Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin: "Convolutional Neural Networks over Tree Structures for Programming Language Processing". In: AAAI 2016: 1287-1293. TBCNN, datasets/pku_cpp_104/
M. L. Collard and J. I. Maletic, "srcML 1.0: Explore, Analyze, and Manipulate Source Code," 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), Raleigh, NC, 2016, pp. 649-649. srcML
Hakam W. Alomari, Michael L. Collard, Jonathan I. Maletic, Nouh Alhindawi and Omar Meqdadi. “srcSlice: very efficient and scalable forward static slicing”. Software: Evolution and Process, 26(11):931-961, November 2014.
Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. "Fine-grained and accurate source code differencing". In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering (ASE '14). ACM, New York, NY, USA, 313-324. GumTreeDiff
Yijun Yu, Thein Thun Tun, and Bashar Nuseibeh, "Specifying and detecting meaningful changes in programs," In: Proc. of the 26th IEEE/ACM Conference on Automated Software Engineering, pp. 273-282, 2011. MCT