Flattened Abstract Syntax Trees
You can of course run fAST on your own machine as a Docker container, but here you don't even need that: all binary and Python dependencies are provided, including the trained models and the pre-trained embeddings.
To reproduce the results, all you need to do is allow the GitPod app to access your GitHub account, so that the commands can run on a remote server of your own.
- Use of flattened Abstract Syntax Trees in Deep Learning on your own GitPod server
Usage of fAST in Deep Learning for Algorithm Classification
Examples of algorithms in Java and C++ are provided to test the algorithm classification deep learning tool. Once your GitPod machine is running, it will launch the following command:
Note: TensorFlow 1.15 is no longer supported by default, so you need to set up an older Python environment that is compatible with this version.
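One way to check whether the current interpreter is old enough is sketched below. It assumes that, among Python 3 releases, TensorFlow 1.15 wheels exist only for Python 3.5 to 3.7; the function name `supports_tf115` is hypothetical and not part of this repository.

```python
import sys

# Assumption: among Python 3 releases, TensorFlow 1.15 wheels
# are published only for Python 3.5, 3.6 and 3.7.
SUPPORTED_MINORS = {(3, 5), (3, 6), (3, 7)}


def supports_tf115(version_info=sys.version_info):
    """Return True if the given interpreter version is in the assumed supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    if supports_tf115():
        print("This interpreter should be able to install tensorflow==1.15")
    else:
        print("Python %d.%d is outside the assumed range; "
              "create a Python 3.7 environment first" % sys.version_info[:2])
```

If the check fails, creating a separate Python 3.7 environment (for example with conda or pyenv) before installing `tensorflow==1.15` is one option.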
You will see the predicted probability distribution over the class labels: a correctly classified label is shown in blue, and a misclassified label is shown in red.
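As an illustration of this kind of output, the sketch below turns raw class scores into a probability distribution with a softmax and colours the predicted label blue or red using ANSI escape codes. The function names, the label names and the scores are hypothetical; the tool's actual output format may differ.

```python
import math

# ANSI escape codes for terminal colours.
BLUE, RED, RESET = "\033[94m", "\033[91m", "\033[0m"


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def report(scores, labels, true_label):
    """Return (predicted label, probabilities, printable coloured report)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    predicted = labels[best]
    # Blue when the prediction matches the true label, red otherwise.
    colour = BLUE if predicted == true_label else RED
    lines = ["%s: %.3f" % (label, p) for label, p in zip(labels, probs)]
    lines.append("%spredicted: %s%s" % (colour, predicted, RESET))
    return predicted, probs, "\n".join(lines)
```

For example, `print(report([2.0, 0.5, 0.1], ["bfs", "dfs", "quicksort"], "bfs")[2])` lists the three probabilities and prints the predicted label in blue.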
To understand why a label was predicted, click on the HTML file "datasets/github_java_10/4/1.html" and use the Preview button in the upper-right corner of the tab to see the visualisation results in a split pane. The colours on the tokens indicate which parts of the code received the most attention from the classification algorithm.
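The colouring idea can be illustrated with a minimal sketch that maps per-token attention weights to background opacity in HTML. The function `attention_to_html`, the token list and the weights are hypothetical; the HTML files shipped in `datasets/` may be produced differently.

```python
import html


def attention_to_html(tokens, weights):
    """Render tokens as HTML spans whose red background grows with attention.

    Assumes non-negative weights, one per token.
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Scale by the largest weight so the most attended token is fully opaque.
    top = max(weights, default=0.0) or 1.0
    spans = []
    for token, weight in zip(tokens, weights):
        alpha = weight / top
        spans.append('<span style="background: rgba(255, 0, 0, %.2f)">%s</span>'
                     % (alpha, html.escape(token)))
    return "<pre>" + " ".join(spans) + "</pre>"
```

Writing the returned string to a `.html` file and opening it in the preview pane shows the most attended tokens with the strongest highlight.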
To run other examples, type:
run.sh datasets/github_java_10/4/3.java
run.sh datasets/github_cs_10/4/1.cs
run.sh datasets/github_cpp_10/4/1.cpp
These examples show that even though the model was trained on Java programs, it usually works well on other programming languages such as C# or C++ too. We call this feature "Cross-Language Algorithm Classification" [Bui et al., SANER'19].
Usage of the fAST utility
cd datasets
# print the command-line options and arguments
fast
# convert C++ code into a protobuf representation
fast tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc.pb
# convert Java code into a flatbuffers representation
fast RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java.fbs
# convert a flatbuffers representation back to C#
fast corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs.fbs corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs
# slice a program
fast -S -G RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests-ggnn.fbs
# diff two programs
fast -D github_java_10/4/1.java github_java_10/4/3.java
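To convert many files at once, the two-argument form shown above (`fast <input> <output>`) can be generated for a whole directory tree. The sketch below only builds the command lists and does not run them. The helper `conversion_commands` and the set of source extensions are assumptions for illustration, as is the idea that the output extension (`.fbs` or `.pb`) selects the format, which is inferred from the examples above.

```python
import os

# Assumption: these are the source extensions worth converting.
SOURCE_EXTENSIONS = (".java", ".cs", ".cc", ".cpp")


def conversion_commands(root, out_ext=".fbs"):
    """Yield ['fast', input, output] for every source file under root.

    out_ext is appended to the input path, e.g. A.java -> A.java.fbs.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.endswith(SOURCE_EXTENSIONS):
                source = os.path.join(dirpath, name)
                yield ["fast", source, source + out_ext]
```

Each yielded list can then be passed to `subprocess.run(cmd, check=True)` on a machine where `fast` is on the `PATH`.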
Usage of fAST in Bug Localisation
cd usr/bin
java -cp /workspace/demo/usr/config:/workspace/demo/usr/config/lic:/workspace/demo/usr/lib/ConCodeSe-1.0.0.jar com.concodese.ConCodeSeJettyServerStarter SERVER_PORT=8081
You can call fAST from anywhere once Docker is installed:
alias fast='docker run -v "$PWD":/e yijun/fast'
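Shell aliases are not visible to scripts, so a script that wants the same behaviour has to build the Docker command itself. The sketch below mirrors the alias above by mounting a working directory at `/e`. The helper `docker_fast_command` is hypothetical, and it is an assumption that relative paths passed to the container resolve against `/e`.

```python
import os


def docker_fast_command(args, workdir=None):
    """Build the argument list equivalent to the `fast` alias.

    workdir defaults to the current directory and is mounted at /e.
    """
    workdir = os.path.abspath(workdir or os.getcwd())
    return ["docker", "run", "-v", "%s:/e" % workdir, "yijun/fast"] + list(args)
```

For example, `subprocess.run(docker_fast_command(["1.java", "1.java.fbs"]), check=True)` would run the conversion inside the container.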
Reference and Applications
Yijun Yu. "fAST: Flattening Abstract Syntax Trees for Efficiency". In: 41st ACM/IEEE International Conference on Software Engineering, 25-31 May 2019, Montreal, Canada, ACM and IEEE. demo, paper, poster
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Learning Cross-Language API Mappings with Little Knowledge", In the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Tallinn, Estonia, 26-30 August 2019.
Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification", In the 26th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Research Track, Hangzhou, China, February 24-27, 2019. GGNN, DTBCNN
Nghi D. Q. Bui, Lingxiao Jiang, and Yijun Yu. "Cross-Language Learning for Program Classification Using Bilateral Tree-Based Convolutional Neural Networks", In the proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) Workshop on NLP for Software Engineering, New Orleans, Louisiana, USA, 2018. Bi-TBCNN
Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel. "Gated graph sequence neural networks", In: 4th International Conference on Learning Representations (ICLR), 2016.
Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin: "Convolutional Neural Networks over Tree Structures for Programming Language Processing". In: AAAI 2016: 1287-1293. TBCNN, datasets/pku_cpp_104/
M. L. Collard and J. I. Maletic, "srcML 1.0: Explore, Analyze, and Manipulate Source Code," 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), Raleigh, NC, 2016, pp. 649-649. srcML
Hakam W. Alomari, Michael L. Collard, Jonathan I. Maletic, Nouh Alhindawi and Omar Meqdadi. "srcSlice: very efficient and scalable forward static slicing". Software: Evolution and Process, 26(11):931-961, November 2014.
Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. "Fine-grained and accurate source code differencing". In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering (ASE '14). ACM, New York, NY, USA, 313-324. GumTreeDiff
Yijun Yu, Thein Thun Tun, and Bashar Nuseibeh, "Specifying and detecting meaningful changes in programs," In: Proc. of the 26th IEEE/ACM Conference on Automated Software Engineering, pp. 273-282, 2011. MCT