When Malware is Packin’ Heat; Limits of Machine Learning Classifiers Based on Static Analysis Features
This repository provides datasets and codes that are needed to reproduce the experiments in the paper When Malware is Packin’ Heat; Limits of Machine Learning Classifiers Based on Static Analysis Features.
In this paper, we investigate the following question: does static analysis on packed binaries provide a rich enough set of features to a malware classifier? We first observed that the distribution of packers in the training set must be considered; otherwise, the lack of overlap between the packers used in benign and malicious samples may cause the classifier to distinguish between packing routines instead of behaviors. Contrary to what is commonly assumed, packers do preserve information that is "useful" for malware classification when packing programs; however, such information does not necessarily capture the sample's behavior. In addition, such information does not help the classifier to (1) generalize its knowledge to operate on previously unseen packers, and (2) be robust against trivial adversarial attacks. We observed that static machine-learning-based products on VirusTotal produce a high false positive rate on packed binaries, possibly due to the limitations discussed in this work. This issue becomes magnified as we see a trend in the anti-malware industry toward increasing deployment of machine-learning-based classifiers that rely only on static features.
If you find this work useful for your research, you may want to cite our paper.
TO BE FILLED LATER.
We create and use two datasets in this work, named the Wild Dataset and the Lab Dataset. The former contains executables found in the wild, drawn from two sources: an anti-malware vendor and the EMBER dataset. We created the latter by packing executables from the Wild Dataset with a set of nine packers. We used a wide range of techniques, especially dynamic analysis, to determine whether each sample is (1) benign or malicious and (2) packed or not packed. For details, see Section IV of the paper.
Both datasets are stored in a single pickle file (written with the pandas package). The column names should generally be self-descriptive:
- source: the origin of the sample. wild-ember means the sample has been seen in the wild, by the anti-malware vendor or Endgame; lab means we created the sample by packing a sample from the Wild Dataset.
- packed: whether the sample is packed.
- malicious: whether the sample is malicious.
- packer_name: the packer used to pack the sample; set to none for unpacked samples.
- unpacked_sample_sha1: for samples from the Lab Dataset, the sha1sum of the executable before packing. This can be helpful for some experiments, as it lets us trace back the history of a sample.
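To illustrate the column layout, here is a minimal sketch using a toy DataFrame standing in for the real pickle (the rows and sha1 value below are made up for illustration; the real data is loaded with pandas.read_pickle on the downloaded file):

```python
import pandas as pd

# Toy rows mimicking the documented columns; replace with
# df = pd.read_pickle(<path to the downloaded pickle>) for the real dataset.
df = pd.DataFrame([
    {"source": "wild-ember", "packed": False, "malicious": False,
     "packer_name": "none", "unpacked_sample_sha1": None},
    {"source": "lab", "packed": True, "malicious": True,
     "packer_name": "upx", "unpacked_sample_sha1": "da39a3ee..."},
])

# Packed malicious samples created in the lab:
lab_packed_malware = df[(df.source == "lab") & df.packed & df.malicious]

# For lab samples, unpacked_sample_sha1 points back at the pre-packing executable:
print(lab_packed_malware[["packer_name", "unpacked_sample_sha1"]])
```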
To download the pickle file, run the following command:
mkdir data; cd data/; wget https://drive.google.com/file/d/1PMCHM46mc4lhjMczfIDP45w_LGKTea8q/view?usp=sharing; cd ..;
To download only Wild Dataset, run the following command:
mkdir data; cd data/; wget https://drive.google.com/file/d/1stVX2-APaiH9XvXhVpySMkRmnLqsSCLM/view?usp=sharing; cd ..;
To download the samples themselves, please contact us. We have all the samples on our server, and we are happy to share them with the community; unfortunately, the huge size of the executable files prevented us from sharing them on the cloud. As fully explained in the paper, we used the Cuckoo and Deep Packer Inspection tools to create our datasets. All the files related to this process, including the dynamic behavior of the samples, are available on demand, and we are happy to provide them. We can also provide the VirusTotal reports for all the executables in our datasets.
In order to use our source code in the docker image, you first need to properly install Docker. To download (and load) the docker image that we used for our experiments, run the following command:
wget https://drive.google.com/file/d/1c7lOFLIf4rA2HRqfdRaEvYsbTSRlTjwE/view?usp=sharing; ./load_image.sh packware-docker.tar
Then, run the container:
Now, you can run all the experiments.
You need to execute the following scripts in the code/experiments directory. Roughly speaking, each experiment uses one configuration file (matching config*.py) and the main training script (training.py, or training-nn.py for the neural network).
round specifies how many times we run each experiment. It exists only to collect more consistent results, and it is set to five in our experiments.
We know the code is not very well written, and we are happy to answer any questions.
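To make the role of round concrete, here is a hedged sketch (the function and variable names below are our own, not from the repository) of repeating an experiment several times and aggregating the scores:

```python
import random
import statistics

def run_experiment_once(seed):
    # Stand-in for one training/evaluation run (a config*.py plus training.py);
    # here we just simulate a noisy accuracy score around 0.90.
    random.seed(seed)
    return 0.90 + random.uniform(-0.02, 0.02)

ROUNDS = 5  # the paper's experiments use round = 5

scores = [run_experiment_once(seed) for seed in range(ROUNDS)]
print(f"mean accuracy over {ROUNDS} rounds: {statistics.mean(scores):.3f} "
      f"(stdev {statistics.stdev(scores):.3f})")
```

Averaging over rounds smooths out run-to-run variance from random initialization and data shuffling.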
Run the following commands for the experiments in the paper (in the same order as in the paper).
To run Experiment "packer classifier", run:
To run Experiment "good-bad packers", run:
./exp_diffPackedBenign.sh
./exp_diffPackedBenignNN.sh # for neural network
./exp_labDiffPackedBenign.sh
./exp_labDiffPackedBenignNN.sh # for neural network
./exp_singlepacker.sh
./exp_singlepacker-onlyapiimport.sh
./exp_singlepacker-onlyheader.sh
./exp_singlepacker-onlyrich.sh
./exp_singlepacker-onlysections.sh
./exp_wildvspacker.sh
./exp_wildvspacker-rich.sh
./exp_wildvspacker-nn.sh # for neural network
./exp_withheldpacker.sh
./exp_withheldpacker-nongrams.sh
./exp_withheldpacker-nn.sh # for neural network
./exp_dolphin.sh # Strong & Complete Encryption
For the adversarial experiment, use