Generating Adversarial Malware Examples for White-Box Attacks Based on GAN

zabir-nabil/whitebox-attack-malware-GAN

Whitebox/Graybox Attack with Malware GAN

MalGAN

What are malwares, GANs, whitebox attacks❓

Malware (a portmanteau for malicious software) is any software intentionally designed to cause damage to a computer, server, client, or computer network. - wiki

  • Malware detection systems use machine learning models (both in antivirus software and in the cloud) to analyze static DLLs, API features, etc. to detect malware.

  • But it is possible to fool these machine learning models with adversarial attacks (generating malware that looks like a benign sample to the model).

That's where GANs become useful.

A generative adversarial network is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game. Given a training set, this technique learns to generate new data with the same statistics as the training set. - wiki

  • It is possible to design a GAN-based algorithm that generates adversarial malware examples able to bypass black-box, machine-learning-based detection models.

White-box 🔲 vs Black-box 🔳

In an ideal lab scenario, we have access to the machine learning model that detects malware (say, we design an MLP classifier that analyzes boolean features from an API and makes a prediction). If we have full, direct access to the detection model while training the GAN, we can use the MLP's predictions during optimization to produce robust adversarial examples. This is the white-box setup.
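To make "direct access" concrete, here is a minimal sketch under loud assumptions. It is not the repository's GAN: the generator is replaced by a greedy, gradient-guided search, and the MLP by a toy logistic-regression detector whose weights are invented for illustration. It also assumes features may only be switched on (0 to 1), so that the original behaviour of the sample is preserved. It only shows what white-box access provides: the attacker can read the gradient of the detector's score with respect to each boolean feature.

```python
import math

# Hypothetical detector: logistic regression over 6 boolean API features.
# Weights and bias are invented for illustration, not taken from the repository.
WEIGHTS = [2.0, 1.5, -1.0, -2.5, 0.5, -1.5]
BIAS = -1.0


def malicious_score(x):
    """Probability that feature vector x is malware according to the toy detector."""
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(x):
    """Gradient of the score w.r.t. each input; readable only with white-box access."""
    p = malicious_score(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def whitebox_evade(x, threshold=0.5):
    """Greedily switch on absent features whose gradient lowers the score the most."""
    x = list(x)
    while malicious_score(x) >= threshold:
        grad = input_gradient(x)
        # Assumption: only additions (0 -> 1) are allowed, never removals.
        candidates = [i for i, xi in enumerate(x) if xi == 0 and grad[i] < 0]
        if not candidates:
            break  # no remaining addition reduces the score
        best = min(candidates, key=lambda i: grad[i])
        x[best] = 1
    return x


malware = [1, 1, 0, 0, 1, 0]
adversarial = whitebox_evade(malware)
print(malicious_score(malware), malicious_score(adversarial), adversarial)
```

In the GAN setting, the same gradient signal would flow back into a generator network instead of driving a hand-written search.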

But in the real world, we may not have direct access to the detection model. The model can then be treated as a black box, and an alternate model can be used for generating the adversarial examples.
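The alternate-model idea can be sketched as follows. Everything here is hypothetical: `blackbox_label` stands in for a detector that returns only a 0/1 verdict, and the alternate model is a simple perceptron rather than whatever architecture the repository uses. The attacker queries the black box, fits the alternate model to the returned verdicts, and then has a model it can inspect freely.

```python
from itertools import product


def blackbox_label(x):
    """Stand-in for a remote detector: returns only a 0/1 verdict, no internals."""
    hidden = 2 * x[0] + x[1] - 2 * x[2] - x[3] - 0.5  # rule unknown to the attacker
    return 1 if hidden > 0 else 0


def substitute_label(model, x):
    """Verdict of the attacker's own alternate (substitute) model."""
    weights, bias = model
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def train_substitute(samples, labels, max_epochs=1000):
    """Fit a perceptron to the black box's verdicts using only query results."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            step = y - substitute_label((weights, bias), x)
            if step != 0:
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
                mistakes += 1
        if mistakes == 0:
            break  # the substitute reproduces every queried verdict
    return weights, bias


# Query the black box on every 4-bit feature vector (feasible only for a toy input size).
queries = [list(bits) for bits in product([0, 1], repeat=4)]
verdicts = [blackbox_label(x) for x in queries]
model = train_substitute(queries, verdicts)
agreement = sum(substitute_label(model, x) == y for x, y in zip(queries, verdicts)) / len(queries)
print(agreement)
```

Adversarial examples are then crafted against the alternate model, in the hope that they transfer to the real detector.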

https://www.researchgate.net/publication/337296034_Improving_the_Reliability_of_Deep_Neural_Networks_in_NLP_A_Review

Download dataset

  • gdown https://drive.google.com/uc?id=1PwsY_T0MT4Mbk6g70l-jMpZrA7XweHEZ
  • gdown https://drive.google.com/uc?id=1sz12ejCuV9_yEzVI7qhRfUeTu7b4bsXO

Installation

  • With docker -

    • Build the docker image: docker build .
    • nvidia-docker run -it -d -v /home/:/malgan --net=host d1cbaadbc4ea /bin/bash
  • Without docker

    • Make sure you have the NVIDIA driver and CUDA >= 9.2 installed (for GPU support)
    • pip install -r requirements.txt

📝📝

  • Dataset
  • Publish the synthetic data generator