How to run your code with SLEIPNIR dataset #4

Closed
vietvo89 opened this issue Mar 23, 2021 · 5 comments

Comments

@vietvo89

Hi Zay

I obtained the SLEIPNIR dataset from its author. However, your sample code uses a data format different from that of the SLEIPNIR dataset, which consists of several individual files. How can I run your MalGAN with the SLEIPNIR dataset?

Thanks

@ZaydH
Owner

ZaydH commented Mar 23, 2021

It has been a few years since I worked on this code, and I am going off of memory.

The basic idea is you need to convert the SLEIPNIR files into a NumPy ndarray tensor. I found the old code I believe I used and uploaded it to a gist for you. Please try that. You may need to modify it to make it work.
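As a rough illustration of that conversion step (this is not the gist mentioned above), here is a minimal sketch. It assumes each file holds one sample as whitespace-separated 0/1 feature values in plain text; the actual SLEIPNIR file layout may differ, and the function name, the `*.txt` pattern, and the paths in the usage note are all hypothetical.

```python
from pathlib import Path

import numpy as np


def load_feature_files(data_dir, pattern="*.txt"):
    """Stack one feature vector per file into a 2-D array (samples x features)."""
    # Sort the paths so the row order is reproducible across runs.
    paths = sorted(Path(data_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {data_dir}")
    # ASSUMPTION: each file is plain text with whitespace-separated 0/1 values.
    rows = [np.loadtxt(p, dtype=np.float32).ravel() for p in paths]
    lengths = {row.shape[0] for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent feature counts: {sorted(lengths)}")
    return np.stack(rows)
```

The resulting array could then be written to disk with something like `np.save("malware.npy", load_feature_files("sleipnir/malware"))`, where both paths are placeholders.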

@vietvo89
Author

Thank you so much. Let me try your code. One more thing: if I train MalGAN and have a model, how can I use your code to generate malware and evaluate the success rate of your method against the black-box detector? Is it correct to use only the trained Generator to produce benign-looking samples from malware?

@ZaydH
Owner

ZaydH commented Mar 24, 2021

I am not sure exactly what you mean, so I will answer based on my best guess. If this is off base, let me know.

The MalwareGAN code serially trains a blackbox detector (you can specify the type) as well as the GAN. I am not sure what you mean by "have a model". You could in theory replace my blackbox detector with your own if you wanted, but you would need to handle that integration.

To determine the success rate as I did, I recommend splitting the dataset into three parts: training, validation, and test. You use the training set to train the model (with validation for hyperparameter selection). Only then do you use the held-out test set to see how well your model performs on totally unseen data. This is the standard flow.
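A minimal sketch of such a three-way split with NumPy follows. The function name, the 60/20/20 fractions, and the fixed seed are illustrative assumptions, not values taken from the repository.

```python
import numpy as np


def split_dataset(x, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the rows of x and split them into train / validation / test."""
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    # A seeded generator keeps the split reproducible between runs.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    n_val = int(val_frac * len(x))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # everything left over is held out
    return x[train_idx], x[val_idx], x[test_idx]
```

The test portion should be touched only once, after training and hyperparameter selection are finished.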

@vietvo89
Author

Thanks, Zay.

I have read other papers that demonstrate how to mount an attack with a GAN. But I want to double-check with you: once I have a trained GAN model, do I need the Generator to attack, that is, to make malware evade detectors? The flow would be to feed malware to the generator and then evaluate how well its output evades the detector.

Thanks

@ZaydH
Owner

ZaydH commented Mar 29, 2021

Yes.

After you train the model, you take a new malware vector and run it through the generator. This will yield a new vector that should evade the detector. To verify your workflow, you can then run that modified vector through the detector to see if it is marked as clean. This secondary sanity check is clearly not possible in practice but works for scientific evaluation/debugging.
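That check can be sketched as below. The `generator` and `detector` arguments are hypothetical callables standing in for the trained models (they are not the repository's API), and the toy versions exist only to make the example runnable. The sketch assumes the detector returns 1 for malware and 0 for benign.

```python
import numpy as np


def evasion_rate(malware, generator, detector):
    """Fraction of generator-modified malware vectors the detector calls benign."""
    adversarial = generator(malware)
    # ASSUMPTION: the detector outputs 1 for "malware" and 0 for "benign".
    predictions = np.asarray(detector(adversarial))
    return float(np.mean(predictions == 0))


def toy_generator(x):
    """Toy stand-in: switch on the last feature of every sample."""
    out = x.copy()
    out[:, -1] = 1
    return out


def toy_detector(x):
    """Toy stand-in: flag a sample as malware unless its last feature is set."""
    return (x[:, -1] == 0).astype(int)


if __name__ == "__main__":
    held_out_malware = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
    print(evasion_rate(held_out_malware, toy_generator, toy_detector))
```

For a meaningful number, `malware` should come from the held-out test set rather than from the samples used to train the GAN.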
