# Loading and testing pre-trained models
Within `maltorch`, pre-trained models can be loaded and tested right away.
Very little code is needed, since **all** the included models provide pre-trained weights that can be downloaded automatically.

In [1]:
%%capture --no-stderr
try:
    import maltorch
except ImportError:
    %pip install git+https://github.com/zangobot/maltorch

In [2]:
from pathlib import Path
from maltorch.data.loader import load_from_folder, create_labels
from maltorch.data_processing.grayscale_preprocessing import GrayscalePreprocessing
from maltorch.zoo.avaststyleconv import AvastStyleConv
from maltorch.zoo.bbdnn import BBDnn
from maltorch.zoo.ember_gbdt import EmberGBDT
from maltorch.zoo.malconv import MalConv
from maltorch.zoo.resnet18 import ResNet18

# To work properly, this tutorial needs some PE test files.
# Replace the placeholder below with the path to a folder containing them
# (for instance, a "data" folder inside the tutorial directory).
exe_folder = Path("insert_here_path")

# We now instantiate the AI-based Windows malware detectors we want to evaluate.
# All network parameters are fetched online,
# since we are not passing a model_path to the create_model function.
# It is also possible to choose the device on which the models are loaded,
# as all PyTorch-based models can run on GPU.
# (EMBER GBDT is not a PyTorch model, so it takes no device argument.)

device = "cpu"
networks = {
    'EMBER GBDT': EmberGBDT.create_model(),
    'BBDnn': BBDnn.create_model(device=device),
    'Malconv': MalConv.create_model(device=device),
    'AvastStyleConv': AvastStyleConv.create_model(device=device),
    'Grayscale ResNet18': ResNet18.create_model(
        preprocessing=GrayscalePreprocessing(),
        device=device),
}
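The cell above hardcodes `device = "cpu"`. A common PyTorch idiom, sketched below, selects the GPU when PyTorch can see one and falls back to the CPU otherwise; it would replace that single line. The `try`/`except` guard for a missing PyTorch install is our own addition for illustration, not something `maltorch` requires.

```python
# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
try:
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # PyTorch is not installed: keep the CPU default used in the cell above
    device = "cpu"

print(f"Using device: {device}")
```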

All models are instantiated through a `create_model` function, which also accepts parameters that tune the internals of the detector being created.
Note also that ResNet18, a CNN that treats each PE as an image, is loaded with a `GrayscalePreprocessing` object.
This object acts as a feature extractor: it converts every input PE into a grayscale image before predictions are computed.
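To make the idea behind grayscale preprocessing concrete, here is a hypothetical stdlib sketch of the classic byte-to-image mapping: each byte of the file becomes one pixel intensity in 0..255, and bytes are laid out in rows of fixed width. The function name, the default width of 256 and the zero padding of the last row are assumptions for illustration; `maltorch`'s `GrayscalePreprocessing` may work differently, and a real pipeline usually also resizes the result to the fixed input size the CNN expects, which is omitted here.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256, pad_value: int = 0):
    """Hypothetical sketch (not maltorch's GrayscalePreprocessing): lay the bytes
    of a file out row by row as pixel intensities, padding the last row."""
    height = max(1, math.ceil(len(data) / width))
    pixels = list(data) + [pad_value] * (height * width - len(data))
    # Each row holds `width` consecutive bytes of the file
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


# Five bytes with width 2 give a 3x2 image; the last pixel is padding
image = bytes_to_grayscale(b"MZ\x90\x00\x03", width=2)
print(image)  # prints: [[77, 90], [144, 0], [3, 0]]
```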

In [3]:
from torch.utils.data import DataLoader, TensorDataset

# We now load all .exe files from the folder onto the same device as the models.
# The data are wrapped in a PyTorch DataLoader,
# so that inference can be computed in batches (if the model supports it).

X = load_from_folder(exe_folder, "exe", device=device)
y = create_labels(X, 1)
data_loader = DataLoader(TensorDataset(X, y), batch_size=3)

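`load_from_folder` hides the details of turning a folder of files into one tensor. The hypothetical stdlib sketch below shows one plausible version of that step, which is an assumption rather than `maltorch`'s code: each file is read as a list of byte values, shorter files are right-padded with the token 256 (the same `padding_idx` used in the embedding example later in this notebook), and samples are grouped into batches of 3 in order, as a non-shuffling `DataLoader` would. Raising a clear error on an empty folder is a cheap guard, since there is nothing to pad when no files match.

```python
import tempfile
from pathlib import Path

# Assumed padding token: 256 lies outside the byte range 0..255 and matches
# the padding_idx=256 default of the embedding example in this notebook.
PAD_VALUE = 256


def load_padded_bytes(folder, extension="exe", pad_value=PAD_VALUE):
    """Hypothetical helper (not maltorch's API): read each *.<extension> file
    as a list of byte values and right-pad all of them to the longest length."""
    paths = sorted(Path(folder).glob(f"*.{extension}"))
    if not paths:
        # Fail with a clear message instead of an obscure padding error later on
        raise FileNotFoundError(f"no *.{extension} files found in {folder}")
    samples = [list(p.read_bytes()) for p in paths]
    max_len = max(len(s) for s in samples)
    return [s + [pad_value] * (max_len - len(s)) for s in samples]


def iter_batches(samples, labels, batch_size=3):
    """Yield (samples, labels) chunks in order, like a non-shuffling DataLoader."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size], labels[start:start + batch_size]


# Usage on two tiny fake files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.exe").write_bytes(b"MZ\x90")
    Path(tmp, "b.exe").write_bytes(b"MZ")
    X = load_padded_bytes(tmp)  # [[77, 90, 144], [77, 90, 256]]

y = [1] * len(X)  # every sample labelled malicious (label 1)
for batch_x, batch_y in iter_batches(X, y, batch_size=3):
    print(len(batch_x), batch_y)  # prints: 2 [1, 1]
```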

In [None]:
from secmlt.metrics.classification import Accuracy

# We can now compute all predictions!
# The Accuracy metric from secml-torch runs each model over the DataLoader
# and compares its predicted labels with the ground truth.
# Since every sample is labelled as malicious (label 1),
# the accuracy is the percentage of samples detected as malicious.

print("Computing maliciousness of loaded data...")
for k in networks:
    model = networks[k]
    print(f"{k}: {Accuracy()(model, data_loader) * 100:.2f}% malicious")
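Since every label is 1, accuracy here reduces to the detection rate: the fraction of samples whose score crosses the decision threshold. The stdlib sketch below reproduces that arithmetic on made-up scores (illustrative values, not results from any detector). The function name `detection_rate`, the 0.5 threshold and the `>=` comparison are assumptions for illustration, not the `secml-torch` implementation.

```python
def detection_rate(scores, labels, threshold=0.5):
    """Hypothetical sketch: accuracy of thresholded malware scores.
    A sample is predicted malicious (1) when its score is >= threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)


scores = [0.91, 0.12, 0.67]  # made-up detector outputs for three samples
labels = [1, 1, 1]           # all samples labelled malicious
print(f"{detection_rate(scores, labels) * 100:.2f}% malicious")  # prints: 66.67% malicious
```

With all-malicious labels, the one sample scored below the threshold is the only error, so two out of three samples count as detected.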

## What's inside a model?

We have seen how to load a pre-trained model.
But how do we create a new one? It is actually quite easy:
* for a generic PyTorch model, extend the `BaseModel` class;
* for a PyTorch model that requires an embedding layer, extend the `EmbeddingModel` class;
* for a non-PyTorch model, extend the `Model` class and provide the mappings to the underlying non-PyTorch data structures.

While `Model` and `EmbeddingModel` are provided within `maltorch` (`maltorch.zoo.model`), the `BaseModel` is provided by the supporting library `secml-torch` (`secmlt.models.base_model`).

Below is an example of how to create your own `EmbeddingModel`.


In [None]:
import torch

from maltorch.zoo.model import EmbeddingModel

class YourNewEmbeddingModel(EmbeddingModel):
    # __init__ should contain the parameters that define your model, with the defaults you wish to use
    def __init__(
            self,
            embedding_size: int = 8,
            min_len: int = 4096,
            max_len: int = 102400,
            threshold: float = 0.5,
            padding_idx: int = 256,
    ):
        # Replace "model_name" and "gdrive_id" with your model's name and the Google Drive ID of its weights
        super().__init__(name="model_name", gdrive_id="gdrive_id", min_len=min_len, max_len=max_len)
        self.max_len = max_len
        self.threshold = threshold
        # 257 tokens: the 256 possible byte values plus the padding token (index 256 by default)
        self.embedding = torch.nn.Embedding(
            num_embeddings=257, embedding_dim=embedding_size, padding_idx=padding_idx
        )
        # Insert here the PyTorch model definition

    # Must define a function returning the embedding layer
    def embedding_layer(self):
        return self.embedding

    # Must define the logic applied AFTER the embedding
    def _forward_embed_x(self, x):
        # Insert here your PyTorch logic
        return x

    # Must define how embeddings are computed
    def embed(self, x):
        emb_x = self.embedding(x)
        # (batch, seq_len, embedding_size) -> (batch, embedding_size, seq_len),
        # the channels-first layout expected by torch.nn.Conv1d
        emb_x = emb_x.transpose(1, 2)
        return emb_x

    # Must define a getter returning the embedding weights
    def embedding_matrix(self):
        return self.embedding.weight
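The `transpose(1, 2)` in `embed` is easiest to see on a toy example. The pure-Python sketch below mimics the two steps without PyTorch: a lookup table with 257 rows (one per byte value plus the padding token) whose padding row is all zeros, as `torch.nn.Embedding` initialises the `padding_idx` row, followed by a swap of the last two axes so that each embedding dimension becomes a channel. The helper names are hypothetical, and random Gaussian values stand in for trained weights.

```python
import random

NUM_TOKENS = 257   # 256 byte values plus one padding token
PADDING_IDX = 256  # padding token, as in the class above


def make_embedding_matrix(embedding_size=8, seed=0):
    """Hypothetical lookup table with one row per token; the padding row is zero."""
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(embedding_size)] for _ in range(NUM_TOKENS)]
    matrix[PADDING_IDX] = [0.0] * embedding_size
    return matrix


def embed(batch, matrix):
    """Look up each byte, then transpose every sample from (seq_len, embedding_size)
    to (embedding_size, seq_len), the channels-first layout used by 1D convolutions."""
    out = []
    for seq in batch:
        rows = [matrix[token] for token in seq]         # (seq_len, embedding_size)
        out.append([list(col) for col in zip(*rows)])   # (embedding_size, seq_len)
    return out


matrix = make_embedding_matrix(embedding_size=8)
batch = [[77, 90, 144, 256], [77, 90, 256, 256]]  # two padded byte sequences
emb = embed(batch, matrix)
print(len(emb), len(emb[0]), len(emb[0][0]))  # prints: 2 8 4
```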
