<h1 align='center'>Cybersecurity Threat Detection Using Deep Learning Architectures</h1>

### Types of Attacks

- *Denial of service attack (DoS)*: the service is slowed down or stopped, temporarily or permanently, by flooding it with a large amount of traffic
- *Remote to local attack (R2L)*: an attacker without an account on the target system gains local access by sending packets to it over the network
- *Probing*: information about the network is collected by scanning and mapping it
- *User to root attack (U2R)*: an attacker who controls a normal user account (for example, through a sniffed password) escalates to root privileges
- *Adversarial Attacks*: deep neural networks are targeted by injecting crafted noise into their training or input data
- *Integrity Attacks*: system data is corrupted or encrypted
- *Causative Attacks*: the neural network's decision-making process is attacked, leading to misclassification

### USTC-TFC2016 Dataset

USTC-TFC2016 (processed with the USTC-TK2016 toolkit) is composed of a set of pcap files containing raw network traffic from 10 benign and 10 malware applications, as shown in the table below:<br>
![USTC-TK2016](data/img/USTC-TK2016.png)<br>

### Approach

- *CNN*: pcap files will be transformed into MNIST-style 28 x 28 grayscale images that are fed to a CNN
- *DNN/LSTM*: the Argus tool will be used to extract flow features from the pcap files

#### Malware Traffic Classification Using CNN

##### Data preprocessing

- Step 1: Install pre-requisites (DO NOT RUN)

In [None]:
# Connect to Drive
from google.colab import drive
drive.mount('/content/drive')

# Update the list of packages
!sudo apt-get update
# Install pre-requisite packages.
!sudo apt-get install -y wget apt-transport-https software-properties-common
# Download the Microsoft repository GPG keys
!wget -q https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb
# Register the Microsoft repository GPG keys
!sudo dpkg -i packages-microsoft-prod.deb
# Update the list of packages after we added packages.microsoft.com
!sudo apt-get update
# Install PowerShell
!sudo apt-get install -y powershell
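# Install SplitCap prerequisite (-y skips the confirmation prompt, which would abort in a notebook cell)
!sudo apt-get install -y mono-runtime
# Install fdupes (duplicate file finder)
!sudo apt-get install -y fdupes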

%cd drive/MyDrive/UNIPI/DL_Cybersecurity/
# Clone the repository on "ubuntu" branch
!sudo git clone -b ubuntu https://github.com/yungshenglu/USTC-TK2016 USTC-TK2016
# Install the toolkit's Python requirements (the cwd is still DL_Cybersecurity/)
!pip3 install -r USTC-TK2016/requirements.txt
# Download the traffic dataset
%cd USTC-TK2016/1_Pcap/
!sudo git clone -b master https://github.com/yungshenglu/USTC-TFC2016
# Grant execute permission to SplitCap and the processing scripts
%cd ../
!chmod 777 0_Tool/SplitCap_2-1/SplitCap.exe
!chmod 777 1_Pcap2Session.ps1
!chmod 777 2_ProcessSession.ps1


- Step 2: Split each PCAP file into sessions (DO NOT RUN)


In [None]:
!pwsh -File ./1_Pcap2Session.ps1
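Splitting by session means grouping the packets that belong to the same conversation. The following is a minimal Python sketch of that idea, not SplitCap's actual implementation: it assumes packets have already been parsed into a hypothetical `Packet` tuple, and it keys sessions on the bidirectional 5-tuple (protocol plus both endpoints), ignoring timeouts and TCP connection teardown, which a real splitter would also take into account.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical pre-parsed packet; a real pipeline would fill these from a pcap parser."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    payload: bytes


SessionKey = Tuple[str, Tuple[str, int], Tuple[str, int]]


def session_key(pkt: Packet) -> SessionKey:
    """Bidirectional 5-tuple: both directions of a conversation map to the same key."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    lo, hi = sorted([a, b])  # order the endpoints so A->B and B->A agree
    return (pkt.proto, lo, hi)


def split_sessions(packets: List[Packet]) -> Dict[SessionKey, List[Packet]]:
    """Group packets into sessions, preserving arrival order within each session."""
    sessions: Dict[SessionKey, List[Packet]] = defaultdict(list)
    for pkt in packets:
        sessions[session_key(pkt)].append(pkt)
    return dict(sessions)


if __name__ == "__main__":
    pkts = [
        Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", b"GET"),
        Packet("10.0.0.2", 80, "10.0.0.1", 5000, "TCP", b"200"),  # reply, same session
        Packet("10.0.0.1", 5001, "10.0.0.2", 80, "TCP", b"GET"),  # new source port, new session
    ]
    print(len(split_sessions(pkts)))
```

Sorting the two endpoints is what makes the key direction-independent, so a request and its reply land in the same session.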

- Step 3: Process sessions (DO NOT RUN)

The 60000 largest session PCAP files are selected, trimmed, and randomly split into training and test sets.

In [None]:
!pwsh -File ./2_ProcessSession.ps1
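A rough Python sketch of the selection and split logic, assuming each session is available in memory as raw bytes keyed by a file name. `select_and_split`, the 10% test fraction, and the fixed seed are illustrative assumptions; the actual `2_ProcessSession.ps1` script may use different values and also performs the file operations.

```python
import random
from typing import Dict, List, Tuple


def select_and_split(
    sessions: Dict[str, bytes],
    top_n: int = 60000,
    test_fraction: float = 0.1,  # assumption, not taken from the script
    seed: int = 0,               # fixed seed for a reproducible split
) -> Tuple[List[str], List[str]]:
    """Keep the top_n largest sessions, then randomly split their names into (train, test)."""
    # Rank session names by size in bytes, largest first, and keep the top_n.
    largest = sorted(sessions, key=lambda name: len(sessions[name]), reverse=True)[:top_n]
    # Shuffle reproducibly, then cut off the test portion.
    rng = random.Random(seed)
    rng.shuffle(largest)
    n_test = int(len(largest) * test_fraction)
    return largest[n_test:], largest[:n_test]


if __name__ == "__main__":
    demo = {f"s{i}": bytes(i) for i in range(1, 21)}  # 20 fake sessions of 1..20 bytes
    train, test = select_and_split(demo, top_n=10, test_fraction=0.2)
    print(len(train), len(test))
```

Selecting by size first and shuffling afterwards keeps the split random with respect to everything except the size filter.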

- Step 4: Convert the session PCAP files to images (DO NOT RUN)

Each session file is trimmed to 784 bytes (28 x 28); files shorter than 784 bytes are padded with 0x00 bytes. Each byte becomes one grayscale pixel.

In [None]:
!python3 3_Session2Png.py
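The byte-to-pixel mapping can be sketched with NumPy as follows. `session_to_image` is a hypothetical helper, not part of the toolkit: a session is cut to its first 784 bytes or padded with 0x00 up to 784, then reshaped row by row into a 28 x 28 uint8 image.

```python
import numpy as np

IMAGE_SIZE = 28
N_BYTES = IMAGE_SIZE * IMAGE_SIZE  # 784


def session_to_image(raw: bytes) -> np.ndarray:
    """Trim or zero-pad a session's bytes to 784 and reshape them into a 28x28 uint8 image."""
    buf = raw[:N_BYTES].ljust(N_BYTES, b"\x00")  # trim long sessions, pad short ones with 0x00
    return np.frombuffer(buf, dtype=np.uint8).reshape(IMAGE_SIZE, IMAGE_SIZE)


if __name__ == "__main__":
    img = session_to_image(b"\x01\x02")
    print(img.shape, img[0, :3])
```

The resulting array could then be saved as a grayscale PNG, for example with Pillow's `Image.fromarray(img).save(...)`.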

- Step 5: Label the PNG files and convert them to IDX (MNIST) format (DO NOT RUN)

In [None]:
!python3 4_Png2Mnist.py
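The IDX container used by MNIST stores a 4-byte magic number (two zero bytes, a type code where 0x08 means unsigned byte, and the number of dimensions), then one big-endian 32-bit size per dimension, then the raw data. The sketch below, built around a hypothetical `write_idx_gz` helper, illustrates that layout; it is not the code of `4_Png2Mnist.py`.

```python
import gzip
import struct

import numpy as np


def write_idx_gz(path: str, array: np.ndarray) -> None:
    """Write a uint8 array as a gzip-compressed IDX file (the MNIST container format)."""
    array = np.ascontiguousarray(array, dtype=np.uint8)
    # Magic number: two zero bytes, type code 0x08 (unsigned byte), number of dimensions.
    header = struct.pack(">BBBB", 0, 0, 0x08, array.ndim)
    # One big-endian unsigned 32-bit size per dimension.
    header += struct.pack(">" + "I" * array.ndim, *array.shape)
    with gzip.open(path, "wb") as f:
        f.write(header)
        f.write(array.tobytes())  # row-major pixel or label bytes
```

For an image array of shape (N, 28, 28) the header is 4 + 3 x 4 = 16 bytes, and for a label vector of shape (N,) it is 4 + 4 = 8 bytes, which is why `extract_data` and `extract_labels` skip 16 and 8 bytes respectively.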

##### Training and Test

In [2]:
import tensorflow as tf
import gzip
import time
import sys
import numpy as np
import os

IMAGE_SIZE = 28
PIXEL_DEPTH = 255  # maximum byte value; only used by the optional rescaling below
DATA_DIR = 'drive/MyDrive/UNIPI/DL_Cybersecurity/USTC-TK2016/5_Mnist/'


def extract_data(filename, num_images):
  """Extract the images into a 4D tensor [image index, y, x, channels].
  Pixel values are kept in [0, 255]; uncomment the rescaling line to map them to [-0.5, 0.5].
  """
  print('Extracting', filename)
  with gzip.open(filename) as bytestream:
    # Skip the 16-byte IDX header: magic number, image count, rows, columns (4 bytes each).
    bytestream.read(16)
    buf = bytestream.read(IMAGE_SIZE * IMAGE_SIZE * num_images)
    data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
    # data = (data - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH
    data = data.reshape(num_images, IMAGE_SIZE, IMAGE_SIZE, 1)
    return data


def extract_labels(filename, num_images):
  """Extract the labels into a vector of int64 label IDs."""
  print('Extracting', filename)
  with gzip.open(filename) as bytestream:
    # Skip the 8-byte IDX header: magic number and label count (4 bytes each).
    bytestream.read(8)
    buf = bytestream.read(1 * num_images)
    labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
  return labels

# Extract it into np arrays.
train_data = extract_data(DATA_DIR + 'train-images-idx3-ubyte.gz', 60000)
train_labels = extract_labels(DATA_DIR + 'train-labels-idx1-ubyte.gz', 60000)
test_data = extract_data(DATA_DIR + 't10k-images-idx3-ubyte.gz', 10000)
test_labels = extract_labels(DATA_DIR + 't10k-labels-idx1-ubyte.gz', 10000)


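`extract_data` and `extract_labels` take the item count as an argument and skip the header without checking it, so a count smaller than what the file actually holds silently drops items. A hedged alternative, sketched here as a hypothetical `read_idx_gz` helper, reads the shape from the header itself and validates the payload size.

```python
import gzip
import struct

import numpy as np


def read_idx_gz(path: str) -> np.ndarray:
    """Read a gzip-compressed uint8 IDX file, taking the array shape from its header."""
    with gzip.open(path, "rb") as f:
        zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", f.read(4))
        if zero1 != 0 or zero2 != 0 or dtype_code != 0x08:
            raise ValueError(f"{path}: not an unsigned-byte IDX file")
        # One big-endian unsigned 32-bit size per dimension.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    expected = int(np.prod(shape))
    if data.size != expected:
        raise ValueError(f"{path}: expected {expected} bytes, found {data.size}")
    return data.reshape(shape)
```

For example, `read_idx_gz(DATA_DIR + 'train-images-idx3-ubyte.gz')[..., np.newaxis].astype(np.float32)` would give the same (N, 28, 28, 1) float32 layout as `extract_data`, with N taken from the file.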