# TCC - Distributed NIDS

> Task:
Test 4 classifiers on the 5 NetFlow V2 datasets at https://staff.itee.uq.edu.au/marius/NIDS_datasets/#RA6

- [ ] Confusion matrix
- [ ] ROC curve
- [ ] Compare classifiers

## Imports

In [None]:
!pip install heatmapz

In [None]:
import pandas as pd
import seaborn as sns
import numpy as np
from matplotlib import pyplot as plt
from heatmap import heatmap, corrplot

## Datasets


### General Characteristics

Feature | Description
---|---
**IPV4_SRC_ADDR** | IPv4 source address
**IPV4_DST_ADDR** | IPv4 destination address
**L4_SRC_PORT** | IPv4 source port number
**L4_DST_PORT** | IPv4 destination port number
**PROTOCOL** | IP protocol identifier byte
**L7_PROTO** | Layer 7 protocol (numeric)
**IN_BYTES** | Incoming number of bytes
**OUT_BYTES** | Outgoing number of bytes
**IN_PKTS** | Incoming number of packets
**OUT_PKTS** | Outgoing number of packets
**FLOW_DURATION_MILLISECONDS** | Flow duration in milliseconds
**TCP_FLAGS** | Cumulative of all TCP flags
**CLIENT_TCP_FLAGS** | Cumulative of all client TCP flags
**SERVER_TCP_FLAGS** | Cumulative of all server TCP flags
**DURATION_IN** | Client to Server stream duration (msec)
**DURATION_OUT** | Server to Client stream duration (msec)
**MIN_TTL** | Min flow TTL
**MAX_TTL** | Max flow TTL
**LONGEST_FLOW_PKT** | Longest packet (bytes) of the flow
**SHORTEST_FLOW_PKT** | Shortest packet (bytes) of the flow
**MIN_IP_PKT_LEN** | Len of the smallest flow IP packet observed
**MAX_IP_PKT_LEN** | Len of the largest flow IP packet observed
**SRC_TO_DST_SECOND_BYTES** | Src to dst Bytes/sec
**DST_TO_SRC_SECOND_BYTES** | Dst to src Bytes/sec
**RETRANSMITTED_IN_BYTES** | Number of retransmitted TCP flow bytes (src->dst)
**RETRANSMITTED_IN_PKTS** | Number of retransmitted TCP flow packets (src->dst)
**RETRANSMITTED_OUT_BYTES** | Number of retransmitted TCP flow bytes (dst->src)
**RETRANSMITTED_OUT_PKTS** | Number of retransmitted TCP flow packets (dst->src)
**SRC_TO_DST_AVG_THROUGHPUT** | Src to dst average thpt (bps)
**DST_TO_SRC_AVG_THROUGHPUT** | Dst to src average thpt (bps)
**NUM_PKTS_UP_TO_128_BYTES** | Packets whose IP size <= 128
**NUM_PKTS_128_TO_256_BYTES** | Packets whose IP size > 128 and <= 256
**NUM_PKTS_256_TO_512_BYTES** | Packets whose IP size > 256 and <= 512
**NUM_PKTS_512_TO_1024_BYTES** | Packets whose IP size > 512 and <= 1024
**NUM_PKTS_1024_TO_1514_BYTES** | Packets whose IP size > 1024 and <= 1514
**TCP_WIN_MAX_IN** | Max TCP Window (src->dst)
**TCP_WIN_MAX_OUT** | Max TCP Window (dst->src)
**ICMP_TYPE** | ICMP Type * 256 + ICMP code
**ICMP_IPV4_TYPE** | ICMP Type
**DNS_QUERY_ID** | DNS query transaction Id
**DNS_QUERY_TYPE** | DNS query type (e.g. 1=A, 2=NS..)
**DNS_TTL_ANSWER** | TTL of the first A record (if any)
**FTP_COMMAND_RET_CODE** | FTP client command return code
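
Two of the fields above pack several values into one integer. As a minimal sketch, assuming `TCP_FLAGS` is the bitwise OR of the standard TCP header flag bits seen during the flow (the usual NetFlow convention, not confirmed by this notebook) and using the `ICMP_TYPE` formula from the table, they could be unpacked as follows. The helper names `decode_tcp_flags` and `split_icmp_type` are hypothetical.

```python
from __future__ import annotations

# Standard TCP header flag bits (low byte of the TCP flags field).
TCP_FLAG_BITS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
    0x40: "ECE",
    0x80: "CWR",
}


def decode_tcp_flags(value: int) -> list[str]:
    """Return the names of the flag bits set in a cumulative TCP_FLAGS value.

    Assumption: the field is the bitwise OR of every flag seen in the flow.
    """
    return [name for bit, name in TCP_FLAG_BITS.items() if value & bit]


def split_icmp_type(value: int) -> tuple[int, int]:
    """Invert ICMP_TYPE = type * 256 + code into (type, code)."""
    return divmod(value, 256)


print(decode_tcp_flags(27))  # 27 = 0x1B -> ['FIN', 'SYN', 'PSH', 'ACK']
print(split_icmp_type(771))  # 771 = 3 * 256 + 3 -> (3, 3)
```

Decoding these fields is optional for tree-based classifiers, but it can make exploratory plots easier to read.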

### NF-UNSW-NB15-v2

NF-UNSW-NB15-v2 is the NetFlow-based version of the UNSW-NB15 dataset, with each flow labelled by attack category. Summing the counts in the table below gives 2,390,275 data flows, of which 95,053 (3.98%) are attack samples and 2,295,222 (96.02%) are benign. The attack samples fall into nine categories; the table lists and defines the distribution of the NF-UNSW-NB15-v2 dataset.

Class | Count | Description
---|---|---
Benign | 2295222 | Normal, non-malicious flows.
Fuzzers | 22310 | An attack in which the attacker sends large amounts of random data to crash a system and to discover its security vulnerabilities.
Analysis | 2299 | A group of threats that target web applications through ports, emails and scripts.
Backdoor | 2169 | A technique that bypasses security mechanisms by replying to specifically constructed client applications.
DoS | 5794 | Denial of Service: an attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Exploits | 31551 | Sequences of commands that control the behaviour of a host through a known vulnerability.
Generic | 16560 | A method that targets cryptography and causes a collision with each block cipher.
Reconnaissance | 12779 | A technique for gathering information about a network host, also known as a probe.
Shellcode | 1427 | Malware that penetrates code to control a victim's host.
Worms | 164 | Attacks that replicate themselves and spread to other computers.
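
The counts in the table imply a heavy skew towards benign traffic. A minimal sketch that recomputes the total and the attack share directly from those counts; the numbers are copied from the table above, while the names `unsw_counts` and `attack_share` are illustrative only.

```python
# Flow counts per class, copied from the NF-UNSW-NB15-v2 table.
unsw_counts = {
    "Benign": 2295222,
    "Fuzzers": 22310,
    "Analysis": 2299,
    "Backdoor": 2169,
    "DoS": 5794,
    "Exploits": 31551,
    "Generic": 16560,
    "Reconnaissance": 12779,
    "Shellcode": 1427,
    "Worms": 164,
}


def attack_share(counts: dict, benign_label: str = "Benign") -> float:
    """Fraction of flows that belong to any class other than the benign one."""
    total = sum(counts.values())
    return (total - counts[benign_label]) / total


print(sum(unsw_counts.values()))           # 2390275
print(f"{attack_share(unsw_counts):.2%}")  # 3.98%
```

With roughly 4% attack flows, plain accuracy is a weak metric here: a model that always predicts "Benign" would already score about 96%.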

### NF-BoT-IoT-v2


An IoT NetFlow-based dataset was generated by expanding the NF-BoT-IoT dataset. The features were extracted from the publicly available pcap files and the flows were labelled with their respective attack categories. The total number of data flows is 37,763,497, of which 37,628,460 (99.64%) are attack samples and 135,037 (0.36%) are benign. There are four attack categories in the dataset; the table below shows the NF-BoT-IoT-v2 distribution of all flows.

Class | Count | Description
---|---|---
Benign | 135037 | Normal, non-malicious flows.
Reconnaissance | 2620999 | A technique for gathering information about a network host, also known as a probe.
DDoS | 18331847 | Distributed Denial of Service: an attack similar to DoS but launched from multiple distributed sources.
DoS | 16673183 | An attempt to overload a computer system's resources with the aim of preventing access to or availability of its data.
Theft | 2431 | A group of attacks that aim to obtain sensitive data, such as data theft and keylogging.
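
With benign flows at only 0.36% of the data, a classifier trained on the raw distribution can score well while ignoring the rare classes. One common mitigation, which this notebook does not prescribe, is to weight each class inversely to its frequency. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`, the same formula scikit-learn documents for `class_weight="balanced"`. Counts are copied from the table above; `balanced_class_weights` is a hypothetical helper name.

```python
# Flow counts per class, copied from the NF-BoT-IoT-v2 table.
bot_iot_counts = {
    "Benign": 135037,
    "Reconnaissance": 2620999,
    "DDoS": 18331847,
    "DoS": 16673183,
    "Theft": 2431,
}


def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


weights = balanced_class_weights(bot_iot_counts)
# Print from the most common (lowest weight) to the rarest (highest weight).
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label:15s} {weight:12.3f}")
```

Whether weighting, resampling, or neither is appropriate depends on the classifier and on the metric being compared.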

## Settings

In [None]:
#@markdown <h4>Application</h4> <hr>
APPLICATION = "None" #@param ["None"]
#@markdown <h4>Dataset</h4> <hr>
DATASET = "NF-UNSW-NB15-v2" #@param ["NF-UNSW-NB15-v2", "NF-ToN-IoT-v2", "NF-BoT-IoT-v2", "NF-CSE-CIC-IDS2018-v2", "NF-UQ-NIDS-v2"]

## Database

In [None]:
data = None
if DATASET == "NF-UNSW-NB15-v2":
    ...
elif DATASET == "NF-ToN-IoT-v2":
    ...
elif DATASET == "NF-BoT-IoT-v2":
    ...
elif DATASET == "NF-CSE-CIC-IDS2018-v2":
    ...
elif DATASET == "NF-UQ-NIDS-v2":
    ...
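
The branch bodies above are left elided. As one possible way to fill them, the sketch below replaces the `if`/`elif` chain with a lookup that validates the name and builds a file path. It assumes every dataset is stored as `<base_dir>/<name>.csv`; that layout, the default `datasets` directory, and the helper name `dataset_path` are all hypothetical and not taken from the notebook.

```python
from pathlib import Path

# The five dataset names offered by the DATASET form field.
KNOWN_DATASETS = (
    "NF-UNSW-NB15-v2",
    "NF-ToN-IoT-v2",
    "NF-BoT-IoT-v2",
    "NF-CSE-CIC-IDS2018-v2",
    "NF-UQ-NIDS-v2",
)


def dataset_path(name: str, base_dir: str = "datasets") -> Path:
    """Map a dataset name to a CSV path.

    Assumption: each dataset lives at <base_dir>/<name>.csv (hypothetical layout).
    """
    if name not in KNOWN_DATASETS:
        raise ValueError(f"Unknown dataset {name!r}; expected one of {KNOWN_DATASETS}")
    return Path(base_dir) / f"{name}.csv"


print(dataset_path("NF-BoT-IoT-v2"))
```

The resulting path could then be passed to `pd.read_csv` to populate `data`. Raising on an unknown name avoids silently leaving `data` as `None`.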

## Confusion Matrix
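
No classifier has been trained at this point in the notebook, so the following is only a minimal sketch of how a binary confusion matrix could be tallied once predictions exist. The label vectors are made-up toy data, `confusion_counts` is a hypothetical helper, and the encoding 1 = attack, 0 = benign is an assumption.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Tally (true, predicted) pairs into TN, FP, FN, TP for labels 0/1."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "TN": pairs[(0, 0)],
        "FP": pairs[(0, 1)],
        "FN": pairs[(1, 0)],
        "TP": pairs[(1, 1)],
    }


# Toy labels for illustration only: 1 = attack, 0 = benign.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

cm = confusion_counts(y_true, y_pred)
tpr = cm["TP"] / (cm["TP"] + cm["FN"])  # true positive rate (recall)
fpr = cm["FP"] / (cm["FP"] + cm["TN"])  # false positive rate

print(cm)                                # {'TN': 3, 'FP': 1, 'FN': 1, 'TP': 3}
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")    # TPR=0.75 FPR=0.25
```

In practice `sklearn.metrics.confusion_matrix` covers the multi-class case. The (FPR, TPR) pair computed here is a single point on a ROC curve; sweeping the decision threshold yields the full curve.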

## Correlation Matrix

In [None]:
plt.figure(figsize=(8, 8))
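# Correlate numeric columns only; string columns such as the IP addresses cannot be correlated.
corrplot(data.select_dtypes("number").corr(), size_scale=300);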