# Windower Repo Example 01: Kitsune Pipeline Run

This notebook provides steps to preprocess the dataset for the original Kitsune NIDS, as provided by Mirsky et al. [1], run its evaluation pipeline, and analyze its output and performance.

This notebook expects the PCAP dataset variant to be already prepared in the `examples/work` directory, which can be done by running the `00_dataset.ipynb` notebook.

[1] Mirsky, Y., Doitshman, T., Elovici, Y., & Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. In Proceedings of the NDSS Symposium 2018. Available at: <https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-3_Mirsky_paper.pdf>.

In [17]:
import os

In [19]:
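# We expect the current working directory to be `examples/` for these relative paths to work
SRC_DIR  = '../src'
WORK_DIR = 'work'
# The PCAP files are expected in examples/work (prepared by 00_dataset.ipynb)
DATA_DIR = WORK_DIR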

## Data Preprocessing

The original Kitsune model requires its own per-packet feature extraction, computed from the tshark output. This section describes these steps.

In [5]:
# tshark fields consumed by the Kitsune feature extractor; each field is listed exactly once
TSHARK_FIELDS = '-e frame.time_epoch -e frame.len -e eth.src -e eth.dst -e ip.src ' \
    '-e ip.dst -e tcp.srcport -e tcp.dstport -e udp.srcport -e udp.dstport ' \
    '-e icmp.type -e icmp.code -e arp.opcode ' \
    '-e arp.src.hw_mac -e arp.src.proto_ipv4 -e arp.dst.hw_mac -e arp.dst.proto_ipv4 ' \
    '-e ipv6.src -e ipv6.dst'
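To see what the extractor receives, the sketch below reads the tab-separated tshark output and reduces each row to addresses, ports and an IP version. It is a hypothetical helper loosely modelled on how the original Kitsune extractor combines these fields (for example, concatenating the TCP and UDP port columns, of which at most one is normally filled); `FIELDS`, `parse_row` and `read_packets` are illustrative names, not code from this repository's `run-extraction-h5.py`.

```python
import csv
import io

# Plain field names, in the same order as TSHARK_FIELDS
FIELDS = ['frame.time_epoch', 'frame.len', 'eth.src', 'eth.dst', 'ip.src', 'ip.dst',
          'tcp.srcport', 'tcp.dstport', 'udp.srcport', 'udp.dstport',
          'icmp.type', 'icmp.code', 'arp.opcode',
          'arp.src.hw_mac', 'arp.src.proto_ipv4', 'arp.dst.hw_mac', 'arp.dst.proto_ipv4',
          'ipv6.src', 'ipv6.dst']


def parse_row(row):
    """Reduce one tshark row (dict keyed by field name) to (src, dst, src_port, dst_port, ip_type).

    Hypothetical sketch, not the repository's extractor.
    """
    if row['ip.src']:                          # IPv4 packet
        src, dst, ip_type = row['ip.src'], row['ip.dst'], 4
    elif row['ipv6.src']:                      # IPv6 packet
        src, dst, ip_type = row['ipv6.src'], row['ipv6.dst'], 6
    else:
        src, dst, ip_type = '', '', None
    # Only one of the TCP/UDP port columns is filled, so concatenation selects it
    src_port = row['tcp.srcport'] + row['udp.srcport']
    dst_port = row['tcp.dstport'] + row['udp.dstport']
    if not src_port:                           # no L4 ports: ARP, ICMP or other L2 traffic
        if row['arp.opcode']:
            src_port = dst_port = 'arp'
            src, dst, ip_type = row['arp.src.proto_ipv4'], row['arp.dst.proto_ipv4'], 4
        elif row['icmp.type']:
            src_port = dst_port = 'icmp'
        elif not (src or dst):
            src, dst = row['eth.src'], row['eth.dst']  # fall back to MAC addresses
    return src, dst, src_port, dst_port, ip_type


def read_packets(text):
    """Parse tshark `-T fields -E header=y` output (tab-separated) into packet tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter='\t',
                            quoting=csv.QUOTE_NONE, restval='')
    return [parse_row(row) for row in reader]
```

Because tshark writes the field names as the header row, `DictReader` keys each value by field name, so the sketch does not depend on column positions.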

In [6]:
%%time
# Extract a tshark TSV file for the training set
!tshark -r $DATA_DIR/ctu13_sc4_train.pcap -T fields -E header=y -E occurrence=f $TSHARK_FIELDS > $WORK_DIR/ctu13_sc4_train.tsv

In [8]:
%%time
# Extract Tshark TSV file for the test set
!tshark -r $DATA_DIR/ctu13_sc4_test.pcap -T fields -E header=y -E occurrence=f $TSHARK_FIELDS > $WORK_DIR/ctu13_sc4_test.tsv
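Outside of IPython, where the `!` shell escape is unavailable, the same extraction can be driven from plain Python. The sketch below assumes `tshark` is on `PATH` and uses only the options already passed in the cells above; `tshark_fields_cmd` and `extract_tsv` are hypothetical names.

```python
import subprocess


def tshark_fields_cmd(pcap_path, fields):
    """Build the tshark argv used above: header row, first occurrence only, one -e per field."""
    cmd = ['tshark', '-r', pcap_path, '-T', 'fields',
           '-E', 'header=y', '-E', 'occurrence=f']
    for field in fields:
        cmd += ['-e', field]
    return cmd


def extract_tsv(pcap_path, tsv_path, fields):
    """Run tshark and write its tab-separated output to tsv_path; raises if tshark fails."""
    with open(tsv_path, 'w') as out:
        subprocess.run(tshark_fields_cmd(pcap_path, fields), stdout=out, check=True)
```

`TSHARK_FIELDS.split()[1::2]` turns the string defined earlier into a plain field list, e.g. `extract_tsv(f'{DATA_DIR}/ctu13_sc4_train.pcap', f'{WORK_DIR}/ctu13_sc4_train.tsv', TSHARK_FIELDS.split()[1::2])`. Passing an argument list instead of a shell string avoids quoting issues with paths.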

In [9]:
%%time
# Perform per-packet feature extraction to prepare the Kitsune input (training set)
!python $SRC_DIR/kitsune/run-extraction-h5.py -o $WORK_DIR/ctu13_sc4_train.h5 $WORK_DIR/ctu13_sc4_train.tsv

INFO:utils:there are 263132 packets
INFO:__main__:running extractor
100%|██████████████████████████████████| 263131/263131 [09:15<00:00, 473.56it/s]
INFO:__main__:extractor finished
CPU times: user 13.2 s, sys: 2.36 s, total: 15.5 s
Wall time: 9min 16s
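The per-packet features are statistics that Kitsune maintains incrementally for each traffic stream, with older packets exponentially discounted [1]. Below is a minimal sketch of one such damped statistic, assuming the decay function $2^{-\lambda \Delta t}$ described in the paper. The class `DampedStat` is hypothetical and tracks only weight, mean and standard deviation; the real extractor keeps many such statistics per stream and per decay factor.

```python
import math


class DampedStat:
    """Damped incremental weight, mean and standard deviation of a 1-D stream (sketch)."""

    def __init__(self, decay):
        self.decay = decay   # lambda: larger values forget older packets faster
        self.w = 0.0         # decayed number of observations
        self.ls = 0.0        # decayed linear sum
        self.ss = 0.0        # decayed sum of squares
        self.t_last = None   # timestamp of the previous update

    def update(self, x, t):
        """Decay the accumulated sums by 2^(-lambda * dt), then add the new value x."""
        if self.t_last is not None:
            factor = 2.0 ** (-self.decay * (t - self.t_last))
            self.w *= factor
            self.ls *= factor
            self.ss *= factor
        self.t_last = t
        self.w += 1.0
        self.ls += x
        self.ss += x * x

    def mean(self):
        return self.ls / self.w

    def std(self):
        # abs() guards against tiny negative values caused by rounding
        return math.sqrt(abs(self.ss / self.w - self.mean() ** 2))
```

Each update costs O(1) time and memory regardless of how many packets a stream has seen, which is what makes online, per-packet extraction feasible.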


In [10]:
%%time
# Perform per-packet feature extraction to prepare the Kitsune input (test set)
!python $SRC_DIR/kitsune/run-extraction-h5.py -o $WORK_DIR/ctu13_sc4_test.h5 $WORK_DIR/ctu13_sc4_test.tsv

INFO:utils:there are 929102 packets
INFO:__main__:running extractor
100%|██████████████████████████████████| 929101/929101 [34:49<00:00, 444.60it/s]
INFO:__main__:extractor finished
CPU times: user 56.6 s, sys: 25.8 s, total: 1min 22s
Wall time: 34min 51s


## Model Training

In [49]:
# Determine the number of samples in the training set
with open(os.path.join(WORK_DIR, 'ctu13_sc4_train.tsv')) as tsv:
    train_len = sum(1 for _ in tsv) - 2
train_len

263130

In [50]:
# Use 10% of the samples to train Kitsune's feature mapper (--fmgrace) and the rest to train the autoencoders themselves (--adgrace)
fmgrace = int(train_len * 0.1)
adgrace = train_len - fmgrace
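The two grace periods split the training stream into consecutive phases, which matches the mode switches printed in the training log. The hypothetical helper `kitnet_phase` sketches that schedule; the exact boundary handling inside KitNET may differ by one packet.

```python
def kitnet_phase(index, fmgrace, adgrace):
    """Return the phase in which the packet at 0-based `index` is processed (sketch)."""
    if index < fmgrace:
        return 'feature-mapper training'   # learn which features belong together
    if index < fmgrace + adgrace:
        return 'autoencoder training'      # train the ensemble on the fixed mapping
    return 'execution'                     # only score packets
```

With `train_len = 263130`, the split above gives `fmgrace = 26313` and `adgrace = 236817`, so the whole training file is consumed by the two training phases.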

In [51]:
%%time
# Perform Kitsune training
!python $SRC_DIR/kitsune/run-learning.py -o $WORK_DIR/model_kitsune.bin --fmgrace $fmgrace --adgrace $adgrace $WORK_DIR/ctu13_sc4_train.h5

INFO:KitNET.KitNET:Feature-Mapper: train-mode, Anomaly-Detector: off-mode
INFO:__main__:running learning
 10%|███▏                             | 25573/263131 [00:01<00:13, 16984.13it/s]
INFO:KitNET.KitNET:The Feature-Mapper found a mapping: 100 features to 17 autoencoders.
INFO:KitNET.KitNET:Feature-Mapper: execute-mode, Anomaly-Detector: train-mode
100%|█████████████████████████████████▉| 263100/263131 [06:47<00:00, 648.29it/s]
INFO:KitNET.KitNET:Feature-Mapper: execute-mode, Anomaly-Detector: execute-mode
100%|██████████████████████████████████| 263131/263131 [06:47<00:00, 645.13it/s]
INFO:__main__:learning finished
INFO:__main__:model written
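The log reports that the feature mapper grouped the 100 features into 17 autoencoders. The paper describes this step as agglomerative hierarchical clustering of features under a correlation distance, splitting clusters until none exceeds a maximum size $m$ [1]. The sketch below approximates that with SciPy on a stored matrix of training features; `map_features` and `max_size` are hypothetical names, the choice of single linkage is an assumption, and KitNET itself accumulates the correlation statistics incrementally during the feature-mapper grace period.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def map_features(X, max_size=10):
    """Partition the columns of X into groups of at most `max_size` correlated features (sketch)."""
    if max_size < 1:
        raise ValueError('max_size must be positive')
    X = np.asarray(X, dtype=float)
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))  # constant columns give NaN -> 0
    dist = np.clip(1.0 - corr, 0.0, None)               # correlation distance in [0, 2]
    n = dist.shape[0]
    condensed = dist[np.triu_indices(n, k=1)]           # SciPy's condensed (upper-triangle) layout
    root = to_tree(linkage(condensed, method='single'))

    groups = []

    def split(node):
        # Keep a subtree as one group if it is small enough, otherwise recurse into its children
        if node.get_count() <= max_size:
            groups.append(sorted(node.pre_order()))     # leaf ids are the feature indices
        else:
            split(node.get_left())
            split(node.get_right())

    split(root)
    return groups
```

Each returned group would then feed one small autoencoder, so `len(map_features(X, m))` plays the role of the "17 autoencoders" in the log.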


## Running the Evaluation

In [54]:
%%time
# Perform Kitsune evaluation
!python $SRC_DIR/kitsune/run-testing.py --model $WORK_DIR/model_kitsune.bin --output $WORK_DIR/predictions_kitsune.pkts $WORK_DIR/ctu13_sc4_test.h5

INFO:__main__:running detector
100%|█████████████████████████████████| 929101/929101 [14:39<00:00, 1055.97it/s]
INFO:__main__:detection finished
CPU times: user 23.4 s, sys: 3.97 s, total: 27.4 s
Wall time: 14min 41s
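Each per-packet prediction written by the evaluation is a reconstruction error: KitNET's autoencoders score their feature subsets by RMSE, and an output autoencoder combines those scores into one RMSE per packet [1]. The sketch below shows the error measure and a simple threshold rule; `rmse` and `flag_anomalies` are hypothetical helpers, and the threshold is an assumption to be chosen during the analysis, not a value produced by this pipeline.

```python
import numpy as np


def rmse(x, x_hat):
    """Root-mean-square error between an input vector and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))


def flag_anomalies(scores, threshold):
    """Mark packets whose anomaly score exceeds the given threshold."""
    return np.asarray(scores, dtype=float) > threshold
```

A higher RMSE means the packet is less like the benign traffic seen during training, so a single scalar threshold is enough to turn the scores into per-packet alerts.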


At this point, we have a trained model, `model_kitsune.bin`, and a file with per-packet RMSE predictions, `predictions_kitsune.pkts`. This file is used for the performance analysis and the comparison with the Windower in the `03_perf_comparison.ipynb` notebook.