# Windower Repo Example 01: Kitsune Pipeline Run

This notebook provides steps to preprocess the dataset for the original Kitsune NIDS, as provided by Mirsky et al. [1], run its evaluation pipeline, and analyze its output and performance.

This notebook expects the PCAP variant of the dataset to be already prepared in the `examples/work` directory. This can be achieved by running the `00_dataset.ipynb` notebook.

[1] Mirsky, Y., Doitshman, T., Elovici, Y., & Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. In Proceedings of the NDSS Symposium 2018. Available at: <https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-3_Mirsky_paper.pdf>.

In [1]:
import os

In [2]:
# The current directory is expected to be /examples for these relative paths to work
SRC_DIR  = '../src'
WORK_DIR = 'work'

## Data Preprocessing

The original Kitsune model requires special per-packet preprocessing based on the tshark output. This section describes these steps.

In [3]:
# Each field is listed exactly once; a repeated -e option adds an extra TSV column
TSHARK_FIELDS = '-e frame.time_epoch -e frame.len -e eth.src -e eth.dst -e ip.src ' \
    '-e ip.dst -e tcp.srcport -e tcp.dstport -e udp.srcport -e udp.dstport ' \
    '-e icmp.type -e icmp.code -e arp.opcode ' \
    '-e arp.src.hw_mac -e arp.src.proto_ipv4 -e arp.dst.hw_mac -e arp.dst.proto_ipv4 ' \
    '-e ipv6.src -e ipv6.dst'
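
The field string is easy to get wrong by hand, since a duplicated or missing `-e` option changes the number of TSV columns. A minimal sketch of building it from a list with a duplicate check follows. The names `KITSUNE_FIELDS` and `build_tshark_fields` are hypothetical helpers, not part of the repository, and the list assumes the 19 fields used by the original Kitsune feature extractor.

```python
# Assumption: the 19 tshark fields used by the original Kitsune feature extractor.
KITSUNE_FIELDS = [
    'frame.time_epoch', 'frame.len', 'eth.src', 'eth.dst', 'ip.src', 'ip.dst',
    'tcp.srcport', 'tcp.dstport', 'udp.srcport', 'udp.dstport',
    'icmp.type', 'icmp.code', 'arp.opcode',
    'arp.src.hw_mac', 'arp.src.proto_ipv4', 'arp.dst.hw_mac', 'arp.dst.proto_ipv4',
    'ipv6.src', 'ipv6.dst',
]


def build_tshark_fields(fields):
    """Return the '-e <field>' argument string, rejecting duplicate fields."""
    duplicates = sorted({f for f in fields if fields.count(f) > 1})
    if duplicates:
        raise ValueError(f'duplicate tshark fields: {duplicates}')
    return ' '.join(f'-e {field}' for field in fields)


# Hypothetical replacement for a hand-written field string
fields_arg = build_tshark_fields(KITSUNE_FIELDS)
```

The resulting `fields_arg` could be passed to the `tshark` calls in place of a hand-written string.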

In [4]:
%%time
# Extract Tshark TSV file for the train set
!tshark -r $WORK_DIR/ctu13_sc4_train.pcap -T fields -E header=y -E occurrence=f $TSHARK_FIELDS > $WORK_DIR/ctu13_sc4_train.tsv

CPU times: user 350 ms, sys: 57.3 ms, total: 407 ms
Wall time: 19.1 s


In [5]:
%%time
# Extract Tshark TSV file for the test set
!tshark -r $WORK_DIR/ctu13_sc4_test.pcap -T fields -E header=y -E occurrence=f $TSHARK_FIELDS > $WORK_DIR/ctu13_sc4_test.tsv

CPU times: user 1.22 s, sys: 184 ms, total: 1.4 s
Wall time: 1min 4s
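
Before running the slow feature extraction, it may be worth checking that the TSV files look as expected. The sketch below reads the header row written by `-E header=y`, verifies the column count, and counts the data rows. The function `inspect_tsv` is a hypothetical helper, and the expected column count is an assumption that should match the number of `-e` options passed to tshark.

```python
import csv


def inspect_tsv(path, expected_columns):
    """Return (header, n_rows); raise if the header width is unexpected."""
    with open(path, newline='') as handle:
        # tshark -T fields separates columns with tabs by default
        reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
        header = next(reader)
        if len(header) != expected_columns:
            raise ValueError(
                f'{path}: expected {expected_columns} columns, got {len(header)}'
            )
        # Count the remaining lines, i.e. everything after the header row
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

For example, `inspect_tsv('work/ctu13_sc4_train.tsv', expected_columns=19)` would return the column names and the number of rows that follow the header, assuming a 19-field extraction.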


In [6]:
%%time
# Perform per-packet feature extraction to prepare the Kitsune input (train set)
!python $SRC_DIR/kitsune/run-extraction-h5.py -o $WORK_DIR/ctu13_sc4_train.h5 $WORK_DIR/ctu13_sc4_train.tsv

INFO:utils:there are 263132 packets
INFO:__main__:running extractor
100%|██████████████████████████████████| 263131/263131 [09:10<00:00, 477.66it/s]
INFO:__main__:extractor finished
CPU times: user 13.3 s, sys: 2.32 s, total: 15.6 s
Wall time: 9min 12s


In [7]:
%%time
# Perform per-packet feature extraction to prepare the Kitsune input (test set)
!python $SRC_DIR/kitsune/run-extraction-h5.py -o $WORK_DIR/ctu13_sc4_test.h5 $WORK_DIR/ctu13_sc4_test.tsv

INFO:utils:there are 929102 packets
INFO:__main__:running extractor
100%|██████████████████████████████████| 929101/929101 [30:19<00:00, 510.59it/s]
INFO:__main__:extractor finished
CPU times: user 45.2 s, sys: 18.4 s, total: 1min 3s
Wall time: 30min 21s


## Model Training

In [8]:
# Determine the number of samples in the training set from the TSV line count
with open(os.path.join(WORK_DIR, 'ctu13_sc4_train.tsv')) as tsv_file:
    train_len = sum(1 for _ in tsv_file) - 2
train_len

263130

In [9]:
# Use 10% of the samples for training the Kitsune feature mapper and the rest for training the autoencoders (AEs) themselves
fmgrace = int(train_len * 0.1)
adgrace = train_len - fmgrace
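
The same split can be wrapped in a small helper, which makes the 10% fraction explicit and guarantees that the two grace periods add up to the training set size. This is only a sketch; `split_grace` and its `fm_fraction` parameter are hypothetical names, not part of the Kitsune scripts.

```python
def split_grace(n_samples, fm_fraction=0.1):
    """Split n_samples into (fmgrace, adgrace) grace periods.

    fmgrace: samples used to learn the feature mapping.
    adgrace: remaining samples used to train the anomaly detector.
    """
    if not 0 < fm_fraction < 1:
        raise ValueError('fm_fraction must be strictly between 0 and 1')
    fmgrace = int(n_samples * fm_fraction)
    adgrace = n_samples - fmgrace
    return fmgrace, adgrace


# Using the train_len value reported above
fmgrace, adgrace = split_grace(263130)
print(fmgrace, adgrace)  # 26313 236817
```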

In [10]:
%%time
# Perform Kitsune training
!python $SRC_DIR/kitsune/run-learning.py -o $WORK_DIR/model_kitsune.bin --fmgrace $fmgrace --adgrace $adgrace $WORK_DIR/ctu13_sc4_train.h5

INFO:KitNET.KitNET:Feature-Mapper: train-mode, Anomaly-Detector: off-mode
INFO:__main__:running learning
 10%|███▏                             | 25614/263131 [00:01<00:12, 18606.64it/s]
INFO:KitNET.KitNET:The Feature-Mapper found a mapping: 100 features to 17 autoencoders.
INFO:KitNET.KitNET:Feature-Mapper: execute-mode, Anomaly-Detector: train-mode
100%|█████████████████████████████████▉| 263096/263131 [05:47<00:00, 701.44it/s]
INFO:KitNET.KitNET:Feature-Mapper: execute-mode, Anomaly-Detector: execute-mode
100%|██████████████████████████████████| 263131/263131 [05:47<00:00, 756.80it/s]
INFO:__main__:learning finished
INFO:__main__:model written
CPU times: user 7.18 s, sys: 1.21 s, total: 8.39 s
Wall time: 5min 49s


## Model Evaluation

In [11]:
%%time
# Perform Kitsune evaluation
!python $SRC_DIR/kitsune/run-testing.py --model $WORK_DIR/model_kitsune.bin --output $WORK_DIR/predictions_kitsune_pkts.rmse $WORK_DIR/ctu13_sc4_test.h5

INFO:__main__:running detector
100%|█████████████████████████████████| 929101/929101 [12:56<00:00, 1196.42it/s]
INFO:__main__:detection finished
CPU times: user 18.2 s, sys: 2.96 s, total: 21.1 s
Wall time: 12min 57s


At this point, we have a trained model `model_kitsune.bin` and a file with per-packet RMSE predictions `predictions_kitsune_pkts.rmse`. This file will be used for performance analysis and comparison with the Windower in the `03_perf_comparison.ipynb` notebook.
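
For a quick look at the predictions before moving on, the RMSE values can be summarized with the standard library. The sketch below assumes the predictions file is plain text with one RMSE value per line. That layout is an assumption, not something confirmed by the notebook, so adjust `load_rmse` if the file is stored differently. Both function names are hypothetical.

```python
import statistics


def load_rmse(path):
    """Read RMSE scores, ASSUMING one float per non-empty line."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def summarize_rmse(scores):
    """Return basic descriptive statistics of the per-packet RMSE scores."""
    return {
        'count': len(scores),
        'min': min(scores),
        'max': max(scores),
        'mean': statistics.fmean(scores),
        'median': statistics.median(scores),
    }
```

A call such as `summarize_rmse(load_rmse('work/predictions_kitsune_pkts.rmse'))` would then give a rough picture of the score distribution, provided the file follows the assumed layout.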