# Windower Repo Example 00: PCAP Data Preparation

This notebook reconstructs the filtered PCAP subset of CTU-13 scenario #4 that is used to compare Windower with the original Kitsune version. It only presents the steps required to rebuild the dataset and does not discuss the reasoning behind the design choices. Refer to the `datasets.md` file for a description of those choices.

The PCAP preparation in this notebook is common to both the Kitsune and Windower pipelines. Pipeline-specific preprocessing (e.g., running Windower itself) can be found in the Jupyter notebooks `01_kitsune.ipynb` and `02_windower.ipynb`.

In [1]:
WORK_DIR = 'work'
SRC_DIR  = '../../src'

In [2]:
# Create a separate directory and work in it for the rest of the examples
!mkdir -p $WORK_DIR

In [3]:
%cd -q $WORK_DIR

## PCAP Dataset Preparation Process

In [4]:
# Download the dataset
!LANG=C wget https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-45/capture20110815.truncated.pcap.bz2

--2023-11-10 13:09:36--  https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-45/capture20110815.truncated.pcap.bz2
Resolving mcfp.felk.cvut.cz (mcfp.felk.cvut.cz)... 147.32.82.194
Connecting to mcfp.felk.cvut.cz (mcfp.felk.cvut.cz)|147.32.82.194|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 980043232 (935M) [application/x-bzip2]
Saving to: 'capture20110815.truncated.pcap.bz2'


2023-11-10 13:12:27 (5.48 MB/s) - 'capture20110815.truncated.pcap.bz2' saved [980043232/980043232]



In [5]:
# Decompress the dataset (bzip2)
!bunzip2 capture20110815.truncated.pcap.bz2

In [6]:
# Keep only packets sent from the IP addresses specified by the documentation
# (named bpf_filter to avoid shadowing the Python builtin `filter`)
bpf_filter = 'src host 147.32.84.165 or src host 147.32.84.170 or src host 147.32.84.134 or ' \
    'src host 147.32.84.164 or src host 147.32.87.36 or src host 147.32.80.9 or src host 147.32.87.11'

!tcpdump -r capture20110815.truncated.pcap -w ctu13_sc4_filtered.pcap $bpf_filter

reading from file capture20110815.truncated.pcap, link-type EN10MB (Ethernet), snapshot length 262144
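
The source filter above is a plain `or` chain over a fixed set of hosts. As a minimal sketch, the same expression can be generated from a list, which makes the set of hosts easier to review or change. The names `hosts`, `build_src_filter`, and `src_filter` are hypothetical and not part of the repository.

```python
# Hosts whose outgoing traffic is kept (copied from the filter cell above).
hosts = [
    '147.32.84.165', '147.32.84.170', '147.32.84.134', '147.32.84.164',
    '147.32.87.36', '147.32.80.9', '147.32.87.11',
]


def build_src_filter(hosts):
    """Join hosts into a capture filter: 'src host A or src host B or ...'."""
    return ' or '.join(f'src host {h}' for h in hosts)


# Hypothetical helper output; it should equal the hand-written filter string.
src_filter = build_src_filter(hosts)
print(src_filter)
```

The resulting string can be passed to `tcpdump` the same way as the hand-written one.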


In [7]:
# Extract attacking and benign traffic
!tcpdump -r ctu13_sc4_filtered.pcap -w ctu13_sc4_malicious.pcap 'ip and src host 147.32.84.165 and dst host 147.32.96.69'
!tcpdump -r ctu13_sc4_filtered.pcap -w ctu13_sc4_benign.pcap 'ip and (not src host 147.32.84.165 or not dst host 147.32.96.69)'

reading from file ctu13_sc4_filtered.pcap, link-type EN10MB (Ethernet), snapshot length 262144
reading from file ctu13_sc4_filtered.pcap, link-type EN10MB (Ethernet), snapshot length 262144
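
The two filters above are meant to be complementary: by De Morgan's law, `not src or not dst` is the negation of `src and dst`, so every IPv4 packet should land in exactly one of the two files. Because both filters start with `ip`, non-IPv4 packets (for example ARP or IPv6) match neither filter and are dropped. The sketch below mirrors the two expressions as Python predicates and checks the partition over all address combinations. The names `ATTACKER`, `TARGET`, `is_malicious`, and `is_benign` are illustrative only.

```python
from itertools import product

ATTACKER = '147.32.84.165'   # source of the traffic treated as malicious
TARGET = '147.32.96.69'      # destination of the traffic treated as malicious


def is_malicious(src, dst):
    # Mirrors: src host 147.32.84.165 and dst host 147.32.96.69
    return src == ATTACKER and dst == TARGET


def is_benign(src, dst):
    # Mirrors: not src host 147.32.84.165 or not dst host 147.32.96.69
    return src != ATTACKER or dst != TARGET


# 'other' stands for any address that is neither ATTACKER nor TARGET.
addresses = [ATTACKER, TARGET, 'other']
pairs = list(product(addresses, repeat=2))

# Each (src, dst) pair must satisfy exactly one of the two predicates.
exactly_one = all(is_malicious(s, d) != is_benign(s, d) for s, d in pairs)
print(exactly_one)
```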


In [8]:
# Remap the source IP of the malicious traffic to allow per-packet labelling
!tcprewrite -i ctu13_sc4_malicious.pcap -o ctu13_sc4_malicious_remap.pcap --srcipmap=147.32.84.165/32:10.0.0.165/32

In [9]:
# Merge the dataset back into one piece
!mergecap -w ctu13_sc4_remap.pcap ctu13_sc4_benign.pcap ctu13_sc4_malicious_remap.pcap
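
By default, `mergecap` interleaves the packets of its inputs in timestamp order rather than concatenating the files, which is what restores a single chronological capture here. The sketch below illustrates that kind of merge on toy `(timestamp, name)` tuples using `heapq.merge`; the data and variable names are made up for illustration and do not read any PCAP file.

```python
import heapq

# Toy stand-ins for two captures, each already sorted by timestamp.
benign = [(1.0, 'b1'), (3.0, 'b2'), (5.0, 'b3')]
malicious = [(2.0, 'm1'), (4.0, 'm2')]

# Interleave the two sorted streams by timestamp (the first tuple element).
merged = list(heapq.merge(benign, malicious, key=lambda pkt: pkt[0]))

for timestamp, name in merged:
    print(timestamp, name)
```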

In [10]:
# Create train and test sets
!editcap -B '2011-08-15 12:30:00' ctu13_sc4_remap.pcap ctu13_sc4_train.pcap
!editcap -A '2011-08-15 12:30:00' ctu13_sc4_remap.pcap ctu13_sc4_test.pcap
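
The two `editcap` calls split the capture at a single point in time: `-B` keeps packets with a timestamp before the given time, and `-A` keeps packets on or after it, so no packet should appear in both sets. The sketch below shows the same rule on toy data. It assumes naive timestamps; with `editcap`, a timestamp without a zone is, to the best of our knowledge, interpreted in the local time zone, so the exact split point may depend on the machine's settings. `SPLIT` and `split_by_time` are hypothetical names.

```python
from datetime import datetime

SPLIT = datetime(2011, 8, 15, 12, 30, 0)


def split_by_time(packets, split=SPLIT):
    """Split (timestamp, payload) tuples into (train, test) at `split`."""
    train = [p for p in packets if p[0] < split]    # like editcap -B
    test = [p for p in packets if p[0] >= split]    # like editcap -A
    return train, test


# Toy packets around the split point.
packets = [
    (datetime(2011, 8, 15, 12, 29, 59), 'a'),
    (datetime(2011, 8, 15, 12, 30, 0), 'b'),
    (datetime(2011, 8, 15, 12, 30, 1), 'c'),
]
train, test = split_by_time(packets)
print(len(train), len(test))
```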

In [11]:
# Create a file listing the attackers' IP addresses
!echo '10.0.0.165' > ctu13_sc4_test_attack_ips.txt
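
The remapping step matters because `147.32.84.165` sends both malicious and benign traffic, so the original source address alone cannot tell the two apart. After the remap, a packet can be labelled by checking its source IP against the attacker list. The sketch below shows one way this could be done; how the later notebooks actually consume `ctu13_sc4_test_attack_ips.txt` is an assumption here, and `load_attack_ips` and `label_packet` are hypothetical helpers.

```python
import os
import tempfile


def load_attack_ips(path):
    """Read one IP address per line, skipping blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def label_packet(src_ip, attack_ips):
    """Return 1 for a packet sent from a listed attacker address, else 0."""
    return int(src_ip in attack_ips)


# Demo on a temporary file with the same content as the attacker list above.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'attack_ips.txt')
    with open(path, 'w') as fh:
        fh.write('10.0.0.165\n')
    attack_ips = load_attack_ips(path)

# The remapped address is labelled malicious; the original address, which
# now only carries the host's remaining traffic, is labelled benign.
labels = [label_packet(ip, attack_ips) for ip in ('10.0.0.165', '147.32.84.165')]
print(labels)
```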