# CIC-IoT23 Dataset Analysis

* **Author:** Patrik Goldschmidt (igoldschmidt@fit.vut.cz)
* **Project:** Network Intrusion Datasets: A Survey, Limitations, and Recommendations
* **Date:** 2024

In [8]:
import pandas as pd
import os
import numpy as np

pd.set_option('display.max_columns', None)

## CSV Analysis

In [2]:
# Beforehand, we merged all CSVs into a single file for the analysis
CSV_PATH = '/data/ciciot23/csv/merge.csv'

In [3]:
data = pd.read_csv(CSV_PATH)

In [4]:
data.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46686579 entries, 0 to 46686578
Data columns (total 47 columns):
 #   Column           Dtype  
---  ------           -----  
 0   flow_duration    float64
 1   Header_Length    float64
 2   Protocol Type    float64
 3   Duration         float64
 4   Rate             float64
 5   Srate            float64
 6   Drate            float64
 7   fin_flag_number  float64
 8   syn_flag_number  float64
 9   rst_flag_number  float64
 10  psh_flag_number  float64
 11  ack_flag_number  float64
 12  ece_flag_number  float64
 13  cwr_flag_number  float64
 14  ack_count        float64
 15  syn_count        float64
 16  fin_count        float64
 17  urg_count        float64
 18  rst_count        float64
 19  HTTP             float64
 20  HTTPS            float64
 21  DNS              float64
 22  Telnet           float64
 23  SMTP             float64
 24  SSH              float64
 25  IRC              float64
 26  TCP              float64
 27  UDP       

In [5]:
len(data)

46686579

In [9]:
data.head()

Unnamed: 0,flow_duration,Header_Length,Protocol Type,Duration,Rate,Srate,Drate,fin_flag_number,syn_flag_number,rst_flag_number,psh_flag_number,ack_flag_number,ece_flag_number,cwr_flag_number,ack_count,syn_count,fin_count,urg_count,rst_count,HTTP,HTTPS,DNS,Telnet,SMTP,SSH,IRC,TCP,UDP,DHCP,ARP,ICMP,IPv,LLC,Tot sum,Min,Max,AVG,Std,Tot size,IAT,Number,Magnitue,Radius,Covariance,Variance,Weight,label
0,0.0,54.0,6.0,64.0,0.329807,0.329807,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,567.0,54.0,54.0,54.0,0.0,54.0,83343830.0,9.5,10.392305,0.0,0.0,0.0,141.55,DDoS-RSTFINFlood
1,0.0,57.04,6.33,64.0,4.290556,4.290556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,581.33,54.0,66.3,54.796404,2.822973,57.04,82926070.0,9.5,10.464666,4.010353,160.987842,0.05,141.55,DoS-TCP_Flood
2,0.0,0.0,1.0,64.0,33.396799,33.396799,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,441.0,42.0,42.0,42.0,0.0,42.0,83127990.0,9.5,9.165151,0.0,0.0,0.0,141.55,DDoS-ICMP_Flood
3,0.328175,76175.0,17.0,64.0,4642.13301,4642.13301,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,525.0,50.0,50.0,50.0,0.0,50.0,83015700.0,9.5,10.0,0.0,0.0,0.0,141.55,DoS-UDP_Flood
4,0.11732,101.73,6.11,65.91,6.202211,6.202211,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.01,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,644.6,57.88,131.6,67.95923,23.113111,57.88,82973000.0,9.5,11.346876,32.716243,3016.808286,0.19,141.55,DoS-SYN_Flood


In [20]:
# Check the class distribution
data['label'].value_counts()

label
DDoS-ICMP_Flood            7200504
DDoS-UDP_Flood             5412287
DDoS-TCP_Flood             4497667
DDoS-PSHACK_Flood          4094755
DDoS-SYN_Flood             4059190
DDoS-RSTFINFlood           4045285
DDoS-SynonymousIP_Flood    3598138
DoS-UDP_Flood              3318595
DoS-TCP_Flood              2671445
DoS-SYN_Flood              2028834
BenignTraffic              1098195
Mirai-greeth_flood          991866
Mirai-udpplain              890576
Mirai-greip_flood           751682
DDoS-ICMP_Fragmentation     452489
MITM-ArpSpoofing            307593
DDoS-UDP_Fragmentation      286925
DDoS-ACK_Fragmentation      285104
DNS_Spoofing                178911
Recon-HostDiscovery         134378
Recon-OSScan                 98259
Recon-PortScan               82284
DoS-HTTP_Flood               71864
VulnerabilityScan            37382
DDoS-HTTP_Flood              28790
DDoS-SlowLoris               23426
DictionaryBruteForce         13064
BrowserHijacking              5859
CommandInjecti

The associated CSV files contain no timestamps, even though the documentation states that the CSVs should include a `ts` feature. For this reason, temporal/continuity analysis cannot be performed on the CSV data.
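The absence of a timestamp feature can be confirmed by scanning the column names. The snippet below is a minimal sketch under assumptions: the helper `find_timestamp_columns` and the candidate name list are hypothetical (only `ts` is mentioned by the documentation), and the short column list is a stand-in for `data.columns`.

```python
# Hypothetical candidate names for a timestamp feature; only `ts` comes from
# the dataset documentation, the others are guesses added for illustration.
CANDIDATE_NAMES = ("ts", "timestamp", "time")


def find_timestamp_columns(columns, candidates=CANDIDATE_NAMES):
    """Return the column names that match a candidate name (case-insensitive)."""
    return [col for col in columns if str(col).strip().lower() in candidates]


# Stand-in for list(data.columns); a few of the real column names.
sample_columns = ["flow_duration", "Header_Length", "IAT", "Weight", "label"]

ts_columns = find_timestamp_columns(sample_columns)
print(ts_columns)  # an empty list means no timestamp-like column was found
```

In the notebook itself, the call would be `find_timestamp_columns(data.columns)`.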

## PCAP Analysis

In [2]:
DATA_PATH = '/data/ciciot23/pcap'

In [3]:
FILE_EXT = '.pcap'

# Find all .pcap files in `folder` and run the capinfos program on each of them
def analyze_pcaps(folder):
    pcap_files = [pcap_file for pcap_file in os.listdir(folder) if pcap_file.endswith(FILE_EXT)]
    pcap_files.sort()

    for pcap_file in pcap_files:
        # Build the path from the `folder` argument, not from the global DATA_PATH
        pcap_path = os.path.join(folder, pcap_file)

        !capinfos -c -d -a -e -K -M -n -o -u $pcap_path
        print()

Most of the data preparation was done in bash; this notebook only displays the results.

Due to space restrictions, the PCAP analysis is run in batches, with the contents of the `DATA_PATH` directory swapped between batches. The same call below therefore analyzes different files, and yields different results, each time it is executed.
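The `!capinfos` call only works inside IPython. As a hedged alternative, the sketch below runs the same command through `subprocess` so that the per-file output can be kept in a dictionary instead of being printed. The helper names (`list_pcaps`, `build_capinfos_cmd`, `analyze_pcaps_subprocess`) are hypothetical and not part of the original notebook; the flags are copied unchanged from the cell above.

```python
import os
import shutil
import subprocess

# Flags copied from the notebook cell; their meaning is not re-interpreted here.
CAPINFOS_FLAGS = ["-c", "-d", "-a", "-e", "-K", "-M", "-n", "-o", "-u"]


def list_pcaps(folder, ext=".pcap"):
    """Return the sorted full paths of all files in `folder` ending with `ext`."""
    names = sorted(name for name in os.listdir(folder) if name.endswith(ext))
    return [os.path.join(folder, name) for name in names]


def build_capinfos_cmd(pcap_path):
    """Build the capinfos argument list for a single capture file."""
    return ["capinfos", *CAPINFOS_FLAGS, pcap_path]


def analyze_pcaps_subprocess(folder):
    """Run capinfos on every PCAP in `folder`; return {path: captured stdout}."""
    if shutil.which("capinfos") is None:
        raise RuntimeError("capinfos was not found on PATH")
    results = {}
    for pcap_path in list_pcaps(folder):
        proc = subprocess.run(
            build_capinfos_cmd(pcap_path),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one file cannot be read
        )
        results[pcap_path] = proc.stdout
    return results


example_cmd = build_capinfos_cmd("/data/ciciot23/pcap/benign.pcap")
print(" ".join(example_cmd))
```

Keeping the outputs in a dictionary would also make it possible to store each batch before the directory contents are swapped.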

In [8]:
# Batch 1
analyze_pcaps(DATA_PATH)

File name:           /data/ciciot23/pcap/backdoor_malware.pcap
Number of packets:   33414
Data size:           10010158 bytes
Capture duration:    2040.527838 seconds
First packet time:   2023-02-13 14:58:00.278956
Last packet time:    2023-02-13 15:32:00.806794
Strict time order:   True



File name:           /data/ciciot23/pcap/benign.pcap
Number of packets:   11102705
Data size:           6820202305 bytes
Capture duration:    100800.722040 seconds
First packet time:   2022-10-07 19:15:00.350417
Last packet time:    2022-10-08 23:15:01.072457
Strict time order:   True

File name:           /data/ciciot23/pcap/browser_hijacking.pcap
Number of packets:   59821
Data size:           34081456 bytes
Capture duration:    1559.994786 seconds
First packet time:   2022-11-14 21:23:00.677549
Last packet time:    2022-11-14 21:49:00.672335
Strict time order:   True

File name:           /data/ciciot23/pcap/command_injection.pcap
Number of packets:   55741
Data size:           27693726 bytes
Capture duration:    2340.506996 seconds
First packet time:   2023-02-14 15:53:00.043362
Last packet time:    2023-02-14 16:32:00.550358
Strict time order:   True

File name:           /data/ciciot23/pcap/ddos_ack_fragmentation.pcap
Number of packets:   28434845
Data size:           2497658955

In [4]:
# Batch 2
analyze_pcaps(DATA_PATH)

File name:           /data/ciciot23/pcap/ddos_pshack_flood.pcap
Number of packets:   409376766
Data size:           24772704029 bytes
Capture duration:    369180.850307 seconds
First packet time:   2022-10-20 15:55:00.439052
Last packet time:    2022-10-24 22:28:01.289359
Strict time order:   True

File name:           /data/ciciot23/pcap/ddos_rstfin_flood.pcap
Number of packets:   404426407
Data size:           24725282938 bytes
Capture duration:    171060.584695 seconds
First packet time:   2022-10-26 22:43:00.385687
Last packet time:    2022-10-28 22:14:00.970382
Strict time order:   False

File name:           /data/ciciot23/pcap/ddos_slowloris.pcap
Number of packets:   2351454
Data size:           637984446 bytes
Capture duration:    68759.594804 seconds
First packet time:   2022-11-03 19:18:00.843086
Last packet time:    2022-11-04 14:24:00.437890
Strict time order:   True

File name:           /data/ciciot23/pcap/ddos_syn_flood.pcap
Number of packets:   405821779
Data size:     

In [5]:
# Batch 3
analyze_pcaps(DATA_PATH)

File name:           /data/ciciot23/pcap/mirai_greeth_flood.pcap
Number of packets:   98965701
Data size:           57787761324 bytes
Capture duration:    520381.212710 seconds
First packet time:   2023-01-12 16:19:00.107475
Last packet time:    2023-01-18 16:52:01.320185
Strict time order:   False

File name:           /data/ciciot23/pcap/mirai_greip_flood.pcap
Number of packets:   75028492
Data size:           42510225765 bytes
Capture duration:    1810980.552373 seconds
First packet time:   2022-12-19 17:23:00.400742
Last packet time:    2023-01-09 16:26:00.953115
Strict time order:   False

File name:           /data/ciciot23/pcap/mirai_udpplain.pcap
Number of packets:   88883725
Data size:           48569735062 bytes
Capture duration:    2252820.035282 seconds
First packet time:   2023-01-19 16:51:00.595065
Last packet time:    2023-02-14 18:38:00.630347
Strict time order:   False

File name:           /data/ciciot23/pcap/mitm_arpspoofing.pcap
Number of packets:   3091765
Data siz

### Files with Suspicious Durations
* `ddos_http_flood.pcap`
* `ddos_icmp_flood.pcap`
* `ddos_icmp_fragmentation.pcap`
* `ddos_syn_flood.pcap`
* `ddos_udp_flood.pcap`
* `ddos_udp_fragmentation.pcap`

Manual analysis shows that the capture files contain gaps, even within a single PCAP.
For example, `Mirai-udpplain24.pcap` has two very dense packet regions separated by a huge gap around the 1172000th packet, which makes the overall capture span a long duration.
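Gaps of this kind can be located by looking at the differences between consecutive packet timestamps. The following is a minimal sketch on synthetic data, not the procedure used for the manual analysis; `find_capture_gaps` and the 60-second threshold are assumptions chosen for illustration.

```python
def find_capture_gaps(timestamps, min_gap_seconds):
    """Return (packet_index, gap_seconds) for each gap >= min_gap_seconds.

    `timestamps` must be sorted in ascending order (seconds). `packet_index`
    is the index of the first packet that arrives after the gap.
    """
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta >= min_gap_seconds:
            gaps.append((i, delta))
    return gaps


# Synthetic example: two dense bursts separated by roughly one hour.
timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 3605.0, 3606.0, 3607.0]

gaps = find_capture_gaps(timestamps, min_gap_seconds=60.0)
span = timestamps[-1] - timestamps[0]
active = span - sum(gap for _, gap in gaps)

print(gaps)    # [(5, 3601.0)]
print(span)    # 3607.0
print(active)  # 6.0
```

The difference between `span` and `active` illustrates how a single gap can dominate the capture duration reported for a file.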

### Fixed/Precomputed Durations


The UDP fragmentation capture (`ddos_udp_fragmentation.pcap`) contained one packet with a faulty timestamp, so we analyzed it manually to fix at least this one error.

```
File name:           /data/ciciot23/pcap/ddos_udp_fragmentation.pcap
Number of packets:   29213174
Data size:           25508655571 bytes
Capture duration:    158880.49 seconds
First packet time:   2022-11-01 18:44:00.173997
Last packet time:    2022-11-03 14:52:00.661996
Strict time order:   False
```
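As a sanity check, the corrected duration can be recomputed from the first and last packet times listed above. This is only a small arithmetic sketch; the format string assumes the timestamp layout shown in the capinfos output.

```python
from datetime import datetime

# Timestamp layout as printed in the capinfos output above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

first_packet = datetime.strptime("2022-11-01 18:44:00.173997", FMT)
last_packet = datetime.strptime("2022-11-03 14:52:00.661996", FMT)

# Duration is simply the difference between the last and first packet times.
duration_seconds = (last_packet - first_packet).total_seconds()
print(round(duration_seconds, 2))  # 158880.49
```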

### Totals
* The total duration was computed outside of this notebook, using a search for the longest overlap.
* The total packet count is obtained by summing the per-file packet counts reported by capinfos.
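The overlap search itself is not shown in this notebook. One possible way to avoid double-counting overlapping captures, given here purely as an assumption and not as the procedure actually used, is to merge the capture intervals and sum the length of their union. The function `union_duration_seconds` is hypothetical; the four intervals are taken from the capinfos outputs above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse(ts):
    """Parse a capinfos-style timestamp string."""
    return datetime.strptime(ts, FMT)


def union_duration_seconds(intervals):
    """Return the time (in seconds) covered by at least one (start, end) interval."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: close it and open a new one.
            if cur_end is not None:
                total += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            # Overlapping: extend the running interval if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).total_seconds()
    return total


captures = [
    (parse("2023-02-13 14:58:00.278956"), parse("2023-02-13 15:32:00.806794")),  # backdoor_malware
    (parse("2023-02-14 15:53:00.043362"), parse("2023-02-14 16:32:00.550358")),  # command_injection
    (parse("2023-01-19 16:51:00.595065"), parse("2023-02-14 18:38:00.630347")),  # mirai_udpplain
    (parse("2022-11-03 19:18:00.843086"), parse("2022-11-04 14:24:00.437890")),  # ddos_slowloris
]

naive_sum = sum((end - start).total_seconds() for start, end in captures)
union_total = union_duration_seconds(captures)
print(naive_sum, union_total)
```

In this subset, the `backdoor_malware` and `command_injection` captures fall entirely inside the `mirai_udpplain` time span, so the union is shorter than the plain sum of the four durations.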


In [None]:
# Results Backup
all_info = \
'''
File name:           /data/ciciot23/pcap/backdoor_malware.pcap
Number of packets:   33414

File name:           /data/ciciot23/pcap/benign.pcap
Number of packets:   11102705

File name:           /data/ciciot23/pcap/browser_hijacking.pcap
Number of packets:   59821

File name:           /data/ciciot23/pcap/command_injection.pcap
Number of packets:   55741

File name:           /data/ciciot23/pcap/ddos_ack_fragmentation.pcap
Number of packets:   28434845

File name:           /data/ciciot23/pcap/ddos_http_flood.pcap
Number of packets:   2881005

File name:           /data/ciciot23/pcap/ddos_icmp_flood.pcap
Packet size limit:   inferred: 52 bytes
Number of packets:   719862941

File name:           /data/ciciot23/pcap/ddos_icmp_fragmentation.pcap
Number of packets:   45134379

File name:           /data/ciciot23/pcap/ddos_pshack_flood.pcap
Number of packets:   409376766

File name:           /data/ciciot23/pcap/ddos_rstfin_flood.pcap
Number of packets:   404426407

File name:           /data/ciciot23/pcap/ddos_slowloris.pcap
Number of packets:   2351454

File name:           /data/ciciot23/pcap/ddos_syn_flood.pcap
Number of packets:   405821779

File name:           /data/ciciot23/pcap/ddos_synonymous_ip_flood.pcap
Number of packets:   359727604

File name:           /data/ciciot23/pcap/ddos_tcp_flood.pcap
Number of packets:   449646363

File name:           /data/ciciot23/pcap/ddos_udp_flood.pcap
Number of packets:   541088223

File name:           /data/ciciot23/pcap/ddos_udp_fragmentation.pcap
Number of packets:   29213174

File name:           /data/ciciot23/pcap/dictionary_bruteforce.pcap
Number of packets:   133138

File name:           /data/ciciot23/pcap/dns_spoofing.pcap
Number of packets:   1812557

File name:           /data/ciciot23/pcap/dos_http_flood.pcap
Number of packets:   7177958

File name:           /data/ciciot23/pcap/dos_syn_flood.pcap
Number of packets:   202843161

File name:           /data/ciciot23/pcap/dos_tcp_flood.pcap
Number of packets:   267081205

File name:           /data/ciciot23/pcap/dos_udp_flood.pcap
Number of packets:   331744834

File name:           /data/ciciot23/pcap/mirai_greeth_flood.pcap
Number of packets:   98965701

File name:           /data/ciciot23/pcap/mirai_greip_flood.pcap
Number of packets:   75028492

File name:           /data/ciciot23/pcap/mirai_udpplain.pcap
Number of packets:   88883725

File name:           /data/ciciot23/pcap/mitm_arpspoofing.pcap
Number of packets:   3091765

File name:           /data/ciciot23/pcap/recon_hostdiscovery.pcap
Number of packets:   1371112

File name:           /data/ciciot23/pcap/recon_os_scan.pcap
Number of packets:   992241

File name:           /data/ciciot23/pcap/recon_ping_sweep.pcap
Number of packets:   22943

File name:           /data/ciciot23/pcap/recon_portscan.pcap
Number of packets:   831856

File name:           /data/ciciot23/pcap/sql_injection.pcap
Number of packets:   53462

File name:           /data/ciciot23/pcap/uploading_attack.pcap
Number of packets:   12939

File name:           vulnerability_scan.pcap
Number of packets:   3802533

File name:           /data/ciciot23/pcap/xss.pcap
Number of packets:   40183
'''

In [17]:
# Count the total number of packets in all PCAP files by summing the 'Number of packets' lines of the capinfos outputs.
pkts_total = 0

for line in all_info.splitlines():
    if line.startswith('Number of packets:'):
        file_pkts = int(line.rsplit(' ', 1)[1])

        pkts_total += file_pkts

print(pkts_total)

4493106426
