# UNK22 Dataset Analysis

* **Author:** Patrik Goldschmidt (igoldschmidt@fit.vut.cz)
* **Project:** Network Intrusion Datasets: A Survey, Limitations, and Recommendations
* **Date:** 2024

Data source: [https://github.com/ucadatalab/ff4ml/tree/master/data/unk22](https://github.com/ucadatalab/ff4ml/tree/master/data/unk22)

In [1]:
import pandas as pd
import numpy as np
import os

pd.set_option('display.max_columns', None)

In [2]:
DATA_PATH_10 = 'unk22_10K.csv'
DATA_PATH_20 = 'unk22_20K.csv'

## Data10

In [3]:
data10 = pd.read_csv(DATA_PATH_10)

In [4]:
data10.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20099 entries, 0 to 20098
Data columns (total 136 columns):
 #    Column              Non-Null Count  Dtype 
---   ------              --------------  ----- 
 0    srcip_private       20099 non-null  int64 
 1    srcip_public        20099 non-null  int64 
 2    srcip_default       20099 non-null  int64 
 3    dstip_private       20099 non-null  int64 
 4    dstip_public        20099 non-null  int64 
 5    dstip_default       20099 non-null  int64 
 6    sport_zero          20099 non-null  int64 
 7    sport_multiplex     20099 non-null  int64 
 8    sport_echo          20099 non-null  int64 
 9    sport_discard       20099 non-null  int64 
 10   sport_daytime       20099 non-null  int64 
 11   sport_quote         20099 non-null  int64 
 12   sport_chargen       20099 non-null  int64 
 13   sport_ftp_data      20099 non-null  int64 
 14   sport_ftp_control   20099 non-null  int64 
 15   sport_ssh           20099 non-null  int64 
 16   sp

In [5]:
len(data10)

20099

In [6]:
data10['dataset'].value_counts()

nsl-kdd      10451
ugr16         5850
unsw-nb15     3798
Name: dataset, dtype: int64

In [7]:
data10['outcome'].value_counts()

dos           9858
background    9158
scan          1083
Name: outcome, dtype: int64

Data10 is supposed to be a subset of Data20, so the rest of the analysis focuses on Data20.
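The subset relationship is stated here but not checked. A minimal sketch of one way to test it follows, assuming that "subset" means every row of the smaller frame appears identically in the larger one. The helper name `rows_missing_from` and the toy frames are hypothetical and only stand in for `data10` and `data20`.

```python
import pandas as pd

def rows_missing_from(subset: pd.DataFrame, superset: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `subset` that have no identical row in `superset`."""
    # Compare on the shared columns only.
    cols = [c for c in subset.columns if c in superset.columns]
    # De-duplicate the right side so the left merge cannot multiply rows.
    merged = subset[cols].merge(
        superset[cols].drop_duplicates(), on=cols, how='left', indicator=True
    )
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Toy frames with made-up values (hypothetical stand-ins for data10 / data20).
toy_small = pd.DataFrame({'a': [1, 2, 9], 'outcome': ['dos', 'scan', 'dos']})
toy_big = pd.DataFrame({'a': [1, 2, 3], 'outcome': ['dos', 'scan', 'background']})

missing = rows_missing_from(toy_small, toy_big)
print(len(missing), 'row(s) of the small frame are absent from the large one')
```

On the real data, `rows_missing_from(data10, data20)` returning an empty frame would support the subset assumption; a non-empty result would mean the two files were sampled independently.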

## Data20

In [8]:
data20 = pd.read_csv(DATA_PATH_20)

In [9]:
data20.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40530 entries, 0 to 40529
Data columns (total 136 columns):
 #    Column              Non-Null Count  Dtype 
---   ------              --------------  ----- 
 0    srcip_private       40530 non-null  int64 
 1    srcip_public        40530 non-null  int64 
 2    srcip_default       40530 non-null  int64 
 3    dstip_private       40530 non-null  int64 
 4    dstip_public        40530 non-null  int64 
 5    dstip_default       40530 non-null  int64 
 6    sport_zero          40530 non-null  int64 
 7    sport_multiplex     40530 non-null  int64 
 8    sport_echo          40530 non-null  int64 
 9    sport_discard       40530 non-null  int64 
 10   sport_daytime       40530 non-null  int64 
 11   sport_quote         40530 non-null  int64 
 12   sport_chargen       40530 non-null  int64 
 13   sport_ftp_data      40530 non-null  int64 
 14   sport_ftp_control   40530 non-null  int64 
 15   sport_ssh           40530 non-null  int64 
 16   sp

In [10]:
len(data20)

40530

In [11]:
data20.head(10)

Unnamed: 0,srcip_private,srcip_public,srcip_default,dstip_private,dstip_public,dstip_default,sport_zero,sport_multiplex,sport_echo,sport_discard,sport_daytime,sport_quote,sport_chargen,sport_ftp_data,sport_ftp_control,sport_ssh,sport_telnet,sport_smtp,sport_dns,sport_bootp,sport_gopher,sport_finger,sport_http,sport_kerberos,sport_pop3,sport_nntp,sport_ntp,sport_netbios,sport_imap4,sport_snmp,sport_ldap,sport_https,sport_mds,sport_kpasswd,sport_smtp_ssl,sport_syslog,sport_smtp2,sport_ldaps,sport_cups,sport_imap4_ssl,sport_socks,sport_openvpn,sport_mssql,sport_citrix,sport_oracle,sport_rapservice,sport_msnmessenger,sport_mgc,sport_mysql,sport_metasploit,sport_emule,sport_xmpp,sport_irc,sport_bittorrent,sport_http2,sport_reserved,sport_register,sport_private,dport_zero,dport_multiplex,dport_echo,dport_discard,dport_daytime,dport_quote,dport_chargen,dport_ftp_data,dport_ftp_control,dport_ssh,dport_telnet,dport_smtp,dport_dns,dport_bootp,dport_gopher,dport_finger,dport_http,dport_kerberos,dport_pop3,dport_nntp,dport_ntp,dport_netbios,dport_imap4,dport_snmp,dport_ldap,dport_https,dport_mds,dport_kpasswd,dport_smtp_ssl,dport_syslog,dport_smtp2,dport_ldaps,dport_cups,dport_imap4_ssl,dport_socks,dport_openvpn,dport_mssql,dport_citrix,dport_oracle,dport_rapservice,dport_msnmessenger,dport_mgc,dport_mysql,dport_metasploit,dport_emule,dport_xmpp,dport_irc,dport_bittorrent,dport_http2,dport_reserved,dport_register,dport_private,protocol_tcp,protocol_udp,protocol_icmp,protocol_igmp,protocol_other,tcpflags_URG,tcpflags_ACK,tcpflags_PSH,tcpflags_RST,tcpflags_SYN,tcpflags_FIN,srctos_zero,srctos_192,srctos_other,npackets_verylow,npackets_low,npackets_medium,npackets_high,npackets_veryhigh,nbytes_verylow,nbytes_low,nbytes_medium,nbytes_high,nbytes_veryhigh,outcome,dataset
0,3,193921,3617,10,193376,4155,1316,0,0,0,0,0,0,1,259,132,60,2140,31606,0,0,0,26730,0,1295,0,279,39,462,168,0,17973,1006,0,11,0,16,0,0,204,2,11,7,1,1,1,0,0,68,1,0,87,3,4,107,83856,59011,54674,378,27,1,0,0,127,1,1,318,231,1436,1881,30144,1,1,0,29178,0,1176,0,388,69,475,548,1,20479,11417,0,13,0,23,1,0,220,4,8,7,0,1,1,0,0,37,1,0,82,4,8,172,99243,48207,50091,128622,67580,1222,0,117,0,180427,92019,15895,101629,80556,170167,288,27086,107162,74516,12244,3272,347,87701,51022,45305,10867,2646,background,ugr16
1,31,193453,4057,28,193032,4481,1820,1,0,0,0,0,0,9,173,143,62,1815,28409,0,0,0,26642,0,1074,0,282,23,450,242,0,19936,1010,0,15,0,14,0,0,157,4,14,2,2,1,0,2,2,60,1,1,72,1,4,110,82474,61057,54010,481,23,1,0,0,124,1,9,176,181,1246,1987,29437,0,0,2,27181,0,977,0,406,44,446,690,0,20399,13843,0,15,0,16,2,1,155,7,12,5,2,0,0,1,1,34,1,0,89,0,9,173,98642,46723,52176,129264,66431,1714,0,132,0,178563,91077,16779,103253,81099,164658,283,32600,107096,74349,12342,3356,398,86592,52230,45360,10637,2722,background,ugr16
2,0,193309,4232,13,192795,4733,1523,1,0,0,0,0,0,3,122,165,76,1830,28006,0,0,0,26834,0,1070,0,246,15,481,256,0,19738,907,0,10,0,72,0,0,168,0,12,9,1,0,3,0,2,63,1,1,79,0,29,96,81685,64272,51584,478,24,0,0,0,114,0,3,141,228,1585,1938,28406,1,0,0,27525,4,979,0,355,34,465,613,0,20469,13118,0,10,0,74,0,2,192,1,10,10,2,0,3,0,1,37,1,0,93,0,38,137,97610,50638,49293,129056,66928,1382,0,175,0,178888,91377,16266,102663,79836,162919,297,34325,106815,75644,11539,3163,380,84398,55196,45180,10188,2579,background,ugr16
3,1,192892,4648,8,193330,4203,1498,0,0,0,0,0,0,94,326,177,74,2298,28982,0,0,0,27477,0,1033,0,281,12,371,211,0,19418,914,0,11,0,61,0,0,157,0,14,11,4,1,1,0,0,57,2,2,78,0,13,145,83559,61728,52254,484,21,0,0,0,93,1,51,338,217,2509,2450,29609,1,0,0,28254,1,947,0,388,29,359,561,0,19957,12540,0,13,0,66,0,0,180,0,12,7,4,1,1,0,0,38,2,1,90,0,20,176,99858,48314,49369,129391,66620,1383,0,147,0,178640,91110,15746,102964,79633,166523,333,30685,106919,75050,11980,3232,360,86546,53336,45024,9895,2740,background,ugr16
4,2,194608,2931,13,193276,4252,1748,3,0,0,0,0,0,16,602,181,66,2076,28057,0,0,0,26810,0,1288,0,278,14,466,254,0,19734,969,0,12,0,43,0,0,168,2,12,104,6,2,1,0,0,59,1,2,95,0,5,108,82993,63069,51479,457,23,0,0,1,119,4,15,604,222,1387,2202,28931,0,0,0,27556,0,1183,0,367,30,484,652,1,20174,13091,0,13,0,48,0,0,178,3,11,3873,4,0,0,1,0,36,1,1,115,1,7,156,98631,50656,48254,132472,63281,1622,0,166,0,175110,90033,16008,103917,77763,163925,322,33294,109516,72720,11652,3265,388,89566,50115,45476,9856,2528,background,ugr16
5,0,194010,3531,16,192995,4530,1532,0,0,0,0,0,0,4,298,183,84,2132,28883,0,0,0,28185,0,1279,0,265,15,479,216,0,19054,962,0,18,0,38,0,1,221,3,13,357,3,3,1,1,2,47,0,1,82,0,3,115,84019,61151,52371,478,24,0,0,1,111,2,4,316,245,998,2307,29555,0,1,0,29169,0,1184,1,383,32,486,605,1,19850,13359,0,20,0,39,0,2,213,4,10,349,5,3,1,1,2,28,2,1,99,0,8,146,100235,47340,49966,130888,65090,1394,0,169,0,178888,91869,16648,102618,79817,165306,307,31928,106788,75500,11798,3067,388,87605,51204,45913,10314,2505,background,ugr16
6,10,193571,3960,24,193084,4433,1650,0,0,0,0,0,0,0,96,156,74,2580,29364,0,0,0,26317,0,1384,0,308,16,503,225,0,18455,963,0,10,0,26,1,0,136,1,13,135,1,2,1,1,0,41,1,0,83,2,31,113,82452,64159,50930,509,22,1,0,1,106,2,0,109,194,1081,2733,29977,0,0,0,26787,0,1289,2,430,32,488,585,1,18888,13540,0,10,0,30,0,0,164,3,10,130,2,0,1,1,0,25,3,0,100,1,36,164,97856,51053,48632,126440,69421,1524,0,156,1,177977,87000,15923,98800,74955,166746,357,30438,111781,70944,11309,3140,367,88329,54551,42578,9526,2557,background,ugr16
7,2,193554,3985,10,193238,4293,1448,2,0,0,0,0,0,6,450,146,128,2726,28617,0,0,0,24494,0,1337,0,278,11,514,251,0,18898,1069,0,15,0,19,0,0,215,2,15,6,6,4,4,0,2,64,4,0,70,0,11,82,80808,64591,52142,474,24,1,0,1,114,2,6,447,216,5539,2865,29792,1,0,0,25158,1,1224,2,380,32,525,655,1,19104,13447,0,15,0,27,0,0,231,3,14,2,4,2,2,0,1,39,6,0,95,0,6,119,101032,48562,47947,130973,65092,1329,0,147,0,171936,85775,15775,103612,73877,161412,251,35878,112832,69883,11295,3169,362,93188,50230,42442,9168,2513,background,ugr16
8,47,194206,3288,20,193080,4441,1895,1,0,1,0,0,0,16,229,201,162,2397,29942,0,0,0,26231,0,1571,0,270,10,481,240,0,19141,971,0,21,0,37,0,0,203,0,12,17,2,1,2,3,1,53,2,0,83,0,6,82,84206,62062,51273,549,24,1,1,0,131,0,11,253,250,2132,2507,30717,0,0,1,26605,1,1462,3,391,26,489,603,3,19180,13598,0,21,0,48,1,2,204,1,12,9,3,1,1,3,1,37,2,0,106,0,7,113,100038,49676,47827,129724,65892,1750,0,175,0,176074,89542,16096,102394,77456,167916,376,29249,109117,72780,11776,3479,389,91234,50139,44060,9380,2728,background,ugr16
9,2,193935,3604,9,192941,4591,1680,1,0,1,0,0,0,33,244,191,110,2304,30648,0,0,0,26298,0,1642,0,323,14,463,221,0,18953,996,0,12,0,20,0,0,153,3,14,22,2,2,4,1,3,54,0,0,90,1,8,114,84522,60878,52141,525,23,0,1,0,110,2,31,265,252,1284,2497,31237,1,0,0,26819,1,1523,1,437,32,471,628,1,19050,13909,0,12,0,31,0,1,163,5,11,12,3,1,4,1,2,30,0,0,107,1,6,140,100140,47483,49918,127775,68058,1556,0,152,0,178021,88980,16060,100607,78059,166208,336,30997,109491,72747,11669,3261,373,89570,52313,43898,9139,2621,background,ugr16


In [12]:
data20.describe()

Unnamed: 0,srcip_private,srcip_public,srcip_default,dstip_private,dstip_public,dstip_default,sport_zero,sport_multiplex,sport_echo,sport_discard,sport_daytime,sport_quote,sport_chargen,sport_ftp_data,sport_ftp_control,sport_ssh,sport_telnet,sport_smtp,sport_dns,sport_bootp,sport_gopher,sport_finger,sport_http,sport_kerberos,sport_pop3,sport_nntp,sport_ntp,sport_netbios,sport_imap4,sport_snmp,sport_ldap,sport_https,sport_mds,sport_kpasswd,sport_smtp_ssl,sport_syslog,sport_smtp2,sport_ldaps,sport_cups,sport_imap4_ssl,sport_socks,sport_openvpn,sport_mssql,sport_citrix,sport_oracle,sport_rapservice,sport_msnmessenger,sport_mgc,sport_mysql,sport_metasploit,sport_emule,sport_xmpp,sport_irc,sport_bittorrent,sport_http2,sport_reserved,sport_register,sport_private,dport_zero,dport_multiplex,dport_echo,dport_discard,dport_daytime,dport_quote,dport_chargen,dport_ftp_data,dport_ftp_control,dport_ssh,dport_telnet,dport_smtp,dport_dns,dport_bootp,dport_gopher,dport_finger,dport_http,dport_kerberos,dport_pop3,dport_nntp,dport_ntp,dport_netbios,dport_imap4,dport_snmp,dport_ldap,dport_https,dport_mds,dport_kpasswd,dport_smtp_ssl,dport_syslog,dport_smtp2,dport_ldaps,dport_cups,dport_imap4_ssl,dport_socks,dport_openvpn,dport_mssql,dport_citrix,dport_oracle,dport_rapservice,dport_msnmessenger,dport_mgc,dport_mysql,dport_metasploit,dport_emule,dport_xmpp,dport_irc,dport_bittorrent,dport_http2,dport_reserved,dport_register,dport_private,protocol_tcp,protocol_udp,protocol_icmp,protocol_igmp,protocol_other,tcpflags_URG,tcpflags_ACK,tcpflags_PSH,tcpflags_RST,tcpflags_SYN,tcpflags_FIN,srctos_zero,srctos_192,srctos_other,npackets_verylow,npackets_low,npackets_medium,npackets_high,npackets_veryhigh,nbytes_verylow,nbytes_low,nbytes_medium,nbytes_high,nbytes_veryhigh
count,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0,40530.0
mean,7.141574,57375.112633,1347.903627,11.759487,57238.2528,1480.145547,772.780977,0.462892,0.071873,0.046089,0.058599,0.050185,0.059265,2.776215,264.965112,108.923119,49.32529,593.184752,6401.023193,0.04266,0.04942,0.058475,7823.560523,0.089711,334.673501,0.049494,99.917222,10.638687,138.36753,98.924525,0.258154,5899.257094,375.133753,0.040538,4.600518,0.183222,14.141451,0.059561,0.047076,53.72756,0.593807,3.289415,14.544288,0.827831,0.383987,0.30565,0.308019,0.387589,59.784135,0.380311,0.516556,30.772366,0.341303,134.228078,58.637306,23126.459092,20628.293881,14975.404639,224.912386,11.898668,0.556304,0.242808,0.444313,4.084851,7.791932,3.013422,273.241969,261.691611,1094.251937,672.945004,6653.178584,0.308389,0.435159,0.889835,7998.285023,1.382112,321.642561,0.659906,159.780459,27.960844,138.21621,263.581545,1.339576,5978.837133,5183.245695,0.11774,6.339477,0.269603,15.583099,0.493215,0.454133,53.971182,4.977819,2.311498,77.031532,0.66188,1.380484,0.225463,0.327165,0.511448,67.556181,0.989119,0.52946,26.992203,0.569578,144.914014,113.027288,29761.293634,16013.76906,12954.591266,42140.993758,15773.350703,713.777449,7.4e-05,102.03585,0.306094,49958.002517,26704.175672,5691.098988,32623.033087,22449.998988,48213.277646,117.570318,10374.524032,32846.602344,22429.829657,2676.441056,685.482704,91.80185,27236.426844,15899.271971,12870.655662,2199.613718,524.192475
std,24.392182,88190.962103,2314.787922,13.113842,87976.099006,2338.373052,1235.342329,1.534254,1.241673,0.678061,0.706781,0.679842,0.841121,15.403653,540.535474,247.992585,222.312218,1178.708449,10027.953622,0.405471,0.695701,0.793728,12196.08908,1.679771,536.280033,0.703683,157.586461,57.430348,221.141159,157.405831,3.183484,9162.9801,596.688904,0.658839,9.724149,0.843256,55.406561,0.678613,0.667652,88.31719,2.366523,7.256922,105.811515,2.241411,1.29417,0.785167,0.991547,1.363152,201.272043,2.724052,1.446142,56.899958,2.712451,320.127014,122.613823,35627.483639,31818.082747,23132.545004,359.125233,20.286299,9.783153,8.712149,3.22167,19.086751,95.962676,24.524352,551.509551,799.69564,1867.840177,1312.230653,10416.854879,2.44438,9.113939,30.996026,12475.470623,35.360491,523.508962,6.849579,277.522545,142.091748,218.740649,428.126853,9.243637,9287.285622,8159.520529,2.320823,29.629364,2.415944,66.203056,6.110214,3.256317,93.866921,101.77588,5.281151,682.239643,2.816593,47.845689,0.665571,2.423381,6.384907,372.019883,39.074741,2.675872,43.390662,2.849617,334.113411,351.371746,45787.959961,24763.976878,20042.79449,64927.492962,24590.396547,1143.890453,0.011107,119.724257,1.739099,76930.206883,41259.931513,9045.048893,50316.836052,34694.561118,74185.723878,185.356221,16119.448238,50656.43526,34599.782354,4282.750445,1091.372894,145.970241,42058.181495,24561.524972,19899.796248,3543.513293,851.123286
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,1.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,2.0,1.0,1.0,1.0,0.0,2.0,1.0,2.0,0.0,0.0
50%,7.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,1.0,4.0,6.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,4.0,3.0,4.0,3.0,1.0,4.0,4.0,4.0,2.0,0.0
75%,7.0,192358.75,3429.0,17.0,191961.75,4375.0,2027.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,636.0,184.0,92.0,1138.0,18094.75,0.0,0.0,0.0,23656.0,0.0,925.0,0.0,274.0,7.0,376.0,263.0,0.0,17685.75,1000.0,0.0,10.0,0.0,16.0,0.0,0.0,137.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,42.0,0.0,0.0,72.0,0.0,68.0,116.0,74843.75,63811.0,47298.75,563.0,28.0,0.0,0.0,0.0,0.0,0.0,1.0,654.0,263.0,2236.0,1330.0,18754.25,0.0,0.0,0.0,24159.0,0.0,898.0,0.0,415.0,33.0,381.0,664.0,0.0,17915.5,14500.75,0.0,12.0,0.0,17.0,0.0,0.0,135.0,1.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,26.0,0.0,0.0,67.0,0.0,86.0,184.0,97475.25,49907.0,39545.75,134829.5,46314.0,1874.0,0.0,181.0,0.0,163311.0,84281.75,16659.0,103914.75,69681.5,158778.25,323.0,31418.5,104096.25,71484.75,6117.0,1570.0,228.0,85615.5,50183.0,39927.75,4951.75,1109.0
max,4255.0,197496.0,33996.0,527.0,197453.0,44452.0,10762.0,54.0,202.0,30.0,32.0,30.0,36.0,560.0,12338.0,4458.0,29646.0,39264.0,35337.0,12.0,35.0,52.0,76490.0,283.0,5939.0,30.0,692.0,1384.0,3295.0,1073.0,66.0,26625.0,2140.0,29.0,562.0,30.0,1173.0,30.0,31.0,2825.0,134.0,43.0,5023.0,32.0,55.0,8.0,30.0,156.0,5654.0,504.0,30.0,3947.0,177.0,2305.0,2898.0,99299.0,138183.0,166371.0,2034.0,178.0,1281.0,932.0,82.0,213.0,5080.0,3846.0,12364.0,10539.0,31744.0,39635.0,52911.0,74.0,918.0,4351.0,93613.0,4845.0,10679.0,400.0,5475.0,7430.0,3263.0,5275.0,528.0,29138.0,25107.0,82.0,4405.0,82.0,4521.0,346.0,80.0,4691.0,3937.0,195.0,60509.0,82.0,5046.0,9.0,84.0,394.0,9076.0,3868.0,82.0,209.0,129.0,2300.0,16456.0,123795.0,165479.0,81403.0,196546.0,130094.0,10591.0,2.0,464.0,102.0,185756.0,107013.0,84077.0,193603.0,93729.0,183119.0,1038.0,118501.0,196441.0,127088.0,15272.0,6283.0,732.0,195892.0,126840.0,54550.0,13132.0,3734.0


In [13]:
data20['dataset'].value_counts()

nsl-kdd      20561
ugr16        12059
unsw-nb15     7910
Name: dataset, dtype: int64

In [14]:
data20['outcome'].value_counts()

background    19038
dos           17098
scan           4394
Name: outcome, dtype: int64

It is unclear why NSL-KDD contributes more entries than ugr16 and unsw-nb15, which are both much bigger datasets.
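To look at this imbalance more closely, the source and label columns can be cross-tabulated. The sketch below uses a small made-up frame with the same two column names (`dataset`, `outcome`); on the real data, `toy` would be replaced by `data20`.

```python
import pandas as pd

# Made-up rows that only mimic the two label columns of the real data.
toy = pd.DataFrame({
    'dataset': ['nsl-kdd', 'nsl-kdd', 'ugr16', 'unsw-nb15'],
    'outcome': ['dos', 'background', 'background', 'scan'],
})

# Rows per (source dataset, outcome) pair.
counts = pd.crosstab(toy['dataset'], toy['outcome'])

# Fraction of all rows contributed by each source dataset.
share = toy['dataset'].value_counts(normalize=True)

print(counts)
print(share)
```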

In [15]:
data20.nunique()

srcip_private       142
srcip_public       3927
srcip_default      3906
dstip_private       137
dstip_public       2862
                   ... 
nbytes_medium      8014
nbytes_high        6178
nbytes_veryhigh    2163
outcome               3
dataset               3
Length: 136, dtype: int64

In [16]:
# NSL-KDD does not include any packet-level information. Are the npackets_* columns all zeros?
data20[data20['dataset'] == 'nsl-kdd'][['npackets_low', 'npackets_medium', 'npackets_high', 'npackets_veryhigh']].nunique()

npackets_low         6
npackets_medium      7
npackets_high        6
npackets_veryhigh    5
dtype: int64

In [17]:
# Example values of npackets_low for NSL-KDD
data20[data20['dataset'] == 'nsl-kdd']['npackets_low'].unique()

array([0, 1, 2, 3, 4, 5])

Apparently not. Where does this data come from?

In [18]:
# Likewise, where would the TCP flag counts come from?
data20[data20['dataset'] == 'nsl-kdd'][['tcpflags_URG', 'tcpflags_ACK', 'tcpflags_PSH', 'tcpflags_RST', 'tcpflags_SYN', 'tcpflags_FIN']].nunique()

tcpflags_URG    1
tcpflags_ACK    1
tcpflags_PSH    1
tcpflags_RST    1
tcpflags_SYN    1
tcpflags_FIN    1
dtype: int64

In [19]:
# What do the raw values look like?
data20[data20['dataset'] == 'nsl-kdd'][['tcpflags_URG', 'tcpflags_ACK', 'tcpflags_PSH', 'tcpflags_RST', 'tcpflags_SYN', 'tcpflags_FIN']].head()

Unnamed: 0,tcpflags_URG,tcpflags_ACK,tcpflags_PSH,tcpflags_RST,tcpflags_SYN,tcpflags_FIN
19969,0,0,0,0,0,0
19970,0,0,0,0,0,0
19971,0,0,0,0,0,0
19972,0,0,0,0,0,0
19973,0,0,0,0,0,0


In this case, all NSL-KDD entries share a single value in every TCP flag column: 0.
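The same check can be generalized to every feature at once. The following is a minimal sketch, not part of the original analysis: the helper `constant_columns` is a hypothetical name, and the toy frame uses made-up values with column names borrowed from the dataset.

```python
import pandas as pd

def constant_columns(df: pd.DataFrame, group_col: str = 'dataset') -> dict:
    """Map each group to the columns that take at most one distinct value in it."""
    feature_cols = [c for c in df.columns if c != group_col]
    # One row per group, one column per feature, holding the distinct-value count.
    nunique = df.groupby(group_col)[feature_cols].nunique()
    return {name: row[row <= 1].index.tolist() for name, row in nunique.iterrows()}

# Made-up values; only the column names come from the real data.
toy = pd.DataFrame({
    'dataset': ['nsl-kdd', 'nsl-kdd', 'ugr16', 'ugr16'],
    'tcpflags_SYN': [0, 0, 5, 7],
    'dport_http': [1, 3, 2, 4],
})

result = constant_columns(toy)
print(result)
```

Calling `constant_columns(data20)` would list, per source dataset, the features that carry no information for that source.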

In [20]:
# Similarly, source ports should be affected too
sport_fields = [field for field in data20.columns if 'sport_' in field]

data20[data20['dataset'] == 'nsl-kdd'][sport_fields].nunique()

sport_zero            1
sport_multiplex       1
sport_echo            1
sport_discard         1
sport_daytime         1
sport_quote           1
sport_chargen         1
sport_ftp_data        1
sport_ftp_control     1
sport_ssh             1
sport_telnet          1
sport_smtp            1
sport_dns             1
sport_bootp           1
sport_gopher          1
sport_finger          1
sport_http            1
sport_kerberos        1
sport_pop3            1
sport_nntp            1
sport_ntp             1
sport_netbios         1
sport_imap4           1
sport_snmp            1
sport_ldap            1
sport_https           1
sport_mds             1
sport_kpasswd         1
sport_smtp_ssl        1
sport_syslog          1
sport_smtp2           1
sport_ldaps           1
sport_cups            1
sport_imap4_ssl       1
sport_socks           2
sport_openvpn         2
sport_mssql           2
sport_citrix          2
sport_oracle          2
sport_rapservice      2
sport_msnmessenger    2
sport_mgc       

In [21]:
# Check the destination ports in the same way
dport_fields = [field for field in data20.columns if 'dport_' in field]

data20[data20['dataset'] == 'nsl-kdd'][dport_fields].nunique()

dport_zero            1
dport_multiplex       1
dport_echo            3
dport_discard         3
dport_daytime         3
dport_quote           1
dport_chargen         1
dport_ftp_data        5
dport_ftp_control     4
dport_ssh             3
dport_telnet          6
dport_smtp            5
dport_dns             3
dport_bootp           1
dport_gopher          3
dport_finger          4
dport_http            8
dport_kerberos        1
dport_pop3            1
dport_nntp            3
dport_ntp             3
dport_netbios         4
dport_imap4           3
dport_snmp            1
dport_ldap            3
dport_https           3
dport_mds             1
dport_kpasswd         1
dport_smtp_ssl        1
dport_syslog          2
dport_smtp2           1
dport_ldaps           1
dport_cups            1
dport_imap4_ssl       1
dport_socks           1
dport_openvpn         1
dport_mssql           1
dport_citrix          1
dport_oracle          1
dport_rapservice      1
dport_msnmessenger    1
dport_mgc       

These values result from the specific feature extraction method, Feature as a Counter (FaaC), further described in the associated paper. Because the algorithm is so specific, the resulting features are not general and are hardly representative of general traffic. Therefore, we will need to **exclude the dataset from the survey**: it is heavily preprocessed and focuses on aggregating existing data rather than providing a new dataset.
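For intuition only, here is a toy sketch of counter-style aggregation. It assumes that each feature counts how many flows in a time window match a simple condition. The flow records, the one-minute window, and the three conditions are all made up for illustration and are not taken from the paper or from the ff4ml code.

```python
import pandas as pd

# Hypothetical flow records (not from any of the source datasets).
flows = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-01 00:00:05', '2024-01-01 00:00:40',
        '2024-01-01 00:01:10', '2024-01-01 00:01:50', '2024-01-01 00:01:55',
    ]),
    'dport': [80, 22, 80, 80, 53],
    'protocol': ['tcp', 'tcp', 'tcp', 'tcp', 'udp'],
})

# One 0/1 indicator column per condition (the conditions are assumptions).
indicators = pd.DataFrame({
    'dport_http': flows['dport'] == 80,
    'dport_ssh': flows['dport'] == 22,
    'protocol_udp': flows['protocol'] == 'udp',
}).astype(int)

# Assign each flow to a one-minute window and sum the indicators per window.
window = flows['timestamp'].dt.floor('1min')
counters = indicators.groupby(window).sum()
print(counters)
```

Each output row then describes a whole time window rather than a single flow, which is why such records cannot be compared directly with per-flow or per-packet datasets.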