# N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders
## Yair Meidan, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Dominik Breitenbacher, Asaf Shabtai, and Yuval Elovici
## IEEE PERVASIVE COMPUTING, VOL. 17, NO. 3, JULY-SEPTEMBER 2018
## [Dataset available](http://archive.ics.uci.edu/ml/datasets/detection_of_IoT_botnet_attacks_N_BaIoT#)

## Data Sources
- Traffic collected from 2 separate botnets across 9 devices
 - Mirai
    - Danmini Doorbell
    - Ecobee Thermostat
    - Philips B120N10 Baby Monitor
    - Provision PT_737E Security Camera
    - Provision PT_838 Security Camera
    - Simplehome XCS_1002_WHT Security Camera
    - Simplehome XCS_1003_WHT Security Camera
 - Bashlite
    - Danmini Doorbell
    - Ecobee Thermostat
    - Ennio Doorbell
    - Philips B120N10 Baby Monitor
    - Provision PT_737E Security Camera
    - Provision PT_838 Security Camera
    - Samsung SNH_1011_N Webcam
    - Simplehome XCS_1002_WHT Security Camera
    - Simplehome XCS_1003_WHT Security Camera

## Attribute Description (From "N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders")
- The following describes each of the feature headers:
 - Stream aggregation:
	- H: ("Source IP" in N-BaIoT paper) Stats summarizing the recent traffic from this packet's host (IP)
	- MI: ("Source MAC-IP" in N-BaIoT paper) Stats summarizing the recent traffic from this packet's host (IP + MAC)
	- HH: ("Channel" in N-BaIoT paper) Stats summarizing the recent traffic going from this packet's host (IP) to the packet's destination host
	- HH_jit: ("Channel jitter" in N-BaIoT paper) Stats summarizing the jitter of the traffic going from this packet's host (IP) to the packet's destination host
	- HpHp: ("Socket" in N-BaIoT paper) Stats summarizing the recent traffic going from this packet's host+port (IP) to the packet's destination host+port. Example: 192.168.4.2:1242 -> 192.168.4.12:80
 - Time-frame (the decay factor Lambda used in the damped window):
	- How much recent history of the stream is captured in these statistics
	- L5, L3, L1, L0.1 and L0.01
 - The statistics extracted from the packet stream:
	- weight: The weight of the stream (can be viewed as the number of items observed in recent history)
	- mean: ...
	- std: ...
	- radius: The root of the sum of squares of the two streams' variances
	- magnitude: The root of the sum of squares of the two streams' means
	- cov: An approximated covariance between two streams
	- pcc: An approximated correlation coefficient between two streams
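
The damped-window statistics described above can be sketched in a few lines. This is a minimal illustration, not the authors' extraction code: the class name `DampedStat`, the helpers `magnitude` and `radius`, and in particular the decay function `2 ** (-Lambda * elapsed_seconds)` are assumptions based on how damped windows are commonly defined. Consult the paper for the exact update rules.

```python
import math


class DampedStat:
    """Damped-window statistics for one stream (hypothetical helper)."""

    def __init__(self, decay_lambda):
        self.decay_lambda = decay_lambda  # e.g. 5, 3, 1, 0.1 or 0.01
        self.weight = 0.0        # decayed count of observations
        self.linear_sum = 0.0    # decayed sum of observed values
        self.squared_sum = 0.0   # decayed sum of squared values
        self.last_time = None    # timestamp of the previous observation

    def update(self, value, timestamp):
        """Decay the stored sums by the elapsed time, then add the new value."""
        if self.last_time is not None:
            elapsed = timestamp - self.last_time
            # Assumed decay function: older observations lose weight exponentially.
            factor = 2.0 ** (-self.decay_lambda * elapsed)
            self.weight *= factor
            self.linear_sum *= factor
            self.squared_sum *= factor
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    @property
    def mean(self):
        return self.linear_sum / self.weight

    @property
    def variance(self):
        # abs() guards against tiny negative values caused by rounding.
        return abs(self.squared_sum / self.weight - self.mean ** 2)

    @property
    def std(self):
        return math.sqrt(self.variance)


def magnitude(a, b):
    """Root of the sum of squares of the two streams' means."""
    return math.sqrt(a.mean ** 2 + b.mean ** 2)


def radius(a, b):
    """Root of the sum of squares of the two streams' variances."""
    return math.sqrt(a.variance ** 2 + b.variance ** 2)


# Two packets of size 98 sent one second apart, tracked with Lambda = 5.
stream = DampedStat(decay_lambda=5)
stream.update(98.0, timestamp=0.0)
stream.update(98.0, timestamp=1.0)
print(stream.weight, stream.mean, stream.std)
```

A larger Lambda makes old packets fade faster, so L5 reflects only the most recent traffic while L0.01 retains a much longer history. Under the assumed decay, the weights in the second row of the sample output further down (1.029 at L5, 1.11952 at L3) are what two packets roughly one second apart would produce.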

In [1]:
import dask.dataframe as dd

In [9]:
bashlite_devices = ['danmini_doorbell', 'ecobee_thermostat', 'ennio_doorbell', 'philips_B120N10_baby_monitor', 
            'provision_PT_737E_security_camera', 'provision_PT_838_security_camera', 'samsung_SNH_1011_N_webcam',
            'simplehome_XCS_1002_WHT_security_camera', 'simplehome_XCS_1003_WHT_security_camera']

mirai_devices = ['danmini_doorbell', 'ecobee_thermostat', 'philips_B120N10_baby_monitor', 
            'provision_PT_737E_security_camera', 'provision_PT_838_security_camera',
            'simplehome_XCS_1002_WHT_security_camera', 'simplehome_XCS_1003_WHT_security_camera']

benign = ['danmini_doorbell', 'ecobee_thermostat', 'ennio_doorbell', 'philips_B120N10_baby_monitor', 
            'provision_PT_737E_security_camera', 'provision_PT_838_security_camera', 'samsung_SNH_1011_N_webcam',
            'simplehome_XCS_1002_WHT_security_camera', 'simplehome_XCS_1003_WHT_security_camera']

bashlite_attacks = ['scan', # scanning a network for vulnerable devices
                    'junk', # sending spam data
                    'udp', # udp flood
                    'tcp', # tcp flood
                    'combo'] # sending spam data and opening connection to specific ip/port

mirai_attacks = ['ack', # ack flood
                 'scan', # automatic scan for vulnerable devices
                 'syn', # syn flood
                 'udp', # udp flood
                 'udpplain'] # optimized udp flood

In [10]:
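# Bashlite is also known as Gafgyt, hence the 'gafgyt_attacks' directory name
bashlite_attack_data = {}
for device in bashlite_devices:
    # Create the inner dict once per device so that every attack is kept,
    # instead of replacing it on each iteration of the inner loop
    bashlite_attack_data[device] = {}
    for attack in bashlite_attacks:
        bashlite_attack_data[device][attack] = dd.read_csv(f'../data/n_balo_t/{device}/gafgyt_attacks/{attack}.csv')

mirai_attack_data = {}
for device in mirai_devices:
    mirai_attack_data[device] = {}
    for attack in mirai_attacks:
        mirai_attack_data[device][attack] = dd.read_csv(f'../data/n_balo_t/{device}/mirai_attacks/{attack}.csv')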
        
benign_traffic_data = {}
for device in benign:
    benign_traffic_data[device] = dd.read_csv(f'../data/n_balo_t/{device}/benign_traffic.csv')
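
Before reading dozens of CSV files it can help to confirm that the directory layout matches what the loading cell expects. The sketch below builds the same device-to-attack mapping, but holds file paths instead of dataframes, and lists any files that are absent. The function names `attack_csv_paths` and `missing_files` are hypothetical helpers introduced here. The path pattern is the one used in the cell above.

```python
from pathlib import Path


def attack_csv_paths(root, devices, attacks, family_dir):
    """Return {device: {attack: Path}} for one botnet family directory."""
    root = Path(root)
    return {
        device: {
            attack: root / device / family_dir / f"{attack}.csv"
            for attack in attacks
        }
        for device in devices
    }


def missing_files(paths):
    """Return every path in the nested mapping that does not exist on disk."""
    return [
        path
        for by_attack in paths.values()
        for path in by_attack.values()
        if not path.exists()
    ]


# Example with two devices and two attacks; swap in the full lists as needed.
example_paths = attack_csv_paths(
    "../data/n_balo_t",
    ["danmini_doorbell", "ecobee_thermostat"],
    ["scan", "combo"],
    "gafgyt_attacks",
)
for path in missing_files(example_paths):
    print(f"missing: {path}")
```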

In [12]:
(bashlite_attack_data['danmini_doorbell']['combo']).head()

Unnamed: 0,MI_dir_L5_weight,MI_dir_L5_mean,MI_dir_L5_variance,MI_dir_L3_weight,MI_dir_L3_mean,MI_dir_L3_variance,MI_dir_L1_weight,MI_dir_L1_mean,MI_dir_L1_variance,MI_dir_L0.1_weight,...,HpHp_L0.1_radius,HpHp_L0.1_covariance,HpHp_L0.1_pcc,HpHp_L0.01_weight,HpHp_L0.01_mean,HpHp_L0.01_std,HpHp_L0.01_magnitude,HpHp_L0.01_radius,HpHp_L0.01_covariance,HpHp_L0.01_pcc
0,1.0,98.0,0.0,1.0,98.0,0.0,1.0,98.0,0.0,1.0,...,0.0,0.0,0.0,1.0,98.0,0.0,98.0,0.0,0.0,0.0
1,1.029,98.0,1.818989e-12,1.11952,98.0,0.0,1.492583,98.0,3.637979e-12,1.93164,...,1.818989e-12,0.0,0.0,1.992944,98.0,1e-06,138.592929,1.818989e-12,0.0,0.0
2,1.504156,76.725612,228.1808,1.729662,79.499272,249.746357,2.294102,84.051188,251.7926,2.904273,...,0.0,0.0,0.0,1.0,66.0,0.0,114.856432,0.0,0.0,0.0
3,2.460087,75.617679,137.22,2.699075,77.461807,164.269331,3.280499,80.987267,196.4467,3.902546,...,0.0,0.0,0.0,1.0,74.0,0.0,74.0,0.0,0.0,0.0
4,3.460055,75.150149,98.09937,3.699054,76.525944,122.224798,4.28049,79.354915,159.2943,4.902545,...,0.0,0.0,0.0,1.0,74.0,0.0,74.0,0.0,0.0,0.0
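
The column headers in the output above combine the three parts from the attribute description: stream aggregation, time-frame and statistic. A small parser makes it easy to select, say, every L5 feature. This is an illustrative sketch. `parse_feature_name` and `group_by_time_frame` are hypothetical helpers, and the assumption is that every feature column ends in `_<time-frame>_<statistic>` as in the headers shown.

```python
from collections import defaultdict


def parse_feature_name(column):
    """Split a header such as 'HpHp_L0.01_pcc' into its three parts.

    Returns (aggregation, time_frame, statistic), or None for columns that
    do not follow the pattern (for example the 'Unnamed: 0' index column).
    """
    parts = column.rsplit("_", 2)
    if len(parts) != 3 or not parts[1].startswith("L"):
        return None
    return tuple(parts)


def group_by_time_frame(columns):
    """Map each time-frame (e.g. 'L5') to the list of columns that use it."""
    groups = defaultdict(list)
    for column in columns:
        parsed = parse_feature_name(column)
        if parsed is not None:
            groups[parsed[1]].append(column)
    return dict(groups)


sample_columns = ["Unnamed: 0", "MI_dir_L5_weight", "MI_dir_L5_mean",
                  "HpHp_L0.01_magnitude", "HpHp_L0.01_pcc"]
for time_frame, names in group_by_time_frame(sample_columns).items():
    print(time_frame, names)
```

Splitting from the right keeps aggregation names that themselves contain underscores, such as `MI_dir`, in one piece.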
