# N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders
## Yair Meidan, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Dominik Breitenbacher, Asaf Shabtai, and Yuval Elovici
## IEEE PERVASIVE COMPUTING, VOL. 17, NO. 3, JULY-SEPTEMBER 2018
## [Dataset available](http://archive.ics.uci.edu/ml/datasets/detection_of_IoT_botnet_attacks_N_BaIoT#)

## Data Sources
- Traffic collected from 2 botnet families (Mirai and Bashlite) across 9 IoT devices
 - Mirai (7 devices; no Mirai traffic for the Ennio Doorbell or the Samsung Webcam)
    - Danmini Doorbell
    - Ecobee Thermostat
    - Philips B120N10 Baby Monitor
    - Provision PT_737E Security Camera
    - Provision PT_838 Security Camera
    - Simplehome XCS_1002_WHT Security Camera
    - Simplehome XCS_1003_WHT Security Camera
 - Bashlite (9 devices; also known as Gafgyt, the name used in the dataset's folder names)
    - Danmini Doorbell
    - Ecobee Thermostat
    - Ennio Doorbell
    - Philips B120N10 Baby Monitor
    - Provision PT_737E Security Camera
    - Provision PT_838 Security Camera
    - Samsung SNH_1011_N Webcam
    - Simplehome XCS_1002_WHT Security Camera
    - Simplehome XCS_1003_WHT Security Camera

## Attribute Description (From "N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders")
- Each feature header combines a stream aggregation, a time-frame, and a statistic:
 - Stream aggregation:
	- H: ("Source IP" in N-BaIoT paper) Stats summarizing the recent traffic from this packet's host (IP)
	- MI: ("Source MAC-IP" in N-BaIoT paper; appears as `MI_dir` in the column names) Stats summarizing the recent traffic from this packet's host (IP + MAC)
	- HH: ("Channel" in N-BaIoT paper) Stats summarizing the recent traffic going from this packet's host (IP) to the packet's destination host
	- HH_jit: ("Channel jitter" in N-BaIoT paper) Stats summarizing the jitter of the traffic going from this packet's host (IP) to the packet's destination host
	- HpHp: ("Socket" in N-BaIoT paper) Stats summarizing the recent traffic going from this packet's host+port (IP + port) to the packet's destination host+port. Example: 192.168.4.2:1242 -> 192.168.4.12:80
 - Time-frame (the decay factor Lambda used in the damped window):
	- How much recent history of the stream is captured in these statistics
	- L5, L3, L1, L0.1 and L0.01
 - The statistics extracted from the packet stream:
	- weight: The weight of the stream (can be viewed as the number of items observed in recent history)
	- mean: ...
	- std: ...
	- radius: The square root of the sum of the squared variances of the two streams
	- magnitude: The square root of the sum of the squared means of the two streams
	- cov: An approximated covariance between two streams (`covariance` in the column names)
	- pcc: An approximated correlation coefficient between two streams
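
The damped-window statistics described above can be illustrated with a short sketch. It assumes that each stream keeps a decayed weight, linear sum, and squared sum, and that the decay applied between two packets is `2^(-Lambda * elapsed_seconds)`; that decay form and the names `DampedStat`, `magnitude`, and `radius` are assumptions made for illustration, not code from the dataset authors.

```python
import math


class DampedStat:
    """Hypothetical damped-window statistic for one stream (illustration only)."""

    def __init__(self, lam):
        self.lam = lam            # decay factor Lambda, e.g. 5, 3, 1, 0.1 or 0.01
        self.weight = 0.0         # decayed count of observed items
        self.linear_sum = 0.0     # decayed sum of values
        self.squared_sum = 0.0    # decayed sum of squared values
        self.last_time = None

    def update(self, value, timestamp):
        # Assumed decay form: shrink the stored sums by 2^(-Lambda * elapsed time).
        if self.last_time is not None:
            decay = 2.0 ** (-self.lam * (timestamp - self.last_time))
            self.weight *= decay
            self.linear_sum *= decay
            self.squared_sum *= decay
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    @property
    def mean(self):
        return self.linear_sum / self.weight

    @property
    def variance(self):
        return abs(self.squared_sum / self.weight - self.mean ** 2)

    @property
    def std(self):
        return math.sqrt(self.variance)


def magnitude(a, b):
    """Square root of the sum of the squared means of two streams."""
    return math.sqrt(a.mean ** 2 + b.mean ** 2)


def radius(a, b):
    """Square root of the sum of the squared variances of two streams."""
    return math.sqrt(a.variance ** 2 + b.variance ** 2)


# Usage: two packets of size 10, one second apart, with Lambda = 1.
s = DampedStat(lam=1)
s.update(10.0, timestamp=0.0)
s.update(10.0, timestamp=1.0)
print(s.weight, s.mean)  # 1.5 10.0
```

A larger Lambda forgets old packets faster, so the `L5` features react to the last fraction of a second of traffic while the `L0.01` features summarize a much longer history.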

In [1]:
import dask.dataframe as dd

In [2]:
bashlite_devices = ['danmini_doorbell', 'ecobee_thermostat', 'ennio_doorbell', 'philips_B120N10_baby_monitor', 
            'provision_PT_737E_security_camera', 'provision_PT_838_security_camera', 'samsung_SNH_1011_N_webcam',
            'simplehome_XCS_1002_WHT_security_camera', 'simplehome_XCS_1003_WHT_security_camera']

mirai_devices = ['danmini_doorbell', 'ecobee_thermostat', 'philips_B120N10_baby_monitor', 
            'provision_PT_737E_security_camera', 'provision_PT_838_security_camera',
            'simplehome_XCS_1002_WHT_security_camera', 'simplehome_XCS_1003_WHT_security_camera']

benign = ['danmini_doorbell', 'ecobee_thermostat', 'ennio_doorbell', 'philips_B120N10_baby_monitor', 
            'provision_PT_737E_security_camera', 'provision_PT_838_security_camera', 'samsung_SNH_1011_N_webcam',
            'simplehome_XCS_1002_WHT_security_camera', 'simplehome_XCS_1003_WHT_security_camera']

bashlite_attacks = ['scan', # scanning a network for vulnerable devices
                    'junk', # sending spam data
                    'udp', # udp flood
                    'tcp', # tcp flood
                    'combo'] # sending spam data and opening connection to specific ip/port

mirai_attacks = ['ack', # ack flood
                 'scan', # automatic scan for vulnerable devices
                 'syn', # syn flood
                 'udp', # udp flood
                 'udpplain'] # optimized udp flood

In [3]:
bashlite_attack_data = {}
for device in bashlite_devices:
    # One inner dict per device, so every attack is kept (not only the last one read)
    bashlite_attack_data[device] = {}
    for attack in bashlite_attacks:
        bashlite_attack_data[device][attack] = dd.read_csv(f'../data/n_balo_t/{device}/gafgyt_attacks/{attack}.csv')

mirai_attack_data = {}
for device in mirai_devices:
    mirai_attack_data[device] = {}
    for attack in mirai_attacks:
        mirai_attack_data[device][attack] = dd.read_csv(f'../data/n_balo_t/{device}/mirai_attacks/{attack}.csv')

benign_traffic_data = {}
for device in benign:
    benign_traffic_data[device] = dd.read_csv(f'../data/n_balo_t/{device}/benign_traffic.csv')
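
The loaders above assume one CSV per device and attack under `../data/n_balo_t/`. Because a lazily loaded frame only fails when it is computed, it can help to enumerate the expected paths first and check which ones are missing. This is a minimal sketch under that layout assumption; `expected_csv_paths` and `missing_files` are hypothetical helpers, not part of dask or the dataset.

```python
from pathlib import Path


def expected_csv_paths(root, devices, attacks, family_dir):
    """Map (device, attack) to the CSV path implied by the assumed folder layout."""
    root = Path(root)
    return {
        (device, attack): root / device / family_dir / f"{attack}.csv"
        for device in devices
        for attack in attacks
    }


def missing_files(paths):
    """Return the (device, attack) keys whose CSV file does not exist on disk."""
    return sorted(key for key, path in paths.items() if not path.is_file())


# Usage: check two Bashlite files for one device before handing them to dd.read_csv.
paths = expected_csv_paths('../data/n_balo_t', ['danmini_doorbell'],
                           ['scan', 'combo'], 'gafgyt_attacks')
print(missing_files(paths))
```

Keying the mapping by `(device, attack)` keeps every combination addressable, which also makes it easy to spot a device that lacks a given attack folder.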

In [7]:
cols = (bashlite_attack_data['danmini_doorbell']['combo']).columns
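
Each column name encodes a stream aggregation, a time-frame, and a statistic, so it can be split back into those parts. The sketch below assumes every column ends in `_L<lambda>_<statistic>`; `parse_feature_name` and `group_by_aggregation` are hypothetical helpers introduced here for illustration.

```python
from collections import defaultdict


def parse_feature_name(column):
    """Split a column such as 'HH_jit_L0.1_mean' into (aggregation, lambda, statistic)."""
    # Split from the right so aggregations containing underscores stay intact.
    aggregation, time_frame, statistic = column.rsplit("_", 2)
    if not time_frame.startswith("L"):
        raise ValueError(f"unexpected time-frame token in {column!r}")
    return aggregation, float(time_frame[1:]), statistic


def group_by_aggregation(columns):
    """Group column names by their stream aggregation prefix."""
    groups = defaultdict(list)
    for column in columns:
        aggregation, _, _ = parse_feature_name(column)
        groups[aggregation].append(column)
    return dict(groups)
```

For example, `parse_feature_name('HH_jit_L0.01_variance')` returns `('HH_jit', 0.01, 'variance')`, and `group_by_aggregation(cols)` can be used to select all features of one aggregation at once.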

In [9]:
for x in cols:
    print(x)

MI_dir_L5_weight
MI_dir_L5_mean
MI_dir_L5_variance
MI_dir_L3_weight
MI_dir_L3_mean
MI_dir_L3_variance
MI_dir_L1_weight
MI_dir_L1_mean
MI_dir_L1_variance
MI_dir_L0.1_weight
MI_dir_L0.1_mean
MI_dir_L0.1_variance
MI_dir_L0.01_weight
MI_dir_L0.01_mean
MI_dir_L0.01_variance
H_L5_weight
H_L5_mean
H_L5_variance
H_L3_weight
H_L3_mean
H_L3_variance
H_L1_weight
H_L1_mean
H_L1_variance
H_L0.1_weight
H_L0.1_mean
H_L0.1_variance
H_L0.01_weight
H_L0.01_mean
H_L0.01_variance
HH_L5_weight
HH_L5_mean
HH_L5_std
HH_L5_magnitude
HH_L5_radius
HH_L5_covariance
HH_L5_pcc
HH_L3_weight
HH_L3_mean
HH_L3_std
HH_L3_magnitude
HH_L3_radius
HH_L3_covariance
HH_L3_pcc
HH_L1_weight
HH_L1_mean
HH_L1_std
HH_L1_magnitude
HH_L1_radius
HH_L1_covariance
HH_L1_pcc
HH_L0.1_weight
HH_L0.1_mean
HH_L0.1_std
HH_L0.1_magnitude
HH_L0.1_radius
HH_L0.1_covariance
HH_L0.1_pcc
HH_L0.01_weight
HH_L0.01_mean
HH_L0.01_std
HH_L0.01_magnitude
HH_L0.01_radius
HH_L0.01_covariance
HH_L0.01_pcc
HH_jit_L5_weight
HH_jit_L5_mean
HH_jit_L5_variance