# Practical Case: Random Forest
This practical use case aims to solve a malware detection problem on Android devices
by analyzing the network traffic generated by the device, using ensembles of
decision trees.

### Dataset: Android Malware Detection

#### Description
The sophisticated and advanced Android malware is able to identify the presence of the emulator used by the malware analyst and in response, alter its behavior to evade detection. To overcome this issue, we installed the Android applications on the real device and captured its network traffic. See our publicly available Android Sandbox.

The CICAAGM dataset was captured by installing the Android apps on real smartphones in a semi-automated way. The dataset is generated from 1900 applications in the following three categories:

**1. Adware (250 apps)**
* Airpush: Designed to deliver unsolicited advertisements to the user’s systems for information stealing.
* Dowgin: Designed as an advertisement library that can also steal the user’s information.
* Kemoge: Designed to take over a user’s Android device. This adware is a hybrid of botnet and disguises itself as popular apps via repackaging.
* Mobidash: Designed to display ads and to compromise user’s personal information.
* Shuanet: Similar to Kemoge, Shuanet also is designed to take over a user’s device.

**2. General Malware (150 apps)**
* AVpass: Designed to be distributed in the guise of a Clock app.
* FakeAV: Designed as a scam that tricks the user into purchasing a full version of the software in order to remediate non-existent infections.
* FakeFlash/FakePlayer: Designed as a fake Flash app in order to direct users to a website (after successfully installed).
* GGtracker: Designed for SMS fraud (sends SMS messages to a premium-rate number) and information stealing.
* Penetho: Designed as a fake service (hacktool for Android devices that can be used to crack the WiFi password). The malware is also able to infect the user’s computer via infected email attachment, fake updates, external media and infected documents.

**3. Benign (1500 apps)**
* 2015 GooglePlay market (top free popular and top free new)
* 2016 GooglePlay market (top free popular and top free new)

### Data files
* pcap files – the network traffic of both the malware and benign (20% malware and 80% benign)
* <span style="color:green">.csv files - the list of extracted network traffic features generated by the CIC-flowmeter</span>

### Downloading the data files
https://www.unb.ca/cic/datasets/android-adware.html

### Additional references on the dataset
_Arash Habibi Lashkari, Andi Fitriah A. Kadir, Hugo Gonzalez, Kenneth Fon Mbah and Ali A. Ghorbani, “Towards a Network-Based Framework for Android Malware Detection and Characterization”, in Proceedings of the 15th International Conference on Privacy, Security and Trust (PST), Calgary, Canada, 2017._

## Imports

In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import f1_score

## Helper Functions

In [15]:
# Build a function that performs the full train/validation/test split (60/20/20)
def train_val_test_split(df, rstate = 42, shuffle = True, stratify = None):
    strat = df[stratify] if stratify else None
    train_set, test_set = train_test_split(
        df, test_size = 0.4, random_state =rstate, shuffle=shuffle, stratify=strat)
    strat = test_set[stratify] if stratify else  None
    val_set, test_set = train_test_split(
        test_set, test_size = 0.5, random_state = rstate, shuffle=shuffle, stratify=strat)
    return (train_set, val_set, test_set)
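A quick way to sanity-check this helper is to run it on a small synthetic frame. The sketch below repeats the function so that it runs on its own; the toy DataFrame and its `label` column are made up for illustration. Because `test_size=0.4` is followed by `test_size=0.5`, the result should be a 60/20/20 split, and passing `stratify` should keep the class proportions in every partition.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def train_val_test_split(df, rstate=42, shuffle=True, stratify=None):
    strat = df[stratify] if stratify else None
    train_set, test_set = train_test_split(
        df, test_size=0.4, random_state=rstate, shuffle=shuffle, stratify=strat)
    strat = test_set[stratify] if stratify else None
    val_set, test_set = train_test_split(
        test_set, test_size=0.5, random_state=rstate, shuffle=shuffle, stratify=strat)
    return (train_set, val_set, test_set)

# Hypothetical toy data: 100 rows, 80% class 0 and 20% class 1
toy = pd.DataFrame({"feature": range(100), "label": [0] * 80 + [1] * 20})

train_set, val_set, test_set = train_val_test_split(toy, stratify="label")

# Partition sizes and the share of class 1 in each partition
print(len(train_set), len(val_set), len(test_set))
print(train_set["label"].mean(), val_set["label"].mean(), test_set["label"].mean())
```

Stratifying is optional in the helper; with heavily imbalanced labels it is one way to make sure the rarest class appears in all three partitions.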

In [16]:
def remove_labels(df, label_name):
    X = df.drop(label_name, axis=1)
    y = df[label_name].copy()
    return (X, y) 

In [17]:
def evaluate_resul(y_pred, y, y_prep_pred, y_prep, metric):
    # scikit-learn metrics expect the ground truth first: metric(y_true, y_pred)
    print(metric.__name__, "WITHOUT preparation:", metric(y, y_pred, average='weighted'))
    print(metric.__name__, "WITH preparation:", metric(y_prep, y_prep_pred, average='weighted'))
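scikit-learn metrics take the ground truth as the first argument, `metric(y_true, y_pred)`. With `average='weighted'` the order matters, because each per-class F1 score is weighted by the number of true samples of that class. The following minimal sketch, using made-up labels, illustrates the difference:

```python
from sklearn.metrics import f1_score

# Hypothetical labels, chosen so that the classes have different support
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]

# Ground truth first: per-class F1 is weighted by the support in y_true
f1_correct = f1_score(y_true, y_pred, average="weighted")

# Swapped arguments: the weights now come from the predictions instead
f1_swapped = f1_score(y_pred, y_true, average="weighted")

print(f1_correct, f1_swapped)
```

The two values differ here, so it is worth keeping the argument order consistent whenever the score is reported.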

### 1.- Reading the dataset

In [18]:
df = pd.read_csv('datasets/datasets/TotalFeatures-ISCXFlowMeter.csv')

### 2.- Visualizing the dataset

In [19]:
df.head(10)

Unnamed: 0,duration,total_fpackets,total_bpackets,total_fpktl,total_bpktl,min_fpktl,min_bpktl,max_fpktl,max_bpktl,mean_fpktl,mean_bpktl,std_fpktl,std_bpktl,total_fiat,total_biat,min_fiat,min_biat,max_fiat,max_biat,mean_fiat,mean_biat,std_fiat,std_biat,fpsh_cnt,bpsh_cnt,furg_cnt,burg_cnt,total_fhlen,total_bhlen,fPktsPerSecond,bPktsPerSecond,flowPktsPerSecond,flowBytesPerSecond,min_flowpktl,max_flowpktl,mean_flowpktl,std_flowpktl,min_flowiat,max_flowiat,mean_flowiat,std_flowiat,flow_fin,flow_syn,flow_rst,flow_psh,flow_ack,flow_urg,flow_cwr,flow_ece,downUpRatio,avgPacketSize,fAvgSegmentSize,fHeaderBytes,fAvgBytesPerBulk,fAvgPacketsPerBulk,fAvgBulkRate,bVarianceDataBytes,bAvgSegmentSize,bAvgBytesPerBulk,bAvgPacketsPerBulk,bAvgBulkRate,sflow_fpacket,sflow_fbytes,sflow_bpacket,sflow_bbytes,min_active,mean_active,max_active,std_active,min_idle,mean_idle,max_idle,std_idle,FFNEPD,Init_Win_bytes_forward,Init_Win_bytes_backward,RRT_samples_clnt,Act_data_pkt_forward,min_seg_size_forward,calss
0,1020586,668,1641,35692,2276876,52,52,679,1390,53.431138,1387.492992,25.850592,54.220784,1012372,1004743,1,2,33003,37595,1517.7991,612.648171,3430.256494,1961.288204,3,16,0,0,13360,32820,654.525929,1607.899775,2262.425704,2265922.0,52,1390,1001.545258,606.929295,-30,20674,442.194974,1208.029126,0,2,0,19,2308,0,0,0,63.792334,1001.545258,53.431138,13360,376,7,238318,2939.893396,1387,13535,9,4961514,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,2,4194240,1853440,1640,668,32,benign
1,80794,1,1,75,124,75,124,75,124,75.0,124.0,0.0,0.0,0,0,-1,-1,-1,-1,0.0,0.0,0.0,0.0,0,0,0,0,20,20,12.377157,12.377157,24.754313,2463.054,75,124,99.5,34.633799,80794,80794,80794.0,0.0,0,0,0,0,0,0,0,0,1.653333,99.5,75.0,20,0,0,0,0.0,124,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,2,0,0,0,1,0,benign
2,998,3,0,187,0,52,-1,83,-1,62.333333,0.0,17.883885,0.0,998,0,2,-1,996,-1,499.0,0.0,702.863429,0.0,1,0,0,0,60,0,3006.012024,0.0,3006.012024,187374.7,52,83,62.333333,17.883885,2,996,499.0,702.863429,0,0,1,1,3,0,0,0,0.0,62.333333,62.333333,60,0,0,0,0.0,0,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,4,101888,-1,0,3,32,benign
3,189868,9,9,1448,6200,52,52,706,1390,160.888889,688.888889,216.161944,665.690045,137870,161424,2,11,40746,71705,17233.75,20178.0,17780.63985,29411.67515,3,4,0,0,180,180,47.401353,47.401353,94.802705,40280.62,52,1390,424.888889,551.654345,2,51972,11168.70588,17180.91265,1,2,0,7,17,0,0,0,4.281768,424.888889,160.888889,180,334,4,8196,443143.2361,688,4201,4,2879369,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,2,4194240,2722560,8,9,32,benign
4,110577,4,6,528,1422,52,52,331,1005,132.0,237.0,134.065904,377.398728,82246,81939,3,12,80371,49456,27415.33333,16387.8,45870.47268,22885.04745,2,4,0,0,80,120,36.173888,54.260832,90.43472,17634.77,52,1005,195.0,296.747367,3,49456,12286.33333,18425.63881,0,0,0,6,10,0,0,0,2.693182,195.0,132.0,80,0,0,0,142429.8,237,1370,5,27150,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,2,155136,31232,5,4,32,benign
5,261876,7,6,1618,882,52,52,730,477,231.142857,147.0,290.112581,170.474045,261876,186534,2,454,95625,77201,43646.0,37306.8,39724.78194,27948.16201,3,2,0,0,140,120,26.730208,22.911607,49.641815,9546.503,52,730,192.307692,236.849771,2,72473,21823.0,22332.50694,0,2,0,5,12,0,0,0,0.545117,192.307692,231.142857,140,0,0,0,29061.4,147,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,2,4194240,926720,3,7,32,benign
6,14,2,0,104,0,52,-1,52,-1,52.0,0.0,1.0,0.0,14,0,14,-1,14,-1,14.0,0.0,0.0,0.0,0,0,0,0,40,0,142857.1429,0.0,142857.1429,7428571.0,52,52,52.0,1.0,14,14,14.0,0.0,0,0,0,0,2,0,0,0,0.0,52.0,52.0,40,0,0,0,0.0,0,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,3,5824,-1,0,2,32,benign
7,29675,1,1,71,213,71,213,71,213,71.0,213.0,0.0,0.0,0,0,-1,-1,-1,-1,0.0,0.0,0.0,0.0,0,0,0,0,20,20,33.698399,33.698399,67.396799,9570.345,71,213,142.0,100.404183,29675,29675,29675.0,0.0,0,0,0,0,0,0,0,0,3.0,142.0,71.0,20,0,0,0,0.0,213,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,2,0,0,0,1,0,benign
8,806635,4,0,239,0,52,-1,83,-1,59.75,0.0,15.489244,0.0,806635,0,5,-1,765503,-1,268878.3333,0.0,430580.7699,0.0,1,0,0,0,80,0,4.958872,0.0,4.958872,296.2926,52,83,59.75,15.489244,5,765503,268878.3333,430580.7699,1,0,0,1,4,0,0,0,0.0,59.75,59.75,80,239,4,296,0.0,0,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,5,107008,-1,0,4,32,benign
9,56620,3,2,1074,719,52,52,592,667,358.0,359.5,277.105575,434.869521,56620,33231,6,33231,56614,33231,28310.0,33231.0,40027.90066,0.0,2,1,0,0,60,40,52.984811,35.323207,88.308018,31667.26,52,667,358.6,292.698736,6,33231,14155.0,15940.48286,0,0,0,3,5,0,0,0,0.66946,358.6,358.0,60,0,0,0,189111.5,359,0,0,0,0,0,0,0,-1,0.0,-1,0.0,-1,0.0,-1,0.0,3,128512,10816,1,3,32,benign


In [20]:
df.describe()

Unnamed: 0,duration,total_fpackets,total_bpackets,total_fpktl,total_bpktl,min_fpktl,min_bpktl,max_fpktl,max_bpktl,mean_fpktl,mean_bpktl,std_fpktl,std_bpktl,total_fiat,total_biat,min_fiat,min_biat,max_fiat,max_biat,mean_fiat,mean_biat,std_fiat,std_biat,fpsh_cnt,bpsh_cnt,furg_cnt,burg_cnt,total_fhlen,total_bhlen,fPktsPerSecond,bPktsPerSecond,flowPktsPerSecond,flowBytesPerSecond,min_flowpktl,max_flowpktl,mean_flowpktl,std_flowpktl,min_flowiat,max_flowiat,mean_flowiat,std_flowiat,flow_fin,flow_syn,flow_rst,flow_psh,flow_ack,flow_urg,flow_cwr,flow_ece,downUpRatio,avgPacketSize,fAvgSegmentSize,fHeaderBytes,fAvgBytesPerBulk,fAvgPacketsPerBulk,fAvgBulkRate,bVarianceDataBytes,bAvgSegmentSize,bAvgBytesPerBulk,bAvgPacketsPerBulk,bAvgBulkRate,sflow_fpacket,sflow_fbytes,sflow_bpacket,sflow_bbytes,min_active,mean_active,max_active,std_active,min_idle,mean_idle,max_idle,std_idle,FFNEPD,Init_Win_bytes_forward,Init_Win_bytes_backward,RRT_samples_clnt,Act_data_pkt_forward,min_seg_size_forward
count,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0,631955.0
mean,21952450.0,6.728514,10.431934,954.0172,12060.42,141.475727,44.357688,263.675901,183.248084,174.959706,105.546055,53.014212,61.454557,16339850.0,7132276.0,11705090.0,937564.7,15302440.0,6284945.0,12757840.0,2047904.0,1769660.0,2240714.0,0.858734,1.057927,0.0,0.0,134.654176,208.6387,29744.04,1731.437342,31475.47,1682912.0,140.473502,340.888032,201.354935,84.865349,11678770.0,20790310.0,13320300.0,3396481.0,0.530555,0.429222,0.06706,1.91666,16.414992,0.0,0.0,0.0,1.243868,201.354935,174.959706,134.654176,52.822829,0.333408,228446.3,28367.889759,105.44723,454.686911,0.491646,214802.9,3.053119,422.1001,4.886468,5962.097,20040920.0,20372120.0,20813100.0,464565.5,19973270.0,20312280.0,20752380.0,466387.5,2.360896,962079.6,310451.9,9.733144,6.72471,19.965713
std,190057800.0,174.161354,349.424019,82350.4,482471.6,157.68088,89.099554,289.644383,371.863224,162.024811,206.667634,122.899076,156.816045,179861200.0,63327360.0,160711500.0,40812120.0,179771800.0,62916670.0,165425800.0,43769710.0,56307590.0,26560130.0,3.983274,13.346,0.0,0.0,3484.81722,6988.48,163336.1,22444.835598,164574.2,9239142.0,153.266293,383.732807,172.537279,159.429483,160707800.0,189967900.0,166939900.0,66584240.0,0.652985,0.911039,0.30462,16.44443,514.737161,0.0,0.0,0.0,5.01363,172.537279,162.024811,3484.81722,571.950886,3.274184,6517195.0,88098.474276,206.563178,2539.571052,2.146704,3557865.0,144.009491,66553.44,265.053479,366797.9,189809400.0,189800500.0,189984600.0,6192833.0,189798600.0,189790200.0,189972100.0,6199704.0,3.04181,1705655.0,664795.6,347.877923,174.13813,14.914261
min,-18.0,0.0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.0,33.0,33.0,0.0,-181.0,-18.0,-18.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33.0,0.0,0.0,0.0,0.0,0.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,2.0,-1.0,-1.0,0.0,0.0,0.0
25%,0.0,1.0,0.0,69.0,0.0,52.0,-1.0,52.0,-1.0,52.0,0.0,0.0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0,0.0,0.0,0.0,0.0,0.0,52.0,52.0,52.0,0.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,52.0,52.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,2.0,0.0,-1.0,0.0,1.0,0.0
50%,24450.0,1.0,0.0,184.0,0.0,52.0,-1.0,83.0,-1.0,83.0,0.0,0.0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.0,0.0,0.2597275,0.0,0.4213964,54.16667,52.0,159.0,108.0,1.0,3.0,23749.0,10096.08,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,108.0,83.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,-1.0,0.0,2.0,87616.0,-1.0,0.0,1.0,32.0
75%,1759751.0,3.0,1.0,427.0,167.0,108.0,52.0,421.0,115.0,356.0,104.25,15.489244,0.0,422751.5,0.0,13.0,-1.0,264001.0,-1.0,101250.9,0.0,10909.57,0.0,1.0,0.0,0.0,0.0,60.0,20.0,29.88375,15.898378,60.37554,6158.076,108.0,422.0,365.0,88.326965,17630.0,1306116.0,463002.5,18526.61,1.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,365.0,356.0,60.0,0.0,0.0,0.0,0.0,104.0,0.0,0.0,0.0,1.0,52.0,0.0,0.0,1051198.0,1480127.0,1492370.0,0.0,1013498.0,1291379.0,1306116.0,0.0,2.0,304640.0,90496.0,1.0,3.0,32.0
max,44310760000.0,48255.0,74768.0,40496440.0,103922200.0,1390.0,1390.0,1500.0,1390.0,1390.0,1390.0,954.593631,946.108345,44310760000.0,26789120000.0,26647710000.0,11541020000.0,44310720000.0,26789120000.0,26600000000.0,11500000000.0,31300000000.0,15500000000.0,933.0,2133.0,0.0,0.0,965100.0,1495360.0,2000000.0,1000000.0,2000000.0,1230000000.0,1390.0,1500.0,1390.0,954.593631,26647710000.0,44310720000.0,26600000000.0,31300000000.0,2.0,10.0,2.0,2486.0,123022.0,0.0,0.0,0.0,973.0,1390.0,1390.0,965100.0,95256.0,2268.0,645166700.0,895121.0,1390.0,126269.0,92.0,644300000.0,48255.0,40496440.0,74768.0,103922200.0,44310760000.0,44300000000.0,44310760000.0,847000000.0,44310720000.0,44300000000.0,44310720000.0,847000000.0,2269.0,4194240.0,4194240.0,74524.0,48255.0,44.0


In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 631955 entries, 0 to 631954
Data columns (total 80 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   duration                 631955 non-null  int64  
 1   total_fpackets           631955 non-null  int64  
 2   total_bpackets           631955 non-null  int64  
 3   total_fpktl              631955 non-null  int64  
 4   total_bpktl              631955 non-null  int64  
 5   min_fpktl                631955 non-null  int64  
 6   min_bpktl                631955 non-null  int64  
 7   max_fpktl                631955 non-null  int64  
 8   max_bpktl                631955 non-null  int64  
 9   mean_fpktl               631955 non-null  float64
 10  mean_bpktl               631955 non-null  float64
 11  std_fpktl                631955 non-null  float64
 12  std_bpktl                631955 non-null  float64
 13  total_fiat               631955 non-null  int64  
 14  tota

In [22]:
print("Dataset length:", len(df))
print("Number of columns in the dataset:", len(df.columns))

Dataset length: 631955
Number of columns in the dataset: 80


In [23]:
df['calss'].value_counts()

calss
benign            471597
asware            155613
GeneralMalware      4745
Name: count, dtype: int64
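The counts above point to a strong class imbalance. The sketch below turns them into proportions; the Series is built by hand from the output shown above so that it runs without the CSV file.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({"benign": 471597, "asware": 155613, "GeneralMalware": 4745})

# Share of each class over the whole dataset
proportions = counts / counts.sum()
print(proportions.round(4))
```

On the real DataFrame, `df['calss'].value_counts(normalize=True)` should give the same proportions directly. Since `GeneralMalware` accounts for less than 1% of the flows, a weighted or per-class metric is likely to be more informative than plain accuracy.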

### Looking for correlations

In [24]:
# Convert the output variable to numeric in order to compute correlations
X = df.copy()
X['calss'] = X['calss'].factorize()[0]
corr_matrix = X.corr()
corr_matrix['calss'].sort_values(ascending=False)

calss                     1.000000
flow_fin                  0.286175
min_seg_size_forward      0.258352
Init_Win_bytes_forward    0.129425
std_fpktl                 0.123758
                            ...   
furg_cnt                       NaN
burg_cnt                       NaN
flow_urg                       NaN
flow_cwr                       NaN
flow_ece                       NaN
Name: calss, Length: 80, dtype: float64
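The `NaN` entries correspond to columns that hold a single value. In the `describe()` output, `furg_cnt`, `burg_cnt`, `flow_urg`, `flow_cwr` and `flow_ece` all have a standard deviation of 0, so the Pearson correlation is undefined for them. A minimal sketch for detecting such columns, on a made-up toy frame:

```python
import pandas as pd

# Hypothetical toy frame: 'furg_cnt' is constant, like the all-zero columns above
toy = pd.DataFrame({
    "duration": [10, 250, 31, 400, 52],
    "furg_cnt": [0, 0, 0, 0, 0],
    "calss": [0, 1, 0, 1, 0],
})

# Columns with a single distinct value carry no information for the model
constant_cols = [col for col in toy.columns if toy[col].nunique() <= 1]
print(constant_cols)

# Their correlation with any other column is NaN (zero variance)
corr_with_label = toy.corr()["calss"]
print(corr_with_label)
```

Dropping these columns is optional for tree-based models, which simply never split on them, but it reduces the size of the data.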

In [26]:
X.corr()

Unnamed: 0,duration,total_fpackets,total_bpackets,total_fpktl,total_bpktl,min_fpktl,min_bpktl,max_fpktl,max_bpktl,mean_fpktl,mean_bpktl,std_fpktl,std_bpktl,total_fiat,total_biat,min_fiat,min_biat,max_fiat,max_biat,mean_fiat,mean_biat,std_fiat,std_biat,fpsh_cnt,bpsh_cnt,furg_cnt,burg_cnt,total_fhlen,total_bhlen,fPktsPerSecond,bPktsPerSecond,flowPktsPerSecond,flowBytesPerSecond,min_flowpktl,max_flowpktl,mean_flowpktl,std_flowpktl,min_flowiat,max_flowiat,mean_flowiat,std_flowiat,flow_fin,flow_syn,flow_rst,flow_psh,flow_ack,flow_urg,flow_cwr,flow_ece,downUpRatio,avgPacketSize,fAvgSegmentSize,fHeaderBytes,fAvgBytesPerBulk,fAvgPacketsPerBulk,fAvgBulkRate,bVarianceDataBytes,bAvgSegmentSize,bAvgBytesPerBulk,bAvgPacketsPerBulk,bAvgBulkRate,sflow_fpacket,sflow_fbytes,sflow_bpacket,sflow_bbytes,min_active,mean_active,max_active,std_active,min_idle,mean_idle,max_idle,std_idle,FFNEPD,Init_Win_bytes_forward,Init_Win_bytes_backward,RRT_samples_clnt,Act_data_pkt_forward,min_seg_size_forward,calss
duration,1.000000,0.004837,0.004011,0.001673,0.003518,-0.064100,-0.027231,0.008761,0.042925,-0.043746,0.025117,0.039350,0.048743,0.943898,0.324705,0.841692,0.212482,0.943438,0.324226,0.918036,0.280637,0.421540,0.237307,0.021372,0.010780,,,0.004835,0.004011,-0.021033,-0.008908,-0.022090,-0.021032,-0.065256,0.020567,-0.032405,0.036942,0.841666,0.999457,0.949299,0.522119,0.068154,0.020298,-0.003951,0.013926,0.004439,,,,0.006775,-0.032405,-0.043746,0.004835,0.007375,0.009191,0.002252,0.042755,0.025105,0.022799,0.028002,0.006007,0.005759,0.001839,0.004587,0.003962,0.997972,0.998911,0.999465,0.047587,0.997952,0.998901,0.999458,0.047582,0.016532,0.027610,0.029712,0.003785,0.004838,0.082955,0.067066
total_fpackets,0.004837,1.000000,0.924622,0.425756,0.904007,-0.018958,0.005252,0.024685,0.086255,-0.007910,0.139142,0.010172,0.020324,0.002190,0.016479,-0.001975,-0.000651,-0.000420,0.008979,-0.001970,-0.000694,0.000037,0.000619,0.175476,0.174778,,,0.999838,0.924622,-0.004898,-0.002443,-0.005194,-0.003356,-0.019531,0.074042,0.098944,0.087741,-0.001972,0.002273,-0.002050,-0.000211,0.005461,0.045730,-0.004097,0.184351,0.965916,,,,0.249932,0.098944,-0.007910,0.999838,0.040468,0.073107,0.001703,0.012731,0.139151,0.163392,0.143763,0.030887,0.859868,0.350716,0.812197,0.794283,0.002195,0.002580,0.003261,0.018109,0.001258,0.001614,0.002267,0.017229,0.016089,0.050201,0.059224,0.902713,0.999866,0.018198,0.018377
total_bpackets,0.004011,0.924622,1.000000,0.156780,0.997268,-0.017667,0.006912,0.018170,0.086886,-0.016104,0.151761,0.002331,0.014005,0.001718,0.014774,-0.002172,-0.000542,-0.000714,0.007659,-0.002241,-0.000797,-0.000390,-0.000354,0.136986,0.207102,,,0.924199,1.000000,-0.005400,-0.001277,-0.005533,-0.003341,-0.017999,0.073497,0.109240,0.087307,-0.002169,0.001623,-0.002292,-0.000653,0.004894,0.043305,-0.004343,0.201262,0.991675,,,,0.306614,0.109240,-0.016104,0.924199,0.022299,0.059453,0.000605,0.006942,0.151778,0.183837,0.156579,0.033440,0.745095,0.124747,0.806157,0.804088,0.001495,0.001852,0.002592,0.017605,0.000610,0.000922,0.001617,0.016230,-0.000493,0.048190,0.058435,0.997580,0.924746,0.015124,0.019430
total_fpktl,0.001673,0.425756,0.156780,1.000000,0.090082,-0.003099,0.000803,0.021278,0.022088,0.022409,0.018954,0.011416,0.007763,0.000708,0.006442,-0.000746,-0.000185,-0.000389,0.003309,-0.000728,-0.000025,0.000059,0.000960,0.197453,0.055171,,,0.425588,0.156780,-0.001868,-0.000810,-0.001964,-0.001427,-0.003763,0.022072,0.027087,0.024298,-0.000744,0.000611,-0.000748,0.000119,-0.001464,0.012057,-0.001550,0.092604,0.250470,,,,0.026329,0.027087,0.022409,0.425588,0.055662,0.015117,0.001867,0.004947,0.018949,0.020708,0.027528,0.004007,0.342773,0.819266,0.132894,0.074529,0.000434,0.000650,0.000919,0.009627,0.000113,0.000335,0.000609,0.009896,0.001657,0.013283,0.015991,0.088422,0.425789,0.005477,0.000679
total_bpktl,0.003518,0.904007,0.997268,0.090082,1.000000,-0.014926,0.005966,0.012560,0.079905,-0.017328,0.146437,-0.003162,0.007768,0.001546,0.012610,-0.001820,-0.000564,-0.000548,0.006454,-0.001907,-0.001037,-0.000479,-0.001235,0.102561,0.176926,,,0.903594,0.997268,-0.004518,-0.001317,-0.004664,-0.002556,-0.015198,0.067389,0.106402,0.080666,-0.001816,0.001458,-0.001972,-0.000893,0.005112,0.039253,-0.003732,0.168433,0.982846,,,,0.303726,0.106402,-0.017328,0.903594,0.017390,0.057119,0.000042,0.002038,0.146457,0.179838,0.149651,0.032653,0.730320,0.070128,0.805913,0.808062,0.001410,0.001724,0.002409,0.015722,0.000544,0.000812,0.001452,0.014336,-0.000293,0.043571,0.053134,0.999616,0.904129,0.012139,0.019838
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Init_Win_bytes_backward,0.029712,0.059224,0.058435,0.015991,0.053134,-0.268444,0.038319,0.429893,0.593143,-0.030004,0.478823,0.532900,0.577954,-0.016421,0.137065,-0.033944,-0.004303,-0.016206,0.137262,-0.032466,0.023352,0.010546,0.115016,0.218447,0.106445,,,0.059186,0.058435,-0.084946,-0.029054,-0.088269,-0.082914,-0.273336,0.559119,0.131843,0.586355,-0.033903,0.029591,-0.030175,0.034652,0.204597,0.729790,-0.064202,0.139303,0.059540,,,,0.221086,0.131843,-0.030004,0.059186,0.153791,0.172238,0.047065,0.507734,0.478400,0.325269,0.386466,0.092437,0.039487,0.011656,0.039235,0.036000,0.025199,0.027367,0.029961,0.097958,0.024794,0.026959,0.029512,0.097316,-0.052507,0.811204,1.000000,0.056761,0.059242,0.333701,0.069405
RRT_samples_clnt,0.003785,0.902713,0.997580,0.088422,0.999616,-0.016659,0.006156,0.015727,0.084280,-0.017595,0.150252,0.000140,0.012050,0.001615,0.013810,-0.002037,-0.000548,-0.000644,0.007190,-0.002113,-0.000877,-0.000425,-0.000655,0.119724,0.202768,,,0.902300,0.997580,-0.005061,-0.001146,-0.005179,-0.003016,-0.016941,0.071197,0.108008,0.084725,-0.002033,0.001566,-0.002169,-0.000738,0.005338,0.042069,-0.004112,0.193563,0.982622,,,,0.306685,0.108008,-0.017595,0.902300,0.018277,0.058546,0.000335,0.005591,0.150270,0.183046,0.154268,0.033313,0.727248,0.068380,0.803583,0.805652,0.001474,0.001808,0.002520,0.016599,0.000605,0.000893,0.001560,0.015200,-0.000437,0.046784,0.056761,1.000000,0.902834,0.014299,0.019679
Act_data_pkt_forward,0.004838,0.999866,0.924746,0.425789,0.904129,-0.018947,0.005264,0.024705,0.086278,-0.007893,0.139172,0.010182,0.020335,0.002190,0.016484,-0.001974,-0.000651,-0.000418,0.008983,-0.001968,-0.000693,0.000037,0.000621,0.175504,0.174803,,,0.999409,0.924746,-0.004894,-0.002441,-0.005191,-0.003353,-0.019520,0.074069,0.098977,0.087764,-0.001971,0.002276,-0.002048,-0.000210,0.005479,0.045747,-0.004092,0.184379,0.966045,,,,0.249971,0.098977,-0.007893,0.999409,0.037043,0.058843,0.001704,0.012739,0.139180,0.163417,0.143788,0.030892,0.859984,0.350763,0.812306,0.794389,0.002198,0.002582,0.003264,0.018113,0.001261,0.001617,0.002269,0.017233,0.000734,0.050220,0.059242,0.902834,1.000000,0.018229,0.018391
min_seg_size_forward,0.082955,0.018198,0.015124,0.005477,0.012139,-0.686154,-0.189824,-0.074763,0.217989,-0.524024,0.122226,0.301237,0.276055,0.065124,0.082773,0.051853,0.015670,0.061285,0.073755,0.055151,0.033687,0.023564,0.063637,0.165214,0.053999,,,0.018157,0.015124,0.066403,-0.100929,0.052139,0.053846,-0.727837,-0.012360,-0.501759,0.219346,0.051763,0.079417,0.056882,0.037374,0.551140,0.370962,0.064160,0.083844,0.017223,,,,0.031576,-0.501759,-0.524024,0.018157,0.062537,0.069962,0.024525,0.231281,0.121997,0.126253,0.160884,0.044876,0.009066,0.004172,0.006480,0.004878,0.077047,0.078161,0.079542,0.048520,0.076794,0.077943,0.079324,0.048803,0.052177,0.394743,0.333701,0.014299,0.018229,1.000000,0.258352


In [28]:
# One option is to keep only the features with the highest correlation
corr_matrix[corr_matrix['calss']>0.05]

Unnamed: 0,duration,total_fpackets,total_bpackets,total_fpktl,total_bpktl,min_fpktl,min_bpktl,max_fpktl,max_bpktl,mean_fpktl,mean_bpktl,std_fpktl,std_bpktl,total_fiat,total_biat,min_fiat,min_biat,max_fiat,max_biat,mean_fiat,mean_biat,std_fiat,std_biat,fpsh_cnt,bpsh_cnt,furg_cnt,burg_cnt,total_fhlen,total_bhlen,fPktsPerSecond,bPktsPerSecond,flowPktsPerSecond,flowBytesPerSecond,min_flowpktl,max_flowpktl,mean_flowpktl,std_flowpktl,min_flowiat,max_flowiat,mean_flowiat,std_flowiat,flow_fin,flow_syn,flow_rst,flow_psh,flow_ack,flow_urg,flow_cwr,flow_ece,downUpRatio,avgPacketSize,fAvgSegmentSize,fHeaderBytes,fAvgBytesPerBulk,fAvgPacketsPerBulk,fAvgBulkRate,bVarianceDataBytes,bAvgSegmentSize,bAvgBytesPerBulk,bAvgPacketsPerBulk,bAvgBulkRate,sflow_fpacket,sflow_fbytes,sflow_bpacket,sflow_bbytes,min_active,mean_active,max_active,std_active,min_idle,mean_idle,max_idle,std_idle,FFNEPD,Init_Win_bytes_forward,Init_Win_bytes_backward,RRT_samples_clnt,Act_data_pkt_forward,min_seg_size_forward,calss
duration,1.0,0.004837,0.004011,0.001673,0.003518,-0.0641,-0.027231,0.008761,0.042925,-0.043746,0.025117,0.03935,0.048743,0.943898,0.324705,0.841692,0.212482,0.943438,0.324226,0.918036,0.280637,0.42154,0.237307,0.021372,0.01078,,,0.004835,0.004011,-0.021033,-0.008908,-0.02209,-0.021032,-0.065256,0.020567,-0.032405,0.036942,0.841666,0.999457,0.949299,0.522119,0.068154,0.020298,-0.003951,0.013926,0.004439,,,,0.006775,-0.032405,-0.043746,0.004835,0.007375,0.009191,0.002252,0.042755,0.025105,0.022799,0.028002,0.006007,0.005759,0.001839,0.004587,0.003962,0.997972,0.998911,0.999465,0.047587,0.997952,0.998901,0.999458,0.047582,0.016532,0.02761,0.029712,0.003785,0.004838,0.082955,0.067066
max_bpktl,0.042925,0.086255,0.086886,0.022088,0.079905,-0.277317,0.275923,0.492194,1.0,-0.018358,0.895712,0.564243,0.941626,-0.005961,0.17412,-0.035963,-0.00339,-0.009878,0.162721,-0.032412,0.031298,0.021591,0.137169,0.29476,0.16771,,,0.086204,0.086886,-0.089801,-0.014859,-0.091152,-0.081798,-0.282852,0.851572,0.38727,0.911584,-0.035917,0.038796,-0.03118,0.042461,0.035163,0.537187,-0.040593,0.207509,0.087864,,,,0.443414,0.38727,-0.018358,0.086204,0.162106,0.25406,0.064046,0.900543,0.89556,0.557349,0.653017,0.189258,0.056164,0.015515,0.056596,0.05232,0.033057,0.035932,0.039286,0.128193,0.032544,0.035413,0.038732,0.127548,-0.044916,0.586742,0.593143,0.08428,0.086278,0.217989,0.073212
mean_bpktl,0.025117,0.139142,0.151761,0.018954,0.146437,-0.280648,0.465208,0.342392,0.895712,-0.096195,1.0,0.406878,0.74487,-0.009298,0.127441,-0.03699,-0.001386,-0.012626,0.117203,-0.034037,0.018819,0.017698,0.085891,0.253869,0.176532,,,0.139067,0.151761,-0.092269,-0.007038,-0.092535,-0.081023,-0.285849,0.728284,0.431783,0.867507,-0.036954,0.021587,-0.033885,0.027641,-0.018607,0.413631,-0.048392,0.204764,0.149739,,,,0.648624,0.431783,-0.096195,0.139067,0.129846,0.288076,0.046447,0.710476,1.0,0.714473,0.742566,0.232429,0.090559,0.011901,0.099357,0.096353,0.01641,0.019,0.02202,0.1158,0.015961,0.018533,0.021519,0.114949,-0.045442,0.452735,0.478823,0.150252,0.139172,0.122226,0.064753
std_fpktl,0.03935,0.010172,0.002331,0.011416,-0.003162,-0.245792,0.052877,0.817873,0.564243,0.259588,0.406878,1.0,0.598272,-0.003338,0.164844,-0.030462,0.000778,-0.008382,0.14992,-0.025088,0.041391,0.026499,0.14337,0.260774,0.094071,,,0.010157,0.002331,-0.065727,-0.033032,-0.069738,-0.048775,-0.252067,0.706382,0.247325,0.738951,-0.030449,0.034024,-0.024454,0.045162,0.10834,0.508487,-0.054598,0.139512,0.004901,,,,0.043725,0.247325,0.259588,0.010157,0.219957,0.105756,0.082116,0.460822,0.40653,0.161492,0.260807,0.053297,0.005395,0.00748,0.000445,-0.002807,0.029585,0.031783,0.034376,0.097515,0.029228,0.031418,0.033978,0.096844,-0.020885,0.558249,0.5329,0.00014,0.010182,0.301237,0.123758
std_bpktl,0.048743,0.020324,0.014005,0.007763,0.007768,-0.225143,0.035371,0.534532,0.941626,0.051756,0.74487,0.598272,1.0,0.001593,0.168591,-0.028522,-0.001745,-0.001432,0.160015,-0.023936,0.036213,0.027269,0.143583,0.271393,0.142682,,,0.020305,0.014005,-0.07133,-0.013417,-0.072623,-0.06585,-0.229666,0.831081,0.346385,0.839132,-0.028478,0.045505,-0.022967,0.04738,0.074662,0.543797,-0.027512,0.181537,0.016223,,,,0.246787,0.346385,0.051756,0.020305,0.134302,0.204834,0.064397,0.968131,0.74464,0.420704,0.536862,0.155494,0.011922,0.005013,0.007514,0.003705,0.040433,0.043004,0.04592,0.114314,0.039991,0.042562,0.04545,0.113854,-0.033991,0.59479,0.577954,0.01205,0.020335,0.276055,0.072953
total_fiat,0.943898,0.00219,0.001718,0.000708,0.001546,-0.049879,-0.031218,-0.029233,-0.005961,-0.048869,-0.009298,-0.003338,0.001593,1.0,-0.00251,0.891675,-0.00103,0.999426,-0.004987,0.972448,-0.001881,0.446252,-0.003958,0.010947,0.006005,,,0.002189,0.001718,-0.016543,-0.007006,-0.017374,-0.016542,-0.050739,-0.025167,-0.047711,-0.007518,0.891593,0.943342,0.96241,0.372655,0.047125,-0.016272,-0.006354,0.007526,0.001994,,,,-0.006917,-0.047711,-0.048869,0.002189,-0.003214,-0.000441,-0.000374,0.001028,-0.009292,0.002833,0.003938,0.000737,0.002395,0.000719,0.001536,0.001406,0.943339,0.943621,0.943282,0.015258,0.943385,0.943668,0.943339,0.015254,0.020302,-0.011637,-0.016421,0.001615,0.00219,0.065124,0.06477
min_fiat,0.841692,-0.001975,-0.002172,-0.000746,-0.00182,-0.040046,-0.036619,-0.052289,-0.035963,-0.053811,-0.03699,-0.030462,-0.028522,0.891675,-0.008157,1.0,-0.001644,0.892524,-0.007249,0.970863,-0.003376,-0.002279,-0.006135,-0.014688,-0.005755,,,-0.001976,-0.002172,-0.013263,-0.005618,-0.01393,-0.013267,-0.040735,-0.054113,-0.061678,-0.038025,0.999912,0.842516,0.961751,-0.003504,0.055798,-0.034014,-0.008943,-0.008229,-0.002041,,,,-0.018008,-0.061678,-0.053811,-0.001976,-0.006726,-0.007416,-0.002553,-0.02344,-0.036974,-0.01304,-0.016679,-0.004396,-0.000536,-0.000342,-0.001342,-0.001184,0.843502,0.843262,0.842433,-0.005375,0.843575,0.84333,0.842512,-0.005389,0.015316,-0.02735,-0.033944,-0.002037,-0.001974,0.051853,0.074491
max_fiat,0.943438,-0.00042,-0.000714,-0.000389,-0.000548,-0.046977,-0.03263,-0.032606,-0.009878,-0.049032,-0.012626,-0.008382,-0.001432,0.999426,-0.00567,0.892524,-0.001153,1.0,-0.005374,0.973241,-0.00209,0.446162,-0.004103,-0.003336,-0.000635,,,-0.000422,-0.000714,-0.015501,-0.006566,-0.01628,-0.015504,-0.047788,-0.028094,-0.048393,-0.011081,0.892447,0.943975,0.963253,0.372735,0.050879,-0.018147,-0.005154,-0.001324,-0.000533,,,,-0.008249,-0.048393,-0.049032,-0.000422,-0.003467,-0.001062,-0.000656,-0.000151,-0.012616,0.001128,-0.000768,0.000617,0.001033,4.8e-05,0.00017,0.000119,0.944146,0.944357,0.943905,0.01256,0.944202,0.944412,0.943971,0.012557,0.017756,-0.012317,-0.016206,-0.000644,-0.000418,0.061285,0.064875
mean_fiat,0.918036,-0.00197,-0.002241,-0.000728,-0.001907,-0.042455,-0.036056,-0.048712,-0.032412,-0.052892,-0.034037,-0.025088,-0.023936,0.972448,-0.00743,0.970863,-0.001372,0.973241,-0.006695,1.0,-0.00281,0.237114,-0.005449,-0.013006,-0.005389,,,-0.00197,-0.002241,-0.014044,-0.005949,-0.01475,-0.014048,-0.043187,-0.050438,-0.060742,-0.033657,0.970779,0.918802,0.990379,0.198219,0.053624,-0.03198,-0.006456,-0.007524,-0.002085,,,,-0.01796,-0.060742,-0.052892,-0.00197,-0.006786,-0.007346,-0.002408,-0.019876,-0.034022,-0.012734,-0.016156,-0.004196,-0.000477,-0.000307,-0.001368,-0.001233,0.919754,0.919583,0.918716,-0.003516,0.919829,0.919653,0.918797,-0.003523,0.016958,-0.025823,-0.032466,-0.002113,-0.001968,0.055151,0.071397
std_flowpktl,0.036942,0.087741,0.087307,0.024298,0.080666,-0.262968,0.338513,0.631466,0.911584,0.095976,0.867507,0.738951,0.839132,-0.007518,0.162212,-0.038025,-0.003946,-0.011081,0.151473,-0.033657,0.030322,0.024414,0.130809,0.284118,0.157254,,,0.087688,0.087307,-0.086623,-0.021452,-0.088897,-0.065099,-0.304435,0.891713,0.433647,1.0,-0.03796,0.033164,-0.033201,0.040798,0.036357,0.532533,-0.028322,0.196445,0.088636,,,,0.437305,0.433647,0.095976,0.087688,0.23259,0.248759,0.082469,0.770038,0.867349,0.53235,0.612995,0.177253,0.057988,0.016836,0.05796,0.05382,0.027786,0.030499,0.033622,0.120984,0.02731,0.030013,0.0331,0.120257,-0.035737,0.580266,0.586355,0.084725,0.087764,0.219346,0.119375
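The filter above keeps the rows of the correlation matrix whose correlation with `calss` is greater than 0.05, so features with a strong negative correlation are left out. A possible variant, sketched here on made-up data with an illustrative threshold, compares the absolute value and returns only the feature names:

```python
import pandas as pd

# Hypothetical toy frame standing in for X (the label is already numeric)
toy = pd.DataFrame({
    "pos_feature": [1, 2, 3, 4, 5, 6],
    "neg_feature": [6, 5, 4, 3, 2, 1],
    "noise": [1, -1, 1, -1, -1, 1],
    "calss": [0, 0, 0, 1, 1, 1],
})

threshold = 0.5  # illustrative value, not tuned
corr_with_label = toy.corr()["calss"].drop("calss")

# Use the absolute value so negatively correlated features are kept as well
selected = corr_with_label[corr_with_label.abs() > threshold].index.tolist()
print(selected)
```

Note that `factorize()` assigns the class codes in order of appearance, so with three nominal classes the sign and size of these correlations depend on an arbitrary encoding; they are best read as a rough guide.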


### 3.- Splitting the DataSet

In [29]:
# Split the DataSet into training, validation and test sets
train_set, val_set, test_set = train_val_test_split(X)

In [30]:
X_train, y_train = remove_labels(train_set, 'calss')
X_val, y_val = remove_labels(val_set, 'calss')
X_test, y_test = remove_labels(test_set, 'calss')
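
The helpers `train_val_test_split` and `remove_labels` are defined elsewhere in the notebook. As a rough sketch of what a split of this kind typically does, the example below assumes a 60/20/20 partition built from two calls to scikit-learn's `train_test_split`. The proportions, the `split_60_20_20` name and the toy data are assumptions made for illustration, not the notebook's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_60_20_20(df, random_state=42):
    """Hypothetical helper: roughly 60% train, 20% validation, 20% test."""
    train, rest = train_test_split(df, test_size=0.4, random_state=random_state)
    val, test = train_test_split(rest, test_size=0.5, random_state=random_state)
    return train, val, test


# Toy frame standing in for the real DataSet; 'calss' mirrors the label
# column name used in the cells above.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "duration": rng.random(100),
    "total_fpackets": rng.integers(1, 50, 100),
    "calss": rng.choice(["benign", "adware"], 100),
})

toy_train, toy_val, toy_test = split_60_20_20(toy)

# Separate the features from the label column.
X_toy_train = toy_train.drop(columns="calss")
y_toy_train = toy_train["calss"]

print(len(toy_train), len(toy_val), len(toy_test))
```

Splitting before any preprocessing keeps the validation and test rows out of every statistic computed later, such as the medians used for scaling.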


### 4.- Scaling the DataSet

It is important to understand that decision trees are algorithms that **do not require much data preparation**; in particular, they do not require scaling or normalization. In this exercise the DataSet is scaled and the results are compared with those obtained on the unscaled DataSet. This shows how a preprocessing step such as scaling can affect the model's performance.
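
To illustrate why trees do not need scaling, the sketch below trains the same `DecisionTreeClassifier` on raw and on scaled versions of a synthetic dataset. A split threshold depends on the ordering of a feature's values rather than on their magnitude, so in this toy setup the two trees are expected to agree on essentially all predictions. The synthetic data and every variable name here are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; one feature is blown up to a very different scale.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_demo[:, 0] *= 1000.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Fit the scaler on the training split only, then reuse it on the test split.
demo_scaler = RobustScaler().fit(X_tr)
X_tr_s = demo_scaler.transform(X_tr)
X_te_s = demo_scaler.transform(X_te)

tree_raw = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
tree_scaled = DecisionTreeClassifier(random_state=42).fit(X_tr_s, y_tr)

# Fraction of test samples on which both trees predict the same class.
agreement = np.mean(tree_raw.predict(X_te) == tree_scaled.predict(X_te_s))
print(f"prediction agreement: {agreement:.3f}")
```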

In [31]:
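# Fit the scaler on the training set only
scaler = RobustScaler()
X_train_scaler = scaler.fit_transform(X_train)

In [32]:
# Reuse the scaler fitted on the training set, so the test set is
# transformed with the same medians and interquartile ranges
X_test_scaler = scaler.transform(X_test)

In [33]:
# Same for the validation set
X_val_scaler = scaler.transform(X_val)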

In [35]:
# Convert the scaled array back into a DataFrame
from pandas import DataFrame

X_train_scaled = DataFrame(X_train_scaler, columns = X_train.columns, index = X_train.index)
X_train_scaled.head(10)

Unnamed: 0,duration,total_fpackets,total_bpackets,total_fpktl,total_bpktl,min_fpktl,min_bpktl,max_fpktl,max_bpktl,mean_fpktl,mean_bpktl,std_fpktl,std_bpktl,total_fiat,total_biat,min_fiat,min_biat,max_fiat,max_biat,mean_fiat,mean_biat,std_fiat,std_biat,fpsh_cnt,bpsh_cnt,furg_cnt,burg_cnt,total_fhlen,total_bhlen,fPktsPerSecond,bPktsPerSecond,flowPktsPerSecond,flowBytesPerSecond,min_flowpktl,max_flowpktl,mean_flowpktl,std_flowpktl,min_flowiat,max_flowiat,mean_flowiat,std_flowiat,flow_fin,flow_syn,flow_rst,flow_psh,flow_ack,flow_urg,flow_cwr,flow_ece,downUpRatio,avgPacketSize,fAvgSegmentSize,fHeaderBytes,fAvgBytesPerBulk,fAvgPacketsPerBulk,fAvgBulkRate,bVarianceDataBytes,bAvgSegmentSize,bAvgBytesPerBulk,bAvgPacketsPerBulk,bAvgBulkRate,sflow_fpacket,sflow_fbytes,sflow_bpacket,sflow_bbytes,min_active,mean_active,max_active,std_active,min_idle,mean_idle,max_idle,std_idle,FFNEPD,Init_Win_bytes_forward,Init_Win_bytes_backward,RRT_samples_clnt,Act_data_pkt_forward,min_seg_size_forward
508881,-0.013646,0.0,1.0,-0.310056,1.556886,0.375,4.924528,-0.0271,2.25,-0.032895,2.494005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,68.333145,127.926531,67.693073,110.512958,0.375,0.267568,0.186901,1.449936,0.02881,-0.018341,-0.020676,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,3.561644,0.186901,-0.032895,0.0,0.0,0.0,0.0,0.0,2.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,1.1e-05,0.0,0.0,-1.0
208326,-0.013926,0.0,0.0,0.664804,0.0,6.607143,0.0,0.918699,0.0,1.115132,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.008692,0.0,-0.006927,-0.00881,6.607143,0.705405,1.003195,-0.011049,-0.000237,-0.018729,-0.021737,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,1.003195,1.115132,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,0.0,0.0,0.0,-1.0
107213,-0.013926,0.0,0.0,0.703911,0.0,6.857143,0.0,0.95664,0.0,1.161184,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.008692,0.0,-0.006927,-0.00881,6.857143,0.743243,1.047923,-0.011049,-0.000237,-0.018729,-0.021737,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,1.047923,1.161184,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,0.0,0.0,0.0,-1.0
466726,-0.000273,0.0,1.0,-0.363128,2.724551,0.035714,8.603774,-0.078591,3.931034,-0.095395,4.364508,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.390527,2.619145,1.379151,3.449949,0.035714,0.794595,0.468051,3.121935,1.415641,0.00018,0.030094,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,8.425926,0.468051,-0.095395,0.0,0.0,0.0,0.0,0.0,4.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,1.1e-05,0.0,0.0,-1.0
230085,-0.013926,0.0,0.0,0.664804,0.0,6.607143,0.0,0.918699,0.0,1.115132,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.008692,0.0,-0.006927,-0.00881,6.607143,0.705405,1.003195,-0.011049,-0.000237,-0.018729,-0.021737,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,1.003195,1.115132,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,0.0,0.0,0.0,-1.0
472961,34.421927,1.5,4.0,1.558659,3.868263,0.0,1.0,1.341463,4.163793,0.337171,1.549161,16.895157,213.699165,0.336088,60331999.0,1062.857,11262.0,0.289293,60228809.0,0.467228,20100000.0,2.777741,34700000.0,1.0,1.0,0.0,0.0,1.5,4.0,-0.006473,0.004154,-0.004729,-0.005071,0.0,1.127027,0.209265,2.436956,0.272362,47.562216,18.6544,1230.57301,1.0,2.0,0.0,2.0,1.666667,0.0,0.0,0.0,0.87062,0.209265,0.337171,1.5,0.0,0.0,0.0,45667.33333,1.548077,0.0,0.0,0.0,4.0,14.269231,4.0,646.0,59.174875,41.926131,41.696562,0.0,59.657102,48.116772,47.580946,0.0,0.0,13.421042,12.65758,3.0,1.5,0.0
482372,-0.013805,0.5,0.0,-0.136872,0.0,0.0,0.0,0.0,0.0,-0.050987,0.0,1.413722,0.0,0.000503,0.0,15.21429,0.0,0.000807,0.0,0.002096,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.5,0.0,315.91112,0.0,156.469488,103.55251,0.0,-0.210811,-0.129393,0.230901,0.012364,-0.018561,-0.021278,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,-0.129393,-0.050987,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-0.080736,0.0,0.0,0.5,0.0
619993,17.578734,0.5,0.0,-0.050279,0.0,0.553571,0.0,0.0,0.0,0.0,0.0,0.064561,0.0,73.123604,0.0,2202849.0,0.0,116.818174,0.0,304.574301,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.5,0.0,-0.00652,0.0,-0.005851,-0.007935,0.553571,-0.210811,-0.079872,0.0,1824.412979,24.346736,66.681024,0.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,-0.079872,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,3.192308,0.0,0.0,30.231384,21.37955,21.302026,0.0,30.549478,24.617883,24.365465,0.0,1.0,0.063376,0.0,0.0,0.5,0.0
65344,-0.013926,0.0,0.0,0.703911,0.0,6.857143,0.0,0.95664,0.0,1.161184,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.008692,0.0,-0.006927,-0.00881,6.857143,0.743243,1.047923,-0.011049,-0.000237,-0.018729,-0.021737,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,1.047923,1.161184,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,0.0,0.0,0.0,-1.0
46666,-0.013926,0.0,0.0,0.505587,0.0,5.589286,0.0,0.764228,0.0,0.927632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.008692,0.0,-0.006927,-0.00881,5.589286,0.551351,0.821086,-0.011049,-0.000237,-0.018729,-0.021737,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,0.821086,0.927632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,0.0,0.0,0.0,-1.0


In [36]:
X_train_scaled.describe()

Unnamed: 0,duration,total_fpackets,total_bpackets,total_fpktl,total_bpktl,min_fpktl,min_bpktl,max_fpktl,max_bpktl,mean_fpktl,mean_bpktl,std_fpktl,std_bpktl,total_fiat,total_biat,min_fiat,min_biat,max_fiat,max_biat,mean_fiat,mean_biat,std_fiat,std_biat,fpsh_cnt,bpsh_cnt,furg_cnt,burg_cnt,total_fhlen,total_bhlen,fPktsPerSecond,bPktsPerSecond,flowPktsPerSecond,flowBytesPerSecond,min_flowpktl,max_flowpktl,mean_flowpktl,std_flowpktl,min_flowiat,max_flowiat,mean_flowiat,std_flowiat,flow_fin,flow_syn,flow_rst,flow_psh,flow_ack,flow_urg,flow_cwr,flow_ece,downUpRatio,avgPacketSize,fAvgSegmentSize,fHeaderBytes,fAvgBytesPerBulk,fAvgPacketsPerBulk,fAvgBulkRate,bVarianceDataBytes,bAvgSegmentSize,bAvgBytesPerBulk,bAvgPacketsPerBulk,bAvgBulkRate,sflow_fpacket,sflow_fbytes,sflow_bpacket,sflow_bbytes,min_active,mean_active,max_active,std_active,min_idle,mean_idle,max_idle,std_idle,FFNEPD,Init_Win_bytes_forward,Init_Win_bytes_backward,RRT_samples_clnt,Act_data_pkt_forward,min_seg_size_forward
count,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0,379173.0
mean,12.543409,2.927643,10.762678,1.947184,75.104138,1.600682,0.855241,0.491724,1.591823,0.303639,1.014557,3.439516,61.636887,38.860997,7147523.0,833560.3,951107.0,58.145453,6298974.0,126.177236,2063700.0,164.122,2243039.0,0.861778,1.070132,0.0,0.0,2.931056,10.762678,984.393314,110.028225,516.685557,269.838608,1.583599,0.48842,0.299527,0.92925,688.5496,16.453111,28.84289,187.0475,0.529429,0.429477,0.066421,1.93191,4.956526,0.0,0.0,0.0,1.243744,0.299527,0.303639,2.931056,53.583019,0.336585,228529.9,28454.905968,1.016042,457.111767,0.494429,214383.6,3.197622,7.910795,5.196285,6398.888,19.698746,14.179793,14.41701,466937.7,19.838596,16.279309,16.441899,468661.8,0.362658,2.861506,3.435883,10.087709,2.924548,-0.376798
std,116.165117,92.319402,370.875546,185.549801,3071.968461,2.817301,1.676098,0.7865,3.210186,0.533388,1.985607,7.959015,157.021881,453.629167,71456110.0,11713750.0,44334200.0,724.373302,71092070.0,1690.470033,48049560.0,6204.794,31335050.0,3.804841,13.671129,0.0,0.0,92.381891,370.875546,5433.35764,1419.815251,2712.294228,1443.50506,2.738992,1.038379,0.551685,1.764197,9700.87,160.819659,374.179837,4308.946,0.652334,0.911635,0.302768,16.541754,182.595272,0.0,0.0,0.0,4.791424,0.551685,0.533388,92.381891,604.67025,4.022938,6486112.0,88196.057761,1.989369,2552.776459,2.156685,3469284.0,162.611348,1271.541381,295.218736,409031.1,199.389786,141.185645,140.610872,6181972.0,201.477623,162.562856,160.822696,6188642.0,3.836035,5.577212,7.354164,369.660712,92.301146,0.466297
min,-0.013936,-0.5,0.0,-0.513966,0.0,-0.946429,0.0,-0.227642,0.0,-0.273026,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.5,0.0,-0.008692,0.0,-0.006927,-0.00881,-0.339286,-0.345946,-0.239617,-0.011049,-0.010885,-0.018743,-0.021776,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,-0.239617,-0.273026,-0.5,0.0,0.0,0.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286345,0.0,0.0,-0.5,-1.0
25%,-0.013926,0.0,0.0,-0.321229,0.0,0.0,0.0,-0.084011,0.0,-0.101974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.008692,0.0,-0.006927,-0.00881,0.0,-0.294595,-0.178914,-0.011049,-0.0002366304,-0.018729,-0.021737,0.0,0.0,0.0,0.0,0.0,-0.666667,0.0,0.0,0.0,0.0,-0.178914,-0.101974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.286342,0.0,0.0,0.0,-1.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.986074,1.0,1.0,0.678771,1.0,1.0,1.0,0.915989,1.0,0.898026,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.991308,1.0,0.993073,0.99119,1.0,0.705405,0.821086,0.988951,0.9997634,0.981271,0.978263,1.0,1.0,0.0,0.0,1.0,0.333333,0.0,0.0,0.0,1.0,0.821086,0.898026,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.713658,1.0,1.0,1.0,0.0
max,25277.128872,24127.0,74768.0,113118.041899,622288.473054,23.892857,26.245283,3.840108,11.991379,4.299342,13.333333,61.629454,946.108345,105064.031995,26789120000.0,1903408000.0,11541020000.0,167844.283001,26789120000.0,263041.441388,11500000000.0,2816003.0,15500000000.0,330.0,2133.0,0.0,0.0,24127.0,74768.0,66974.991318,62683.999989,33172.99307,88795.768963,23.892857,3.618919,4.095847,10.536464,1576414.0,35008.267353,57606.90842,1689339.0,2.0,10.0,2.0,2143.0,41006.666667,0.0,0.0,0.0,729.75,4.095847,4.299342,24127.0,95256.0,2268.0,615428600.0,895121.0,13.365385,126269.0,92.0,493222200.0,48255.0,778777.75,74768.0,103922200.0,43436.466301,30750.456918,30606.76162,566000000.0,43893.471528,35408.189283,35008.286082,567000000.0,2267.0,13.421042,46.346741,74524.0,24127.0,0.375
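
The row of zeros at the 50% quantile in the summary above is what one would expect from `RobustScaler`: with its default settings it subtracts each feature's median and divides by its interquartile range (IQR, the 75th percentile minus the 25th). A minimal check on made-up data, compared against a manual computation, might look like this (the toy array is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
toy = rng.exponential(scale=5.0, size=(200, 3))  # skewed, like traffic features
toy[0, 0] = 1e6                                  # one extreme outlier

# Manual robust scaling: (x - median) / IQR, column by column.
median = np.median(toy, axis=0)
q25, q75 = np.percentile(toy, [25, 75], axis=0)
manual = (toy - median) / (q75 - q25)

scaled = RobustScaler().fit_transform(toy)

print(np.allclose(manual, scaled))   # do both computations match?
print(np.median(scaled, axis=0))     # medians after scaling
```

Because the median and the IQR are barely affected by extreme values, outliers remain very large after scaling, which is consistent with the large `max` values in the summary.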


### 5.- Decision Forest

In [37]:
# Model trained on the unscaled DataSet

from sklearn.tree import DecisionTreeClassifier

clf_tree = DecisionTreeClassifier(random_state=42)
clf_tree.fit(X_train, y_train)

,parameter,value
,criterion,'gini'
,splitter,'best'
,max_depth,
,min_samples_split,2
,min_samples_leaf,1
,min_weight_fraction_leaf,0.0
,max_features,
,random_state,42
,max_leaf_nodes,
,min_impurity_decrease,0.0
