### Ensemble Learning

### Definition

###### . Ensemble techniques combine several individual machine learning models into one predictive model.
###### . The combined model is typically more stable and more accurate than any single base model.
###### . The base models' predictions are merged by voting, averaging, or weighting.
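
To make the idea of combining models concrete, here is a minimal sketch of majority voting in plain Python. The three prediction lists are hypothetical values made up for illustration, not outputs of any model in this notebook, and `majority_vote` is an illustrative helper name.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common class label among the base models' votes."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three base models for four samples
model_a = [0, 1, 1, 0]
model_b = [0, 1, 0, 0]
model_c = [1, 1, 1, 0]

# Combine the three votes sample by sample
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # [0, 1, 1, 0]
```

Each base model makes one mistake relative to the combined result on a different sample, and the vote overrules it. That is the intuition behind the stability claim above.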

#### Ensemble methods fall into two groups.

##### Sequential Ensemble Method

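###### . Base learners are generated one after another.

###### . The basic motivation is to exploit the dependence between the base learners.

###### . Overall performance is boosted because each new learner gives more weight to the examples that earlier learners misclassified (e.g. AdaBoost).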

##### Parallel Ensemble Method

###### . Base learners are generated in parallel (e.g. Random Forest).

###### . The basic motivation is to exploit the independence between the base learners, since averaging independent errors reduces variance.
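
The two families can be compared side by side with a short sketch. It assumes scikit-learn is installed and uses the built-in iris data rather than the diabetes file loaded below. `BaggingClassifier` is chosen here as a representative parallel method and is not used elsewhere in this notebook. The names `X_iris`, `y_iris`, and `results` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X_iris, y_iris = load_iris(return_X_y=True)

# Sequential: each new learner depends on the mistakes of the previous ones
sequential = AdaBoostClassifier(n_estimators=30, random_state=7)

# Parallel: learners are trained independently on bootstrap samples
parallel = BaggingClassifier(n_estimators=30, random_state=7)

results = {}
for name, model in [("AdaBoost", sequential), ("Bagging", parallel)]:
    # Mean accuracy over 5 cross-validation folds
    results[name] = cross_val_score(model, X_iris, y_iris, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```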

In [1]:
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier

In [2]:
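# import the dataset
dataframe = pd.read_csv("//Users//motolanikay-salami//Downloads//diabetes (1).csv")
# extract the values from the dataframe as a NumPy array
array = dataframe.values
# columns 0-7 are the input features, column 8 is the target
X = array[:,0:8]
Y = array[:,8]
# fix the random seed and the number of estimators so results are reproducible
seed = 7
num_trees = 30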

In [4]:
#build an AdaBoost classifier and score it with 10-fold cross-validation
kfold = model_selection.KFold(n_splits=10,random_state=seed, shuffle=True)
model = AdaBoostClassifier(n_estimators=num_trees,random_state=seed)
results = model_selection.cross_val_score(model,X,Y,cv=kfold)
print(results.mean())

0.7552802460697198


In [7]:
#do the same for XGBoost, reusing seed and num_trees from above
from xgboost import XGBClassifier

kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
model = XGBClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

0.7382433356117566


#### Generating a Random Forest using the cross-validation splitting technique.

In [2]:
from sklearn.datasets import load_iris

In [3]:
iris_data = load_iris()
print(iris_data)

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
     

In [8]:
# extract the input data and target values
data_input = iris_data.data
data_output = iris_data.target

In [6]:
print(data_input)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

In [7]:
print(data_output)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In [9]:
#split the dataset into five shuffled folds
from sklearn.model_selection import KFold
kf = KFold(n_splits=5,shuffle=True)

In [10]:
#use a for loop to iterate over the train/test index splits of the input data
print("Train Set      Test Set   ")
#each of the five folds serves once as the test set
for train_set,test_set in kf.split(data_input):
    print(train_set,test_set)

Train Set      Test Set   
[  1   2   3   4   5   6   7   8  10  11  12  13  14  16  17  18  20  22
  23  24  25  26  27  28  29  30  31  32  33  34  35  37  38  39  40  42
  43  44  45  46  47  49  52  53  54  55  56  58  59  60  61  62  65  66
  68  69  72  73  74  75  77  78  79  80  81  82  83  84  85  86  87  88
  89  90  91  92  93  94  97  98  99 100 101 104 105 106 107 109 111 112
 113 114 115 116 117 119 120 121 122 123 124 126 128 130 131 132 133 135
 136 137 138 140 141 142 143 144 146 147 148 149] [  0   9  15  19  21  36  41  48  50  51  57  63  64  67  70  71  76  95
  96 102 103 108 110 118 125 127 129 134 139 145]
[  0   1   2   3   4   5   7   8   9  10  11  12  13  14  15  16  18  19
  20  21  22  23  24  25  26  27  28  29  35  36  37  38  39  40  41  43
  44  45  46  47  48  49  50  51  52  53  54  56  57  58  59  60  61  62
  63  64  67  68  69  70  71  72  74  76  78  79  80  81  82  83  84  85
  86  87  89  90  91  93  94  95  96  97  98  99 100 101 102 103 105 
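
The index arrays printed above are hard to read at a glance. As a rough check, the sketch below counts the fold sizes instead. It assumes a placeholder array with the same shape as the iris inputs (150 rows, 4 columns) and an arbitrary `random_state`, neither of which comes from the original cell.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder with the same shape as the iris input data
placeholder_input = np.zeros((150, 4))

# random_state is an assumption added here for reproducibility
kf_check = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []
test_counts = np.zeros(len(placeholder_input), dtype=int)
for train_set, test_set in kf_check.split(placeholder_input):
    fold_sizes.append((len(train_set), len(test_set)))
    test_counts[test_set] += 1  # record how often each sample is tested

print(fold_sizes)                # five (120, 30) pairs
print((test_counts == 1).all())  # True: every sample is tested exactly once
```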

In [11]:
#initialize the random forest classifier
from sklearn.ensemble import RandomForestClassifier
rf_class = RandomForestClassifier(n_estimators=10)
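
One possible way to continue from here is to fit the forest on each training fold and score it on the matching test fold. The following is a sketch under assumptions, not the notebook's own continuation. The `random_state` values and the `fold_scores` name are additions, and the data is reloaded so the block runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

iris_data = load_iris()
data_input, data_output = iris_data.data, iris_data.target

# random_state values are assumptions, added so that reruns give the same scores
kf = KFold(n_splits=5, shuffle=True, random_state=7)
rf_class = RandomForestClassifier(n_estimators=10, random_state=7)

fold_scores = []
for train_set, test_set in kf.split(data_input):
    # Train on four folds, then measure accuracy on the held-out fold
    rf_class.fit(data_input[train_set], data_output[train_set])
    fold_scores.append(rf_class.score(data_input[test_set], data_output[test_set]))

print(fold_scores)
print(sum(fold_scores) / len(fold_scores))

# cross_val_score wraps the same fit-and-score loop in a single call
print(cross_val_score(rf_class, data_input, data_output, cv=kf).mean())
```

The explicit loop shows what happens on each fold, while `cross_val_score` is the shorter form already used for AdaBoost and XGBoost earlier.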