# Ensemble methods. Exercises


In this section we have only two exercise:

1. Find the best three classifier in the stacking method using the classifiers from scikit-learn package.

2. Build arcing arc-x4 method. 

In [1]:
%store -r data_set
%store -r labels
%store -r test_data_set
%store -r test_labels
%store -r unique_labels

## Exercise 1: Find the best three classifier in the stacking method

Please use the following classifiers:

* Linear regression,
* Nearest Neighbors,
* Linear SVM,
* Decision Tree,
* Naive Bayes,
* QDA.

In [2]:
import numpy as np
from sklearn.metrics import accuracy_score

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

In [3]:
def build_classifiers():
    # fill this part
    linearRegression = LinearRegression()
    kNeighbors = KNeighborsClassifier()
    svc = SVC(gamma='auto')
    decisionTree = DecisionTreeClassifier()
    gaussianNB = GaussianNB()
    qda = QuadraticDiscriminantAnalysis()
    return linearRegression, kNeighbors, svc, decisionTree, gaussianNB, qda # and here

In [4]:
def build_stacked_classifier(classifiers,stacked):
    output = []
    for classifier in classifiers:
        classifier.fit(data_set,labels)
        output.append(classifier.predict(data_set))
    output = np.array(output).reshape((130,3))
    
    # stacked classifier part:
    stacked_classifier = stacked # set here
    stacked_classifier.fit(output.reshape((130,3)), labels.reshape((130,)))
    test_set = []
    for classifier in classifiers:
        test_set.append(classifier.predict(test_data_set))
    test_set = np.array(test_set).reshape((len(test_set[0]),3))
    predicted = stacked_classifier.predict(test_set)
    return predicted

In [5]:
all_classifiers = build_classifiers()
all_classifiers[0]
classifiers_names = {
    all_classifiers[0]:"Linear Regression", 
    all_classifiers[1]:"K Neighbors Classifier",
    all_classifiers[2]:"SVC",
    all_classifiers[3]:"Decision Tree Classifier",
    all_classifiers[4]:"Naive Bayes",
    all_classifiers[5]:"QDA"}

In [6]:
import itertools 

for combination in itertools.combinations(all_classifiers,4):
    for i in range(4):
        predicted = build_stacked_classifier(np.delete(combination,i), combination[i])
        accuracy = accuracy_score(test_labels, predicted.round())
        print("stacked: " + classifiers_names[combination[i]])
        print("rest: ", end = "")
        for elem in np.delete(combination,i):
            print(classifiers_names[elem], end = ", ")
        print("\naccurancy: " + str(accuracy), end = "\n\n")

stacked: Linear Regression
rest: K Neighbors Classifier, SVC, Decision Tree Classifier, 
accurancy: 0.55

stacked: K Neighbors Classifier
rest: Linear Regression, SVC, Decision Tree Classifier, 
accurancy: 0.4

stacked: SVC
rest: Linear Regression, K Neighbors Classifier, Decision Tree Classifier, 
accurancy: 0.5

stacked: Decision Tree Classifier
rest: Linear Regression, K Neighbors Classifier, SVC, 
accurancy: 0.3

stacked: Linear Regression
rest: K Neighbors Classifier, SVC, Naive Bayes, 
accurancy: 0.55

stacked: K Neighbors Classifier
rest: Linear Regression, SVC, Naive Bayes, 
accurancy: 0.4

stacked: SVC
rest: Linear Regression, K Neighbors Classifier, Naive Bayes, 
accurancy: 0.5

stacked: Naive Bayes
rest: Linear Regression, K Neighbors Classifier, SVC, 
accurancy: 0.45

stacked: Linear Regression
rest: K Neighbors Classifier, SVC, QDA, 
accurancy: 0.55

stacked: K Neighbors Classifier
rest: Linear Regression, SVC, QDA, 
accurancy: 0.4

stacked: SVC
rest: Linear Regression, K 



stacked: Decision Tree Classifier
rest: Linear Regression, SVC, Naive Bayes, 
accurancy: 0.4

stacked: Naive Bayes
rest: Linear Regression, SVC, Decision Tree Classifier, 
accurancy: 0.5

stacked: Linear Regression
rest: SVC, Decision Tree Classifier, QDA, 
accurancy: 0.55

stacked: SVC
rest: Linear Regression, Decision Tree Classifier, QDA, 
accurancy: 0.5

stacked: Decision Tree Classifier
rest: Linear Regression, SVC, QDA, 
accurancy: 0.4

stacked: QDA
rest: Linear Regression, SVC, Decision Tree Classifier, 
accurancy: 0.55

stacked: Linear Regression
rest: SVC, Naive Bayes, QDA, 
accurancy: 0.55

stacked: SVC
rest: Linear Regression, Naive Bayes, QDA, 
accurancy: 0.5

stacked: Naive Bayes
rest: Linear Regression, SVC, QDA, 
accurancy: 0.5

stacked: QDA
rest: Linear Regression, SVC, Naive Bayes, 
accurancy: 0.3

stacked: Linear Regression
rest: Decision Tree Classifier, Naive Bayes, QDA, 
accurancy: 0.6

stacked: Decision Tree Classifier
rest: Linear Regression, Naive Bayes, QDA, 
a



In [7]:
# classifiers = build_classifiers()
# predicted = build_stacked_classifier(classifiers)
# accuracy = accuracy_score(test_labels, predicted)
# print(accuracy)

## Exercise 2: 

Use the boosting method and change the code to fullfilt the following requirements:

* the weights should be calculated as:
$w_{n}^{(t+1)}=\frac{1+ I(y_{n}\neq h_{t}(x_{n})}{\sum_{i=1}^{N}1+I(y_{n}\neq h_{t}(x_{n})}$,
* the prediction is done with a voting method.

In [8]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# prepare data set

def generate_data(sample_number, feature_number, label_number):
    data_set = np.random.random_sample((sample_number, feature_number))
    labels = np.random.choice(label_number, sample_number)
    return data_set, labels

labels = 2
dimension = 2
test_set_size = 1000
train_set_size = 5000
train_set, train_labels = generate_data(train_set_size, dimension, labels)
test_set, test_labels = generate_data(test_set_size, dimension, labels)

# init weights
number_of_iterations = 10
weights = np.ones((test_set_size,)) / test_set_size


def train_model(classifier, weights):
    return classifier.fit(X=test_set, y=test_labels, sample_weight=weights)

def calculate_error(model):
    predicted = model.predict(test_set)
    I=calculate_accuracy_vector(predicted, test_labels)
    Z=np.sum(I)
    return (1+Z)/1.0

In [9]:
test_set_size

1000

Fill the two functions below:

In [10]:
def set_new_weights(model):
    # fill the code here (two lines)
    predicted=model.predict(test_set)
    helper = np.ones(test_set_size) + calculate_accuracy_vector(predicted,test_labels)
    return helper/sum(helper)

In [11]:
#taken from: 073Ensemble_Boosting notebook
def calculate_accuracy_vector(predicted, labels):
    result = []
    for i in range(len(predicted)):
        if predicted[i] == labels[i]:
            result.append(0)
        else:
            result.append(1)
    return result

Train the classifier with the code below:

In [12]:
classifier = DecisionTreeClassifier(max_depth=1, random_state=1)
classifier.fit(X=train_set, y=train_labels)
alphas = []
classifiers = []
for iteration in range(number_of_iterations):
    model = train_model(classifier, weights)
    weights = set_new_weights(model)
    classifiers.append(model)

print(weights)


validate_x, validate_label = generate_data(1, dimension, labels)

[0.00066313 0.00132626 0.00066313 0.00132626 0.00066313 0.00066313
 0.00066313 0.00132626 0.00132626 0.00132626 0.00132626 0.00132626
 0.00132626 0.00066313 0.00132626 0.00066313 0.00066313 0.00066313
 0.00066313 0.00132626 0.00132626 0.00066313 0.00066313 0.00066313
 0.00132626 0.00132626 0.00132626 0.00066313 0.00132626 0.00132626
 0.00132626 0.00132626 0.00066313 0.00132626 0.00066313 0.00066313
 0.00066313 0.00132626 0.00066313 0.00066313 0.00066313 0.00066313
 0.00066313 0.00132626 0.00132626 0.00132626 0.00066313 0.00066313
 0.00066313 0.00066313 0.00132626 0.00132626 0.00066313 0.00132626
 0.00066313 0.00132626 0.00132626 0.00066313 0.00132626 0.00132626
 0.00066313 0.00132626 0.00132626 0.00132626 0.00066313 0.00132626
 0.00066313 0.00132626 0.00066313 0.00132626 0.00132626 0.00132626
 0.00066313 0.00066313 0.00132626 0.00066313 0.00066313 0.00132626
 0.00132626 0.00066313 0.00066313 0.00132626 0.00132626 0.00132626
 0.00066313 0.00066313 0.00066313 0.00066313 0.00066313 0.0006

Set the validation data set:

In [13]:
validate_x, validate_label = generate_data(1, dimension, labels)

Fill the prediction code:

In [14]:
def get_prediction(x):
    # fill the code here (5-6 lines)
    predictions = []
    for i in range(len(classifiers)):
        predictions.append(classifiers[i].predict(x))
    result = [0 if predictions.count(0) > predictions.count(1) else 1]
    return np.array(result)

Test it:

In [15]:
prediction = get_prediction(validate_x)[0]

print(prediction)

1
