# Lab 5: Feature Selection 

This notebook builds on Lab 4 by adding a feature selection step to the process of choosing the best classifier for a binary classification problem.

The feature selection method applied here is Recursive Feature Elimination (RFE) as demonstrated in the tutorial at https://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/.

In this demonstration we use a modified version of the seeds data set (see https://archive.ics.uci.edu/ml/datasets/seeds), which is the same data set used in Lab 4.

## A. Preparation

### Import Python modules

In [2]:
import pandas as pd
import numpy as np

from sklearn import preprocessing  # needed for scaling attributes to the interval [0, 1]

from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

from sklearn.model_selection import train_test_split

### Load and prepare the dataset for training and evaluation
Feel free to apply any other pre-processing technique at this point.

In [3]:
lab5_df = pd.read_csv("./seeds_dataset_binary.csv")
lab5_df.describe()

| | area | perimeter | compactness | length of kernel | width of kernel | asymmetry coefficient | length of kernel groove | type |
|---|---|---|---|---|---|---|---|---|
| count | 210.0 | 210.0 | 210.0 | 210.0 | 210.0 | 210.0 | 210.0 | 210.0 |
| mean | 14.847524 | 14.559286 | 0.870999 | 5.628533 | 3.258605 | 3.700201 | 5.408071 | 0.333333 |
| std | 2.909699 | 1.305959 | 0.023629 | 0.443063 | 0.377714 | 1.503557 | 0.49148 | 0.472531 |
| min | 10.59 | 12.41 | 0.8081 | 4.899 | 2.63 | 0.7651 | 4.519 | 0.0 |
| 25% | 12.27 | 13.45 | 0.8569 | 5.26225 | 2.944 | 2.5615 | 5.045 | 0.0 |
| 50% | 14.355 | 14.32 | 0.87345 | 5.5235 | 3.237 | 3.599 | 5.223 | 0.0 |
| 75% | 17.305 | 15.715 | 0.887775 | 5.97975 | 3.56175 | 4.76875 | 5.877 | 1.0 |
| max | 21.18 | 17.25 | 0.9183 | 6.675 | 4.033 | 8.456 | 6.55 | 1.0 |


In [4]:
# target attribute
target_attribute_name = 'type'
target = lab5_df[target_attribute_name]

# predictor attributes
predictors = lab5_df.drop(target_attribute_name, axis=1).values
predictors

array([[15.26  , 14.84  ,  0.871 , ...,  3.312 ,  2.221 ,  5.22  ],
       [14.88  , 14.57  ,  0.8811, ...,  3.333 ,  1.018 ,  4.956 ],
       [14.29  , 14.09  ,  0.905 , ...,  3.337 ,  2.699 ,  4.825 ],
       ...,
       [13.2   , 13.66  ,  0.8883, ...,  3.232 ,  8.315 ,  5.056 ],
       [11.84  , 13.21  ,  0.8521, ...,  2.836 ,  3.598 ,  5.044 ],
       [12.3   , 13.34  ,  0.8684, ...,  2.974 ,  5.637 ,  5.063 ]])

Split the data set into training (80%) and test (20%) sets.

In [5]:
# prepare independent stratified data sets for training and testing the final model
predictors_train, predictors_test, target_train, target_test = train_test_split(
    predictors, target, test_size=0.20, shuffle=True, stratify=target)
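
The split above is random, so each run produces different training and test sets. A minimal sketch of how the split could be made repeatable, assuming a fixed `random_state` is acceptable for the lab (the seed value 42 and the synthetic stand-in arrays `X_demo` and `y_demo` are illustrative and not part of the original notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-in for the lab data: 210 rows, 7 predictors, one third positives
rng = np.random.default_rng(0)
X_demo = rng.random((210, 7))
y_demo = np.array([0] * 140 + [1] * 70)

def split(seed):
    # same call as in the lab, plus a fixed seed (an assumption, not in the original)
    return train_test_split(X_demo, y_demo, test_size=0.20, shuffle=True,
                            stratify=y_demo, random_state=seed)

X_tr_a, X_te_a, y_tr_a, y_te_a = split(42)
X_tr_b, X_te_b, y_tr_b, y_te_b = split(42)

print(np.array_equal(X_te_a, X_te_b))  # the same seed reproduces the same split
print(y_tr_a.mean(), y_te_a.mean())    # stratify keeps the class ratio in both parts
```

Leaving `random_state` unset, as the lab does, is a reasonable choice when the goal is to observe how much the results vary between runs.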

Scale all predictor values to the range [0, 1]. Note the target attribute is already binary.

Note that the MinMaxScaler is fitted on the training data set only, and the fitted scaler is then used to transform both the training and the test data sets.
This keeps the test data set from influencing the scaling parameters and puts both data sets on the same scale. As a consequence, scaled test values can fall slightly outside [0, 1].

In [6]:
predictors_train.shape

(168, 7)

In [7]:
min_max_scaler = preprocessing.MinMaxScaler()
# fit the scaler on the training data only, then reuse it for the test data
predictors_train = min_max_scaler.fit_transform(predictors_train)
predictors_test = min_max_scaler.transform(predictors_test)

In [8]:
predictors_train

array([[0.36543909, 0.40248963, 0.66878403, ..., 0.53243051, 0.26979165,
        0.25849335],
       [0.20774315, 0.23236515, 0.63974592, ..., 0.30220955, 0.62489569,
        0.21614968],
       [0.79225685, 0.88174274, 0.46188748, ..., 0.74126871, 0.38754156,
        0.97439685],
       ...,
       [0.41737488, 0.48755187, 0.52268603, ..., 0.4383464 , 0.13588259,
        0.23732152],
       [0.05571294, 0.13070539, 0.16787659, ..., 0.04490378, 0.33999126,
        0.23732152],
       [0.49008499, 0.5186722 , 0.76406534, ..., 0.57305773, 0.63946542,
        0.30379124]])
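
To make the scaling step concrete, here is a minimal sketch of what `MinMaxScaler` computes. It shows that a fitted scaler stores the per-column minimum and maximum seen during `fit` and reuses them in `transform`. The tiny arrays `train_col` and `new_col` are made-up values, not taken from the seeds data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# tiny illustrative column of values (not taken from the seeds data)
train_col = np.array([[1.0], [3.0], [5.0]])
new_col = np.array([[2.0], [6.0]])

scaler = MinMaxScaler()
scaled_train = scaler.fit_transform(train_col)  # learns min=1 and max=5, then scales
scaled_new = scaler.transform(new_col)          # reuses the learned min and max

# the same result computed by hand: (x - min) / (max - min)
manual_new = (new_col - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)

print(scaled_train.ravel())
print(scaled_new.ravel())
print(np.allclose(manual_new, scaled_new))
```

Values larger than the fitted maximum (6.0 here) map above 1, which is expected when a scaler is applied to data it was not fitted on.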

## B. Feature Selection

#### 1. Apply RFE with SVM for selecting the best features

In [6]:
# create a base estimator used to rank the attributes (a linear-kernel SVR, which exposes coef_)
estimatorSVM = svm.SVR(kernel="linear")
# create the RFE model and select 3 attributes
selectorSVM = RFE(estimatorSVM, n_features_to_select=3)
selectorSVM = selectorSVM.fit(predictors_train, target_train)
# summarize the selection of the attributes
print(selectorSVM.support_)
print(selectorSVM.ranking_)

[False False False  True False  True  True]
[4 3 2 1 5 1 1]
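
The two printed arrays are easier to read once they are mapped back to attribute names. A small sketch, assuming the predictor columns are in the order shown by `describe()` above; the `support` and `ranking` arrays are copied from the printed output, and in the notebook itself `selectorSVM.support_` and `selectorSVM.ranking_` would be used instead:

```python
import numpy as np

# predictor names in the column order shown by describe() above
feature_names = np.array([
    "area", "perimeter", "compactness", "length of kernel",
    "width of kernel", "asymmetry coefficient", "length of kernel groove",
])

# values copied from the output printed above
support = np.array([False, False, False, True, False, True, True])
ranking = np.array([4, 3, 2, 1, 5, 1, 1])

# features kept by RFE (all of them have rank 1)
selected = feature_names[support]

# a higher rank means the feature was eliminated earlier
elimination_order = feature_names[np.argsort(-ranking, kind="stable")]

print(list(selected))
print(list(elimination_order[:4]))
```

Because the split is random, a different run can select a different set of attributes.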


#### 2. Apply RFE with Logistic Regression for selecting the best features

In [7]:
# create a base classifier used to evaluate a subset of attributes
estimatorLR = LogisticRegression()
# create the RFE model and select 3 attributes
selectorLR = RFE(estimatorLR, n_features_to_select=3)
selectorLR = selectorLR.fit(predictors_train, target_train)
# summarize the selection of the attributes
print(selectorLR.support_)
print(selectorLR.ranking_)

[False False  True False False  True  True]
[3 5 1 4 2 1 1]
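
To illustrate what RFE does internally, the sketch below repeats the elimination loop by hand: fit the estimator, drop the attribute with the smallest squared coefficient, and repeat until three attributes remain. It is a simplified illustration, not scikit-learn's implementation. The helper `manual_rfe` and the synthetic data from `make_classification` are assumptions introduced for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# synthetic stand-in data (an assumption; the lab uses the seeds data set)
X_demo, y_demo = make_classification(n_samples=200, n_features=7, n_informative=3,
                                     n_redundant=0, random_state=0)

def manual_rfe(X, y, n_keep):
    """Drop the weakest feature one at a time until n_keep features remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        model = LogisticRegression().fit(X[:, remaining], y)
        importance = model.coef_[0] ** 2      # squared weight of each remaining feature
        weakest = int(np.argmin(importance))  # position of the least important one
        del remaining[weakest]
    return remaining

kept = manual_rfe(X_demo, y_demo, n_keep=3)
reference = RFE(LogisticRegression(), n_features_to_select=3).fit(X_demo, y_demo)

print(kept)                                       # column indices kept by the manual loop
print(list(np.flatnonzero(reference.support_)))   # column indices kept by sklearn's RFE
```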


## C. Evaluate on the Test Data Set

Apply the selectors to build training and test data sets that contain only the selected features.

__Note:__ The same selectors are applied to the test data set. However, it is important that the selectors were fitted on the training data only, so that the test data set stays invisible to them.

In [8]:
predictors_train_SVMselected = selectorSVM.transform(predictors_train)
predictors_test_SVMselected = selectorSVM.transform(predictors_test)

In [9]:
predictors_train_LRselected = selectorLR.transform(predictors_train)
predictors_test_LRselected = selectorLR.transform(predictors_test)
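
A short sketch of what `transform` does on a fitted selector: it keeps the columns flagged in `support_` and drops the rest, for any data set with the same column layout. The synthetic data and the names `X_tr`, `X_te` and `selector` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic stand-in data (an assumption; the lab uses the seeds data set)
X_demo, y_demo = make_classification(n_samples=200, n_features=7, n_informative=3,
                                     n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, stratify=y_demo, random_state=0)

# the selector sees the training data only
selector = RFE(LogisticRegression(), n_features_to_select=3).fit(X_tr, y_tr)

X_tr_sel = selector.transform(X_tr)
X_te_sel = selector.transform(X_te)

print(X_tr_sel.shape, X_te_sel.shape)
# transform is equivalent to indexing the columns with the boolean mask
print(np.array_equal(X_te_sel, X_te[:, selector.support_]))
```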

### Train and evaluate SVM classifiers with both the selected features and all features 

Here we train three models:
* model1 - with the features selected by SVM
* model2 - with the features selected by Logistic Regression
* model3 - with all features (i.e. without feature selection)

In [10]:
# one SVC instance is reused for all three fits below
classifier = svm.SVC()

In [11]:
model1 = classifier.fit(predictors_train_SVMselected, target_train)
model1.score(predictors_test_SVMselected, target_test)

0.8809523809523809

In [12]:
model2 = classifier.fit(predictors_train_LRselected, target_train)
model2.score(predictors_test_LRselected, target_test)

0.8571428571428571

In [1]:
model3 = classifier.fit(predictors_train, target_train)
model3.score(predictors_test, target_test)

NameError: name 'classifier' is not defined
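
One detail of the cells above is worth spelling out: `fit` returns the estimator itself, so names assigned from repeated `fit` calls on one classifier all refer to the same object, and only the last fit survives. The sketch below demonstrates this on synthetic data and shows `sklearn.base.clone` as one possible way to keep the models separate. The variable names and data are illustrative assumptions.

```python
from sklearn import svm
from sklearn.base import clone
from sklearn.datasets import make_classification

# synthetic stand-in data (an assumption; the lab uses the seeds data set)
X_demo, y_demo = make_classification(n_samples=100, n_features=7, n_informative=3,
                                     n_redundant=0, random_state=0)

classifier = svm.SVC()
model_a = classifier.fit(X_demo[:, :3], y_demo)  # fitted on 3 features
model_b = classifier.fit(X_demo, y_demo)         # refits the same object on 7 features

print(model_a is model_b)       # both names point to one estimator
print(model_a.n_features_in_)   # reflects the most recent fit

# clone() returns an unfitted copy with the same parameters,
# so each fitted model stays independent of the others
model_c = clone(classifier).fit(X_demo[:, :3], y_demo)
model_d = clone(classifier).fit(X_demo, y_demo)
print(model_c.n_features_in_, model_d.n_features_in_)
```

In the lab this does not affect the reported scores, because each model is scored immediately after it is fitted.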

## D. Conclusion

In the run shown above, model1 (test accuracy 0.881) scored slightly higher than model2 (0.857). No score is available for model3, because its cell raised a NameError.

However, the train/test split is random, so executing this code again is very likely to give different results. With 42 test samples, the gap between model1 and model2 amounts to a single sample.

To get more reliable results, run the whole experiment multiple times and measure the mean and the variance of each model's accuracy. Then pick the model that performs better on average.
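
The repeated experiment suggested above could be sketched as follows. This is one possible setup, not the lab's prescribed method. It assumes scikit-learn pipelines are used to chain scaling, feature selection and the SVC, so that the scaler and selector are fitted on each training split only. The helpers `build_models` and `repeated_scores`, the choice of 20 runs and the synthetic stand-in data are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC, SVR

# synthetic stand-in; in the lab, use the predictors and target arrays loaded earlier
X_demo, y_demo = make_classification(n_samples=210, n_features=7, n_informative=3,
                                     n_redundant=0, random_state=0)

def build_models():
    """Return fresh, unfitted pipelines for the three variants compared in the lab."""
    return {
        "model1 (RFE with SVR)": make_pipeline(
            MinMaxScaler(), RFE(SVR(kernel="linear"), n_features_to_select=3), SVC()),
        "model2 (RFE with LR)": make_pipeline(
            MinMaxScaler(), RFE(LogisticRegression(), n_features_to_select=3), SVC()),
        "model3 (all features)": make_pipeline(MinMaxScaler(), SVC()),
    }

def repeated_scores(X, y, n_runs=20):
    """Repeat split, fit and scoring n_runs times; return test accuracies per model."""
    scores = {name: [] for name in build_models()}
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.20, stratify=y, random_state=seed)
        for name, model in build_models().items():
            scores[name].append(model.fit(X_tr, y_tr).score(X_te, y_te))
    return {name: np.array(values) for name, values in scores.items()}

results = repeated_scores(X_demo, y_demo)
for name, values in results.items():
    print(f"{name}: mean={values.mean():.3f}, std={values.std():.3f}")
```

Comparing the means together with the standard deviations shows whether a difference between two models is larger than the run-to-run variation.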