### Feature Selection:
Feature selection is the process of automatically selecting the features in your data that
contribute most to the prediction variable or output you are interested in. Irrelevant
features can decrease the accuracy of many models, especially linear algorithms such as
linear and logistic regression. Three benefits of performing feature selection before
modeling your data are:
<ul>
    <li>Reduces Overfitting: Less redundant data means less opportunity to make decisions
        based on noise.
    </li>
    <li>
        Improves Accuracy: Less misleading data means modeling accuracy improves.
    </li>
    <li>
        Reduces Training Time: Less data means that algorithms train faster.
    </li>
</ul> 
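The effect of irrelevant features on a linear model can be checked with a small experiment. The sketch below is not part of the original notebook: it assumes a synthetic dataset from `make_classification`, appends 100 columns of pure noise, and compares cross-validated accuracy of logistic regression with and without them. The variable names (`X_informative`, `X_noisy`, `score_clean`, `score_noisy`) are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 5 features, all of them informative
X_informative, y = make_classification(
    n_samples=200, n_features=5, n_informative=5,
    n_redundant=0, random_state=0,
)

# Append 100 columns of random noise that carry no signal about y
rng = np.random.default_rng(0)
X_noisy = np.hstack([X_informative, rng.normal(size=(200, 100))])

model = LogisticRegression(max_iter=1000)

# Mean 5-fold cross-validated accuracy for each version of the data
score_clean = cross_val_score(model, X_informative, y, cv=5).mean()
score_noisy = cross_val_score(model, X_noisy, y, cv=5).mean()

print(f"informative features only: {score_clean:.3f}")
print(f"with 100 noise features:   {score_noisy:.3f}")
```

On data like this the noisy score is typically lower, although the size of the gap depends on the dataset and the model's regularization.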
    

### 1.0 Univariate Selection

In [4]:
import pandas as pd
from numpy import set_printoptions
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Load Data
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv('pima.csv', names = names)
data.head()

   preg  plas  pres  skin  test  mass   pedi  age  class
0     6   148    72    35     0  33.6  0.627   50      1
1     1    85    66    29     0  26.6  0.351   31      0
2     8   183    64     0     0  23.3  0.672   32      1
3     1    89    66    23    94  28.1  0.167   21      0
4     0   137    40    35   168  43.1  2.288   33      1


In [7]:
# Preprocess Data
array = data.values
X = array[:, 0:8]  # input features: the first eight columns (preg .. age)
y = array[:, 8]    # target: the class column

In [15]:
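# Select Features
# Score every feature against the target with the chi-squared test and keep the 4 best.
# Note: chi2 requires non-negative feature values.
test = SelectKBest(score_func = chi2, k = 4)
fit = test.fit(X, y)

# Summarize Scores
set_printoptions(precision = 3)
print(fit.scores_)

# Reduce X to the 4 selected columns
features = fit.transform(X)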

[  111.52   1411.887    17.605    53.108  2175.565   127.669     5.393
   181.304]
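The raw score array is easier to read once each score is paired with its column name. The sketch below copies the scores printed above into a literal list so that it runs on its own. In the notebook you would pass `fit.scores_` instead. The names `feature_names`, `ranked`, and `top4` are illustrative.

```python
import pandas as pd

# Feature names and chi-squared scores copied from the cells above
feature_names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']
scores = [111.52, 1411.887, 17.605, 53.108, 2175.565, 127.669, 5.393, 181.304]

# Label each score with its feature and sort from highest to lowest
ranked = pd.Series(scores, index=feature_names).sort_values(ascending=False)
print(ranked)

# The four highest-scoring features are the ones SelectKBest(k=4) keeps
top4 = list(ranked.index[:4])
print(top4)  # ['test', 'plas', 'age', 'mass']
```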


In [16]:
print(features[0:5,:])

[[ 148.     0.    33.6   50. ]
 [  85.     0.    26.6   31. ]
 [ 183.     0.    23.3   32. ]
 [  89.    94.    28.1   21. ]
 [ 137.   168.    43.1   33. ]]
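The transformed array no longer carries column names. Comparing it with the first rows of the data shows that the kept columns are `plas`, `test`, `mass`, and `age`. A fitted `SelectKBest` can report this directly through `get_support()`. The sketch below is an illustration, not the notebook's own code: because `pima.csv` is not available here, it assumes a synthetic non-negative dataset (`X_demo`, `y_demo`) in which four columns are deliberately tied to the class label.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

feature_names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']

# Synthetic stand-in for the Pima data: 200 rows, 8 non-negative features
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.integers(0, 10, size=(200, 8)).astype(float)

# Make four columns depend strongly on the class label (an assumption of this demo)
informative = [1, 4, 5, 7]
X_demo[:, informative] += 20 * y_demo[:, None]

selector = SelectKBest(score_func=chi2, k=4).fit(X_demo, y_demo)

# get_support() returns a boolean mask over the original columns
mask = selector.get_support()
selected = [name for name, keep in zip(feature_names, mask) if keep]
print(selected)
```

In the notebook itself the equivalent call would be `fit.get_support()` on the selector fitted above.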
