<a 
 href="https://colab.research.google.com/github/LearnPythonWithRune/DataScienceWithPython/blob/main/colab/starter/12 - Lesson - Feature Selection.ipynb"
 target="_parent">
<img 
 src="https://colab.research.google.com/assets/colab-badge.svg"
alt="Open In Colab"/>
</a>

# Feature Selection

![Data Science Workflow](https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/jupyter/final/img/ds-workflow.png)

- **Feature selection** is about selecting the attributes that have the greatest impact on the **problem** you are solving.

- Note that all steps in the workflow are interconnected.

## Why Feature Selection?
- Higher accuracy
- Simpler models
- Reducing overfitting risk

See more details on [Wikipedia](https://en.wikipedia.org/wiki/Feature_selection)

## Feature Selection Techniques
### Filter methods
- Independent of the model
- Based on scores from statistical tests
- Easy to understand
- Good for early feature removal
- Low computational requirements

#### Examples
- [Chi square](https://en.wikipedia.org/wiki/Chi-squared_test)
- [Information gain](https://en.wikipedia.org/wiki/Information_gain_in_decision_trees)
- [Correlation score](https://en.wikipedia.org/wiki/Correlation_coefficient)
- [Correlation Matrix with Heatmap](https://vitalflux.com/correlation-heatmap-with-seaborn-pandas/)
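
As a minimal sketch of a filter method, the following scores each feature with the chi-squared test and keeps the two highest-scoring ones. The iris dataset and `k=2` are assumptions chosen for illustration and are not part of this lesson's data. Note that `chi2` requires non-negative feature values.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Illustrative data only: iris ships with scikit-learn and has non-negative features
X, y = load_iris(return_X_y=True, as_frame=True)

# Score every feature against the target, independently of any model
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

# Names of the features that were kept
kept = X.columns[selector.get_support()].tolist()
print(kept)
```

Because no model is trained, this runs quickly, which is why filter methods suit early feature removal.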

### Wrapper methods
- Compare different subsets of features and run the model on them
- Basically a search problem

#### Examples
- [Best-first search](https://en.wikipedia.org/wiki/Best-first_search)
- [Random hill-climbing algorithm](https://en.wikipedia.org/wiki/Hill_climbing)
- [Forward selection](https://en.wikipedia.org/wiki/Stepwise_regression)
- [Backward elimination](https://en.wikipedia.org/wiki/Stepwise_regression)

See more on [Wikipedia](https://en.wikipedia.org/wiki/Feature_selection#Subset_selection)

### Embedded methods
- Find features that contribute most to the accuracy of the model while it is created
- Regularization is the most common method - it penalizes higher complexity

#### Examples
- [LASSO](https://en.wikipedia.org/wiki/Lasso_(statistics))
- [Elastic Net](https://en.wikipedia.org/wiki/Elastic_net_regularization)
- [Ridge Regression](https://en.wikipedia.org/wiki/Ridge_regression)
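
A minimal sketch of an embedded method, assuming synthetic data in which only two of five features influence the target. The column names, `alpha=0.1`, and the data itself are assumptions for illustration. The L1 penalty of LASSO can shrink coefficients to exactly zero, so the features with non-zero coefficients are the ones the model selected during training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): only x0 and x1 drive the target
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f'x{i}' for i in range(5)])
y = 3 * X['x0'] - 2 * X['x1'] + rng.normal(scale=0.1, size=200)

# The L1 penalty (alpha) shrinks coefficients and can set some exactly to zero
model = Lasso(alpha=0.1).fit(X, y)

# Features with a non-zero coefficient are the ones the model kept
selected = X.columns[model.coef_ != 0].tolist()
print(selected)
```

Ridge regression also penalizes complexity, but its L2 penalty only shrinks coefficients toward zero, so it does not drop features on its own.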

### Feature Selection Resources
- [An Introduction to Feature Selection](https://machinelearningmastery.com/an-introduction-to-feature-selection/)
- [Comprehensive Guide on Feature Selection](https://www.kaggle.com/prashant111/comprehensive-guide-on-feature-selection/)

### Before Feature Selection
- Clean data (lesson 09)
- Divide into training and test set (lesson 10)
- Feature scaling (lesson 11)
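- Only do feature selection on the training set
    - To avoid leaking information from the test set into the model, which leads to overfitting and overly optimistic test scores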
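
The last point can be sketched as follows: the selector is fitted on the training set only, and the same selection is then applied to the test set. The toy DataFrame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): column 'b' is constant
df = pd.DataFrame({'a': range(10), 'b': [1] * 10, 'c': [0, 1] * 5})
X_train, X_test = train_test_split(df, test_size=0.3, random_state=42)

# Learn which features to keep from the training set only
selector = VarianceThreshold(threshold=0.0).fit(X_train)

# Apply the same selection to both sets
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
print(X_train_sel.shape, X_test_sel.shape)
```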

### Dataset
- [Santander Customer Satisfaction](https://www.kaggle.com/c/santander-customer-satisfaction/)
    - Which customers are happy customers?

## Filter Methods
### Constant features
- Remove constant features
- Constant features add no value

```Python
import pandas as pd
```

```Python
data = pd.read_parquet('https://raw.githubusercontent.com/LearnPythonWithRune/DataScienceWithPython/main/jupyter/final/files/customer_satisfaction.parquet')
data.head()
```

#### Constant features directly with DataFrames
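
One possible way to find constant features with plain pandas is to count the unique values per column. The toy DataFrame and its column names below are hypothetical; the same expression should work on the lesson's `data`.

```python
import pandas as pd

# Toy data (hypothetical column names): 'const' never changes
df = pd.DataFrame({
    'age': [23, 35, 41, 52],
    'const': [0, 0, 0, 0],
    'saldo': [1.5, 0.0, 2.5, 0.0],
})

# A column with a single unique value is constant
constant_features = [col for col in df.columns if df[col].nunique() == 1]

# Drop the constant columns
df_reduced = df.drop(columns=constant_features)
print(constant_features)
```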

#### Using Sklearn
- Remove constant and quasi-constant features
- [`VarianceThreshold`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html): Feature selector that removes all low-variance features.
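
A minimal sketch of `VarianceThreshold` on a hypothetical toy DataFrame. With `threshold=0.0` only zero-variance (constant) features are removed; a small positive threshold would also remove features that barely vary.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy data (hypothetical): 'const' has zero variance
df = pd.DataFrame({
    'age': [23, 35, 41, 52],
    'const': [0, 0, 0, 0],
    'saldo': [1.5, 0.0, 2.5, 0.0],
})

# threshold=0.0 removes only features with zero variance
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)

# get_support() is a boolean mask of the kept features
kept = df.columns[selector.get_support()].tolist()
removed = df.columns[~selector.get_support()].tolist()
print(kept, removed)
```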

#### Quasi-constant features
- Same value for the great majority of the observations
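
The definition above can be checked directly by measuring how large a share of the rows holds the most frequent value. This is a sketch on hypothetical toy data; the 0.98 cut-off is an assumption and should be tuned for the data at hand.

```python
import pandas as pd

# Toy data (hypothetical): 'rare_flag' is 0 in 99 of 100 rows
df = pd.DataFrame({'rare_flag': [0] * 99 + [1], 'balanced': [0, 1] * 50})

# Share of the most frequent value in each column
dominant_share = df.apply(lambda col: col.value_counts(normalize=True).iloc[0])

# The 0.98 cut-off is an assumption; tune it for your data
quasi_features = dominant_share[dominant_share > 0.98].index.tolist()
print(quasi_features)
```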

### Correlation with color
- [`corr()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html) Compute pairwise correlation of columns, excluding NA/null values.
    - For better readability use: `.style.background_gradient(cmap='Blues')`
- Good features are highly correlated with the target
- Ideally, features should be correlated with the target but uncorrelated among themselves
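
A short sketch of these points, using synthetic data (an assumption for illustration) in which `f1` drives the target and `f2` is unrelated noise. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame({'f1': rng.normal(size=200), 'f2': rng.normal(size=200)})
df['TARGET'] = 2 * df['f1'] + rng.normal(scale=0.5, size=200)

# Pairwise Pearson correlation of all columns
corr_matrix = df.corr()

# Absolute correlation of each feature with the target, strongest first
target_corr = corr_matrix['TARGET'].drop('TARGET').abs().sort_values(ascending=False)
print(target_corr)

# In a notebook, color the matrix for readability:
# corr_matrix.style.background_gradient(cmap='Blues')
```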

### Find correlated features
- The goal is to find and remove correlated features
- Calculate the correlation matrix (assign it to `corr_matrix`)
- A feature is correlated with an earlier feature if the following is true
    - Notice that we use a correlation threshold of 0.8
```Python
feature = 'imp_op_var39_comer_ult1'
(corr_matrix[feature].iloc[:corr_matrix.columns.get_loc(feature)] > 0.8).any()
```
- Get all the correlated features by using list comprehension
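
The list comprehension could look like the sketch below. The synthetic data is an assumption for illustration: `b` is nearly a copy of `a`, while `c` is independent.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption): 'b' is nearly a copy of 'a', 'c' is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.1, size=200),
    'c': rng.normal(size=200),
})

corr_matrix = df.corr()

# Flag every feature that is correlated above 0.8 with a feature to its left
corr_features = [
    feature for feature in corr_matrix.columns
    if (corr_matrix[feature].iloc[:corr_matrix.columns.get_loc(feature)] > 0.8).any()
]
print(corr_features)
```

Only the later feature of a correlated pair is flagged, so the first one is kept. As written, the check covers positive correlations only; applying `.abs()` to the correlation matrix first would also catch strongly negative ones.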

## Wrapper Methods
### Forward Selection
- [`SequentialFeatureSelector`](http://rasbt.github.io/mlxtend/api_subpackages/mlxtend.feature_selection/#sequentialfeatureselector) Sequential Feature Selection for Classification and Regression.
- First install it by running the following in a cell
```
!pip install mlxtend
```
- To prepare, remove all quasi-constant and correlated features
```Python
X = data.drop(['TARGET'] + quasi_features + corr_features, axis=1)
y = data['TARGET']
```
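- To keep the demonstration fast, we create a small training set (`test_size=.75` leaves only 25% of the rows for training)
```Python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.75, random_state=42)
```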
- We will use the `SVC` model with the `SequentialFeatureSelector`.
    - Selecting two features

#### Good score?