# RFE

Recursive Feature Elimination (RFE) reduces the number of features by first fitting a model on all features, then repeatedly removing the least important feature(s), as ranked by the model's coefficients or feature importances, and refitting on those that remain until the desired number is left.
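
The elimination loop can be sketched by hand to make the mechanics concrete. This is a minimal sketch, not scikit-learn's implementation: `manual_rfe` is a hypothetical helper, the data is synthetic (generated with `make_regression`, not the breast cancer set), and it assumes the estimator exposes a one-dimensional `coef_` after fitting. Features are ranked by squared coefficient and the weakest one is dropped on each pass.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression


def manual_rfe(estimator, X, y, n_features_to_select):
    """Hypothetical helper: drop the weakest feature until n remain."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    while len(remaining) > n_features_to_select:
        # Refit on the surviving columns only
        estimator.fit(X[:, remaining], y)
        # Rank by squared coefficient; the smallest marks the weakest feature
        weakest = int(np.argmin(estimator.coef_ ** 2))
        del remaining[weakest]
    return remaining


# Synthetic data (an assumption for illustration): 6 features, 3 informative
X_toy, y_toy = make_regression(n_samples=200, n_features=6, n_informative=3,
                               noise=5.0, random_state=0)
kept = manual_rfe(LinearRegression(), X_toy, y_toy, n_features_to_select=3)
print('Kept column indices:', kept)
```

Removing one feature per pass corresponds to `step=1` in `sklearn.feature_selection.RFE`; a larger step trades ranking resolution for fewer refits.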

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge, SGDClassifier, ElasticNet

Data Reference: This breast cancer database was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison. Acknowledgements to:
1. O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18. 
2. William H. Wolberg and O.L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196. 
3. O. L. Mangasarian, R. Setiono, and W.H. Wolberg: "Pattern recognition via linear programming: Theory and application to medical diagnosis", in: "Large-scale numerical optimization", Thomas F. Coleman and Yuying Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30. 
4. K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).

https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29

Load the data that we will be using:

In [2]:
# The CSV has no header row, so supply the column names explicitly
bcSet = pd.read_csv('Data/breast-cancer-wisconsin.csv', header=None,
                    names=['ID', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                           'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
                           'Normal Nucleoli', 'Mitoses', 'Class'])
bcSet.head()

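|   | ID | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1 | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2 | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3 | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4 | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |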


Set X to the dataset without our label variable and Y to our label variable:

In [3]:
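# Label: the Class column
Y = bcSet['Class'].copy()
# Features: everything except the label and the ID column, which is only an identifier
X = bcSet.drop(['Class', 'ID'], axis=1)
features = X.columns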

The feature matrix X has 699 rows and 9 columns:

In [4]:
X.shape

(699, 9)

In [5]:
cols = X.columns

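RFE works with any estimator that exposes `coef_` or `feature_importances_` after fitting, not only classifiers. Here we'll test several models from sklearn's `linear_model` module. Note that most of them are regressors, which treat the class label as a numeric target: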

In [6]:
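# Regressors: the class label is fitted as a numeric target
linear_reg = LinearRegression()
lasso_reg = Lasso()
ridge_reg = Ridge()
elasticNet_reg = ElasticNet()
# Classifiers; SGD runs only 3 epochs with no stopping tolerance and is not seeded,
# so its selection can vary between runs
SGD_reg = SGDClassifier(max_iter=3, tol=None)
log_reg = LogisticRegression()

modelList = [linear_reg, lasso_reg, ridge_reg, elasticNet_reg, SGD_reg, log_reg]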
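
A fitted RFE object reports its result through the `support_` and `ranking_` attributes. To show how the two relate, here is a small sketch on synthetic data; the data from `make_classification` and the names `X_demo`, `y_demo`, and `demo_rfe` are assumptions for illustration, not part of this notebook's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption): 6 features, 3 of them informative
X_demo, y_demo = make_classification(n_samples=200, n_features=6,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

demo_rfe = RFE(estimator=LogisticRegression(), n_features_to_select=3, step=1)
demo_rfe.fit(X_demo, y_demo)

# support_ is a boolean mask of the kept features; ranking_ is 1 for kept
# features and grows the earlier a feature was eliminated
print('Support:', demo_rfe.support_)
print('Ranking:', demo_rfe.ranking_)

# transform() returns only the selected columns
X_reduced = demo_rfe.transform(X_demo)
print('Reduced shape:', X_reduced.shape)
```

With `step=1`, each eliminated feature gets its own rank, so selecting 3 of 6 features yields ranks 1, 1, 1, 2, 3, 4 in some order, and `support_` is exactly `ranking_ == 1`.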

In [7]:
for model in modelList:
    # Eliminate one feature per iteration until 3 remain
    rfe = RFE(estimator=model, n_features_to_select=3, step=1)
    rfe = rfe.fit(X, Y)
    # Features ranked 1 are the ones RFE kept
    selectedFeature = [cols[x] for x, z in enumerate(rfe.ranking_) if z == 1]
    print('Model: {} \nSupport: {} \nRanking: {} \nSelected Features: {} \n\n'.format(
        model, rfe.support_, rfe.ranking_, selectedFeature))

Model: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False) 
Support: [ True  True False False False  True False False False] 
Ranking: [1 1 4 6 5 1 2 3 7] 
Selected Features: ['Clump Thickness', 'Uniformity of Cell Size', 'Bare Nuclei'] 


Model: Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False) 
Support: [False  True  True False False  True False False False] 
Ranking: [7 1 1 6 5 1 4 3 2] 
Selected Features: ['Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Bare Nuclei'] 


Model: Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001) 
Support: [ True  True False False False  True False False False] 
Ranking: [1 1 4 6 5 1 2 3 7] 
Selected Features: ['Clump Thickness', 'Uniformity of Cell Size', 'Bare Nuclei'] 


Model: ElasticNet(alpha=1.0, 