
### Mutual Information Score Method

---
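Mutual information (MI) between two random variables is a non-negative value that measures their dependency: it is zero if and only if the variables are independent, and higher values mean stronger dependency, including non-linear dependency. Here, MI is estimated between each feature and the target (Calories Burned), and the four features with the highest scores are kept as the top-4.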
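A minimal sketch of the idea on synthetic data (not the Fitbit file): the target depends on feature 0 only through its square. A correlation coefficient misses that relation, but mutual information picks it up. The data, `random_state=0` and `k = 2` are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic data (illustrative only): y depends on feature 0 non-linearly,
# features 1 and 2 are pure noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 3))
y_demo = X_demo[:, 0] ** 2 + 0.1 * rng.normal(size=2000)

# Pearson correlation is close to zero for this symmetric, non-linear relation
corr = np.corrcoef(X_demo[:, 0], y_demo)[0, 1]

# MI is estimated per feature against the target; random_state fixes the
# small noise the estimator adds to continuous features
mi = mutual_info_regression(X_demo, y_demo, random_state=0)

# Keep the k features with the highest MI, best first
k = 2
top_k = np.argsort(mi)[::-1][:k]
print("correlation of feature 0 with y:", round(corr, 3))
print("MI scores:", np.round(mi, 3))
print("top-k feature indices:", top_k)
```

The same `argsort` then reverse pattern is what picks the top-4 features later in this notebook.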

### Recursive Feature Elimination Method

---
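Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained from its weight (for a linear model, the magnitude of its coefficient). Then the least important features are pruned from the current set. This procedure is repeated on the pruned set until the desired number of features is reached.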
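The loop described above can be sketched by hand and compared with scikit-learn's `RFE`. This is a minimal sketch assuming one feature is removed per iteration (`step=1`) and importance is the coefficient magnitude of a default `Ridge`; `manual_rfe` is a hypothetical helper, and the data is synthetic.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge


def manual_rfe(X, y, n_features_to_select):
    """Hypothetical helper: RFE with step=1 using |Ridge coefficients| as importance."""
    remaining = list(range(X.shape[1]))  # indices of features still in play
    while len(remaining) > n_features_to_select:
        coef = Ridge().fit(X[:, remaining], y).coef_
        weakest = int(np.argmin(np.abs(coef)))  # least important remaining feature
        remaining.pop(weakest)
    return sorted(remaining)


# Synthetic data (illustrative only): features 0, 2 and 4 drive y
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.array([5.0, 0.0, 3.0, 0.0, 1.0, 0.0]) + 0.1 * rng.normal(size=200)

manual = manual_rfe(X_demo, y_demo, n_features_to_select=3)
rfe = RFE(Ridge(), n_features_to_select=3, step=1).fit(X_demo, y_demo)
sklearn_keep = np.flatnonzero(rfe.support_).tolist()
print("manual RFE keeps:", manual)
print("sklearn RFE keeps:", sklearn_keep)
```

Both should keep features 0, 2 and 4, since the other coefficients are essentially zero.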

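I used RFE with a Ridge regressor left at its default parameters (for example `alpha=1.0`) to get the top-4 features. The RFE call uses `step=0.1`: with 8 features that is 0.8 features per iteration, but RFE always removes at least one feature per iteration, so features are pruned one at a time.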
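Ridge coefficients depend on the units of each feature, and RFE ranks features by coefficient magnitude. With unscaled inputs, a feature recorded in large units (Steps is in the thousands, while Distance is in single digits) can get a tiny coefficient and be pruned first. The sketch below compares RFE with and without `StandardScaler` on synthetic data; standardizing is a suggested variation, not what this notebook does, and `rfe_keep` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): features 0 and 1 matter equally,
# feature 2 matters a little; feature 0 is recorded in much larger units
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 3))
y_demo = Z[:, 0] + Z[:, 1] + 0.2 * Z[:, 2] + 0.1 * rng.normal(size=300)
X_raw = Z * np.array([1000.0, 1.0, 1.0])  # feature 0 on a step-count-like scale


def rfe_keep(X, y, k):
    """Hypothetical helper: indices kept by RFE with a default Ridge estimator."""
    rfe = RFE(Ridge(), n_features_to_select=k, step=1).fit(X, y)
    return np.flatnonzero(rfe.support_).tolist()


raw_keep = rfe_keep(X_raw, y_demo, 2)
scaled_keep = rfe_keep(StandardScaler().fit_transform(X_raw), y_demo, 2)
print("unscaled RFE keeps:", raw_keep)        # feature 0 dropped: its coefficient is ~0.001
print("standardized RFE keeps:", scaled_keep) # the two strongest features survive
```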


### Import Libraries
---

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

### Preparing & Exploring Data
---
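This dataset (`fitbit.csv`) records daily activity and is used to study how each feature relates to calorie burning. Besides a `Date` column, it contains:

- Calories Burned (target)
- Steps
- Distance
- Floors
- Minutes Sedentary
- Minutes Lightly Active
- Minutes Fairly Active
- Minutes Very Active
- Activity Calories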



```python
data = pd.read_csv("../data/fitbit.csv")  # read the data
print("Length of data = ", len(data))
print("Shape of data  = ", data.shape)
data.head()
```

```
Length of data =  30
Shape of data  =  (30, 10)
```


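|   | Date | Calories Burned | Steps | Distance | Floors | Minutes Sedentary | Minutes Lightly Active | Minutes Fairly Active | Minutes Very Active | Activity Calories |
|---|------|-----------------|-------|----------|--------|-------------------|------------------------|-----------------------|---------------------|-------------------|
| 0 | 7/07/2016 | 2682 | 12541 | 9.02 | 13 | 667 | 171 | 18 | 60 | 1248 |
| 1 | 8/07/2016 | 2423 | 8029 | 5.7 | 35 | 760 | 208 | 13 | 6 | 928 |
| 2 | 9/07/2016 | 2875 | 10801 | 7.67 | 3 | 496 | 148 | 18 | 46 | 1040 |
| 3 | 10/07/2016 | 2638 | 11997 | 8.52 | 22 | 771 | 248 | 3 | 27 | 1285 |
| 4 | 11/07/2016 | 2423 | 9039 | 6.42 | 12 | 714 | 232 | 10 | 16 | 1044 |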
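Before converting types, it can help to list which columns pandas did not read as numbers. A minimal sketch on a tiny hypothetical frame that mimics the issue (numbers with comma separators stored as strings); the sample values are assumptions, not rows read from the file.

```python
import pandas as pd

# Hypothetical sample mimicking the problem: comma-separated numbers stay strings
sample = pd.DataFrame({
    "Calories Burned": ["2,682", "2,423"],
    "Distance": [9.02, 5.70],
    "Floors": [13, 35],
})

# Any column whose dtype is not numeric needs cleaning before modeling
text_cols = [c for c in sample.columns if not pd.api.types.is_numeric_dtype(sample[c])]
print(text_cols)  # ['Calories Burned']
```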


```python
# Drop the "Date" column; it is not used as a feature
data = data.drop(columns=["Date"])
print("Shape of data after drop column = ", data.shape)

# Some numeric columns were read as strings because their values contain
# comma thousands separators; strip the commas and convert them to floats
for col in ["Steps", "Calories Burned", "Activity Calories"]:
    data[col] = data[col].str.replace(",", "", regex=False).astype(float)
```


```
Shape of data after drop column =  (30, 9)
```
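An alternative, assuming the numbers in `fitbit.csv` are quoted with comma thousands separators (which the conversion above implies), is to let `pd.read_csv` parse them directly through its `thousands` parameter. A sketch on a hypothetical two-row excerpt; the comma formatting of these values is an assumption.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row CSV in the assumed style of fitbit.csv: quoted numbers
# with comma thousands separators
csv_text = (
    "Date,Calories Burned,Steps,Distance\n"
    '7/07/2016,"2,682","12,541",9.02\n'
    '8/07/2016,"2,423","8,029",5.7\n'
)

# thousands="," tells the parser to ignore the separator when converting numbers
df = pd.read_csv(StringIO(csv_text), thousands=",")
print(df.dtypes)
print(df["Steps"].tolist())  # [12541, 8029]
```

With this option the manual string cleanup would not be needed.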


```python
# Check that the Date column was dropped
data.head()
```

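|   | Calories Burned | Steps | Distance | Floors | Minutes Sedentary | Minutes Lightly Active | Minutes Fairly Active | Minutes Very Active | Activity Calories |
|---|-----------------|-------|----------|--------|-------------------|------------------------|-----------------------|---------------------|-------------------|
| 0 | 2682.0 | 12541.0 | 9.02 | 13 | 667 | 171 | 18 | 60 | 1248.0 |
| 1 | 2423.0 | 8029.0 | 5.7 | 35 | 760 | 208 | 13 | 6 | 928.0 |
| 2 | 2875.0 | 10801.0 | 7.67 | 3 | 496 | 148 | 18 | 46 | 1040.0 |
| 3 | 2638.0 | 11997.0 | 8.52 | 22 | 771 | 248 | 3 | 27 | 1285.0 |
| 4 | 2423.0 | 9039.0 | 6.42 | 12 | 714 | 232 | 10 | 16 | 1044.0 |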


```python
# Split data into X (features) and y (target: Calories Burned, column 0)
y = data.iloc[:, 0].values
X = data.iloc[:, 1:9].values
print("Shape of X = ", X.shape)
print("Shape of y = ", y.shape)
```

```
Shape of X =  (30, 8)
Shape of y =  (30,)
```
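Selecting columns by name instead of position keeps `X`, `y` and the feature names aligned, which avoids the `col + 1` offset needed later when mapping indices back to names. A sketch on the first two rows shown above, restricted to three columns; `target` and `feature_cols` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# First two rows from the table above, restricted to three columns
frame = pd.DataFrame({
    "Calories Burned": [2682.0, 2423.0],
    "Steps": [12541.0, 8029.0],
    "Distance": [9.02, 5.7],
})

target = "Calories Burned"
feature_cols = [c for c in frame.columns if c != target]  # every column except the target

X_demo = frame[feature_cols].to_numpy()
y_demo = frame[target].to_numpy()
print(feature_cols, X_demo.shape, y_demo.shape)  # ['Steps', 'Distance'] (2, 2) (2,)
```

Indices returned by a selector on `X_demo` then map directly to `feature_cols[i]`.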


### Feature Selection Function
---

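```python
from sklearn.feature_selection import mutual_info_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge

def problem1(X, y):
    """
    Input:

        X : Features
        y : Targets

    Returns:

        top_mutual : Indices of the top-4 features by mutual information,
                     best first
        top_rfe    : Boolean mask marking the top-4 features selected by RFE
    """

    # Estimate MI between each feature and the target. The estimator adds
    # small random noise, so scores can vary slightly between runs unless
    # random_state is set.
    mic = mutual_info_regression(X, y)
    top_mutual = mic.argsort()[-4:][::-1]  # indices of the 4 highest scores, best first

    model = Ridge()  # Ridge regressor with default parameters
    # n_features_to_select is keyword-only in recent scikit-learn versions
    rfe = RFE(model, n_features_to_select=4, step=0.1, verbose=0)
    rfe_selection = rfe.fit(X, y)  # fit and recursively prune the weakest features
    top_rfe = rfe_selection.support_  # boolean mask of the 4 selected features

    return top_mutual, top_rfe
```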
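`support_` returned by `problem1` is a boolean mask, while `ranking_` (also set by `RFE.fit`) records the order of elimination. A sketch on synthetic data, assuming eight features whose true weights grow from 0 to 7, shows how the two relate and how to turn the mask into indices:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge

# Synthetic data (illustrative only) with 8 features, like X in this notebook
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
y_demo = X_demo @ np.arange(8, dtype=float) + 0.1 * rng.normal(size=100)

rfe = RFE(Ridge(), n_features_to_select=4, step=0.1).fit(X_demo, y_demo)

# support_ is a boolean mask; ranking_ is 1 for every selected feature and
# larger for features that were pruned earlier
selected = np.flatnonzero(rfe.support_).tolist()
print("support_:", rfe.support_)
print("ranking_:", rfe.ranking_)  # [5 4 3 2 1 1 1 1]
print("selected indices:", selected)  # [4, 5, 6, 7]
```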

```python
# Print results
mutual, rfe = problem1(X, y)

# Feature names in the same order as the columns of X (column 0 is the target)
feature_names = data.columns[1:9]

# Map the results back to column names
list_mutual = [feature_names[i] for i in mutual]  # indices -> names, best first
list_rfe = feature_names[rfe]                     # boolean mask -> pandas Index of names

print("Top-4 Features")
print("--------------------")
print("Mutual Top-4 Features: {}\n".format(list_mutual))
print("RFE Top-4 Features: {}".format(list_rfe))
```


```
Top-4 Features
--------------------
Mutual Top-4 Features: ['Activity Calories', 'Minutes Fairly Active', 'Steps', 'Distance']

RFE Top-4 Features: Index(['Distance', 'Minutes Lightly Active', 'Minutes Fairly Active',
       'Minutes Very Active'],
      dtype='object')
```
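The two methods agree on only part of the selection. A small sketch that compares the two lists printed above (copied by hand, so it does not rerun the selection):

```python
# Feature names copied from the results printed above
mutual_top4 = ["Activity Calories", "Minutes Fairly Active", "Steps", "Distance"]
rfe_top4 = ["Distance", "Minutes Lightly Active", "Minutes Fairly Active", "Minutes Very Active"]

# Features chosen by both methods, and those chosen by only one
both = sorted(set(mutual_top4) & set(rfe_top4))
only_mutual = sorted(set(mutual_top4) - set(rfe_top4))
only_rfe = sorted(set(rfe_top4) - set(mutual_top4))
print("both:", both)                         # ['Distance', 'Minutes Fairly Active']
print("only mutual information:", only_mutual)  # ['Activity Calories', 'Steps']
print("only RFE:", only_rfe)                 # ['Minutes Lightly Active', 'Minutes Very Active']
```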
