### "Human Activity Recognition" 

#### Answer the following questions by providing Python code:
#### Objectives:
- Carry out the EDA.
- Carry out the data pre-processing.
- Optimize and test a predictive model of your choice.

In [320]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import warnings
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier, GradientBoostingClassifier
#from xgboost import XGBClassifier
from sklearn import metrics, preprocessing
warnings.filterwarnings(action='ignore')                  # Turn off the warnings.
%matplotlib inline

#### Read in data:
A description of the dataset can be found [here](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones).

In [321]:
# Go to the directory where the data file is located. 
# os.chdir(r'~~')                # Please, replace the path with your own. 

In [322]:
df = pd.read_csv('data_human activity recognition.csv', header='infer')

In [323]:
df.shape

(19622, 160)

In [324]:
df.columns

Index(['Unnamed: 0', 'user_name', 'raw_timestamp_part_1',
       'raw_timestamp_part_2', 'cvtd_timestamp', 'new_window', 'num_window',
       'roll_belt', 'pitch_belt', 'yaw_belt',
       ...
       'gyros_forearm_x', 'gyros_forearm_y', 'gyros_forearm_z',
       'accel_forearm_x', 'accel_forearm_y', 'accel_forearm_z',
       'magnet_forearm_x', 'magnet_forearm_y', 'magnet_forearm_z', 'classe'],
      dtype='object', length=160)

In [325]:
df.head()

| | Unnamed: 0 | user_name | raw_timestamp_part_1 | raw_timestamp_part_2 | cvtd_timestamp | new_window | num_window | roll_belt | pitch_belt | yaw_belt | ... | gyros_forearm_x | gyros_forearm_y | gyros_forearm_z | accel_forearm_x | accel_forearm_y | accel_forearm_z | magnet_forearm_x | magnet_forearm_y | magnet_forearm_z | classe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | carlitos | 1323084231 | 788290 | 05/12/2011 11:23 | no | 11 | 1.41 | 8.07 | -94.4 | ... | 0.03 | 0.0 | -0.02 | 192 | 203 | -215 | -17 | 654.0 | 476.0 | A |
| 1 | 2 | carlitos | 1323084231 | 808298 | 05/12/2011 11:23 | no | 11 | 1.41 | 8.07 | -94.4 | ... | 0.02 | 0.0 | -0.02 | 192 | 203 | -216 | -18 | 661.0 | 473.0 | A |
| 2 | 3 | carlitos | 1323084231 | 820366 | 05/12/2011 11:23 | no | 11 | 1.42 | 8.07 | -94.4 | ... | 0.03 | -0.02 | 0.0 | 196 | 204 | -213 | -18 | 658.0 | 469.0 | A |
| 3 | 4 | carlitos | 1323084232 | 120339 | 05/12/2011 11:23 | no | 12 | 1.48 | 8.05 | -94.4 | ... | 0.02 | -0.02 | 0.0 | 189 | 206 | -214 | -16 | 658.0 | 469.0 | A |
| 4 | 5 | carlitos | 1323084232 | 196328 | 05/12/2011 11:23 | no | 12 | 1.48 | 8.07 | -94.4 | ... | 0.02 | 0.0 | -0.02 | 189 | 206 | -214 | -17 | 655.0 | 473.0 | A |


1). Carry out the EDA. Check for missing values. HINT: The response variable is 'classe'.

In [326]:
df.isnull().sum().sum()

1921600
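
The grand total does not say whether the missing values are spread out or concentrated in a few columns, and it says nothing about how balanced the response is. A per-column missing share and `value_counts(normalize=True)` on `classe` answer both questions. The sketch below runs on a small hypothetical stand-in for `df` (the columns `max_roll_belt` and `avg_roll_belt` and all values are made up for illustration); with the real data the same two calls apply to `df` directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: two complete sensor columns, two mostly-empty
# summary columns (filled on 2 of 100 rows) and the response 'classe'.
rng = np.random.default_rng(0)
n = 100
toy = pd.DataFrame({
    "roll_belt": rng.normal(size=n),
    "pitch_belt": rng.normal(size=n),
    "max_roll_belt": np.where(np.arange(n) % 50 == 0, 1.0, np.nan),
    "avg_roll_belt": np.where(np.arange(n) % 50 == 0, 2.0, np.nan),
    "classe": np.tile(list("ABCDE"), n // 5),
})

# Fraction of missing values per column, most incomplete first.
missing_frac = toy.isnull().mean().sort_values(ascending=False)
print(missing_frac)

# Relative frequency of each class in the response.
class_share = toy["classe"].value_counts(normalize=True)
print(class_share)
```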

2). Get rid of the columns that have more than 97% missing values.

In [327]:
# Keep only the columns with at most 97% missing values.
df = df.loc[:, df.isnull().mean() <= 0.97]

In [328]:
df.columns

Index(['Unnamed: 0', 'user_name', 'raw_timestamp_part_1',
       'raw_timestamp_part_2', 'cvtd_timestamp', 'new_window', 'num_window',
       'roll_belt', 'pitch_belt', 'yaw_belt', 'total_accel_belt',
       'gyros_belt_x', 'gyros_belt_y', 'gyros_belt_z', 'accel_belt_x',
       'accel_belt_y', 'accel_belt_z', 'magnet_belt_x', 'magnet_belt_y',
       'magnet_belt_z', 'roll_arm', 'pitch_arm', 'yaw_arm', 'total_accel_arm',
       'gyros_arm_x', 'gyros_arm_y', 'gyros_arm_z', 'accel_arm_x',
       'accel_arm_y', 'accel_arm_z', 'magnet_arm_x', 'magnet_arm_y',
       'magnet_arm_z', 'roll_dumbbell', 'pitch_dumbbell', 'yaw_dumbbell',
       'total_accel_dumbbell', 'gyros_dumbbell_x', 'gyros_dumbbell_y',
       'gyros_dumbbell_z', 'accel_dumbbell_x', 'accel_dumbbell_y',
       'accel_dumbbell_z', 'magnet_dumbbell_x', 'magnet_dumbbell_y',
       'magnet_dumbbell_z', 'roll_forearm', 'pitch_forearm', 'yaw_forearm',
       'total_accel_forearm', 'gyros_forearm_x', 'gyros_forearm_y',
       'gyros_f

In [329]:
df.isnull().sum().sum()

0

In [330]:
df.shape

(19622, 60)
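
The keep/drop rule of step 2 can be checked on a tiny example. This sketch uses a hypothetical toy frame with one complete column, one column exactly 97% missing and one 98% missing, and keeps the columns whose missing share is at most 97%, which matches the wording "more than 97% missing" (a strict `< 0.97` would also drop the column sitting exactly at 97%).

```python
import numpy as np
import pandas as pd

n = 100
idx = np.arange(n)
# Hypothetical columns: complete, exactly 97% missing, and 98% missing.
toy = pd.DataFrame({
    "complete": np.ones(n),
    "missing_97": np.where(idx < 3, 1.0, np.nan),
    "missing_98": np.where(idx < 2, 1.0, np.nan),
})

# Share of missing values per column.
missing_share = toy.isnull().mean()

# Drop only the columns with MORE than 97% missing values.
kept = toy.loc[:, missing_share <= 0.97]
print(list(kept.columns))
```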

3). Get rid of the unnecessary columns. HINT: Those columns with "time" in the name and those that are obviously unnecessary.

In [331]:
# Drop the row-index column, the user name and the timestamp columns.
df = df.drop(columns=['Unnamed: 0', 'user_name', 'raw_timestamp_part_1',
                      'raw_timestamp_part_2', 'cvtd_timestamp'])
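
Instead of listing the timestamp columns by hand, the hint about names containing "time" can be applied programmatically. A minimal sketch on a hypothetical toy frame that reuses the identifier and timestamp column names of `df` (the two rows are copied from the `df.head()` output above, trimmed to a few columns):

```python
import pandas as pd

# Hypothetical stand-in for df with identifier, timestamp and sensor columns.
toy = pd.DataFrame({
    "Unnamed: 0": [1, 2],
    "user_name": ["carlitos", "carlitos"],
    "raw_timestamp_part_1": [1323084231, 1323084231],
    "raw_timestamp_part_2": [788290, 808298],
    "cvtd_timestamp": ["05/12/2011 11:23", "05/12/2011 11:23"],
    "roll_belt": [1.41, 1.41],
    "classe": ["A", "A"],
})

# Every column whose name contains "time", plus the row index and the user name.
time_cols = [col for col in toy.columns if "time" in col]
toy = toy.drop(columns=["Unnamed: 0", "user_name", *time_cols])
print(list(toy.columns))
```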


In [332]:
df.columns

Index(['new_window', 'num_window', 'roll_belt', 'pitch_belt', 'yaw_belt',
       'total_accel_belt', 'gyros_belt_x', 'gyros_belt_y', 'gyros_belt_z',
       'accel_belt_x', 'accel_belt_y', 'accel_belt_z', 'magnet_belt_x',
       'magnet_belt_y', 'magnet_belt_z', 'roll_arm', 'pitch_arm', 'yaw_arm',
       'total_accel_arm', 'gyros_arm_x', 'gyros_arm_y', 'gyros_arm_z',
       'accel_arm_x', 'accel_arm_y', 'accel_arm_z', 'magnet_arm_x',
       'magnet_arm_y', 'magnet_arm_z', 'roll_dumbbell', 'pitch_dumbbell',
       'yaw_dumbbell', 'total_accel_dumbbell', 'gyros_dumbbell_x',
       'gyros_dumbbell_y', 'gyros_dumbbell_z', 'accel_dumbbell_x',
       'accel_dumbbell_y', 'accel_dumbbell_z', 'magnet_dumbbell_x',
       'magnet_dumbbell_y', 'magnet_dumbbell_z', 'roll_forearm',
       'pitch_forearm', 'yaw_forearm', 'total_accel_forearm',
       'gyros_forearm_x', 'gyros_forearm_y', 'gyros_forearm_z',
       'accel_forearm_x', 'accel_forearm_y', 'accel_forearm_z',
       'magnet_forearm_x', 'magn

In [333]:
print(len(df.columns))

55
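
`new_window` and `num_window` survive step 3, but they describe the recording windows rather than the body movement. If each window contains rows of a single class, a random row-wise split lets the model recognise windows instead of movements, so one hedged sanity check is to compare test accuracy with and without these columns. The helper `accuracy_without` below is hypothetical and is run on synthetic data in which a window counter is deliberately tied to the class; with the notebook's data it would be called as `accuracy_without(df, 'classe', ['new_window', 'num_window'])`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def accuracy_without(data, target, drop_cols, seed=0):
    """Hypothetical helper: test accuracy of a random forest after dropping drop_cols."""
    x = data.drop(columns=[target, *drop_cols])
    y = data[target]
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25, random_state=seed)
    # max_features=None lets every split consider all columns (a choice for this
    # small demo, not the notebook's default setting).
    clf = RandomForestClassifier(n_estimators=50, max_features=None,
                                 random_state=seed)
    clf.fit(x_train, y_train)
    return clf.score(x_test, y_test)


# Synthetic data: 'num_window' is a window counter and every window holds a
# single class, while 'sensor' is pure noise.
rng = np.random.default_rng(0)
windows = np.repeat(np.arange(40), 10)   # 40 windows of 10 rows each
toy = pd.DataFrame({
    "num_window": windows,
    "sensor": rng.normal(size=windows.size),
    "classe": windows % 5,               # class is fixed within a window
})

with_window = accuracy_without(toy, "classe", [])
without_window = accuracy_without(toy, "classe", ["num_window"])
print(f"with num_window: {with_window:.2f}, without: {without_window:.2f}")
```

A large gap between the two numbers on the real data would suggest that part of the reported accuracy comes from the window bookkeeping rather than from the sensor readings.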


4). Label encode the response variable. HINT: use preprocessing.LabelEncoder().

In [334]:
df['classe'].unique()

array(['A', 'B', 'C', 'D', 'E'], dtype=object)

In [335]:
label_encoder = preprocessing.LabelEncoder()
df['classe']= label_encoder.fit_transform(df['classe'])
df['classe'].unique()


array([0, 1, 2, 3, 4])

In [336]:
df['new_window'].unique()

array(['no', 'yes'], dtype=object)

In [337]:
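# Use a separate encoder so that label_encoder keeps the 'classe' mapping for inverse_transform.
new_window_encoder = preprocessing.LabelEncoder()
df['new_window'] = new_window_encoder.fit_transform(df['new_window'])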
df['new_window'].unique()

array([0, 1])
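
Each `fit_transform` call overwrites a `LabelEncoder`'s `classes_`, so reusing one encoder for `classe` and `new_window` leaves it unable to turn predicted class codes back into the letters A to E. A minimal sketch with one encoder per column, on a hypothetical toy frame:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame with the two categorical columns used in step 4.
toy = pd.DataFrame({
    "new_window": ["no", "yes", "no", "no", "yes"],
    "classe": ["A", "B", "C", "D", "E"],
})

# One encoder per column, so each keeps its own fitted classes_.
classe_encoder = LabelEncoder()
window_encoder = LabelEncoder()
toy["classe"] = classe_encoder.fit_transform(toy["classe"])
toy["new_window"] = window_encoder.fit_transform(toy["new_window"])

# Integer predictions can later be mapped back to the original letters.
predicted = [0, 4, 2]
print(classe_encoder.inverse_transform(predicted))
```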

5). Carry out min-max scaling of the explanatory variables. HINT: use preprocessing.MinMaxScaler().

In [338]:
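from sklearn.preprocessing import MinMaxScaler

# Scale the explanatory variables only; the response 'classe' is left unscaled.
features = df.drop(columns='classe')
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(features)

In [339]:
from sklearn.decomposition import PCA
# Fit a 2-component PCA on the scaled explanatory variables.
pca = PCA(n_components = 2)
pca.fit(scaled_data)

PCA(n_components=2)

In [340]:
# Store the scaled explanatory variables with their original column names.
data_p = pd.DataFrame(scaled_data, columns=features.columns)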
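
Here the scaler is fitted on all rows before the train/test split, so the minimum and maximum of the test rows influence the scaling. For the random forest used in step 6 this is essentially irrelevant (per-column min-max rescaling does not change which tree splits are possible), but for distance-based models such as `KNeighborsClassifier` it is safer to fit the scaler on the training rows only. A minimal sketch on synthetic data from `make_classification`, used as an assumed stand-in for the explanatory variables, which also prints the share of variance captured by a two-component PCA:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the explanatory variables (x) and the encoded response (y).
x, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=0)

# Learn each column's min and max from the training rows only.
scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
# Reuse the training min/max on the test rows (values may fall slightly outside [0, 1]).
x_test_scaled = scaler.transform(x_test)

# Project the scaled training data onto two principal components.
pca = PCA(n_components=2)
pca.fit(x_train_scaled)
print(pca.explained_variance_ratio_)
```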

6). Choose an algorithm and carry out the predictive analysis.

- Optimize the hyperparameter(s)
- Calculate the accuracy.
- Aim for an accuracy in the upper 90s (in percent).

In [341]:
# All columns except the response are used as explanatory variables.
x = df.drop(columns='classe')
y=df['classe']
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=0)
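
`train_test_split` draws rows at random without looking at the labels. Passing `stratify=y` keeps the class proportions identical in both parts, which makes the test accuracy slightly less dependent on the particular split; this is an optional variation, not what the notebook ran. A sketch on a hypothetical imbalanced label vector:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels (0 to 4) and a single dummy feature.
y = pd.Series(np.repeat([0, 1, 2, 3, 4], [40, 20, 16, 12, 12]))
x = pd.DataFrame({"feature": np.arange(len(y))})

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y)

# Class shares in the full data and in the test part.
print(y.value_counts(normalize=True).sort_index())
print(y_test.value_counts(normalize=True).sort_index())
```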

In [343]:
from sklearn.model_selection import RandomizedSearchCV
rs = RandomizedSearchCV(RandomForestClassifier(), {'n_estimators': [1, 5, 10]},
                        cv=5, return_train_score=False, n_iter=2)

rs.fit(x_train, y_train)

In [345]:
rs.best_score_

0.9910982582103444

In [344]:
rs.best_params_

{'n_estimators': 10}
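
The search above samples only two of three small `n_estimators` values, so it says little about whether more trees or a depth limit would help. `GridSearchCV` (already imported in the first cell) cross-validates every combination in a grid. A sketch on synthetic data; the grid values are illustrative assumptions, not tuned for this dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for x_train / y_train.
x_train, y_train = make_classification(n_samples=300, n_features=10,
                                       n_informative=5, n_classes=5,
                                       random_state=0)

# Illustrative grid: number of trees and maximum tree depth.
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 10],
}
gs = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                  cv=5, scoring="accuracy")
gs.fit(x_train, y_train)

print(gs.best_params_)
print(gs.best_score_)
```

Fixing `random_state` on the forest makes repeated searches comparable; with the real data, `gs.best_estimator_` can be used directly instead of refitting a new model by hand.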

In [346]:
# Refit a random forest with the best setting found by the search.
clf = RandomForestClassifier(**rs.best_params_)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Accuracy on the held-out test set (metrics was imported in the first cell).
print("ACCURACY OF THE MODEL: ", metrics.accuracy_score(y_test, y_pred))



ACCURACY OF THE MODEL:  0.9938850387280881
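
A single accuracy figure hides which activities are confused with each other. A confusion matrix and a per-class report (precision, recall, F1) give that breakdown, with `target_names` mapping the integer codes back to the letters A to E. A sketch on hypothetical true and predicted labels; with the notebook's variables the calls would be `metrics.confusion_matrix(y_test, y_pred)` and `metrics.classification_report(y_test, y_pred, target_names=['A', 'B', 'C', 'D', 'E'])`.

```python
from sklearn import metrics

# Hypothetical encoded labels (0 to 4 stand for classes A to E).
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 3]

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_true, y_pred)
print(cm)

# Precision, recall and F1 per class, labelled with the original letters.
print(metrics.classification_report(y_true, y_pred,
                                    target_names=["A", "B", "C", "D", "E"]))
```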
