# Activity Recognition Using Wearable Physiological Measurements: Selection of Features from a Comprehensive Literature Study

This study aims to detect different forms of activity by monitoring physiological signals of the human body: heart activity (ECG), lung activity (TEB), and electrodermal activity (EDA) measured on the arm and the hand.

The study covers four types of activity:
    
    1. Neutral;
    2. Emotional;
    3. Mental;
    4. Physical.

These activities were induced as follows:

    - Neutral: by having the subjects watch documentaries;
    - Emotional: by having the subjects watch movies;
    - Mental: by having the subjects play mental arithmetic games and Tetris;
    - Physical: by having the subjects go up and down flights of stairs.

In [1]:
import pandas as pd
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

## Loading the data

The "Activity recognition using wearable physiological measurements" dataset consists of 4480 instances with 533 features each.

The first column corresponds to the index of the subject. The next 174 attributes are statistics extracted from the ECG signal. The next 151 attributes are features extracted from the TEB signal. The next 104 attributes come from the EDA measured on the arm, and the following 104 from the EDA measured on the hand. The last attribute is the pattern class, that is, the corresponding activity: 1-neutral, 2-emotional, 3-mental and 4-physical.
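
The column layout described above can be turned into index ranges with a few lines of Python. This is only a minimal sketch: the group names and the `column_ranges` helper are illustrative and not part of the dataset; only the group sizes come from the description.

```python
# Group sizes, in column order, as stated in the dataset description.
# The group names are illustrative labels, not official identifiers.
GROUPS = [
    ("subject", 1),
    ("ECG", 174),
    ("TEB", 151),
    ("EDA_arm", 104),
    ("EDA_hand", 104),
    ("activity", 1),
]

def column_ranges(groups):
    """Return {name: (first, last)} with 0-based, inclusive column indices."""
    ranges, start = {}, 0
    for name, size in groups:
        ranges[name] = (start, start + size - 1)
        start += size
    return ranges

ranges = column_ranges(GROUPS)
total_columns = sum(size for _, size in GROUPS)
for name, (first, last) in ranges.items():
    print(f"{name:9s} columns {first}-{last}")
print("total columns:", total_columns)
```

With these sizes the arm EDA block spans columns 326 to 429 and the hand EDA block spans columns 430 to 533, for 535 columns in total (533 features plus the subject index and the class).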

In [2]:
# Read one column name per line
with open("data/labels.txt", "r", encoding="utf-8") as f:
    labels = f.read().split("\n")

for n in range(len(labels)):
    # Suffix this block of names, presumably to keep column names unique
    if 263 <= n <= 276:
        labels[n] = labels[n] + "_2"
    # Columns 326-429: EDA features measured on the arm
    if 326 <= n <= 429:
        labels[n] = "Arm_" + labels[n]
    # Columns 430-533: EDA features measured on the hand
    if 430 <= n <= 533:
        labels[n] = "Hand_" + labels[n]
    # The last column is the pattern class
    if n == len(labels) - 1:
        labels[n] = "Activity"

In [3]:
df=pd.read_csv("data/data.txt",sep=",",names=labels)

In [4]:
df

Unnamed: 0,Subject index (1-40),ECG_original_mean,ECG_original_std,ECG_original_trimmean25,ECG_original_median,ECG_original_skewness,ECG_original_kurtosis,ECG_original_max,ECG_original_min,ECG_original_prctile25,...,Hand_EDA_Functionals_power_Filt2kurtosis,Hand_EDA_Functionals_power_Filt2max,Hand_EDA_Functionals_power_Filt2min,Hand_EDA_Functionals_power_Filt2prctile25,Hand_EDA_Functionals_power_Filt2prctile75,Hand_EDA_Functionals_power_Filt2geomean(abs),Hand_EDA_Functionals_power_Filt2harmmean,Hand_EDA_Functionals_power_Filt2mad,Hand_EDA_Functionals_power_Filt2baseline,Activity
0,1,-0.004125,0.254095,0.001426,-0.01037,-0.538509,5.95534,1.04063,-1.37437,-0.10937,...,1015.36,7.170320e+08,0.027384,2.53425,17.3882,8.05589,1.80247,1413310.0,3028080.0,1
1,1,0.031029,0.193761,0.012918,-0.00237,0.781415,5.18794,0.98963,-0.71937,-0.08737,...,1015.78,7.058540e+08,0.016947,2.51513,16.5914,7.81769,1.52349,1390180.0,3016420.0,1
2,1,0.015678,0.182336,-0.003028,-0.02337,0.881194,5.66530,0.87563,-0.71937,-0.08037,...,1016.16,6.270180e+08,0.008129,2.25959,15.2312,7.11684,1.25860,1234110.0,3004430.0,1
3,1,0.014525,0.176636,-0.006161,-0.02737,1.024900,6.10968,0.91063,-0.71937,-0.08037,...,1015.61,5.597480e+08,0.007377,2.13924,14.4663,6.70236,1.26643,1102720.0,2992170.0,1
4,1,0.010349,0.179248,-0.008526,-0.02737,0.935697,5.83902,0.91063,-0.75637,-0.08337,...,1015.67,4.844730e+08,0.011448,1.93595,12.5493,6.08647,1.22387,954322.0,2979610.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4475,40,-0.015981,0.254373,-0.013341,-0.00101,-0.165105,5.15843,1.03999,-1.19301,-0.15801,...,1014.61,3.237410e+09,0.132094,9.48535,73.9901,31.82590,5.16972,6390410.0,398810.0,4
4476,40,-0.008857,0.238946,-0.010767,-0.00901,-0.034522,5.43013,1.01499,-1.10201,-0.14501,...,1016.07,3.156070e+09,0.133406,8.73701,68.4041,29.83820,5.06743,6214830.0,412407.0,4
4477,40,0.024672,0.213325,0.014418,0.01099,0.613841,4.55481,1.01499,-0.57301,-0.10401,...,1016.06,3.052520e+09,0.138525,8.90410,68.5051,30.45150,5.88492,6011070.0,425422.0,4
4478,40,0.025063,0.212210,0.015656,0.01299,0.593249,4.58374,0.95799,-0.64101,-0.10001,...,1015.80,3.322710e+09,0.076570,8.97766,72.4431,30.38700,4.43563,6544010.0,439695.0,4


In [5]:
df.isnull().sum()

Subject index (1-40)                            0
ECG_original_mean                               0
ECG_original_std                                0
ECG_original_trimmean25                         0
ECG_original_median                             0
                                               ..
Hand_EDA_Functionals_power_Filt2geomean(abs)    0
Hand_EDA_Functionals_power_Filt2harmmean        0
Hand_EDA_Functionals_power_Filt2mad             0
Hand_EDA_Functionals_power_Filt2baseline        0
Activity                                        0
Length: 535, dtype: int64

In [6]:
df.groupby('Activity').mean()

Activity,Subject index (1-40),ECG_original_mean,ECG_original_std,ECG_original_trimmean25,ECG_original_median,ECG_original_skewness,ECG_original_kurtosis,ECG_original_max,ECG_original_min,ECG_original_prctile25,...,Hand_EDA_Functionals_power_Filt2skewness,Hand_EDA_Functionals_power_Filt2kurtosis,Hand_EDA_Functionals_power_Filt2max,Hand_EDA_Functionals_power_Filt2min,Hand_EDA_Functionals_power_Filt2prctile25,Hand_EDA_Functionals_power_Filt2prctile75,Hand_EDA_Functionals_power_Filt2geomean(abs),Hand_EDA_Functionals_power_Filt2harmmean,Hand_EDA_Functionals_power_Filt2mad,Hand_EDA_Functionals_power_Filt2baseline
1,20.5,-0.017248,0.224012,-0.035954,-0.056528,0.93765,8.602679,0.958904,-0.723009,-0.147302,...,31.076512,999.006146,9787362000.0,0.086875,26.094028,209.055769,79.662689,5.520061,19278790.0,13580140.0
2,20.5,-0.015742,0.293427,-0.030485,-0.05467,0.935858,8.273527,1.049734,-0.849642,-0.19835,...,30.90953,991.348837,11810070000.0,0.096258,31.243043,251.225875,95.839197,6.893809,23262090.0,12756560.0
3,20.5,-0.017345,0.286182,-0.033134,-0.057011,0.840043,7.112982,1.019942,-0.845316,-0.209066,...,30.81843,987.18068,17831150000.0,0.106472,46.549498,376.558618,142.193595,8.290682,35124870.0,13027900.0
4,20.5,0.326771,1.151261,0.357649,0.443789,-0.149969,3.957743,2.987728,-2.554684,-0.486353,...,31.161746,1002.9655,31903250000.0,0.297815,83.263698,675.167256,259.468831,16.685674,62869030.0,15025440.0


In [7]:
X = df.drop("Activity", axis=1)
y = df["Activity"]

print("X.shape = " + str(X.shape))
print("y.shape = " + str(y.shape))

X.shape = (4480, 534)
y.shape = (4480,)


## Feature Selection

In [8]:
# Candidate values for N_max, the maximum number of features per combination
N_max = [5, 10, 20, 40, 60, 80]

# Genetic algorithm for feature selection:

# Step 1: A "population" of 100 combinations of features (chromosomes) is randomly generated.

# Step 2: If two combinations contain exactly the same set of features, one of them is modified by randomly replacing one of its features.

# Step 3: For each combination in the population, if the number of features is greater than the maximum N_max, features are randomly removed from the chromosome until the condition is satisfied.

# Step 4: Each combination is ranked using the mean squared error of an LSLC measured on the design set.

# Step 5: The best 10 combinations of the population are selected as "parents" that survive and are used to regenerate the remaining 90 chromosomes through a random crossover of the parents.

# Step 6: Mutations are added to the population by changing a feature with a probability of 1%. The best individual of each population remains unaltered. The process iterates from Step 2 until a given number of generations has been evaluated.
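
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: it uses only NumPy, runs on synthetic data, treats the LSLC as a least-squares fit to one-hot class targets, and all function names (`lslc_mse`, `repair`, `deduplicate`, `select_features`) and the crossover rule are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def lslc_mse(X, Y, features):
    """MSE of a least-squares linear fit (with bias) using the chosen columns."""
    A = np.column_stack([np.ones(len(X)), X[:, sorted(features)]])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return float(np.mean((A @ W - Y) ** 2))

def repair(chrom, n_max):
    """Randomly remove features until the chromosome has at most n_max."""
    chrom = set(chrom)
    while len(chrom) > n_max:
        chrom.remove(int(rng.choice(sorted(chrom))))
    return frozenset(chrom)

def deduplicate(population, n_features):
    """If a combination repeats, randomly replace one of its features."""
    seen, result = set(), []
    for chrom in population:
        while chrom in seen:
            changed = set(chrom)
            changed.remove(int(rng.choice(sorted(changed))))
            changed.add(int(rng.integers(n_features)))
            chrom = frozenset(changed)
        seen.add(chrom)
        result.append(chrom)
    return result

def select_features(X, Y, n_max, pop_size=100, n_parents=10,
                    generations=20, p_mut=0.01):
    n_features = X.shape[1]
    max_init = min(n_features, 2 * n_max)
    # Random initial population of feature combinations
    population = [
        frozenset(int(i) for i in rng.choice(
            n_features, size=rng.integers(1, max_init + 1), replace=False))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population = deduplicate(population, n_features)
        population = [repair(c, n_max) for c in population]
        # Rank by design-set MSE; the best n_parents survive unaltered
        population.sort(key=lambda c: lslc_mse(X, Y, c))
        parents = population[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            a, b = rng.choice(n_parents, size=2, replace=False)
            pool = sorted(parents[a] | parents[b])
            # Assumed crossover: keep each pooled feature with probability 0.5
            child = {f for f in pool if rng.random() < 0.5} or set(parents[a])
            # Mutation: each feature is replaced with probability p_mut
            child = {int(rng.integers(n_features)) if rng.random() < p_mut else f
                     for f in child}
            children.append(frozenset(child))
        population = parents + children
    population = [repair(c, n_max) for c in deduplicate(population, n_features)]
    return min(population, key=lambda c: lslc_mse(X, Y, c))

# Synthetic demo data (not the study's dataset): 4 classes, 30 features,
# of which only features 0-2 carry class information.
n, d = 400, 30
demo_labels = rng.integers(0, 4, size=n)
X_demo = rng.normal(size=(n, d))
X_demo[:, 0] += demo_labels
X_demo[:, 1] -= demo_labels
X_demo[:, 2] += (demo_labels == 2) * 3.0
Y_demo = np.eye(4)[demo_labels]  # one-hot targets for the least-squares fit

best = select_features(X_demo, Y_demo, n_max=5, pop_size=30, generations=10)
print(sorted(best))
```

On the real data, the same function could be called once per value in `N_max`, with the class labels shifted to start at 0 before the one-hot encoding.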

## Classification

### Least Squares Linear Classifier (LSLC)

In [9]:
reg_linear = LinearRegression()

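# k is the number of subjects available in the design database: 40 subjects
k = 40

# Cross-validation. For a regressor, an integer cv means an unshuffled KFold, so each of
# the 40 folds holds 4480 / 40 = 112 consecutive rows; the folds coincide with subjects
# only if the rows are grouped by subject with 112 rows each (assumed, not checked here).
# Note that the score is R^2 on the numeric class label, not a classification accuracy.
reg_linear_scores = cross_val_score(reg_linear, X, y, cv=k, scoring='r2')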

# outputs the scores
print('Cross Validation scores: {}'.format(reg_linear_scores))
print("\nAverage 40-Fold CV Score: {}".format(np.mean(reg_linear_scores)))

Cross Validation scores: [  0.64959563   0.79162877   0.80946387   0.87233028   0.91384746
   0.88550389   0.64400538   0.62334097 -20.9447981    0.65994375
   0.58635386   0.84512257   0.45807501   0.67547044   0.31909445
   0.83449538   0.63208934   0.81744175   0.75580048   0.74622545
 -19.84389885   0.26642051   0.63483732  -2.01589014   0.77462539
   0.76560493   0.69003497   0.60767891   0.39915285   0.84415247
   0.74067035   0.73902414   0.60272605   0.69730805   0.75514211
   0.59038778  -0.13125502   0.71970919 -83.26358276  -8.041628  ]

Average 40-Fold CV Score: -2.772343728745898
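
The scores above are R^2 values of a regression on the numeric class label. A least-squares linear classifier is often formulated instead as a regression onto one-hot class indicators, with the predicted class taken as the largest output. The following is a minimal sketch of that formulation with leave-one-subject-out evaluation, assuming synthetic stand-in data; the function names are illustrative and the notebook's labels 1-4 would first need to be shifted to 0-3.

```python
import numpy as np

def fit_lslc(X, y, n_classes):
    """Least-squares weights mapping [1, x] to one-hot class indicators."""
    A = np.column_stack([np.ones(len(X)), X])
    T = np.eye(n_classes)[y]            # y must hold integers 0..n_classes-1
    W, *_ = np.linalg.lstsq(A, T, rcond=None)
    return W

def predict_lslc(W, X):
    """Predict the class whose linear output is largest."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.argmax(A @ W, axis=1)

def leave_one_subject_out_accuracy(X, y, subjects, n_classes):
    """Train on all subjects but one, test on the held-out subject."""
    accs = []
    for s in np.unique(subjects):
        held_out = subjects == s
        W = fit_lslc(X[~held_out], y[~held_out], n_classes)
        accs.append(np.mean(predict_lslc(W, X[held_out]) == y[held_out]))
    return np.array(accs)

# Synthetic stand-in data (not the study's dataset): 5 subjects, 4 classes
rng = np.random.default_rng(0)
n_subjects, per_subject, n_classes = 5, 40, 4
subjects = np.repeat(np.arange(n_subjects), per_subject)
y_demo = rng.integers(0, n_classes, size=subjects.size)
centers = rng.normal(scale=4.0, size=(n_classes, 6))
X_demo = centers[y_demo] + rng.normal(size=(subjects.size, 6))

acc = leave_one_subject_out_accuracy(X_demo, y_demo, subjects, n_classes)
print("per-subject accuracy:", acc)
print("mean accuracy:", acc.mean())
```

Splitting by the subject column makes the held-out fold independent of how the rows are ordered, and accuracy is easier to interpret than R^2 for a four-class problem.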


### Least Squares Quadratic Classifier (LSQC)

In [13]:
# The pipeline applies the degree-2 expansion itself, inside each CV fold,
# so X and y are passed untransformed (the target must not be expanded).
# Caution: with all 534 columns the expansion is very large; consider
# running this on a selected feature subset instead.
reg_quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# cross validation
reg_quadratic_scores = cross_val_score(reg_quadratic, X, y, cv=k, scoring='r2')

# outputs the scores
print('Cross Validation scores: {}'.format(reg_quadratic_scores))
print("\nAverage 40-Fold CV Score: {}".format(np.mean(reg_quadratic_scores)))
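
A degree-2 polynomial expansion grows quickly with the input dimension. With a bias column included, the number of monomials of total degree at most d in n variables is C(n + d, d). The short sketch below evaluates that count for a few input sizes; the `n_poly_features` helper is an illustrative name, not a library function.

```python
from math import comb

def n_poly_features(n_inputs, degree=2):
    """Number of monomials of total degree <= degree in n_inputs variables,
    counting the constant (bias) term."""
    return comb(n_inputs + degree, degree)

for n_inputs in (5, 10, 20, 40, 60, 80, 534):
    cols = n_poly_features(n_inputs)
    print(f"{n_inputs:4d} input features -> {cols:7d} quadratic terms")
```

For the full 534-column `X` this gives 143,380 columns, roughly 5 GB as float64 for 4480 rows, whereas input sizes of 5 to 80 features keep the expansion between 21 and 3,321 columns.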

### Support Vector Machines (SVMs)

### Multi-layer Perceptrons (MLPs)

### k-Nearest Neighbor (kNN)

### Centroid Displacement-Based k-Nearest Neighbor (CDNN)

### Random Forests (RF)