# Generating Training Data Set

After pre-processing and data augmentation, we merge the data from all participants into a single file that serves as the training data set for the CNN model.
After merging, we reshape each image into a 27 x 15 matrix (by zero-padding) and relabel the data so that only two labels remain: 'finger' and 'phalanx'.

In [1]:
from numpy import int32
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import time
import os, sys

In [2]:
%%time
#Merge all augmented per-participant files and keep the result as the training data set.
frames = []
counter = 0

#Participant 26 (data26.pkl) is the data set from the finger orientation study project.

for i in range(1,27):
    df_aug = pd.read_pickle('DataSet_Phalanx/03_Augmented_DataSet/data'+str(i)+'.pkl')
    print('Number of Images for Participant '+ str(i)+' are '+str(df_aug.shape[0])+'.')
    counter = counter+df_aug.shape[0]
    frames.append(df_aug)

#Concatenate once at the end; concatenating inside the loop copies the growing frame on every pass.
df_training = pd.concat(frames,ignore_index=True)

print('Total number of images in training set are '+str(counter)+'.')



Number of Images for Participant 1 are 31792.
Number of Images for Participant 2 are 42700.
Number of Images for Participant 3 are 37532.
Number of Images for Participant 4 are 46896.
Number of Images for Participant 5 are 53932.
Number of Images for Participant 6 are 66296.
Number of Images for Participant 7 are 66832.
Number of Images for Participant 8 are 48152.
Number of Images for Participant 9 are 52336.
Number of Images for Participant 10 are 47464.
Number of Images for Participant 11 are 48232.
Number of Images for Participant 12 are 45632.
Number of Images for Participant 13 are 52848.
Number of Images for Participant 14 are 48340.
Number of Images for Participant 15 are 54680.
Number of Images for Participant 16 are 59520.
Number of Images for Participant 17 are 52348.
Number of Images for Participant 18 are 51128.
Number of Images for Participant 19 are 48420.
Number of Images for Participant 20 are 36720.
Number of Images for Participant 21 are 52700.
Number of Images for P

In [3]:
%%time
#Zero-pad every cropped matrix into the top-left corner of a 27x15 image.
for i in range(df_training.shape[0]):
    full_matrix = np.zeros((27,15))
    D = df_training.Cropped_Matrix[i]
    #Copy the cropped values; the remaining cells stay zero.
    full_matrix[:D.shape[0],:D.shape[1]]=D
    df_training.at[i,'Cropped_Matrix'] = full_matrix.astype(int32)


CPU times: user 3min 30s, sys: 3.62 s, total: 3min 33s
Wall time: 3min 33s
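
The zero-padding step in the cell above can be illustrated on a toy array. The following is a minimal sketch, not the notebook's own code: `pad_to_shape` is a hypothetical helper name, and the 3 x 2 sample matrix is made up for illustration. It assumes, as the loop does, that every cropped matrix fits inside the 27 x 15 target.

```python
import numpy as np

def pad_to_shape(matrix, shape=(27, 15)):
    """Zero-pad `matrix` into the top-left corner of an int32 array of `shape`.

    Hypothetical helper for illustration; assumes `matrix` fits inside `shape`.
    """
    rows, cols = matrix.shape
    if rows > shape[0] or cols > shape[1]:
        raise ValueError("matrix is larger than the target shape")
    padded = np.zeros(shape, dtype=np.int32)
    padded[:rows, :cols] = matrix
    return padded

# Made-up 3 x 2 matrix standing in for one entry of 'Cropped_Matrix'.
sample = np.array([[1, 2], [3, 4], [5, 6]])
padded_sample = pad_to_shape(sample)
print(padded_sample.shape)
```

Allocating the target array with the final dtype avoids a separate cast, and the explicit size check makes an oversized input fail with a clear message instead of a broadcasting error.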


In [4]:
%%time
#Rename columns and collapse the task names into two classes: 'finger' and 'phalanx'.
#Rows labelled 'PAUSE' are not mapped here; they are removed in a later cell.

df_training.rename(columns={'Task': 'Label','Cropped_Matrix': 'Input'}, inplace=True)
df_training['Label'] = df_training['Label'].replace({'DRAG': 'finger',
                                                     'TAP': 'finger',
                                                     'SCROLL':'finger',
                                                     'PHALANX_SCROLL': 'phalanx',
                                                     'PHALANX_TAP': 'phalanx',
                                                     'PHALANX_DRAG': 'phalanx',
                                                     #Numeric labels 2 and 4 are also mapped to 'finger'.
                                                     2: 'finger',
                                                     4: 'finger'
                                                    })


CPU times: user 1.3 s, sys: 140 ms, total: 1.44 s
Wall time: 1.44 s
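
To see what the relabelling does to values that are not in the mapping, here is a small sketch on made-up rows. The mapping is copied from the cell above; the `toy` frame and the `unmapped` check are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Mapping copied from the relabelling cell above.
LABEL_MAP = {
    'DRAG': 'finger', 'TAP': 'finger', 'SCROLL': 'finger',
    'PHALANX_SCROLL': 'phalanx', 'PHALANX_TAP': 'phalanx',
    'PHALANX_DRAG': 'phalanx',
    2: 'finger', 4: 'finger',
}

# Made-up rows for illustration only.
toy = pd.DataFrame({'Task': ['TAP', 'PHALANX_DRAG', 2, 'PAUSE', 'SCROLL']})
toy['Label'] = toy['Task'].replace(LABEL_MAP)

# Any value outside the two target classes was left untouched by replace().
unmapped = sorted(set(toy['Label']) - {'finger', 'phalanx'})
print(unmapped)
```

Because `replace` leaves unknown values as they are, a check like `unmapped` is one way to spot labels such as 'PAUSE' that still need to be handled before training.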


In [5]:
df_training.shape[0]

1761334

In [6]:
#Delete images captured while the participant was switching tasks (labelled 'PAUSE') to avoid wrong labels.
pause_index = df_training.index[df_training['Label']=='PAUSE']
for i in pause_index:
    print(i,df_training.Label[i])
#Drop all PAUSE rows in one call instead of one drop per row.
df_training.drop(pause_index, inplace=True)

223871 PAUSE
240445 PAUSE
257019 PAUSE
273593 PAUSE
280958 PAUSE
297666 PAUSE
314374 PAUSE
331082 PAUSE
1003726 PAUSE
1016901 PAUSE
1030076 PAUSE
1043251 PAUSE


In [7]:
df_training.to_pickle('DataSet_Phalanx/04_Training_Set/trainingdata.pkl')

In [8]:
df = pd.read_pickle('DataSet_Phalanx/04_Training_Set/trainingdata.pkl')

In [9]:
df['Label'].value_counts()

finger     1071422
phalanx     689900
Name: Label, dtype: int64
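
As a quick consistency check, the counts printed in this notebook can be compared against each other. The sketch below only uses numbers copied from the outputs above (the row count from cell 5, the 12 'PAUSE' rows listed by cell 6, and the class counts from cell 9); the variable names are illustrative.

```python
# Numbers copied from the notebook outputs above.
rows_before_drop = 1761334   # df_training.shape[0] before removing PAUSE rows
pause_rows = 12              # number of 'PAUSE' rows printed before dropping
counts = {'finger': 1071422, 'phalanx': 689900}

# The two classes together should account for every remaining row.
total = sum(counts.values())
assert total == rows_before_drop - pause_rows

# Share of each class in the final training set.
shares = {label: n / total for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.3f}")
```

The check passes, so no rows were lost or left unlabelled; the shares show that 'finger' images make up roughly three fifths of the training set.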