
Getting and Cleaning Data using R

This repo contains the analysis of the "Human Activity Recognition Using Smartphones" dataset.

The steps followed for collecting this dataset are described below:

  • The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years.
  • Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) while wearing a smartphone (Samsung Galaxy S II) on the waist.
  • Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz.
  • The experiments have been video-recorded to label the data manually.
  • The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
  • The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window).
  • The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity.
  • The gravitational force is assumed to have only low-frequency components; therefore, a filter with a 0.3 Hz cutoff frequency was used (a minimal sketch of this separation follows this list).
  • From each window, a vector of features was obtained by calculating variables from the time and frequency domain. See 'features_info.txt' for more details.
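
The low-pass separation described above was performed by the dataset authors before the data was published, but for illustration, here is a minimal sketch of how such a separation could be done in R. It assumes the `signal` package and a placeholder 50 Hz accelerometer trace; the filter order is an arbitrary choice, not taken from the dataset documentation.

```r
library(signal)

fs     <- 50                          # sampling rate of the smartphone sensors, in Hz
cutoff <- 0.3                         # low-pass cutoff in Hz, as stated above
bf     <- butter(3, cutoff / (fs / 2), type = "low")  # 3rd-order Butterworth (order is an assumption)

acc     <- sin(2 * pi * (1:500) / fs) # placeholder for one axis of the raw acceleration signal
gravity <- filtfilt(bf, acc)          # low-frequency (gravitational) component
body    <- acc - gravity              # remaining body-motion component
```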

This repo includes the following files:

  1. readme.md
  2. finalData.csv - the tidy data set in CSV format
  3. finalData.txt - the tidy data set in TXT format
  4. codebook.pdf - explains all the variables in the tidy data sets
  5. run_analysis.R - the script that transforms the raw data into the tidy data set

Description of the run_analysis.R script in this repo.

The steps followed in run_analysis.R are described below (a condensed sketch follows the list):

  1. Read the following files into R: X_test, y_test, subject_test, X_train, y_train, subject_train, and features.
  2. Converted y_test to 'readable' labels to get ytestf.
  3. Merged X_test with ytestf using cbind(), and then merged the result with subject_test.
  4. Repeated steps 2 and 3 for the training data.
  5. Merged the training and test data together using rbind.
  6. Used factor() on features.txt and modified the variable names.
  7. Extracted columns (into a new data frame) that measure mean and standard deviation.
  8. Took the average of each variable in this new data frame by subject and activity using aggregate().
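
The following is a condensed sketch of these steps, not the script itself: the file paths, the helper function, and the intermediate names (read_set, allData, meanStd) are assumptions for illustration, and the actual implementation lives in run_analysis.R.

```r
features        <- read.table("features.txt", stringsAsFactors = FALSE)
activity_labels <- read.table("activity_labels.txt", stringsAsFactors = FALSE)

read_set <- function(set) {
  # Assumed layout: test/X_test.txt, test/y_test.txt, test/subject_test.txt, etc.
  X       <- read.table(file.path(set, paste0("X_", set, ".txt")))
  y       <- read.table(file.path(set, paste0("y_", set, ".txt")))
  subject <- read.table(file.path(set, paste0("subject_", set, ".txt")))
  names(X) <- features$V2                                  # step 6: variable names from features.txt
  activity <- factor(y$V1, levels = activity_labels$V1,
                     labels = activity_labels$V2)          # step 2: readable activity labels
  cbind(subject = subject$V1, activity = activity, X)      # step 3: merge columns
}

allData <- rbind(read_set("test"), read_set("train"))      # steps 4 and 5

# Step 7: keep subject, activity, and every mean/std column (meanFreq() matches too)
meanStd <- allData[, grepl("subject|activity|mean|std", names(allData))]

# Step 8: average of each remaining variable per subject and activity
finalData <- aggregate(meanStd[, -(1:2)],
                       by = list(subject = meanStd$subject,
                                 activity = meanStd$activity),
                       FUN = mean)
```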

Important points to note:

  1. The finalData.txt and finalData.csv files have 180 rows and 81 columns (see the read-back snippet after this list).

  2. After selecting only the columns whose names contain "mean" or "std", I was left with 79 columns. I could have removed the thirteen columns whose names contain meanFreq(), since these are not strictly mean measurements, but as they do not harm the final analysis in any way, I kept all 79 variables. Two extra columns, "subject" and "activity", have been added, making 81 columns in total.

  3. I have left the variable names as they appear in features.txt, since I faced no problems using them as-is.

  4. The size of the finalData.txt file is 263 kB.
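
To read the tidy data set back into R for verification (assuming finalData.txt was written with write.table() and a header row):

```r
finalData <- read.table("finalData.txt", header = TRUE)
dim(finalData)   # expected: 180 rows (30 subjects x 6 activities) and 81 columns
```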
