Getting and Cleaning Data class project

This README is for the Getting and Cleaning Data class project at Johns Hopkins University. The data used for this course can be downloaded from the following website:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

This repo contains the following three files.

README.md
run_analysis.R
CodeBook.md

This README file contains a link to the data used and the specific steps taken to process the raw data from the above-mentioned website. The run_analysis.R file contains the actual code used to turn the downloaded raw data set into the tidy data set that it produces. The CodeBook.md file contains information about the variables and about how I have chosen to summarize the data.

Project Objective

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

run_analysis.R Script

Notes prior to execution

Download the data from the URL above and unzip it in a directory of your choice.
Download the run_analysis.R script into that same directory. This directory should contain the "UCI HAR Dataset" directory, which in turn holds the test and train subdirectories.
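
Before running the script, it can help to confirm that the unzipped files are where the notes above expect them. The following is a minimal sketch, not part of run_analysis.R: the helper name missing_inputs is hypothetical, and it assumes the working directory is the one that contains "UCI HAR Dataset".

```r
# Hypothetical helper (not part of run_analysis.R): list the expected
# input files that are missing under a given dataset directory.
missing_inputs <- function(data_dir = "UCI HAR Dataset") {
  expected <- file.path(data_dir, c(
    "features.txt",
    "activity_labels.txt",
    file.path("train", c("subject_train.txt", "X_train.txt", "y_train.txt")),
    file.path("test",  c("subject_test.txt",  "X_test.txt",  "y_test.txt"))
  ))
  # Keep only the paths that do not exist on disk.
  expected[!file.exists(expected)]
}

# Assumes the current working directory contains "UCI HAR Dataset".
missing <- missing_inputs()
if (length(missing) > 0) {
  message("Missing input files:\n", paste(missing, collapse = "\n"))
} else {
  message("All expected input files were found.")
}
```

If any files are reported missing, the most likely cause is that the working directory is not the one the archive was unzipped into.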

Steps Taken to Process Raw Data

1. Using read.table(), read the subject_train.txt, X_train.txt, y_train.txt, subject_test.txt, X_test.txt, y_test.txt, features.txt and activity_labels.txt files and store each in its own data frame.
2. Extract the feature names from the features.txt data by subsetting. This file contains the names of the variables recorded by the experiment.
3. Update the default column names in the x_train and x_test data frames using the names() function with the feature names from step 2.
4. Update the default column name in the sub_train, sub_test, y_train and y_test data frames, again using the names() function. Each of these data frames contains a single column, holding either Subject or Activity information.
5. Replace each activity number in y_train and y_test with its character label by looking it up in the activity_labels.txt file. This was done using gsub().
6. Merge the x_test, y_test and sub_test data frames using cbind().
7. Repeat step 6 with the x_train, y_train and sub_train data frames.
8. Merge the x_train and x_test data frames with rbind() and store the result in a new data frame called x_train_test.
9. Using grep(), find the variable names that contain mean() or std(). The features_info.txt file states that variables with "()" after mean or std hold the mean and standard deviation, respectively. There are 33 variables each for mean and standard deviation. Store the results in mean_cols and std_cols.
10. Subset the data frame created in step 8, x_train_test, to the variables required for the project. I selected Subject, Activity, and the mean and standard deviation variables found in step 9. This subsetted data frame is stored in tidy_data.
11. Using sub() and gsub(), clean up the variable names, e.g. remove duplicate words and characters such as ".", "(", ")" and ",".
12. Sort the tidy_data data frame by the Subject and Activity variables using arrange().
13. Since we are asked to provide averages of the variables, I used the group_by() function to group the data. The groups selected were Subject and Activity.
14. Finally, aggregate (summarize) the data using the summarize_each() function on the data frame created in step 13.
15. The tidy_data data frame is now complete per the project requirements. The subsetted data frame from step 10 contains 10299 rows and 68 columns; the summarized result keeps the 68 columns and has one row per Subject and Activity combination (180 rows for 30 subjects and 6 activities).
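
The steps above can be sketched end to end on a tiny made-up data set. This is an illustration under stated assumptions, not the code in run_analysis.R: every value in the toy data frames is invented, the data frame names features and activity_labels are assumptions, a match() lookup stands in for gsub() when labelling activities, and base R's aggregate() and order() stand in for the dplyr functions so that the sketch runs without extra packages.

```r
# Toy stand-ins for the files read in step 1. All values are invented.
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-meanFreq()-X", "fBodyBodyGyroMag-mean()"),
  stringsAsFactors = FALSE
)
activity_labels <- data.frame(id = 1:2, label = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
x_train   <- data.frame(matrix(c(1, 3, 5, 7,
                                 2, 4, 6, 8), nrow = 2, byrow = TRUE))
y_train   <- data.frame(V1 = c(1, 1))
sub_train <- data.frame(V1 = c(1, 1))
x_test    <- data.frame(matrix(c(10, 20, 30, 40), nrow = 1))
y_test    <- data.frame(V1 = 2)
sub_test  <- data.frame(V1 = 2)

# Steps 2-4: replace the default column names.
names(x_train)   <- features$name
names(x_test)    <- features$name
names(sub_train) <- "Subject"
names(sub_test)  <- "Subject"
names(y_train)   <- "Activity"
names(y_test)    <- "Activity"

# Step 5: activity number -> label (match() lookup used here instead of gsub()).
y_train$Activity <- activity_labels$label[match(y_train$Activity, activity_labels$id)]
y_test$Activity  <- activity_labels$label[match(y_test$Activity, activity_labels$id)]

# Steps 6-8: bind columns within each set, then stack the two sets.
x_train_test <- rbind(cbind(sub_train, y_train, x_train),
                      cbind(sub_test,  y_test,  x_test))

# Step 9: fixed = TRUE matches the literal "mean()", so meanFreq() is excluded.
mean_cols <- grep("mean()", names(x_train_test), fixed = TRUE, value = TRUE)
std_cols  <- grep("std()",  names(x_train_test), fixed = TRUE, value = TRUE)

# Step 10: keep Subject, Activity, and the mean/std variables.
tidy_data <- x_train_test[, c("Subject", "Activity", mean_cols, std_cols)]

# Step 11: drop "-", "(", ")" and collapse the duplicated word "BodyBody".
names(tidy_data) <- gsub("[-()]", "", names(tidy_data))
names(tidy_data) <- sub("BodyBody", "Body", names(tidy_data))

# Steps 12-14: average every variable per Subject and Activity.
# aggregate() returns groups in its own order, so the sort is done afterwards.
tidy_data <- aggregate(. ~ Subject + Activity, data = tidy_data, FUN = mean)
tidy_data <- tidy_data[order(tidy_data$Subject, tidy_data$Activity), ]

print(tidy_data)
```

With dplyr 1.0.0 or later, the last two statements could instead be written with group_by(Subject, Activity), summarise(across(everything(), mean)) and arrange(Subject, Activity), which is closer to the functions named in steps 12 to 14.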

CodeBook.md

This file contains the variable names and a description of each.
