The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.
The R script run_analysis.R does the following:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
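
As a rough illustration of merging the two sets and naming the activities, here is a minimal sketch on made-up data. The toy folders, the file names, the column names, and the helper (here named mergecol) are all assumptions for the example, not the actual contents of run_analysis.R.

```r
# Write one toy folder holding a subject file, a feature file, and an
# activity file (made-up values; file names chosen to sort predictably).
make_folder <- function(name, subject, feature, activity) {
  folder <- file.path(tempdir(), name)
  dir.create(folder, showWarnings = FALSE)
  write.table(subject,  file.path(folder, "a_subject.txt"),
              row.names = FALSE, col.names = FALSE)
  write.table(feature,  file.path(folder, "b_feature.txt"),
              row.names = FALSE, col.names = FALSE)
  write.table(activity, file.path(folder, "c_activity.txt"),
              row.names = FALSE, col.names = FALSE)
  folder
}

# Hypothetical helper: read every *.txt file in a folder and bind the
# resulting tables column-wise, in the order list.files returns them.
mergecol <- function(folder) {
  files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
  do.call(cbind, lapply(files, read.table))
}

train_dir <- make_folder("toy_train", c(1, 1), c(0.2, 0.4), c(1, 4))
test_dir  <- make_folder("toy_test",  c(2, 2), c(0.6, 0.8), c(4, 1))

# Name the columns, then stack the training and test rows into one data set.
cols    <- c("subject", "feature", "activity")
dttrain <- setNames(mergecol(train_dir), cols)
dttest  <- setNames(mergecol(test_dir), cols)
dtall   <- rbind(dttrain, dttest)

# Replace the numeric activity codes with descriptive names.
dtall$activity <- factor(dtall$activity,
                         levels = c(1, 4),
                         labels = c("WALKING", "SITTING"))
print(dtall)
```

The real data set has more activity codes and many more feature columns; the same rbind and factor calls should carry over once the lookup table is read from the activity label file.
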
#######################################
Steps of the R script for the course project:
- Set the main directory: UCI HAR Dataset.
- Create the function mergecol, which loads every file matching the pattern *.txt and combines all of their columns in alphabetical order.
- Create dttrain by loading the training folder and dttest by loading the test folder, then join dttrain and dttest with rbind to get dtallfeatures.
- Read the activity and featuresname tables.
- Create label, a character vector taken from column V2 of featuresname.
- Create labelx by cleaning label with gsub, removing the characters matched by "[-|(|)|,]".
- Join the subject label, labelx, and the activity label into namesallfeatures.
- Name the columns of dtallfeatures with namesallfeatures.
- Search for the columns whose labels contain mean and std (labelmean, labelstd).
- Select from dtallfeatures only the columns in numfeatures (labelmean and labelstd).
- Create the data frame dtmeanstdsubact from dtmeanstd and the subject and activity factors.
- Average each variable in dtmeanstdsubact for each activity and each subject.
- Remove the NA columns.
- Write the tidy data to indendepentidy.txt.
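
The labeling, selection, and averaging steps can be sketched on toy data as follows. The variable names follow the steps above, but the values are made up, and the use of grep, aggregate, and colSums is an assumption about how the script might do it; only gsub and rbind are named in the steps. The output goes to a temporary directory so the sketch does not touch the working directory.

```r
# Made-up feature labels in the style of the raw feature names.
label <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")

# Strip dashes, parentheses, pipes, and commas from the labels.
labelx <- gsub("[-|(|)|,]", "", label)

# Column names in the order subject, features, activity.
namesallfeatures <- c("subject", labelx, "activity")

# Toy stand-in for the merged data set.
dtallfeatures <- data.frame(
  c(1, 1, 2, 2),
  c(0.1, 0.3, 0.5, 0.7),
  c(1, 2, 3, 4),
  c(9, 9, 9, 9),
  c("WALKING", "WALKING", "SITTING", "SITTING")
)
names(dtallfeatures) <- namesallfeatures

# Find the columns whose labels contain "mean" or "std".
labelmean   <- grep("mean", namesallfeatures)
labelstd    <- grep("std", namesallfeatures)
numfeatures <- sort(c(labelmean, labelstd))

# Keep only the mean and std columns.
dtmeanstd <- dtallfeatures[, numfeatures]

# Average each selected variable for each activity and each subject.
tidy <- aggregate(dtmeanstd,
                  by = list(subject  = dtallfeatures$subject,
                            activity = dtallfeatures$activity),
                  FUN = mean)

# Drop any column that contains NA (none in this toy example).
tidy <- tidy[, colSums(is.na(tidy)) == 0, drop = FALSE]

# Write the tidy data set as plain text.
outfile <- file.path(tempdir(), "indendepentidy.txt")
write.table(tidy, outfile, row.names = FALSE)
```

Note that a plain grep for "mean" would also match labels such as meanFreq in the full feature list, so the pattern may need tightening depending on which columns count as mean measurements.
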