# H2O Demo: Human Activity Recognition with Smartphones

## About this demo

The notebook demonstrates the following H2O features:

- Starting and connecting to an H2O cluster from R.
- Importing compressed datasets (in gzip format) into H2O.
- Building and evaluating a predictive model on the training dataset.
- Making and evaluating predictions on the test dataset.

## About the dataset

- Recordings of 30 study participants performing activities of daily living
- Published by UCI Machine Learning
- Reference (Original): https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
- Reference (Kaggle): https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones


### Description

The Human Activity Recognition database was built from the recordings of 30 study participants performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors. The objective is to classify each recording as one of the six activities performed.

### Description of experiment

The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
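The key point of this partition is that it is made per volunteer, not per record, so no person contributes data to both sets. A minimal base-R sketch of such a split is shown below. The seed and the variable names are assumptions made for illustration, and the resulting subject IDs are not the ones used by the dataset authors.

```r
# Sketch of a subject-wise 70/30 split (illustrative only).
set.seed(42)                                   # arbitrary seed, assumed for reproducibility

subjects <- 1:30                               # 30 volunteers, as stated in the description
n_train  <- round(0.7 * length(subjects))      # 70% of the volunteers

train_subjects <- sort(sample(subjects, n_train))
test_subjects  <- setdiff(subjects, train_subjects)

# Every record of a volunteer would then go to the set that volunteer belongs to.
length(train_subjects)
length(test_subjects)
```

Splitting by volunteer means the test set measures how well a model generalises to people it has never seen.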

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, so a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
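The windowing arithmetic can be checked with a few lines of base R: 2.56 seconds at 50 Hz gives 128 readings per window, and a 50% overlap means consecutive windows start 64 readings apart. The helper `window_starts` and the synthetic 10-second signal are hypothetical and exist only for this sketch. The authors' actual pre-processing code is not part of this notebook.

```r
# Windowing parameters taken from the description above.
sampling_rate_hz <- 50     # readings per second
window_sec       <- 2.56   # window width in seconds
overlap          <- 0.5    # 50% overlap between consecutive windows

window_len <- as.integer(round(sampling_rate_hz * window_sec))  # readings per window
step       <- as.integer(window_len * (1 - overlap))            # offset between window starts

# Hypothetical helper: start index of every full window in a signal of length n.
window_starts <- function(n, window_len, step) {
  if (n < window_len) return(integer(0))
  seq.int(from = 1L, to = n - window_len + 1L, by = step)
}

# Synthetic example: a made-up 10-second recording (500 readings at 50 Hz).
signal  <- sin(seq_len(500) / 10)
starts  <- window_starts(length(signal), window_len, step)
windows <- lapply(starts, function(s) signal[s:(s + window_len - 1L)])

length(windows)   # number of full windows extracted from the signal
```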

### Attribute information

For each record in the dataset the following is provided:

- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.

### Relevant papers

- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012

- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge L. Reyes-Ortiz. Energy Efficient Smartphone-Based Activity Recognition using Fixed-Point Arithmetic. Journal of Universal Computer Science. Special Issue in Ambient Assisted Living: Home Care. Volume 19, Issue 9. May 2013

- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. 4th International Workshop of Ambient Assisted Living, IWAAL 2012, Vitoria-Gasteiz, Spain, December 3-5, 2012. Proceedings. Lecture Notes in Computer Science 2012, pp 216-223.

- Jorge Luis Reyes-Ortiz, Alessandro Ghio, Xavier Parra-Llanas, Davide Anguita, Joan Cabestany, Andreu Català. Human Activity and Motion Disorder Recognition: Towards Smarter Interactive Cognitive Environments. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.

### Citation

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.

<hr>

## Step 1 - Start and connect to an H2O cluster (JVM)

In [1]:
# Start and connect to an H2O cluster (JVM)
suppressPackageStartupMessages(library(h2o))
h2o.init(nthreads = -1)


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpAkEs87/h2o_joe_started_from_r.out
    /tmp/RtmpAkEs87/h2o_joe_started_from_r.err


Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 333 milliseconds 
    H2O cluster version:        3.10.4.6 
    H2O cluster version age:    18 days  
    H2O cluster name:           H2O_started_from_R_joe_kmc898 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   5.21 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 3.3.2 (2016-10-31) 



## Step 2 - Importing datasets into H2O

In [2]:
# Import pre-processed datasets

# locally (if you have the datasets in the 'data' sub-folder)
# hex_train <- h2o.importFile("./data/train.csv.gz")
# hex_test <- h2o.importFile("./data/test.csv.gz")

# or directly from the web (github)
hex_train <- h2o.importFile("https://github.com/woobe/h2o_demos/blob/master/human_activitiy_recognition_with_smartphones/data/train.csv.gz?raw=true")
hex_test <- h2o.importFile("https://github.com/woobe/h2o_demos/blob/master/human_activitiy_recognition_with_smartphones/data/test.csv.gz?raw=true")



In [3]:
# Quick summary of train dataset
dim(hex_train)
head(hex_train)
summary(hex_train$activity, exact_quantiles=TRUE)

activity,V1,V2,V3,V4,V5,V6,V7,V8,V9,⋯,V552,V553,V554,V555,V556,V557,V558,V559,V560,V561
STANDING,0.2885845,-0.02029417,-0.1329051,-0.9952786,-0.9831106,-0.9135264,-0.9951121,-0.9831846,-0.923527,⋯,-0.07432303,-0.2986764,-0.7103041,-0.11275434,0.030400372,-0.4647614,-0.01844588,-0.8412468,0.1799406,-0.05862692
STANDING,0.2784188,-0.01641057,-0.1235202,-0.9982453,-0.9753002,-0.960322,-0.9988072,-0.9749144,-0.9576862,⋯,0.15807454,-0.5950509,-0.8614993,0.05347696,-0.007434566,-0.7326262,0.70351059,-0.8447876,0.1802889,-0.05431672
STANDING,0.2796531,-0.01946716,-0.1134617,-0.9953796,-0.967187,-0.978944,-0.9965199,-0.9636684,-0.9774686,⋯,0.41450281,-0.3907482,-0.7601037,-0.11855926,0.17789948,0.1006992,0.80852908,-0.8489335,0.1806373,-0.04911782
STANDING,0.2791739,-0.02620065,-0.1232826,-0.9960915,-0.9834027,-0.9906751,-0.9970995,-0.9827498,-0.9893025,⋯,0.40457253,-0.1172902,-0.4828445,-0.03678797,-0.012892494,0.640011,-0.48536645,-0.8486494,0.1819348,-0.04766318
STANDING,0.2766288,-0.01656966,-0.1153619,-0.9981386,-0.9808173,-0.9904816,-0.9983211,-0.9796719,-0.9904411,⋯,0.08775301,-0.3514709,-0.6992052,0.12332005,0.12254196,0.6935783,-0.61597061,-0.8478653,0.1851512,-0.04389225
STANDING,0.2771988,-0.01009785,-0.1051373,-0.997335,-0.9904868,-0.99542,-0.9976274,-0.9902177,-0.9955489,⋯,0.01995331,-0.5454101,-0.8446193,0.08263215,-0.14343901,0.2750408,-0.36822404,-0.8496316,0.1848225,-0.04212638


 activity                
 LAYING            :1407 
 STANDING          :1374 
 SITTING           :1286 
 WALKING           :1226 
 WALKING_UPSTAIRS  :1073 
 WALKING_DOWNSTAIRS: 986 

In [4]:
# Quick summary of test dataset
dim(hex_test)
head(hex_test)
summary(hex_test$activity, exact_quantiles=TRUE)

activity,V1,V2,V3,V4,V5,V6,V7,V8,V9,⋯,V552,V553,V554,V555,V556,V557,V558,V559,V560,V561
STANDING,0.2571778,-0.02328523,-0.01465376,-0.938404,-0.9200908,-0.6676833,-0.9525011,-0.9252487,-0.6743022,⋯,0.07164545,-0.3303704,-0.7059739,0.006462403,0.16291982,-0.82588562,0.27115145,-0.7200093,0.276801,-0.0579783
STANDING,0.2860267,-0.01316336,-0.11908252,-0.9754147,-0.9674579,-0.9449582,-0.9867988,-0.9684013,-0.9458234,⋯,-0.40118872,-0.1218451,-0.5949439,-0.083494968,0.01749957,-0.43437455,0.92059323,-0.6980908,0.2813429,-0.08389801
STANDING,0.2754848,-0.02605042,-0.11815167,-0.993819,-0.9699255,-0.962748,-0.9944034,-0.970735,-0.9634827,⋯,0.06289131,-0.1904219,-0.6407357,-0.03495625,0.20230203,0.06410335,0.14506843,-0.7027715,0.280083,-0.0793462
STANDING,0.2702982,-0.03261387,-0.11752018,-0.9947428,-0.9732676,-0.9670907,-0.9952743,-0.974471,-0.9688974,⋯,0.11669529,-0.344418,-0.7361238,-0.017067021,0.15443783,0.34013408,0.29640709,-0.6989538,0.2841138,-0.077108
STANDING,0.274833,-0.02784779,-0.12952716,-0.9938525,-0.9674455,-0.978295,-0.9941114,-0.9659526,-0.977346,⋯,-0.12171128,-0.5346849,-0.8465952,-0.002222652,-0.04004639,0.73671509,-0.11854473,-0.692245,0.290722,-0.07385681
STANDING,0.2792199,-0.0186204,-0.11390197,-0.9944552,-0.9704169,-0.9653163,-0.9945851,-0.9694806,-0.9658969,⋯,0.08360294,-0.4935174,-0.8575645,-0.095680522,0.04884881,0.76068392,-0.07221636,-0.6898161,0.2948958,-0.0684707


 activity               
 LAYING            :537 
 STANDING          :532 
 WALKING           :496 
 SITTING           :491 
 WALKING_UPSTAIRS  :471 
 WALKING_DOWNSTAIRS:420 
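The two class summaries above are enough to judge how balanced the classes are, which is relevant when deciding whether an option such as `balance_classes` is needed. The base-R sketch below only re-enters the counts printed by `summary()` and derives shares from them. It does not query the H2O cluster, and the variable names are assumptions for illustration.

```r
# Class counts copied from the summary() outputs above.
train_counts <- c(LAYING = 1407, STANDING = 1374, SITTING = 1286,
                  WALKING = 1226, WALKING_UPSTAIRS = 1073, WALKING_DOWNSTAIRS = 986)
test_counts  <- c(LAYING = 537, STANDING = 532, WALKING = 496,
                  SITTING = 491, WALKING_UPSTAIRS = 471, WALKING_DOWNSTAIRS = 420)

# Share of each class within its dataset.
train_share <- train_counts / sum(train_counts)
test_share  <- test_counts / sum(test_counts)

# Ratio between the largest and the smallest class (1 would be perfectly balanced).
train_imbalance <- max(train_counts) / min(train_counts)
test_imbalance  <- max(test_counts) / min(test_counts)

round(train_share, 3)
round(test_share, 3)
```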

<br>

## Step 3 - Build and evaluate a predictive model using H2O's Gradient Boosting Machine (GBM) algorithm

In [5]:
# Define target and features for model training
target <- "activity"
features <- setdiff(colnames(hex_train), target) # i.e. use all 561 feature columns (V1 to V561)

In [25]:
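# Build a GBM model
model <- h2o.gbm(x = features,
                 y = target,
                 training_frame = hex_train,
                 model_id = "h2o_gbm",
                 ntrees = 1000,                  # upper bound; early stopping may use fewer
                 learn_rate = 0.05,
                 learn_rate_annealing = 0.999,   # scale the learning rate by 0.999 after each tree
                 max_depth = 7,
                 sample_rate = 0.9,              # row sampling rate
                 col_sample_rate = 0.9,          # column sampling rate
                 nfolds = 3,                     # 3-fold cross-validation
                 fold_assignment = "Stratified", # keep class proportions similar across folds
                 stopping_metric = "logloss",
                 stopping_rounds = 5,            # stop when logloss stops improving over 5 scoring events
                 score_tree_interval = 10,       # score the model every 10 trees
                 #balance_classes = TRUE,
                 seed = 1234)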
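To get a feel for what `learn_rate_annealing` does, the effective learning rate per tree can be computed by hand. The sketch below assumes the documented behaviour that the rate is multiplied by the annealing factor after every tree. It uses 290 trees because that is the `number_of_trees` reported in this notebook's model summary, and 6 because there are six activity classes. It is plain R arithmetic and does not inspect the fitted model.

```r
# Settings copied from the h2o.gbm() call above.
learn_rate           <- 0.05
learn_rate_annealing <- 0.999

# Values taken from this notebook's outputs (not read from the model object).
trees_built <- 290   # number_of_trees in the model summary
n_classes   <- 6     # six activity labels

# Effective learning rate applied to tree 1, 2, ..., trees_built.
rate_per_tree <- learn_rate * learn_rate_annealing^(seq_len(trees_built) - 1)

range(rate_per_tree)   # largest (first tree) and smallest (last tree) rate

# A multinomial GBM fits one tree per class in every iteration,
# which accounts for number_of_internal_trees in the model summary.
internal_trees <- trees_built * n_classes
```

Early stopping ended training well before the `ntrees = 1000` limit, so the learning rate never decayed as far as it would have over a full run.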



In [27]:
# Print out model summary
model

Model Details:

H2OMultinomialModel: gbm
Model ID:  h2o_gbm 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1             290                     1740             1459646         1
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         7    6.99655          2         83    55.64828


H2OMultinomialMetrics: gbm
** Reported on training data. **

Training Set Metrics: 

Extract training frame with `h2o.getFrame("train.hex_sid_83c1_1")`
MSE: (Extract with `h2o.mse`) 1.02967e-11
RMSE: (Extract with `h2o.rmse`) 3.208847e-06
Logloss: (Extract with `h2o.logloss`) 1.142173e-06
Mean Per-Class Error: 0
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
Confusion Matrix: vertical: actual; across: predicted
                   LAYING SITTING STANDING WALKING WALKING_DOWNSTAIRS
LAYING               1407       0        0       0                  0
SITTING                 0    1286        0       0                  0
STANDIN

<br>

## Step 4 - Make and evaluate predictions

In [30]:
# Make predictions
yhat_test <- h2o.predict(model, hex_test)
head(yhat_test)



predict,LAYING,SITTING,STANDING,WALKING,WALKING_DOWNSTAIRS,WALKING_UPSTAIRS
STANDING,8.627448e-08,7.071596e-06,0.9999925,1.011563e-07,9.650777e-08,1.042965e-07
STANDING,1.300119e-08,4.897534e-07,0.9999995,1.45719e-08,1.443481e-08,1.518905e-08
STANDING,1.201143e-08,9.275131e-06,0.9999907,1.302589e-08,1.357375e-08,1.392128e-08
STANDING,1.151841e-08,1.61759e-06,0.9999983,1.249811e-08,1.302733e-08,1.338994e-08
STANDING,1.128372e-08,8.109414e-07,0.9999991,1.223912e-08,1.283396e-08,1.312683e-08
STANDING,2.985663e-08,7.707183e-06,0.9999922,3.320442e-08,3.409307e-08,3.472674e-08
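Each row of the prediction frame holds one probability per class, and the `predict` column is the class with the highest probability. The base-R sketch below reproduces that relationship for the first row shown above. The probabilities are copied from the printed output (already rounded for display), and the variable names are assumptions for illustration.

```r
# Class probabilities of the first prediction row, copied from the output above.
class_probs <- c(LAYING             = 8.627448e-08,
                 SITTING            = 7.071596e-06,
                 STANDING           = 0.9999925,
                 WALKING            = 1.011563e-07,
                 WALKING_DOWNSTAIRS = 9.650777e-08,
                 WALKING_UPSTAIRS   = 1.042965e-07)

# The predicted label is the class with the largest probability.
predicted <- names(class_probs)[which.max(class_probs)]
predicted

# The probabilities sum to (approximately) one; small deviations come from display rounding.
sum(class_probs)
```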


In [31]:
# Evaluate predictions
h2o.performance(model, newdata = hex_test)

H2OMultinomialMetrics: gbm

Test Set Metrics: 

MSE: (Extract with `h2o.mse`) 0.06358653
RMSE: (Extract with `h2o.rmse`) 0.2521637
Logloss: (Extract with `h2o.logloss`) 0.3214465
Mean Per-Class Error: 0.07575778
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
Confusion Matrix: vertical: actual; across: predicted
                   LAYING SITTING STANDING WALKING WALKING_DOWNSTAIRS
LAYING                537       0        0       0                  0
SITTING                 0     401       89       0                  0
STANDING                0      40      492       0                  0
WALKING                 0       0        0     481                  4
WALKING_DOWNSTAIRS      0       0        0      10                378
WALKING_UPSTAIRS        0       1        0      24                  6
Totals                537     442      581     515                388
                   WALKING_UPSTAIRS  Error          Rate
LAYING                            0 0.0000 =  
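
The reported mean per-class error can be cross-checked by hand from the numbers visible in this notebook. The sketch below takes the per-class test totals from the Step 2 summary and the correct predictions from the diagonal of the confusion matrix. The WALKING_UPSTAIRS diagonal is cut off in the output above, so it is inferred here as that class's total minus the five visible off-diagonal entries in its row. This assumes every test record appears in exactly one cell of its row.

```r
# Test-set class totals, from the summary of hex_test in Step 2.
totals <- c(LAYING = 537, SITTING = 491, STANDING = 532,
            WALKING = 496, WALKING_DOWNSTAIRS = 420, WALKING_UPSTAIRS = 471)

# Correct predictions: diagonal entries visible in the confusion matrix above.
correct <- c(LAYING = 537, SITTING = 401, STANDING = 492,
             WALKING = 481, WALKING_DOWNSTAIRS = 378, WALKING_UPSTAIRS = NA)

# Inferred value (assumption): total minus the visible misclassifications
# in the WALKING_UPSTAIRS row (0, 1, 0, 24 and 6).
correct["WALKING_UPSTAIRS"] <- totals["WALKING_UPSTAIRS"] - sum(c(0, 1, 0, 24, 6))

# Error rate per class, then the unweighted mean across the six classes.
per_class_error      <- 1 - correct / totals
mean_per_class_error <- mean(per_class_error)

# Overall accuracy weights every record equally instead of every class.
overall_accuracy <- sum(correct) / sum(totals)

round(per_class_error, 4)
mean_per_class_error
```

The result agrees with the `Mean Per-Class Error` printed by `h2o.performance`, and the per-class rates show that most of the error comes from confusion between SITTING and STANDING.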