# Apple Health Extractor

This code parses your Apple Health export data, creates multiple CSV files, and runs some simple data checks and analysis.

Enjoy! 

--------

## Extract Data and Export to CSVs from Apple Health's Export.xml

* Command-line tool to process Apple Health's export.xml file.
* Creates a separate CSV file for each data type.
* Original source: https://github.com/tdda/applehealthdata
* Depending on the size of your Apple Health data, this script may take several minutes to complete.

**NOTE: There are currently a few minor errors caused by additional data from Apple Health; fixing them requires some updates to the script.**

## Setup and Usage

* Export your data from the Apple Health app on your phone.
* Unzip export.zip into this directory and rename the extracted folder to data.
* Inside this directory, the export file should then be at data/export.xml.
* Run the parser inside this notebook or from the command line.
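
Before running the parser, it can help to confirm that the export file is where the steps above expect it. The following is a minimal sketch, not part of the original tool; the helper name `find_export` is hypothetical.

```python
from pathlib import Path


def find_export(project_dir="."):
    """Return the path to data/export.xml, or raise if the layout is wrong.

    Hypothetical helper: the name and error message are illustrative only.
    """
    export = Path(project_dir) / "data" / "export.xml"
    if not export.is_file():
        raise FileNotFoundError(
            f"Expected {export}; unzip export.zip here and rename the folder to data."
        )
    return export
```

Calling `find_export()` from the project directory either returns the path or fails early with a readable message, which is cheaper than waiting for the parser to fail.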

In [1]:
# %run -i 'apple-health-data-parser' 'export.xml' 
%run -i 'apple-health-data-parser' 'data/export.xml' 

-----

# Apple Health Data Check and Simple Data Analysis

In [2]:
import numpy as np
import pandas as pd
import glob
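
As a quick data check, `glob` can list every CSV the parser produced along with its row count. This is a sketch that assumes the CSV files were written to the data directory, as the `read_csv` calls in this notebook suggest; the function name `csv_row_counts` is hypothetical.

```python
import glob
import os

import pandas as pd


def csv_row_counts(data_dir="data"):
    """Map each CSV file name in data_dir to its number of data rows."""
    counts = {}
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        # len() of the DataFrame excludes the header line.
        counts[os.path.basename(path)] = len(pd.read_csv(path))
    return counts
```

For example, `csv_row_counts("data")` returns a dictionary such as `{"BodyMass.csv": ..., "StepCount.csv": ...}`, which makes it easy to spot empty or missing data types.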

----

## Weight

In [3]:
weight = pd.read_csv("data/BodyMass.csv")

In [4]:
weight.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
58,HealthFace,6966,"<<HKDevice: 0x1c4293bf0>, name:Apple Watch, ma...",BodyMass,lb,2018-04-17 16:29:38 +0800,2018-04-17 16:29:38 +0800,2018-04-17 16:29:38 +0800,172.6
59,BP,6.1.1,,BodyMass,lb,2018-04-24 08:10:51 +0800,2018-04-24 08:10:14 +0800,2018-04-24 08:10:14 +0800,172.6
60,Workflow,508,,BodyMass,lb,2018-04-25 08:04:36 +0800,2018-04-25 08:04:36 +0800,2018-04-25 08:04:36 +0800,173.945
61,BP,6.1.1,,BodyMass,lb,2018-04-25 08:07:11 +0800,2018-04-25 08:07:01 +0800,2018-04-25 08:07:01 +0800,173.9
62,Workflow,508,,BodyMass,lb,2018-04-30 11:33:33 +0800,2018-04-30 11:33:33 +0800,2018-04-30 11:33:33 +0800,173.283


In [5]:
weight.describe()

Unnamed: 0,value
count,63.0
mean,172.494619
std,1.989693
min,166.229
25%,171.63
50%,172.842
75%,174.165
max,176.149


----

## Steps

In [6]:
steps = pd.read_csv("data/StepCount.csv")

In [7]:
len(steps)

59129

In [8]:
steps.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [9]:
steps.describe()

Unnamed: 0,value
count,59129.0
mean,133.0126
std,263.785141
min,1.0
25%,20.0
50%,57.0
75%,118.0
max,1612.0


In [10]:
steps.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
59124,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c4881270>, name:Apple Watch, ma...",StepCount,count,2018-04-30 18:36:00 +0800,2018-04-30 18:29:18 +0800,2018-04-30 18:30:29 +0800,24
59125,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c4881310>, name:Apple Watch, ma...",StepCount,count,2018-04-30 18:36:00 +0800,2018-04-30 18:30:29 +0800,2018-04-30 18:31:31 +0800,19
59126,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c48813b0>, name:Apple Watch, ma...",StepCount,count,2018-04-30 18:36:00 +0800,2018-04-30 18:31:31 +0800,2018-04-30 18:32:48 +0800,37
59127,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c429f360>, name:Apple Watch, ma...",StepCount,count,2018-04-30 18:36:00 +0800,2018-04-30 18:32:48 +0800,2018-04-30 18:33:49 +0800,14
59128,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c429f180>, name:Apple Watch, ma...",StepCount,count,2018-04-30 18:36:00 +0800,2018-04-30 18:33:49 +0800,2018-04-30 18:35:42 +0800,3


In [11]:
# total all-time steps
steps.value.sum()

7864902
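
The all-time total is easier to interpret when broken down by day. The sketch below groups records by the calendar day of `startDate`, using the five rows from `steps.tail()` as sample input. It assumes you want days in the local time recorded in the export, so it drops the UTC offset before parsing; the function name `daily_step_totals` is hypothetical. If an export contains records from several sources covering the same interval, a raw sum may count those steps more than once.

```python
import io

import pandas as pd

# Five rows copied from the steps.tail() output above (columns trimmed).
SAMPLE = """startDate,endDate,value
2018-04-30 18:29:18 +0800,2018-04-30 18:30:29 +0800,24
2018-04-30 18:30:29 +0800,2018-04-30 18:31:31 +0800,19
2018-04-30 18:31:31 +0800,2018-04-30 18:32:48 +0800,37
2018-04-30 18:32:48 +0800,2018-04-30 18:33:49 +0800,14
2018-04-30 18:33:49 +0800,2018-04-30 18:35:42 +0800,3
"""


def daily_step_totals(steps):
    """Sum step counts per local calendar day of startDate."""
    # Keep only "YYYY-MM-DD HH:MM:SS" so the day is the one recorded locally.
    local_start = pd.to_datetime(steps["startDate"].str.slice(0, 19))
    return steps.groupby(local_start.dt.date)["value"].sum()


sample_steps = pd.read_csv(io.StringIO(SAMPLE))
print(daily_step_totals(sample_steps))
```

On the full data set, `daily_step_totals(steps)` returns one row per day.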

-------

## Stand Count

In [12]:
stand = pd.read_csv("data/AppleStandHour.csv")

In [13]:
len(stand)

8806

In [14]:
stand.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [15]:
stand.describe()

Unnamed: 0,unit
count,0.0
mean,
std,
min,
25%,
50%,
75%,
max,


In [16]:
stand.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
8801,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c4686ef0>, name:Apple Watch, ma...",AppleStandHour,,2018-04-30 16:00:08 +0800,2018-04-30 15:00:00 +0800,2018-04-30 16:00:00 +0800,HKCategoryValueAppleStandHourIdle
8802,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c4688bb0>, name:Apple Watch, ma...",AppleStandHour,,2018-04-30 17:03:20 +0800,2018-04-30 16:00:00 +0800,2018-04-30 17:00:00 +0800,HKCategoryValueAppleStandHourIdle
8803,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c4685f00>, name:Apple Watch, ma...",AppleStandHour,,2018-04-30 17:59:51 +0800,2018-04-30 17:00:00 +0800,2018-04-30 18:00:00 +0800,HKCategoryValueAppleStandHourStood
8804,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c4c942d0>, name:Apple Watch, ma...",AppleStandHour,,2018-04-30 18:00:52 +0800,2018-04-30 18:00:00 +0800,2018-04-30 19:00:00 +0800,HKCategoryValueAppleStandHourStood
8805,"“马克\的 iPhone""",11.3,,AppleStandHour,,2018-04-30 18:52:26 +0800,2018-04-30 18:00:00 +0800,2018-04-30 19:00:00 +0800,HKCategoryValueAppleStandHourStood
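
`stand.describe()` shows only an empty `unit` column because `describe()` summarizes numeric columns by default, and the stand `value` column holds category strings. Counting the distinct values is more informative. A minimal sketch, using the `value` entries from the five `stand.tail()` rows as sample input; the function name `stand_counts` is hypothetical.

```python
import pandas as pd

PREFIX = "HKCategoryValueAppleStandHour"

# The value column from the five stand.tail() rows above.
sample_stand = pd.DataFrame(
    {
        "value": [
            PREFIX + "Idle",
            PREFIX + "Idle",
            PREFIX + "Stood",
            PREFIX + "Stood",
            PREFIX + "Stood",
        ]
    }
)


def stand_counts(stand):
    """Count stand-hour records per category, with the long prefix removed."""
    labels = stand["value"].str.replace(PREFIX, "", regex=False)
    return labels.value_counts()


print(stand_counts(sample_stand))
```

Note that the last two rows of `stand.tail()` cover the same hour from two different sources, so these counts are per record, not per distinct hour.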


------

## Resting Heart Rate (HR)

In [17]:
restingHR = pd.read_csv("data/RestingHeartRate.csv")

In [18]:
len(restingHR)

300

In [19]:
restingHR.describe()

Unnamed: 0,device,value
count,0.0,300.0
mean,,46.286667
std,,4.017248
min,,38.0
25%,,43.0
50%,,46.0
75%,,49.0
max,,60.0


---

## Walking Heart Rate (HR) Average

In [20]:
walkingHR = pd.read_csv("data/WalkingHeartRateAverage.csv")

In [21]:
len(walkingHR)

185

In [22]:
walkingHR.describe()

Unnamed: 0,sourceVersion,device,value
count,185.0,0.0,185.0
mean,4.092432,,80.927027
std,0.072584,,12.104564
min,4.0,,60.0
25%,4.1,,73.0
50%,4.1,,78.0
75%,4.1,,86.0
max,4.3,,135.0


---

## Heart Rate Variability (HRV)

In [23]:
hrv = pd.read_csv("data/HeartRateVariabilitySDNN.csv")

In [24]:
len(hrv)

1216

In [25]:
hrv.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [26]:
hrv.describe()

Unnamed: 0,sourceVersion,value
count,1216.0,1216.0
mean,4.1,35.646432
std,0.065168,18.154448
min,4.0,8.46316
25%,4.1,23.1436
50%,4.1,31.748
75%,4.1,43.29695
max,4.3,178.671


In [27]:
hrv.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
1211,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0c8dc50>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 04:23:46 +0800,2018-04-30 04:22:40 +0800,2018-04-30 04:23:45 +0800,12.5996
1212,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0684920>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 06:23:48 +0800,2018-04-30 06:22:47 +0800,2018-04-30 06:23:48 +0800,32.791
1213,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0c9a5e0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 08:24:10 +0800,2018-04-30 08:23:05 +0800,2018-04-30 08:24:10 +0800,22.8008
1214,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c06932e0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 12:37:02 +0800,2018-04-30 12:35:57 +0800,2018-04-30 12:37:02 +0800,110.704
1215,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0697c00>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 17:20:02 +0800,2018-04-30 17:18:57 +0800,2018-04-30 17:20:01 +0800,37.1214


-------

## VO2 Max

In [28]:
vo2max = pd.read_csv("data/VO2Max.csv")

In [29]:
len(vo2max)

143

In [30]:
vo2max.describe()

Unnamed: 0,sourceVersion,device,value
count,0.0,0.0,143.0
mean,,,51.085681
std,,,1.900692
min,,,48.0084
25%,,,49.3646
50%,,,51.0986
75%,,,52.3505
max,,,55.0978


----

## Blood Pressure

In [31]:
diastolic = pd.read_csv("data/BloodPressureDiastolic.csv")
systolic = pd.read_csv("data/BloodPressureSystolic.csv")

In [32]:
diastolic.describe()

Unnamed: 0,device,value
count,0.0,29.0
mean,,65.586207
std,,5.0816
min,,55.0
25%,,63.0
50%,,67.0
75%,,69.0
max,,76.0


In [33]:
systolic.describe()

Unnamed: 0,device,value
count,0.0,29.0
mean,,113.206897
std,,8.973689
min,,95.0
25%,,106.0
50%,,112.0
75%,,122.0
max,,128.0


------

## Sleep

In [34]:
sleep = pd.read_csv("data/SleepAnalysis.csv")

In [35]:
sleep.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
1807,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-28 11:34:16 +0800,2018-04-28 10:23:00 +0800,2018-04-28 10:47:00 +0800,HKCategoryValueSleepAnalysisAsleep
1808,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-29 08:17:12 +0800,2018-04-29 00:27:00 +0800,2018-04-29 08:12:00 +0800,HKCategoryValueSleepAnalysisInBed
1809,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-29 08:17:12 +0800,2018-04-29 00:27:00 +0800,2018-04-29 08:12:00 +0800,HKCategoryValueSleepAnalysisAsleep
1810,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-30 10:04:58 +0800,2018-04-30 00:45:00 +0800,2018-04-30 08:43:00 +0800,HKCategoryValueSleepAnalysisInBed
1811,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-30 10:04:58 +0800,2018-04-30 00:45:00 +0800,2018-04-30 08:43:00 +0800,HKCategoryValueSleepAnalysisAsleep


In [36]:
sleep.describe()

Unnamed: 0,device,unit
count,0.0,0.0
mean,,
std,,
min,,
25%,,
50%,,
75%,,
max,,
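
As with the stand data, `sleep.describe()` is empty because the sleep `value` column is categorical. The useful quantity is the duration between `startDate` and `endDate`. The sketch below computes minutes per category from the five `sleep.tail()` rows; it assumes every timestamp follows the `YYYY-MM-DD HH:MM:SS +ZZZZ` pattern seen in this export, and the function name `sleep_minutes` is hypothetical.

```python
import io

import pandas as pd

# Five rows copied from the sleep.tail() output above (columns trimmed).
SAMPLE = """startDate,endDate,value
2018-04-28 10:23:00 +0800,2018-04-28 10:47:00 +0800,HKCategoryValueSleepAnalysisAsleep
2018-04-29 00:27:00 +0800,2018-04-29 08:12:00 +0800,HKCategoryValueSleepAnalysisInBed
2018-04-29 00:27:00 +0800,2018-04-29 08:12:00 +0800,HKCategoryValueSleepAnalysisAsleep
2018-04-30 00:45:00 +0800,2018-04-30 08:43:00 +0800,HKCategoryValueSleepAnalysisInBed
2018-04-30 00:45:00 +0800,2018-04-30 08:43:00 +0800,HKCategoryValueSleepAnalysisAsleep
"""

# Assumed timestamp layout, matching the rows shown in this notebook.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def sleep_minutes(sleep):
    """Total minutes per sleep category (for example Asleep, InBed)."""
    # Convert to UTC so durations stay correct even if offsets differ.
    start = pd.to_datetime(sleep["startDate"], format=DATE_FORMAT, utc=True)
    end = pd.to_datetime(sleep["endDate"], format=DATE_FORMAT, utc=True)
    minutes = (end - start).dt.total_seconds() / 60
    labels = sleep["value"].str.replace(
        "HKCategoryValueSleepAnalysis", "", regex=False
    )
    return minutes.groupby(labels).sum()


sample_sleep = pd.read_csv(io.StringIO(SAMPLE))
print(sleep_minutes(sample_sleep))
```

In this sample the source app writes an InBed and an Asleep record with identical times for each night, so the two categories should be read separately rather than added together.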
