# KIN 471 - Fitbit Sleep Analysis
## This script processes sleep data exported from your Fitbit and loads it into a pandas dataframe in Python. Unless otherwise requested, do not change the code within the code boxes.

### These are the libraries we are importing. Each one provides functions beyond those built into Python.
- os allows us to work with paths
- zipfile allows us to unzip files
- shutil allows us to copy and move files
- pandas allows us to work with data (from the initial JSON files to a dataframe)

In [20]:
import os
import zipfile
import shutil
import pandas as pd

### This unzips the specified zip file!

In [21]:
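while True:
    zipName = input('Copy and paste the name of your zip file:')
    # is_zipfile is False for missing files and for files that are not zip archives
    if zipfile.is_zipfile(zipName):
        print('This is a valid zip file.')
        # Extract everything into the current working directory
        with zipfile.ZipFile(zipName, 'r') as zip_ref:
            zip_ref.extractall()
        print('All unzipped and undone for you.')
        break
    else:
        print('That doesn\'t exist or isn\'t a zip file, try again.')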

Copy and paste the name of your zip file: 20Jun26-Anton_Fitbit_Data_Export (1).zip


This is a valid zip file.
All unzipped and undone for you.
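
If you are unsure what the main extracted folder is called, the archive itself can tell you. The following is a minimal sketch, not part of the course script; the helper name `top_level_names` is a hypothetical choice.

```python
import zipfile

def top_level_names(zip_path):
    """Return the sorted top-level file and folder names inside a zip archive."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        # Paths inside a zip archive use "/" as the separator
        return sorted({name.split('/')[0] for name in zip_ref.namelist()})
```

Calling `top_level_names(zipName)` after the cell above should list the folder name that the next step asks for.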


### Finds the unique data files inside the newly extracted folder

In [22]:
while True:
    extractFile = input('Copy and paste the name of the main folder that was extracted:')
    # The data files sit in the "user-site-export" subfolder of the export
    myPath = f"./{extractFile}/user-site-export/"
    if os.path.exists(myPath):
        filenames = os.listdir(myPath)
        print(f"Parsing {len(filenames)} files for unique types of data.")
        # The data type is the part of each filename before the first hyphen
        unique_filenames = set()
        for f in filenames:
            unique_filenames.add(f.split("-")[0])
        print(f"Found {len(unique_filenames)} unique types of data.")
        for name in sorted(unique_filenames):
            print(name)
        break
    else:
        print('That doesn\'t seem right... try again.')

Copy and paste the name of the main folder that was extracted: AntonTrinh


Parsing 369 files for unique types of data.
Found 18 unique types of data.
altitude
badge.json
calories
distance
exercise
food_logs
heart_rate
height
lightly_active_minutes
moderately_active_minutes
resting_heart_rate
sedentary_minutes
sleep
steps
time_in_heart_rate_zones
very_active_minutes
water_logs
weight
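
The type name is whatever comes before the first hyphen in a filename, which is why a file without a hyphen, such as badge.json, is listed under its full name. As a rough sketch, the same split can also count how many files each type has. The helper name `count_types` and the example filenames are illustrative assumptions, not part of the script.

```python
from collections import Counter

def count_types(filenames):
    """Count files per data type, where the type is the text before the first hyphen."""
    return Counter(name.split("-")[0] for name in filenames)

# Example filenames for illustration only
example = ["sleep-2017-05-30.json", "sleep-2018-03-26.json",
           "steps-2017-05-30.json", "badge.json"]
for data_type, n in sorted(count_types(example).items()):
    print(data_type, n)
```

In the notebook, `count_types(filenames)` would give the per-type counts for the real export.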


### This copies the data files into a newly created folder
- The folder's name can be anything you want!

In [23]:
myName = input('Enter name:')
if os.path.exists(myName):
    print('This exists!')
else:
    shutil.copytree(myPath, myName)
    print('Copied the data in '+myName+' for you.')

Enter name: letest


Copied the data in letest for you.


### This places all the sleep-specific data files into a pandas dataframe within Python

In [24]:
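jsondf = []  # one dataframe per sleep file
print('Working on:')
for file in os.listdir(myName):
    # Only files whose names start with "sleep" hold sleep logs
    if file.startswith('sleep'):
        print(file)
        jsondf.append(pd.read_json(os.path.join(myName, file)))
# Stack the per-file dataframes into a single dataframe
df = pd.concat(jsondf)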

Working on:
sleep-2017-05-30.json
sleep-2018-03-26.json
sleep-2019-02-19.json
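
Because `pd.concat` keeps each file's own row numbers, the combined dataframe repeats index values such as 0 and 1. The sketch below shows this on two tiny stand-in files, assuming each file holds a JSON list of records. Passing `ignore_index=True` is one optional way to renumber the rows; the script itself does not do this.

```python
import json
import os
import tempfile

import pandas as pd

# Two tiny stand-in files; the real export files have many more fields
records_a = [{"dateOfSleep": "2017-06-08", "minutesAsleep": 386},
             {"dateOfSleep": "2017-06-07", "minutesAsleep": 409}]
records_b = [{"dateOfSleep": "2018-04-21", "minutesAsleep": 384}]

with tempfile.TemporaryDirectory() as folder:
    for name, records in [("sleep-a.json", records_a), ("sleep-b.json", records_b)]:
        with open(os.path.join(folder, name), "w") as fh:
            json.dump(records, fh)
    frames = [pd.read_json(os.path.join(folder, name))
              for name in sorted(os.listdir(folder))]

stacked = pd.concat(frames)                        # keeps each file's row numbers
renumbered = pd.concat(frames, ignore_index=True)  # renumbers rows from 0
print(list(stacked.index), list(renumbered.index))
```

The repeated row numbers do not matter later in this notebook, because dateOfSleep replaces the index.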


### Here you can see what the raw data looks like!

In [25]:
df

|   | logId | dateOfSleep | startTime | endTime | duration | minutesToFallAsleep | minutesAsleep | minutesAwake | minutesAfterWakeup | timeInBed | efficiency | type | infoCode | levels | mainSleep |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14887531885 | 2017-06-08 | 2017-06-07T22:50:00.000 | 2017-06-08T05:48:30.000 | 25080000 | 0 | 386 | 32 | 0 | 418 | 92 | classic | 0 | {'summary': {'restless': {'count': 10, 'minute... | True |
| 1 | 14887531884 | 2017-06-07 | 2017-06-06T23:11:00.000 | 2017-06-07T06:48:00.000 | 27420000 | 0 | 409 | 48 | 0 | 457 | 89 | classic | 0 | {'summary': {'restless': {'count': 26, 'minute... | True |
| 2 | 14820107248 | 2017-06-01 | 2017-05-31T23:01:30.000 | 2017-06-01T06:41:30.000 | 27600000 | 0 | 378 | 82 | 0 | 460 | 82 | classic | 0 | {'summary': {'restless': {'count': 36, 'minute... | True |
| 3 | 14810849328 | 2017-05-31 | 2017-05-30T22:49:00.000 | 2017-05-31T06:28:00.000 | 27540000 | 0 | 394 | 64 | 1 | 459 | 86 | classic | 0 | {'summary': {'restless': {'count': 32, 'minute... | True |
| 0 | 17964829932 | 2018-04-21 | 2018-04-20T23:24:00.000 | 2018-04-21T06:54:00.000 | 27000000 | 0 | 384 | 66 | 0 | 450 | 85 | classic | 0 | {'summary': {'restless': {'count': 30, 'minute... | True |
| 1 | 17950685220 | 2018-04-20 | 2018-04-19T22:44:00.000 | 2018-04-20T05:33:00.000 | 24540000 | 0 | 367 | 42 | 0 | 409 | 90 | classic | 0 | {'summary': {'restless': {'count': 22, 'minute... | True |
| 2 | 17939795699 | 2018-04-19 | 2018-04-18T22:21:00.000 | 2018-04-19T05:29:00.000 | 25680000 | 0 | 373 | 55 | 0 | 428 | 87 | classic | 0 | {'summary': {'restless': {'count': 30, 'minute... | True |
| 3 | 17930492554 | 2018-04-18 | 2018-04-17T22:19:00.000 | 2018-04-18T05:27:00.000 | 25680000 | 0 | 384 | 42 | 2 | 428 | 90 | classic | 0 | {'summary': {'restless': {'count': 26, 'minute... | True |
| 4 | 17917102503 | 2018-04-17 | 2018-04-16T22:37:00.000 | 2018-04-17T05:43:00.000 | 25560000 | 0 | 389 | 37 | 0 | 426 | 91 | classic | 0 | {'summary': {'restless': {'count': 18, 'minute... | True |
| 0 | 21640626981 | 2019-03-20 | 2019-03-19T22:28:30.000 | 2019-03-20T05:34:00.000 | 25500000 | 0 | 388 | 37 | 0 | 425 | 91 | classic | 0 | {'summary': {'restless': {'count': 19, 'minute... | True |


### Creating a function to get the minutes for one sleep phase out of the summary inside the levels column

In [26]:
def get_minutes(levels, sleep_phase):
    if not levels.get('summary'):
        return None
    if not levels.get('summary').get(sleep_phase):
        return None
    if not levels.get('summary').get(sleep_phase).get('minutes'):
        return None
    return levels['summary'][sleep_phase]['minutes']
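
Note that `not value` is also true when the value is 0, so get_minutes returns None both when a phase is missing and when its recorded minutes are exactly 0. Whether that matters depends on your analysis. The following is a hedged alternative that returns None only when a key is absent; the name `get_minutes_keep_zero` and the example values are illustrative assumptions.

```python
def get_minutes_keep_zero(levels, sleep_phase):
    """Like get_minutes, but a recorded value of 0 minutes is returned as 0."""
    summary = levels.get('summary') or {}
    phase = summary.get(sleep_phase) or {}
    # dict.get gives None when 'minutes' is absent (or stored as None)
    return phase.get('minutes')

# Made-up record for illustration
levels = {'summary': {'awake': {'count': 0, 'minutes': 0},
                      'restless': {'count': 10, 'minutes': 29}}}
print(get_minutes_keep_zero(levels, 'awake'))     # 0
print(get_minutes_keep_zero(levels, 'restless'))  # 29
print(get_minutes_keep_zero(levels, 'deep'))      # None
```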

### Applies the function to each sleep phase in the summary, creating one new column per phase

In [27]:
for x in df.iloc[0].levels['summary']:
    df[x] = df.levels.apply(get_minutes, args=(x,))
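
The loop reads the phase names from the first row's summary, so it assumes later rows use the same phases. Here is a small self-contained sketch of the same pattern on two illustrative rows. `get_minutes_demo` is a hypothetical compact stand-in for the function above, and the second row deliberately lacks an 'awake' entry.

```python
import pandas as pd

def get_minutes_demo(levels, sleep_phase):
    # Compact stand-in for get_minutes: None when the value is missing or 0
    minutes = levels.get('summary', {}).get(sleep_phase, {}).get('minutes')
    return minutes if minutes else None

demo = pd.DataFrame({
    'levels': [
        {'summary': {'restless': {'count': 10, 'minutes': 29},
                     'awake': {'count': 1, 'minutes': 3}}},
        {'summary': {'restless': {'count': 18, 'minutes': 37}}},  # no 'awake' entry
    ]
})

# One new column per phase named in the first row's summary
for phase in demo.iloc[0].levels['summary']:
    demo[phase] = demo.levels.apply(get_minutes_demo, args=(phase,))

print(demo[['restless', 'awake']])
```

The row without an 'awake' entry ends up with a missing value in that column.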

### This converts the dateOfSleep column into the dataframe's index

In [29]:
df.dateOfSleep = pd.to_datetime(df.dateOfSleep)
df.set_index("dateOfSleep", drop=True, inplace=True)
df.sort_index(inplace=True)
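
With a sorted datetime index, rows can be selected using date strings. A brief sketch on three nights taken from the raw data shown earlier; the variable names are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({
    'dateOfSleep': ['2018-04-17', '2017-06-08', '2018-04-18'],
    'minutesAsleep': [389, 386, 384],
})
demo.dateOfSleep = pd.to_datetime(demo.dateOfSleep)
demo.set_index('dateOfSleep', drop=True, inplace=True)
demo.sort_index(inplace=True)

april_2018 = demo.loc['2018-04']      # every night in April 2018
since_2018 = demo.loc['2018-01-01':]  # every night from 2018 onward
print(april_2018)
```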

### Removes our unused columns

In [30]:
df.drop(columns=([
    "logId", 
    "startTime", 
    "endTime", 
    "duration", 
    "minutesToFallAsleep", 
    "minutesAwake", 
    "minutesAfterWakeup", 
    "efficiency",
    "type",
    "infoCode",
    "levels",
    "mainSleep"
]), inplace=True)

### Look at our data now!

In [31]:
df

| dateOfSleep | minutesAsleep | timeInBed | restless | awake | asleep |
|---|---|---|---|---|---|
| 2017-05-31 | 394 | 459 | 61.0 | 4.0 | 394 |
| 2017-06-01 | 378 | 460 | 78.0 | 4.0 | 378 |
| 2017-06-07 | 409 | 457 | 44.0 | 4.0 | 409 |
| 2017-06-08 | 386 | 418 | 29.0 | 3.0 | 386 |
| 2018-04-17 | 389 | 426 | 37.0 | NaN | 389 |
| 2018-04-18 | 384 | 428 | 43.0 | 1.0 | 384 |
| 2018-04-19 | 373 | 428 | 45.0 | 10.0 | 373 |
| 2018-04-20 | 367 | 409 | 42.0 | NaN | 367 |
| 2018-04-21 | 384 | 450 | 61.0 | 5.0 | 384 |
| 2019-02-24 | 216 | 493 | 272.0 | 5.0 | 216 |
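
These columns are enough for simple derived measures. As an illustration that is not part of the original script, the share of time in bed spent asleep is minutesAsleep divided by timeInBed. The column name `pctAsleep` is a hypothetical choice, and the two rows are copied from the table above.

```python
import pandas as pd

demo = pd.DataFrame(
    {'minutesAsleep': [394, 386], 'timeInBed': [459, 418]},
    index=pd.to_datetime(['2017-05-31', '2017-06-08']),
)
# Percentage of time in bed spent asleep, rounded to one decimal place
demo['pctAsleep'] = (100 * demo.minutesAsleep / demo.timeInBed).round(1)
print(demo)
```

For these two nights the result is 85.8 and 92.3, close to the efficiency values of 86 and 92 in the raw data, though this notebook does not establish how Fitbit computes that column.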


### For easy access, we save our dataframe to a .csv file

In [32]:
# to_csv opens and writes the file itself, so no separate open() is needed
df.to_csv(myName+'.csv')
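
To pick the data back up in a later session, the CSV can be read back with dateOfSleep restored as a datetime index. A minimal sketch on a temporary file; the file name `demo.csv` and the two rows are illustrative.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame(
    {'minutesAsleep': [394, 378], 'timeInBed': [459, 460]},
    index=pd.to_datetime(['2017-05-31', '2017-06-01']).rename('dateOfSleep'),
)

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'demo.csv')
    demo.to_csv(csv_path)
    # index_col and parse_dates restore dateOfSleep as a datetime index
    reloaded = pd.read_csv(csv_path, index_col='dateOfSleep', parse_dates=True)

print(reloaded)
```

In the notebook, the equivalent call would be `pd.read_csv(myName+'.csv', index_col='dateOfSleep', parse_dates=True)`.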