# September-November 2017 Exploratory Analysis for My Data Project

## Introduction

The purpose of this notebook is to run a preliminary analysis of the Total Exercise and Total Sleep datasets as part of the 'My Data' project. The data was exported from my Fitbit dashboard as CSV and XLSX files, and I did some preliminary data manipulation to give it more context. The raw files can be found in the [project's GitHub repository](https://github.com/willvelida/mydata/tree/master/datasets).

We are interested in the following questions:

- How many more calories do we burn on active days than we do on rest days?
- Do the number of steps we take, the number of floors we climb and the distance we cover affect calorie burn?
- Is there a difference in calorie burn between the months?
- Do we burn more calories during the week than we do on the weekend?

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import requests

In [4]:
# Read the Exercise data into a DataFrame: exercise_df
exercise_df = pd.read_csv("TotalExercise.csv")

# Print the head of exercise_df
print(exercise_df.head())

# Print the shape of exercise_df
print(exercise_df.shape)

# Print the columns of exercise_df
print(exercise_df.columns)

        Date Calories Burned  Steps  Distance  Floors Minutes Sedentary  \
0  1/09/2017            3721  13030      9.91       5               498   
1  2/09/2017            3015   8147      6.35       3               732   
2  3/09/2017            3837  13857     10.79       8               480   
3  4/09/2017            4274  12846      9.79       8               593   
4  5/09/2017            3519  11041      8.62       7              1044   

   Minutes Lightly Active  Minutes Fairly Active  Minutes Very Active  \
0                     460                     13                   10   
1                     223                     32                    2   
2                     414                     26                   25   
3                     334                     18                   87   
4                     380                     10                    6   

  Activity Calories  
0              2266  
1              1245  
2              2379  
3              2757  
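
The `Date` column prints as day/month/year strings (the data runs from September to November 2017), and both the month and the weekday/weekend questions depend on real dates. The sketch below shows one way to parse them. It re-types a few columns from the rows printed above into a small inline CSV rather than reading `TotalExercise.csv`, and the `Month` and `Is Weekend` columns are hypothetical helper names, not part of the original data.

```python
import io

import pandas as pd

# A few columns from the rows printed above, re-typed as an inline CSV
# so the example runs without the real file.
sample_csv = io.StringIO(
    "Date,Distance,Floors\n"
    "1/09/2017,9.91,5\n"
    "2/09/2017,6.35,3\n"
    "3/09/2017,10.79,8\n"
    "4/09/2017,9.79,8\n"
    "5/09/2017,8.62,7\n"
)
sample_df = pd.read_csv(sample_csv)

# Assumption: dates are day/month/year, so give the format explicitly
# instead of letting pandas guess month-first.
sample_df["Date"] = pd.to_datetime(sample_df["Date"], format="%d/%m/%Y")

# Hypothetical helper columns for the month and weekday/weekend questions.
sample_df["Month"] = sample_df["Date"].dt.month_name()
sample_df["Is Weekend"] = sample_df["Date"].dt.dayofweek >= 5  # 5 = Saturday, 6 = Sunday

print(sample_df[["Date", "Month", "Is Weekend"]])
```

With columns like these in place, a `groupby` on `Month` or `Is Weekend` would be one route to the comparisons listed in the introduction.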


### Summary Statistics

In [5]:
# Describe the data
print(exercise_df.describe())

        Distance     Floors  Minutes Lightly Active  Minutes Fairly Active  \
count  91.000000  91.000000               91.000000              91.000000   
mean    6.637582   5.472527              288.274725              14.824176   
std     2.714200   5.481972               92.904379              15.172778   
min     0.760000   0.000000               45.000000               0.000000   
25%     4.580000   1.000000              221.000000               0.000000   
50%     6.350000   5.000000              295.000000              12.000000   
75%     8.060000   7.500000              350.000000              27.000000   
max    15.090000  29.000000              486.000000              66.000000   

       Minutes Very Active  
count            91.000000  
mean             21.835165  
std              25.377708  
min               0.000000  
25%               0.000000  
50%              10.000000  
75%              44.000000  
max              87.000000  
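
Note that `describe()` summarises only numeric columns by default, and `Calories Burned`, `Steps`, `Minutes Sedentary` and `Activity Calories` are missing from the output above, which suggests they were read as text. One plausible cause, and this is an assumption rather than something confirmed by the raw file, is thousands separators in the export (for example `3,721`). The sketch below uses hypothetical inline rows with such separators and shows two ways to get numeric columns.

```python
import io

import pandas as pd

# Hypothetical rows: the numbers mirror the head of the data, but the
# comma thousands separators are an assumption about the raw export.
raw_csv = io.StringIO(
    "Date,Calories Burned,Steps,Distance\n"
    '1/09/2017,"3,721","13,030",9.91\n'
    '2/09/2017,"3,015","8,147",6.35\n'
    '3/09/2017,"3,837","13,857",10.79\n'
)

# Option 1: let read_csv strip the separators while parsing.
parsed_df = pd.read_csv(raw_csv, thousands=",")

# Option 2: load the columns as text first, then convert them afterwards.
raw_csv.seek(0)
text_df = pd.read_csv(raw_csv)
for col in ["Calories Burned", "Steps"]:
    text_df[col] = pd.to_numeric(text_df[col].str.replace(",", "", regex=False))

# The converted columns now appear in the summary statistics.
print(parsed_df.describe())
```

If the real columns turn out to be text for a different reason, checking `exercise_df.dtypes` would be a reasonable first step before converting anything.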


From this initial analysis, we can read off the following means across the 91 recorded days:

- The mean distance covered is **6.6 miles per day.**
- The mean number of floors climbed is **5.5 floors per day.**
- The mean time spent lightly active is **288.3 minutes per day.**
- The mean time spent fairly active is **14.8 minutes per day.**
- The mean time spent very active is **21.8 minutes per day.**
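
The figures above are the `mean` row of the summary table rounded to one decimal place. As a minimal sketch, the same values can be pulled out directly instead of being read off the printed table. The example uses only the first five rows of two columns (copied from the head of the data), so its results will differ from the 91-day means, and `daily_means` is a hypothetical variable name.

```python
import pandas as pd

# First five rows of two columns, copied from the head of the data.
sample_df = pd.DataFrame({
    "Distance": [9.91, 6.35, 10.79, 9.79, 8.62],
    "Floors": [5, 3, 8, 8, 7],
})

# Per-column means, rounded to one decimal place like the bullets above.
daily_means = sample_df.mean(numeric_only=True).round(1)

for column, value in daily_means.items():
    print(f"{column}: {value} per day")
```

Running the same two lines against `exercise_df` should reproduce the rounded values listed above.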