# Questions to guide analysis
* What are some trends in smart device usage?
* How could these trends apply to Bellabeat customers?
* How could these trends help influence Bellabeat marketing strategy?

# Business Task
Gain insight into how consumers use non-Bellabeat smart devices, and apply those insights to inform Bellabeat's marketing strategy for the Leaf and the Bellabeat app.

# Look into data
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set
contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of
personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes
information about daily activity, steps, and heart rate that can be used to explore users’ habits.

In [None]:
library(tidyverse) # metapackage of all tidyverse packages
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)

list.files(path = "../input")

In [None]:
dailyAct <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
heartRateSecs <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
hourlyCal <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
hourlyInt <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
hourlySteps <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
minCalNarrow <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteCaloriesNarrow_merged.csv")
minIntNarrow <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteIntensitiesNarrow_merged.csv")
minStepsNarrow <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteStepsNarrow_merged.csv")
minMetNarrow <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteMETsNarrow_merged.csv")
minSleep <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteSleep_merged.csv")
sleep <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weightLog <- read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

In [None]:
head(dailyAct)

Data looks good and is ready for analysis.

# Re-formatting Data

### Adding New Date Data

The date columns in most of the dataframes are read in as text, which makes them awkward to sort, group, or plot, so they need to be reformatted.
For all of the dataframes, two new columns, "date" and "weekday", were created to make the date data easier to use; the weekday column is useful for comparisons across a given week. For dataframes with date-time columns, a "time" column was also created for time-series visuals of activity.
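
Because the same four steps are repeated for every dataframe, they can be sketched as one helper. This is a minimal sketch, not the notebook's own code: `add_date_parts` and the toy rows are hypothetical, and the fixed `"UTC"` time zone is an assumption made so results do not change between machines (the notebook itself uses `Sys.timezone()`).

```r
# Hypothetical helper, not part of the original notebook: parse one timestamp
# column and derive the "time", "date", and "weekday" columns from it.
add_date_parts <- function(df, col, fmt = "%m/%d/%Y %I:%M:%S %p", tz = "UTC") {
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  parsed <- as.POSIXct(df[[col]], format = fmt, tz = tz)
  df[[col]] <- parsed
  df$time <- format(parsed, "%H:%M:%S")
  df$date <- format(parsed, "%m/%d/%y")
  # wday runs from 0 (Sunday) to 6 (Saturday), so the weekday names
  # do not depend on the machine's locale
  df$weekday <- factor(day_names[as.POSIXlt(parsed)$wday + 1], levels = day_names)
  df
}

# Toy rows in the same timestamp layout as the hourly files (made-up data)
toy <- data.frame(
  Id = c(1, 1),
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM"),
  stringsAsFactors = FALSE
)
toy <- add_date_parts(toy, "ActivityHour")
print(toy)
```

For date-only columns such as `ActivityDate`, the same helper could be called with `fmt = "%m/%d/%Y"`; the resulting "time" column would then always read `00:00:00` and can be ignored.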

In [None]:
# daily activity data
dailyAct$ActivityDate = as.POSIXct(dailyAct$ActivityDate, format="%m/%d/%Y", tz=Sys.timezone())
dailyAct$date <- format(dailyAct$ActivityDate, format = "%m/%d/%y")
dailyAct$weekday <- format(dailyAct$ActivityDate, "%A")
dailyAct$weekday <- factor(dailyAct$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
dailyAct$TotalActiveMins <- dailyAct$VeryActiveMinutes + dailyAct$FairlyActiveMinutes + dailyAct$LightlyActiveMinutes

# daily sleep data
sleep$SleepDay = as.POSIXct(sleep$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
sleep$weekday <- format(sleep$SleepDay, "%A")
sleep$weekday <- factor(sleep$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# hourly calorie data
hourlyCal$ActivityHour = as.POSIXct(hourlyCal$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourlyCal$time <- format(hourlyCal$ActivityHour, format="%H:%M:%S")
hourlyCal$date <- format(hourlyCal$ActivityHour, format="%m/%d/%y")
hourlyCal$weekday <- format(hourlyCal$ActivityHour, "%A")
hourlyCal$weekday <- factor(hourlyCal$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# hourly intensity data
hourlyInt$ActivityHour = as.POSIXct(hourlyInt$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourlyInt$time <- format(hourlyInt$ActivityHour, format="%H:%M:%S")
hourlyInt$date <- format(hourlyInt$ActivityHour, format="%m/%d/%y")
hourlyInt$weekday <- format(hourlyInt$ActivityHour, "%A")
hourlyInt$weekday <- factor(hourlyInt$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# hourly steps data
hourlySteps$ActivityHour = as.POSIXct(hourlySteps$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourlySteps$time <- format(hourlySteps$ActivityHour, format="%H:%M:%S")
hourlySteps$date <- format(hourlySteps$ActivityHour, format="%m/%d/%y")
hourlySteps$weekday <- format(hourlySteps$ActivityHour, "%A")
hourlySteps$weekday <- factor(hourlySteps$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# every minute calorie data
minCalNarrow$ActivityMinute = as.POSIXct(minCalNarrow$ActivityMinute, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
minCalNarrow$time <- format(minCalNarrow$ActivityMinute, format="%H:%M:%S")
minCalNarrow$date <- format(minCalNarrow$ActivityMinute, format="%m/%d/%y")
minCalNarrow$weekday <- format(minCalNarrow$ActivityMinute, "%A")
minCalNarrow$weekday <- factor(minCalNarrow$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# every minute intensity data
minIntNarrow$ActivityMinute = as.POSIXct(minIntNarrow$ActivityMinute, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
minIntNarrow$time <- format(minIntNarrow$ActivityMinute, format="%H:%M:%S")
minIntNarrow$date <- format(minIntNarrow$ActivityMinute, format="%m/%d/%y")
minIntNarrow$weekday <- format(minIntNarrow$ActivityMinute, "%A")
minIntNarrow$weekday <- factor(minIntNarrow$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# every minute steps data
minStepsNarrow$ActivityMinute = as.POSIXct(minStepsNarrow$ActivityMinute, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
minStepsNarrow$time <- format(minStepsNarrow$ActivityMinute, format="%H:%M:%S")
minStepsNarrow$date <- format(minStepsNarrow$ActivityMinute, format="%m/%d/%y")
minStepsNarrow$weekday <- format(minStepsNarrow$ActivityMinute, "%A")
minStepsNarrow$weekday <- factor(minStepsNarrow$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# every minute met data
minMetNarrow$ActivityMinute = as.POSIXct(minMetNarrow$ActivityMinute, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
minMetNarrow$time <- format(minMetNarrow$ActivityMinute, format="%H:%M:%S")
minMetNarrow$date <- format(minMetNarrow$ActivityMinute, format="%m/%d/%y")
minMetNarrow$weekday <- format(minMetNarrow$ActivityMinute, "%A")
minMetNarrow$weekday <- factor(minMetNarrow$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# every minute sleep data
minSleep <- rename(minSleep, SleepTime=date)
minSleep$SleepTime = as.POSIXct(minSleep$SleepTime, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
minSleep$time <- format(minSleep$SleepTime, format="%H:%M:%S")
minSleep$date <- format(minSleep$SleepTime, format="%m/%d/%y")
minSleep$weekday <- format(minSleep$SleepTime, "%A")
minSleep$weekday <- factor(minSleep$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# every second heart rate data
heartRateSecs$Time = as.POSIXct(heartRateSecs$Time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
heartRateSecs$time <- format(heartRateSecs$Time, format="%H:%M:%S")
heartRateSecs$date <- format(heartRateSecs$Time, format="%m/%d/%y")
heartRateSecs$weekday <- format(heartRateSecs$Time, "%A")
heartRateSecs$weekday <- factor(heartRateSecs$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# weight log data
weightLog$Date = as.POSIXct(weightLog$Date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
weightLog$time <- format(weightLog$Date, format="%H:%M:%S")
weightLog$date <- format(weightLog$Date, format="%m/%d/%y")
weightLog$weekday <- format(weightLog$Date, "%A")
weightLog$weekday <- factor(weightLog$weekday, levels= c("Sunday", "Monday", 
    "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))


# Looking into the data

Determining whether each dataset has enough data to work with

In [None]:
n_distinct(dailyAct$Id)
n_distinct(sleep$Id)
n_distinct(hourlyCal$Id)
n_distinct(hourlyInt$Id)
n_distinct(hourlySteps$Id)
n_distinct(minCalNarrow$Id)
n_distinct(minIntNarrow$Id)
n_distinct(minStepsNarrow$Id)
n_distinct(minMetNarrow$Id)
n_distinct(minSleep$Id)
n_distinct(heartRateSecs$Id)
n_distinct(weightLog$Id)

In [None]:
length(intersect(dailyAct$Id, weightLog$Id))
length(intersect(sleep$Id, weightLog$Id))
length(intersect(dailyAct$Id, heartRateSecs$Id))
length(intersect(sleep$Id, heartRateSecs$Id))
length(intersect(heartRateSecs$Id, weightLog$Id))

From these counts, a few observations stand out:
* Few users appear in both the daily activity data and the weight log or heart rate data
* Likewise, few users appear in both the sleep data and the weight log or heart rate data
* The heart rate and weight log datasets do not have enough data to support a reasonable analysis
* The minute-level sleep data has oddly formatted time data, with most records falling on half-minute intervals
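
The user counts and overlaps above can also be collected in one pass. The sketch below uses made-up `Id` values in place of the real dataframes, so the numbers it prints say nothing about the Fitbit data itself.

```r
# Made-up stand-ins for three of the dataframes; only the Id column matters here
datasets <- list(
  dailyAct  = data.frame(Id = c(1, 1, 2, 3)),
  sleep     = data.frame(Id = c(1, 2, 2)),
  weightLog = data.frame(Id = 3)
)

# Number of distinct users per dataset, as one named vector
user_counts <- sapply(datasets, function(df) length(unique(df$Id)))
print(user_counts)

# Users present in both the daily activity data and the weight log
shared_users <- length(intersect(datasets$dailyAct$Id, datasets$weightLog$Id))
print(shared_users)

# Records per user, to spot users who contribute very little data
records_per_user <- table(datasets$dailyAct$Id)
print(records_per_user)
```

Looking at records per user alongside the distinct-user count helps separate "few users" from "few records per user", which are different reasons for a dataset being too thin.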

# Sorting Data

For all datasets, I am going to sort the data by date and select the columns that are relevant to the analysis.
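
One caveat worth keeping in mind: the "date" column is text in `%m/%d/%y` form, so sorting on it is alphabetical rather than chronological. That is harmless here because the records fall within a single year (the data folder is named 4.12.16-5.12.16), but it would misorder data that crosses a year boundary. A small sketch with made-up dates shows the difference:

```r
# Made-up dates that cross a year boundary
dates_chr <- c("01/05/17", "12/30/16", "04/12/16")

# Sorting the text puts January 2017 first
text_order <- sort(dates_chr)
print(text_order)

# Converting to Date before ordering gives chronological order
date_order <- dates_chr[order(as.Date(dates_chr, format = "%m/%d/%y"))]
print(date_order)
```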

In [None]:
dailyAct <- dailyAct %>%
    select(-ActivityDate) %>%
    arrange(date)

sleep <- sleep %>%
    select(-SleepDay) %>%
    arrange(date)

hourlyCal <- hourlyCal %>%
    select(-ActivityHour) %>%
    arrange(date)

hourlyInt <- hourlyInt %>%
    select(-ActivityHour) %>%
    arrange(date)

hourlySteps <- hourlySteps %>%
    select(-ActivityHour) %>%
    arrange(date)

minCalNarrow <- minCalNarrow %>%
#    select(-ActivityMinute) %>%
    arrange(date)

minIntNarrow <- minIntNarrow %>%
#    select(-ActivityMinute) %>%
    arrange(date)

minStepsNarrow <- minStepsNarrow %>%
#    select(-ActivityMinute) %>%
    arrange(date)

minMetNarrow <- minMetNarrow %>%
#    select(-ActivityMinute) %>%
    arrange(date)

# Summary Statistics

The aim is to get a good idea of what data to target, and to better isolate the data that would help in the analysis.

### Daily Activity

In [None]:
# head(dailyAct)
dailyAct %>%
    select(TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, Calories) %>%
    summary()

dailyAct %>% 
    select(VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance) %>%
    summary()

dailyAct %>%
    select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes) %>%
    summary()

dailyAct <- dailyAct %>%
    select(-TrackerDistance, -LoggedActivitiesDistance, -SedentaryActiveDistance)

#### Observations:
1. Participants cover more distance in light exercise than in very active or moderate exercise (3.341 miles versus 1.503 and 0.5675 miles, respectively)
2. In terms of minutes, participants were highly sedentary (991 minutes, or about 16.5 hours)
3. The number of steps taken daily by participants is low relative to the commonly recommended 10,000 steps
4. The calories burned are about what you would expect from a person doing light exercise
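
The two numeric claims above are easy to check directly. The sketch below converts the 991 sedentary minutes to hours and computes the share of days that reach 10,000 steps; the step counts are made-up values, not taken from `dailyAct`.

```r
# 991 sedentary minutes expressed in hours
sedentary_hours <- 991 / 60
print(round(sedentary_hours, 1))

# Made-up daily step counts standing in for dailyAct$TotalSteps
toy_steps <- c(3500, 7200, 10400, 12050, 8100)

# Share of days that reach the 10,000-step mark
share_at_goal <- mean(toy_steps >= 10000)
print(share_at_goal)
```

On the real data, the same `mean(... >= 10000)` expression applied to `dailyAct$TotalSteps` would give the share of tracked days that meet the goal.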

In [None]:
# head(sleep)
sleep %>%
    select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
    summary()

#### Observations
1. Most participants have a total of 1 sleep record a night, indicating they are not waking up in the middle of the night too often
2. The minimum and maximum values of both total minutes asleep and total time in bed are concerning
3. The summary values for total minutes asleep and total time in bed do not differ much, meaning that participants spend relatively little of their time in bed awake
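
The gap between time in bed and time asleep can be computed per record rather than read off the summary table. This is a sketch on made-up rows; `MinutesAwakeInBed` and `SleepEfficiency` are hypothetical column names, not columns in the dataset.

```r
# Made-up rows with the two columns from the sleep data
toy_sleep <- data.frame(
  TotalMinutesAsleep = c(420, 390, 300),
  TotalTimeInBed     = c(450, 400, 360)
)

# Minutes spent in bed but not asleep
toy_sleep$MinutesAwakeInBed <- toy_sleep$TotalTimeInBed - toy_sleep$TotalMinutesAsleep

# Fraction of time in bed that was spent asleep
toy_sleep$SleepEfficiency <- toy_sleep$TotalMinutesAsleep / toy_sleep$TotalTimeInBed

print(toy_sleep)
print(summary(toy_sleep$MinutesAwakeInBed))
```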

In [None]:
# head(hourlyCal)
hourlyCal %>%
    select(Calories) %>%
    summary()

# head(hourlyInt)
hourlyInt %>%
    select(TotalIntensity, AverageIntensity) %>%
    summary()

# head(hourlySteps)
hourlySteps %>%
    select(StepTotal) %>%
    summary()

#### Observations
1. The average calories burned are what you would expect from participants who perform some type of exercise on a daily basis
2. The intensity and average intensity levels are low, showing yet again that participants are doing less-intense exercise
3. The average number of steps taken by participants is worryingly low

In [None]:
# head(minCalNarrow)
minCalNarrow %>%
    select(Calories) %>%
    summary()

# head(minIntNarrow)
minIntNarrow %>%
    select(Intensity) %>%
    summary()

# head(minStepsNarrow)
minStepsNarrow %>%
    select(Steps) %>%
    summary()

# head(minMetNarrow)
minMetNarrow %>%
    select(METs) %>%
    summary()

#### Observations
1. Calories burned per minute are low, meaning participants are sedentary more often than not
2. Intensity levels and steps are just as low, which leaves room for improvement

# Merging Data

### Combine Activity Data with Sleep Data

In [None]:
activitySleep <- 
    merge(dailyAct, sleep, by=c('Id','date','weekday'))
head(activitySleep)
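
Note that `merge()` keeps only rows that match in both dataframes by default (an inner join), so activity days without a sleep record drop out of `activitySleep`. A minimal sketch on made-up rows shows the difference between the default and `all.x = TRUE`:

```r
# Made-up activity and sleep rows; user 1 has no sleep record on 04/13/16
act <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalSteps = c(5000, 7000, 9000)
)
slp <- data.frame(
  Id = c(1, 2),
  date = c("04/12/16", "04/12/16"),
  TotalMinutesAsleep = c(400, 450)
)

# Default merge: only days present in both dataframes survive
inner <- merge(act, slp, by = c("Id", "date"))

# all.x = TRUE keeps every activity day and fills missing sleep values with NA
left <- merge(act, slp, by = c("Id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
print(sum(is.na(left$TotalMinutesAsleep)))
```

The inner join is a reasonable choice for comparing sleep with activity, since both values are needed on the same day; the point is only that `activitySleep` covers fewer days than `dailyAct`.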

### Combining Hourly Data

In [None]:
merge_1 <- merge(hourlyCal, hourlyInt, by=c('Id','date','weekday','time'))
hourlyData <- merge(merge_1, hourlySteps, by=c('Id','date','weekday','time'))
head(hourlyData)

### Combining Minute Data

In [None]:
merge_1 <- merge(minCalNarrow, minIntNarrow, by=c('Id','date','weekday','time', 'ActivityMinute'))
head(merge_1)

In [None]:
merge_2 <- merge(merge_1, minStepsNarrow, by=c('Id','date','weekday','time', 'ActivityMinute'))
head(merge_2)

In [None]:
minuteData <- merge(merge_2, minMetNarrow, by=c('Id','date','weekday','time', 'ActivityMinute'))
head(minuteData)

# Visualizations
It should be noted that correlation does not equal causation, and further data would be needed to gain better insights into how users use their smart devices.

### Activity and Sleep data



In [None]:
head(dailyAct)
head(activitySleep)

In [None]:
# Total steps and calories burned
stepsCals <- 
    dailyAct %>%
    ggplot(aes(x=TotalSteps, y=Calories)) + 
    geom_point() + 
    geom_smooth()
stepsCals + labs(title="Total Steps vs. Calories Burned") + xlab("Total Steps") + ylab("Calories Burned")

As shown, there is a positive correlation between Total Steps and Calories Burned: the more steps participants take, the more calories they tend to burn. The instances where participants took few steps but burned many calories may indicate that they were biking or doing another stationary exercise. **The Bellabeat app can keep records of personal achievements, such as a user's highest step count or most calories burned, to further motivate them.**
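
The strength of a relationship like this can be put into numbers with a correlation coefficient and a simple linear fit. The sketch below uses made-up values rather than `dailyAct`, so the coefficient it prints is illustrative only.

```r
# Made-up daily values standing in for dailyAct
toy <- data.frame(
  TotalSteps = c(1000, 4000, 8000, 12000),
  Calories   = c(1500, 1900, 2300, 2900)
)

# Pearson correlation between steps and calories
r <- cor(toy$TotalSteps, toy$Calories)
print(round(r, 3))

# Linear fit: the slope is the extra calories associated with one more step
fit <- lm(Calories ~ TotalSteps, data = toy)
slope <- unname(coef(fit)["TotalSteps"])
print(round(slope, 3))
```

A coefficient near 1 describes how closely the points follow a line; it does not by itself say that steps cause the calorie burn.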

In [None]:
# Total distance and calories burned
distCals <- 
    dailyAct %>%
    ggplot(aes(x=TotalDistance, y=Calories)) + 
    geom_point() + 
    geom_smooth() 
distCals + labs(title="Total Distance vs. Calories Burned") + xlab("Total Distance (miles)") + ylab("Calories Burned")

As with the previous plot, the more distance a person covers, the more calories they burn. However, this also depends on the duration and intensity of the exercise, since even sitting still burns calories. **This gives Bellabeat the option of adding notifications that tell customers how many calories they have burned for each mile covered, encouraging them to keep going.**

Now, we are going to look at distance by intensity level against calories burned.

In [None]:
VADCals <-
    dailyAct %>%
    ggplot(aes(x=VeryActiveDistance, y=Calories)) + 
    geom_point() + 
    geom_smooth(method='loess')
VADCals + labs(title="Very Active Distance vs. Calories Burned") + xlab("Very Active Distance (miles)") + ylab("Calories Burned")

MADCals <-
    dailyAct %>%
    ggplot(aes(x=ModeratelyActiveDistance, y=Calories)) + 
    geom_point() + 
    geom_smooth(method='loess')
MADCals + labs(title="Moderately Active Distance vs. Calories Burned") + xlab("Moderately Active Distance (miles)") + ylab("Calories Burned")

LADCals <-
    dailyAct %>%
    ggplot(aes(x=LightActiveDistance, y=Calories)) + 
    geom_point() + 
    geom_smooth(method='loess')
LADCals + labs(title="Light Active Distance vs. Calories Burned") + xlab("Light Active Distance (miles)") + ylab("Calories Burned")

The data shows that very active exercise covers short distances but burns the most calories, while light exercise is the most popular form of exercise among participants. In other words, people burn more calories over a shorter distance at higher intensity levels, and they can burn the same number of calories at lower intensity by covering a greater distance. **One suggestion would be to tell customers their overall pace and stats throughout the exercise, be it running, walking, biking, etc. This could include miles/min, calories/hr, and so on.**


Next, we are going to look at active minutes at each intensity level against calories burned, and then at how long participants are sedentary versus the total minutes they are active.

In [None]:
VACals <-    
    dailyAct %>%
    group_by(Id) %>%
    summarise(avg_VaMins=mean(VeryActiveMinutes), avg_cal=mean(Calories)) %>%
    ggplot(aes(x=avg_VaMins, y=avg_cal)) + 
    geom_point() + 
    geom_smooth() 
VACals + labs(title="Very Active vs. Calories Burned") + xlab("Active Avg. Minutes") + ylab("Calories Burned")

FACals <- 
    dailyAct %>%
    group_by(Id) %>%
    summarise(avg_FaMins=mean(FairlyActiveMinutes), avg_cal=mean(Calories)) %>%
    ggplot(aes(x=avg_FaMins, y=avg_cal)) + 
    geom_point() + 
    geom_smooth() 
FACals + labs(title="Fairly Active vs. Calories Burned") + xlab("Active Avg. Minutes") + ylab("Calories Burned")

LACals <- 
    dailyAct %>%
    group_by(Id) %>%
    summarise(avg_LaMins=mean(LightlyActiveMinutes), avg_cal=mean(Calories)) %>%
    ggplot(aes(x=avg_LaMins, y=avg_cal)) + 
    geom_point() + 
    geom_smooth() 
LACals + labs(title="Lightly Active vs. Calories Burned") + xlab("Active Avg. Minutes") + ylab("Calories Burned")

This is further evidence that time spent on more rigorous exercise burns more calories in less time.

In [None]:
sedVsActive <- 
    dailyAct %>%
    ggplot(aes(x=TotalActiveMins, y=SedentaryMinutes)) + 
    geom_point() +
    geom_smooth(method='loess')
sedVsActive + labs(title="Sedentary Minutes vs. Active Minutes") + xlab("Active Minutes") + ylab("Sedentary Minutes")

sedCals <-
    dailyAct %>%
    ggplot(aes(x=SedentaryMinutes, y=Calories)) + 
    geom_point() + 
    geom_smooth(method='loess')
sedCals + labs(title="Sedentary Minutes vs. Calories Burned") + xlab("Sedentary Minutes") + ylab("Calories Burned")

activeCals <-
    dailyAct %>%
    ggplot(aes(x=TotalActiveMins, y=Calories)) + 
    geom_point() + 
    geom_smooth(method='loess')
activeCals + labs(title="Total Active Minutes vs. Calories Burned") + xlab("Active Minutes") + ylab("Calories Burned")


From the looks of the graphs, the obvious becomes clearer: the more sedentary you are, the less active you will be, and the more active you are, the more calories you will burn. The data shows that most participants hover in the range of 500-1000 sedentary minutes (~8-17 hours) and 200-400 active minutes (~3-7 hours), meaning that participants were active for less than half of their tracked time and spent the rest doing hardly any exercise. **This is where the Leaf and the Bellabeat app can come into play: customers can be sent notifications to stand up and move around, or merely to stand up.**
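
The "less than half of the time" reading can be checked by computing the active share of tracked minutes per day. This sketch uses made-up rows at the edges and middle of the ranges quoted above; `ActiveShare` is a hypothetical column name.

```r
# Made-up days spanning the ranges quoted above
toy <- data.frame(
  TotalActiveMins  = c(200, 300, 400),
  SedentaryMinutes = c(1000, 750, 500)
)

# Share of tracked (active + sedentary) minutes that were active
toy$ActiveShare <- toy$TotalActiveMins / (toy$TotalActiveMins + toy$SedentaryMinutes)

# The same minutes expressed in hours
toy$ActiveHours    <- toy$TotalActiveMins / 60
toy$SedentaryHours <- toy$SedentaryMinutes / 60

print(round(toy, 2))
```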

The next thing to look into is to see how sleep affects activity. Let's look at how each of the participants slept throughout the week to get a good idea of how well they slept (total minutes asleep and total sleep records).

In [None]:
asleep <- 
    activitySleep %>%
    group_by(weekday) %>%
    summarise(avg_min=mean(TotalMinutesAsleep)) %>%
    ggplot(aes(x=weekday, y=avg_min)) + 
    geom_bar(position = 'dodge', stat='identity') +
    geom_text(aes(label=round(avg_min,2)), position=position_dodge(width=0.9), vjust=-0.25)
asleep + labs(title="Total Minutes Asleep over Weekday") + xlab("Weekday") + ylab("Average Minutes Asleep")

asleepActivity <-
    activitySleep %>%
    group_by(weekday) %>%
    summarise(avg_min=mean(TotalActiveMins)) %>%
    ggplot(aes(x=weekday, y=avg_min)) + 
    geom_bar(position = 'dodge', stat='identity') +
    geom_text(aes(label=round(avg_min,2)), position=position_dodge(width=0.9), vjust=-0.25)
asleepActivity + labs(title="Total Minutes Active over Weekday") + xlab("Weekday") + ylab("Average Active Minutes")

Looking at the graphs, we see that participants slept the most on Sundays while also exercising the least. This could indicate that participants treat Sunday as a rest day. Participants also got more than 6 hours of sleep on only three of the seven days of the week, indicating that many are not getting the sleep they need. It also looks like Monday is a common day for participants to start off on the right foot, followed by Tuesday with similar activity. Activity then drops until Saturday, which is the most active day. Most importantly, participants slept the second-most minutes and were active the third-fewest minutes on Wednesday, which may indicate some burnout due to the lack of sleep and high activity levels on Monday and Tuesday. **This could be an opportunity for Bellabeat to target customers who have trouble sleeping but are trying to stay active. The Bellabeat app could show customers their sleep and activity patterns, which could help them see where they fall short, be that sleep or general activity.**
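
Since each bar is an average, it helps to know how many records sit behind it before reading too much into day-to-day differences. A minimal sketch on made-up rows computes both the weekday averages and the record counts:

```r
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Made-up rows standing in for activitySleep
toy <- data.frame(
  weekday = factor(c("Sunday", "Sunday", "Monday", "Wednesday"), levels = day_names),
  TotalMinutesAsleep = c(480, 440, 400, 430)
)

# Average minutes asleep per weekday (weekdays without records come back as NA)
avg_by_day <- tapply(toy$TotalMinutesAsleep, toy$weekday, mean)
print(avg_by_day)

# Number of records behind each average
n_by_day <- table(toy$weekday)
print(n_by_day)

# Weekday with the highest average
best_day <- names(which.max(avg_by_day))
print(best_day)
```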

Now, let's see how total sleep affects total calories/steps.

In [None]:
asleepCalories <- 
    activitySleep %>%
    ggplot(aes(x=TotalMinutesAsleep, y=Calories)) + 
    geom_point() +
    geom_smooth() + 
    geom_vline(xintercept=360) +
    geom_vline(xintercept=480) 
asleepCalories + labs(title="Total Minutes Asleep vs. Calories Burned") + xlab("Minutes Asleep") + ylab("Calories Burned")

asleepSteps <-
    activitySleep %>%
    ggplot(aes(x=TotalMinutesAsleep, y=TotalSteps)) + 
    geom_point() +
    geom_smooth() + 
    geom_vline(xintercept=360) +
    geom_vline(xintercept=480) 
asleepSteps + labs(title="Total Minutes Asleep vs. Total Steps") + xlab("Minutes Asleep") + ylab("Total Steps")

From the graphs, it would appear that when participants got the recommended 6-8 hours of sleep (the range between the two vertical lines at 360 and 480 minutes), they burned more calories and took more steps. This is indicated by the larger grouping of data points within the 6-8 hour range, suggesting that adequate sleep goes along with participants doing more. **This should indicate to Bellabeat that it should let users know they can track their sleep, to show them the connection between being active and the amount of sleep they get.**
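
Rather than judging the grouping by eye, the records can be split into sleep bands at the same 360- and 480-minute marks and compared directly. This is a sketch on made-up rows; `sleep_band` and its labels are hypothetical.

```r
# Made-up rows standing in for activitySleep
toy <- data.frame(
  TotalMinutesAsleep = c(300, 350, 380, 420, 470, 500, 560),
  TotalSteps         = c(6000, 5000, 9000, 10000, 8000, 4000, 3000)
)

# Bands are closed on the left: exactly 360 minutes counts as "6 to 8 h",
# and exactly 480 minutes counts as "8 h or more"
toy$sleep_band <- cut(
  toy$TotalMinutesAsleep,
  breaks = c(-Inf, 360, 480, Inf),
  labels = c("under 6 h", "6 to 8 h", "8 h or more"),
  right = FALSE
)

# How many records fall in each band, and their average step count
band_counts <- table(toy$sleep_band)
band_steps  <- tapply(toy$TotalSteps, toy$sleep_band, mean)
print(band_counts)
print(band_steps)
```

Comparing counts and averages side by side separates two things the scatter plot mixes together: how often participants sleep 6-8 hours, and how active they are when they do.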

Now, let's see how total sleep affects total active minutes.

In [None]:
asleepActive <-
    activitySleep %>%
    ggplot(aes(x=TotalMinutesAsleep, y=TotalActiveMins)) + 
    geom_point() +
    geom_smooth() + 
    geom_vline(xintercept=360) +
    geom_vline(xintercept=480) 
asleepActive + labs(title="Total Minutes Asleep vs. Total Active Minutes ") + xlab("Minutes Asleep") + ylab("Active Minutes")

asleepDistance <-  
    activitySleep %>%
    ggplot(aes(x=TotalMinutesAsleep, y=TotalDistance)) + 
    geom_point() +
    geom_smooth() + 
    geom_vline(xintercept=360) +
    geom_vline(xintercept=480) 
asleepDistance + labs(title="Total Minutes Asleep vs. Total Distance") + xlab("Minutes Asleep") + ylab("Total Distance (miles)")

The graphs show that when people sleep 6-8 hours (the recommended amount), their activity levels stay around 200-400 minutes, or 2.5-10 miles, of exercise (or more). With more sleep than that, activity levels drop significantly. For less sleep, more information about the overall quality of the exercise would be needed. **Regardless, this suggests that better sleep can go along with better performance. So, Bellabeat should provide customers with a smart sleep interface that shows how well they slept and how that compares with their activity levels on other days.**

Now, it is time to see how sleep affects being sedentary.

In [None]:
asleepSedentary <- 
    activitySleep %>%
    ggplot(aes(x=TotalMinutesAsleep, y=SedentaryMinutes)) + 
    geom_point() +
    geom_smooth() + 
    geom_vline(xintercept=360) +
    geom_vline(xintercept=480)
asleepSedentary + labs(title="Total Minutes Asleep vs. Total Minutes Sedentary") + xlab("Minutes Asleep") + ylab("Sedentary Minutes")

In contrast to the previous point, no matter how much sleep participants get, they spend a lot of time being sedentary. Although these participants are active for the most part, they could be working or doing other things that leave them inactive for long periods of time, which could prove stressful. **Thus, the Leaf and the Bellabeat app can send users notifications to spend more time outside or moving and less time sitting down, in order to lessen stress and improve health.**

Next, we are going to look into whether the number of sleep records relates to activity level.

In [None]:
sleepRecs <-
    activitySleep %>%
    group_by(TotalSleepRecords) %>%
    summarise(avg_min=mean(TotalActiveMins)) %>%
    ggplot(aes(x=TotalSleepRecords, y=avg_min, fill=avg_min)) +
    geom_col(position="dodge")
sleepRecs + labs(title="Total Sleep Records vs. Total Active Minutes") + xlab("Total Sleep Records") + ylab("Active Minutes")

sleepRecsAsleep <-
    activitySleep %>%
    group_by(TotalSleepRecords) %>%
    summarise(avg_sleep=mean(TotalMinutesAsleep)) %>%
    ggplot(aes(x=TotalSleepRecords, y=avg_sleep, fill=avg_sleep)) +
    geom_col(position="dodge")
sleepRecsAsleep + labs(title="Total Sleep Records vs. Total Minutes Asleep") + xlab("Total Sleep Records") + ylab("Minutes Asleep")

sleepRecsSedentary <-
    activitySleep %>%
    group_by(TotalSleepRecords) %>%
    summarise(avg_sed=mean(SedentaryMinutes)) %>%
    ggplot(aes(x=TotalSleepRecords, y=avg_sed, fill=avg_sed)) +
    geom_col(position="dodge")
sleepRecsSedentary + labs(title="Total Sleep Records vs. Sedentary Minutes") + xlab("Total Sleep Records") + ylab("Sedentary Minutes")


Let's look at the total number of sleep records per weekday.

In [None]:
sleepRecWeekday <-
    activitySleep %>%
    group_by(weekday) %>%
    summarise(sumSR = sum(TotalSleepRecords)) %>%
    ggplot(aes(x=weekday, y=sumSR)) + 
    geom_bar(position = 'dodge', stat='identity') +
    geom_text(aes(label=sumSR), position=position_dodge(width=0.9), vjust=-0.25)
sleepRecWeekday + labs(title="Sleep Records per Weekday") + xlab("Weekday") + ylab("Total Sleep Records")

These graphs show the importance of continuous sleep: the more sleep records participants have, the worse off they tend to be. One peculiar finding is that more sleep records go along with fewer sedentary minutes, although the difference is small. An argument could also be made that more restful sleep makes for a longer, more active day, which in turn leaves more time to accumulate sedentary minutes. Otherwise, it is important for users to stay asleep throughout the night, since restful sleep means less time in bed and more time getting after the day. Also, as indicated by the total minutes asleep and active across the week, Wednesday suggests that participants are not getting a restful night's sleep, which hurts their performance when they exercise. **So, the Bellabeat app could track users' sleep to help them see when they get up at night, and guide them toward methods to improve their sleep.**
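
The weekday chart above plots the sum of `TotalSleepRecords`, which grows with the number of nights logged on that weekday as well as with interrupted sleep. The sketch below, on made-up rows, shows how a sum and a mean can rank the same two days differently:

```r
# Made-up rows: Sunday has more logged nights, Monday has more records per night
toy <- data.frame(
  weekday = c("Sunday", "Sunday", "Sunday", "Monday"),
  TotalSleepRecords = c(1, 1, 2, 2)
)

# Total records per weekday: driven partly by how many nights were logged
total_records <- tapply(toy$TotalSleepRecords, toy$weekday, sum)
print(total_records)

# Mean records per logged night: comparable across weekdays
mean_records <- tapply(toy$TotalSleepRecords, toy$weekday, mean)
print(round(mean_records, 2))
```

In this toy case Sunday has the larger total but the smaller mean, so the choice between `sum()` and `mean()` changes which day looks worse for interrupted sleep.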

Next, we look into the hourly data to see whether users use their devices at specific times, which gives a glimpse into their exercise routines.

### Hourly Data

Let's look briefly into the hourly data to get a better sense of the participants' routines and of what they do.

In [None]:
head(hourlyData)

To begin, let's look at the average calories burned across the day overall.

In [None]:
calByHour <- 
    hourlyData %>%
    group_by(time) %>%
    summarise(avg_cal = mean(Calories)) %>%
    ggplot(aes(x=time, y=avg_cal)) +     
    geom_bar(position = 'dodge', stat='identity') +
    geom_text(aes(label=round(avg_cal,0)), position=position_dodge(width=0.9), vjust=0.50, hjust=-0.01, angle=90) + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 
calByHour + labs(title="Average Calories Burned over Time") + xlab("Hour") + ylab("Average Calories Burned")

Next, let's look at calories burned by hour per weekday.

In [None]:
calHourDoW <- 
    hourlyData %>%
    group_by(time, weekday) %>%
    summarise(avg_cal = mean(Calories)) %>%
    ggplot(aes(x=time, y=avg_cal)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    facet_wrap(~weekday)
calHourDoW + labs(title="Average Calories Burned by Weekday") + xlab("Hour") + ylab("Average Calories Burned")

From the graphs, it would appear that participants burned most of their calories between 10:00am-7:00pm.

Let's look into the intensity levels and steps taken by participants to see if this is still the case.

In [None]:
stepsByHour <- 
    hourlyData %>%
    group_by(time) %>%
    summarise(avg_steps = mean(StepTotal)) %>%
    ggplot(aes(x=time, y=avg_steps)) +     
    geom_bar(position = 'dodge', stat='identity') +
    geom_text(aes(label=round(avg_steps,0)), position=position_dodge(width=0.9), vjust=0.50, hjust=-0.01, angle=90) + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 
stepsByHour + labs(title="Average Steps over Time") + xlab("Hour") + ylab("Average Steps")

stepsHourDoW <- 
    hourlyData %>%
    group_by(time, weekday) %>%
    summarise(avg_steps = mean(StepTotal)) %>%
    ggplot(aes(x=time, y=avg_steps)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    facet_wrap(~weekday)
stepsHourDoW + labs(title="Average Steps by Weekday") + xlab("Hour") + ylab("Average Steps")

Now for intensity levels over time.

In [None]:
intByHour <- 
    hourlyData %>%
    group_by(time) %>%
    summarise(avg_int = mean(TotalIntensity)) %>%
    ggplot(aes(x=time, y=avg_int)) +     
    geom_bar(position = 'dodge', stat='identity') +
    geom_text(aes(label=round(avg_int,0)), position=position_dodge(width=0.9), vjust=0.50, hjust=-0.01, angle=90) + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 
intByHour + labs(title="Average Intensity over Time") + xlab("Hour") + ylab("Average Intensity")

intHourDoW <- 
    hourlyData %>%
    group_by(time, weekday) %>%
    summarise(avg_int = mean(TotalIntensity)) %>%
    ggplot(aes(x=time, y=avg_int)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    facet_wrap(~weekday)
intHourDoW + labs(title="Average Intensity by Weekday") + xlab("Hour") + ylab("Average Intensity")

Overall, it still holds true that participants are active from 10:00am to 7:00pm, with most of the activity coming around lunchtime (10:00am-12:00pm) and after work (5:00pm-7:00pm). **This could be a reason for Bellabeat to let customers set a routine that the smart devices track, with reminders to stay active.**
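
The busiest hours can also be ranked in a table instead of being read off the bar charts. A minimal sketch, using made-up hourly rows in place of `hourlyData`:

```r
# Made-up hourly rows with the same "time" layout as hourlyData
toy <- data.frame(
  time = c("08:00:00", "08:00:00", "12:00:00", "12:00:00", "18:00:00", "18:00:00"),
  Calories = c(80, 90, 120, 130, 125, 135)
)

# Average calories per hour of the day
avg_cal <- tapply(toy$Calories, toy$time, mean)
print(avg_cal)

# The two hours with the highest average, most active first
top_hours <- names(sort(avg_cal, decreasing = TRUE))[1:2]
print(top_hours)
```

The same ranking applied to `StepTotal` or `TotalIntensity` would show whether all three measures peak in the same hours.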

### Minute Data

Now, let's look at some of the minute data to see if there are any interesting pieces of information to get from it.

In [None]:
head(minuteData)

Let's look at average calories burned, steps, intensity, and METs per minute across the day.

In [None]:
calByMin <- 
    minuteData %>%
    group_by(time) %>%
    summarise(avg_cal = mean(Calories)) %>%
    ggplot(aes(x=time, y=avg_cal)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.ticks = element_blank(), axis.text.x = element_blank())
calByMin + labs(title="Average Calories Burned over Time") + xlab("Minute") + ylab("Average Calories Burned")

stepsByMin <- 
    minuteData %>%
    group_by(time) %>%
    summarise(avg_steps = mean(Steps)) %>%
    ggplot(aes(x=time, y=avg_steps)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.ticks = element_blank(), axis.text.x = element_blank())
stepsByMin + labs(title="Average Steps over Time") + xlab("Minute") + ylab("Average Steps")

intByMin <- 
    minuteData %>%
    group_by(time) %>%
    summarise(avg_int = mean(Intensity)) %>%
    ggplot(aes(x=time, y=avg_int)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.ticks = element_blank(), axis.text.x = element_blank())
intByMin + labs(title="Average Intensity over Time") + xlab("Minute") + ylab("Average Intensity")

metsByMin <- 
    minuteData %>%
    group_by(time) %>%
    summarise(avg_mets = mean(METs)) %>%
    ggplot(aes(x=time, y=avg_mets)) +     
    geom_bar(position = 'dodge', stat='identity') +
    theme(axis.ticks = element_blank(), axis.text.x = element_blank())
metsByMin + labs(title="Average METs over Time") + xlab("Minute") + ylab("Average METs")

As shown, most of the activity comes later in the day, suggesting that participants use this time to relieve stress and fit in a workout. **This can allow Bellabeat to focus on this type of audience: people who use smart devices to relieve stress through exercise.**
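One way to put a number on "later in the day" is to bucket each minute into a part of the day and compare averages. This is only a sketch under stated assumptions: `toyMinute` is hypothetical data, `time` is assumed to be an `"HH:MM:SS"` string, and the cut points (12:00 and 17:00) are arbitrary choices for illustration.

```r
library(dplyr)

# Hypothetical toy data using the same column names as minuteData
toyMinute <- data.frame(
  time = c("06:30:00", "09:15:00", "13:45:00",
           "18:05:00", "19:20:00", "22:10:00"),
  Steps = c(5, 10, 20, 40, 35, 2),
  stringsAsFactors = FALSE
)

partOfDay <- toyMinute %>%
  mutate(
    # Extract the hour (0-23) from the "HH:MM:SS" string
    hour = as.integer(substr(time, 1, 2)),
    # Left-closed bins: [0,12) morning, [12,17) afternoon, [17,24) evening
    part = cut(hour, breaks = c(0, 12, 17, 24), right = FALSE,
               labels = c("morning", "afternoon", "evening"))
  ) %>%
  group_by(part) %>%
  summarise(avg_steps = mean(Steps), .groups = "drop")

print(partOfDay)
```

The same grouping could be applied to `Calories`, `Intensity`, or `METs` to see whether each metric shows the same late-day pattern.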

# Summarizing Findings for Recommendations

It is clear that smart devices have a place in people's fitness routines and everyday lives. With the ability to track activity, sleep, stress, and time, smart devices have an immense impact on how people approach fitness and their well-being. Even so, there is still a lot of untapped potential: although smart devices help keep people motivated, there are still aspects of them that people do not use or know about.

This analysis will hopefully bring to light some important observations that help the Bellabeat marketing team focus on the aspects of their smart devices that will make them more marketable to consumers.

### Key Findings
1. Participants were most active between 10:00am and 7:00pm, with most of the activity coming after work hours (5:00pm-7:00pm). Bellabeat can use this by having the Leaf or app remind the user to walk or run.
2. On average, participants were most active on Monday, Tuesday, and Saturday, which is not good if they do only the bare minimum of exercise for the rest of the week. The Bellabeat app could give the user a weekly summary of their total activity by weekday to show their activity levels. This can help them make adjustments as needed.
3. Alongside their activity levels, participants' sleep data showed that they did not get the recommended 6-8 hours of sleep on most days of the week. The Bellabeat app can send reminders to users to go to bed in order to get enough rest.
4. Going further into sleep, the participants' total sleep records were concerning. Participants with fewer sleep records were more active, while those with more sleep records slept in more, leading to more sedentary minutes. The Bellabeat app can give an overall view of how a user sleeps each night, showing when they were in deep sleep or awake, so that the user can adjust their sleep schedule.
5. Participants took an average of 7638 steps per day, which is well below the recommended 10,000 steps per day. Although steps taken is not an overall indicator of health, it can help users keep an active lifestyle. The Bellabeat app or Leaf can therefore send reminders to users to get up and move around.
6. Most of the participants performed more light exercise than rigorous exercise. The Bellabeat marketing team can target these users by presenting the Bellabeat app or Leaf as a great device for beginners starting their fitness journey.
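
The step-goal comparison in finding 5 can be expressed in a few lines. This is a minimal sketch on made-up numbers, not the original computation: `toyDaily` is a hypothetical stand-in for `dailyAct`, and `TotalSteps` is assumed to be the daily step column.

```r
# Hypothetical toy data; values are illustrative only
toyDaily <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(12000, 8000, 4000, 6000)
)

stepGoal <- 10000  # commonly cited daily step target used in the findings

# Average daily steps across all participant-days
avgSteps <- mean(toyDaily$TotalSteps)

# Share of participant-days that met or exceeded the goal
shareMeetingGoal <- mean(toyDaily$TotalSteps >= stepGoal)

cat("Average steps:", avgSteps, "\n")
cat("Share of days meeting goal:", shareMeetingGoal, "\n")
```

Reporting the share of days that meet the goal alongside the average gives a fuller picture, since a few very active days can pull the mean up.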