In [None]:
# This R environment comes with many helpful analytics packages installed
# It is defined by the kaggle/rstats Docker image: https://github.com/kaggle/docker-rstats
# For example, here's a helpful package to load

library(tidyverse) # metapackage of all tidyverse packages

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

list.files(path = "../input")

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Loading the libraries

In [None]:
library(tidyverse)
library(lubridate)
library(reshape2)

# Loading the dataset

In [None]:
get_file_path <- function(file_name){
    base_path <- '/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16'
    return (paste(base_path, file_name, sep="/"))
}

In [None]:
daily_activity_df <- read_csv(get_file_path('dailyActivity_merged.csv'))
hourly_calories_df <- read_csv(get_file_path('hourlyCalories_merged.csv'))
hourly_intensities_df <- read_csv(get_file_path('hourlyIntensities_merged.csv'))
daily_sleep_df <- read_csv(get_file_path('sleepDay_merged.csv'))
weight_info_df <- read_csv(get_file_path('weightLogInfo_merged.csv'))

In [None]:
head(daily_activity_df)

- Total steps, calories and intensities are already merged into **dailyActivity_merged.csv**, so we won't need **dailyCalories_merged.csv, dailyIntensities_merged.csv, dailySteps_merged.csv**.

# Data Preparation

#### Daily activity dataset

In [None]:
daily_activity_df$TotalSteps <- as.integer(daily_activity_df$TotalSteps)
daily_activity_df$VeryActiveMinutes <- as.integer(daily_activity_df$VeryActiveMinutes)
daily_activity_df$FairlyActiveMinutes <- as.integer(daily_activity_df$FairlyActiveMinutes)
daily_activity_df$LightlyActiveMinutes <- as.integer(daily_activity_df$LightlyActiveMinutes)
daily_activity_df$SedentaryMinutes <- as.integer(daily_activity_df$SedentaryMinutes)
daily_activity_df$Calories <- as.integer(daily_activity_df$Calories)
daily_activity_df$ActivityDate <- parse_datetime(daily_activity_df$ActivityDate, format = '%m/%d/%Y')

#### Hourly calories dataset

In [None]:
hourly_calories_df$ActivityHour <- parse_datetime(hourly_calories_df$ActivityHour, 
                                               format = '%m/%d/%Y %I:%M:%S %p')
hourly_calories_df$Calories <- as.integer(hourly_calories_df$Calories)

#### Hourly Intensity dataset

In [None]:
hourly_intensities_df$ActivityHour <- parse_datetime(hourly_intensities_df$ActivityHour, 
                                                             format='%m/%d/%Y %I:%M:%S %p')
hourly_intensities_df$TotalIntensity <- as.integer(hourly_intensities_df$TotalIntensity)

#### Sleep day dataset

In [None]:
daily_sleep_df$SleepDay <- parse_datetime(daily_sleep_df$SleepDay, 
                                          format="%m/%d/%Y %I:%M:%S %p")
daily_sleep_df$TotalSleepRecords <- as.integer(daily_sleep_df$TotalSleepRecords)
daily_sleep_df$TotalMinutesAsleep <- as.integer(daily_sleep_df$TotalMinutesAsleep)
daily_sleep_df$TotalTimeInBed <- as.integer(daily_sleep_df$TotalTimeInBed)

#### Weight info dataset

In [None]:
weight_info_df$Date <- parse_datetime(weight_info_df$Date, 
                                      format='%m/%d/%Y %I:%M:%S %p')

### Checking for NA values

In [None]:
colSums(is.na(daily_activity_df))
print("----------")
colSums(is.na(daily_sleep_df))
print("----------")
colSums(is.na(hourly_calories_df))
print("----------")
colSums(is.na(hourly_intensities_df))

- No NA values in the four datasets checked above.
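
The check above can also be written once and applied to any number of data frames, including ones not covered in the cell above such as `weight_info_df`. The following is a minimal sketch on made-up data; the toy data frames and the `Fat` column are illustrative stand-ins, not values from the Fitbit files.

```r
# Toy stand-ins for the notebook's data frames (all values are made up).
datasets <- list(
  daily_activity = data.frame(Id = c(1, 2), TotalSteps = c(5000L, 7000L)),
  weight_info    = data.frame(Id = c(1, 2), Fat = c(NA, 22))
)

# Total number of NA cells in each data frame.
na_totals <- sapply(datasets, function(df) sum(is.na(df)))
print(na_totals)

# Per-column NA counts, only for the data frames that contain NAs.
na_by_column <- lapply(datasets[na_totals > 0],
                       function(df) colSums(is.na(df)))
print(na_by_column)
```

With the real data, the list would hold the data frames loaded earlier in the notebook instead of the toy ones.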

### Number of users in each dataset

In [None]:
print(paste("Number of users in daily activity dataset:",
            length(unique(daily_activity_df$Id))))

print(paste("Number of users in daily sleep dataset:",
            length(unique(daily_sleep_df$Id))))

print(paste("Number of users in hourly calories dataset:",
            length(unique(hourly_calories_df$Id))))

print(paste("Number of users in hourly intensities dataset:",
            length(unique(hourly_intensities_df$Id))))

print(paste("Number of users in weight info dataset:",
            length(unique(weight_info_df$Id))))

- Every dataset except the **weight info dataset** has a significant number of users.
- Let's set the weight info dataset aside for now.
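
The decision to drop a dataset can be made explicit by counting users per dataset in one table and comparing against a cut-off. This is a sketch on made-up Ids; `min_users` is a hypothetical threshold chosen for the example, not a value used in this analysis.

```r
library(dplyr)

# Toy stand-ins (made-up Ids) for the notebook's data frames.
datasets <- list(
  daily_activity = data.frame(Id = c(1, 1, 2, 3, 4)),
  weight_info    = data.frame(Id = c(1, 1, 2))
)

min_users <- 3  # hypothetical cut-off, not taken from the analysis

user_counts <- data.frame(
  dataset = names(datasets),
  n_users = unname(sapply(datasets, function(df) n_distinct(df$Id)))
) %>%
  mutate(enough_users = n_users >= min_users)

print(user_counts)
```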

# Analysis

#### Data Summary

In [None]:
cat("Daily activity----")
daily_activity_df %>% 
select(TotalSteps, TotalDistance, SedentaryMinutes, Calories) %>% 
summary()

cat("\nDaily sleep----")
daily_sleep_df %>% 
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()

cat("\nHourly calories----")
hourly_calories_df %>% select(Calories) %>% summary()

cat("\nHourly intensities----")
hourly_intensities_df %>% select(TotalIntensity, AverageIntensity) %>% summary()

- Average daily steps are **7638**. According to a [2011 study published by BioMed Central](https://ijbnpa.biomedcentral.com/articles/10.1186/1479-5868-8-79), **10000 steps** is a good target for an adult to stay active.

- Sedentary time averages approx **991 mins (16.5 hours)** per day, which leaves room for improvement.

- Bellabeat could send users an activity reminder after every **45 minutes** of sedentary time.
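
To put the sedentary figure in context, the arithmetic can be sketched as follows. It only uses the 991-minute average reported above; the variable names are introduced here for illustration.

```r
sedentary_mins <- 991       # average sedentary minutes per day (from the summary)
mins_per_day   <- 24 * 60   # 1440 minutes in a day

sedentary_hours <- sedentary_mins / 60
sedentary_share <- sedentary_mins / mins_per_day

round(sedentary_hours, 1)        # 16.5 hours
round(100 * sedentary_share, 1)  # 68.8 percent of the day
```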

In [None]:
# sleeping time
daily_sleep_df %>% 
group_by(Id) %>% 
summarize(avg_sleep_time = mean(TotalMinutesAsleep), 
          avg_time_in_bed = mean(TotalTimeInBed)) %>%
select(avg_sleep_time, avg_time_in_bed) %>%
summary()

- Average time in bed each day is **420 mins (7 hours)** and average amount of sleep per day is **378 mins (6.3 hours)**

- According to [CDC](https://www.cdc.gov/sleep/about_sleep/how_much_sleep.html), the recommended amount of sleep for age **18-60 yrs** is **7 hours or more**. 

- Let's check for any correlation between time in bed and time asleep.

In [None]:
# Checking for relation between Time in bed and time asleep
corr_mat <- daily_sleep_df %>% select(TotalMinutesAsleep, TotalTimeInBed) %>% cor()
corr <- corr_mat[1,2]
print(paste("Correlation between TotalMinutesAsleep and TotalTimeInBed:", corr))

# plot
daily_sleep_df %>% ggplot(aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) +
geom_point() + 
labs(title = "Time in Bed Vs Time Asleep",
     x = "Total minutes asleep",
     y="Total time in bed (in mins)") +
scale_y_continuous(breaks=seq(0,1000,100))

- Time in bed and time asleep appear to be highly correlated.

- Bellabeat could help users improve sleep quality and duration with **Personalized Time-to-Sleep (TTS) Notifications**.
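
Beyond the correlation, the gap between time in bed and time asleep can be summarized as a ratio per user. The sketch below uses made-up rows with the same column names as `daily_sleep_df`; the name `sleep_efficiency` is introduced here for illustration and is not a metric computed elsewhere in this notebook.

```r
library(dplyr)

# Made-up rows with the same column names as daily_sleep_df.
sleep_toy <- data.frame(
  Id                 = c(1, 1, 2),
  TotalMinutesAsleep = c(360, 420, 300),
  TotalTimeInBed     = c(400, 440, 400)
)

# Share of time in bed actually spent asleep, averaged per user.
efficiency_by_user <- sleep_toy %>%
  mutate(sleep_efficiency = TotalMinutesAsleep / TotalTimeInBed) %>%
  group_by(Id) %>%
  summarize(avg_efficiency = mean(sleep_efficiency))

print(efficiency_by_user)

# With the averages reported above: 378 / 420 = 0.9
```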

In [None]:
# Activity type and amount of time spent
daily_activity_df %>% 
summarize(avg_veryActiveMinutes = mean(VeryActiveMinutes),
          avg_FairlyActiveMinutes = mean(FairlyActiveMinutes),
          avg_LightlyActiveMinutes = mean(LightlyActiveMinutes)) %>% 
melt() %>% 
ggplot(aes(x=variable, y = value)) + 
geom_col() + 
labs(x = "Type of Activity",
     y = "Average amount of Time (in mins)", 
     title="Activity vs Time") +
scale_x_discrete(labels=c('Very Active', 
                          'Fairly Active',
                          'Lightly Active')) +
scale_y_continuous(breaks=seq(0, 200, 25))

- On average, users spend notably more time in **"Lightly Active"** activities than in **"Very Active"** or **"Fairly Active"** ones.

- Bellabeat could recommend a wider variety of activities to help users spread their time more evenly across these activity levels.

In [None]:
# correlation matrix
daily_activity_df %>% 
select(TotalSteps, TotalDistance,
       VeryActiveMinutes, FairlyActiveMinutes, 
       LightlyActiveMinutes, SedentaryMinutes, 
       Calories) %>% 
cor()

- **High correlation** between total steps and total distance, as expected.
- Calories burned is highly correlated with distance and steps.
- **VeryActiveMinutes may influence calories burned.**
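
Reading the strongest relationships off a full correlation matrix by eye is error-prone, so one option is to flatten it into a ranked list of pairs. This is a sketch on randomly generated data; only the column names are borrowed from `daily_activity_df`, and the coefficients it prints say nothing about the real dataset.

```r
set.seed(1)

# Made-up data; only the column names come from daily_activity_df.
steps <- round(runif(50, 1000, 15000))
toy <- data.frame(
  TotalSteps       = steps,
  TotalDistance    = steps * 0.0007 + rnorm(50, sd = 0.3),
  SedentaryMinutes = round(runif(50, 600, 1200)),
  Calories         = 1500 + steps * 0.05 + rnorm(50, sd = 150)
)

corr_mat <- cor(toy)

# Keep each pair once by blanking the diagonal and the lower triangle.
corr_mat[lower.tri(corr_mat, diag = TRUE)] <- NA

# Flatten to one row per pair and sort by absolute correlation.
corr_pairs <- as.data.frame(as.table(corr_mat), responseName = "correlation")
corr_pairs <- corr_pairs[!is.na(corr_pairs$correlation), ]
ranked_pairs <- corr_pairs[order(-abs(corr_pairs$correlation)), ]

print(ranked_pairs)
```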

In light of these findings, we can offer tailored recommendations to **Bellabeat users** seeking to **lose weight**:

* Highly Active Exercise Suggestions
* Varied Workouts for Diversity
* Guidance on Duration and Intensity
* Tracking Progress

In [None]:
# merging hourly calories and intensities
merged_calories_intensities_df <- hourly_calories_df %>% 
left_join(hourly_intensities_df, 
          by = c('Id','ActivityHour'))

In [None]:
merged_calories_intensities_df %>% 
group_by(hour_of_the_day = hour(ActivityHour)) %>% 
summarize(avg_calBurned = mean(Calories),
          avg_TotalIntensity = mean(TotalIntensity)) %>%
ggplot() +
geom_line(aes(x=hour_of_the_day, y=avg_calBurned, 
              color = 'Average Calories Burned'))+
geom_line(aes(x=hour_of_the_day, y=avg_TotalIntensity, 
              color = 'Average Total Intensity')) +
scale_y_continuous(name = "Average Calories Burned", 
                   sec.axis = sec_axis(~./1, name = "Average Total Intensity")) +
scale_x_continuous(breaks=seq(0, 23, 1)) +
theme(legend.position = "bottom")

- Users are mostly active from 9am to 7pm.

- Total intensity and calories burned follow a similar curve over the day, so let's check how strongly they are correlated.

In [None]:
merged_calories_intensities_df %>% select(Calories, TotalIntensity) %>% cor()

merged_calories_intensities_df %>% 
select(Calories, TotalIntensity) %>% 
ggplot(aes(x = Calories, y = TotalIntensity)) + 
geom_point() + 
geom_smooth()

The high correlation between "TotalIntensity" and "calories burned" makes sense! Essentially, when workouts get more intense, you tend to burn more calories.

# Does activity affect the amount of sleep?

In [None]:
daily_sleep_activity_df <- daily_activity_df %>% 
rename(SleepDay = ActivityDate) %>% 
inner_join(daily_sleep_df, by = c('Id', 'SleepDay'))

In [None]:
head(daily_sleep_activity_df)

In [None]:
daily_sleep_activity_df %>% mutate(TotalActiveMinutes = (VeryActiveMinutes + 
                                                   FairlyActiveMinutes +
                                                  LightlyActiveMinutes)) %>%
select(TotalActiveMinutes, TotalSteps,
       SedentaryMinutes, Calories,
       TotalMinutesAsleep, TotalTimeInBed) %>% cor()


- TotalTimeInBed and SedentaryMinutes show a notable negative correlation. 

- Bellabeat users could be advised to exercise more often to support a better sleep pattern, although this link needs further analysis.
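
One possible next step for that further analysis is to test whether a correlation like this is distinguishable from zero, using `cor.test()` from base R's `stats` package. The sketch below runs on randomly generated data with column names mirroring `daily_sleep_activity_df`; the numbers it produces are not results from the Fitbit data.

```r
set.seed(42)

# Made-up data; column names mirror daily_sleep_activity_df.
n <- 40
time_in_bed <- round(rnorm(n, mean = 450, sd = 60))
toy <- data.frame(
  TotalTimeInBed   = time_in_bed,
  SedentaryMinutes = round(1200 - 0.8 * time_in_bed + rnorm(n, sd = 40))
)

# Pearson correlation test (the default method of cor.test).
result <- cor.test(toy$SedentaryMinutes, toy$TotalTimeInBed)

result$estimate  # correlation coefficient
result$conf.int  # 95% confidence interval
result$p.value   # p-value under the null hypothesis of zero correlation
```

A significant coefficient still describes association only, so it would not show on its own that less sedentary time causes more time in bed.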

# Summary


- Users in this dataset average about 7,638 steps per day, below the 10,000-step target, suggesting room for increased activity.

- Correlations link workout intensity with calorie burn, and sedentary time with time in bed. Both relationships need further analysis.

- Recommendations tailored for Bellabeat users: Encourage diverse exercises, aim for 10k steps, introduce sleep-focused TTS notifications, and offer personalized workout guidance.

- Address sedentary habits with timely activity prompts.

- Emphasize active periods between 9 am and 7 pm to align with users' schedules.

- Track progress to foster balanced activity, better sleep, and reduced sedentary behavior, enhancing the well-being of Bellabeat users.