# Sleep Health Data Analysis

The Sleep Health and Lifestyle Dataset, published on Kaggle, records factors that may influence sleep patterns and overall lifestyle. It contains 374 rows and 13 columns covering sleep duration, sleep quality, physical activity level, stress level, BMI category, cardiovascular indicators (blood pressure and resting heart rate), daily steps, and the presence of sleep disorders.

This report explores the dataset and interprets its key patterns using descriptive statistics and data visualization.

## Initialization and Preparing Data

In [59]:
# loading all the libraries
import pandas as pd
from matplotlib import pyplot as plt
from scipy import stats as st
import numpy as np

In [69]:
# Load the dataset (a comma is already read_csv's default separator)
sleep = pd.read_csv('Sleep_health_and_lifestyle_dataset.csv')
sleep.head()

|   | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |


According to the documentation:  
  
`'Person ID'` — Unique identifier for each individual  
`'Age'` — Age of the person in years  
`'Gender'` — Male or Female  
`'Occupation'` — The occupation or profession of the person  
`'Sleep Duration'` — The number of hours the person sleeps per day  
`'Quality of Sleep'` — Rating of the quality of sleep (1 to 10)  
`'Physical Activity Level'` — The number of minutes the person engages in physical activity daily  
`'Stress Level'` — Rating of the person's stress level (1 to 10)  
`'BMI Category'` — The BMI category of the person (e.g., Underweight, Normal, Overweight)  
`'Blood Pressure'` — The blood pressure measurement of the person, recorded as systolic/diastolic (e.g., `126/83`)  
`'Heart Rate'` — The resting heart rate of the person in beats per minute  
`'Daily Steps'` — The number of steps the person takes per day  
`'Sleep Disorder'` — The presence/absence of a sleep disorder (None, Insomnia, Sleep Apnea)   
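Because `Blood Pressure` is stored as text such as `126/83`, it cannot be summarized numerically as-is. A minimal sketch, assuming every value follows the `systolic/diastolic` pattern seen in the first rows, splits it into two integer columns. The column names `Systolic` and `Diastolic` are hypothetical additions, not part of the dataset, and the small frame below stands in for the full `sleep` DataFrame.

```python
import pandas as pd

# A few Blood Pressure values copied from the head() output (stand-in for `sleep`)
sample = pd.DataFrame({
    'Person ID': [1, 2, 4],
    'Blood Pressure': ['126/83', '125/80', '140/90'],
})

# Split "systolic/diastolic" strings into two columns, then convert text to integers
bp_parts = sample['Blood Pressure'].str.split('/', expand=True).astype(int)
sample['Systolic'] = bp_parts[0]   # hypothetical column name
sample['Diastolic'] = bp_parts[1]  # hypothetical column name

print(sample)
```

If any value deviated from the pattern (for example a missing slash), `astype(int)` would raise an error, which is a useful early signal of malformed entries.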

In [67]:
sleep.duplicated().sum()

0
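`duplicated()` compares entire rows, and because `Person ID` is unique, no two rows can match on it. The `head()` output shows that Person IDs 2 and 3 (and 4 and 5) are identical in every other column, so a check that ignores the identifier may be informative. The documentation does not say whether such rows are repeat profiles or entry artifacts, so this sketch only counts them, using a subset of the columns from the rows shown above.

```python
import pandas as pd

# A subset of columns from the five rows shown by sleep.head()
sample = pd.DataFrame({
    'Person ID': [1, 2, 3, 4, 5],
    'Occupation': ['Software Engineer', 'Doctor', 'Doctor',
                   'Sales Representative', 'Sales Representative'],
    'Sleep Duration': [6.1, 6.2, 6.2, 5.9, 5.9],
    'Stress Level': [6, 8, 8, 8, 8],
    'Blood Pressure': ['126/83', '125/80', '125/80', '140/90', '140/90'],
})

# Full-row check: Person ID differs on every row, so nothing is flagged
full_row_dupes = sample.duplicated().sum()

# Same check with the identifier excluded: repeated profiles are flagged
profile_dupes = sample.drop(columns='Person ID').duplicated().sum()

print(full_row_dupes, profile_dupes)
```

On the full dataset the equivalent check is `sleep.drop(columns='Person ID').duplicated().sum()`.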

In [66]:
sleep.isna().sum()

Person ID                    0
Gender                       0
Age                          0
Occupation                   0
Sleep Duration               0
Quality of Sleep             0
Physical Activity Level      0
Stress Level                 0
BMI Category                 0
Blood Pressure               0
Heart Rate                   0
Daily Steps                  0
Sleep Disorder             219
dtype: int64
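
Counts are easier to judge as proportions. For the full dataset, 219 missing values out of 374 rows means roughly 58.6% of `Sleep Disorder` is empty. A short sketch on two columns from the first five rows shows `isna().mean()`, which averages the True/False mask per column:

```python
import numpy as np
import pandas as pd

# Two columns from the five rows shown by sleep.head(); NaN marks a missing Sleep Disorder
sample = pd.DataFrame({
    'Gender': ['Male'] * 5,
    'Sleep Disorder': [np.nan, np.nan, np.nan, 'Sleep Apnea', 'Sleep Apnea'],
})

# Mean of the boolean mask = share of missing values, shown as a percentage
missing_pct = sample.isna().mean().mul(100).round(1)
print(missing_pct)

# The same arithmetic with the counts reported for the full dataset
full_pct = round(219 / 374 * 100, 1)
print(full_pct)
```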

As shown above, there are no duplicated rows, but the `Sleep Disorder` variable has 219 missing values. A missing value here means the person has no sleep disorder (the documentation lists None as one of the categories), so I replace the NaN values with the string "None" for clarity.

In [73]:
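# Missing values mean "no sleep disorder"; assign the result back rather than using
# inplace=True on a single column, which newer pandas versions warn about
sleep['Sleep Disorder'] = sleep['Sleep Disorder'].fillna('None')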
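Since the documentation lists `None` as a `Sleep Disorder` category, it is worth checking whether the raw CSV stores the literal text `None`. If it does, `pd.read_csv` would turn it into NaN on load, because recent pandas releases include the string `'None'` among the default NA strings (see the `na_values` parameter of `read_csv`). A minimal sketch, using a hypothetical two-row CSV string rather than the real file, shows how `keep_default_na=False` keeps the text as-is. Note that this option also stops empty fields from being read as NaN.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; the real file is Sleep_health_and_lifestyle_dataset.csv
csv_text = (
    "Person ID,Sleep Disorder\n"
    "1,None\n"
    "4,Sleep Apnea\n"
)

# Default parsing: whether "None" becomes NaN depends on the pandas version's NA list
default = pd.read_csv(io.StringIO(csv_text))
print(default['Sleep Disorder'].isna().tolist())

# keep_default_na=False disables the built-in NA strings, so "None" stays a string
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(kept['Sleep Disorder'].tolist())
```

With the full file, `pd.read_csv('Sleep_health_and_lifestyle_dataset.csv', keep_default_na=False)` would make the `fillna` step unnecessary, provided the file really contains `None` rather than empty cells.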

In [72]:
sleep.head()

|   | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | None |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | None |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | None |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |


In [63]:
sleep.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           374 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB
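With the dtypes confirmed, the descriptive statistics the report aims for can start from `describe()` for the numeric columns and `value_counts()` for the text (`object`) columns. A minimal sketch on a few columns from the five rows shown earlier; the real analysis would run the same calls on `sleep`.

```python
import pandas as pd

# A few columns from the five rows shown by sleep.head()
sample = pd.DataFrame({
    'Sleep Duration': [6.1, 6.2, 6.2, 5.9, 5.9],
    'Quality of Sleep': [6, 6, 6, 4, 4],
    'Stress Level': [6, 8, 8, 8, 8],
    'BMI Category': ['Overweight', 'Normal', 'Normal', 'Obese', 'Obese'],
})

# count, mean, std, min, quartiles and max; object columns are skipped by default
numeric_summary = sample.describe()
print(numeric_summary)

# Frequency of each category in a text column
bmi_counts = sample['BMI Category'].value_counts()
print(bmi_counts)
```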


In [None]:
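# Average sleep duration for each sleep-quality rating; this dataset has no 'Sleep Quality'
# or 'Productivity Score' column, so 'Quality of Sleep' and 'Sleep Duration' are used
sleep.groupby('Quality of Sleep')['Sleep Duration'].mean()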
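Grouping alone only returns a `GroupBy` object; numbers appear once an aggregation is applied. A sketch, assuming the goal is to compare averages across sleep-quality ratings, uses pandas' named aggregation to summarize several columns in one call. The output names `mean_duration`, `mean_stress` and `n` are illustrative, and the small frame stands in for `sleep`.

```python
import pandas as pd

# Columns from the five rows shown by sleep.head()
sample = pd.DataFrame({
    'Quality of Sleep': [6, 6, 6, 4, 4],
    'Sleep Duration': [6.1, 6.2, 6.2, 5.9, 5.9],
    'Stress Level': [6, 8, 8, 8, 8],
})

# Named aggregation: output_name=(input_column, aggregation)
by_quality = sample.groupby('Quality of Sleep').agg(
    mean_duration=('Sleep Duration', 'mean'),
    mean_stress=('Stress Level', 'mean'),
    n=('Sleep Duration', 'count'),
)
print(by_quality)
```

Keeping the group size `n` next to each mean makes it easier to spot ratings that rest on only a handful of people.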