# Sleep Study Dataset Analysis

The goal of this analysis is to understand human sleep behaviour using the "sleepstudy.csv" dataset. The dataset records each participant's sleep duration, sleep efficiency, and REM, deep, and light sleep percentages, as well as lifestyle factors such as caffeine and alcohol consumption, exercise frequency, and smoking status. Using this data, we aim to understand how demographic factors and daily habits influence sleep quality and duration.

This analysis will include statistics, visualizations, and correlations to show which factors are most strongly associated with better or worse sleep. Understanding the patterns in the dataset can help identify healthier habits for sleep and overall lifestyle.

In [3]:
#Load Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
#Load csv file and check if it loaded correctly
df = pd.read_csv("sleepstudy.csv")
df.head()

Unnamed: 0,ID,Age,Gender,Bedtime,WakeupTime,SleepDuration,SleepEfficiency,REMSleepPercentage,DeepSleepPercentage,LightSleepPercentage,Awakenings,CaffeineConsumption,AlcoholConsumption,SmokingStatus,ExerciseFrequency
0,1,80,Female,2025-09-30 7:32,49:16.7,6.283241,0.57,15,35,50,0,25.0,1,Yes,1
1,2,24,Male,2025-06-29 20:59,09:10.2,7.155613,0.91,29,68,3,4,50.0,0,No,2
2,3,37,Male,2025-12-24 21:28,31:34.3,6.050627,0.58,15,35,50,3,50.0,0,No,5
3,4,68,Female,2025-02-22 0:25,26:37.0,7.017791,0.88,28,44,28,1,50.0,0,Yes,4
4,5,58,Male,2025-09-02 12:31,17:46.3,8.764793,0.95,28,40,32,4,25.0,4,No,4


In [5]:
#Summary Statistics
df.describe(include = "all")

Unnamed: 0,ID,Age,Gender,Bedtime,WakeupTime,SleepDuration,SleepEfficiency,REMSleepPercentage,DeepSleepPercentage,LightSleepPercentage,Awakenings,CaffeineConsumption,AlcoholConsumption,SmokingStatus,ExerciseFrequency
count,1000.0,1000.0,1000,1000,1000,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,744.0,1000.0,1000,1000.0
unique,,,2,1000,990,,,,,,,,,2,
top,,,Male,2025-09-30 7:32,42:25.2,,,,,,,,,Yes,
freq,,,513,1,2,,,,,,,,,513,
mean,500.5,50.125,,,,7.027443,0.72015,22.46,47.357,30.183,2.539,25.168011,2.56,,2.565
std,288.819436,18.076397,,,,1.135365,0.128139,4.589059,13.097293,13.788968,1.726691,20.446036,1.746267,,1.712515
min,1.0,18.0,,,,5.002335,0.5,15.0,25.0,1.0,0.0,0.0,0.0,,0.0
25%,250.75,34.0,,,,6.07698,0.61,18.0,36.0,18.0,1.0,0.0,1.0,,1.0
50%,500.5,51.0,,,,6.981505,0.72,22.0,47.0,30.0,3.0,25.0,3.0,,3.0
75%,750.25,65.0,,,,8.039587,0.83,27.0,59.0,42.0,4.0,50.0,4.0,,4.0


In [6]:
#Count missing values in each column
df.isnull().sum()

ID                        0
Age                       0
Gender                    0
Bedtime                   0
WakeupTime                0
SleepDuration             0
SleepEfficiency           0
REMSleepPercentage        0
DeepSleepPercentage       0
LightSleepPercentage      0
Awakenings                0
CaffeineConsumption     256
AlcoholConsumption        0
SmokingStatus             0
ExerciseFrequency         0
dtype: int64
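
The raw counts above are easier to judge as a share of all rows. The following is a minimal sketch, assuming a hypothetical helper `missing_report` (not part of the original notebook) and a small made-up toy frame rather than the real data:

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, largest first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)

# Toy frame for illustration only (made-up values, not rows from sleepstudy.csv)
toy = pd.DataFrame({
    "SleepDuration": [6.3, 7.2, 6.1, 7.0],
    "CaffeineConsumption": [25.0, None, 50.0, None],
})
report = missing_report(toy)
print(report)
```

Calling `missing_report(df)` on the full dataset should reproduce the counts above. For CaffeineConsumption, 256 missing out of 1000 rows is 25.6%.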

In [7]:
#Shape of dataset to help with Basic Summary
df.shape

(1000, 15)

In [15]:
#Figuring out the percentage of smokers for descriptive insights

smoker_percentage = (df['SmokingStatus'] == 'Yes').mean() * 100
print(round(smoker_percentage, 2))


51.3


## Basic Summary

This dataset contains information on participants' sleep patterns and lifestyle habits. Columns include SleepDuration, SleepEfficiency, REMSleepPercentage, DeepSleepPercentage, and LightSleepPercentage; lifestyle habits such as CaffeineConsumption, AlcoholConsumption, SmokingStatus, and ExerciseFrequency; and the demographic variables Age and Gender. There are 1000 rows and 15 columns, with 256 entries (25.6%) in "CaffeineConsumption" missing.

## Descriptive Statistics

### Age

Participants range from 18 to 80 years old, with a median age of 51. The IQR shows that the middle half of participants are between 34 and 65 years old.

### Sleep Duration

On average, participants sleep about 7 hours per night (mean = 7.027 hours), with a range of 5 to 9 hours across the whole dataset. The middle half of participants sleep between roughly 6 and 8 hours (the IQR, 6.08 to 8.04 hours).

### Sleep Efficiency

Average sleep efficiency is around 72%, with the middle half of participants between 61% and 83% (the 25th and 75th percentiles, respectively). Some participants, however, have very low sleep efficiency (50%), while others reach very high efficiency (95%).
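
The quartile figures quoted for Age, Sleep Duration, and Sleep Efficiency can be collected in one table. This is a sketch only: `quartile_summary` is a hypothetical helper, and the sample frame reuses the five rows shown by `df.head()` rather than the full dataset.

```python
import pandas as pd

def quartile_summary(frame, columns):
    """Return min, 25th percentile, median, 75th percentile, max and IQR per column."""
    summary = frame[columns].quantile([0, 0.25, 0.5, 0.75, 1]).T
    summary.columns = ["min", "q1", "median", "q3", "max"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary

# The five rows displayed by df.head() earlier in the notebook
sample = pd.DataFrame({
    "Age": [80, 24, 37, 68, 58],
    "SleepDuration": [6.283241, 7.155613, 6.050627, 7.017791, 8.764793],
    "SleepEfficiency": [0.57, 0.91, 0.58, 0.88, 0.95],
})
summary = quartile_summary(sample, ["Age", "SleepDuration", "SleepEfficiency"])
print(summary.round(2))
```

Passing the full `df` instead of `sample` should give the same quartiles as `df.describe()`, with the IQR added as an explicit column.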

### Sleep Stages

REM Sleep (~22% of total sleep):

REM (Rapid Eye Movement) sleep is the stage of sleep where dreaming occurs. It is characterized by high brain activity, an increased heart rate, and temporary paralysis of the major muscles. REM sleep is crucial for high-quality sleep, and the optimal amount is 20-25% of total sleep duration (more than that is not better). Participants spend roughly a fifth of their sleep in REM, which is normal.

Source: National Institutes of Health (https://www.nhlbi.nih.gov/health/sleep/stages-of-sleep and https://pmc.ncbi.nlm.nih.gov/articles/PMC2847051/)



Deep Sleep (~47% of total sleep):

Deep sleep is the most restorative stage of sleep. It helps the body recover, strengthens the immune system, and supports muscle and tissue repair. In this dataset, deep sleep makes up the largest portion of sleep. This share suggests participants are getting enough restorative sleep even when total sleep duration varies.

Source: Sleep Foundation (https://www.sleepfoundation.org/stages-of-sleep/deep-sleep)

Light Sleep (~30% of total sleep):

Light sleep is the stage between wakefulness and deep sleep. It is easier to wake from and plays a role in physical recovery and memory consolidation, but it is less restorative than deep sleep. Participants spend about 30% of their sleep in this stage, which is normal.

Source: CLMSleep (https://www.clmsleep.com/light-sleep/)
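
Because the three stages are reported as percentages of total sleep, a quick sanity check is that they add up to 100 for each participant. A minimal sketch, assuming a hypothetical helper `stage_check` and using the five rows shown by `df.head()`:

```python
import pandas as pd

STAGE_COLUMNS = ["REMSleepPercentage", "DeepSleepPercentage", "LightSleepPercentage"]

def stage_check(frame):
    """Return the mean share per stage and whether every row's stages sum to 100."""
    means = frame[STAGE_COLUMNS].mean()
    sums_to_100 = bool((frame[STAGE_COLUMNS].sum(axis=1) == 100).all())
    return means, sums_to_100

# Stage percentages from the five rows displayed by df.head()
sample = pd.DataFrame({
    "REMSleepPercentage": [15, 29, 15, 28, 28],
    "DeepSleepPercentage": [35, 68, 35, 44, 40],
    "LightSleepPercentage": [50, 3, 50, 28, 32],
})
means, consistent = stage_check(sample)
print(means)
print("All rows sum to 100:", consistent)
```

The dataset-wide means from `df.describe()` are consistent with this: 22.46 + 47.357 + 30.183 = 100.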

### Awakenings

Participants experience 0-5 awakenings per night, with a median of 3, suggesting sleep interruptions are common. Poor sleep is also common more broadly: according to Statistics Canada, approximately 2 in 5 Canadians experience sleepless nights.

Source: Statistics Canada (https://www.statcan.gc.ca/o1/en/plus/1653-cant-sleep-count-sheep)
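
The median alone hides how the awakening counts are spread. One way to see the spread is a frequency table, sketched here with a hypothetical helper `awakening_distribution` and the five `Awakenings` values shown by `df.head()`:

```python
import pandas as pd

def awakening_distribution(awakenings):
    """Percentage of participants at each awakening count, ordered by count."""
    shares = awakenings.value_counts(normalize=True).sort_index()
    return (shares * 100).round(1)

# Awakenings values from the five rows displayed by df.head()
sample = pd.Series([0, 4, 3, 1, 4], name="Awakenings")
distribution = awakening_distribution(sample)
print(distribution)
```

On the full dataset the call would be `awakening_distribution(df["Awakenings"])`.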

### Lifestyle Factors

Exercise Frequency: Participants exercise 0-5 days per week (median 3). More frequent exercise is generally linked to better sleep quality.

Caffeine Consumption: Caffeine intake varies widely (0-50 mg/day). Higher caffeine use may reduce sleep duration or efficiency.

Alcohol Consumption: Alcohol consumption ranges from 0 to 5 ounces per day. Higher alcohol intake can interrupt sleep cycles and reduce sleep quality.

Smoking Status: About half of the participants (51.3%) are smokers. Nicotine can negatively affect sleep duration and efficiency.
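
The links described above are general expectations, and they can be checked against the data with correlations. The following is a sketch under stated assumptions, not the method used in the original analysis: `lifestyle_correlations` and the `Smoker` column are hypothetical names, smoking is encoded as 1 for "Yes" and 0 for "No", and Pearson correlation is assumed to be an acceptable first look.

```python
import pandas as pd

def lifestyle_correlations(frame, target="SleepEfficiency"):
    """Pearson correlation of each lifestyle column with the target column."""
    data = frame.copy()
    # Encode smoking as 1 (Yes) / 0 (No) so it can enter the correlation
    data["Smoker"] = (data["SmokingStatus"] == "Yes").astype(int)
    factors = ["CaffeineConsumption", "AlcoholConsumption", "ExerciseFrequency", "Smoker"]
    # Rows with a missing value in a factor are skipped for that factor only
    return data[factors].corrwith(data[target]).sort_values()

# The five rows displayed by df.head(); far too few for real conclusions
sample = pd.DataFrame({
    "CaffeineConsumption": [25.0, 50.0, 50.0, 50.0, 25.0],
    "AlcoholConsumption": [1, 0, 0, 0, 4],
    "SmokingStatus": ["Yes", "No", "No", "Yes", "No"],
    "ExerciseFrequency": [1, 2, 5, 4, 4],
    "SleepEfficiency": [0.57, 0.91, 0.58, 0.88, 0.95],
})
correlations = lifestyle_correlations(sample)
print(correlations.round(2))
```

Running `lifestyle_correlations(df)` on all 1000 rows gives the figures that matter. The same call with `target="SleepDuration"` covers sleep duration. A correlation shows association only, not that a habit causes better or worse sleep.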