# ECON 0150 | Replication Notebook

**Title:** Sleep and Social Media

**Original Authors:** Molz; Thompson

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** How is students' daily sleep affected by social media usage?

**Data Source:** Students Social Media Addiction Survey (705 students)

**Methods:** OLS regression: Sleep_Hours ~ Daily_Usage_Hours

**Main Finding:** Negative relationship between social media usage and sleep hours.

**Course Concepts Used:**
- Simple linear regression
- Scatter plots with regression lines
- Survey data analysis
- Correlation vs. causation

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
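# Load data from course website
from urllib.parse import quote

base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0019/data/'

# The file name contains spaces, so percent-encode it before building the URL
data = pd.read_csv(base_url + quote('Students Social Media Addiction.csv'))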

print(f"Number of students: {len(data)}")
print(f"\nColumns: {data.columns.tolist()}")
data.head()

---
## Step 1 | Data Preparation

In [None]:
# Rename columns for clarity
data = data.rename(columns={
    'Avg_Daily_Usage_Hours': 'Daily_Usage_Hours',
    'Sleep_Hours_Per_Night': 'Sleep_Hours'
})

# Check for missing values
print("Missing values:")
print(data[['Daily_Usage_Hours', 'Sleep_Hours']].isnull().sum())

# Drop missing values if any
data = data.dropna(subset=['Daily_Usage_Hours', 'Sleep_Hours'])
print(f"\nCleaned data: {len(data)} students")

In [None]:
# Key variables
print("\nKey Variables:")
print(data[['Daily_Usage_Hours', 'Sleep_Hours', 'Mental_Health_Score', 'Addicted_Score']].head(10))

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics
print("Summary Statistics:")
print(data[['Daily_Usage_Hours', 'Sleep_Hours', 'Mental_Health_Score', 'Addicted_Score']].describe())

In [None]:
# Correlation
correlation = data['Daily_Usage_Hours'].corr(data['Sleep_Hours'])
print(f"\nCorrelation between social media usage and sleep: {correlation:.4f}")
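
In a simple regression, the correlation and the OLS slope share a sign and are linked by slope = r × sd(y) / sd(x). The sketch below checks that identity on synthetic data; the variable names `usage` and `sleep` and all numbers are assumptions for illustration, not survey values.

```python
import numpy as np

# Synthetic stand-in data (an assumption for illustration, not the survey)
rng = np.random.default_rng(0)
usage = rng.uniform(1, 8, size=200)                    # hours of use per day
sleep = 9 - 0.5 * usage + rng.normal(0, 1, size=200)   # hours of sleep per night

# Pearson correlation between the two variables
r = np.corrcoef(usage, sleep)[0, 1]

# OLS slope from regressing sleep on usage (degree-1 polynomial fit)
slope = np.polyfit(usage, sleep, 1)[0]

# The same slope recovered from the correlation and the two standard deviations
slope_from_r = r * sleep.std(ddof=1) / usage.std(ddof=1)

print(f"correlation:           {r:.4f}")
print(f"OLS slope:             {slope:.4f}")
print(f"r * sd(y) / sd(x):     {slope_from_r:.4f}")
```

The correlation is unit-free, while the slope is in hours of sleep per hour of usage, which is why Step 4 reports the regression as well.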

---
## Step 3 | Visualization

In [None]:
# Distribution of sleep hours
plt.figure(figsize=(10, 6))
sns.histplot(data['Sleep_Hours'], kde=True, bins=15)
plt.title('Distribution of Sleep Hours Per Night')
plt.xlabel('Sleep Hours')
plt.ylabel('Frequency')
plt.axvline(data['Sleep_Hours'].mean(), color='red', linestyle='--', 
            label=f"Mean: {data['Sleep_Hours'].mean():.1f} hours")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Distribution of social media usage
plt.figure(figsize=(10, 6))
sns.histplot(data['Daily_Usage_Hours'], kde=True, bins=15)
plt.title('Distribution of Daily Social Media Usage')
plt.xlabel('Hours Per Day')
plt.ylabel('Frequency')
plt.axvline(data['Daily_Usage_Hours'].mean(), color='red', linestyle='--',
            label=f"Mean: {data['Daily_Usage_Hours'].mean():.1f} hours")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Scatter plot with regression line
plt.figure(figsize=(10, 6))
sns.regplot(x='Daily_Usage_Hours', y='Sleep_Hours', data=data,
            scatter_kws={'s': 50, 'alpha': 0.5})
plt.title('Social Media Usage vs. Sleep Hours')
plt.xlabel('Daily Social Media Usage (Hours)')
plt.ylabel('Sleep Hours Per Night')
plt.grid(True, alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression
model = smf.ols('Sleep_Hours ~ Daily_Usage_Hours', data=data).fit()
print("OLS Regression: Sleep_Hours ~ Daily_Usage_Hours")
print(model.summary())

In [None]:
# Key results
print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: Social media usage is not associated with sleep hours (beta = 0)")
print(f"\nModel Results:")
print(f"  Intercept: {model.params['Intercept']:.2f} hours")
print(f"  Usage coefficient: {model.params['Daily_Usage_Hours']:.4f}")
print(f"  P-value: {model.pvalues['Daily_Usage_Hours']:.6f}")
print(f"  R-squared: {model.rsquared:.3f}")
print("\nInterpretation:")
print("  Each additional hour of daily social media use is associated with")
print(f"  a {model.params['Daily_Usage_Hours']:+.3f} hour change in sleep per night")
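
A point estimate is easier to read alongside its uncertainty and a couple of predicted values. The sketch below shows one way to pull a 95% confidence interval and predictions from a fitted statsmodels OLS model. It runs on synthetic data, and the names `demo`, `demo_model`, and `new_students` are hypothetical so that the notebook's own `data` and `model` are left untouched.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption for illustration, not the survey)
rng = np.random.default_rng(1)
demo = pd.DataFrame({'Daily_Usage_Hours': rng.uniform(1, 8, size=300)})
demo['Sleep_Hours'] = 9 - 0.5 * demo['Daily_Usage_Hours'] + rng.normal(0, 1, size=300)

demo_model = smf.ols('Sleep_Hours ~ Daily_Usage_Hours', data=demo).fit()

# 95% confidence interval for the usage coefficient
ci_low, ci_high = demo_model.conf_int().loc['Daily_Usage_Hours']
print(f"Usage coefficient: {demo_model.params['Daily_Usage_Hours']:.3f}")
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")

# Predicted sleep at two illustrative usage levels (2 and 6 hours per day)
new_students = pd.DataFrame({'Daily_Usage_Hours': [2, 6]})
predicted = demo_model.predict(new_students)
print(predicted.round(2))
```

The same calls should work on the notebook's `model` object, since it is fit with the same formula interface.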

In [None]:
# Additional analysis: add Mental Health Score as a control variable
model_mental = smf.ols('Sleep_Hours ~ Daily_Usage_Hours + Mental_Health_Score', data=data).fit()
print("\nWith Mental Health Score as Control:")
print(model_mental.summary().tables[1])
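
To see what adding a control does, it helps to put the usage coefficient from both models side by side. The sketch below does that on synthetic data in which mental health is correlated with usage and also predicts sleep. The data-generating numbers and the names `sim`, `simple`, and `controlled` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption for illustration, not the survey)
rng = np.random.default_rng(2)
n = 500
usage = rng.uniform(1, 8, size=n)
mental = 8 - 0.6 * usage + rng.normal(0, 1, size=n)            # falls with usage
sleep = 5 - 0.2 * usage + 0.4 * mental + rng.normal(0, 0.5, size=n)
sim = pd.DataFrame({
    'Daily_Usage_Hours': usage,
    'Mental_Health_Score': mental,
    'Sleep_Hours': sleep,
})

simple = smf.ols('Sleep_Hours ~ Daily_Usage_Hours', data=sim).fit()
controlled = smf.ols('Sleep_Hours ~ Daily_Usage_Hours + Mental_Health_Score',
                     data=sim).fit()

# Side-by-side comparison of the usage coefficient and fit
comparison = pd.DataFrame({
    'usage_coef': [simple.params['Daily_Usage_Hours'],
                   controlled.params['Daily_Usage_Hours']],
    'r_squared': [simple.rsquared, controlled.rsquared],
}, index=['usage only', 'usage + mental health'])
print(comparison.round(3))
```

In this simulated setup the usage coefficient shrinks toward zero once mental health is included, because part of the simple-regression slope was picking up the mental health channel. Whether the survey data behave the same way has to be read from the notebook's own output.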

---
## Step 5 | Results Interpretation

### Key Findings

1. **Negative Relationship:** More social media use is associated with less sleep

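2. **Effect Size:** Each additional hour of daily social media use is associated with less sleep; the size of the reduction is the usage coefficient reported in Step 4

3. **Variance Explained:** The R-squared reported in Step 4 is the share of variation in sleep hours accounted for by social media usage alone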

### Possible Mechanisms

1. **Time Displacement:** Hours on social media take away from sleep time
2. **Blue Light:** Screen exposure before bed disrupts sleep quality
3. **Stimulation:** Engaging content keeps minds active before sleep
4. **FOMO:** Fear of missing out may cause late-night checking

### Causation Warning

The negative association has more than one possible explanation:
- **Usage → Sleep:** Social media keeps students awake
- **Sleep → Usage:** Students who can't sleep use social media
- **Third variables:** Stress, anxiety, or lifestyle factors affect both
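
The third-variable case can be illustrated with a small simulation. In the sketch below, a made-up `stress` variable raises usage and lowers sleep, while usage has no direct effect on sleep at all. Every number here is an assumption chosen for illustration; nothing is estimated from the survey.

```python
import numpy as np

# Simulated world: stress drives both variables, usage has NO effect on sleep
rng = np.random.default_rng(3)
n = 1000
stress = rng.normal(0, 1, size=n)
usage = 4 + 1.0 * stress + rng.normal(0, 1, size=n)
sleep = 7 - 0.8 * stress + rng.normal(0, 0.5, size=n)   # usage is absent here

# A naive look still finds a negative relationship
r = np.corrcoef(usage, sleep)[0, 1]
naive_slope = np.polyfit(usage, sleep, 1)[0]

# OLS of sleep on an intercept, usage, and stress
X = np.column_stack([np.ones(n), usage, stress])
coefs, *_ = np.linalg.lstsq(X, sleep, rcond=None)
adjusted_slope = coefs[1]

print(f"correlation(usage, sleep):        {r:.3f}")
print(f"slope ignoring stress:            {naive_slope:.3f}")
print(f"slope after adjusting for stress: {adjusted_slope:.3f}")
```

The naive slope is clearly negative even though usage plays no role in how sleep was generated, and it moves close to zero once stress is included. A negative coefficient in Step 4 is therefore consistent with, but not proof of, a causal effect.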

### Limitations

- Self-reported data (may be inaccurate)
- Cross-sectional (can't establish causation)
- No time-of-day information

---
## Replication Exercises

### Exercise 1: Platform Differences
Does the relationship differ by most-used platform?
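
One possible starting point is to fit the same simple regression within each platform and compare slopes. The sketch below shows the pattern on synthetic data. The platform labels, the `toy` DataFrame, and the helper `usage_slope` are hypothetical; only the column name `Most_Used_Platform` comes from this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption for illustration, not the survey)
rng = np.random.default_rng(4)
platforms = ['Instagram', 'TikTok', 'YouTube']   # illustrative labels
toy = pd.DataFrame({
    'Most_Used_Platform': rng.choice(platforms, size=300),
    'Daily_Usage_Hours': rng.uniform(1, 8, size=300),
})
toy['Sleep_Hours'] = 9 - 0.5 * toy['Daily_Usage_Hours'] + rng.normal(0, 1, size=300)


def usage_slope(group):
    """OLS slope of sleep on usage within one group of students."""
    x = group['Daily_Usage_Hours'].to_numpy()
    y = group['Sleep_Hours'].to_numpy()
    return np.polyfit(x, y, 1)[0]


# One row per platform: sample size and within-platform slope
rows = []
for platform, group in toy.groupby('Most_Used_Platform'):
    rows.append({'platform': platform, 'n': len(group), 'slope': usage_slope(group)})
by_platform = pd.DataFrame(rows).set_index('platform')
print(by_platform.round(3))
```

Small groups give noisy slopes, so the `n` column is worth checking before reading much into differences between platforms.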

### Exercise 2: Gender Differences
Add gender as a control or interaction term. Does the effect differ?
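
An interaction term lets the usage slope differ between groups. The sketch below shows the formula syntax on synthetic data. The column name `Gender` and its two categories are assumptions (check `data.columns` for the actual name), and the simulated data contain no true gender difference.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption for illustration, not the survey)
rng = np.random.default_rng(5)
n = 400
toy = pd.DataFrame({
    'Gender': rng.choice(['Female', 'Male'], size=n),
    'Daily_Usage_Hours': rng.uniform(1, 8, size=n),
})
toy['Sleep_Hours'] = 9 - 0.5 * toy['Daily_Usage_Hours'] + rng.normal(0, 1, size=n)

# `a * b` expands to a + b + a:b (both main effects plus their interaction)
inter_model = smf.ols('Sleep_Hours ~ Daily_Usage_Hours * C(Gender)', data=toy).fit()

# Interaction coefficients are the ones whose names contain a colon
interaction_terms = [name for name in inter_model.params.index if ':' in name]

print(inter_model.params.round(3))
print("\nInteraction p-value(s):")
print(inter_model.pvalues[interaction_terms].round(3))
```

The `Daily_Usage_Hours` coefficient is the slope for the reference category, and the interaction coefficient is how much the slope differs for the other category.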

### Exercise 3: Addiction Score
Is the addiction score a better predictor of sleep than raw usage hours?

### Challenge Exercise
Research the literature on social media and sleep. What does the science say?

In [None]:
# Your code for exercises

# Example: Average sleep by platform
# print(data.groupby('Most_Used_Platform')['Sleep_Hours'].mean().sort_values())