# Social Media Usage and Mental Health Analysis

**Project:** Personal project for data analysis

**Date:** 28/10/2025

**Data Source:** [Social Media and Mental Health Balance Dataset on Kaggle](https://www.kaggle.com/datasets/ayeshaimran123/social-media-and-mental-health-balance)

## 1. Introduction

This project is an Exploratory Data Analysis (EDA) investigating the relationships between social media usage, daily habits, and mental well-being indicators such as stress, sleep quality, and happiness.

**Research Questions:**
1.  What is the profile (demographics, behavior) of the participants?
2.  Is there a correlation between screen time and reported happiness?
3.  Which factors (e.g., exercise, platform choice) are most strongly associated with stress and sleep quality?
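
As a rough illustration of how the second question could be approached, the sketch below computes a Pearson correlation coefficient with pandas. The column names match the dataset, but the four rows are toy values made up for this example and are not taken from the Kaggle data, so the resulting coefficient says nothing about the real participants.

```python
import pandas as pd

# Toy values for illustration only; these are NOT rows from the Kaggle dataset.
toy = pd.DataFrame({
    "Daily_Screen_Time(hrs)": [2.0, 4.0, 6.0, 8.0],
    "Happiness_Index(1-10)": [9.0, 8.0, 6.0, 5.0],
})

# Pearson correlation coefficient between the two columns (ranges from -1 to 1).
r = toy["Daily_Screen_Time(hrs)"].corr(toy["Happiness_Index(1-10)"])
print(f"Pearson r = {r:.3f}")
```

A coefficient near -1 or 1 indicates a strong linear association, and a value near 0 indicates a weak one. In either case it describes association only, not causation.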


In [3]:
# import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import kagglehub

# download the dataset from Kaggle; `path` is the local directory holding its files
path = kagglehub.dataset_download(
    "ayeshaimran123/social-media-and-mental-health-balance"
)

## 2. Data Loading and Initial Inspection

Now that the dataset is downloaded, we load it into a pandas DataFrame. We then perform an initial inspection to understand its structure, checking for:

* Column names and data types (`.info()`)
* The first few rows of data (`.head()`)
* Any missing (null) values (visible in the non-null counts reported by `.info()`)
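
The three checks above can be sketched on a small stand-in DataFrame. The frame below is hypothetical: it borrows three column names from the dataset, but its values (including the deliberately missing one) are invented for illustration. `isnull().sum()` is one common way to count missing values per column explicitly, in addition to reading the non-null counts from `.info()`.

```python
import pandas as pd

# Toy frame for illustration only; values are NOT from the Kaggle dataset.
toy = pd.DataFrame({
    "Age": [25, 31, 40],
    "Sleep_Quality(1-10)": [7.0, None, 5.0],  # one value left missing on purpose
    "Stress_Level(1-10)": [6.0, 8.0, 4.0],
})

toy.info()                    # column names, non-null counts, and dtypes
print(toy.head())             # first rows (up to five)

missing = toy.isnull().sum()  # number of missing values in each column
print(missing)
```

On the real DataFrame the same calls would be applied to `df` once it has been loaded.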

In [6]:
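import os

# build the CSV path inside the directory returned by kagglehub
file_name = "Mental_Health_and_Social_Media_Balance_Dataset.csv"
full_path = os.path.join(path, file_name)

try:
    df = pd.read_csv(full_path)
except FileNotFoundError:
    print(f"ERROR: File not found at {full_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")
else:
    # inspect structure (dtypes, non-null counts), then preview the first rows
    df.info()
    print(df.head())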

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   User_ID                    500 non-null    object 
 1   Age                        500 non-null    int64  
 2   Gender                     500 non-null    object 
 3   Daily_Screen_Time(hrs)     500 non-null    float64
 4   Sleep_Quality(1-10)        500 non-null    float64
 5   Stress_Level(1-10)         500 non-null    float64
 6   Days_Without_Social_Media  500 non-null    float64
 7   Exercise_Frequency(week)   500 non-null    float64
 8   Social_Media_Platform      500 non-null    object 
 9   Happiness_Index(1-10)      500 non-null    float64
dtypes: float64(6), int64(1), object(3)
memory usage: 39.2+ KB
  User_ID  Age  Gender  Daily_Screen_Time(hrs)  Sleep_Quality(1-10)  \
0    U001   44    Male                     3.1                  7.0   
1    U002   30   O