# Bellabeat Case Study – Analysis Notebook
This notebook walks through the exploratory data analysis (EDA) and segmentation steps of the Bellabeat case study. The data was cleaned with SQL scripts and saved under `data/cleaned/`, and figures are exported to the `visuals/` directory.

## 1. Load Cleaned Data

In [None]:
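import pandas as pd

# Paths are relative to the current working directory
activity = pd.read_csv('../data/cleaned/daily_activity_clean.csv')
# Loaded here but not used in the steps below
sleep = pd.read_csv('../data/cleaned/minute_sleep_merged.csv')
print(activity.shape)  # (rows, columns)
activity.head()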

## 2. Exploratory Data Analysis

In [None]:
# Summary statistics
activity.describe()

In [None]:
# Check missing values
activity.isnull().sum()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Distribution of steps
sns.histplot(activity['TotalSteps'], bins=30)
plt.title('Distribution of Daily Steps')
plt.xlabel('Steps')
plt.ylabel('Frequency')
plt.tight_layout()
plt.savefig('../visuals/steps_distribution.png')

In [None]:
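# Correlation heatmap
# numeric_only=True (pandas 1.5+) skips non-numeric columns such as dates read
# as strings, which would otherwise raise an error in pandas 2.0+
sns.heatmap(activity.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.tight_layout()
plt.savefig('../visuals/correlation_matrix.png')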
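
A heatmap can be hard to read once there are many columns, so it may help to list the strongest pairs as a table as well. The sketch below is one way to do that. The helper name `strongest_correlations` and the `demo` frame are hypothetical: `demo` is synthetic data standing in for `activity`, and its column names are assumptions rather than the real schema.

```python
import numpy as np
import pandas as pd


def strongest_correlations(frame, top_n=5):
    """Return the column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr(numeric_only=True)  # requires pandas 1.5 or later
    # Strict upper triangle: each pair appears once and the diagonal is excluded
    rows, cols = np.triu_indices(len(corr), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, cols],
        index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
    ).dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)


# Synthetic stand-in for the notebook's `activity` frame (not real data)
rng = np.random.default_rng(0)
steps = rng.normal(8000, 3000, 200)
demo = pd.DataFrame({
    'ActivityDate': ['2016-04-12'] * 200,  # string column, skipped by numeric_only
    'TotalSteps': steps,
    'Calories': 1500 + 0.1 * steps + rng.normal(0, 100, 200),
    'SedentaryMinutes': rng.normal(900, 200, 200),
})
top_pairs = strongest_correlations(demo)
print(top_pairs)
```

In the notebook itself, the call would presumably be `strongest_correlations(activity)`.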

## 3. User Segmentation

In [None]:
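from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Cluster on steps and calories; each row of `activity` is one daily record
features = activity[['TotalSteps', 'Calories']].copy()

# Standardize so both features contribute equally to the distance metric
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
activity['Cluster'] = kmeans.fit_predict(features_scaled)
activity['Cluster'].value_counts()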
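
The cell above fixes `n_clusters=3` without showing how that value was chosen. One common check is to compare silhouette scores across several values of k, which the sketch below does. The helper `silhouette_by_k` is hypothetical, and `demo` is synthetic data standing in for `activity`, so the scores it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(frame, columns, k_values=range(2, 7), random_state=42):
    """Return {k: silhouette score} for K-means fits on standardized columns."""
    scaled = StandardScaler().fit_transform(frame[columns])
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    return scores


# Synthetic stand-in for the notebook's `activity` frame (not real data):
# three well-separated groups of 60 rows each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'TotalSteps': np.concatenate([rng.normal(m, 500, 60) for m in (3000, 8000, 14000)]),
    'Calories': np.concatenate([rng.normal(m, 100, 60) for m in (1700, 2300, 3000)]),
})
scores = silhouette_by_k(demo, ['TotalSteps', 'Calories'])
best_k = max(scores, key=scores.get)
print(scores)
print('k with the highest silhouette score:', best_k)
```

A higher silhouette score means tighter, better-separated clusters. On real data the curve is often flatter than on this synthetic example, so the score is best treated as one input alongside how interpretable the segments are.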

In [None]:
# Cluster visualization
sns.scatterplot(data=activity, x='TotalSteps', y='Calories', hue='Cluster', palette='Set2')
plt.title('User Segments by Steps and Calories')
plt.tight_layout()
plt.savefig('../visuals/user_segments.png')
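
The clustering above runs on the rows of `activity`, which come from a daily activity file, so the same user may land in different clusters on different days. If one segment per user is wanted, a possible approach is to average each user's days before clustering. The sketch below assumes a user identifier column named `Id`, which is not shown in this notebook. The helper `segment_users` and the synthetic `demo` frame are hypothetical as well.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_users(daily, id_col='Id', columns=('TotalSteps', 'Calories'),
                  n_clusters=3, random_state=42):
    """Average daily rows per user, then cluster the per-user means."""
    columns = list(columns)
    per_user = daily.groupby(id_col)[columns].mean()
    scaled = StandardScaler().fit_transform(per_user)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    per_user['Cluster'] = model.fit_predict(scaled)
    return per_user


# Synthetic stand-in (not real data): 12 users with 30 daily rows each
rng = np.random.default_rng(1)
user_ids = np.repeat(np.arange(1, 13), 30)
typical_steps = np.repeat(np.linspace(3000, 14000, 12), 30)
steps = typical_steps + rng.normal(0, 1500, user_ids.size)
demo = pd.DataFrame({
    'Id': user_ids,  # assumed column name
    'TotalSteps': steps,
    'Calories': 1500 + 0.1 * steps + rng.normal(0, 100, user_ids.size),
})
user_segments = segment_users(demo)
print(user_segments)
```

With this variant every user gets exactly one label, at the cost of hiding day-to-day variation within a user.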

## 4. Conclusion
K-means clustering (k = 3) on daily step counts and calories separates the records into three groups, interpreted here as low-engagement, moderate, and high-performing activity levels. Because the input is daily activity data, the clusters describe activity days rather than fixed user types. These segments can support personalized marketing, in-app messaging, and engagement strategies.
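
The low, moderate, and high labels used above can be checked against the cluster centers once they are converted back to original units. The sketch below shows one way to build such a profile table. The helper `cluster_profile` is hypothetical, and `demo` is synthetic data standing in for `activity`, so the numbers it prints are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_profile(frame, columns, n_clusters=3, random_state=42):
    """Return cluster centers in original units with the row count per cluster."""
    scaler = StandardScaler()
    scaled = scaler.fit_transform(frame[columns])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    # Centers are learned on standardized data, so undo the scaling for reporting
    centers = pd.DataFrame(
        scaler.inverse_transform(model.cluster_centers_), columns=columns
    )
    centers['n_rows'] = np.bincount(labels, minlength=n_clusters)
    # Sort by the first feature so the rows read from lowest to highest
    return centers.sort_values(columns[0]).reset_index(drop=True)


# Synthetic stand-in for the notebook's `activity` frame (not real data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'TotalSteps': np.concatenate([rng.normal(m, 500, 60) for m in (3000, 8000, 14000)]),
    'Calories': np.concatenate([rng.normal(m, 100, 60) for m in (1700, 2300, 3000)]),
})
profile = cluster_profile(demo, ['TotalSteps', 'Calories'])
print(profile.round(0))
```

Because K-means assigns cluster numbers arbitrarily, sorting the centers by steps gives a stable low-to-high ordering to attach segment names to.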