# Module 3 — Pandas for Analysis & Visualization

This notebook covers grouping and aggregation, pivot tables and correlation, sorting and ranking, visualization, and time-series resampling.
We'll continue using `eda_course_dataset_100rows.csv`.

## Lesson 3.1 — Grouping & Aggregation
Group rows by a key column and compute aggregate metrics (mean, sum, counts) to summarize patterns.

In [None]:
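import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Parse order_date as datetime so it can be resampled in Lesson 3.5
df = pd.read_csv('eda_course_dataset_100rows.csv', parse_dates=['order_date'])

# Mean total_amount per region, highest first
grp_region = df.groupby('region')['total_amount'].mean().sort_values(ascending=False)
display(grp_region)

# Per product category: mean price and total quantity, sorted by mean price
grp_cat = df.groupby('product_category').agg({'price':'mean','quantity':'sum'}).sort_values('price', ascending=False)
display(grp_cat)

# Row counts per channel, then a channel-by-returned contingency table
display(df['channel'].value_counts())
display(pd.crosstab(df['channel'], df['returned']))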
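
As a complement to the dict-style `agg` above, here is a minimal sketch of named aggregation, which lets you choose the output column names directly. The toy values and the output names (`avg_price`, `total_quantity`, `n_orders`) are made up for illustration; only the column names `region`, `price` and `quantity` mirror the course dataset.

```python
import pandas as pd

# Toy data with hypothetical values; not taken from the course CSV.
toy = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "quantity": [1, 2, 3, 4, 5],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
summary = (
    toy.groupby("region")
    .agg(
        avg_price=("price", "mean"),
        total_quantity=("quantity", "sum"),
        n_orders=("price", "count"),  # counts non-null prices per group
    )
    .sort_values("avg_price", ascending=False)
)

print(summary)
```

The same pattern should carry over to `df` by swapping `toy` for `df` and `region` for whichever grouping key you are exploring.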

## Lesson 3.2 — Pivot & Correlation
Use pivot tables to slice a metric across two dimensions, and a correlation matrix to check linear relationships between numeric columns.

In [None]:
# Mean total_amount by region (rows) and channel (columns); margins=True adds an 'All' row and column
pv = pd.pivot_table(df, values='total_amount', index='region', columns='channel', aggfunc='mean', margins=True)
display(pv)

# Pearson correlation (the default) between the numeric columns; compute once and reuse
corr = df[['price','quantity','total_amount']].corr()
display(corr)

plt.figure(figsize=(6,4))
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
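
To make the `margins=True` behavior concrete, the sketch below builds a pivot table on a four-row toy frame (hypothetical values, same column names as the dataset). It illustrates that the `All` entries are computed from the underlying rows rather than by averaging the cell means.

```python
import pandas as pd

# Toy data with hypothetical values; not taken from the course CSV.
toy = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "channel": ["Online", "Store", "Online", "Online"],
    "total_amount": [100.0, 200.0, 300.0, 500.0],
})

pv_toy = pd.pivot_table(
    toy,
    values="total_amount",
    index="region",
    columns="channel",
    aggfunc="mean",
    margins=True,  # adds an 'All' row and an 'All' column
)
print(pv_toy)

# The Online cells are 100 (North) and 400 (South), whose plain average is 250,
# but the 'All' row reports the mean of the three Online rows: (100+300+500)/3 = 300.
print(pv_toy.loc["All", "Online"])
```

The South/Store cell is `NaN` because no toy row has that combination; passing `fill_value=0` to `pivot_table` is one way to replace such gaps if that suits the analysis.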

## Lesson 3.3 — Sorting & Ranking
Sort rows and create ranking columns.

In [None]:
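# Five largest rows by total_amount
display(df.sort_values('total_amount', ascending=False).head())

# Rank 1 = highest price; tied prices share the average of their ranks (default method='average')
df['price_rank'] = df['price'].rank(ascending=False)
display(df[['price','price_rank']].head())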
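
Because `rank` averages tied positions by default, `price_rank` can contain fractional values such as 1.5. The sketch below compares three of the available tie-breaking methods on a small hypothetical price series so you can pick the one that fits your report.

```python
import pandas as pd

# Hypothetical prices with one tie at the top.
prices = pd.Series([50.0, 30.0, 50.0, 10.0], name="price")

ranks = pd.DataFrame({
    "price": prices,
    # default: tied values get the average of the positions they occupy
    "average": prices.rank(ascending=False),
    # 'min': tied values all get the lowest position (competition ranking: 1, 1, 3, 4)
    "min": prices.rank(ascending=False, method="min"),
    # 'dense': like 'min', but the next rank is not skipped (1, 1, 2, 3)
    "dense": prices.rank(ascending=False, method="dense"),
})
print(ranks)
```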

## Lesson 3.4 — Visualization with Pandas & Seaborn
Common charts used in EDA: a histogram, a box plot, a count plot, and a scatter plot.

In [None]:
# Histogram of price with a kernel density estimate overlaid
plt.figure(figsize=(8,4))
sns.histplot(df['price'], kde=True)
plt.title('Price distribution')
plt.show()

# Box plot: spread of price within each product category
plt.figure(figsize=(8,4))
sns.boxplot(x='product_category', y='price', data=df)
plt.title('Price by product_category')
plt.xticks(rotation=30)
plt.show()

# Count plot: number of rows per region
plt.figure(figsize=(6,4))
sns.countplot(x='region', data=df)
plt.title('Counts by region')
plt.show()

# Scatter plot: price against total_amount
plt.figure(figsize=(6,4))
sns.scatterplot(x='price', y='total_amount', data=df)
plt.title('Price vs Total Amount')
plt.show()
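
When reading the box plot, it can help to compute the numbers it draws. The sketch below uses a hypothetical price series and assumes the default whisker rule of 1.5 times the interquartile range, which is what seaborn and matplotlib use unless `whis` is changed.

```python
import pandas as pd

# Hypothetical prices; the last value is deliberately extreme.
prices = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 100.0])

# The box spans Q1 to Q3, with a line at the median.
q1, median, q3 = prices.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence})")
print(outliers.tolist())
```

Running the same calculation per category, for example inside a `groupby('product_category')`, would give the figures behind each box in the chart above.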

## Lesson 3.5 — Time Series
Resample orders to monthly totals and plot them over time.

In [None]:
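# Index by order_date so resample() can bin rows by time
df_ts = df.set_index('order_date')
# 'M' = month-end frequency; pandas 2.2+ deprecates this alias in favor of 'ME'
monthly = df_ts['total_amount'].resample('M').sum()
plt.figure(figsize=(10,4))
monthly.plot()
plt.title('Monthly Sales')
plt.show()
display(monthly.head())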
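
If the month-end alias differs across the pandas versions you work with, one alternative is to group by calendar period instead of resampling. This is a swapped-in technique, not what the cell above does; the sketch uses a hypothetical five-row series to show the idea.

```python
import pandas as pd

# Hypothetical order dates and amounts; not taken from the course CSV.
dates = pd.to_datetime(
    ["2024-01-05", "2024-01-20", "2024-02-10", "2024-03-01", "2024-03-15"]
)
sales = pd.Series([100.0, 50.0, 200.0, 25.0, 75.0], index=dates, name="total_amount")

# Convert each timestamp to its calendar month and sum within each month.
monthly_toy = sales.groupby(sales.index.to_period("M")).sum()
print(monthly_toy)
```

One difference to keep in mind: `resample` returns a row for every month in the range, including months with no orders (their sum is 0), while grouping by period only returns months that actually appear in the data.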

**Next steps / Exercises:**
- Create a pivot table of average price by region (rows) and product_category (columns).
- Build a one-page report with your top three findings (text plus two charts).