# Module 2 — Pandas Foundations (Data Wrangling)

This notebook covers Pandas basics for tabular data analysis across Lessons 2.1–2.5.
The examples use the provided dataset `eda_course_dataset_100rows.csv`.

## Lesson 2.1 — Introduction to Pandas

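Pandas provides two core data structures built on NumPy arrays: `Series` (a one-dimensional labeled array) and `DataFrame` (a two-dimensional table of labeled columns). It is well suited to cleaning, transforming, and analyzing tabular data.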

In [None]:
import pandas as pd
pd.Series([10,20,30], index=['a','b','c'])

In [None]:
pd.DataFrame({'Name':['Alice','Bob'],'Age':[25,30]})
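
To make the relationship between the two structures and NumPy concrete, here is a minimal sketch that reuses the values from the cells above. The variable names (`s`, `arr`, `people`, `age`) are illustrative only and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
arr = s.to_numpy()            # the Series values as a plain NumPy array
print(type(arr), arr.dtype)

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
age = people['Age']           # selecting one column of a DataFrame gives a Series
print(type(age).__name__)

# Label-based and position-based access reach the same element.
print(s['b'], s.iloc[1])
```

The index labels travel with the data, which is what distinguishes a `Series` from the bare array returned by `.to_numpy()`.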

## Lesson 2.2 — Importing & Inspecting
Read the CSV with `pd.read_csv`, then inspect it with `.head()` (first rows), `.info()` (column types and non-null counts), `.describe()` (summary statistics), `.shape`, and `.dtypes`.

In [None]:
df = pd.read_csv('eda_course_dataset_100rows.csv', parse_dates=['order_date'])
display(df.head())
print('shape:', df.shape)
df.info()  # prints its summary directly and returns None, so it should not be wrapped in display()
display(df.describe(include='all').T)
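
The cell above does not show `.dtypes`, so here is a small self-contained sketch of the same inspection steps. It assumes a hypothetical three-row sample held in memory (`csv_text`, `sample`) rather than the course file, so the column set is only a stand-in.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the real course file has more rows and columns.
csv_text = """order_date,region,price
2023-01-05,North,120.5
2023-02-11,South,
2023-03-20,East,310.0
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['order_date'])

print(sample.dtypes)        # one dtype per column; order_date is parsed as a datetime
print(sample.shape)         # (rows, columns)
print(sample.isna().sum())  # the empty price field is read as a missing value
```

Checking `.dtypes` right after loading is a quick way to spot numeric or date columns that were read as text.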

## Lesson 2.3 — Indexing & Selection
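Select a single column with `df['col']` (returns a `Series`), several columns with `df[['c1','c2']]` (returns a `DataFrame`), rows by label with `.loc[]`, and rows by integer position with `.iloc[]`. Avoid chained indexing such as `df[mask]['col'] = value`, which may modify a temporary copy instead of `df`.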

In [None]:
display(df['region'].head())
display(df[['customer_id','price']].head())
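display(df.loc[0])   # row whose index label is 0
display(df.iloc[0])  # row at position 0 (the same row here, because the default index is 0, 1, 2, ...)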
display(df[df['price'] > 200].head())
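
The difference between labels and positions, and the reason to avoid chained indexing, are easier to see on a frame whose index is not 0, 1, 2. The sketch below assumes a hypothetical mini-frame named `orders`; it is not part of the course dataset.

```python
import pandas as pd

# Hypothetical mini-frame with a non-default index so labels and positions differ.
orders = pd.DataFrame(
    {'region': ['North', 'South', 'East'], 'price': [150.0, 250.0, 320.0]},
    index=[101, 102, 103],
)

first_by_label = orders.loc[101]     # row whose index label is 101
first_by_position = orders.iloc[0]   # row at position 0 (the same row here)

# Chained indexing such as orders[orders['price'] > 200]['price'] = 0.0 may
# write to a temporary copy. A single .loc call selects rows and column together,
# so the assignment reaches the original frame.
orders.loc[orders['price'] > 200, 'price'] = 0.0
print(orders)
```

Here `orders.loc[0]` would raise a `KeyError`, because no row carries the label 0, while `orders.iloc[0]` works regardless of the labels.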

## Lesson 2.4 — Cleaning Data
This lesson covers handling missing values, removing duplicates, converting types, and dropping columns.

In [None]:
display(df.isna().sum())
df['product_rating'] = df['product_rating'].fillna(df['product_rating'].median())
df_clean = df.drop_duplicates().copy()  # explicit copy so later column assignments do not raise SettingWithCopyWarning
print('duplicates removed, new shape:', df_clean.shape)
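
Type conversion and dropping columns are listed for this lesson but not shown in the cell above. The following is a minimal sketch on a hypothetical frame (`raw`, with made-up `quantity`, `price`, and `notes` columns), not on the course dataset.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as text, plus a column we do not need.
raw = pd.DataFrame({
    'quantity': ['2', '5', 'three'],
    'price': ['19.99', '5.00', '7.50'],
    'notes': ['', 'gift', ''],
})

# errors='coerce' turns entries that cannot be parsed into NaN instead of raising.
raw['quantity'] = pd.to_numeric(raw['quantity'], errors='coerce')
raw['price'] = raw['price'].astype(float)

# drop returns a new DataFrame; raw itself still has the notes column.
tidy = raw.drop(columns=['notes'])
print(tidy.dtypes)
```

`astype` fails loudly on bad values, while `pd.to_numeric(..., errors='coerce')` converts them to missing values that can then be imputed or dropped; which behavior is preferable depends on how much the data is trusted.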

## Lesson 2.5 — Transformation
Create new columns, apply functions, string operations, and date extraction.

In [None]:
df_clean['price_per_item'] = df_clean['total_amount'] / df_clean['quantity']
df_clean['age_group'] = df_clean['age'].apply(lambda x: 'Youth' if x < 30 else ('Senior' if x>=65 else 'Adult'))
df_clean['payment_method'] = df_clean['payment_method'].str.lower()  # keeps missing values as NaN; astype(str) would turn them into the text 'nan'
df_clean['order_year'] = df_clean['order_date'].dt.year
display(df_clean[['price_per_item','age_group','payment_method','order_year']].head())
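
The `apply` with a lambda above works, but it labels a missing age as `'Adult'`, because both comparisons are false for `NaN`. As one possible alternative, `pd.cut` bins the values in a single vectorized call and leaves missing ages missing. The sketch assumes a hypothetical `ages` Series and assumes ages are non-negative, since the first bin starts at 0.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 30, 64, 65, np.nan], name='age')

# right=False makes each bin closed on the left: [0, 30), [30, 65), [65, inf).
# These edges mirror the thresholds used in the lambda above.
age_group = pd.cut(
    ages,
    bins=[0, 30, 65, np.inf],
    labels=['Youth', 'Adult', 'Senior'],
    right=False,
)
print(age_group.tolist())
```

The result is a categorical column, which also preserves the natural order of the groups (Youth, Adult, Senior) for sorting and plotting.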

**Homework / Practice:**
- Identify 3 columns with missing values and propose imputation strategies.
- Create a `high_value` boolean column that is `True` where `total_amount` is greater than the median of `total_amount`.