# Lesson 16 – Data Cleaning

Handle missing values and inconsistent formats in the e-commerce data, using techniques such as filling gaps and removing outliers.

## Learning Objectives
- Detect missing values (NaNs)
- Filter data
- Use `pandas` cleaning methods
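Before working with the generated dataset, it can help to see these objectives on a small hand-made DataFrame. This is a minimal sketch: the values are invented, and `CustomerLocation` is an assumed column name rather than one confirmed in the generated tables.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: values are invented for illustration only.
df = pd.DataFrame({
    "OrderID": [1, 2, 3, 4],
    "CustomerLocation": ["Berlin", None, "Paris", np.nan],  # assumed column name
    "PurchaseAmount": [20.0, np.nan, 35.5, 12.0],
})

# Detecting NaNs: isna() flags each missing cell; sum() counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Filtering: keep only rows where PurchaseAmount is present.
with_amount = df[df["PurchaseAmount"].notna()]

# Cleaning methods: fill one column with a placeholder, or drop incomplete rows.
filled = df.fillna({"CustomerLocation": "Unknown"})
complete_rows = df.dropna()
```

`fillna` keeps every row and makes the gap explicit, while `dropna` discards any row with a missing cell; which is appropriate depends on whether the incomplete rows still carry useful information.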

## Loading the E-commerce Dataset

In [None]:
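import sys
import pandas as pd

# Make the course helper module in the python4finance folder importable
sys.path.append('python4finance')
from generate_ecommerce_dataset import generate_ecommerce_dataset

# Build the three related tables used throughout this lesson
orders, order_items, products = generate_ecommerce_dataset()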

`generate_ecommerce_dataset()` returns three related DataFrames for analysis: `orders`, `order_items`, and `products`.

### Example 1: Preview Orders Data

In [None]:
orders.head()

### Example 2: Count Total Orders

In [None]:
print('Total orders:', len(orders))
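`len(orders)` counts rows, so it equals the number of orders only if every `OrderID` appears exactly once. A quick sanity check, sketched here on a hypothetical toy table (`orders_demo` is an illustrative name and the duplicate is planted deliberately), compares the row count with the number of distinct IDs. Whether the generated dataset actually contains duplicates is not something this lesson states.

```python
import pandas as pd

# Hypothetical orders table in which order 2 was recorded twice.
orders_demo = pd.DataFrame({
    "OrderID": [1, 2, 2, 3],
    "OrderDate": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
})

row_count = len(orders_demo)                      # counts every row, duplicates included
unique_orders = orders_demo["OrderID"].nunique()  # counts distinct order IDs
print("Rows:", row_count, "| Unique orders:", unique_orders)

# Drop rows that are exact duplicates across all columns before counting.
deduped = orders_demo.drop_duplicates()
print("Rows after drop_duplicates:", len(deduped))
```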

### Example 3: Merge Orders with Items

In [None]:
merged = order_items.merge(orders, on='OrderID')
merged.head()
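By default `merge` performs an inner join, so an item whose `OrderID` has no matching order is silently dropped. A hedged sketch of how to surface such rows, using hypothetical toy tables: a left join with `indicator=True` labels unmatched rows `left_only`, and `validate="many_to_one"` raises a `MergeError` if `OrderID` is duplicated in the orders table.

```python
import pandas as pd

# Hypothetical tables: item 103 references OrderID 9, which has no matching order.
orders_demo = pd.DataFrame({"OrderID": [1, 2], "CustomerLocation": ["Berlin", "Paris"]})
items_demo = pd.DataFrame({"ItemID": [101, 102, 103], "OrderID": [1, 2, 9]})

# how="left" keeps every item; indicator=True adds a _merge column that says
# whether each row found a match ("both") or not ("left_only").
checked = items_demo.merge(
    orders_demo, on="OrderID", how="left", indicator=True, validate="many_to_one"
)
orphans = checked[checked["_merge"] == "left_only"]
print(orphans[["ItemID", "OrderID"]])
```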

### Example 4: Revenue by Product Category

In [None]:
# Join items to their orders, then to product details
temp = order_items.merge(orders, on='OrderID').merge(products, on='ProductID')
# Total purchase amount per product category
revenue = temp.groupby('ProductCategory')['PurchaseAmount'].sum()
revenue.head()
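`groupby` drops rows whose key is missing by default, so revenue from products with no recorded category disappears from the totals without warning. The sketch below uses invented values; passing `dropna=False` (available since pandas 1.1) keeps a separate `NaN` group so the category totals reconcile with the raw amounts.

```python
import numpy as np
import pandas as pd

# Hypothetical merged rows; one purchase has no category recorded.
sales = pd.DataFrame({
    "ProductCategory": ["Books", "Toys", np.nan, "Books"],
    "PurchaseAmount": [10.0, 5.0, 7.5, 2.5],
})

default_totals = sales.groupby("ProductCategory")["PurchaseAmount"].sum()
all_totals = sales.groupby("ProductCategory", dropna=False)["PurchaseAmount"].sum()

print(default_totals.sum())  # 17.5: the uncategorized 7.5 is excluded
print(all_totals.sum())      # 25.0: every purchase is accounted for
```

An alternative is to fill the key first, e.g. `sales["ProductCategory"].fillna("Uncategorized")`, which gives the missing group a readable label.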

### Example 5: Plot Daily Order Counts

In [None]:
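# Parse the OrderDate values into datetimes so they can be resampled
orders['OrderDate'] = pd.to_datetime(orders['OrderDate'])
# Count orders per calendar day
daily = orders.set_index('OrderDate').resample('D').size()
daily.plot(figsize=(8, 4))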
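`pd.to_datetime` raises an error on strings it cannot parse, and mixed separators can be interpreted differently across pandas versions. One cautious approach, sketched on hypothetical raw strings, is to normalize the separator first, parse with an explicit `format`, and pass `errors="coerce"` so unparseable entries become `NaT` (pandas' missing-datetime marker) that `isna()` can count.

```python
import pandas as pd

# Hypothetical raw dates: one uses a different separator, one is not a date at all.
raw = pd.Series(["2024-01-05", "2024/01/06", "not recorded", None])

# Normalize the separator so every valid value matches a single format.
normalized = raw.str.replace("/", "-", regex=False)

# errors="coerce" turns anything that does not match into NaT instead of raising.
parsed = pd.to_datetime(normalized, format="%Y-%m-%d", errors="coerce")
print(parsed.isna().sum(), "dates could not be parsed")
```

Rows whose date became `NaT` should then be dropped or investigated before resampling, since a missing date cannot be assigned to any day.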

## Exercises
1. Identify orders with missing customer locations and fill them with "Unknown".
2. Remove outlier purchase amounts using the interquartile range method.
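One way to approach Exercise 2, shown on a hypothetical Series of purchase amounts: compute the first and third quartiles, take the interquartile range (IQR = Q3 - Q1), and keep values within 1.5 × IQR of the quartiles. The 1.5 multiplier is the conventional Tukey choice, not something the exercise specifies.

```python
import pandas as pd

# Hypothetical purchase amounts; 500.0 is an obvious outlier.
amounts = pd.Series([12.0, 15.0, 14.0, 13.0, 16.0, 500.0], name="PurchaseAmount")

# Quartiles (pandas uses linear interpolation by default).
q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1

# Conventional Tukey fences: 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# between() is inclusive on both ends by default.
cleaned = amounts[amounts.between(lower, upper)]
print(cleaned)
```

On the real data, the same bounds would be applied as a row filter to whichever table holds `PurchaseAmount`, for example `temp[temp['PurchaseAmount'].between(lower, upper)]` using the merged table from Example 4.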