# 🧼 Cleaning Dirty Data (Missing Values & Type Fixes)

## 🔹 LEARNING GOALS:
- Detect and count missing values (`NaN`)
- Fill or drop missing data
- Convert column data types safely
- Understand the difference between `NaN`, `None`, `""`, and type mismatches


### 🧪 1. Load a Messy Dataset

In [None]:
import pandas as pd
import numpy as np

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", None],
    "Age": ["25", "thirty", 35, np.nan, "40"],
    "Signup Date": ["2022-01-01", "not a date", "2022/03/01", None, "April 5, 2022"],
    "Score": [95.5, None, 88.0, 92.5, ""]
}

df = pd.DataFrame(data)
df

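### 🧯 2. Detecting Missing or Broken Values

`df.isnull()` (alias `df.isna()`) flags only `None` and `NaN`. Empty strings and unparseable text such as `"thirty"` count as present values, so they stay hidden until the types are converted in step 4.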

In [None]:
df.isnull()
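To see exactly what `isnull()` treats as missing, `pd.isna` can be called on single values. The check below also includes `pd.NaT`, the missing-value marker for datetimes:

```python
import numpy as np
import pandas as pd

# Treated as missing
print(pd.isna(None))      # True
print(pd.isna(np.nan))    # True
print(pd.isna(pd.NaT))    # True (missing datetime)
# Treated as ordinary, present values
print(pd.isna(""))        # False
print(pd.isna("thirty"))  # False
```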

In [None]:
df.isnull().sum()
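Raw counts are easier to compare across datasets of different sizes once they are turned into percentages. A small sketch; `null_report` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

def null_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing-value count and percentage per column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(1)
    return pd.DataFrame({"missing": counts, "percent": percent})

sample = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "David"],
    "Age": [25, np.nan, np.nan, 40],
})
report = null_report(sample)
print(report)  # Name: 1 missing (25.0%), Age: 2 missing (50.0%)
```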

In [None]:
df[df.isnull().any(axis=1)]
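The row filter above only finds genuine gaps. To list entries that are present but unusable, one approach is to look for values that exist before `pd.to_numeric(..., errors="coerce")` and are missing after it. A sketch on the `Age` values:

```python
import numpy as np
import pandas as pd

age = pd.Series(["25", "thirty", 35, np.nan, "40"])
print(age.isnull().sum())  # 1: only np.nan is counted

# Coerce: anything that can't become a number turns into NaN
parsed = pd.to_numeric(age, errors="coerce")

# Present in the raw data but lost during conversion = a "broken" value
broken = age.notna() & parsed.isna()
print(age[broken])  # index 1: "thirty"
```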

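### 🧹 3. Cleaning Strategy Options

Both cells below return a new DataFrame and leave `df` unchanged, so you can compare filling (`fillna`) with dropping (`dropna`) before committing to either.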

In [None]:
df.fillna({
    "Name": "Unknown",
    "Age": -1,
    "Signup Date": "1970-01-01",
    "Score": 0.0
})
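Notice that `fillna()` fills only true `None`/`NaN` gaps: the empty string in `Score` is left alone. Replacing such placeholders with `np.nan` via `df.replace()` first lets `isnull()` and `fillna()` see them. A sketch on the `Score` column alone:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({"Score": [95.5, None, 88.0, 92.5, ""]})
print(scores["Score"].isnull().sum())  # 1: "" is not counted

# Turn empty strings into proper missing values
scores = scores.replace("", np.nan)
print(scores["Score"].isnull().sum())  # 2

filled = scores.fillna({"Score": 0.0})
print(filled["Score"].tolist())  # [95.5, 0.0, 88.0, 92.5, 0.0]
```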

In [None]:
df.dropna()
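With no arguments, `dropna()` removes every row that has at least one missing value; on this dataset only the Alice and Charlie rows survive. The `subset` and `thresh` parameters make dropping more selective, as this sketch on the same data shows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", None],
    "Age": ["25", "thirty", 35, np.nan, "40"],
    "Signup Date": ["2022-01-01", "not a date", "2022/03/01", None, "April 5, 2022"],
    "Score": [95.5, None, 88.0, 92.5, ""],
})

any_missing = df.dropna()                # drop rows with any None/NaN
named_only = df.dropna(subset=["Name"])  # only check the Name column
mostly_full = df.dropna(thresh=3)        # keep rows with >= 3 non-missing values ("" counts)

print(len(any_missing), len(named_only), len(mostly_full))  # 2 4 4
```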

### 🧬 4. Data Type Fixes

In [None]:
df.dtypes
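Every column here reports `object`, which only means "arbitrary Python objects". Mapping each cell to its type name shows what is actually stored, for example in `Age`:

```python
import numpy as np
import pandas as pd

age = pd.Series(["25", "thirty", 35, np.nan, "40"])
print(age.dtype)  # object

# Count the Python type of each cell (np.nan is a float)
type_counts = age.map(lambda v: type(v).__name__).value_counts()
print(type_counts)  # str: 3, int: 1, float: 1
```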

In [None]:
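# errors="coerce" turns unparseable values ("thirty", "", "not a date") into NaN/NaT
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
df["Score"] = pd.to_numeric(df["Score"], errors="coerce")
# format="mixed" (pandas >= 2.0) parses each date string individually
df["Signup Date"] = pd.to_datetime(df["Signup Date"], format="mixed", errors="coerce")
df.dtypes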
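Date strings in several layouts need extra care. In pandas 2.x, `pd.to_datetime` infers a single format from the first non-missing string, and with `errors="coerce"` anything that doesn't match becomes `NaT`. Passing `format="mixed"` (pandas 2.0 and later) parses each string separately. A sketch comparing both on the `Signup Date` values, assuming pandas 2.x:

```python
import pandas as pd

dates = pd.Series(["2022-01-01", "not a date", "2022/03/01", None, "April 5, 2022"])

# One format inferred from "2022-01-01"; other layouts may be coerced to NaT
single = pd.to_datetime(dates, errors="coerce")
# Each string parsed on its own; only truly unparseable values become NaT
mixed = pd.to_datetime(dates, format="mixed", errors="coerce")

print(single.isna().sum(), mixed.isna().sum())
print(mixed)
```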

### 🩹 5. Impute (Fill In) the Remaining Missing Values

In [None]:
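# Assign results back: chained `inplace=True` on df["col"] is deprecated in pandas 2.x
# and no longer updates df under Copy-on-Write (pandas 3.0)
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Score"] = df["Score"].fillna(df["Score"].mean())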
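To see which values those two lines fill in, the statistics can be computed on the converted columns. This sketch rebuilds `Age` and `Score` as they look after `pd.to_numeric(..., errors="coerce")`. One common rationale for the split: the median resists outliers (useful for ages), while the mean suits roughly symmetric scores.

```python
import numpy as np
import pandas as pd

# Columns as they look after the coerce step
age = pd.to_numeric(pd.Series(["25", "thirty", 35, np.nan, "40"]), errors="coerce")
score = pd.to_numeric(pd.Series([95.5, None, 88.0, 92.5, ""]), errors="coerce")

print(age.median())  # 35.0 (from 25, 35, 40; NaN is skipped)
print(score.mean())  # 92.0 (from 95.5, 88.0, 92.5)

age_filled = age.fillna(age.median())
score_filled = score.fillna(score.mean())
print(age_filled.tolist())    # [25.0, 35.0, 35.0, 35.0, 40.0]
print(score_filled.tolist())  # [95.5, 92.0, 88.0, 92.5, 92.0]
```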

In [None]:
# Fill missing dates with the earliest known signup date
df["Signup Date"] = df["Signup Date"].fillna(df["Signup Date"].min())

In [None]:
df["Name"] = df["Name"].fillna("Unknown")

### 🤓 6. Cleaned Data Review

In [None]:
df.info()

In [None]:
df.describe(include="all")
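Beyond reading `info()` and `describe()`, a few assertions can confirm the cleaning worked and keep it working if the raw data changes. The sketch below wraps the notebook's steps in one function (`clean` is a hypothetical name) and checks the result; it assumes pandas 2.x for `format="mixed"`.

```python
import numpy as np
import pandas as pd

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that repeats this notebook's cleaning steps."""
    out = raw.copy()
    # Convert types, coercing unparseable values to NaN/NaT
    out["Age"] = pd.to_numeric(out["Age"], errors="coerce")
    out["Score"] = pd.to_numeric(out["Score"], errors="coerce")
    out["Signup Date"] = pd.to_datetime(out["Signup Date"], format="mixed", errors="coerce")
    # Impute whatever is still missing
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Score"] = out["Score"].fillna(out["Score"].mean())
    out["Signup Date"] = out["Signup Date"].fillna(out["Signup Date"].min())
    out["Name"] = out["Name"].fillna("Unknown")
    return out

raw = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", None],
    "Age": ["25", "thirty", 35, np.nan, "40"],
    "Signup Date": ["2022-01-01", "not a date", "2022/03/01", None, "April 5, 2022"],
    "Score": [95.5, None, 88.0, 92.5, ""],
})
cleaned = clean(raw)

# Sanity checks: no gaps left, and the columns have the intended types
assert cleaned.isnull().sum().sum() == 0
assert pd.api.types.is_numeric_dtype(cleaned["Age"])
assert pd.api.types.is_datetime64_any_dtype(cleaned["Signup Date"])
print(cleaned)
```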

### 🧪 Try It Yourself

Modify the `data` dictionary at the top of this notebook. Add:
- A new column with some `None` and `""` values
- At least one row in which every column is filled in incorrectly

Then re-run the notebook and fix it step by step.

### 🧠 Mini-Challenge

> 🗂 Load `"data/survey.csv"` and:
- Identify which columns have missing values
- Use `.isnull().sum()` to get a null report
- Use a mix of `.fillna()`, `.dropna()`, and `pd.to_numeric()` or `pd.to_datetime()` to clean it
- Print a summary with `.info()` and `.describe()`

### 📝 Summary

| Concept        | Tool/Function                      |
|----------------|------------------------------------|
| Detect nulls   | `df.isnull()`, `df.isnull().sum()` |
| Drop rows      | `df.dropna()`                      |
| Fill values    | `df.fillna()`                      |
| Convert types  | `pd.to_numeric()`, `pd.to_datetime()` |
| Replace values | `df.replace()`                     |
