# 🧪 Pandas Wrangle Lab: Clean & Explore a Real Dataset

## 🔹 LEARNING GOALS:
- Practice loading, cleaning, and exploring real-world data
- Apply column creation, renaming, sorting, and filtering
- Use `.info()`, `.describe()`, and `.query()` fluently


### 📥 1. Load the Dataset

In [None]:
import pandas as pd
df = pd.read_csv("../../data/students.csv")
df.head()

### 🔎 2. Inspect and Audit the Data

In [None]:
# Basic overview
df.info()

In [None]:
# Summary stats
df.describe()

### 🧼 3. Clean Missing or Invalid Data

In [None]:
# Check for missing values
df.isnull().sum()

In [None]:
# Drop rows with missing names
df.dropna(subset=["first_name", "last_name"], inplace=True)

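# Fill any missing scores with the column average.
# Assigning the result back avoids chained in-place modification,
# which is deprecated in recent pandas and may not update df at all.
df["math_score"] = df["math_score"].fillna(df["math_score"].mean())
df["science_score"] = df["science_score"].fillna(df["science_score"].mean())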
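
The heading mentions invalid data, but the cells above only handle missing values. Here is a minimal sketch of one way to treat out-of-range scores. It assumes valid scores fall between 0 and 100, which is an assumption, since the real scale of `students.csv` is not stated here. The small `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Toy data standing in for students.csv (values are made up for illustration)
toy = pd.DataFrame({
    "first_name": ["Ana", "Ben", "Cy"],
    "math_score": [88.0, 105.0, -3.0],
    "science_score": [92.0, 75.0, 60.0],
})

score_cols = ["math_score", "science_score"]

# Assumption: a valid score lies in the closed range [0, 100]
valid = toy[score_cols].apply(lambda col: col.between(0, 100))

# Replace out-of-range values with NaN so they can be handled like other gaps
toy[score_cols] = toy[score_cols].where(valid)

# Fill each column with its own mean, computed from the remaining valid values
toy[score_cols] = toy[score_cols].fillna(toy[score_cols].mean())
```

Converting invalid values to `NaN` first means the column mean is not distorted by them before it is used as the fill value.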


### 🧠 4. Feature Engineering (New Columns)

In [None]:
# Add average and grade
df["average_score"] = (df["math_score"] + df["science_score"]) / 2

def grade(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "D"

df["grade"] = df["average_score"].apply(grade)
df.head()
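
As an alternative to `.apply(grade)`, the same thresholds can probably be expressed with `pd.cut`, which bins a whole column in one call. This is a sketch on a made-up `scores` Series, not a replacement the lab requires.

```python
import numpy as np
import pandas as pd

# Made-up averages chosen to sit on and around the grade boundaries
scores = pd.Series([95.0, 90.0, 89.9, 80.0, 70.0, 69.0])

# right=False closes each bin on the left: [70, 80), [80, 90), [90, inf)
# so a score of exactly 90 lands in "A", matching the >= checks in grade()
grades = pd.cut(
    scores,
    bins=[-np.inf, 70, 80, 90, np.inf],
    labels=["D", "C", "B", "A"],
    right=False,
)
```

One difference to keep in mind: `pd.cut` leaves a missing score as a missing grade, whereas `grade()` falls through to `"D"` for `NaN` because every comparison with `NaN` is false.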

### 🔽 5. Sorting and Filtering

In [None]:
# Top performers
df[df["average_score"] > 90].sort_values(by="average_score", ascending=False).head()
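
The learning goals mention `.query()`, so here is a sketch of the same filter written both ways. The `toy` frame and the `threshold` variable are made up for illustration.

```python
import pandas as pd

# Toy data standing in for the lab's DataFrame
toy = pd.DataFrame({
    "last_name": ["Diaz", "Evans", "Ford"],
    "average_score": [93.5, 78.0, 97.0],
})

# Boolean-mask version, as used in the cell above
top_mask = toy[toy["average_score"] > 90].sort_values(
    by="average_score", ascending=False
)

# Equivalent .query() version: the condition is written as a string
top_query = toy.query("average_score > 90").sort_values(
    by="average_score", ascending=False
)

# Local Python variables can be referenced inside the string with @
threshold = 90
top_var = toy.query("average_score > @threshold")
```

`.query()` tends to read more cleanly when several conditions are combined, for example `"average_score > 90 and grade == 'A'"`.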

### 📊 6. Group and Describe by Grade

In [None]:
# How many of each grade?
df["grade"].value_counts()

In [None]:
# Average scores per grade group
df.groupby("grade")[["math_score", "science_score", "average_score"]].mean()
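
If a single mean per group is not enough, `.agg()` accepts a list of statistics. Here is a small sketch on made-up data:

```python
import pandas as pd

# Toy data standing in for the lab's DataFrame
toy = pd.DataFrame({
    "grade": ["A", "A", "B", "C"],
    "average_score": [95.0, 91.0, 84.0, 72.0],
})

# One row per grade, one column per statistic
summary = toy.groupby("grade")["average_score"].agg(
    ["count", "mean", "min", "max"]
)
```

Including `count` next to `mean` helps show when a group average is based on only one or two students.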

### 💾 7. Save the Cleaned Dataset

In [None]:
df.to_csv("student_scores_cleaned.csv", index=False)

### 🧠 Challenge Task

> Your turn! Keep only the students who got a D, sort them by last name, and export the result to a new file:
- Only include columns: `first_name`, `last_name`, `grade`
- Save it as `"d_students.csv"`


### 📝 Summary

This lab gave you hands-on experience with:
- Finding and handling missing values
- Creating new columns
- Filtering and sorting real data
- Grouping and summarizing by categorical features

Your data wrangling toolbox is now ready for real-world messiness. 🧹🛠️
