# 🧼 Data Cleaning, Preparation, and Wrangling Tutorial
Welcome to this hands-on tutorial where we clean and wrangle a messy dataset using Python and pandas.

**Objectives:**
- Load messy raw data
- Clean column names and entries
- Handle missing data
- Correct data types
- Standardize formats
- Remove duplicates
- Save cleaned dataset

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('raw_data.csv')
print('Original Data:')
df.head()

### Step 1: Clean Column Names

In [None]:
df.columns = df.columns.str.strip().str.replace(' ', '_').str.lower()
df.columns
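
To make the effect of the chained string methods concrete, here is a minimal sketch on a made-up frame. The header names below are invented for illustration and are not taken from `raw_data.csv`.

```python
import pandas as pd

# Hypothetical messy headers, invented for illustration only
demo = pd.DataFrame(columns=[' Name ', 'Employee Status', 'Salary ($)'])

# Same chain as in the cell above: trim, replace spaces with underscores, lowercase
cleaned_columns = demo.columns.str.strip().str.replace(' ', '_').str.lower()
print(list(cleaned_columns))
```

Note that punctuation such as `(` and `$` is left in place, which is why later steps refer to a column named `salary_($)`.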

### Step 2: Handle Missing Values

In [None]:
print(df.isnull().sum())

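# Convert 'age' to numeric; unparseable entries become NaN
df['age'] = pd.to_numeric(df['age'], errors='coerce')
# Fill missing ages with the median (assign back; chained inplace fillna is unreliable in recent pandas)
df['age'] = df['age'].fillna(df['age'].median())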

In [None]:
# Treat a lone '.' as a missing email, then drop rows that have no email
df['email'] = df['email'].replace({'.': np.nan})
df.dropna(subset=['email'], inplace=True)
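
The age handling in this step can be checked on a small example before it is applied to the real column. The sketch below uses invented values (an assumption, not rows from the dataset). It counts how many entries `errors='coerce'` turns into NaN, so that the amount of imputation is visible rather than silent.

```python
import pandas as pd

# Hypothetical age values, invented for illustration
ages = pd.Series(['34', 'forty', None, '29', '51'])

# errors='coerce' turns anything unparseable into NaN instead of raising
ages_numeric = pd.to_numeric(ages, errors='coerce')

# Count how many values will be imputed before filling them
n_missing = int(ages_numeric.isna().sum())

# median() skips NaN by default, so it is computed from the valid values only
ages_filled = ages_numeric.fillna(ages_numeric.median())
print(n_missing, ages_filled.tolist())
```

If a large share of the column ends up imputed, the median fill may distort the distribution, and it may be worth inspecting the raw values first.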

### Step 3: Fix Date Formats

In [None]:
# Parse join dates; entries that cannot be parsed become NaT
df['joindate'] = pd.to_datetime(df['joindate'], errors='coerce')
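
Because `errors='coerce'` replaces bad values silently, it can help to separate dates that were missing from the start from dates that failed to parse. The following is a sketch with invented values. The names `dates`, `parsed`, and `failed` are illustrative.

```python
import pandas as pd

# Hypothetical join dates, invented for illustration
dates = pd.Series(['2021-03-15', '2020-11-02', 'not a date', None])

# Unparseable strings and missing values both end up as NaT
parsed = pd.to_datetime(dates, errors='coerce')

# Present in the raw data but NaT after parsing: these are parse failures
failed = parsed.isna() & dates.notna()
n_failed = int(failed.sum())
print(dates[failed].tolist())
```

Listing the failed raw strings shows whether a second date format is present in the data and needs separate handling.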

### Step 4: Clean Salary Column

In [None]:
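# Convert salary to numeric; non-numeric entries become NaN
df['salary_($)'] = pd.to_numeric(df['salary_($)'], errors='coerce')
# Fill missing salaries with the median (assign back instead of chained inplace fillna)
df['salary_($)'] = df['salary_($)'].fillna(df['salary_($)'].median())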
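
If the salary column stores values as text with currency symbols or thousands separators, `pd.to_numeric` would coerce those entries to NaN and the median fill would then overwrite real salaries. Whether that applies to `raw_data.csv` is an assumption. The sketch below uses invented strings to show one way to strip such characters first.

```python
import pandas as pd

# Hypothetical salary strings, invented for illustration
salaries = pd.Series(['$50,000', '62000', 'N/A', '48,500.50'])

# Converting directly loses every value that contains '$' or ','
direct = pd.to_numeric(salaries, errors='coerce')

# Remove dollar signs, commas, and whitespace before converting
stripped = salaries.str.replace(r'[$,\s]', '', regex=True)
salary_numeric = pd.to_numeric(stripped, errors='coerce')

print(int(direct.isna().sum()), int(salary_numeric.isna().sum()))
```

Comparing the two NaN counts shows how many values the extra cleaning step recovers.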

### Step 5: Standardize Text Columns

In [None]:
df['name'] = df['name'].str.title().str.strip()
df['gender'] = df['gender'].str.title().str.strip()
df['department'] = df['department'].str.upper().str.strip()
df['employee_status'] = df['employee_status'].str.upper().str.strip()
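
One way to confirm that standardization worked is to compare the number of distinct labels before and after. This sketch uses invented department labels. Missing values pass through the `.str` methods unchanged, so they remain available for a later fill.

```python
import numpy as np
import pandas as pd

# Hypothetical department labels, invented for illustration
departments = pd.Series([' hr', 'Hr ', 'HR', 'it', np.nan])

# Trim whitespace and uppercase; NaN stays NaN
standardized = departments.str.strip().str.upper()

# nunique() ignores NaN, so these count real labels only
n_before = departments.nunique()
n_after = standardized.nunique()
print(n_before, n_after, sorted(standardized.dropna().unique()))
```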

### Step 6: Remove Duplicates

In [None]:
df.drop_duplicates(subset=['name', 'email'], inplace=True)
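
Before dropping rows, it can be useful to look at what counts as a duplicate under the chosen `subset`. The sketch below uses an invented frame. `duplicated(keep=False)` flags every member of a duplicate group, while `drop_duplicates` keeps the first occurrence by default.

```python
import pandas as pd

# Hypothetical records, invented for illustration
people = pd.DataFrame({
    'name': ['Ann Lee', 'Ann Lee', 'Bo Chen', 'Ann Lee'],
    'email': ['ann@example.com', 'ann@example.com',
              'bo@example.com', 'ann.lee@example.com'],
    'department': ['HR', 'HR', 'IT', 'SALES'],
})

# Flag all rows that share both name and email with another row
dupe_mask = people.duplicated(subset=['name', 'email'], keep=False)
n_dupes = int(dupe_mask.sum())

# Keep the first row of each (name, email) pair
deduped = people.drop_duplicates(subset=['name', 'email'], keep='first')
print(n_dupes, len(deduped))
```

The last row shares a name with the first two but has a different email, so it is kept as a separate record.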

### Step 7: Handle Missing Department

In [None]:
# Label missing departments explicitly (assign back instead of chained inplace fillna)
df['department'] = df['department'].fillna('UNKNOWN')

### ✅ Final Cleaned Data

In [None]:
df.head()

In [None]:
df.to_csv('cleaned_data.csv', index=False)
print('Cleaned data saved as cleaned_data.csv')
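
CSV files do not store dtypes, so a datetime column comes back as plain text unless it is parsed again on load. The sketch below demonstrates this with an invented one-row frame and an in-memory buffer in place of a file. The column name `joindate` matches the tutorial, and everything else is illustrative.

```python
import io

import pandas as pd

# Hypothetical cleaned frame, invented for illustration
cleaned = pd.DataFrame({
    'name': ['Ann Lee'],
    'joindate': pd.to_datetime(['2021-03-15']),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# Reading back without hints loses the datetime dtype
buffer.seek(0)
plain = pd.read_csv(buffer)

# parse_dates restores it
buffer.seek(0)
typed = pd.read_csv(buffer, parse_dates=['joindate'])

print(plain['joindate'].dtype, typed['joindate'].dtype)
```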