# 🔢 Encoding Categorical Variables

This notebook demonstrates how to encode categorical variables using:
- One-Hot Encoding
- Label Encoding
- Custom Ordinal Mapping
- Frequency Encoding

In [None]:
import pandas as pd

## 🎨 Example 1: One-Hot Encoding for Colors

### 🔢 Table: One-Hot Encoding Result

| Color   | color_blue | color_green | color_red |
|---------|------------|-------------|-----------|
| red     | 0          | 0           | 1         |
| blue    | 1          | 0           | 0         |
| green   | 0          | 1           | 0         |
| blue    | 1          | 0           | 0         |
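| red     | 0          | 0           | 1         |

The `Color` column is shown for reference only; `pd.get_dummies(df_color, columns=['color'])` replaces it with the three indicator columns.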

In [None]:
df_color = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
# dtype=int yields 0/1 indicators; recent pandas versions return True/False by default.
df_color_encoded = pd.get_dummies(df_color, columns=['color'], dtype=int)
df_color_encoded
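To reproduce the table above exactly, with the original column kept next to the 0/1 indicators, one option is to encode the column on its own and join the result back. This is a minimal sketch under that assumption; the names `dummies` and `df_color_table` are illustrative and not part of the original notebook.

```python
import pandas as pd

df_color = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Encode the single column; prefix='color' gives color_blue, color_green, color_red.
# dtype=int forces 0/1 values instead of booleans.
dummies = pd.get_dummies(df_color['color'], prefix='color', dtype=int)

# Join the indicators back onto the original frame so 'color' stays visible.
df_color_table = df_color.join(dummies)
print(df_color_table)
```

Each row has exactly one indicator set to 1, which is the defining property of one-hot encoding.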

## 👤 Example 2: Label Encoding for Gender (not ordinal safe)

### 👤 Table: Label Encoding Result (Gender)

| Gender  | gender_code |
|---------|-------------|
| male    | 1           |
| female  | 0           |
| female  | 0           |
| male    | 1           |

> ⚠️ Label encoding assigns integers in alphabetical order of the categories (`female` → 0, `male` → 1), so the codes imply an order that may not exist. Use carefully!

In [None]:
df_gender = pd.DataFrame({'gender': ['male', 'female', 'female', 'male']})
df_gender['gender_code'] = df_gender['gender'].astype('category').cat.codes
df_gender

## 🎓 Example 3: Ordinal Mapping for Education Level

### 🎓 Table: Ordinal Encoding (Education)

| Education     | education_encoded |
|---------------|-------------------|
| high school   | 0                 |
| bachelor      | 1                 |
| master        | 2                 |
| PhD           | 3                 |
| bachelor      | 1                 |

In [None]:
df_edu = pd.DataFrame({'education': ['high school', 'bachelor', 'master', 'PhD', 'bachelor']})
education_order = {'high school': 0, 'bachelor': 1, 'master': 2, 'PhD': 3}
df_edu['education_encoded'] = df_edu['education'].map(education_order)
df_edu

## 🔢 Example 4: Frequency Encoding for Zip Codes

### 📮 Table: Frequency Encoding (ZIP Codes)

| ZIP Code | zip_encoded (frequency) |
|----------|-------------------------|
| 90001    | 2                       |
| 90001    | 2                       |
| 90002    | 1                       |
| 90003    | 3                       |
| 90003    | 3                       |
| 90003    | 3                       |

In [None]:
df_zip = pd.DataFrame({'zip': ['90001', '90001', '90002', '90003', '90003', '90003']})
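# Count how often each ZIP code occurs (a Series indexed by ZIP code).
zip_freq = df_zip['zip'].value_counts()
# Replace each ZIP code with its count.
df_zip['zip_encoded'] = df_zip['zip'].map(zip_freq)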
df_zip
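Raw counts depend on the size of the dataset, and a ZIP code that never appeared when the counts were computed has no entry to map to. The sketch below shows one possible way to handle both points: it uses relative frequencies and fills unseen values with 0. The column name `zip_share`, the frame `df_new`, and the ZIP code `90210` are hypothetical, and treating unseen codes as frequency 0 is an assumption rather than a rule.

```python
import pandas as pd

df_zip = pd.DataFrame({'zip': ['90001', '90001', '90002', '90003', '90003', '90003']})

# Relative frequencies: each ZIP code's share of all rows (shares sum to 1).
zip_share = df_zip['zip'].value_counts(normalize=True)
df_zip['zip_share'] = df_zip['zip'].map(zip_share)

# Hypothetical new data containing a ZIP code that was not seen above.
df_new = pd.DataFrame({'zip': ['90001', '90210']})

# map() yields NaN for keys missing from zip_share; filling with 0.0 is an assumption.
df_new['zip_share'] = df_new['zip'].map(zip_share).fillna(0.0)
print(df_zip)
print(df_new)
```

Reusing the same `zip_share` lookup for new data keeps the encoding consistent between the rows it was learned from and the rows it is applied to.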