## Target encoding

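*Target encoding replaces each category with a statistic of the target variable for that category, usually its mean. Libraries such as `category_encoders` also shrink that mean toward the global target mean, which is why the output below shows Chennai at about 0.66 rather than its raw mean of 1.0.*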

In [1]:
import pandas as pd
import category_encoders as ce

In [2]:
# Sample dataset
data = pd.DataFrame({
    'City': ['Chennai', 'Mumbai', 'Chennai', 'Delhi', 'Mumbai'],
    'Purchased': [1, 0, 1, 0, 1]
})

In [3]:
# Create encoder
encoder = ce.TargetEncoder(cols=['City'])

# Apply encoding
encoded_data = encoder.fit_transform(data['City'], data['Purchased'])
print(encoded_data)

       City
0  0.656740
1  0.585815
2  0.656740
3  0.521935
4  0.585815
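
The printed values are smoothed, not raw per-city means. The following is a minimal sketch of that shrinkage, assuming the sigmoid weighting and the defaults `min_samples_leaf=20` and `smoothing=10` used by recent `category_encoders` releases (other versions may differ). The helper name `smoothed_target_mean` is hypothetical and not part of the library.

```python
import numpy as np
import pandas as pd

def smoothed_target_mean(categories, target, min_samples_leaf=20, smoothing=10.0):
    """Blend each category's target mean with the global mean (hypothetical helper)."""
    prior = target.mean()  # global target mean
    stats = target.groupby(categories).agg(['mean', 'count'])
    # Sigmoid weight: moves toward 1 as a category has more rows
    weight = 1.0 / (1.0 + np.exp(-(stats['count'] - min_samples_leaf) / smoothing))
    encoding = prior * (1.0 - weight) + stats['mean'] * weight
    return categories.map(encoding)

data = pd.DataFrame({
    'City': ['Chennai', 'Mumbai', 'Chennai', 'Delhi', 'Mumbai'],
    'Purchased': [1, 0, 1, 0, 1]
})
manual = smoothed_target_mean(data['City'], data['Purchased'])
print(manual.round(6).tolist())
```

With only one or two rows per city, the weight on each city's own mean is small, so every encoded value stays close to the global mean of 0.6.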


### Pros

- Handles high-cardinality features well (e.g., 100+ categories).

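- Does not add a column per category the way one-hot encoding does; the feature stays a single numeric column.

- Encodes how each category relates to the target, which can be directly predictive.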

- Works well with tree models (XGBoost, Random Forest).
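
To illustrate the point about dataset size, here is a small sketch that compares the column count of one-hot encoding with target encoding on a synthetic high-cardinality feature. The data, the column names `city_id` and `purchased`, and the use of a raw unsmoothed mean are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_categories = 1000, 200

# Synthetic high-cardinality feature and binary target (illustrative data only)
frame = pd.DataFrame({
    'city_id': rng.integers(0, n_categories, size=n_rows).astype(str),
    'purchased': rng.integers(0, 2, size=n_rows),
})

# One-hot encoding creates one column per distinct category
one_hot = pd.get_dummies(frame['city_id'])

# Target encoding keeps a single numeric column (raw per-category mean here)
target_encoded = frame.groupby('city_id')['purchased'].transform('mean')

print(one_hot.shape)
print(target_encoded.to_frame().shape)
```

The one-hot frame has as many columns as there are distinct `city_id` values, while the target-encoded feature remains one column regardless of cardinality.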

### Cons

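- Data leakage risk ⚠: the encoding is computed from the target, so a row encoded with statistics that include its own label carries that label into the feature.

- Can overfit on small datasets, because the means of categories with few rows are noisy.

- Needs careful handling: fit the encoder on training data only, and use smoothing or cross-validated (out-of-fold) encoding.

- Not applicable to unsupervised learning, where there is no target to encode against.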
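
One common way to reduce the leakage described above is out-of-fold encoding: each row is encoded using target means computed from the other folds only. The sketch below is a plain pandas/NumPy illustration under that assumption. The function name `out_of_fold_target_encode` is hypothetical, it applies no smoothing, and it is not how `category_encoders` works internally.

```python
import numpy as np
import pandas as pd

def out_of_fold_target_encode(categories, target, n_splits=5, seed=0):
    """Encode each row using target means computed on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(categories)), n_splits)
    encoded = pd.Series(np.nan, index=categories.index, dtype=float)
    for held_out in folds:
        train_mask = np.ones(len(categories), dtype=bool)
        train_mask[held_out] = False
        prior = target[train_mask].mean()
        means = target[train_mask].groupby(categories[train_mask]).mean()
        # Categories unseen in the training folds fall back to the training mean
        values = categories.iloc[held_out].map(means).fillna(prior)
        encoded.iloc[held_out] = values.to_numpy()
    return encoded

data = pd.DataFrame({
    'City': ['Chennai', 'Mumbai', 'Chennai', 'Delhi', 'Mumbai'],
    'Purchased': [1, 0, 1, 0, 1]
})
oof = out_of_fold_target_encode(data['City'], data['Purchased'], n_splits=5)
print(oof.tolist())
```

On a dataset this small the out-of-fold values are very noisy, which is the overfitting concern from the list above; in practice this approach is usually combined with smoothing.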

### When to use

| Situation                | Use Target Encoding? |
| ------------------------ | -------------------- |
| Supervised learning      | ✅ Yes                |
| High-cardinality feature | ✅ Yes                |
| Tree-based models        | ✅ Yes                |
| Small dataset            | ⚠ Careful            |
| Unsupervised problem     | ❌ No                 |
