## Binary encoding: categories -> numbers -> binary

Binary encoding converts categorical values into binary digits (0s and 1s).

It first converts each category into an integer, then writes that integer in binary, storing each binary digit in its own column.

In [1]:
import pandas as pd
import category_encoders as ce

In [8]:
# Load the crop recommendation dataset (path is relative to the notebook)
crop_data = pd.read_csv("dataset/Crop_recommendation.csv")
crop_data

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,rice
1,85,58,41,21.770462,80.319644,7.038096,226.655537,rice
2,60,55,44,23.004459,82.320763,7.840207,263.964248,rice
3,74,35,40,26.491096,80.158363,6.980401,242.864034,rice
4,78,42,42,20.130175,81.604873,7.628473,262.717340,rice
...,...,...,...,...,...,...,...,...
2195,107,34,32,26.774637,66.413269,6.780064,177.774507,coffee
2196,99,15,27,27.417112,56.636362,6.086922,127.924610,coffee
2197,118,33,30,24.131797,67.225123,6.362608,173.322839,coffee
2198,117,32,34,26.272418,52.127394,6.758793,127.175293,coffee


*Apply binary encoding*

In [9]:
# Replace the 'label' column with binary-encoded columns (label_0, label_1, ...)
encoder = ce.BinaryEncoder(cols=['label'])
encoded_data = encoder.fit_transform(crop_data)


In [10]:
encoded_data

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label_0,label_1,label_2,label_3,label_4
0,90,42,43,20.879744,82.002744,6.502985,202.935536,0,0,0,0,1
1,85,58,41,21.770462,80.319644,7.038096,226.655537,0,0,0,0,1
2,60,55,44,23.004459,82.320763,7.840207,263.964248,0,0,0,0,1
3,74,35,40,26.491096,80.158363,6.980401,242.864034,0,0,0,0,1
4,78,42,42,20.130175,81.604873,7.628473,262.717340,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...
2195,107,34,32,26.774637,66.413269,6.780064,177.774507,1,0,1,1,0
2196,99,15,27,27.417112,56.636362,6.086922,127.924610,1,0,1,1,0
2197,118,33,30,24.131797,67.225123,6.362608,173.322839,1,0,1,1,0
2198,117,32,34,26.272418,52.127394,6.758793,127.175293,1,0,1,1,0
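
To read the encoded columns, the five `label_*` values of a row can be joined and interpreted as a base-2 integer. The sketch below does this by hand for the two bit patterns visible in the output above. The helper name `bits_to_ordinal` is illustrative only and is not part of `category_encoders`.

```python
def bits_to_ordinal(bits):
    """Interpret a sequence of 0/1 values (most significant bit first) as an integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

rice_bits = [0, 0, 0, 0, 1]    # label_0..label_4 for the rice rows
coffee_bits = [1, 0, 1, 1, 0]  # label_0..label_4 for the coffee rows

print(bits_to_ordinal(rice_bits))    # 1
print(bits_to_ordinal(coffee_bits))  # 22
```

Rice maps to 1 and coffee to 22, which suggests the categories were numbered in the order they first appear in the data.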


# How It Works (Example)

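Color → Red, Blue, Green, Yellow

Step 1: Assign a number to each category

Red → 1
Blue → 2
Green → 3
Yellow → 4

Step 2: Convert each number to binary

1 → 001
2 → 010
3 → 011
4 → 100

Each binary digit becomes its own column, so the four colors fit in three columns.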
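
The two steps above can be sketched in plain Python. This is a minimal illustration, not how `category_encoders` is implemented: the function name `binary_encode` is hypothetical, ordinals are assumed to start at 1 in order of first appearance, and the input is assumed to be non-empty.

```python
def binary_encode(values):
    """Map each distinct value to an ordinal, then to a fixed-width bit string."""
    # Step 1: assign ordinals (1, 2, 3, ...) in order of first appearance
    ordinals = {}
    for value in values:
        if value not in ordinals:
            ordinals[value] = len(ordinals) + 1

    # Number of bits needed to write the largest ordinal
    width = max(ordinals.values()).bit_length()

    # Step 2: write each ordinal in binary, zero-padded to the same width
    return {value: format(n, f"0{width}b") for value, n in ordinals.items()}

codes = binary_encode(["Red", "Blue", "Green", "Yellow", "Blue"])
for color, bits in codes.items():
    print(color, bits)
# Red 001
# Blue 010
# Green 011
# Yellow 100
```

Each character of the bit string would become one column in the encoded table.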

# Pros & Cons
*Pros*

- Fewer columns than one-hot encoding (roughly log2(n) columns instead of n for n categories)

- Good for high-cardinality data

- Reduces memory usage compared with one-hot encoding
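
The column savings can be estimated with a little arithmetic. The sketch below assumes ordinals run from 1 to n, as in the color example, so the number of binary columns is the bit length of n. The function names are illustrative, and the exact column count produced by a given library version may differ slightly.

```python
def one_hot_columns(n_categories):
    # One-hot encoding uses one column per category
    return n_categories

def binary_columns(n_categories):
    # Bits needed to write the largest ordinal (n_categories) in base 2
    return n_categories.bit_length()

for n in (4, 22, 1000):
    print(n, one_hot_columns(n), binary_columns(n))
# 4 4 3
# 22 22 5
# 1000 1000 10
```

The 22-category row matches the five `label_*` columns in the encoded crop data, where the largest code, 10110, is 22.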

*Cons*

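- Harder to interpret, since a single column no longer corresponds to a single category

- Still introduces an artificial relationship between categories that share bits

- Requires an external library (`category_encoders`)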
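
The artificial relationship between categories can be made concrete by counting how many bit columns differ between each pair of codes. The sketch below uses the 3-bit color codes from the example, with Green and Yellow assumed to be numbered 3 and 4. The `hamming` helper is illustrative.

```python
from itertools import combinations

# 3-bit codes for ordinals 1..4
codes = {"Red": "001", "Blue": "010", "Green": "011", "Yellow": "100"}

def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

distances = {
    (a, b): hamming(codes[a], codes[b]) for a, b in combinations(codes, 2)
}
for pair, distance in distances.items():
    print(pair, distance)
# ('Red', 'Blue') 2
# ('Red', 'Green') 1
# ('Red', 'Yellow') 2
# ('Blue', 'Green') 1
# ('Blue', 'Yellow') 2
# ('Green', 'Yellow') 3
```

With one-hot encoding every pair of categories differs in exactly two columns. Here the distances range from 1 to 3, so a distance-based or linear model may treat some colors as more alike than others for no real reason.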

# When to use

| Situation              | Use Binary Encoding? |
| ---------------------- | -------------------- |
| Few unique categories  | ❌ Not necessary      |
| Many unique categories | ✅ Yes                |
| Tree-based models      | ✅ Good               |
| Linear models          | ⚠ Use carefully      |
