###### Encoding means converting categorical (text) data into numeric form so that a machine learning model can use it.
Most ML algorithms operate on numbers, not text.

###### Why Encoding is Needed

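-> Most machine learning algorithms cannot process text directly.

-> Encoding converts qualitative (non-numeric) categorical data into quantitative, numeric form.

-> A suitable encoding can improve model accuracy, while an unsuitable one (for example, implying an order that does not exist) can hurt it.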

###### Types of Encoding

1) Label Encoding
2) One-Hot Encoding
3) Ordinal Encoding
4) Binary Encoding
5) Frequency (Count) Encoding

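###### 1) Label Encoding
Converts each category into a unique integer.
The integers imply an order (0 < 1 < 2) that the categories may not have, which can mislead models that treat the values as magnitudes, such as linear models.
`LabelEncoder` assigns codes in sorted (alphabetical) order, and scikit-learn intends it for target labels. For ordinal features (like Small < Medium < Large), state the order explicitly with Ordinal Encoding instead.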

In [2]:
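# Label encoding: map each distinct country to an integer code
from sklearn.preprocessing import LabelEncoder
import pandas as pd

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})

le = LabelEncoder()
# fit_transform learns the sorted list of categories and returns their codes
data['Country_Encoded'] = le.fit_transform(data['Country'])
print(data)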


  Country  Country_Encoded
0   India                0
1     USA                2
2   Japan                1
3   India                0
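
The codes in the output above follow alphabetical order (India, Japan, USA), not any real ranking. The following plain-Python sketch reproduces the same mapping by hand to make that visible. It is only an illustration of the idea, not scikit-learn's actual implementation, and the variable names are chosen for this example.

```python
countries = ['India', 'USA', 'Japan', 'India']

# Sort the unique categories, then number them in that order
classes = sorted(set(countries))
mapping = {label: code for code, label in enumerate(classes)}

# Replace every value with its code
codes = [mapping[c] for c in countries]

print(mapping)
print(codes)
```

Because the numbering depends only on spelling, "USA" ends up greater than "Japan" for no meaningful reason.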


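###### 2) One-Hot Encoding
Converts each category into its own column, marked 1 (True) where the row belongs to that category and 0 (False) elsewhere.
This avoids implying any order, but adds one column per category (called dummy variables), which becomes costly for features with many distinct values.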

In [3]:
import pandas as pd

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})

# Create one indicator column per distinct country
encoded = pd.get_dummies(data, columns=['Country'])
print(encoded)

   Country_India  Country_Japan  Country_USA
0           True          False        False
1          False          False         True
2          False           True        False
3           True          False        False
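
If 0/1 integers are preferred over True/False, or one redundant column should be dropped, `pd.get_dummies` accepts the `dtype` and `drop_first` arguments. A minimal sketch on the same data follows. Whether dropping a column is appropriate depends on the model, so treat this as one option rather than a rule.

```python
import pandas as pd

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})

# dtype=int writes 0/1 instead of True/False.
# drop_first=True drops the first category (India); a row of all zeros then means India.
encoded = pd.get_dummies(data, columns=['Country'], dtype=int, drop_first=True)
print(encoded)
```

With three categories this leaves two columns, `Country_Japan` and `Country_USA`.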


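###### 3) Ordinal Encoding
Used for ordered categories (like education level, size, rating).
The order is stated explicitly through the `categories` argument, so the codes follow Small < Medium < Large rather than alphabetical order.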

In [4]:
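from sklearn.preprocessing import OrdinalEncoder
import pandas as pd

data = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium']})

# List the categories from lowest to highest; codes are assigned in this order
oe = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
# OrdinalEncoder expects 2-D input, hence the double brackets
data['Size_Encoded'] = oe.fit_transform(data[['Size']])
print(data)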


     Size  Size_Encoded
0   Small           0.0
1  Medium           1.0
2   Large           2.0
3  Medium           1.0
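
To see why the explicit order matters, the sketch below uses `pd.Categorical` (a pandas alternative, not the `OrdinalEncoder` used above) to compare codes with and without a stated order. When no order is given, the categories are sorted alphabetically, which is also what `OrdinalEncoder` does when `categories` is left at its default.

```python
import pandas as pd

sizes = pd.Series(['Small', 'Medium', 'Large', 'Medium'])

# Explicit order: Small < Medium < Large
ordered = pd.Categorical(sizes, categories=['Small', 'Medium', 'Large'], ordered=True)
ordered_codes = ordered.codes.tolist()
print(ordered_codes)

# No order given: categories are sorted alphabetically (Large, Medium, Small)
default = pd.Categorical(sizes)
default_codes = default.codes.tolist()
print(default_codes)
```

In the second case "Small" receives the highest code and "Large" the lowest, which reverses the intended ranking.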


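###### 4) Binary Encoding
Combines label encoding with binary digits: each category first gets an integer code, and that integer is then written in binary, one column per bit.
With n categories this needs roughly log2(n) columns instead of the n columns that one-hot encoding creates.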

In [8]:

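# Requires the third-party package: pip install category_encoders
import pandas as pd
import category_encoders as ce

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})
encoder = ce.BinaryEncoder(cols=['Country'])
data_encoded = encoder.fit_transform(data)
print(data_encoded)


ModuleNotFoundError: No module named 'category_encoders'

(The cell failed because the package was not installed in this environment. Install it with `pip install category_encoders` and re-run the cell.)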
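
Since the cell above did not run, here is a minimal sketch of the same idea using only pandas. It assumes codes start at 1 and are assigned in order of first appearance, and it names the bit columns `Country_0`, `Country_1`; the exact codes and column names produced by `category_encoders` may differ.

```python
import pandas as pd

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})

# Step 1: integer code per category, in order of first appearance, starting at 1
codes = pd.factorize(data['Country'])[0] + 1   # India=1, USA=2, Japan=3

# Step 2: number of binary digits needed for the largest code
n_bits = int(codes.max()).bit_length()

# Step 3: one column per binary digit, most significant bit first
for i in range(n_bits):
    shift = n_bits - 1 - i
    data[f'Country_{i}'] = (codes >> shift) & 1

print(data)
```

Three categories fit into two bit columns here: India becomes 01, USA 10, and Japan 11.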

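###### 5) Frequency (Count) Encoding
Replaces each category with its frequency (how many times it appears).
Categories that occur equally often receive the same value and can no longer be told apart (USA and Japan in the example below).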

In [7]:
import pandas as pd

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})

# value_counts() gives the count per country; map() looks up that count for each row
data['Country_Encoded'] = data['Country'].map(data['Country'].value_counts())
print(data)

  Country  Country_Encoded
0   India                2
1     USA                1
2   Japan                1
3   India                2
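
A common variant uses relative frequencies (the share of rows) instead of raw counts, so the values stay comparable across datasets of different sizes. A minimal sketch, with the column name `Country_Freq` chosen only for this example:

```python
import pandas as pd

data = pd.DataFrame({'Country': ['India', 'USA', 'Japan', 'India']})

# normalize=True returns each category's share of the rows instead of its raw count
freq = data['Country'].value_counts(normalize=True)
data['Country_Freq'] = data['Country'].map(freq)
print(data)
```

India appears in 2 of 4 rows and maps to 0.5; USA and Japan each map to 0.25.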
