# What Is Encoding in Machine Learning?

Encoding in machine learning is the process of converting categorical data, such as text labels, into a numerical format, because most algorithms can only operate on numbers.


![image.png](attachment:image.png)


After encoding:

![image-2.png](attachment:image-2.png)
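
As a minimal sketch of the idea, the snippet below turns a small text column into integer codes with pandas. The toy DataFrame and the choice of `pd.factorize` are assumptions made for illustration; they are not taken from the figures above.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({"Gender": ["Female", "Male", "Male", "Female"]})

# factorize assigns an integer code to each distinct value,
# numbered in order of first appearance
codes, uniques = pd.factorize(toy["Gender"])
toy["Gender_encoded"] = codes

print(toy)
print(list(uniques))
```

Here `codes` holds the numeric version of the column and `uniques` records which label each code stands for.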



# Ordinal Encoding

Ordinal encoding maps each category to an integer while preserving the order of the categories. For example, education levels have a natural order:

Highschool < Graduate < Postgrad


![image-3.png](attachment:image-3.png)
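
The ordering above can be sketched by hand with an explicit mapping. The toy Series below is hypothetical, and the dictionary is one possible way to write down the order Highschool < Graduate < Postgrad.

```python
import pandas as pd

# Hypothetical toy column using the levels named above
levels = pd.Series(["Graduate", "Highschool", "Postgrad", "Highschool"])

# Explicit mapping that preserves Highschool < Graduate < Postgrad
order = {"Highschool": 0, "Graduate": 1, "Postgrad": 2}
encoded = levels.map(order)

print(encoded.tolist())
```

Because the integers follow the same order as the categories, comparisons such as "greater than" keep their meaning after encoding.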

In [20]:
import pandas as pd

# Load the sample dataset
df = pd.read_csv("binary_classification_sample.csv")
df

Unnamed: 0,Age,Salary,Experience,Gender,Department,Education,LocationScore,Purchased
0,56,51905.183591,27,Female,HR,Bachelors,67.964728,0
1,69,31258.344158,16,Female,Engineering,High School,21.825389,0
2,46,79176.734217,4,Male,HR,PhD,94.996118,0
3,32,47699.953137,4,Male,Engineering,High School,78.634501,1
4,60,36395.191619,5,Male,Marketing,High School,8.941100,1
...,...,...,...,...,...,...,...,...
195,69,69228.805705,10,Female,Engineering,Bachelors,77.985099,1
196,30,49573.678136,14,Female,Marketing,Bachelors,3.961883,1
197,58,24253.633311,27,Female,HR,High School,48.050695,0
198,20,,12,Female,Sales,High School,,0


# How to Apply Ordinal Encoding?

In [21]:
# Drop rows that contain missing values (for example, row 198 has no Salary or LocationScore)
df.dropna(inplace=True)
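
Before dropping rows, it can help to see how many values are actually missing. The sketch below uses a small hypothetical DataFrame whose last row mimics row 198; the values are rounded stand-ins, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last one mimics row 198, which lacks Salary and LocationScore
toy = pd.DataFrame({
    "Salary": [51905.18, 31258.34, np.nan],
    "Education": ["Bachelors", "High School", "High School"],
    "LocationScore": [67.96, 21.83, np.nan],
})

# Count missing values per column before deciding how to handle them
missing = toy.isna().sum()
print(missing)

# dropna() without inplace returns a new DataFrame and leaves toy unchanged
clean = toy.dropna()
print(len(toy), len(clean))
```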

In [22]:
df['Education'].value_counts()

Education
Masters        46
Bachelors      45
High School    41
PhD            37
Name: count, dtype: int64

In [23]:
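from sklearn.preprocessing import OrdinalEncoder

# Note: with the default categories='auto', the encoder sorts the labels
# alphabetically (Bachelors=0, High School=1, Masters=2, PhD=3),
# which is not the natural educational order.
ord_encoder = OrdinalEncoder()

# fit_transform expects 2D input, hence the double brackets
df['Education'] = ord_encoder.fit_transform(df[['Education']]).astype(int)
df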

Unnamed: 0,Age,Salary,Experience,Gender,Department,Education,LocationScore,Purchased
0,56,51905.183591,27,Female,HR,0,67.964728,0
1,69,31258.344158,16,Female,Engineering,1,21.825389,0
2,46,79176.734217,4,Male,HR,3,94.996118,0
3,32,47699.953137,4,Male,Engineering,1,78.634501,1
4,60,36395.191619,5,Male,Marketing,1,8.941100,1
...,...,...,...,...,...,...,...,...
194,44,33125.723933,13,Male,Sales,3,86.012240,1
195,69,69228.805705,10,Female,Engineering,0,77.985099,1
196,30,49573.678136,14,Female,Marketing,0,3.961883,1
197,58,24253.633311,27,Female,HR,1,48.050695,0
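
In the output above, Bachelors is encoded as 0 and High School as 1, because `OrdinalEncoder` sorts categories alphabetically by default. If the codes should follow the educational ranking, the order can be passed through the `categories` parameter. The sketch below uses a small hypothetical DataFrame with the same four labels, and assumes the ranking High School < Bachelors < Masters < PhD.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Stand-in for the Education column (hypothetical rows, same four labels)
edu = pd.DataFrame({"Education": ["Bachelors", "High School", "PhD", "Masters"]})

# Assumed ranking, lowest to highest
education_order = ["High School", "Bachelors", "Masters", "PhD"]

# categories takes one list per encoded column
encoder = OrdinalEncoder(categories=[education_order])
edu["Education_encoded"] = (
    encoder.fit_transform(edu[["Education"]]).ravel().astype(int)
)

print(edu)
```

With an explicit order, a larger code always means a higher education level, which is the property ordinal encoding is meant to provide.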


In [24]:
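# Department has no natural order, so these codes are only alphabetical labels
# (Engineering=0, HR=1, Marketing=2, Sales=3) and carry no ranking.
# Refitting ord_encoder here also replaces the Education categories it learned earlier.
df['Department'] = ord_encoder.fit_transform(df[['Department']]).astype(int)
df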

Unnamed: 0,Age,Salary,Experience,Gender,Department,Education,LocationScore,Purchased
0,56,51905.183591,27,Female,1,0,67.964728,0
1,69,31258.344158,16,Female,0,1,21.825389,0
2,46,79176.734217,4,Male,1,3,94.996118,0
3,32,47699.953137,4,Male,0,1,78.634501,1
4,60,36395.191619,5,Male,2,1,8.941100,1
...,...,...,...,...,...,...,...,...
194,44,33125.723933,13,Male,3,3,86.012240,1
195,69,69228.805705,10,Female,0,0,77.985099,1
196,30,49573.678136,14,Female,2,0,3.961883,1
197,58,24253.633311,27,Female,1,1,48.050695,0
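
Because the cell above refits the same `ord_encoder`, the encoder no longer remembers the Education categories. One option, sketched below under the assumption that both columns are encoded together, is to fit a single encoder on several columns, inspect the learned labels through `categories_`, and recover the original values with `inverse_transform`. The sample rows are hypothetical and only reuse labels that appear in the dataset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows using labels that appear in the dataset
sample = pd.DataFrame({
    "Department": ["HR", "Engineering", "Marketing", "Sales"],
    "Education": ["Bachelors", "High School", "PhD", "Masters"],
})

cols = ["Department", "Education"]
multi_encoder = OrdinalEncoder()  # default: alphabetical order per column
encoded = multi_encoder.fit_transform(sample[cols]).astype(int)

# categories_ holds the learned labels; the position of a label is its code
for col, cats in zip(cols, multi_encoder.categories_):
    print(col, list(cats))

# inverse_transform maps the integer codes back to the original labels
decoded = multi_encoder.inverse_transform(encoded)
print(decoded[0])
```

Keeping one fitted encoder per set of columns makes it possible to apply the same mapping to new data later with `transform`.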
