# One-Hot Encoding
One-hot encoding converts a categorical variable into a numeric format that machine learning algorithms can use directly. The basic idea is to create one new variable per category, each taking the value 0 or 1, so that exactly one of them is 1 for any given original value. Representing categorical features as vectors in this way also lets you apply vector operations to them, such as calculating the cosine distance.

In [None]:
import pandas as pd
import numpy as np

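For example, suppose we want to transform a categorical variable that contains the three network protocols 'ICMP', 'TCP', and 'UDP' into features that contain only 0 and 1 values. One way to do this is to one-hot encode the protocols: each protocol gets its own column, and each row has a 1 in the column of its protocol and a 0 everywhere else. For the values 'TCP', 'ICMP', and 'UDP', that is, <br><br>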
`| ICMP | TCP | UDP |`<br>
`|  0   |  1  |  0  |`<br>
`|  1   |  0  |  0  |`<br>
`|  0   |  0  |  1  |`<br>
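
To make the mapping in the table concrete, here is a minimal sketch that builds the same matrix by hand with NumPy. The helper `one_hot_encode` is a hypothetical name introduced only for this illustration; it is not a pandas or NumPy function, and it assumes every value appears in the `categories` list.

```python
import numpy as np

# Hypothetical helper for illustration only; not part of pandas or NumPy.
def one_hot_encode(values, categories):
    """Return a (len(values), len(categories)) array of 0/1 integers."""
    # Map each category to the index of its column
    column_of = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=int)
    for row, value in enumerate(values):
        # Set a single 1 in the column that belongs to this value
        encoded[row, column_of[value]] = 1
    return encoded

categories = ['ICMP', 'TCP', 'UDP']
encoded = one_hot_encode(['TCP', 'ICMP', 'UDP'], categories)
print(encoded)
```

Each row of `encoded` sums to 1, which is the defining property of a one-hot vector.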



In [None]:
# Create the data
protocolDF = pd.DataFrame({'protocol': ['TCP', 'ICMP', 'UDP']})

# View the DataFrame and its number of rows
print(protocolDF)
protocolDF.shape[0]

  protocol
0      TCP
1     ICMP
2      UDP


3

In [None]:
# Next, we use pandas' 'get_dummies' function to create the one-hot encoding.
# dtype=int keeps the values as 0/1; recent pandas versions default to True/False.
protocolOneHotdf = pd.DataFrame({'protocol': ['TCP', 'ICMP', 'UDP']})
protocolOneHotdf = pd.get_dummies(protocolOneHotdf['protocol'], prefix='protocol', dtype=int)
protocolOneHotdf

   protocol_ICMP  protocol_TCP  protocol_UDP
0              0             1             0
1              1             0             0
2              0             0             1
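
The introduction mentions vector operations such as the cosine distance. As a rough sketch of what that looks like on one-hot vectors, the block below compares encoded rows. The extra 'TCP' row and the helper `cosine_distance` are assumptions added for this illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# A fourth row (a repeated 'TCP') is added here purely for illustration
protocols = pd.DataFrame({'protocol': ['TCP', 'ICMP', 'UDP', 'TCP']})
vectors = pd.get_dummies(protocols['protocol'], dtype=int).to_numpy()

# Hypothetical helper: cosine distance = 1 - cosine similarity
def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

same = cosine_distance(vectors[0], vectors[3])       # TCP vs. TCP
different = cosine_distance(vectors[0], vectors[1])  # TCP vs. ICMP
print(same, different)
```

Identical categories are at distance 0 and any two different categories are at distance 1, so one-hot encoding treats all distinct categories as equally far apart.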


In [None]:
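# Replace each protocol with its one-hot vector, stored as a comma-separated string.
# dtype=int makes the encoded values 0/1 rather than True/False.
one_hot = pd.get_dummies(protocolDF["protocol"], dtype=int)
for row in range(protocolDF.shape[0]):
    # DataFrame.at writes to protocolDF itself instead of a temporary copy
    protocolDF.at[row, "protocol"] = ','.join(str(value) for value in one_hot.iloc[row])
protocolDF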

#kddcup["protocol_type"].at[row] = str(one_hot.iloc[row, 0]) + ',' + str(one_hot.iloc[row, 1]) + ',' + str(one_hot.iloc[row, 2])

  protocol
0    0,1,0
1    1,0,0
2    0,0,1
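
The row-by-row loop above can also be written without explicit indexing. The following is a minimal sketch of one alternative, not the notebook's original method; the column name `protocol_encoded` and the variable `decoded` are names chosen for this illustration. It also shows how to recover the original label from the one-hot columns with `idxmax`.

```python
import pandas as pd

df = pd.DataFrame({'protocol': ['TCP', 'ICMP', 'UDP']})
one_hot = pd.get_dummies(df['protocol'], dtype=int)

# Build the comma-separated string for every row in one pass
df['protocol_encoded'] = [','.join(map(str, row)) for row in one_hot.to_numpy()]

# Decode: the column holding the 1 is the original category
decoded = one_hot.idxmax(axis=1)

print(df)
print(decoded.tolist())
```

Keeping the encoded value in a separate column leaves the original labels available for checking the result.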
