# One hot encoding

Linear models will love you for it and deep learning won't let you run without it.
What is it? Each sample becomes a row with a single 1 in the column of its category and 0 everywhere else:

$\text{OneHotEncoded} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{bmatrix}$

In [2]:
import numpy as np
sexy_list = ["Muzo","Omer","Jimmy"]
random_list = [np.random.choice(sexy_list) for x in range(50)]

In [4]:
random_list[:5]

['Omer', 'Omer', 'Muzo', 'Jimmy', 'Muzo']

In [15]:
mapping = {"Muzo":0, "Omer":1, "Jimmy":2}
mapped = [mapping[name] for name in random_list]  # names -> integer labels

In [16]:
def onehot(x):
    # x is a sequence of integer labels 0..k-1; each distinct label gets one column
    unique = set(x)
    shape = (len(x), len(unique))
    hot = np.zeros(shape, dtype='int8')
    for i, j in enumerate(x):
        hot[i, j] = 1  # label j switches on column j
    return hot

In [18]:
onehot(mapped)[:5]

array([[0, 1, 0],
       [0, 1, 0],
       [1, 0, 0],
       [0, 0, 1],
       [1, 0, 0]], dtype=int8)

In [25]:
# Alternatively
from sklearn.preprocessing import OneHotEncoder
coder = OneHotEncoder()
coder.fit_transform(np.array(mapped).reshape(-1, 1)).todense()[:5]

matrix([[0., 1., 0.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])
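
If you would rather skip both the hand-written mapping and the loop, here is a minimal NumPy-only sketch. The function name `onehot_from_labels` is made up for illustration. It assumes the labels fit in memory and that you are fine with columns ordered by sorted category name, which differs from the manual `mapping` above.

```python
import numpy as np

def onehot_from_labels(labels):
    """Return (one-hot matrix, sorted category array) for a sequence of labels."""
    # np.unique sorts the categories and, with return_inverse=True,
    # also returns each label's index into that sorted array.
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    # Row k of the identity matrix is the one-hot vector for category k,
    # so indexing the identity by `codes` builds every row at once.
    hot = np.eye(len(categories), dtype="int8")[codes]
    return hot, categories

hot, categories = onehot_from_labels(["Omer", "Omer", "Muzo", "Jimmy", "Muzo"])
print(categories)  # column order: sorted category names
print(hot)
```

Keeping `categories` around matters: it is the only record of which column stands for which name.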

But why?

Suppose I have $M = \begin{bmatrix} 
1 & 2 & 3 \\
4 & 5 & 6\\
7 & 8 & 9 \\
\end{bmatrix}$. How do I access the first row?

In [30]:
np.array([1,0,0]) @ np.array([[1,2,3],[4,5,6],[7,8,9]]) # Recall @ is the matrix multiplication operator

array([1, 2, 3])

Multiplying by a one-hot vector selects the row of $M$ at the position of the 1.
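
The same selection works for a whole batch of one-hot rows at once. The sketch below is an illustration, not part of the original notebook; the names `codes`, `hot` and `selected` are assumptions. It checks that the matrix product gives the same result as plain row indexing into $M$.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

codes = np.array([0, 2, 1, 0])               # integer category labels
hot = np.eye(M.shape[0], dtype=int)[codes]   # one one-hot row per label

selected = hot @ M                           # each one-hot row picks one row of M
assert np.array_equal(selected, M[codes])    # identical to indexing M by label
print(selected)
```

Read this way, if $M$ holds a model's weights, a one-hot input gives every category its own row of weights instead of forcing an arbitrary numeric order on the labels.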