# Word Embedding
Word embeddings are a type of word representation in which each word is mapped to one real-valued vector in a predefined vector space, such that similar words lie close together in that space. The vectors are learned, typically using a neural network.
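
To make "closer in the vector space" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three 3-dimensional vectors are hand-picked for illustration only; they are not learned embeddings, and the helper name `cosine_similarity` is our own.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, chosen by hand so that 'good' and 'great' point the same way
toy_vectors = {
    'good':  [0.9, 0.8, 0.1],
    'great': [0.8, 0.9, 0.2],
    'poor':  [-0.7, -0.9, 0.1],
}

sim_good_great = cosine_similarity(toy_vectors['good'], toy_vectors['great'])
sim_good_poor = cosine_similarity(toy_vectors['good'], toy_vectors['poor'])

# Similar words should score higher than dissimilar ones
print(sim_good_great > sim_good_poor)
```

With real embeddings the same comparison is applied to rows of the learned embedding matrix.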

### Embedding layer
The Keras API provides an 'Embedding Layer' which can be used for learning word embeddings. It requires that the documents be cleaned and prepared first, so that each word is encoded as an integer index. Conceptually, each index stands for a one-hot vector whose size is the size of the known vocabulary. The trained weights of the embedding layer form the embedding matrix, which maps one-hot encoded words to their word vectors.

In [82]:
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow import keras

In [36]:
# define documents
reviews = ['Well done!',
        'Good work',
        'Great effort',
        'nice work',
        'Excellent!',
        'Weak',
        'Poor effort!',
        'not good',
        'poor work',
        'Could have done better.']
# define class labels
sentiments = np.array([1,1,1,1,1,0,0,0,0,0])

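```one_hot()``` provided by the Keras API assigns each word an integer within the defined range by hashing it. The defined range must be at least the vocabulary size, and in practice comfortably larger, because hashing can assign the same number to different words (in the output of the next cell, 'done' and 'effort' both map to 19). Here, we have set the vocabulary size to 30, so each word is assigned a number from 1 to 29.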

In [53]:
# integer encode the documents
vocab_size = 30
one_hot_encoded_reviews = [one_hot(d, vocab_size) for d in reviews]
print(one_hot_encoded_reviews)

[[4, 19], [9, 20], [23, 19], [6, 20], [12], [1], [2, 19], [23, 9], [2, 20], [24, 5, 19, 1]]
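
The repeated codes in this output come from hashing: different words can land on the same integer. The sketch below illustrates the idea with a deterministic MD5-based stand-in. It is not the Keras implementation; `toy_hash_encode` and its crude tokenisation are assumptions made for this example.

```python
import hashlib

def toy_hash_encode(text, n):
    """Map each word to an integer in [1, n-1] by hashing (illustrative stand-in)."""
    # Crude tokenisation: lowercase and treat '!' and '.' as separators
    words = text.lower().replace('!', ' ').replace('.', ' ').split()
    codes = []
    for w in words:
        digest = hashlib.md5(w.encode('utf-8')).hexdigest()
        # Index 0 is never produced, which leaves it free for padding
        codes.append(int(digest, 16) % (n - 1) + 1)
    return codes

# The 14 distinct words that occur in the reviews
vocab = ['well', 'done', 'good', 'work', 'great', 'effort', 'nice',
         'excellent', 'weak', 'poor', 'not', 'could', 'have', 'better']

for n in (5, 30, 1000):
    codes = [toy_hash_encode(w, n)[0] for w in vocab]
    print(n, 'distinct codes:', len(set(codes)), 'of', len(vocab))
```

With `n=5` there are only four possible codes for 14 words, so collisions are unavoidable; a larger range makes them less likely but does not rule them out.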


All the inputs should have the same length. Documents shorter than the maximum length must be zero-padded.

In [54]:
max_length = 5  # maximum length of the documents; the longest one here (the last) has length 4
padded_docs = pad_sequences(one_hot_encoded_reviews, maxlen=max_length, padding='post')
print(padded_docs)

[[ 4 19  0  0  0]
 [ 9 20  0  0  0]
 [23 19  0  0  0]
 [ 6 20  0  0  0]
 [12  0  0  0  0]
 [ 1  0  0  0  0]
 [ 2 19  0  0  0]
 [23  9  0  0  0]
 [ 2 20  0  0  0]
 [24  5 19  1  0]]
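
For intuition, the post-padding shown above can be sketched in plain Python. This is a simplified stand-in, not the Keras function: the helper name `pad_post` is our own, and it rejects over-long sequences instead of truncating them.

```python
def pad_post(sequences, maxlen, value=0):
    """Append `value` to each sequence until it has exactly `maxlen` items."""
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            raise ValueError('sequence longer than maxlen')
        padded.append(list(seq) + [value] * (maxlen - len(seq)))
    return padded

# Three of the encoded reviews from the output above
encoded = [[4, 19], [12], [24, 5, 19, 1]]
print(pad_post(encoded, maxlen=5))
# [[4, 19, 0, 0, 0], [12, 0, 0, 0, 0], [24, 5, 19, 1, 0]]
```

Padding with 0 is safe here because none of the encoded words in the output above uses the value 0.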


Let's define a Sequential model with an embedding layer.

In [55]:
# dim of embedding vector
vector_dim = 10

model = keras.models.Sequential([
    keras.layers.Embedding(input_dim=vocab_size, 
                           output_dim=vector_dim, 
                           input_length=max_length,
                           name='Embedding'),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation='sigmoid')
])

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

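Finally, fit and evaluate the model. Note that the evaluation uses the same ten reviews the model was trained on, so the reported value is training accuracy.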

In [64]:
# fit the model
X = padded_docs
y = sentiments  # class labels defined with the reviews above

model.fit(X, y, epochs=100, verbose=0)
# evaluate on the training data itself
loss, accuracy = model.evaluate(X, y, verbose=0)
print('Accuracy: %f' % (accuracy*100))

Accuracy: 100.000000


In [65]:
model.summary()

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Embedding (Embedding)        (None, 5, 10)             300       
_________________________________________________________________
flatten_2 (Flatten)          (None, 50)                0         
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 51        
Total params: 351
Trainable params: 351
Non-trainable params: 0
_________________________________________________________________
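
The parameter counts in the summary can be checked by hand: the embedding layer stores one vector per vocabulary index, and the dense layer has one weight per flattened input plus a bias. A short arithmetic sketch, with variable names of our own choosing:

```python
vocab_size, vector_dim, max_length = 30, 10, 5

# One vector_dim-sized row per vocabulary index
embedding_params = vocab_size * vector_dim

# Flatten turns the (max_length, vector_dim) output into one long vector
flatten_units = max_length * vector_dim

# One weight per flattened unit, plus a single bias for the one output neuron
dense_params = flatten_units * 1 + 1

print(embedding_params, flatten_units, dense_params, embedding_params + dense_params)
# 300 50 51 351
```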


In [78]:
# one hot vector of word 'Well'

well_one_hot = np.zeros(shape=(vocab_size,))
well_one_hot[4] = 1    # 'well' was assigned the number 4 by one_hot()
well_one_hot = well_one_hot[:, np.newaxis]    # column vector of shape (vocab_size, 1)
print(well_one_hot)

[[0.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]
 [0.]]


In [73]:
embedding_matrix = model.get_layer('Embedding').get_weights()[0]
print(embedding_matrix)
print(embedding_matrix.shape)

[[-2.51123607e-01 -9.53367203e-02 -1.20475568e-01  6.17100894e-02
   1.48435742e-01  3.45391303e-01  8.00744444e-02  2.83507884e-01
   4.60204817e-02 -9.43058133e-02]
 [-9.77457389e-02  2.04347163e-01 -3.50886971e-01  2.73626328e-01
   3.45430583e-01 -7.51033798e-02 -3.18316191e-01 -3.83961767e-01
  -2.25290105e-01 -3.76695216e-01]
 [-4.39869910e-01  3.62886965e-01 -2.64731318e-01  2.90962964e-01
   2.97272891e-01 -3.94604146e-01 -4.18781847e-01 -3.42983127e-01
  -2.95747161e-01 -2.93500364e-01]
 [-1.17079243e-02 -3.39016542e-02  4.88743670e-02 -3.86748537e-02
  -2.44597923e-02  2.07471885e-02  7.23155588e-03 -7.88573176e-03
  -4.29349020e-03 -1.39094889e-04]
 [ 3.92505467e-01 -2.70496339e-01  2.16423348e-01 -2.72191942e-01
  -2.10554734e-01  3.80384147e-01  3.73877615e-01  2.38095805e-01
   2.82703876e-01  2.88644552e-01]
 [ 1.81708232e-01 -1.57693431e-01 -2.17590198e-01  8.76898244e-02
   1.84784934e-01 -1.99227288e-01  1.46499470e-01 -1.67989641e-01
   1.64520845e-01 -2.36917213e-01

In [81]:
# Embedding vector of 'well'

well_emb_vector = embedding_matrix.T.dot(well_one_hot)
print(well_emb_vector.T)

[[ 0.39250547 -0.27049634  0.21642335 -0.27219194 -0.21055473  0.38038415
   0.37387761  0.23809581  0.28270388  0.28864455]]
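
Multiplying the transposed embedding matrix by a one-hot vector simply selects one row of the matrix, so the same vector can be read off by indexing. The sketch below checks this with a random stand-in matrix of the same shape (30 x 10); `toy_matrix` is not the trained `embedding_matrix` from above.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_matrix = rng.normal(size=(30, 10))  # stand-in for a trained embedding matrix

index = 4  # the integer assigned to 'well' in this notebook
one_hot_vec = np.zeros(30)
one_hot_vec[index] = 1.0

via_product = toy_matrix.T.dot(one_hot_vec)  # matrix product with the one-hot vector
via_lookup = toy_matrix[index]               # direct row lookup

print(np.allclose(via_product, via_lookup))
# True
```

The row lookup avoids building the one-hot vector at all, which is why integer indices are enough as input to the embedding layer.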


Reference:  https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/