# Word Embedding using Keras Embedding Layer

Link to the YouTube tutorial video: https://www.youtube.com/watch?v=Fuw0wv3X-0o&list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO&index=41

1) **Important things to note for this tutorial:**
    1) We use a supervised learning method (here, a neural network) to classify food reviews by sentiment (is a food review sentence positive or negative?). The word embeddings are obtained as a by-product of training this classifier.
    2) The main goal of this food review classification task is to obtain the word embeddings (from the embedding layer of the neural network), not to build a good neural network for food review classification.
    3) The word embeddings are the parameters/weights of the embedding layer, learned while the neural network is trained on the sentiment classification task.


2) **Key concepts for obtaining word embeddings from a sentiment classification task using a supervised learning approach:**
    1) <img src="hidden\photo2.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
        1) Similar words (EG: Cummins & Dhoni are names of people; Australia & Zimbabwe are names of countries) will have similar word feature vectors (the values of each feature are close to each other, or even the same).
    2) <img src="hidden\photo3.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    3) <img src="hidden\photo4.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    4) <img src="hidden\photo5.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    5) <img src="hidden\photo6.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    6) <img src="hidden\photo7.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    7) <img src="hidden\photo8.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    8) <img src="hidden\photo9.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />

In [17]:
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding

# Load the dataset

In [18]:
# The reviews variable stores the features (the review sentences) of the food review dataset (the dataset has only 10 food reviews in total)
reviews = ['nice food',
           'amazing restaurant',
           'too good',
           'just loved it!',
           'will go again',
           'horrible food',
           'never go there',
           'poor service',
           'poor quality',
           'needs improvement']

# The label (ground truth) of each sample in the food review dataset: 1 means the review is good (positive), 0 means it is bad (negative)
sentiment = np.array([1,1,1,1,1,0,0,0,0,0])

# Data Preprocessing

## Convert each word(vocabulary) into one-hot-encoding representation

Vocabulary size refers to the total number of unique words (vocabulary) in a dataset (EG: all the food review sentences in the food review dataset).
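As a rough check, the number of unique words in the 10 reviews can be counted directly. The sketch below uses a simplified tokenizer (lowercase, strip punctuation, split on spaces); the helper name `tokenize` is an assumption for illustration and is not part of Keras.

```python
import string

# The 10 food reviews used in this tutorial
reviews = ['nice food', 'amazing restaurant', 'too good', 'just loved it!',
           'will go again', 'horrible food', 'never go there', 'poor service',
           'poor quality', 'needs improvement']

def tokenize(text):
    """Lowercase, strip punctuation, and split on whitespace (simplified, hypothetical helper)."""
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    return cleaned.split()

# Collect every distinct word across all reviews
vocabulary = sorted({word for review in reviews for word in tokenize(review)})

print(len(vocabulary))  # 20
print(vocabulary)
```

The dataset has 20 unique words, so any vocabulary size used for encoding should be noticeably larger than 20 to leave room for hashing collisions.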

In [19]:
# Set the vocabulary size (50 here). It should be comfortably larger than the number of unique words in the dataset, because one_hot() hashes words to integers and two different words can receive the same number. If you find words that share a number, increase the vocabulary size to reduce such collisions.
vocab_size = 50

# Encode each food review as a list of integers using one_hot(text, n). Despite its name, one_hot() does not return one-hot vectors: it hashes each word to an integer in the range [1, n), so a given word always maps to the same integer within a run, but uniqueness is not guaranteed. The Embedding layer later uses each integer as a row index into the embedding matrix, which is equivalent to multiplying the word's one-hot vector (EG: 0, 0, 1, 0, ...) by that matrix.
# Note: n is 30 here rather than vocab_size. This still works because every index is below vocab_size, but only rows 1 to 29 of the embedding matrix can ever be used.
encoded_reviews = [one_hot(d, 30) for d in reviews]
print(encoded_reviews)

[[17, 22], [27, 21], [28, 16], [13, 3, 4], [25, 7, 24], [12, 22], [11, 7, 13], [10, 29], [10, 20], [11, 8]]
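The printed indices can be checked for words that share the same number. The sketch below copies the indices from the output above (they may differ on another run, since the hashing is not guaranteed to be stable across sessions) and pairs them with the words of each review; the variable names are assumptions for illustration.

```python
from collections import defaultdict

# Words of each review, in order (punctuation removed)
tokenized_reviews = [['nice', 'food'], ['amazing', 'restaurant'], ['too', 'good'],
                     ['just', 'loved', 'it'], ['will', 'go', 'again'], ['horrible', 'food'],
                     ['never', 'go', 'there'], ['poor', 'service'], ['poor', 'quality'],
                     ['needs', 'improvement']]

# Indices copied from the printed output of one_hot()
encoded_reviews = [[17, 22], [27, 21], [28, 16], [13, 3, 4], [25, 7, 24],
                   [12, 22], [11, 7, 13], [10, 29], [10, 20], [11, 8]]

# Group the words by the index they were assigned
words_by_index = defaultdict(set)
for words, indices in zip(tokenized_reviews, encoded_reviews):
    for word, index in zip(words, indices):
        words_by_index[index].add(word)

# Keep only the indices shared by more than one distinct word
collisions = {index: sorted(words)
              for index, words in sorted(words_by_index.items())
              if len(words) > 1}
print(collisions)  # {11: ['needs', 'never'], 13: ['just', 'there']}
```

In this run, "never" and "needs" share index 11 and "just" and "there" share index 13, so each pair will end up sharing one row of the embedding matrix.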


## Pad each food review sentence so that all sentences have the same number of words (so that the input layer of the neural network can accept every food review sentence)

1) The image below shows a food review sentence that consists of 3 words  <br />
    <img src="hidden\photo1.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />

In [20]:
# max_length is the number of words every sentence will have after padding (the longest food review here has 3 words)
max_length = 3

# Pad each encoded food review with zeros so that all reviews have the same length (3 here)
# pad_sequences(sequences, maxlen=..., padding='post'): padding='post' appends zeros at the end of any review that is shorter than maxlen
padded_reviews = pad_sequences(encoded_reviews, maxlen=max_length, padding='post')
print(padded_reviews)

[[17 22  0]
 [27 21  0]
 [28 16  0]
 [13  3  4]
 [25  7 24]
 [12 22  0]
 [11  7 13]
 [10 29  0]
 [10 20  0]
 [11  8  0]]
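To make the padding step concrete, here is a minimal pure-Python sketch of what post-padding does. The function name `pad_post` is an assumption for illustration; it is not the Keras implementation, and it returns plain lists rather than a NumPy array.

```python
def pad_post(sequences, maxlen, value=0):
    """Return copies of the sequences, each exactly maxlen items long."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # Too long: keep the last maxlen items (mirrors pad_sequences' default truncating='pre')
            seq = seq[len(seq) - maxlen:]
        # Too short: append the padding value at the end
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

# Indices copied from the encoded reviews printed earlier
encoded_reviews = [[17, 22], [27, 21], [28, 16], [13, 3, 4], [25, 7, 24],
                   [12, 22], [11, 7, 13], [10, 29], [10, 20], [11, 8]]

padded_reviews = pad_post(encoded_reviews, maxlen=3)
for row in padded_reviews:
    print(row)
```

Index 0 is safe to use as the padding value here because one_hot() never assigns 0 to a word.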


## Split the food review dataset into features and ground-truth labels

In [21]:
# The features variable contains the features (padded reviews) of the food review dataset
X = padded_reviews

# The ground truths variable contains the ground-truth labels of the food review dataset
Y = sentiment

# Develop the neural network (model) to perform food review classification

1) The neural network involved in this tutorial consists of 4 layers:
    1) Layer 1: Input layer
    2) Layer 2: Embedding layer
    3) Layer 3: Flatten layer
    4) Layer 4: Output layer
2) <img src="hidden\embedding_layer.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    1) A word embedding vector, also known as a word feature vector, is the feature vector of a word. It is the result of multiplying the word's one-hot-encoding representation by the embedding matrix, which simply selects the row of the embedding matrix that corresponds to that word. During a forward pass, that row holds the parameters/weights obtained at the previous training iteration (or the initial values at the very beginning), not yet the updated ones.
    2) At the embedding layer (Emb.L), you can access all the word embedding vectors.
3) <img src="hidden\flatten_layer.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    1) Once you get the word embedding vectors from the embedding layer, the flatten layer (Flat.L), the 3rd layer of the model, flattens them into a single 1D vector.
4) <img src="hidden\output_layer.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
    1) The layer after the flatten layer (the 4th layer of the model) is a dense layer with a single neuron and a sigmoid activation function.
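The three steps described above (embedding lookup, flatten, one sigmoid neuron) can be sketched in plain NumPy. This is only an illustration under stated assumptions: the weights are random stand-ins rather than trained values, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedded_vector_size, max_length = 50, 4, 3

# Randomly initialised parameters standing in for trained weights (hypothetical values)
embedding_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, embedded_vector_size))
dense_weights = rng.normal(size=(max_length * embedded_vector_size, 1))
dense_bias = np.zeros(1)

review = np.array([13, 3, 4])  # one padded review, given as word indices

# Embedding layer: one-hot vectors multiplied by the embedding matrix ...
one_hot_rows = np.eye(vocab_size)[review]               # shape (3, 50)
embedded_by_product = one_hot_rows @ embedding_matrix   # shape (3, 4)
# ... which gives the same result as looking up the rows directly
embedded_by_lookup = embedding_matrix[review]           # shape (3, 4)
same_result = np.allclose(embedded_by_product, embedded_by_lookup)

# Flatten layer: (3, 4) -> (12,)
flattened = embedded_by_lookup.reshape(-1)

# Output layer: one neuron followed by a sigmoid activation
logit = flattened @ dense_weights + dense_bias
probability = 1.0 / (1.0 + np.exp(-logit))

print(same_result, flattened.shape, probability.shape)  # True (12,) (1,)
```

Because the product with a one-hot vector only selects one row, an embedding layer can be implemented as a table lookup instead of an actual matrix multiplication.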


In [22]:
# The feature vector of each word has a size of 4 (each word is embedded as 4 features)
embedded_vector_size = 4

model = Sequential() # Create a neural network (model)
# Embedding layer: the 2nd layer of the model. The 1st layer (the input layer) is declared through input_shape=(max_length,), so each input sample is a 1D array of max_length word indices. name='embedding' names this layer so that it can be retrieved later.
model.add(Embedding(vocab_size, embedded_vector_size, input_shape=(max_length,), name='embedding'))
model.add(Flatten()) # The 3rd layer of the model is the flatten layer
model.add(Dense(1, activation='sigmoid')) # The 4th layer of the model is the output layer: 1 output neuron with a sigmoid activation function

# Compile the model
# Adam is a common default optimizer. Binary cross-entropy is used because the output is either 1 (the food review sentence is positive) or 0 (the food review sentence is negative).
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Show the summary of the model
model.summary()
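The parameter counts reported by the summary can be worked out by hand from the layer sizes used above. A short arithmetic sketch (variable names other than those in the notebook are for illustration only):

```python
vocab_size, embedded_vector_size, max_length = 50, 4, 3

# Embedding layer: one row of 4 weights per vocabulary index
embedding_params = vocab_size * embedded_vector_size

# Flatten layer: 3 words x 4 features = 12 values, no trainable weights
flatten_units = max_length * embedded_vector_size

# Output layer: 12 weights plus 1 bias for the single neuron
dense_params = flatten_units * 1 + 1

print(embedding_params, dense_params, embedding_params + dense_params)  # 200 13 213
```

Almost all of the trainable parameters sit in the embedding matrix, which is the part of the model this tutorial is interested in.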

In [23]:
# Train the model
model.fit(X, Y, epochs = 50, verbose = 0)

# Evaluate the model (on the same 10 reviews used for training, since this tutorial has no separate test set)
loss, accuracy = model.evaluate(X, Y)

print('The loss of the model: ', loss)
print('The accuracy of the model: ', accuracy)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 165ms/step - accuracy: 1.0000 - loss: 0.6006
The loss of the model:  0.6005924940109253
The accuracy of the model:  1.0
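An accuracy of 1.0 together with a loss of about 0.6 may look odd at first. One plausible explanation is that every prediction is on the correct side of 0.5 but only barely. The sketch below uses hypothetical probabilities (0.55 for positive reviews, 0.45 for negative ones), not the model's real outputs, to show that this combination is possible.

```python
import math

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities: all correct, but close to 0.5
predictions = [0.55] * 5 + [0.45] * 5

def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy over all samples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

loss = binary_cross_entropy(labels, predictions)
# A prediction counts as correct when it falls on the same side of 0.5 as the label
accuracy = sum((p > 0.5) == bool(y) for y, p in zip(labels, predictions)) / len(labels)

print(round(loss, 4), accuracy)  # 0.5978 1.0
```

Accuracy only checks which side of 0.5 a prediction falls on, while the loss also measures how confident the prediction is.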


# Access word embedding data

<img src="hidden\get-layer.png" alt="This image is a representation of the simple neural network" style="width: 450px;"/>  <br />
1) model.get_layer('embedding').get_weights()[0] returns the embedding matrix.
2) In the embedding matrix, each row is the word feature vector of one vocabulary index.
3) Each word feature vector consists of 4 elements/weights/values, one for each of the 4 features.
4) There are 4 elements in each word feature vector (EG: W1, W2, W3) because we set embedded_vector_size = 4.
5) Extra information: while solving the natural language processing (NLP) task, the Keras embedding layer first looks up the embeddings, and they are then flattened for classification.
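Once the rows are available, one common way to compare two word feature vectors is cosine similarity. The sketch below copies three rows from the embedding matrix printed by the cell below (rows 3, 10 and 12, which the encoded reviews map to "loved", "poor" and "horrible"); the function name `cosine_similarity` is an assumption for illustration.

```python
import math

# Rows copied from the printed embedding matrix (values from one training run)
loved    = [-0.04760553, -0.04837851, -0.05951424, -0.05904164]  # row 3
poor     = [-0.06543125,  0.05360292,  0.07699166, -0.09298114]  # row 10
horrible = [-0.09894915,  0.03311571,  0.03486707, -0.07115208]  # row 12

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

poor_vs_horrible = cosine_similarity(poor, horrible)
poor_vs_loved = cosine_similarity(poor, loved)
print(poor_vs_horrible, poor_vs_loved)
```

In this run, the two words from negative reviews are far more similar to each other than "poor" is to "loved". With only 10 training reviews this is an illustration, not evidence of high-quality embeddings.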

In [24]:
# model.get_layer('name_of_the_layer') retrieves the specified layer from the neural network. get_weights() returns that layer's parameters/weights as a list of NumPy arrays; for the embedding layer the list holds a single 2D array, so get_weights()[0] is the embedding matrix of shape (vocab_size, embedded_vector_size).
weights = model.get_layer('embedding').get_weights()[0]

print('The weights variable has size of ' + str(len(weights)) + ' rows (length), because the vocabulary size is set as ' + str(vocab_size) + '. In other words, this weight variable stores the embedding matrix.')
print('\nThe word embedding data:\n', weights)

print('\nThe unique number assigned to the vocabulary of "nice" is ' + str(encoded_reviews[0][0]) + ' , the word feature/embedding vector of the vocabulary of "nice":\n' + str(weights[encoded_reviews[0][0]]))
print('\nThe unique number assigned to the vocabulary of "improvement" is ' + str(encoded_reviews[9][1]) + ' , the word feature/embedding vector of the vocabulary of "improvement":\n' + str(weights[encoded_reviews[9][1]]))

# Insights:
# "nice" and "improvement" are not similar words (EG: "nice" is an adjective; "improvement" is a noun) and they appear in reviews with opposite labels, so the values at the same positions of their word feature vectors are expected to differ rather than be close to each other.

The weights variable has size of 50 rows (length), because the vocabulary size is set as 50. In other words, this weight variable stores the embedding matrix.

The word embedding data:
 [[-0.00893866 -0.04105903 -0.00990777  0.0472002 ]
 [ 0.01166898 -0.00807108 -0.02471987 -0.02783774]
 [-0.04324081 -0.01338353  0.00510011 -0.03974064]
 [-0.04760553 -0.04837851 -0.05951424 -0.05904164]
 [ 0.03671586  0.08047783 -0.00788675 -0.0542903 ]
 [ 0.03022002 -0.04383122 -0.00568693  0.00713822]
 [-0.01144054 -0.01705159 -0.02937682 -0.04169852]
 [-0.02314463  0.00547925  0.04399073  0.03991968]
 [ 0.04490092  0.04464502  0.02929769  0.05410272]
 [-0.01785148  0.00149143  0.01942505 -0.04957661]
 [-0.06543125  0.05360292  0.07699166 -0.09298114]
 [-0.09568721  0.06171154  0.10459383 -0.07355255]
 [-0.09894915  0.03311571  0.03486707 -0.07115208]
 [ 0.06456498 -0.0963833   0.03941916  0.07568935]
 [-0.04084482 -0.04014798 -0.02574313 -0.00581142]
 [-0.046621    0.03380263 -0.01428796 -0.03250074