## Emoji Predictor

### Step-1. Get The Emoji Package

In [1]:
# !pip install emoji

In [2]:
import emoji
import pandas as pd
import numpy as np
from keras.utils import to_categorical
from keras.layers import *
from keras.models import Sequential
from keras.callbacks import EarlyStopping, ModelCheckpoint
pd.set_option('mode.chained_assignment', None)

Using TensorFlow backend.


In [3]:
# print(emoji.EMOJI_UNICODE)
emoji_dictionary = {"0": "\u2764\uFE0F",
                    "1": ":baseball:",
                    "2": ":grinning_face_with_big_eyes:",
                    "3": ":disappointed_face:",
                    "4": ":fork_and_knife:"
                   }
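The dictionary above is keyed by strings, so every lookup later needs a `str(...)` conversion of the numeric label. As a minimal sketch of an alternative, the mapping below uses integer keys and literal Unicode code points. The names `LABEL_TO_EMOJI` and `label_to_emoji` are hypothetical and not part of the original notebook.

```python
# Hypothetical alternative to emoji_dictionary: integer keys and literal
# Unicode code points, so no shortcode lookup is needed at print time.
LABEL_TO_EMOJI = {
    0: "\u2764\ufe0f",   # red heart
    1: "\u26be",         # baseball
    2: "\U0001F603",     # grinning face with big eyes
    3: "\U0001F61E",     # disappointed face
    4: "\U0001F374",     # fork and knife
}


def label_to_emoji(label):
    """Return the emoji for a class label given as an int or a numeric string."""
    return LABEL_TO_EMOJI[int(label)]
```

With this helper, `label_to_emoji(Y_train[i])` would stand in for `emoji.emojize(emoji_dictionary[str(Y_train[i])])`, assuming the labels are the integers 0 to 4.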

In [4]:
for e in emoji_dictionary.values():
    print(emoji.emojize(e))

❤️
⚾
😃
😞
🍴


### Step-2. Processing A Custom Dataset

In [5]:
train = pd.read_csv("Dataset/train_emoji.csv", header=None)
test = pd.read_csv("Dataset/test_emoji.csv", header=None)

In [6]:
train.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | never talk to me again | 3 |  |  |
| 1 | I am proud of your achievements | 2 |  |  |
| 2 | It is the worst day in my life | 3 |  |  |
| 3 | Miss you so much | 0 |  | [0] |
| 4 | food is life | 4 |  |  |


In [7]:
data = train.values
print(data.shape)

(132, 4)


In [8]:
X_train = train[0]
Y_train = train[1]

X_test = test[0]
Y_test = test[1]

#### Printing The Sentences With Emojis

In [9]:
for i in range(10):
    print(X_train[i], emoji.emojize(emoji_dictionary[str(Y_train[i])]))

never talk to me again 😞
I am proud of your achievements 😃
It is the worst day in my life 😞
Miss you so much ❤️
food is life 🍴
I love you mum ❤️
Stop saying bullshit 😞
congratulations on your acceptance 😃
The assignment is too long  😞
I want to go play ⚾


### Step-3. Converting Sentences Into Embeddings Using GloVe Vectors

In [10]:
f = open("glove.6B.50d.txt", encoding='utf-8')

In [11]:
embeddings_index = {}

for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float')
    embeddings_index[word] = coefs
    
f.close()

In [12]:
print(embeddings_index["eat"])
print("Shape : ", embeddings_index["eat"].shape)

[ 6.4295e-01 -4.2946e-01 -5.4277e-01 -1.0307e+00  1.2056e+00 -2.7174e-01
 -6.3561e-01 -1.5065e-02  3.7856e-01  4.6474e-02 -1.3102e-01  6.0500e-01
  1.6391e+00  2.3940e-01  1.2128e+00  8.3178e-01  7.3893e-01  1.5200e-01
 -1.4175e-01 -8.8384e-01  2.0829e-02 -3.2545e-01  1.8035e+00  1.0045e+00
  5.8484e-01 -6.2031e-01 -4.3296e-01  2.3562e-01  1.3027e+00 -8.1264e-01
  2.3158e+00  1.1030e+00 -6.0608e-01  1.0101e+00 -2.2426e-01  1.8908e-02
 -1.0931e-01  3.8350e-01  7.7362e-01 -8.1927e-02 -3.4040e-01 -1.5143e-03
 -5.6640e-02  8.7359e-01  1.4805e+00  6.9421e-01 -3.0966e-01 -9.0826e-01
  3.7277e-03  8.4550e-01]
Shape :  (50,)
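Each line of a GloVe text file is a word followed by its space-separated coefficients. The sketch below parses that format from any iterable of lines, so it can be tried on a tiny in-memory sample. The function name `load_glove` and the 3-dimensional sample are assumptions for illustration, and plain Python lists stand in for the NumPy arrays used in the notebook.

```python
import io


def load_glove(lines):
    """Parse GloVe-format text lines into a {word: [float, ...]} dict."""
    index = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        index[parts[0]] = [float(x) for x in parts[1:]]
    return index


# Made-up 3-dimensional sample standing in for glove.6B.50d.txt
sample = io.StringIO("eat 0.5 -0.25 1.0\nfood 0.75 0.0 -0.5\n")
toy_index = load_glove(sample)
print(toy_index["eat"])        # [0.5, -0.25, 1.0]
print(len(toy_index["food"]))  # 3
```

For the real file, the same function could be called inside `with open("glove.6B.50d.txt", encoding="utf-8") as f:`, which also closes the file if parsing fails partway through.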


### Step-4. Converting Sentences Into Vectors (Embedding Layer Output)

In [13]:
def embedding_output(X):
    maxLen = 10
    emb_dim = 50
    embedding_out = np.zeros((X.shape[0], maxLen, emb_dim))
    
    for ix in range(X.shape[0]):
        X[ix] = X[ix].split()
        for ij in range(len(X[ix])):
            # Go to every word in the current (ix) sentence
            try:
                embedding_out[ix][ij] = embeddings_index[X[ix][ij].lower()]
            except KeyError:
                # Word not in the GloVe vocabulary: keep a zero vector
                embedding_out[ix][ij] = np.zeros((emb_dim,))
    
    return embedding_out

In [14]:
embedding_matrix_train = embedding_output(X_train)
embedding_matrix_test = embedding_output(X_test)

In [15]:
print(X_train[0])
print(len(X_train[0]))

['never', 'talk', 'to', 'me', 'again']
5


In [16]:
print(embedding_matrix_train.shape)
print(embedding_matrix_test.shape)

(132, 10, 50)
(56, 10, 50)
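The shapes above come from padding every sentence to 10 word slots of 50 values each, with zeros for missing positions and for words that have no GloVe vector. The sketch below shows the same idea for a single sentence without modifying the input. The name `sentence_to_matrix` is hypothetical, the toy index is made up, and truncating sentences longer than `max_len` is an added assumption rather than something the notebook does.

```python
def sentence_to_matrix(sentence, index, max_len, emb_dim):
    """Return max_len rows of emb_dim floats; unknown words and padding are zeros."""
    rows = [[0.0] * emb_dim for _ in range(max_len)]
    words = sentence.lower().split()[:max_len]  # assumption: truncate long sentences
    for pos, word in enumerate(words):
        vec = index.get(word)
        if vec is not None:
            rows[pos] = list(vec)
    return rows


# Made-up 2-dimensional embeddings for illustration only
toy_index = {"food": [1.0, 2.0], "is": [0.5, 0.5]}
m = sentence_to_matrix("Food is life", toy_index, max_len=4, emb_dim=2)
print(m)  # [[1.0, 2.0], [0.5, 0.5], [0.0, 0.0], [0.0, 0.0]]
```

Here "life" is absent from the toy index and the fourth slot is padding, so both rows stay at zero.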


In [17]:
Y_train = to_categorical(Y_train, num_classes=5)
Y_test = to_categorical(Y_test, num_classes=5)
print(Y_train.shape, Y_test.shape)
print(Y_train[0])

(132, 5) (56, 5)
[0. 0. 0. 1. 0.]
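One-hot encoding turns a class label into a vector with a single 1 at the label's index, which is why label 3 appears as `[0. 0. 0. 1. 0.]` above. A minimal pure-Python sketch of the same transformation follows; `one_hot` is a hypothetical helper, not the Keras implementation.

```python
def one_hot(label, num_classes):
    """Return a list of num_classes floats with 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[int(label)] = 1.0
    return vec


print(one_hot(3, 5))  # [0.0, 0.0, 0.0, 1.0, 0.0]
```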


### Step-5. Define The RNN/LSTM Model

In [18]:
model = Sequential()
model.add(LSTM(64, input_shape=(10,50))) # 10 Words In Each Sentence With Each Word Being A 50 Dimension Vector
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 64)                29440     
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 325       
_________________________________________________________________
activation_1 (Activation)    (None, 5)                 0         
Total params: 29,765
Trainable params: 29,765
Non-trainable params: 0
_________________________________________________________________
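The parameter counts in the summary can be checked by hand. A standard LSTM layer has four gate blocks, each with input weights, recurrent weights, and a bias, while a dense layer has one weight per input-output pair plus one bias per output. The helper names below are hypothetical; the arithmetic is a sketch of that counting rule.

```python
def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights, and biases."""
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output unit."""
    return input_dim * units + units


lstm = lstm_params(50, 64)   # 50-dimensional word vectors, 64 LSTM units
dense = dense_params(64, 5)  # 64 LSTM outputs, 5 emoji classes
print(lstm, dense, lstm + dense)  # 29440 325 29765
```

The totals match the `lstm_1`, `dense_1`, and `Total params` rows of the summary.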


In [19]:
earlystop = EarlyStopping(monitor="val_accuracy", patience=10)
checkpoint = ModelCheckpoint("Best_LSTM_Model.h5", monitor='val_loss', verbose=True, save_best_only=True)
hist = model.fit(embedding_matrix_train, Y_train, epochs=100, batch_size=64, shuffle=True, validation_split=0.2, callbacks=[earlystop, checkpoint])

Train on 105 samples, validate on 27 samples
Epoch 1/100

Epoch 00001: val_loss improved from inf to 1.57644, saving model to Best_LSTM_Model.h5
Epoch 2/100

Epoch 00002: val_loss did not improve from 1.57644
Epoch 3/100

Epoch 00003: val_loss did not improve from 1.57644
Epoch 4/100

Epoch 00004: val_loss did not improve from 1.57644
Epoch 5/100

Epoch 00005: val_loss did not improve from 1.57644
Epoch 6/100

Epoch 00006: val_loss did not improve from 1.57644
Epoch 7/100

Epoch 00007: val_loss did not improve from 1.57644
Epoch 8/100

Epoch 00008: val_loss did not improve from 1.57644
Epoch 9/100

Epoch 00009: val_loss did not improve from 1.57644
Epoch 10/100

Epoch 00010: val_loss did not improve from 1.57644
Epoch 11/100

Epoch 00011: val_loss did not improve from 1.57644
Epoch 12/100

Epoch 00012: val_loss improved from 1.57644 to 1.56205, saving model to Best_LSTM_Model.h5
Epoch 13/100

Epoch 00013: val_loss improved from 1.56205 to 1.53403, saving model to Best_LSTM_Model.h5
Epo
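Early stopping with a patience value halts training once the monitored metric has gone that many consecutive epochs without improving. The function below is a simplified sketch of that rule for a metric where higher is better; it is not Keras's actual implementation, and the validation accuracies are made up.

```python
def early_stop_epoch(values, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(values, start=1):
        if value > best:
            best = value
            wait = 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation accuracies for illustration only
val_acc = [0.30, 0.41, 0.41, 0.40, 0.39, 0.45, 0.44]
print(early_stop_epoch(val_acc, patience=3))   # 5
print(early_stop_epoch(val_acc, patience=10))  # None
```

Note that in the cell above the early-stopping callback watches `val_accuracy` while the checkpoint watches `val_loss`, so the saved weights are those of the lowest validation loss seen before training stopped.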

In [20]:
# Loading The Best Model
model.load_weights("Best_LSTM_Model.h5")

In [21]:
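# predict_classes is unavailable in newer Keras releases; argmax over the softmax output gives the same labels
pred = np.argmax(model.predict(embedding_matrix_test), axis=-1)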
print(pred)

[4 3 2 2 2 2 3 2 4 2 1 2 2 3 1 3 3 2 3 4 0 0 4 2 3 3 1 0 1 2 2 1 2 3 0 2 2
 0 4 0 1 0 2 1 2 0 3 2 3 1 1 0 3 2 2 3]
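Each predicted label is the index of the largest softmax probability for that sentence. The sketch below shows this step in plain Python on made-up probabilities; `argmax_rows` is a hypothetical helper.

```python
def argmax_rows(probabilities):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Made-up softmax outputs for two sentences over the 5 emoji classes
probs = [
    [0.10, 0.10, 0.10, 0.10, 0.60],
    [0.05, 0.05, 0.10, 0.70, 0.10],
]
print(argmax_rows(probs))  # [4, 3]
```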


In [22]:
print("Testing Accuracy : ", round(100 * model.evaluate(embedding_matrix_test, Y_test)[1], 2), "%")

Testing Accuracy :  71.43 %
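Accuracy is the share of test sentences whose predicted label equals the true label. With 56 test sentences, 71.43 % corresponds to 40 correct predictions. The sketch below computes the percentage from two label lists; `accuracy_percent` is a hypothetical helper and the short label lists are made up.

```python
def accuracy_percent(predicted, actual):
    """Return the percentage of positions where predicted equals actual."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return round(100 * correct / len(actual), 2)


# Made-up labels: four of five predictions match
print(accuracy_percent([4, 3, 2, 2, 1], [4, 3, 2, 3, 1]))  # 80.0

# 40 correct out of 56 reproduces the reported figure
print(round(100 * 40 / 56, 2))  # 71.43
```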


In [23]:
for i in range(30):
    print("\n", " ".join(X_test[i]))
    print("Actual : ",emoji.emojize(emoji_dictionary[str(np.argmax(Y_test[i]))]))
    print("Prediction : ",emoji.emojize(emoji_dictionary[str(pred[i])]))    


 I want to eat
Actual :  🍴
Prediction :  🍴

 he did not answer
Actual :  😞
Prediction :  😞

 he got a very nice raise
Actual :  😃
Prediction :  😃

 she got me a nice present
Actual :  😃
Prediction :  😃

 ha ha ha it was so funny
Actual :  😃
Prediction :  😃

 he is a good friend
Actual :  😃
Prediction :  😃

 I am upset
Actual :  😞
Prediction :  😞

 We had such a lovely dinner tonight
Actual :  😃
Prediction :  😃

 where is the food
Actual :  🍴
Prediction :  🍴

 Stop making this joke ha ha ha
Actual :  😃
Prediction :  😃

 where is the ball
Actual :  ⚾
Prediction :  ⚾

 work is hard
Actual :  😞
Prediction :  😃

 This girl is messing with me
Actual :  😞
Prediction :  😃

 are you serious
Actual :  😞
Prediction :  😞

 Let us go play baseball
Actual :  ⚾
Prediction :  ⚾

 This stupid grader is not working
Actual :  😞
Prediction :  😞

 work is horrible
Actual :  😞
Prediction :  😞

 Congratulation 