## Text Sequence Prediction

### A dataset consisting of multiple text sentences

In [1]:
import warnings
warnings.filterwarnings('ignore')
import re

In [2]:
text_data = """
Long Short-Term Memory Networks is a deep learning, sequential neural network that allows information to persist. It is a special type of Recurrent Neural Network which is capable of handling the vanishing gradient problem faced by RNN. LSTM was designed by Hochreiter and Schmidhuber that resolves the problem caused by traditional rnns and machine learning algorithms. LSTM can be implemented in Python using the Keras library.

Let’s say while watching a video, you remember the previous scene, or while reading a book, you know what happened in the earlier chapter. RNNs work similarly; they remember the previous information and use it for processing the current input. The shortcoming of RNN is they cannot remember long-term dependencies due to vanishing gradient. LSTMs are explicitly designed to avoid long-term dependency problems.

This article will cover all the basics about LSTM, including its meaning, architecture, applications, and gates.

Learning Objectives
Understand what LSTM is.
Understand the architecture and working of an LSTM network.
Learn about the different parts/gates in an LSTM unit.
Note: If you are more interested in learning concepts in an Audio-Visual format, We have the tutorial of this entire article explained in the video below. If not, you may continue reading.

Table of contents
What is LSTM?
LSTM Architecture
Forget Gate
Input Gate
New Information
Output Gate
LTSM vs RNN
What are Bidirectional LSTMs?
Conclusion
Frequently Asked Questions
What is LSTM?
LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture widely used in Deep Learning. It excels at capturing long-term dependencies, making it ideal for sequence prediction tasks.

Unlike traditional neural networks, LSTM incorporates feedback connections, allowing it to process entire sequences of data, not just individual data points. This makes it highly effective in understanding and predicting patterns in sequential data like time series, text, and speech.
In the introduction to long short-term memory, we learned that it resolves the vanishing gradient problem faced by RNN, so now, in this section, we will see how it resolves this problem by learning the architecture of the LSTM. At a high level, LSTM works very much like an RNN cell. Here is the internal functioning of the LSTM network. The LSTM network architecture consists of three parts, as shown in the image below, and each part performs an individual function.

The first part chooses whether the information coming from the previous timestamp is to be remembered or is irrelevant and can be forgotten. In the second part, the cell tries to learn new information from the input to this cell. At last, in the third part, the cell passes the updated information from the current timestamp to the next timestamp. This one cycle of LSTM is considered a single-time step.

These three parts of an LSTM unit are known as gates. They control the flow of information in and out of the memory cell or lstm cell. The first gate is called Forget gate, the second gate is known as the Input gate, and the last one is the Output gate. An LSTM unit that consists of these three gates and a memory cell or lstm cell can be considered as a layer of neurons in traditional feedforward neural network, with each neuron having a hidden layer and a current state.

Just like a simple RNN, an LSTM also has a hidden state where H(t-1) represents the hidden state of the previous timestamp and Ht is the hidden state of the current timestamp. In addition to that, LSTM also has a cell state represented by C(t-1) and C(t) for the previous and current timestamps, respectively.

Bidirectional LSTMs (Long Short-Term Memory) are a type of recurrent neural network (RNN) architecture that processes input data in both forward and backward directions. In a traditional LSTM, the information flows only from past to future, making predictions based on the preceding context. However, in bidirectional LSTMs, the network also considers future context, enabling it to capture dependencies in both directions.

The bidirectional LSTM comprises two LSTM layers, one processing the input sequence in the forward direction and the other in the backward direction. This allows the network to access information from past and future time steps simultaneously. As a result, bidirectional LSTMs are particularly useful for tasks that require a comprehensive understanding of the input sequence, such as natural language processing tasks like sentiment analysis, machine translation, and named entity recognition.

By incorporating information from both directions, bidirectional LSTMs enhance the model’s ability to capture long-term dependencies and make more accurate predictions in complex sequential data.
"""

In [3]:
print(f"Total characters including blank space : {len(text_data)}")
print(f"Total words : {len(text_data.split())}")

Total characters including blank space : 4773
Total words : 753


In [4]:
sentences = re.split(r'[.\n]', text_data)
#sentences

In [5]:
# An empty or whitespace-only piece strips to '' (falsy), so this prints [0]; such pieces are excluded below
print([1 if sentences[5].strip() else 0])

sentences = [sentence.strip() for sentence in sentences if sentence.strip()]

[0]
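
The split-and-filter step above can be sketched on a small input. The `sample` string below is a made-up stand-in, not the notebook's `text_data`; it only illustrates why empty pieces appear wherever a period and a newline sit next to each other, and why the `strip()` filter is needed.

```python
import re

# Hypothetical sample text, not the notebook's text_data.
sample = "LSTMs use gates. They keep a cell state.\n\nGates control information flow."

# Split on every period or newline, as in the cell above.
raw_parts = re.split(r'[.\n]', sample)

# Adjacent delimiters (".\n\n") and the trailing period produce empty pieces.
cleaned = [part.strip() for part in raw_parts if part.strip()]

print(raw_parts)
print(cleaned)
```

Note that splitting on every period would also break abbreviations or decimal numbers apart; the text used here does not appear to contain any, so the simple pattern is enough.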


In [6]:
sentences[:40]

['Long Short-Term Memory Networks is a deep learning, sequential neural network that allows information to persist',
 'It is a special type of Recurrent Neural Network which is capable of handling the vanishing gradient problem faced by RNN',
 'LSTM was designed by Hochreiter and Schmidhuber that resolves the problem caused by traditional rnns and machine learning algorithms',
 'LSTM can be implemented in Python using the Keras library',
 'Let’s say while watching a video, you remember the previous scene, or while reading a book, you know what happened in the earlier chapter',
 'RNNs work similarly; they remember the previous information and use it for processing the current input',
 'The shortcoming of RNN is they cannot remember long-term dependencies due to vanishing gradient',
 'LSTMs are explicitly designed to avoid long-term dependency problems',
 'This article will cover all the basics about LSTM, including its meaning, architecture, applications, and gates',
 'Learning Objectiv

In [7]:
# Tokenization

from keras.preprocessing.text import Tokenizer



In [8]:
tokenizer = Tokenizer()

In [9]:
tokenizer.fit_on_texts(sentences)

In [10]:
print("\nTokenized Word with index")
tokenizer.word_index


Tokenized Word with index


{'the': 1,
 'lstm': 2,
 'in': 3,
 'and': 4,
 'of': 5,
 'a': 6,
 'is': 7,
 'to': 8,
 'network': 9,
 'information': 10,
 'it': 11,
 'cell': 12,
 'long': 13,
 'term': 14,
 'rnn': 15,
 'this': 16,
 'an': 17,
 'gate': 18,
 'that': 19,
 'by': 20,
 'input': 21,
 'architecture': 22,
 'memory': 23,
 'learning': 24,
 'neural': 25,
 'lstms': 26,
 'are': 27,
 'bidirectional': 28,
 'as': 29,
 'from': 30,
 'previous': 31,
 'what': 32,
 'current': 33,
 'data': 34,
 'timestamp': 35,
 'state': 36,
 'short': 37,
 'problem': 38,
 'traditional': 39,
 'be': 40,
 'you': 41,
 'or': 42,
 'for': 43,
 'dependencies': 44,
 'gates': 45,
 'like': 46,
 'part': 47,
 'hidden': 48,
 'sequential': 49,
 'recurrent': 50,
 'vanishing': 51,
 'gradient': 52,
 'resolves': 53,
 'can': 54,
 'remember': 55,
 'they': 56,
 'processing': 57,
 'parts': 58,
 'unit': 59,
 'we': 60,
 'at': 61,
 'sequence': 62,
 'tasks': 63,
 'time': 64,
 'three': 65,
 'one': 66,
 'also': 67,
 't': 68,
 'both': 69,
 'directions': 70,
 'future': 71,
 'n

In [11]:
print("\nNo. of occurrences of each word")
tokenizer.word_counts


No. of occurrences of each word


OrderedDict([('long', 8),
             ('short', 4),
             ('term', 8),
             ('memory', 6),
             ('networks', 2),
             ('is', 16),
             ('a', 18),
             ('deep', 2),
             ('learning', 6),
             ('sequential', 3),
             ('neural', 6),
             ('network', 10),
             ('that', 7),
             ('allows', 2),
             ('information', 10),
             ('to', 14),
             ('persist', 1),
             ('it', 9),
             ('special', 1),
             ('type', 2),
             ('of', 20),
             ('recurrent', 3),
             ('which', 1),
             ('capable', 1),
             ('handling', 1),
             ('the', 55),
             ('vanishing', 3),
             ('gradient', 3),
             ('problem', 4),
             ('faced', 2),
             ('by', 7),
             ('rnn', 8),
             ('lstm', 25),
             ('was', 1),
             ('designed', 2),
             ('hochreiter', 1),

In [12]:
print("Total Unique words : ",len(tokenizer.word_counts))

Total Unique words :  291
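
The outputs above suggest that the tokenizer ranks words by frequency: `'the'` (55 occurrences) gets index 1 and `'lstm'` (25 occurrences) gets index 2, while index 0 is left free for padding. A simplified pure-Python sketch of that idea follows. It is an illustration under assumptions, not the Keras implementation: `toy_sentences` is invented, and punctuation filtering is omitted.

```python
from collections import Counter

# Hypothetical toy corpus, not the notebook's sentences.
toy_sentences = ["the cell state", "the hidden state", "the gate"]

# Count lowercase, whitespace-separated words.
counts = Counter(word for s in toy_sentences for word in s.lower().split())

# Rank words by descending count, starting at 1 so that 0 stays free for padding.
# most_common() keeps first-seen order among words with equal counts.
word_index = {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

# The vocabulary size a model needs is the number of words plus the padding slot.
vocab_size = len(word_index) + 1

print(word_index)
print(vocab_size)
```

By the same arithmetic, the 291 unique words in this notebook lead to the value 292 used in the later cells.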


In [13]:

# Table of input and output
# eg : for sentence : He plays Cricket
# input       output
# He          plays
# He plays    Cricket


input_sentences = []

for sentence in sentences:
    tokenized_sequence = tokenizer.texts_to_sequences([sentence])[0]
    
    for i in range(1,len(tokenized_sequence)):
        input_sentences.append(tokenized_sequence[:i+1])

In [14]:
input_sentences

[[13, 37],
 [13, 37, 14],
 [13, 37, 14, 23],
 [13, 37, 14, 23, 72],
 [13, 37, 14, 23, 72, 7],
 [13, 37, 14, 23, 72, 7, 6],
 [13, 37, 14, 23, 72, 7, 6, 73],
 [13, 37, 14, 23, 72, 7, 6, 73, 24],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25, 9],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25, 9, 19],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25, 9, 19, 74],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25, 9, 19, 74, 10],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25, 9, 19, 74, 10, 8],
 [13, 37, 14, 23, 72, 7, 6, 73, 24, 49, 25, 9, 19, 74, 10, 8, 119],
 [11, 7],
 [11, 7, 6],
 [11, 7, 6, 120],
 [11, 7, 6, 120, 75],
 [11, 7, 6, 120, 75, 5],
 [11, 7, 6, 120, 75, 5, 50],
 [11, 7, 6, 120, 75, 5, 50, 25],
 [11, 7, 6, 120, 75, 5, 50, 25, 9],
 [11, 7, 6, 120, 75, 5, 50, 25, 9, 121],
 [11, 7, 6, 120, 75, 5, 50, 25, 9, 121, 7],
 [11, 7, 6, 120, 75, 5, 50, 25, 9, 121, 7, 122],
 [11, 7, 6, 120, 75, 5, 50, 25, 9, 121, 7, 122, 5],
 [
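
The prefix construction in cell 13 can be traced on the "He plays Cricket" example from its comment. The indices below are hypothetical; the point is only that a sentence of n tokens yields n-1 training rows, each one token longer than the previous.

```python
# Hypothetical word indices for the sentence "He plays Cricket".
tokenized_sequence = [1, 2, 3]

# Every prefix of length >= 2 becomes one training row:
# all tokens but the last are the input, the last token is the target.
prefixes = [tokenized_sequence[:i + 1] for i in range(1, len(tokenized_sequence))]

for prefix in prefixes:
    print(prefix[:-1], "->", prefix[-1])
```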

In [15]:
print("Maximum sequence length (in tokens) : ",max([len(x) for x in input_sentences]))

Maximum sequence length (in tokens) :  40


In [16]:
# Padding 
from keras.utils import pad_sequences

In [17]:
padded_input_sentence = pad_sequences(input_sentences,padding='pre')

In [18]:
padded_input_sentence

array([[  0,   0,   0, ...,   0,  13,  37],
       [  0,   0,   0, ...,  13,  37,  14],
       [  0,   0,   0, ...,  37,  14,  23],
       ...,
       [  0,   0,   0, ..., 115,   3, 291],
       [  0,   0,   0, ...,   3, 291,  49],
       [  0,   0,   0, ..., 291,  49,  34]], dtype=int32)
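
What `padding='pre'` does can be sketched without Keras. The helper `pad_pre` below is hypothetical and written only for illustration; it left-pads with zeros so that the most recent tokens always sit at the right edge of each row, which is where the target word is later read from.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad each list with `value` up to `maxlen` (hypothetical helper).

    Assumes a non-empty list of sequences and a positive maxlen.
    """
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # if too long, keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

rows = pad_pre([[13, 37], [13, 37, 14], [13, 37, 14, 23]])
for row in rows:
    print(row)
```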

In [19]:
X = padded_input_sentence[:,:-1]

In [20]:
X

array([[  0,   0,   0, ...,   0,   0,  13],
       [  0,   0,   0, ...,   0,  13,  37],
       [  0,   0,   0, ...,  13,  37,  14],
       ...,
       [  0,   0,   0, ..., 290, 115,   3],
       [  0,   0,   0, ..., 115,   3, 291],
       [  0,   0,   0, ...,   3, 291,  49]], dtype=int32)

In [21]:
y = padded_input_sentence[:,-1]

In [22]:
y[:50]

array([ 37,  14,  23,  72,   7,   6,  73,  24,  49,  25,   9,  19,  74,
        10,   8, 119,   7,   6, 120,  75,   5,  50,  25,   9, 121,   7,
       122,   5, 123,   1,  51,  52,  38,  76,  20,  15, 124,  77,  20,
       125,   4, 126,  19,  53,   1,  38, 127,  20,  39,  78], dtype=int32)
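
The slicing in cells 19 and 21 separates each padded row into its input (everything except the last column) and its target (the last column). A small sketch with plain lists, using made-up rows in place of the 717 x 40 array:

```python
# Hypothetical padded rows; the notebook's array has 717 rows of 40 columns.
padded = [
    [0, 0, 13, 37],
    [0, 13, 37, 14],
    [13, 37, 14, 23],
]

# Equivalent of padded[:, :-1]: drop the last column to get the inputs.
X_small = [row[:-1] for row in padded]

# Equivalent of padded[:, -1]: the last column holds the next word to predict.
y_small = [row[-1] for row in padded]

print(X_small)
print(y_small)
```

This is why the input width reported later is 39: one column of the 40-token rows is set aside as the label.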

In [23]:
# Applying one hot categorization
from keras.utils import to_categorical

In [24]:
y_pad = to_categorical(y,num_classes=292)

In [25]:
for i,c in enumerate(y_pad[0]):
    if c == float(1):
        print(i)

37


In [26]:
for k,v in tokenizer.word_index.items():
    if v == 37:
        print(k)

short
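
The check in cells 25 and 26 relies on each one-hot row having a single 1 at the position of the target word index. A minimal sketch of that encoding, with a hypothetical `one_hot` helper standing in for `to_categorical`:

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at `index` (hypothetical helper)."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec

# 292 classes: 291 words plus the unused padding index 0.
row = one_hot(37, 292)

# Recover the class index from the position of the 1.0.
recovered = row.index(1.0)
print(recovered)
```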


In [27]:
X.shape

(717, 39)

In [28]:
y_pad.shape

(717, 292)

## Model Architecture

In [29]:
from keras import Sequential
from keras.layers import Embedding,LSTM,Dense

In [30]:
model = Sequential([
    Embedding(292,120,input_length=39), # vocabulary size (291 words + padding index 0), embedding dimension, input_length = tokens per input row
    LSTM(150),  # 150 LSTM units
    Dense(292,activation='softmax')  # one output per vocabulary index; softmax yields a probability distribution
])


In [31]:
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [32]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 39, 120)           35040     
                                                                 
 lstm (LSTM)                 (None, 150)               162600    
                                                                 
 dense (Dense)               (None, 292)               44092     
                                                                 
Total params: 241,732
Trainable params: 241,732
Non-trainable params: 0
_________________________________________________________________
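
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard layout of a Keras LSTM layer: four weight sets (forget, input, candidate, and output), each with input weights, recurrent weights, and a bias.

```python
vocab_size = 292      # 291 words + padding index 0
embedding_dim = 120
lstm_units = 150

# Embedding: one vector of length embedding_dim per vocabulary index.
embedding_params = vocab_size * embedding_dim

# LSTM: 4 weight sets, each with (input + recurrent) weights and one bias per unit.
lstm_params = 4 * ((embedding_dim + lstm_units) * lstm_units + lstm_units)

# Dense: a weight per (unit, class) pair plus one bias per class.
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```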


In [33]:
model.fit(X,y_pad,epochs=50)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7f27b013fd90>

In [40]:
print(model.evaluate(X,y_pad))

[0.2864013910293579, 0.9762901067733765]
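
`model.evaluate` returns the loss first and then the compiled metrics, so the list above reads as a categorical cross-entropy of about 0.286 and an accuracy of about 0.976. Since `X` and `y_pad` are the same arrays that were passed to `fit`, both numbers describe the training data. For a one-hot target, the per-sample loss reduces to the negative log of the probability assigned to the correct word, as this sketch with made-up probabilities shows:

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy for one sample; only the true class contributes."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t > 0)

target = [0.0, 1.0, 0.0]          # the correct class is index 1
confident = [0.05, 0.90, 0.05]    # hypothetical prediction, mostly right
unsure = [0.40, 0.30, 0.30]       # hypothetical prediction, mostly wrong

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```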


## Building a Predictive System

In [35]:
import numpy as np
import time

In [44]:

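input_text = "Unlike"

# Build the index -> word lookup once instead of scanning word_index on every step
index_to_word = {index: word for word, index in tokenizer.word_index.items()}

for n in range(1,10):  # generate 9 words, one per iteration

    # tokenization: convert the text so far into word indices
    token_text = tokenizer.texts_to_sequences([input_text])[0]

    # padding: left-pad to the 39-token input length the model was trained on
    padded_token_text = pad_sequences([token_text],maxlen=39,padding='pre')

    # prediction: greedy choice of the most probable next-word index
    pos = np.argmax(model.predict(padded_token_text))

    # index 0 is the padding slot and has no word, so it is skipped
    word = index_to_word.get(pos)
    if word is not None:
        input_text = input_text + " " + word
        time.sleep(1)  # pause so the text appears to grow word by word
        print(input_text+ ".",end="\n\n")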


Unlike traditional.

Unlike traditional neural.

Unlike traditional neural networks.

Unlike traditional neural networks lstm.

Unlike traditional neural networks lstm incorporates.

Unlike traditional neural networks lstm incorporates feedback.

Unlike traditional neural networks lstm incorporates feedback connections.

Unlike traditional neural networks lstm incorporates feedback connections allowing.

Unlike traditional neural networks lstm incorporates feedback connections allowing it.
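
The loop above is a greedy decoder: at every step it appends the single most probable word and feeds the longer text back in. The sketch below isolates that control flow with a stand-in for the trained network. The vocabulary and `fake_predict` are invented for illustration only; a real model would return softmax scores instead.

```python
# Hypothetical vocabulary; index 0 is reserved for padding.
word_index = {"unlike": 1, "traditional": 2, "neural": 3, "networks": 4}
index_word = {index: word for word, index in word_index.items()}

def fake_predict(token_ids):
    """Stand-in model: put all score on the index after the last token, or on 0."""
    scores = [0.0] * (len(word_index) + 1)
    has_successor = bool(token_ids) and token_ids[-1] < len(word_index)
    scores[token_ids[-1] + 1 if has_successor else 0] = 1.0
    return scores

def generate(seed_text, steps):
    """Greedily extend seed_text by up to `steps` words."""
    text = seed_text
    for _ in range(steps):
        token_ids = [word_index[w] for w in text.lower().split() if w in word_index]
        scores = fake_predict(token_ids)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if best not in index_word:  # padding index predicted: nothing to append
            break
        text += " " + index_word[best]
    return text

result = generate("Unlike", 5)
print(result)
```

Because the choice is always the top-scoring word, the same seed produces the same continuation every time.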



In [37]:
#import joblib as jb

In [38]:
#jb.dump(model,'lstm_next_word.jb')

In [39]:
#jb.dump(tokenizer,'lstm_next_word_tokenizer.jb')
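
The commented-out cells above hint at saving the model and tokenizer with joblib. For the vocabulary alone, one lightweight alternative is to store the word-to-index mapping as JSON. This is only a sketch under assumptions: the `word_index` below is a shortened hypothetical mapping, the file goes to a temporary directory, and the trained model itself would still need to be saved separately.

```python
import json
import os
import tempfile

# Hypothetical, shortened mapping; in the notebook this would be tokenizer.word_index.
word_index = {"the": 1, "lstm": 2, "in": 3}

# Write the mapping to a JSON file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "word_index.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(word_index, f)

# Read it back and rebuild the reverse lookup used during prediction.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)
index_word = {index: word for word, index in restored.items()}

print(restored)
print(index_word)
```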