# Next Word Prediction with RNN, LSTM, and GRU

This notebook demonstrates **how sequence models predict the next word in a sentence**.

We will:
- Load a small text corpus (Shakespeare sample).
- Preprocess text into input-output word sequences.
- Train and compare SimpleRNN, LSTM, and GRU models.
- Test models by typing partial sentences to see predictions.


## 1) Setup and Imports

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, GRU, Dense

print('TensorFlow version:', tf.__version__)

TensorFlow version: 2.19.0


## 2) Load Sample Text Corpus
We’ll use a few lines from Shakespeare for demonstration.

In [2]:
data = [
    "To be or not to be that is the question",
    "All the world is a stage and all the men and women merely players",
    "Some are born great some achieve greatness and some have greatness thrust upon them",
    "The course of true love never did run smooth",
    "If music be the food of love play on"
]

print("Sample lines:")
for line in data:
    print('-', line)

Sample lines:
- To be or not to be that is the question
- All the world is a stage and all the men and women merely players
- Some are born great some achieve greatness and some have greatness thrust upon them
- The course of true love never did run smooth
- If music be the food of love play on


## 3) Tokenize Text and Create Sequences
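We create input-output pairs where X is a sequence of words and y is the word that follows it. Each line is expanded into all of its prefixes of two or more words. The prefixes are left-padded to a common length, the last token of each becomes the target y, and the remaining tokens become the input X.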

In [3]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary size:', vocab_size)

sequences = []
for line in data:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        seq = tokens[:i+1]
        sequences.append(seq)

print('Total sequences:', len(sequences))

# Pad sequences
maxlen = max(len(seq) for seq in sequences)
sequences = pad_sequences(sequences, maxlen=maxlen, padding='pre')

X, y = sequences[:,:-1], sequences[:,-1]
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)

print('X shape:', X.shape, 'y shape:', y.shape)

Vocabulary size: 41
Total sequences: 51
X shape: (51, 13) y shape: (51, 41)
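
To make the prefix expansion concrete, here is a minimal pure-Python sketch of what the loop and `pad_sequences(..., padding='pre')` do to a single line. The helpers `make_prefixes` and `pad_pre` are illustrative stand-ins, not Keras functions, and the token ids are made up for the example.

```python
def make_prefixes(tokens):
    """Return every prefix of length >= 2, mirroring tokens[:i+1] for i >= 1."""
    return [tokens[:i + 1] for i in range(1, len(tokens))]


def pad_pre(seq, maxlen, value=0):
    """Left-pad seq with `value` up to maxlen (assumes len(seq) <= maxlen)."""
    return [value] * (maxlen - len(seq)) + list(seq)


# Hypothetical token ids for a four-word line.
tokens = [5, 2, 9, 7]
prefixes = make_prefixes(tokens)
maxlen = max(len(p) for p in prefixes)
padded = [pad_pre(p, maxlen) for p in prefixes]

# Split each padded row into the input (all but last) and the target (last).
X = [row[:-1] for row in padded]
y = [row[-1] for row in padded]

for inputs, target in zip(X, y):
    print(inputs, '->', target)
```

A line of n words yields n - 1 training pairs, which is why the five lines above (10, 14, 14, 9, and 9 words) produce 51 sequences.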


## 4) Build Model Function
A single function builds all three models. The embedding and output layers are identical; only the recurrent layer (SimpleRNN, LSTM, or GRU) changes.

In [4]:
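def build_model(model_type='RNN', vocab_size=100, embed_dim=50, maxlen=10, rnn_units=64):
    # maxlen is kept for call compatibility but unused: Keras 3 deprecates
    # Embedding's input_length and infers the sequence length from the data.
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim))
    # Only the recurrent layer differs between the three model types.
    if model_type == 'RNN':
        model.add(SimpleRNN(rnn_units))
    elif model_type == 'LSTM':
        model.add(LSTM(rnn_units))
    elif model_type == 'GRU':
        model.add(GRU(rnn_units))
    else:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    model.add(Dense(vocab_size, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model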
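
Although the three models share the same embedding and output layers, their recurrent layers differ in size. The sketch below estimates the trainable parameter count of each recurrent layer from the standard formulas. It assumes the Keras defaults (in particular `reset_after=True` for `GRU`, which adds a second bias vector per gate), and the function name `recurrent_params` is illustrative; `model.summary()` remains the authoritative check.

```python
def recurrent_params(layer_type, input_dim, units):
    """Estimate trainable parameters of a Keras recurrent layer (default settings)."""
    # One block: input weights + recurrent weights + one bias vector.
    block = units * (input_dim + units + 1)
    if layer_type == 'RNN':
        return block                # a single tanh transformation
    if layer_type == 'LSTM':
        return 4 * block            # input, forget, output gates + candidate
    if layer_type == 'GRU':
        # Assumes reset_after=True: separate input and recurrent biases.
        return 3 * (block + units)  # update, reset gates + candidate
    raise ValueError(f'Unknown layer_type: {layer_type!r}')


for name in ('RNN', 'LSTM', 'GRU'):
    print(name, recurrent_params(name, input_dim=50, units=64))
```

With `embed_dim=50` and `rnn_units=64`, the LSTM layer has four times as many parameters as the SimpleRNN layer, which is worth keeping in mind when comparing models on a corpus this small.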

## 5) Train Models
We train each model for 200 epochs. The dataset is tiny, so each epoch is fast, and the models need many passes to fit it.

In [5]:
EPOCHS = 200  # longer since dataset is small

# The input length is X.shape[1] (maxlen - 1): the last token of each sequence is the target.
model_rnn = build_model('RNN', vocab_size, embed_dim=50, maxlen=X.shape[1], rnn_units=64)
model_lstm = build_model('LSTM', vocab_size, embed_dim=50, maxlen=X.shape[1], rnn_units=64)
model_gru = build_model('GRU', vocab_size, embed_dim=50, maxlen=X.shape[1], rnn_units=64)

print('Training SimpleRNN...')
model_rnn.fit(X, y, epochs=EPOCHS, verbose=0)
print('Training LSTM...')
model_lstm.fit(X, y, epochs=EPOCHS, verbose=0)
print('Training GRU...')
model_gru.fit(X, y, epochs=EPOCHS, verbose=0)

Training SimpleRNN...
Training LSTM...
Training GRU...


<keras.src.callbacks.history.History at 0x7a9838158f20>
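
Because training runs with `verbose=0`, nothing reports how well each model fits. As a reference point for reading a loss value, the sketch below computes categorical cross-entropy by hand for a single one-hot target and shows the loss of a model that guesses uniformly over the 41-word vocabulary. The function `cross_entropy` is a hand-written illustration, not the Keras implementation, and the probability vectors are made up.

```python
import math


def cross_entropy(probs, target_index):
    """Categorical cross-entropy for one example with a one-hot target."""
    return -math.log(probs[target_index])


vocab_size = 41  # from the tokenizer output above

# A model with no knowledge assigns equal probability to every word.
uniform = [1.0 / vocab_size] * vocab_size
baseline = cross_entropy(uniform, target_index=3)

# A confident, correct prediction gives a loss close to zero.
confident = [0.9 if i == 3 else 0.1 / (vocab_size - 1) for i in range(vocab_size)]
good = cross_entropy(confident, target_index=3)

print(f'uniform-guess loss: {baseline:.3f}')
print(f'confident-correct loss: {good:.3f}')
```

A training loss well below ln(41) therefore indicates that the model has learned something about the corpus. The per-epoch values are available from the `History` object returned by `fit`, for example `history.history['loss'][-1]`.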

## 6) Generate Text Function
We’ll write a function that takes a seed text and repeatedly appends the model’s most probable next word (greedy decoding).

In [6]:
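def generate_text(model, tokenizer, seed_text, num_words=5, maxlen=10):
    # maxlen should match the input length used in training (X.shape[1]).
    result = seed_text
    for _ in range(num_words):
        # Encode the text so far and left-pad (or truncate) it to maxlen tokens.
        token_list = tokenizer.texts_to_sequences([result])[0]
        token_list = pad_sequences([token_list], maxlen=maxlen, padding='pre')
        # Greedy decoding: take the index of the most probable next word.
        predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
        # index_word is the tokenizer's reverse lookup; index 0 is padding.
        word = tokenizer.index_word.get(predicted)
        if word is None:
            break
        result += ' ' + word
    return result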
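
The loop in `generate_text` is greedy decoding: at each step it takes the single most probable word and feeds the extended text back in. The sketch below shows the same loop with a toy stand-in for the network, so it runs without TensorFlow. The bigram table `toy_probs` and the function `greedy_generate` are hypothetical and exist only for this illustration.

```python
# Hypothetical next-word probabilities keyed by the previous word.
toy_probs = {
    'to':  {'be': 0.9, 'the': 0.1},
    'be':  {'or': 0.6, 'that': 0.4},
    'or':  {'not': 1.0},
    'not': {'to': 1.0},
}


def greedy_generate(seed_text, num_words):
    """Append the most probable next word num_words times (greedy decoding)."""
    words = seed_text.lower().split()
    for _ in range(num_words):
        candidates = toy_probs.get(words[-1])
        if not candidates:
            break  # no known continuation: stop early
        # argmax over the candidate distribution, like np.argmax on softmax output
        words.append(max(candidates, key=candidates.get))
    return ' '.join(words)


print(greedy_generate('to be', 5))
```

Because the toy table sees only the previous word, it can never choose "that" after the second "be" and simply loops. The recurrent models condition on the whole padded prefix, which is what lets them tell the two occurrences apart.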

## 7) Compare Outputs
Now try generating text with each model and compare results.

In [7]:
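# Pad seeds to the same input length that was used in training.
print("RNN:", generate_text(model_rnn, tokenizer, "to be", 5, maxlen=X.shape[1]))
print("LSTM:", generate_text(model_lstm, tokenizer, "to be", 5, maxlen=X.shape[1]))
print("GRU:", generate_text(model_gru, tokenizer, "to be", 5, maxlen=X.shape[1]))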

RNN: to be or not to be that
LSTM: to be or not is is the
GRU: to be or not to be that


## Teaching Notes
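- SimpleRNN struggles with long-term dependencies because gradients shrink as they are propagated back through many time steps.
- LSTM and GRU use gates to control what is kept and what is forgotten, so they retain longer contexts better.
- This small demo corpus exaggerates differences between runs and models (in the run above the LSTM drifted to "or not is is the" while SimpleRNN and GRU reproduced the source line), so treat it as an illustration of the concepts, not a benchmark.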
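
The first note can be illustrated numerically. In a simple recurrent network, the gradient that reaches a state T steps back is a product of T per-step factors, and if each factor is below 1 in magnitude the product shrinks exponentially. The sketch below uses a scalar recurrence h_t = tanh(w * h_{t-1}) with an arbitrarily chosen weight and starting state; it is a simplified illustration, not a model of the networks trained above.

```python
import math


def gradient_through_time(w, h0, steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return grad


for steps in (1, 5, 10, 20):
    print(steps, gradient_through_time(w=0.9, h0=0.5, steps=steps))
```

Each factor here is at most 0.9, so the gradient after 20 steps is bounded by 0.9 to the power 20. The gates in LSTM and GRU give the state a more direct path through time, which helps keep this product from collapsing when long-range information matters.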

## Conclusion
This example shows how RNN-family models can **generate sequences** one word at a time. This intuition sets the stage for **Transformers (BERT, GPT)**, which model sequences with attention instead of recurrence.