1. Implementing a Basic RNN Model
Step 1: Choose and Preprocess the Dataset
Let's assume you're working with text data for a text generation task (e.g., Shakespeare's text).

Dataset Loading and Preprocessing:

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load dataset (Shakespeare's text)
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

# Tokenize the text at character level (index 0 is reserved for padding)
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts([text])
total_chars = len(tokenizer.word_index) + 1

# Create sequences: encode the full text once, then slice 100-character windows
encoded = tokenizer.texts_to_sequences([text])[0]
input_sequences = []
for i in range(0, len(encoded) - 100):
    input_sequences.append(encoded[i:i+100])

# Convert to numpy arrays and split into input (X) and output (y)
X = np.array([seq[:-1] for seq in input_sequences])
y = np.array([seq[-1] for seq in input_sequences])

# One-hot encode y
y = tf.keras.utils.to_categorical(y, num_classes=total_chars)


Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1115394/1115394 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

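To make the windowing concrete, here is a minimal sketch on a toy string. It assumes the same scheme as the cell above: every window is split into all ids but the last (input) and the last id (target). The helper `make_windows`, the toy text, and the hand-built `vocab` are illustrative names, not part of the notebook.

```python
import numpy as np

def make_windows(encoded, window):
    """Slice overlapping windows and split each into inputs and a target id."""
    seqs = np.array([encoded[i:i + window] for i in range(len(encoded) - window)])
    return seqs[:, :-1], seqs[:, -1]

toy_text = "hello world"
# Hypothetical vocabulary: ids start at 1 so that 0 stays free for padding
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set(toy_text)))}
encoded = [vocab[ch] for ch in toy_text]

X_toy, y_toy = make_windows(encoded, window=5)
print(X_toy.shape, y_toy.shape)  # 6 windows of 4 input ids, 6 target ids
```

With the notebook's window of 100 characters, each input row holds 99 ids, which is why the models below are built with `input_shape=(None, 99)`.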

Step 2: Build the RNN Model
Basic RNN Model Implementation:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

# total_chars comes from the preprocessing step above
model = Sequential([
    Embedding(total_chars, 64),  # Map each character id to a 64-dimensional vector
    SimpleRNN(128),  # Recurrent layer; returns only the final hidden state
    Dense(total_chars, activation='softmax')  # Probability for each possible next character
])

# Build the model explicitly so summary() can report shapes (99 input characters per sample)
model.build(input_shape=(None, 99))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
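
As a rough cross-check on what `model.summary()` reports, the parameter counts of these three layers can be computed by hand. This is a sketch under an assumption: `vocab_size = 40` is an illustrative value only, since the notebook does not print `total_chars`. The helper function names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per token id
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # Input kernel + recurrent kernel + bias
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # Weight matrix + bias
    return input_dim * units + units

vocab_size = 40  # assumed for illustration; use total_chars in the notebook
counts = {
    "embedding": embedding_params(vocab_size, 64),
    "simple_rnn": simple_rnn_params(64, 128),
    "dense": dense_params(128, vocab_size),
}
counts["total"] = sum(counts.values())
print(counts)
```

Only the embedding and output layers depend on the vocabulary size; the SimpleRNN count is fixed by the embedding width (64) and the number of units (128).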


Step 3: Train the Model
Training:

In [None]:
model.fit(X, y, epochs=10, batch_size=128)


Epoch 1/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1064s 122ms/step - accuracy: 0.5339 - loss: 1.5240
Epoch 2/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1128s 125ms/step - accuracy: 0.5342 - loss: 1.5211
Epoch 3/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1078s 124ms/step - accuracy: 0.5364 - loss: 1.5142
Epoch 4/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1075s 123ms/step - accuracy: 0.5366 - loss: 1.5125
Epoch 5/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1114s 125ms/step - accuracy: 0.5352 - loss: 1.5159
Epoch 6/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1082s 124ms/step - accuracy: 0.5299 - loss: 1.5388
Epoch 7/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1092s 123ms/step - accuracy: 0.5375 - loss: 1.5075
Epoch 8/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1066s 122ms/step - accuracy: 0.5381

<keras.src.callbacks.history.History at 0x7df151a0b100>

In [None]:
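def generate_text(model, tokenizer, seed_text, num_chars):
    """Greedily append the most likely next character num_chars times."""
    for _ in range(num_chars):
        # Encode the running text and keep (or left-pad to) the last 99 characters
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=99, padding='pre')
        predicted = model.predict(token_list, verbose=0)[0]
        # Index 0 is the padding id and has no character, so exclude it
        next_index = int(np.argmax(predicted[1:])) + 1
        seed_text += tokenizer.index_word[next_index]
    return seed_text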

print(generate_text(model, tokenizer, seed_text="To be, or not to be,", num_chars=100))


To be, or not to be, and the way the way the way the way the way the way the way the way the way the way the way the way
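
The repeated "the way" is typical of greedy decoding, where the single most likely character is always chosen. A common alternative is to sample from a temperature-scaled version of the predicted distribution. The sketch below is NumPy only; `sample_with_temperature` is a hypothetical helper, not part of the notebook, and `probs` stands for one row of `model.predict` output.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index from probs after rescaling by temperature.

    Low temperatures approach argmax; higher ones give more varied output.
    """
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-9) / temperature
    logits -= logits.max()              # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()              # renormalize to a valid distribution
    return int(rng.choice(len(scaled), p=scaled))

# Example with a made-up 3-class distribution
example_probs = [0.1, 0.7, 0.2]
print(sample_with_temperature(example_probs, temperature=0.8))
```

To use it in `generate_text`, you could replace the `np.argmax` call with this sampler, setting the probability of index 0 (padding) to zero first.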


Training accuracy and loss barely change across the epochs shown: accuracy stays between roughly 53.0% and 53.8%, and loss between about 1.51 and 1.54. The model appears to have plateaued, which suggests that a single SimpleRNN layer is struggling to learn more complex patterns; a poorly suited learning rate may also contribute. The small fluctuations, such as the dip in the sixth epoch, are consistent with that plateau. Because only training metrics are reported, overfitting cannot be judged from this log; that would require a held-out validation set. The generated sample, which quickly falls into repeating "the way", reflects both the model's limits and the greedy (argmax) decoding.

To address these issues, you could increase the model's capacity by stacking additional RNN layers or by switching to a gated variant such as LSTM or GRU. Adjusting the learning rate might also help speed up learning. Regularization techniques such as dropout layers become useful if validation metrics show the model starting to overfit with more training epochs.
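
As one possible starting point for these suggestions, the sketch below swaps the SimpleRNN for an LSTM, adds a dropout layer, and sets the learning rate explicitly. The function name `build_lstm_model`, the dropout rate of 0.2, and the default learning rate are assumptions chosen for illustration, not tuned values.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

def build_lstm_model(vocab_size, seq_len=99, learning_rate=1e-3):
    """LSTM variant of the basic model with dropout and an explicit learning rate."""
    model = Sequential([
        Embedding(vocab_size, 64),
        LSTM(128),                      # gated recurrent layer in place of SimpleRNN
        Dropout(0.2),                   # assumed rate; only active during training
        Dense(vocab_size, activation='softmax'),
    ])
    model.build(input_shape=(None, seq_len))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model

# Shape check on dummy input; 40 is an illustrative vocabulary size
demo_model = build_lstm_model(vocab_size=40)
demo_out = demo_model.predict(np.zeros((2, 99), dtype="int32"), verbose=0)
print(demo_out.shape)
```

In the notebook you would pass `total_chars` as `vocab_size`. Calling `fit` with `validation_split=0.1` would also report validation loss and accuracy, which is what you need to detect overfitting.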

Step 4: Stack RNN Layers
Stacked RNN:

In [7]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

# Stacked RNN model
model_stacked = Sequential([
    Embedding(total_chars, 64),  # Map each character id to a 64-dimensional vector
    SimpleRNN(128, return_sequences=True),  # First RNN layer; passes every timestep to the next layer
    SimpleRNN(128),  # Second RNN layer; returns only the final hidden state
    Dense(total_chars, activation='softmax')  # Probability for each possible next character
])

# Build the model explicitly so summary() can report shapes
model_stacked.build(input_shape=(None, 99))

# Compile the model
model_stacked.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Show the model summary
model_stacked.summary()

# Train the model
model_stacked.fit(X, y, epochs=4, batch_size=128)



Epoch 1/4
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 2189s 251ms/step - accuracy: 0.3817 - loss: 2.1274
Epoch 2/4
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 2175s 250ms/step - accuracy: 0.5136 - loss: 1.6033
Epoch 3/4
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 2166s 249ms/step - accuracy: 0.5323 - loss: 1.5288
Epoch 4/4
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 2163s 248ms/step - accuracy: 0.5408 - loss: 1.4953


<keras.src.callbacks.history.History at 0x7bc86678e770>

Training accuracy improves steadily, from 38.2% to 54.1% over four epochs, ending slightly above the roughly 53.8% at which the single-layer model plateaued. The gain from stacking is therefore modest, and it comes at a cost: each epoch takes about 2,170 seconds, roughly twice as long as the single-layer model, reflecting the higher computational demands of stacked RNNs. Accuracy remains limited, which suggests that further tuning is needed. Exploring alternative architectures, such as LSTM or GRU, might help achieve better results.
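
Stacking works only because the first layer is created with `return_sequences=True`: the second recurrent layer needs one input vector per timestep, not just the final state. The NumPy sketch below shows the shapes involved. It uses the standard SimpleRNN recurrence h_t = tanh(x_t W_x + h_{t-1} W_h + b) with random weights; `simple_rnn_forward` is an illustrative helper, not Keras code.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b, return_sequences=False):
    """Run a tanh RNN over x of shape (timesteps, input_dim)."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    # Either every hidden state or only the last one
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
x = rng.normal(size=(99, 64))                      # 99 embedded characters
layer1 = (rng.normal(size=(64, 128)) * 0.1, rng.normal(size=(128, 128)) * 0.1, np.zeros(128))
layer2 = (rng.normal(size=(128, 128)) * 0.1, rng.normal(size=(128, 128)) * 0.1, np.zeros(128))

seq_out = simple_rnn_forward(x, *layer1, return_sequences=True)   # input to layer 2
final_out = simple_rnn_forward(seq_out, *layer2)                   # input to Dense
print(seq_out.shape, final_out.shape)
```

The second layer repeats the same loop over all 99 timesteps, which is consistent with the roughly doubled time per epoch in the log above.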

2. Stacking RNN Layers and Bi-directional RNNs
Bi-Directional RNNs:
Modify the model:

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Bidirectional

# Bi-directional RNN model
model_bi = Sequential([
    Embedding(total_chars, 64),  # Map each character id to a 64-dimensional vector
    Bidirectional(SimpleRNN(128)),  # One SimpleRNN per direction; outputs are concatenated
    Dense(total_chars, activation='softmax')  # Probability for each possible next character
])

# Build the model explicitly so summary() can report shapes
model_bi.build(input_shape=(None, 99))

# Compile the model
model_bi.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Show the model summary
model_bi.summary()


In [None]:

model_bi.fit(X, y, epochs=5, batch_size=128)


Epoch 1/5
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1822s 209ms/step - accuracy: 0.3690 - loss: 2.1687
Epoch 2/5
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1775s 204ms/step - accuracy: 0.4928 - loss: 1.6898
Epoch 3/5
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1849s 207ms/step - accuracy: 0.5126 - loss: 1.6106
Epoch 4/5
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1781s 202ms/step - accuracy: 0.5200 - loss: 1.5797
Epoch 5/5
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 1756s 201ms/step - accuracy: 0.5260 - loss: 1.5567


<keras.src.callbacks.history.History at 0x7f4e8aa863e0>

Training accuracy improves clearly over the epochs, starting at 36.9% and reaching 52.6% by the fifth epoch, so the model is learning from the data. The loss of about 1.56 still leaves room for improvement. The Bidirectional layer increases the model's complexity, since it trains one SimpleRNN per direction, yet after five epochs it has not overtaken the unidirectional models above (about 53.8% for the single-layer model and 54.1% for the stacked one). That is plausible for next-character prediction, where the backward pass only rereads the same 99 preceding characters in reverse. Further tuning, such as adjusting the learning rate or experimenting with architectures like LSTM or GRU, could help achieve better results. Training time is also substantial, at about 1,800 seconds per epoch, reflecting the computational demands of the bidirectional layer.
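
To illustrate what the Bidirectional wrapper computes when only the final output is returned, here is a NumPy sketch with random weights: one RNN reads the sequence left to right, a second reads it right to left, and their final states are concatenated. The helper names are illustrative, and the sketch assumes the default concatenation merge mode.

```python
import numpy as np

def rnn_last_state(x, W_x, W_h, b):
    """Final hidden state of a tanh RNN over x of shape (timesteps, input_dim)."""
    h = np.zeros(W_h.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

def bidirectional_last_state(x, fwd_weights, bwd_weights):
    """Concatenate the forward state with the state of a pass over the reversed input."""
    h_fwd = rnn_last_state(x, *fwd_weights)
    h_bwd = rnn_last_state(x[::-1], *bwd_weights)
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(0)
x = rng.normal(size=(99, 64))
fwd = (rng.normal(size=(64, 128)) * 0.1, rng.normal(size=(128, 128)) * 0.1, np.zeros(128))
bwd = (rng.normal(size=(64, 128)) * 0.1, rng.normal(size=(128, 128)) * 0.1, np.zeros(128))

features = bidirectional_last_state(x, fwd, bwd)
print(features.shape)
```

The two directions have separate weights, so the recurrent parameter count doubles and the Dense layer receives 256 features instead of 128.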

Hybrid CNN-RNN Model:
Modify the model:

In [4]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, SimpleRNN, Dense

# Hybrid model combining CNN and RNN
model_hybrid = Sequential([
    Embedding(total_chars, 64),  # Map each character id to a 64-dimensional vector
    Conv1D(64, kernel_size=3, activation='relu'),  # Detect local 3-character patterns
    MaxPooling1D(pool_size=2),  # Halve the sequence length
    SimpleRNN(128),  # Model the order of the pooled features
    Dense(total_chars, activation='softmax')  # Probability for each possible next character
])

# Build the model explicitly so summary() can report shapes
model_hybrid.build(input_shape=(None, 99))

# Compile the model
model_hybrid.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Show the model summary
model_hybrid.summary()


In [6]:

# Train the model
model_hybrid.fit(X, y, epochs=20, batch_size=128)

Epoch 1/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 799s 92ms/step - accuracy: 0.3534 - loss: 2.2449
Epoch 2/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 761s 87ms/step - accuracy: 0.3557 - loss: 2.2337
Epoch 3/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 821s 89ms/step - accuracy: 0.3556 - loss: 2.2323
Epoch 4/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 770s 88ms/step - accuracy: 0.3588 - loss: 2.2217
Epoch 5/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 809s 89ms/step - accuracy: 0.3602 - loss: 2.2166
Epoch 6/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 791s 88ms/step - accuracy: 0.3607 - loss: 2.2133
Epoch 7/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 780s 89ms/step - accuracy: 0.3613 - loss: 2.2098
Epoch 8/20
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 790s 88ms/step - accuracy: 0.3631 - loss: 2.2058


<keras.src.callbacks.history.History at 0x7bc8665710c0>

The hybrid model is the weakest of the four. Training accuracy rises slowly, from 35.3% to a reported 36.6% over 20 epochs, and loss falls only from 2.2449 to 2.1904 (the log above shows the first 8 epochs, ending at 36.3% and 2.2058). The model is learning, but very slowly, and it remains well below the roughly 53% reached by the recurrent-only models.

Training time per epoch is consistent at about 13 minutes, the shortest of the models here, so the model size and data volume are manageable. The Conv1D and MaxPooling1D layers are meant to extract local character patterns before the SimpleRNN layer models their order. However, the limited gain in accuracy and the minimal reduction in loss suggest that this feature extraction is not helping as configured, or that the learning rate and architecture need adjustment.

In summary, although the architecture can in principle learn both local and sequential patterns, its slow progress suggests that additional tuning or a different architecture is needed to achieve better results.
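
One detail worth checking in the hybrid model is how many timesteps actually reach the SimpleRNN. The sketch below applies the standard output-length formulas for unpadded ("valid") convolution and pooling, which are the Keras defaults used in the cell above. The helper names are illustrative.

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (length - kernel_size) // stride + 1

def maxpool1d_valid_length(length, pool_size, stride=None):
    """Output length of 1D max pooling without padding (stride defaults to pool_size)."""
    stride = stride or pool_size
    return (length - pool_size) // stride + 1

steps_after_conv = conv1d_valid_length(99, kernel_size=3)
steps_after_pool = maxpool1d_valid_length(steps_after_conv, pool_size=2)
print(steps_after_conv, steps_after_pool)
```

The recurrent layer therefore runs over about half as many timesteps as in the other models, which may partly explain the shorter epochs. It also means each step the RNN sees is a pooled summary of several characters, which could make the exact identity of the most recent characters harder to recover; that is a hypothesis to test, not something the logs establish.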