<a href="https://colab.research.google.com/github/vineetdave/LangChainTutorials/blob/main/AdvancedLSTM_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [10]:
# Cell 1: Import Libraries
# We're adding new tools for a more robust workflow
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam  # Import Adam
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore', category=Warning)

In [22]:
# Cell 2: Define Sample Data (Now with more text!)
# --- DATASET SIGNIFICANTLY EXPANDED ---
# The model needs much more data to learn real patterns.
data = """
Alice was beginning to get very tired of sitting by her sister on the bank,
and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, 'and what is the use of a book,' thought Alice 'without pictures or
conversations?' So she was considering in her own mind (as well as she
could, for the hot day made her feel very sleepy and stupid), whether
the pleasure of making a daisy-chain would be worth the trouble of
getting up and picking the daisies, when suddenly a White Rabbit with
pink eyes ran close by her.

There was nothing so very remarkable in that; nor did Alice
think it so very much out of the way to hear the Rabbit say to
itself, 'Oh dear! Oh dear! I shall be late!' (when she thought
it over afterwards, it occurred to her that she ought to have
wondered at this, but at the time it all seemed quite natural);
but when the Rabbit actually took a watch out of its waistcoat-pocket,
and looked at it, and then hurried on, Alice started to her feet,
for it flashed across her mind that she had never before seen a
rabbit with either a waistcoat-pocket, or a watch to take out of it,
and burning with curiosity, she ran across the field after it,
and fortunately was just in time to see it pop down a large
rabbit-hole under the hedge.

In another moment down went Alice after it, never once considering how
in the world she was to get out again.

The rabbit-hole went straight on like a tunnel for some way, and then
dipped suddenly down, so suddenly that Alice had not a moment to
think about stopping herself before she found herself falling down
what seemed to be a very deep well.

Either the well was very deep, or she fell very slowly, for she
had plenty of time as she went down to look about her and to wonder
what was going to happen next. First, she tried to look down and
make out what she was coming to, but it was too dark to see anything;
then she looked at the sides of the well, and noticed that they were
filled with cupboards and book-shelves; here andD there she saw maps
and pictures hung upon pegs. She took down a jar from one of the
shelves as she passed; it was labelled 'ORANGE MARMALADE', but to her
great disappointment it was empty: she did not like to drop the jar
for fear of killing somebody underneath, so managed to put it into
one of the cupboards as she fell past it.

'Well!' thought Alice to herself. 'After such a fall as this,
I shall think nothing of tumbling down stairs! How brave they'll
all think me at home! Why, I wouldn't say anything about it,
even if I fell off the top of the house!' (Which was very likely true.)

Down, down, down. Would the fall never come to an end? 'I wonder
how many miles I've fallen by this time?' she said aloud. 'I must
be getting somewhere near the centre of the earth. Let me see:
that would be four thousand miles down, I think–' (for, you
see, Alice had learnt several things of this sort in her lessons
in the schoolroom, and though this was not a very good opportunity
for showing off her knowledge, as there was no one to listen to
her, still it was good practice to say it over) '–yes, that's
about the right distance–but then I wonder what Latitude or
Longitude I've got to?' (Alice had no idea what Latitude was,
or Longitude either, but thought they were nice grand words to say.)
"""

print("Data loaded (now with more text!).")

Data loaded (now with more text!).


In [23]:
#-------------------------------------------------------------------------------
# Cell 3: Tokenize the Text
# This step is the same, but the tokenizer will now find more unique words.
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
vocab_size = len(tokenizer.word_index) + 1
encoded_text = tokenizer.texts_to_sequences([data])[0]

print(f"Total unique words (vocab size): {vocab_size}")

Total unique words (vocab size): 271
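
To make the `+ 1` in `vocab_size` concrete, here is a minimal pure-Python sketch of how a word index similar to `tokenizer.word_index` could be built. It is a simplified stand-in, not the Keras implementation: `build_word_index` and `encode` are hypothetical helpers, and the punctuation handling is only an approximation. Indices start at 1 (most frequent word first), which leaves 0 free for padding.

```python
from collections import Counter
import string

# Assumption: drop every punctuation mark except the apostrophe, then lowercase.
_DROP = string.punctuation.replace("'", "")
_TABLE = str.maketrans(_DROP, " " * len(_DROP))

def build_word_index(text):
    """Map each word to an integer rank, starting at 1 for the most frequent word."""
    counts = Counter(text.lower().translate(_TABLE).split())
    # Index 0 is deliberately unused so it can serve as the padding value.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, word_index):
    """Convert text to a list of integer ids, skipping words not in the index."""
    return [word_index[w] for w in text.lower().translate(_TABLE).split() if w in word_index]

demo_text = "Oh dear! Oh dear! I shall be late!"
demo_index = build_word_index(demo_text)
demo_vocab_size = len(demo_index) + 1  # +1 accounts for the reserved index 0

print(demo_index)
print(encode("oh dear I shall be late", demo_index))
print(demo_vocab_size)
```

Because the largest id equals `len(word_index)`, the embedding and output layers need `len(word_index) + 1` slots to cover every id including 0.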


In [24]:
#-------------------------------------------------------------------------------
# Cell 4: Create Input Sequences and Targets (Longer Sequence)
# We're increasing the seq_length to 10. This gives the model
# more context to learn from.
sequences = []
seq_length = 10  # Increased from 5 to 10

for i in range(seq_length, len(encoded_text)):
    seq = encoded_text[i-seq_length:i]
    label = encoded_text[i]
    sequences.append((seq, label))

print(f"Total number of sequences created: {len(sequences)}")

Total number of sequences created: 641
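
The sliding window in Cell 4 is easier to see on a toy list. The sketch below repeats the same loop on six made-up token ids with a window of 3; `make_windows` and `toy_ids` are hypothetical names used only for illustration. The number of pairs is always `len(token_ids) - seq_length`, so 641 sequences with `seq_length = 10` implies the encoded text holds 651 tokens.

```python
def make_windows(token_ids, seq_length):
    """Return (input_window, next_token) pairs using a sliding window."""
    pairs = []
    for i in range(seq_length, len(token_ids)):
        # The window is the seq_length tokens before position i; the target is token i.
        pairs.append((token_ids[i - seq_length:i], token_ids[i]))
    return pairs

toy_ids = [11, 12, 13, 14, 15, 16]
toy_pairs = make_windows(toy_ids, seq_length=3)

for window, target in toy_pairs:
    print(window, "->", target)

# One pair per position that has a full window in front of it.
print(len(toy_pairs))
```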


In [25]:
#-------------------------------------------------------------------------------
# Cell 5: Prepare Data for Keras (with Validation Split)
# This is a major improvement. We split our data into a training set
# (for learning) and a validation set (held out for evaluation). Comparing
# the two losses shows whether the model is memorizing or generalizing.

X, y = zip(*sequences)
X = np.array(X)
y = to_categorical(y, num_classes=vocab_size)

# Split the data: 80% for training, 20% for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Shape of X_train (inputs): {X_train.shape}")
print(f"Shape of y_train (targets): {y_train.shape}")
print(f"Shape of X_val (validation inputs): {X_val.shape}")
print(f"Shape of y_val (validation targets): {y_val.shape}")

Shape of X_train (inputs): (512, 10)
Shape of y_train (targets): (512, 271)
Shape of X_val (validation inputs): (129, 10)
Shape of y_val (validation targets): (129, 271)
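
The shapes above follow from simple arithmetic. The sketch below reproduces the 512/129 split sizes and shows what a single one-hot target row looks like. `split_sizes` and `one_hot` are hypothetical helpers, and rounding the validation count up is an assumption that matches the recorded shapes (20% of 641 is 128.2, yet 129 rows were held out).

```python
import math

def split_sizes(n_samples, test_fraction):
    """Validation count is rounded up; the remainder is used for training."""
    n_val = math.ceil(n_samples * test_fraction)
    return n_samples - n_val, n_val

def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    row = [0.0] * num_classes
    row[index] = 1.0
    return row

n_train, n_val = split_sizes(641, 0.2)
print(n_train, n_val)

# One target row: 271 entries, all zero except the position of the next word.
target_row = one_hot(3, num_classes=271)
print(len(target_row), target_row[:5])
```

Note that the windows overlap heavily, so a random split places near-duplicate contexts in both sets; the validation loss is still a useful signal, but it is not a fully independent test.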


In [26]:
#-------------------------------------------------------------------------------
# Cell 6: Define the Improved Model Architecture
# --- MODEL SIMPLIFIED EVEN FURTHER TO PREVENT OVERFITTING ---
embedding_dim = 32  # Reduced dimensions
lstm_units = 64     # Reduced memory units

model = Sequential()

# Embedding Layer
model.add(Embedding(input_dim=vocab_size,
                    output_dim=embedding_dim,
                    input_length=seq_length))
# Dropout on the embeddings (rate 0.2)
model.add(Dropout(0.2))

# Layer 1: A single, smaller LSTM layer
model.add(LSTM(lstm_units))
# Dropout on the LSTM output (rate 0.2)
model.add(Dropout(0.2))

# Output Layer (same as before)
model.add(Dense(vocab_size, activation='softmax'))

model.summary()
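
The output of `model.summary()` is not reproduced here, so the sketch below works out the expected trainable parameter counts by hand. It assumes the standard formulas for an embedding table, an LSTM layer with biases (four gate blocks), and a dense layer; the three helper functions are hypothetical names for illustration.

```python
def embedding_params(vocab_size, embedding_dim):
    """One embedding vector per vocabulary id."""
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    """Four gate blocks, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units

n_embedding = embedding_params(271, 32)
n_lstm = lstm_params(32, 64)
n_dense = dense_params(64, 271)

print("Embedding:", n_embedding)
print("LSTM:     ", n_lstm)
print("Dense:    ", n_dense)
print("Total:    ", n_embedding + n_lstm + n_dense)
```

Under these assumptions the model has roughly 51 thousand parameters but only 512 training sequences, which helps explain why it overfits so easily.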

In [27]:
#-------------------------------------------------------------------------------
# Cell 7: Compile and Train the Model (with Early Stopping)
# --- ADJUSTED LEARNING RATE AND PATIENCE ---
model.compile(
    optimizer=Adam(learning_rate=0.001),  # Adam with an explicit learning rate (0.001 is also the Keras default)
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Increased patience to 15
early_stopper = EarlyStopping(
    monitor='val_loss',
    patience=15,
    restore_best_weights=True,
    verbose=1
)

print("Starting enhanced model training...")
# We set epochs to a reasonable number. EarlyStopping will
# likely stop it before this number is reached.
model.fit(
    X_train,
    y_train,
    epochs=100,
    batch_size=8,
    validation_data=(X_val, y_val),  # Pass in the validation set
    callbacks=[early_stopper],       # Add the early stopping callback
    verbose=2
)
print("Model training complete.")

Starting enhanced model training...
Epoch 1/100
64/64 - 4s - 60ms/step - accuracy: 0.0391 - loss: 5.5839 - val_accuracy: 0.0388 - val_loss: 5.5029
Epoch 2/100
64/64 - 0s - 8ms/step - accuracy: 0.0312 - loss: 5.2360 - val_accuracy: 0.0388 - val_loss: 5.5122
Epoch 3/100
64/64 - 1s - 9ms/step - accuracy: 0.0430 - loss: 5.0427 - val_accuracy: 0.0620 - val_loss: 5.6144
Epoch 4/100
64/64 - 1s - 9ms/step - accuracy: 0.0449 - loss: 4.9716 - val_accuracy: 0.0388 - val_loss: 5.9650
Epoch 5/100
64/64 - 1s - 13ms/step - accuracy: 0.0371 - loss: 4.9337 - val_accuracy: 0.0543 - val_loss: 6.0319
Epoch 6/100
64/64 - 1s - 13ms/step - accuracy: 0.0371 - loss: 4.9060 - val_accuracy: 0.0543 - val_loss: 6.0669
Epoch 7/100
64/64 - 1s - 13ms/step - accuracy: 0.0488 - loss: 4.8611 - val_accuracy: 0.0388 - val_loss: 6.1615
Epoch 8/100
64/64 - 1s - 8ms/step - accuracy: 0.0430 - loss: 4.8208 - val_accuracy: 0.0388 - val_loss: 6.0934
Epoch 9/100
64/64 - 1s - 10ms/step - accuracy: 0.0352 - loss: 4.7955 - val_accur
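
Two things stand out in the log above: the validation loss is lowest at epoch 1 and rises while the training loss falls, and both start close to the loss of a uniform guess over 271 words. The sketch below replays the patience rule on the eight complete `val_loss` values copied from the log. `early_stopping_trace` is a hypothetical helper that mimics the idea behind `EarlyStopping` (no improvement for `patience` epochs in a row); it is not the Keras code.

```python
import math

def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, stop_epoch); stop_epoch is None if training would continue."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

# val_loss for epochs 1-8, copied from the training log above
logged_val_loss = [5.5029, 5.5122, 5.6144, 5.9650, 6.0319, 6.0669, 6.1615, 6.0934]

print(early_stopping_trace(logged_val_loss, patience=15))  # patience used in Cell 7
print(early_stopping_trace(logged_val_loss, patience=3))   # a stricter setting, for comparison

# Cross-entropy of a uniform guess over 271 classes
uniform_loss = math.log(271)
print(round(uniform_loss, 3))
```

If the validation loss never drops below its epoch-1 value, `restore_best_weights=True` would hand back a model that has barely moved from its random initialization.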

In [17]:
#-------------------------------------------------------------------------------
# Cell 8: Define the Text Generation Function
# --- REPLACED with a much better function that uses Temperature Sampling ---
# This prevents the model from getting stuck in loops ("the the the")
# and allows for more creative, varied text generation.

def sample(preds, temperature=1.0):
    """
    Helper function to sample an index from a probability array,
    controlled by temperature.
    A higher temperature (e.g., 1.2) = more random, creative text.
    A lower temperature (e.g., 0.5) = more conservative, predictable text.
    """
    # Clip predictions to avoid log(0)
    preds = np.clip(preds, 1e-7, 1 - 1e-7)
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_text(seed_text, n_words, temperature=0.7):
    """
    Generates 'n_words' of text based on a 'seed_text' and a 'temperature'.
    """
    generated_text = seed_text
    current_text = seed_text

    int_to_word = {v: k for k, v in tokenizer.word_index.items()}

    for _ in range(n_words):
        # Tokenize the *current* text seed
        encoded = tokenizer.texts_to_sequences([current_text])[0]
        # Pad the sequence from the beginning
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')

        y_pred_probs = model.predict(encoded, verbose=0)[0]

        # Use the 'sample' function instead of np.argmax
        y_pred_index = sample(y_pred_probs, temperature)

        out_word = int_to_word.get(y_pred_index, '?')

        # Append the new word to build the next sequence
        current_text += " " + out_word
        generated_text += " " + out_word

    return generated_text

print("Text generation function defined.")

Text generation function defined.
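
To see what the temperature in `sample` does before any random draw happens, the sketch below applies the same rescaling (take the log, divide by the temperature, exponentiate, renormalize) to a small made-up distribution. `apply_temperature` and `base_probs` are hypothetical names, and the code uses plain Python instead of NumPy so it can run on its own.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale probabilities as p_i ** (1 / T), then renormalize to sum to 1."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

base_probs = [0.5, 0.3, 0.2]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(base_probs, t)])
```

A temperature below 1 pushes probability toward the most likely word, a temperature of 1 leaves the distribution unchanged, and a temperature above 1 flattens it toward uniform.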


In [18]:
#-------------------------------------------------------------------------------
# Cell 9: Generate New Text
# Now we test our new, more robust model.

seed_text = "alice was beginning to get"
generated = generate_text(seed_text, 30, temperature=0.8) # Generate 30 new words

print("\n--- SEED TEXT ---")
print(seed_text)
print("\n--- GENERATED TEXT (from Enhanced LSTM) ---")
print(generated)


--- SEED TEXT ---
alice was beginning to get

--- GENERATED TEXT (from Enhanced LSTM) ---
alice was beginning to get straight true fall she one maps what took this one either one down quite opportunity filled is several hurried getting put straight even out nor maps by for fall in
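
The seed in Cell 9 has five words, while the model expects `seq_length = 10` ids, so `generate_text` relies on `pad_sequences` to fill the front with zeros and, once the text grows, to keep only the last ten ids. The sketch below mimics that "pre" padding and truncation in plain Python; `pad_or_truncate_pre` is a hypothetical helper and the token ids are made up.

```python
def pad_or_truncate_pre(ids, maxlen, pad_value=0):
    """Keep the last `maxlen` ids; if shorter, fill the front with `pad_value`."""
    ids = list(ids)[-maxlen:]
    return [pad_value] * (maxlen - len(ids)) + ids

five_word_seed = [7, 8, 9, 10, 11]       # hypothetical ids for a 5-word seed
print(pad_or_truncate_pre(five_word_seed, maxlen=10))

long_context = list(range(1, 15))        # 14 hypothetical ids
print(pad_or_truncate_pre(long_context, maxlen=10))
```

Every training window in Cell 4 contains ten real words, so the model never saw zero-padded input during training; this may add to the noise in the first few generated words.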


In [None]:
#-------------------------------------------------------------------------------
# Cell 10: Inspect the LSTM Layer Weights
# This cell extracts the trained weights from the LSTM layer.
# It should be run AFTER Cell 7 has completed training.

print("\n--- Inspecting LSTM Layer Weights ---")

# The LSTM layer is the 3rd layer in our model (index 2)
# [0] = Embedding
# [1] = Dropout
# [2] = LSTM
# [3] = Dropout
# [4] = Dense
try:
    lstm_layer = model.layers[2]

    # get_weights() returns a list of 3 NumPy arrays:
    # [0] = Input Kernel (W) - for the input word embeddings
    # [1] = Recurrent Kernel (U) - for the previous hidden state
    # [2] = Biases (b)
    lstm_weights = lstm_layer.get_weights()

    W_input = lstm_weights[0]
    W_recurrent = lstm_weights[1]
    b_all = lstm_weights[2]

    print(f"Total weight arrays found: {len(lstm_weights)}")
    print(f"Shape of Input Kernel (W_input): {W_input.shape}")
    print(f"Shape of Recurrent Kernel (W_recurrent): {W_recurrent.shape}")
    print(f"Shape of Biases (b_all): {b_all.shape}")

    # --- Now, let's break them down ---

    # The Keras LSTM concatenates the weights for its 4 internal gates
    # (Input, Forget, Candidate, Output) into one giant matrix.
    # The total number of columns is 4 * lstm_units (4 * 64 = 256)

    print("\n--- Explanation ---")
    print(f"The Input Kernel shape is (embedding_dim, 4 * lstm_units) -> ({embedding_dim}, {4 * lstm_units})")
    print(f"The Recurrent Kernel shape is (lstm_units, 4 * lstm_units) -> ({lstm_units}, {4 * lstm_units})")
    print(f"The Bias shape is (4 * lstm_units,) -> ({4 * lstm_units},)")

    # We can "slice" these matrices to see the weights for a single gate.
    # The standard Keras order is: Input Gate, Forget Gate, Candidate Gate, Output Gate.

    # Weights for the Forget Gate (the second chunk of size lstm_units)
    start = lstm_units
    end = 2 * lstm_units

    W_input_forget_gate = W_input[:, start:end]
    W_recurrent_forget_gate = W_recurrent[:, start:end]
    b_forget_gate = b_all[start:end]

    print("\n--- Example: Sliced Weights for the 'Forget Gate' ---")
    print(f"Input Kernel (Forget Gate) shape: {W_input_forget_gate.shape}")
    print(f"Recurrent Kernel (Forget Gate) shape: {W_recurrent_forget_gate.shape}")
    print(f"Bias (Forget Gate) shape: {b_forget_gate.shape}")

except Exception as e:
    print("\nAn error occurred. Did you run Cell 7 to train the model first?")
    print(f"Error details: {e}")
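
Cell 10 slices out only the forget gate. As a small sketch, the column range of every gate block can be computed the same way, assuming the gate order stated in the comments above (input, forget, candidate, output). `gate_slices` is a hypothetical helper; it only computes indices and does not touch the model.

```python
def gate_slices(units):
    """Column range (start, end) of each gate block in a (rows, 4 * units) kernel."""
    names = ["input", "forget", "candidate", "output"]  # order taken from Cell 10
    return {name: (k * units, (k + 1) * units) for k, name in enumerate(names)}

slices = gate_slices(64)
for name, (start, end) in slices.items():
    print(f"{name:9s} columns {start}:{end}")
```

With a trained model in memory, `W_input[:, start:end]` would give the input kernel for the chosen gate, exactly as Cell 10 does for the forget gate.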