<a href="https://colab.research.google.com/github/twyeh/AI-in-education/blob/main/RNN_LSTM_%E4%BA%92%E5%8B%95%E5%BC%8F%E4%BA%A4%E8%AB%87_20250331.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Design an RNN with an LSTM layer to learn the following sentence: "I am a student learning how to employ the AI."

Here's a step-by-step design for an LSTM-based RNN to learn the given sentence:

## 🔍 Problem Analysis

- Sequence length: 10 words
- Unique words: 10 ("I", "am", "a", "student", "learning", "how", "to", "employ", "the", "AI")
- Task: Sequence prediction (next word prediction)
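
These counts can be checked directly. Below is a minimal sketch in plain Python, assuming the trailing period is punctuation and is stripped before splitting; the variable names are illustrative.

```python
# Verify the sequence length and vocabulary size of the training sentence.
sentence = "I am a student learning how to employ the AI."

# Assumption: the trailing period is punctuation, not part of the last word.
words = sentence.rstrip(".").split()

# dict.fromkeys drops duplicates while keeping first-occurrence order.
vocabulary = list(dict.fromkeys(words))

print(len(words))       # number of words in the sentence
print(len(vocabulary))  # number of unique words
```

Both counts come out to 10 because no word repeats in this sentence.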


## 🛠️ Network Architecture

In [1]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    # Embedding layer (10 unique words → 8D representation)
    Embedding(input_dim=10, output_dim=8),

    # LSTM layer with 64 units
    LSTM(64, activation='tanh'),

    # Output layer with softmax activation
    Dense(10, activation='softmax')
])

### Key Parameters:

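- **Embedding Layer**: Maps each of the 10 word indices to a dense 8-dimensional vector, replacing a sparse 10-dimensional one-hot encoding
- **LSTM Units**: 64 (more than enough capacity for a single 10-word sentence)
- **Sequence Length**: 3 (each input is a window of three consecutive words; the window size is a design choice rather than a tuned value)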
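
To put these sizes in perspective, the number of trainable weights can be worked out by hand. The sketch below uses the standard parameter-count formulas for an embedding, an LSTM with one bias vector per gate, and a dense layer. The helper names are made up for illustration; once the model has been built, `model.summary()` is the authoritative check.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per word in the vocabulary.
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # Four gates (input, forget, cell, output), each with input weights,
    # recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, outputs):
    # Weight matrix plus one bias per output.
    return input_dim * outputs + outputs

counts = {
    "embedding": embedding_params(10, 8),
    "lstm": lstm_params(8, 64),
    "dense": dense_params(64, 10),
}
total = sum(counts.values())
print(counts, total)
```

Under these formulas the LSTM accounts for 18,688 of the 19,418 weights, far more than are needed to memorize one short sentence.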


## 📊 Data Preparation

1. **Tokenization**:

In [2]:
word_to_index = {
    "I": 0, "am": 1, "a": 2, "student": 3,
    "learning": 4, "how": 5, "to": 6,
    "employ": 7, "the": 8, "AI": 9
}

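2. **Sequence Creation** (3-word window): Slide a window of three consecutive words over the sentence. Each window is one input, and the word that follows it is the target, which gives 7 training pairs.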
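
The windowing step can be sketched in plain Python before any indices or NumPy arrays are involved. This is only an illustration of the idea; the `pairs` list is an assumed name, not something the notebook defines.

```python
sentence = "I am a student learning how to employ the AI"
words = sentence.split()
sequence_length = 3

# Each pair is (three consecutive words, the word that follows them).
pairs = [
    (words[i:i + sequence_length], words[i + sequence_length])
    for i in range(len(words) - sequence_length)
]

for window, target in pairs:
    print(" ".join(window), "->", target)
```

The first pair is `I am a -> student` and the last is `to employ the -> AI`.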

## 🎯 Training Configuration

In [3]:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

In [4]:
import numpy as np

# Define the sentence
sentence = "I am a student learning how to employ the AI"

# Tokenization (already done)
word_to_index = {
    "I": 0, "am": 1, "a": 2, "student": 3,
    "learning": 4, "how": 5, "to": 6,
    "employ": 7, "the": 8, "AI": 9
}
index_to_word = {index: word for word, index in word_to_index.items()}

# Create input and target sequences
sequence_length = 3  # Adjust as needed
X = []
y = []

words = sentence.split()
for i in range(len(words) - sequence_length):
    # Input sequence (e.g., ["I", "am", "a"])
    input_sequence = words[i:i + sequence_length]

    # Target word (e.g., "student")
    target_word = words[i + sequence_length]

    # Convert to indices
    X.append([word_to_index[word] for word in input_sequence])
    y.append(word_to_index[target_word])

# Convert to NumPy arrays
X = np.array(X)
y = np.array(y)

# Train on the (3-word window, next word) pairs built above
model.fit(X, y, epochs=100, batch_size=1, verbose=1)

Epoch 1/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.0000e+00 - loss: 2.3040
Epoch 2/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.1485 - loss: 2.2981
Epoch 3/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5589 - loss: 2.2872
Epoch 4/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.1943 - loss: 2.2830
Epoch 5/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.2509 - loss: 2.2757
Epoch 6/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8262 - loss: 2.2626
Epoch 7/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.4679 - loss: 2.2572
Epoch 8/100
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3283 - loss: 2.2475
Epoch 9/100
...

<keras.src.callbacks.history.History at 0x7adcd332b790>

### Training Parameters:

- Epochs: 100-150 (typically enough for the model to memorize all 7 training windows)
- Batch Size: 1 (due to small dataset)
- Learning Rate: 0.001 (default Adam)
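
A quick sanity check on the training log: with sparse categorical cross-entropy, a model that assigns equal probability to each of the 10 words has a loss of ln(10), which is close to the 2.3040 reported for epoch 1. The sketch below computes that baseline in plain Python. The function name is illustrative and is not part of Keras.

```python
import math

def cross_entropy_single(probabilities, target_index):
    """Cross-entropy for one example: minus the log-probability of the true class."""
    return -math.log(probabilities[target_index])

vocab_size = 10

# An untrained model is assumed to predict roughly uniformly over the vocabulary.
uniform = [1.0 / vocab_size] * vocab_size

baseline_loss = cross_entropy_single(uniform, target_index=3)
print(round(baseline_loss, 4))  # 2.3026
```

A loss that falls below this value means the model is doing better than guessing uniformly.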


## 💡 Key Implementation Notes

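1. **Input Handling**: Use a sliding window of length 3 over the sentence
2. **Padding**: Not required for training, because every window contains exactly 3 words
3. **Overfitting**: Acceptable, even desirable, here because the goal is to memorize a single sentence
4. **Temperature**: For more varied text generation, sample from the softmax output with a temperature instead of always taking the most likely word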
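
Temperature sampling can be sketched without the model. The helpers below are hypothetical names, written in plain Python, and the three-word probability vector is invented for illustration; in the notebook the input would be the row returned by `model.predict`.

```python
import math
import random

def apply_temperature(probabilities, temperature=1.0):
    """Rescale a distribution: T < 1 sharpens it, T > 1 flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space and subtract the maximum for numerical stability.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probabilities]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(probabilities, temperature=1.0, rng=random):
    """Draw one class index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probabilities, temperature)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]

# Hypothetical softmax output over three words, for illustration only.
probs = [0.7, 0.2, 0.1]
sharp = apply_temperature(probs, temperature=0.5)  # closer to argmax
flat = apply_temperature(probs, temperature=2.0)   # closer to uniform
index = sample_index(probs, temperature=1.0, rng=random.Random(0))
```

With a temperature of 1.0 the distribution is unchanged, so sampling follows the model's own probabilities.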

## 🚀 Prediction Example

In [5]:
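def predict_next_word(seed_sequence):
    """Return the most likely word to follow the last `sequence_length` words of the seed."""
    # Split the seed sequence into individual words
    words = seed_sequence.split()

    # Fail early with a clear message if a word is outside the 10-word vocabulary
    unknown = [word for word in words if word not in word_to_index]
    if unknown:
        raise ValueError(f"Words not in vocabulary: {unknown}")

    # Pad the sequence if it's shorter than the sequence length
    while len(words) < sequence_length:
        words = ["I"] + words  # Prepend with "I" as a simple padding strategy

    # Take the last 'sequence_length' words for prediction
    words = words[-sequence_length:]

    # Convert words to indices
    sequence = [word_to_index[word] for word in words]

    # Reshape to (batch size 1, sequence_length) to match the model's input shape
    sequence = np.array(sequence).reshape(1, sequence_length)

    # The model returns one row of 10 probabilities, one per vocabulary word
    prediction = model.predict(sequence)

    # Pick the index with the highest probability and map it back to a word
    return index_to_word[int(np.argmax(prediction))]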

In [10]:
seed_sequence = "I am a student learning how"
predict_next_word(seed_sequence)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step


'to'

## 📈 Expected Performance

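- Should reach 100% accuracy on the 7 training windows once training has converged
- Can then regenerate the original sentence word by word from its first three words
- Generalizes poorly beyond this sentence, because a single sentence is the only training data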
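
Regenerating the sentence amounts to an autoregressive loop: predict a word, append it, and predict again from the new last three words. A minimal sketch follows. So that it runs without the trained network, a lookup table built from the sentence stands in for a model that has memorized every window. `memorized_next_word` and `generate` are hypothetical names; in the notebook, `predict_next_word` would be passed as `predict` instead.

```python
sentence = "I am a student learning how to employ the AI"
words = sentence.split()
sequence_length = 3

# Stand-in for a perfectly trained model: maps each 3-word window to the next word.
lookup = {
    " ".join(words[i:i + sequence_length]): words[i + sequence_length]
    for i in range(len(words) - sequence_length)
}

def memorized_next_word(seed_sequence):
    """Hypothetical replacement for predict_next_word that needs no network."""
    window = seed_sequence.split()[-sequence_length:]
    return lookup[" ".join(window)]

def generate(seed_sequence, num_words, predict=memorized_next_word):
    """Extend the seed one word at a time, feeding each prediction back in."""
    generated = seed_sequence.split()
    for _ in range(num_words):
        generated.append(predict(" ".join(generated)))
    return " ".join(generated)

regenerated = generate("I am a", num_words=7)
print(regenerated)
```

Seven steps are enough here because the three seed words plus seven predictions cover all ten words of the sentence.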


## 📚 Recommended Enhancements

1. Add dropout (0.2-0.5) for better generalization
2. Implement beam search for prediction
3. Use bidirectional LSTM for context analysis
4. Add attention mechanism for long-term dependencies
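
Of these, beam search is the easiest to illustrate without changing the model. The sketch below is a generic version in plain Python, not the notebook's code. `beam_search` and `toy_next_probs` are hypothetical names, and the probabilities in `TOY_TABLE` are invented; a real version would take them from `model.predict`.

```python
import math

def beam_search(start, next_probs, steps, beam_width=2):
    """Keep the `beam_width` highest-scoring sequences at every step.

    next_probs(sequence) must return a dict mapping each candidate word
    to its probability.
    """
    beams = [(0.0, list(start))]  # (cumulative log-probability, sequence)
    for _ in range(steps):
        candidates = []
        for score, sequence in beams:
            for word, prob in next_probs(sequence).items():
                if prob > 0:
                    candidates.append((score + math.log(prob), sequence + [word]))
        if not candidates:
            break
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Made-up probabilities for illustration; a real version would call the model.
TOY_TABLE = {
    "a": {"student": 0.6, "AI": 0.4},
    "student": {"learning": 0.5, "how": 0.5},
    "AI": {"learning": 0.9, "how": 0.1},
}

def toy_next_probs(sequence):
    return TOY_TABLE.get(sequence[-1], {})

best_score, best_sequence = beam_search(["a"], toy_next_probs, steps=2)[0]
print(best_sequence, round(math.exp(best_score), 2))
```

In this toy table, greedy decoding would choose "student" first and finish with a total probability of 0.30, whereas a beam of width 2 keeps "AI" alive and finds the 0.36 path.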

This architecture provides a fundamental framework that can be scaled for larger datasets while maintaining the core LSTM structure for sequence learning.