# Part-I: Theoretical Understanding

## Task 1: Conceptual Questions

1. **What is the difference between RNN and LSTM?**  
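A vanilla RNN processes a sequence step by step with a single hidden state and struggles with long-term dependencies because of vanishing gradients. An LSTM is an RNN variant that adds a memory cell and three gates (input, forget, output) to control what is written, kept, and exposed, so it can retain information over longer time spans.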

2. **What is the vanishing gradient problem, and how does LSTM solve it?**  
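During backpropagation through time, the gradient is multiplied by a similar factor at every step, so over long sequences it can shrink toward zero and the earliest inputs stop influencing the weight updates. LSTMs counter this with a cell state that is updated additively and scaled by a forget gate, which lets the error signal flow back across many steps with far less decay.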
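
A rough numerical illustration of the shrinking gradient, as a sketch only: it assumes the backpropagated gradient is scaled by one constant factor per time step. The values `0.5` (a stand-in for the recurrent weight times the tanh derivative in a plain RNN) and `0.99` (a stand-in for an LSTM forget gate that stays near 1) are hypothetical, chosen only to show the contrast.

```python
def gradient_through_time(per_step_factor, steps):
    """Gradient magnitude left after being scaled by the same factor at every step."""
    return per_step_factor ** steps

steps = 50                 # sequence length (assumed)
rnn_factor = 0.5           # assumed per-step scaling in a plain RNN
lstm_forget_gate = 0.99    # assumed forget-gate value on the LSTM cell-state path

rnn_grad = gradient_through_time(rnn_factor, steps)
lstm_grad = gradient_through_time(lstm_forget_gate, steps)

print(f"plain RNN path after {steps} steps: {rnn_grad:.3e}")
print(f"LSTM cell-state path after {steps} steps: {lstm_grad:.3f}")
```

Real networks have per-step factors that vary with the data and weights, so this only conveys the order-of-magnitude difference between the two paths.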

3. **Explain the purpose of the Encoder-Decoder architecture.**  
It maps variable-length input sequences to variable-length outputs by compressing the input into a context vector (encoder) and generating outputs step by step (decoder).

4. **In a sequence-to-sequence model, what are the roles of the encoder and decoder?**  
The encoder processes the input and produces a fixed-length representation (context vector), while the decoder generates the target sequence using this vector.

5. **How is attention different from a basic encoder-decoder model?**  
Attention allows the decoder to focus on specific encoder states at each step rather than relying on a single fixed vector, improving performance for longer inputs.
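
The attention idea in question 5 can be sketched in a few lines of NumPy. This is a minimal illustration that assumes dot-product scoring, which is one common choice among several; the function name `dot_product_attention` and the array sizes are hypothetical and are not part of the notebook's model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_states):
    """Weight every encoder state by its similarity to the current decoder state.

    decoder_state:  shape (d,)
    encoder_states: shape (n, d), one row per input position
    """
    scores = encoder_states @ decoder_state   # (n,) similarity per input position
    weights = softmax(scores)                 # (n,) non-negative, sums to 1
    context = weights @ encoder_states        # (d,) weighted mix of encoder states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))      # 5 input positions, hidden size 4 (assumed)
decoder_state = rng.normal(size=4)

context, weights = dot_product_attention(decoder_state, encoder_states)
print("attention weights:", np.round(weights, 3))
```

A fresh context vector is computed at every decoder step, in contrast to the basic model, which reuses one fixed vector for the whole output.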

## Task 2: Sequence-to-Sequence Data Flow

Input sequence → Encoder (LSTM) → Hidden states → Context Vector → Decoder (LSTM) → Output sequence

Labels:
- Input: x1, x2, …, xn
- Encoder hidden states: h1, h2, …, hn
- Context vector: hn (the final encoder hidden state; with an LSTM, the final cell state cn is passed along as well)
- Decoder output: y1, y2, …, ym
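
The flow above can be traced with a toy NumPy sketch. To stay short it assumes a plain tanh recurrence instead of an LSTM, random untrained weights, and a decoder that does not feed its previous output back in; all sizes and weight names (`W_x`, `W_h`, `W_d`, `W_y`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, vocab = 3, 4, 5    # assumed input size, hidden size, output vocabulary size
n, m = 6, 4                     # input length n and output length m

W_x = rng.normal(scale=0.5, size=(d_hid, d_in))   # encoder input weights
W_h = rng.normal(scale=0.5, size=(d_hid, d_hid))  # encoder recurrent weights
W_d = rng.normal(scale=0.5, size=(d_hid, d_hid))  # decoder recurrent weights
W_y = rng.normal(scale=0.5, size=(vocab, d_hid))  # decoder output projection

# Encoder: read x1..xn and collect hidden states h1..hn.
xs = rng.normal(size=(n, d_in))
h = np.zeros(d_hid)
encoder_states = []
for x in xs:
    h = np.tanh(W_x @ x + W_h @ h)
    encoder_states.append(h)

# Context vector: the last encoder hidden state hn.
context = encoder_states[-1]

# Decoder: start from the context vector and emit y1..ym as token indices.
s = context
ys = []
for _ in range(m):
    s = np.tanh(W_d @ s)
    ys.append(int(np.argmax(W_y @ s)))

print("encoder states:", len(encoder_states), "decoder outputs:", ys)
```

The input and output lengths are independent (`n` and `m` here), which is the property the encoder-decoder design is meant to provide.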

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import matplotlib.pyplot as plt

input_texts = ['hello', 'how are you', 'good morning', 'thank you', 'i love you']
target_texts = ['bonjour', 'comment ça va', 'bonjour', 'merci', 'je t\'aime']
target_texts = ['<start> ' + t + ' <end>' for t in target_texts]

input_tokenizer = Tokenizer()
input_tokenizer.fit_on_texts(input_texts)
input_sequences = input_tokenizer.texts_to_sequences(input_texts)
input_sequences = pad_sequences(input_sequences, padding='post')

output_tokenizer = Tokenizer()
output_tokenizer.fit_on_texts(target_texts)
output_sequences = output_tokenizer.texts_to_sequences(target_texts)
output_sequences = pad_sequences(output_sequences, padding='post')

input_vocab_size = len(input_tokenizer.word_index) + 1
output_vocab_size = len(output_tokenizer.word_index) + 1
max_encoder_len = input_sequences.shape[1]
max_decoder_len = output_sequences.shape[1]

encoder_input_data = np.array(input_sequences)
decoder_input_data = np.array(output_sequences[:, :-1])
decoder_target_data = np.array(output_sequences[:, 1:])

latent_dim = 256
encoder_inputs = Input(shape=(max_encoder_len,))
enc_emb = Embedding(input_vocab_size, latent_dim)(encoder_inputs)
# Only the final hidden and cell states are needed; the per-step output is unused.
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]

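# Variable-length input so the same layers accept full sequences during training
# and a single token per step during inference.
decoder_inputs = Input(shape=(None,))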
dec_emb_layer = Embedding(output_vocab_size, latent_dim)
dec_emb = dec_emb_layer(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)
decoder_dense = Dense(output_vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
history = model.fit([encoder_input_data, decoder_input_data],
                    decoder_target_data[..., np.newaxis],
                    batch_size=2,
                    epochs=10)

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
dec_emb2 = dec_emb_layer(decoder_inputs)
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=[decoder_state_input_h, decoder_state_input_c])
decoder_outputs2 = decoder_dense(decoder_outputs2)
decoder_model = Model([decoder_inputs, decoder_state_input_h, decoder_state_input_c],
                      [decoder_outputs2, state_h2, state_c2])

reverse_output_index = {i: w for w, i in output_tokenizer.word_index.items()}

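def decode_sequence(input_seq):
    # Encode the input once; the final [h, c] states seed the decoder.
    states_value = list(encoder_model.predict(input_seq, verbose=0))
    # The default Tokenizer filters strip '<' and '>', so '<start>' is indexed as 'start'.
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = output_tokenizer.word_index['start']
    decoded_words = []
    for _ in range(max_decoder_len):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value, verbose=0)
        # Greedy decoding: take the most probable token at the last time step.
        sampled_token_index = int(np.argmax(output_tokens[0, -1, :]))
        sampled_word = reverse_output_index.get(sampled_token_index, '')
        # Stop on the end marker or on the padding index (0), which maps to no word.
        if sampled_word in ('end', ''):
            break
        decoded_words.append(sampled_word)
        # Feed the prediction and the updated states back in for the next step.
        target_seq[0, 0] = sampled_token_index
        states_value = [h, c]
    return ' '.join(decoded_words)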

for i in range(len(input_texts)):
    input_seq = encoder_input_data[i:i+1]
    print("Input:", input_texts[i])
    print("Predicted:", decode_sequence(input_seq))

plt.plot(history.history['loss'])
plt.title("Training Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.show()

# Part-III: Visualizing and Enhancing Encoder-Decoder

## Task 8: Model Performance Discussion

1. **What are the challenges in training sequence-to-sequence models?**  
They struggle with long-term dependencies, need large parallel datasets to generalize, and overfit easily when the training set is small, as with the five sentence pairs used here.

2. **What does a “bad” translation look like? Why might it happen?**  
It may omit words, repeat phrases, or produce unrelated words due to insufficient training or limited vocabulary.

3. **How can the model be improved further?**  
Add an attention mechanism, train on a larger dataset, use deeper (stacked) encoder and decoder layers, and apply regularization such as dropout.