# Summer of Code - Artificial Intelligence

## Week 10: Deep Learning

### Day 05: Machine Translation

In this notebook, we build an English-to-Urdu **Machine Translation** model in PyTorch, using an encoder-decoder with a **bidirectional GRU** encoder.


# Neural Machine Translation

Neural Machine Translation (NMT) is another NLP task where we translate text from one language to another using neural networks.


## Encoder-Decoder Architecture

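The most common architecture for NMT is the Encoder-Decoder architecture. The encoder reads the input sentence and compresses it into a fixed-length context vector; the decoder then generates the translated sentence one token at a time, conditioned on that vector. In this notebook, the context vector is the encoder GRU's final hidden state, which is used to initialize the decoder's hidden state.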

<img src="images/enc_dec.png" alt="Encoder-Decoder Architecture" width="600"/>
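
To make the idea of a fixed-length context vector concrete, here is a minimal sketch of a toy recurrent encoder in plain Python. It is not the GRU used later in this notebook: the weights, the `toy_encode` helper, and the 4-dimensional state are illustrative assumptions. The only point is that inputs of different lengths are folded into a vector of the same size.

```python
import math
import random

random.seed(0)
HIDDEN_SIZE = 4  # illustrative; the notebook's model uses a much larger state

# Hypothetical toy parameters: one input weight per hidden unit and a small
# hidden-to-hidden matrix.
W_in = [random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
W_hh = [
    [random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
    for _ in range(HIDDEN_SIZE)
]


def toy_encode(token_ids):
    """Fold a variable-length list of token ids into one fixed-size vector."""
    hidden = [0.0] * HIDDEN_SIZE
    for token_id in token_ids:
        x = token_id / 10.0  # crude stand-in for an embedding lookup
        hidden = [
            math.tanh(
                W_in[i] * x
                + sum(W_hh[i][j] * hidden[j] for j in range(HIDDEN_SIZE))
            )
            for i in range(HIDDEN_SIZE)
        ]
    return hidden


short_context = toy_encode([3, 7])
long_context = toy_encode([3, 7, 1, 9, 4, 2, 8])
print(len(short_context), len(long_context))
```

Both calls return a vector of length `HIDDEN_SIZE`, regardless of how many tokens were read. This is also the main limitation of the plain encoder-decoder: a long sentence has to fit into the same fixed-size state as a short one.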


### English to Urdu Translation Dataset


In [None]:
from zipfile import ZipFile
from urllib.request import urlopen, Request


url = "https://www.manythings.org/anki/urd-eng.zip"
req = Request(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    },
)
with urlopen(req) as response:
    with open("urd-eng.zip", "wb") as f:
        f.write(response.read())
with ZipFile("urd-eng.zip", "r") as zip_ref:
    zip_ref.extractall("./eng-urd")
print("Dataset downloaded and extracted.")

In [3]:
with open("./eng-urd/urd.txt", "r", encoding="utf-8") as f:
    data = f.readlines()

data[:5]

['Hi.\tسلام۔\tCC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #9020897 (nusrat)\n',
 'Help!\tمدد۔\tCC-BY 2.0 (France) Attribution: tatoeba.org #435084 (lukaszpp) & #1462368 (nabeel_tahir)\n',
 'Thanks.\tشکریہ۔\tCC-BY 2.0 (France) Attribution: tatoeba.org #2057650 (nava) & #9020893 (nusrat)\n',
 'We won.\tہم جیت گئے۔\tCC-BY 2.0 (France) Attribution: tatoeba.org #2107675 (CK) & #2123755 (nabeel_tahir)\n',
 'Beat it.\tبھاگ جائو۔\tCC-BY 2.0 (France) Attribution: tatoeba.org #37902 (CM) & #1610833 (nabeel_tahir)\n']

In [4]:
source_target = [line.strip().split("\t")[:2] for line in data]
source_target[:3]

[['Hi.', 'سلام۔'], ['Help!', 'مدد۔'], ['Thanks.', 'شکریہ۔']]

In [28]:
import numpy as np


np.random.shuffle(source_target)
source_target[:3]

[['Hey, look at this.', 'یہ دیکھو۔'],
 ['I keep a dog.', 'میں کتا رکھتا ہوں۔'],
 ["I just don't know what to say.", 'مجھے ابھی نہیں پتا کیا کہنا ہے ۔']]

In [31]:
source_sentences, target_sentences = zip(*source_target)
for i in range(3):
    print(f"{source_sentences[i]} => {target_sentences[i]}")

Hey, look at this. => یہ دیکھو۔
I keep a dog. => میں کتا رکھتا ہوں۔
I just don't know what to say. => مجھے ابھی نہیں پتا کیا کہنا ہے ۔


### Build Vocabulary


In [33]:
def tokenize(sentences):
    return [s.lower().split() for s in sentences]

In [37]:
source_tokens = tokenize(source_sentences)
target_tokens = tokenize(target_sentences)

source_tokens[:3], target_tokens[:3]

([['hey,', 'look', 'at', 'this.'],
  ['i', 'keep', 'a', 'dog.'],
  ['i', 'just', "don't", 'know', 'what', 'to', 'say.']],
 [['یہ', 'دیکھو۔'],
  ['میں', 'کتا', 'رکھتا', 'ہوں۔'],
  ['مجھے', 'ابھی', 'نہیں', 'پتا', 'کیا', 'کہنا', 'ہے', '۔']])
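
Note that whitespace splitting keeps punctuation attached to words (`'hey,'`, `'this.'`), so `dog.` and `dog` would become different vocabulary entries. One possible alternative, which this notebook does not use, is a regex tokenizer that splits punctuation into separate tokens. The `tokenize_with_punct` helper and the chosen punctuation set below are assumptions for illustration.

```python
import re

# ASCII marks plus the Urdu full stop (U+06D4), the Arabic question mark
# (U+061F) and the Arabic comma (U+060C).
PUNCT_CHARS = ".,!?;:\u06d4\u061f\u060c"
_punct_class = re.escape(PUNCT_CHARS)

# Match either one punctuation mark or a run of non-space, non-punctuation chars.
TOKEN_PATTERN = re.compile(rf"[{_punct_class}]|[^\s{_punct_class}]+")


def tokenize_with_punct(sentences):
    """Lowercase each sentence and split punctuation into separate tokens."""
    return [TOKEN_PATTERN.findall(s.lower()) for s in sentences]


print(tokenize_with_punct(["Hey, look at this.", "I just don't know."]))
```

Apostrophes are deliberately left out of the punctuation set so that contractions such as `don't` stay in one piece. Whether this helps on a dataset this small would need to be checked empirically.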

In [38]:
from collections import Counter


def build_vocabulary(tokenized_sentences, source=True, max_vocab_size=1000):
    word_counts = Counter()
    for s in tokenized_sentences:
        word_counts.update(s)

    # Create vocabulary with special tokens
    if source:
        vocab = {"<PAD>": 0, "<UNK>": 1}
        most_common = word_counts.most_common(max_vocab_size - 2)
    else:
        vocab = {"<PAD>": 0, "<UNK>": 1, "<SOS>": 2, "<EOS>": 3}
        most_common = word_counts.most_common(max_vocab_size - 4)
    # Add most common words
    for word, _ in most_common:
        vocab[word] = len(vocab)

    return vocab, word_counts


source_vocab, source_word_counts = build_vocabulary(source_tokens, max_vocab_size=1500)
target_vocab, target_word_counts = build_vocabulary(
    target_tokens, source=False, max_vocab_size=1500
)
print("English Vocabulary Size:", len(source_vocab))
print("Urdu Vocabulary Size:", len(target_vocab))
print("Most common English words:", source_word_counts.most_common(5))
print("Most common Urdu words:", target_word_counts.most_common(5))

English Vocabulary Size: 1500
Urdu Vocabulary Size: 1500
Most common English words: [('i', 284), ('the', 265), ('to', 223), ('you', 195), ('a', 167)]
Most common Urdu words: [('میں', 376), ('ہے۔', 324), ('نے', 182), ('اس', 155), ('وہ', 149)]
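
Both vocabularies hit the 1500-entry cap, so some less frequent words are left out and will be mapped to `<UNK>`. A quick way to see how much text is affected is to measure the fraction of tokens that fall outside the vocabulary. The sketch below uses toy data and a hypothetical `unk_rate` helper; in the notebook it could be called with `source_tokens` and `source_vocab`.

```python
from collections import Counter


def unk_rate(tokenized_sentences, vocab):
    """Fraction of tokens that would be mapped to <UNK>."""
    total = sum(len(s) for s in tokenized_sentences)
    unknown = sum(
        1 for s in tokenized_sentences for tok in s if tok not in vocab
    )
    return unknown / total if total else 0.0


# Toy stand-ins for the notebook's tokenized sentences and vocabulary.
toy_tokens = [["i", "keep", "a", "dog."], ["i", "like", "a", "cat."]]
counts = Counter(tok for s in toy_tokens for tok in s)

toy_vocab = {"<PAD>": 0, "<UNK>": 1}
for word, _ in counts.most_common(2):  # keep only the 2 most frequent words
    toy_vocab[word] = len(toy_vocab)

print(sorted(toy_vocab))
print(unk_rate(toy_tokens, toy_vocab))
```

Here only `i` and `a` make it into the vocabulary, so half of the tokens are unknown.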


In [39]:
print(target_vocab)

{'<PAD>': 0, '<UNK>': 1, '<SOS>': 2, '<EOS>': 3, 'میں': 4, 'ہے۔': 5, 'نے': 6, 'اس': 7, 'وہ': 8, 'کے': 9, 'کو': 10, 'نہیں': 11, 'کی': 12, 'سے': 13, 'ٹام': 14, 'مجھے': 15, 'تم': 16, 'کیا': 17, 'کہ': 18, 'ہوں۔': 19, 'کا': 20, 'ہو': 21, 'ہے': 22, 'یہ': 23, 'آپ': 24, 'ایک': 25, 'کر': 26, 'ہے؟': 27, 'تھا۔': 28, 'میرے': 29, 'اپنی': 30, 'ہیں۔': 31, 'بہت': 32, 'اور': 33, 'رہا': 34, 'گا۔': 35, 'کافی': 36, 'پہ': 37, 'میری': 38, 'کچھ': 39, 'اسے': 40, 'بھی': 41, 'ہم': 42, 'گھر': 43, 'ہو؟': 44, 'گیا': 45, 'مریم': 46, 'تمہیں': 47, 'نہ': 48, 'ہوا': 49, 'تھی۔': 50, 'زیادہ': 51, 'ہو۔': 52, 'کرنا': 53, 'رہی': 54, 'رہے': 55, 'ہی': 56, 'اپنے': 57, 'کرنے': 58, 'سکتے': 59, 'میرا': 60, 'سال': 61, 'وقت': 62, 'گیا۔': 63, 'گی۔': 64, 'پسند': 65, 'پاس': 66, 'تک': 67, 'تھا': 68, 'کوئی': 69, 'جا': 70, 'لئیے': 71, 'ابھی': 72, 'آج': 73, 'کبھی': 74, 'تو': 75, 'پر': 76, 'یہاں': 77, 'ساتھ': 78, '۔': 79, 'کسی': 80, 'جلدی': 81, 'تمھیں': 82, 'آ': 83, 'سب': 84, 'گاڑی': 85, 'گئے': 86, 'کرتا': 87, 'جائو': 88, 'کام': 89, 'جب': 

In [40]:
train_size = int(0.9 * len(source_tokens))

X_train_tokens = list(source_tokens[:train_size])
X_val_tokens = list(source_tokens[train_size:])

X_train_dec_tokens = [
    ["<SOS>"] + sentence.copy()
    for sentence in target_tokens[:train_size]
]
X_val_dec_tokens = [
    ["<SOS>"] + sentence.copy()
    for sentence in target_tokens[train_size:]
]

y_train_tokens = [
    sentence.copy() + ["<EOS>"]
    for sentence in target_tokens[:train_size]
]
y_val_tokens = [
    sentence.copy() + ["<EOS>"]
    for sentence in target_tokens[train_size:]
]

print("X_train sample:", X_train_tokens[0])
print("X_train_dec sample:", X_train_dec_tokens[0])
print("y_train sample:", y_train_tokens[0])

X_train sample: ['hey,', 'look', 'at', 'this.']
X_train_dec sample: ['<SOS>', 'یہ', 'دیکھو۔']
y_train sample: ['یہ', 'دیکھو۔', '<EOS>']
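
The two target-side lists implement teacher forcing: at each step the decoder is fed the previous gold token (starting with `<SOS>`) and is trained to predict the current gold token (ending with `<EOS>`). The sketch below makes that one-step shift explicit; `make_decoder_pair` and the placeholder tokens `w1`..`w3` are hypothetical.

```python
def make_decoder_pair(target_tokens):
    """Return (decoder_input, decoder_target) for one target sentence."""
    decoder_input = ["<SOS>"] + target_tokens
    decoder_target = target_tokens + ["<EOS>"]
    return decoder_input, decoder_target


dec_in, dec_out = make_decoder_pair(["w1", "w2", "w3"])
for step, (fed, expected) in enumerate(zip(dec_in, dec_out)):
    print(f"step {step}: input={fed!r:<8} -> target={expected!r}")
```

The decoder input is the target shifted right by one position, so `dec_in[1:]` equals `dec_out[:-1]`.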


In [41]:
def pad_tokens(sentence_tokens, max_length=15):
    padded_tokens = []
    for tokens in sentence_tokens:
        if len(tokens) > max_length:
            tokens = tokens[:max_length]
        else:
            tokens = tokens + ["<PAD>"] * (max_length - len(tokens))
        padded_tokens.append(tokens)
    return padded_tokens

In [42]:
pad_tokens(y_train_tokens[:2])

[['یہ',
  'دیکھو۔',
  '<EOS>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>'],
 ['میں',
  'کتا',
  'رکھتا',
  'ہوں۔',
  '<EOS>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>',
  '<PAD>']]
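
One side effect of `pad_tokens` is that a sequence longer than `max_length` is simply cut, so a long target sentence loses its trailing `<EOS>`. A possible variant, shown here as a sketch and not used in this notebook, overwrites the last kept position with `<EOS>` when truncating. The name `pad_tokens_keep_eos` is hypothetical.

```python
def pad_tokens_keep_eos(sentence_tokens, max_length=15, eos="<EOS>", pad="<PAD>"):
    """Pad or truncate to max_length, preserving a trailing <EOS> if present."""
    padded = []
    for tokens in sentence_tokens:
        if len(tokens) > max_length:
            ends_with_eos = tokens[-1] == eos
            tokens = tokens[:max_length]  # slicing copies, input is untouched
            if ends_with_eos:
                tokens[-1] = eos  # overwrite the last kept slot with <EOS>
        else:
            tokens = tokens + [pad] * (max_length - len(tokens))
        padded.append(tokens)
    return padded


print(pad_tokens_keep_eos([["a", "b", "c", "d", "<EOS>"]], max_length=3))
print(pad_tokens_keep_eos([["a", "<EOS>"]], max_length=4))
```

The trade-off is that one real word is dropped to make room for `<EOS>`; with the short sentences in this dataset, truncation should rarely trigger at all.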

In [44]:
def to_sequence(sentence_tokens, vocab, max_length=15):
    padded_tokens = pad_tokens(sentence_tokens, max_length)
    sequences = []
    for tokens in padded_tokens:
        sequence = [vocab.get(token, vocab["<UNK>"]) for token in tokens]
        sequences.append(sequence)
    return sequences


sample_text = ["I am happy", "This is a test sentence"]
sample_tokens = tokenize(sample_text)
sample_sequence = to_sequence(sample_tokens, source_vocab, max_length=10)
print("Sample sequences:", sample_sequence)

Sample sequences: [[2, 41, 1023, 0, 0, 0, 0, 0, 0, 0], [15, 8, 6, 1, 1, 0, 0, 0, 0, 0]]


In [45]:
X_train = to_sequence(X_train_tokens, source_vocab, max_length=15)
X_val = to_sequence(X_val_tokens, source_vocab, max_length=15)

X_train_dec = to_sequence(X_train_dec_tokens, target_vocab, max_length=15)
X_val_dec = to_sequence(X_val_dec_tokens, target_vocab, max_length=15)

y_train = to_sequence(y_train_tokens, target_vocab, max_length=15)
y_val = to_sequence(y_val_tokens, target_vocab, max_length=15)


print("X_train sample:", X_train[0])
print("X_train_dec sample:", X_train_dec[0])
print("y_train sample:", y_train[0])

X_train sample: [773, 59, 32, 191, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
X_train_dec sample: [2, 23, 298, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_train sample: [23, 298, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
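
When debugging, it helps to map an index sequence back to tokens and confirm that the encoding round-trips as expected. The sketch below shows one way to do that with an inverted vocabulary; `to_tokens` and the toy vocabulary are hypothetical and not part of the notebook.

```python
def to_tokens(sequence, vocab, skip=("<PAD>",)):
    """Map token ids back to tokens, dropping any tokens listed in `skip`."""
    index_to_token = {idx: tok for tok, idx in vocab.items()}
    tokens = [index_to_token.get(idx, "<UNK>") for idx in sequence]
    return [t for t in tokens if t not in skip]


toy_vocab = {"<PAD>": 0, "<UNK>": 1, "<SOS>": 2, "<EOS>": 3, "hello": 4, "world": 5}
print(to_tokens([2, 4, 5, 0, 0], toy_vocab))
print(to_tokens([4, 1, 3, 0], toy_vocab))
```

Words that were mapped to `<UNK>` cannot be recovered, which is a quick visual check of how much information the vocabulary cap removes.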


In [47]:
import torch
from torch.utils.data import TensorDataset

train_data = TensorDataset(
    torch.tensor(X_train, dtype=torch.long),
    torch.tensor(X_train_dec, dtype=torch.long),
    torch.tensor(y_train, dtype=torch.long),
)

val_data = TensorDataset(
    torch.tensor(X_val, dtype=torch.long),
    torch.tensor(X_val_dec, dtype=torch.long),
    torch.tensor(y_val, dtype=torch.long),
)
print("Number of training samples:", len(train_data))
print("Number of validation samples:", len(val_data))

Number of training samples: 1034
Number of validation samples: 115


In [48]:
from torch.utils.data import DataLoader


batch_size = 64
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=False)

In [65]:
import torch.nn as nn


class Encoder(nn.Module):

    def __init__(
        self,
        input_vocab_size,
        embed_size,
        hidden_size,
        num_layers=2,
        bidirectional=False,
    ):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(
            input_vocab_size, embed_size, padding_idx=0
        )
        self.bidirectional = bidirectional
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.gru = nn.GRU(
            embed_size,
            hidden_size,
            num_layers,
            batch_first=True,
            bidirectional=bidirectional,
            dropout=0.5 if num_layers > 1 else 0,
        )
        self.embed_dropout = nn.Dropout(0.5)
        if bidirectional:
            self.hidden_projection = nn.Linear(hidden_size * 2, hidden_size)

    def forward(self, x):
        embedded = self.embed_dropout(self.embedding(x))
        outputs, hidden = self.gru(embedded)
        if self.bidirectional:
            # Reshape: (num_layers * 2, batch, hidden) -> (num_layers, 2, batch, hidden)
            hidden = hidden.view(self.num_layers, 2, -1, self.hidden_size)
            # Concatenate forward and backward
            hidden = torch.cat([hidden[:, 0, :, :], hidden[:, 1, :, :]], dim=2)
            # Project to decoder size
            hidden = self.hidden_projection(hidden)
        return outputs, hidden


class Decoder(nn.Module):
    def __init__(self, output_vocab_size, embed_size, hidden_size, num_layers=2):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(
            output_vocab_size, embed_size, padding_idx=0
        )
        self.gru = nn.GRU(
            embed_size,
            hidden_size,
            num_layers,
            batch_first=True,
            dropout=0.5 if num_layers > 1 else 0,
        )
        self.embed_dropout = nn.Dropout(0.5)

    def forward(self, x, hidden):
        embedded = self.embed_dropout(self.embedding(x))
        outputs, hidden = self.gru(embedded, hidden)
        return outputs, hidden
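
The reshape in `Encoder.forward` relies on how PyTorch orders the final hidden state of a bidirectional multi-layer GRU: entries are laid out layer by layer, with the forward and backward directions of each layer next to each other. The plain-Python sketch below mimics that ordering with string labels instead of tensors (so it illustrates the indexing only, it is not PyTorch code) and shows that grouping into `(num_layers, 2)` pairs up the two directions of the same layer.

```python
num_layers = 2
num_directions = 2
DIRECTIONS = ["forward", "backward"]

# Flat layout along the first dimension of h_n:
# [layer0_forward, layer0_backward, layer1_forward, layer1_backward]
flat = [
    f"layer{layer}_{DIRECTIONS[d]}"
    for layer in range(num_layers)
    for d in range(num_directions)
]

# Plain-list equivalent of hidden.view(num_layers, 2, ...):
# consecutive pairs belong to the same layer.
grouped = [
    flat[i * num_directions:(i + 1) * num_directions]
    for i in range(num_layers)
]

for layer, (fwd, bwd) in enumerate(grouped):
    print(layer, fwd, bwd)
```

After this grouping, the encoder concatenates each layer's forward and backward states and projects the result back to `hidden_size`, so that the decoder (which is unidirectional) receives one state per layer.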

In [50]:
class EncoderDecoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(EncoderDecoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.fc = nn.Linear(decoder.hidden_size, len(target_vocab))
        self.dropout = nn.Dropout(0.6)

    def forward(self, source, target):
        _, encoder_hidden = self.encoder(source)
        decoder_outputs, _ = self.decoder(target, encoder_hidden)
        decoder_outputs = self.dropout(decoder_outputs)
        return self.fc(decoder_outputs)

# Training and Evaluation


In [51]:
import tqdm


def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0

    progress_bar = tqdm.tqdm(dataloader, desc="Training")
    for enc_inputs, dec_inputs, targets in progress_bar:
        # Move data to device
        enc_inputs, dec_inputs, targets = (
            enc_inputs.to(device),
            dec_inputs.to(device),
            targets.to(device),
        )

        outputs = model(enc_inputs, dec_inputs)
        outputs = outputs.reshape(-1, outputs.size(-1))
        targets = targets.reshape(-1)

        loss = criterion(outputs, targets)
        optimizer.zero_grad()
        loss.backward()

        optimizer.step()

        # Calculate accuracy
        predicted = outputs.argmax(dim=1)
        non_pad_mask = targets != 0  # Create mask for non-padding tokens
        total += non_pad_mask.sum().item()
        correct += ((predicted == targets) & non_pad_mask).sum().item()
        total_loss += loss.item()

        # Update progress bar
        progress_bar.set_postfix(
            {"loss": f"{loss.item():.4f}", "acc": f"{100 * correct / total:.2f}%"}
        )

    avg_loss = total_loss / len(dataloader)
    accuracy = 100 * correct / total

    return avg_loss, accuracy

## Validation Function

The validation function evaluates the model without updating weights. This helps us monitor overfitting and select the best model.


In [52]:
def validate(model, dataloader, criterion, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0

    progress_bar = tqdm.tqdm(dataloader, desc="Validating")
    with torch.no_grad():

        for enc_inputs, dec_inputs, targets in progress_bar:
            # Move data to device
            enc_inputs, dec_inputs, targets = (
                enc_inputs.to(device),
                dec_inputs.to(device),
                targets.to(device),
            )

            outputs = model(enc_inputs, dec_inputs)
            outputs = outputs.reshape(-1, outputs.size(-1))
            targets = targets.reshape(-1)

            loss = criterion(outputs, targets)

            # Calculate accuracy
            predicted = outputs.argmax(dim=1)
            non_pad_mask = targets != 0  # Create mask for non-padding tokens
            total += non_pad_mask.sum().item()
            correct += ((predicted == targets) & non_pad_mask).sum().item()
            total_loss += loss.item()

            # Update progress bar with current accuracy
            progress_bar.set_postfix(
                {"loss": f"{loss.item():.4f}", "acc": f"{100 * correct / total:.2f}%"}
            )

    avg_loss = total_loss / len(dataloader)
    accuracy = 100 * correct / total

    return avg_loss, accuracy
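
Both `train_epoch` and `validate` count accuracy only over non-padding positions. The plain-Python sketch below, with made-up ids and a hypothetical `token_accuracy` helper, shows why the mask matters: trailing `<PAD>` positions are easy to "predict" and would inflate the score.

```python
def token_accuracy(predicted, targets, pad_id=0, ignore_padding=True):
    """Percentage of positions where predicted == target."""
    correct = 0
    total = 0
    for p, t in zip(predicted, targets):
        if ignore_padding and t == pad_id:
            continue  # skip padding positions entirely
        total += 1
        correct += int(p == t)
    return 100 * correct / total if total else 0.0


targets = [23, 298, 3, 0, 0, 0]    # two words, <EOS>, then padding
predicted = [23, 7, 3, 0, 0, 0]    # one wrong word

print(token_accuracy(predicted, targets))                        # masked
print(token_accuracy(predicted, targets, ignore_padding=False))  # unmasked
```

With the mask, 2 of 3 real tokens are correct; without it, 5 of 6 positions match and the accuracy looks better than it is.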

In [70]:
# Instantiate the models
embed_size = 256
hidden_size = 512
num_layers = 2
encoder = Encoder(
    len(source_vocab), embed_size, hidden_size, num_layers, bidirectional=True
)
decoder = Decoder(len(target_vocab), embed_size, hidden_size, num_layers)

model = EncoderDecoder(encoder, decoder)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model

EncoderDecoder(
  (encoder): Encoder(
    (embedding): Embedding(1500, 256, padding_idx=0)
    (gru): GRU(256, 512, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
    (embed_dropout): Dropout(p=0.5, inplace=False)
    (hidden_projection): Linear(in_features=1024, out_features=512, bias=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(1500, 256, padding_idx=0)
    (gru): GRU(256, 512, num_layers=2, batch_first=True, dropout=0.5)
    (embed_dropout): Dropout(p=0.5, inplace=False)
  )
  (fc): Linear(in_features=512, out_features=1500, bias=True)
  (dropout): Dropout(p=0.6, inplace=False)
)

In [71]:
from torchinfo import summary

input_sample = torch.zeros((1, 15), dtype=torch.long).to(device)
summary(model, input_data=(input_sample, input_sample))

Layer (type:depth-idx)                   Output Shape              Param #
EncoderDecoder                           [1, 15, 1500]             --
├─Encoder: 1-1                           [1, 15, 1024]             --
│    └─Embedding: 2-1                    [1, 15, 256]              384,000
│    └─Dropout: 2-2                      [1, 15, 256]              --
│    └─GRU: 2-3                          [1, 15, 1024]             7,090,176
│    └─Linear: 2-4                       [2, 1, 512]               524,800
├─Decoder: 1-2                           [1, 15, 512]              --
│    └─Embedding: 2-5                    [1, 15, 256]              384,000
│    └─Dropout: 2-6                      [1, 15, 256]              --
│    └─GRU: 2-7                          [1, 15, 512]              2,758,656
├─Dropout: 1-3                           [1, 15, 512]              --
├─Linear: 1-4                            [1, 15, 1500]             769,500
Total params: 11,911,132
Trainable params: 11,911,1

In [72]:
import torch.optim as optim


# Loss function (ignore padding tokens)
criterion = nn.CrossEntropyLoss(ignore_index=0)

# Optimizer
learning_rate = 0.0007
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-4)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

print(f"Number of training batches: {len(train_loader)}")
print(f"Number of validation batches: {len(val_loader)}")

Number of training batches: 17
Number of validation batches: 2


In [73]:
num_epochs = 150
best_val_acc = 0.0
patience = 15
patience_counter = 0

# Track training history
train_losses = []
train_accs = []
val_losses = []
val_accs = []

print("Starting training...")
print(f"Device: {device}")
print(f"Number of epochs: {num_epochs}\n")

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    # Train
    train_loss, train_acc = train_epoch(
        model, train_loader, criterion, optimizer, device
    )
    # Validate
    val_loss, val_acc = validate(model, val_loader, criterion, device)

    # Track history
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    val_losses.append(val_loss)
    val_accs.append(val_acc)
    
    scheduler.step(val_loss)

    # Save best model
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        patience_counter = 0
        torch.save(model.state_dict(), "nmt_model.pth")
        print(f"  ✓ Saved best model (Val Acc: {val_acc:.2f}%)")
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping triggered after {epoch + 1} epochs")
            break

print("Training complete!")
print(f"Best validation accuracy: {best_val_acc:.2f}%")

Starting training...
Device: cuda
Number of epochs: 150

Epoch 1/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.86it/s, loss=5.4288, acc=11.10%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.48it/s, loss=5.2349, acc=14.41%]


  ✓ Saved best model (Val Acc: 14.41%)
Epoch 2/150


Training: 100%|██████████| 17/17 [00:00<00:00, 17.92it/s, loss=4.6475, acc=16.08%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 45.91it/s, loss=5.0170, acc=14.62%]


  ✓ Saved best model (Val Acc: 14.62%)
Epoch 3/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.48it/s, loss=5.1446, acc=17.64%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 34.61it/s, loss=4.8138, acc=18.51%]


  ✓ Saved best model (Val Acc: 18.51%)
Epoch 4/150


Training: 100%|██████████| 17/17 [00:00<00:00, 17.38it/s, loss=5.1359, acc=19.21%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.98it/s, loss=4.7100, acc=20.08%]


  ✓ Saved best model (Val Acc: 20.08%)
Epoch 5/150


Training: 100%|██████████| 17/17 [00:00<00:00, 19.70it/s, loss=4.9439, acc=20.98%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 37.98it/s, loss=4.6172, acc=19.98%]


Epoch 6/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.52it/s, loss=4.7027, acc=22.27%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 43.88it/s, loss=4.4472, acc=23.87%]


  ✓ Saved best model (Val Acc: 23.87%)
Epoch 7/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.09it/s, loss=4.1918, acc=24.16%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 39.08it/s, loss=4.3547, acc=23.66%]


Epoch 8/150


Training: 100%|██████████| 17/17 [00:00<00:00, 19.07it/s, loss=4.2706, acc=25.81%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 44.58it/s, loss=4.2226, acc=26.39%]


  ✓ Saved best model (Val Acc: 26.39%)
Epoch 9/150


Training: 100%|██████████| 17/17 [00:00<00:00, 18.41it/s, loss=3.7477, acc=27.26%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 42.97it/s, loss=4.1901, acc=27.66%]


  ✓ Saved best model (Val Acc: 27.66%)
Epoch 10/150


Training: 100%|██████████| 17/17 [00:00<00:00, 19.21it/s, loss=3.3512, acc=28.82%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 34.47it/s, loss=4.0599, acc=30.70%]


  ✓ Saved best model (Val Acc: 30.70%)
Epoch 11/150


Training: 100%|██████████| 17/17 [00:01<00:00, 15.04it/s, loss=2.9100, acc=30.22%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 28.77it/s, loss=4.0883, acc=28.60%]


Epoch 12/150


Training: 100%|██████████| 17/17 [00:01<00:00, 15.61it/s, loss=3.7075, acc=31.80%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 25.26it/s, loss=3.9668, acc=31.13%]


  ✓ Saved best model (Val Acc: 31.13%)
Epoch 13/150


Training: 100%|██████████| 17/17 [00:01<00:00, 15.00it/s, loss=4.0663, acc=32.99%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 20.78it/s, loss=3.9230, acc=31.34%]


  ✓ Saved best model (Val Acc: 31.34%)
Epoch 14/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.14it/s, loss=3.3972, acc=35.29%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 29.41it/s, loss=3.8510, acc=32.28%]


  ✓ Saved best model (Val Acc: 32.28%)
Epoch 15/150


Training: 100%|██████████| 17/17 [00:00<00:00, 17.42it/s, loss=3.6718, acc=36.97%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.90it/s, loss=3.9513, acc=31.44%]


Epoch 16/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.59it/s, loss=3.2455, acc=38.53%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 44.72it/s, loss=3.7577, acc=33.54%]


  ✓ Saved best model (Val Acc: 33.54%)
Epoch 17/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.62it/s, loss=2.7499, acc=41.14%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 35.34it/s, loss=3.7840, acc=35.23%]


  ✓ Saved best model (Val Acc: 35.23%)
Epoch 18/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.61it/s, loss=2.4921, acc=42.64%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 40.20it/s, loss=3.7391, acc=35.54%]


  ✓ Saved best model (Val Acc: 35.54%)
Epoch 19/150


Training: 100%|██████████| 17/17 [00:00<00:00, 17.27it/s, loss=2.6860, acc=45.10%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 30.79it/s, loss=3.7529, acc=36.17%]


  ✓ Saved best model (Val Acc: 36.17%)
Epoch 20/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.47it/s, loss=2.4139, acc=46.27%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 34.23it/s, loss=3.7813, acc=36.38%]


  ✓ Saved best model (Val Acc: 36.38%)
Epoch 21/150


Training: 100%|██████████| 17/17 [00:00<00:00, 18.55it/s, loss=2.6710, acc=48.79%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 34.27it/s, loss=3.8153, acc=37.12%]


  ✓ Saved best model (Val Acc: 37.12%)
Epoch 22/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.00it/s, loss=2.2600, acc=50.71%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 20.33it/s, loss=3.7166, acc=39.12%]


  ✓ Saved best model (Val Acc: 39.12%)
Epoch 23/150


Training: 100%|██████████| 17/17 [00:00<00:00, 19.05it/s, loss=2.4526, acc=52.26%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 34.86it/s, loss=3.7148, acc=37.85%]


Epoch 24/150


Training: 100%|██████████| 17/17 [00:00<00:00, 19.10it/s, loss=2.0605, acc=53.25%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 39.37it/s, loss=3.7596, acc=38.91%]


Epoch 25/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.57it/s, loss=1.5992, acc=57.45%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 45.72it/s, loss=3.7371, acc=40.17%]


  ✓ Saved best model (Val Acc: 40.17%)
Epoch 26/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.33it/s, loss=1.9008, acc=58.94%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 31.40it/s, loss=3.7388, acc=40.38%]


  ✓ Saved best model (Val Acc: 40.38%)
Epoch 27/150


Training: 100%|██████████| 17/17 [00:00<00:00, 18.18it/s, loss=1.4416, acc=60.27%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 34.74it/s, loss=3.7192, acc=41.43%]


  ✓ Saved best model (Val Acc: 41.43%)
Epoch 28/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.45it/s, loss=1.6744, acc=61.44%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 28.17it/s, loss=3.6972, acc=40.90%]


Epoch 29/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.13it/s, loss=1.9789, acc=62.60%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 37.93it/s, loss=3.7527, acc=40.69%]


Epoch 30/150


Training: 100%|██████████| 17/17 [00:00<00:00, 17.18it/s, loss=1.8314, acc=63.10%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 44.39it/s, loss=3.7570, acc=41.32%]


Epoch 31/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.47it/s, loss=1.2125, acc=64.26%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 48.79it/s, loss=3.7403, acc=41.54%]


  ✓ Saved best model (Val Acc: 41.54%)
Epoch 32/150


Training: 100%|██████████| 17/17 [00:00<00:00, 17.58it/s, loss=1.5174, acc=65.06%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 23.37it/s, loss=3.7679, acc=41.85%]


  ✓ Saved best model (Val Acc: 41.85%)
Epoch 33/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.29it/s, loss=1.3556, acc=66.02%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 40.47it/s, loss=3.7744, acc=41.54%]


Epoch 34/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.02it/s, loss=1.4578, acc=66.11%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 28.13it/s, loss=3.7871, acc=41.85%]


Epoch 35/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.38it/s, loss=1.3376, acc=66.40%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 33.64it/s, loss=3.8029, acc=41.43%]


Epoch 36/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.25it/s, loss=1.6650, acc=67.68%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 48.30it/s, loss=3.8241, acc=41.75%]


Epoch 37/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.51it/s, loss=1.2242, acc=68.20%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.03it/s, loss=3.8126, acc=41.85%]


Epoch 38/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.73it/s, loss=1.2004, acc=67.89%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 48.65it/s, loss=3.8211, acc=41.85%]


Epoch 39/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.43it/s, loss=1.4818, acc=68.03%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 51.40it/s, loss=3.8211, acc=41.75%]


Epoch 40/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.81it/s, loss=1.1035, acc=68.89%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 47.65it/s, loss=3.8332, acc=42.17%]


  ✓ Saved best model (Val Acc: 42.17%)
Epoch 41/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.87it/s, loss=1.2818, acc=68.92%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.14it/s, loss=3.8212, acc=42.17%]


Epoch 42/150


Training: 100%|██████████| 17/17 [00:00<00:00, 19.18it/s, loss=1.0286, acc=69.42%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 59.82it/s, loss=3.8404, acc=41.85%]


Epoch 43/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.40it/s, loss=1.4923, acc=69.28%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 50.76it/s, loss=3.8408, acc=42.17%]


Epoch 44/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.16it/s, loss=1.1985, acc=69.68%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.06it/s, loss=3.8479, acc=41.85%]


Epoch 45/150


Training: 100%|██████████| 17/17 [00:00<00:00, 21.20it/s, loss=1.3202, acc=70.16%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 52.83it/s, loss=3.8460, acc=41.75%]


Epoch 46/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.33it/s, loss=1.2012, acc=69.96%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 45.75it/s, loss=3.8540, acc=41.96%]


Epoch 47/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.98it/s, loss=1.1938, acc=70.48%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 42.66it/s, loss=3.8464, acc=41.85%]


Epoch 48/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.73it/s, loss=1.2217, acc=70.21%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 40.95it/s, loss=3.8485, acc=41.64%]


Epoch 49/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.52it/s, loss=1.1190, acc=70.60%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 44.62it/s, loss=3.8517, acc=41.75%]


Epoch 50/150


Training: 100%|██████████| 17/17 [00:00<00:00, 20.60it/s, loss=0.9606, acc=70.14%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 46.48it/s, loss=3.8543, acc=41.96%]


Epoch 51/150


Training: 100%|██████████| 17/17 [00:00<00:00, 22.72it/s, loss=1.2703, acc=70.55%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 51.40it/s, loss=3.8557, acc=41.96%]


Epoch 52/150


Training: 100%|██████████| 17/17 [00:00<00:00, 18.99it/s, loss=0.8996, acc=70.28%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 42.24it/s, loss=3.8553, acc=41.75%]


Epoch 53/150


Training: 100%|██████████| 17/17 [00:00<00:00, 18.35it/s, loss=1.2105, acc=70.88%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 54.25it/s, loss=3.8607, acc=41.96%]


Epoch 54/150


Training: 100%|██████████| 17/17 [00:00<00:00, 18.23it/s, loss=1.3192, acc=71.50%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 53.00it/s, loss=3.8667, acc=42.17%]


Epoch 55/150


Training: 100%|██████████| 17/17 [00:01<00:00, 16.98it/s, loss=1.1098, acc=70.74%]
Validating: 100%|██████████| 2/2 [00:00<00:00, 48.15it/s, loss=3.8690, acc=42.06%]

Early stopping triggered after 55 epochs
Training complete!
Best validation accuracy: 42.17%
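
The stopping point in the log follows directly from the patience rule: the best validation accuracy (42.17%) was first reached at epoch 40, later epochs only tied it, and the strict `>` comparison does not reset the counter on a tie, so training stopped 15 epochs later at epoch 55. The sketch below isolates that logic in a hypothetical helper, `early_stopping_epoch`, run on made-up scores.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Mirrors the training loop: only a strictly better score resets the counter.
    """
    best = float("-inf")
    best_epoch = 0
    waited = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, best_epoch, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_scores), best_epoch


# Made-up validation accuracies: the tie at epoch 4 does not count as progress.
print(early_stopping_epoch([10, 12, 11, 12, 11, 10], patience=3))
```

If ties should count as progress, the comparison could be changed to `>=`, at the cost of overwriting the saved checkpoint with an equally good model.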





# Load Best Model


In [74]:
model.load_state_dict(torch.load("nmt_model.pth", weights_only=True))
model = model.to(device)
validate(model, val_loader, criterion, device)

Validating: 100%|██████████| 2/2 [00:00<00:00, 12.77it/s, loss=3.8332, acc=42.17%]


(3.904155969619751, 42.16614090431125)
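
The progress bar shows the loss of the last batch only (3.8332), while `validate` returns the mean of the per-batch losses (3.9042), which is why the two numbers differ. A related subtlety: a mean of batch means is not the same as a per-token mean when batches contain different numbers of non-padding tokens. The sketch below uses made-up batch statistics to show the gap.

```python
# Hypothetical (mean_loss, num_non_pad_tokens) pairs for two validation batches.
batches = [(4.0, 600), (3.0, 200)]

# What the notebook computes: every batch counts equally.
mean_of_batch_means = sum(loss for loss, _ in batches) / len(batches)

# Alternative: weight each batch by its number of real tokens.
token_weighted_mean = (
    sum(loss * n for loss, n in batches) / sum(n for _, n in batches)
)

print(mean_of_batch_means)
print(token_weighted_mean)
```

With only two validation batches of similar size, the difference in this notebook is likely small, but the token-weighted version is the safer choice when comparing runs with different batch sizes.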

In [80]:
target_i2t = {idx: token for token, idx in target_vocab.items()}


def translate(sentence, model, source_vocab, target_vocab, max_length=20, device="cpu"):
    model.eval()
    sentence_tokens = sentence.lower().split()
    sequence = to_sequence([sentence_tokens], source_vocab, max_length=max_length)
    encoder_input = torch.tensor(sequence, dtype=torch.long).to(device)

    # Start with <SOS> token
    decoder_input = [target_vocab["<SOS>"]]
    translation = []

    with torch.no_grad():
        _, encoder_hidden = model.encoder(encoder_input)

        for _ in range(max_length):
            # Prepare decoder input
            dec_input = torch.tensor([decoder_input], dtype=torch.long).to(device)

            # Decode
            decoder_outputs, _ = model.decoder(dec_input, encoder_hidden)

            # Get prediction for the last token
            output = model.fc(decoder_outputs[:, -1, :])
            predicted_id = output.argmax(dim=-1).item()

            # Check for EOS token
            if predicted_id == target_vocab["<EOS>"]:
                break

            # Get the predicted word
            predicted_word = target_i2t.get(predicted_id, "")

            # Skip special tokens in output
            if predicted_word not in ["<PAD>", "<UNK>", "<SOS>", "<EOS>"]:
                translation.append(predicted_word)

            # Add predicted token to decoder input for next iteration
            decoder_input.append(predicted_id)
    return " ".join(translation)


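# Test the translation function.
# Note: these are the first 10 shuffled pairs, which fall inside the training
# split (the first 90%), so this checks memorization rather than generalization.
test_sentences = source_sentences[:10]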

print("Testing translations:\n")
for sentence in test_sentences:
    translated = translate(sentence, model, source_vocab, target_vocab, device=device)
    print(f"Source: {sentence}")
    print(f"Target: {translated}")
    print()

Testing translations:

Source: Hey, look at this.
Target: یہ دیکھو۔

Source: I keep a dog.
Target: میں کتا رکھتا ہوں۔

Source: I just don't know what to say.
Target: مجھے پتا ہے کہ تم امیر ہو۔

Source: How is it going?
Target: کيسا چل رہا ہے ؟

Source: Trust me!
Target: مجھ پر بھروسہ

Source: Tom's scared.
Target: ٹام مر ہے۔

Source: I'm seeing them tonight.
Target: مجھے نیند سے الرجی ہے۔

Source: What did she do today?
Target: انہیوں نے آج کیا کیا؟

Source: He decided to submit his resignation.
Target: اس نے اپنا استعفی کرانے کرانے

Source: His car is two years old.
Target: اس کی گاڑی دو پرانی ہے۔
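
The `translate` function uses greedy decoding: at each step it keeps only the single most probable next token, and it stops at `<EOS>` or after `max_length` steps. The sketch below isolates that loop. A hypothetical lookup table, `NEXT_TOKEN_SCORES`, stands in for the trained decoder; the real model conditions on the whole prefix and the encoder state, not just on the previous token.

```python
# Hypothetical next-token scores, keyed by the previous token.
NEXT_TOKEN_SCORES = {
    "<SOS>": {"a": 0.6, "b": 0.4},
    "a": {"b": 0.7, "<EOS>": 0.3},
    "b": {"<EOS>": 0.9, "a": 0.1},
}


def greedy_decode(max_length=10):
    """Pick the highest-scoring next token until <EOS> or max_length."""
    previous = "<SOS>"
    output = []
    for _ in range(max_length):
        scores = NEXT_TOKEN_SCORES[previous]
        best = max(scores, key=scores.get)
        if best == "<EOS>":
            break
        output.append(best)
        previous = best
    return output


print(greedy_decode())
print(greedy_decode(max_length=1))
```

Greedy decoding is simple and fast, but it can commit to an early token that leads to a poor overall sentence, which may explain some of the repeated or off-topic words in the translations above. Beam search is a common alternative that keeps several candidate prefixes at each step.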

