| Feature                     | Standard LSTM                   | Bidirectional LSTM                          |
| --------------------------- | ------------------------------- | ------------------------------------------- |
| Sequence direction          | One-way (first to last)         | Two-way (first to last and last to first)   |
| Context available at time t | Steps 1 to t (past and current) | The whole sequence (past, current, future)  |
| Useful when future matters? | ❌ No                            | ✅ Yes                                       |
| Accuracy (typically)        | Baseline                        | Higher, but only if future context helps    |


What is a Bidirectional LSTM?

A Bidirectional LSTM processes the input sequence in both directions:

Forward: from first to last word
Backward: from last to first word

This gives the network context from both the past and the future at every time step. It is especially useful when:
Understanding a word depends on both the previous and the next words
The whole input is available up front, as in Named Entity Recognition (NER), sentiment analysis, and POS tagging
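To make the two directions concrete, here is a minimal sketch that runs a bidirectional `nn.LSTM` on random input and inspects its output. The sizes (`input_size=3`, `hidden_dim=4`, a sequence of 5 steps) are arbitrary assumptions chosen for illustration. It shows that the forward half of the output finishes at the last time step and the backward half finishes at the first time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 4  # illustrative size, not taken from the notebook
lstm = nn.LSTM(input_size=3, hidden_size=hidden_dim, bidirectional=True)

x = torch.randn(5, 1, 3)                 # [seq_len, batch, input_size]
output, (hn, cn) = lstm(x)               # output: [seq_len, batch, hidden_dim * 2]

forward_out = output[..., :hidden_dim]   # forward direction at every time step
backward_out = output[..., hidden_dim:]  # backward direction at every time step

# hn[0] is the forward state after the last step,
# hn[1] is the backward state after it has reached the first step.
fwd_matches = torch.allclose(forward_out[-1], hn[0])
bwd_matches = torch.allclose(backward_out[0], hn[1])

# Changing the first input changes the backward state at t=0,
# but not the backward state at the last step, which has only seen the last input.
x2 = x.clone()
x2[0] = torch.randn(1, 3)
output2, _ = lstm(x2)
last_bwd_unchanged = torch.allclose(backward_out[-1], output2[-1, :, hidden_dim:])
first_bwd_changed = not torch.allclose(backward_out[0], output2[0, :, hidden_dim:])

print(output.shape, fwd_matches, bwd_matches, last_bwd_unchanged, first_bwd_changed)
```

The practical consequence is that a single time step of `output` does not summarize the sequence equally well in both directions, which matters when choosing what to feed into a classifier layer.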


In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [5]:
# Sample vocab and data
vocab = {'i': 0, 'love': 1, 'deep': 2, 'learning': 3, 'very': 4, 'much': 5}
idx_to_word = {idx: word for word, idx in vocab.items()}

# Each 3-word context is paired with the word that follows it
sequences = [['i', 'love', 'deep'], ['love', 'deep', 'learning'], ['deep', 'learning', 'very']]
targets = ['deep', 'learning', 'much']

In [6]:
# Convert to tensors
def encode_seq(seq):
    return torch.tensor([vocab[word] for word in seq], dtype=torch.long)

input_seqs = [encode_seq(seq) for seq in sequences]
target_seqs = torch.tensor([vocab[word] for word in targets], dtype=torch.long)
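Because all three sequences have the same length, they could also be stacked into one batch instead of being fed one at a time. The sketch below is an optional variant, not what the notebook does; the names `batch` and `target_batch` are introduced here for illustration. It produces the `[seq_len, batch]` layout that `nn.LSTM` expects by default.

```python
import torch

vocab = {'i': 0, 'love': 1, 'deep': 2, 'learning': 3, 'very': 4, 'much': 5}
sequences = [['i', 'love', 'deep'], ['love', 'deep', 'learning'], ['deep', 'learning', 'very']]
targets = ['deep', 'learning', 'much']

# One row per sequence: [batch, seq_len]
batch_first = torch.tensor([[vocab[w] for w in seq] for seq in sequences], dtype=torch.long)

# nn.LSTM defaults to [seq_len, batch], so transpose
batch = batch_first.t().contiguous()

# One target index per sequence: [batch]
target_batch = torch.tensor([vocab[w] for w in targets], dtype=torch.long)

print(batch.shape, target_batch.shape)
```

With this layout a single forward pass returns logits of shape `[batch, vocab_size]`, which `nn.CrossEntropyLoss` accepts together with `target_batch`. Sequences of different lengths would need padding first.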

In [7]:
# Model
class BiLSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)
        # Forward and backward states are concatenated, so the input size is hidden_dim * 2
        self.fc = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, x):
        emb = self.embedding(x)            # [seq_len, batch] → [seq_len, batch, emb_dim]
        output, (hn, cn) = self.lstm(emb)  # output: [seq_len, batch, hidden*2]
        # Last time step: the forward half has read the whole sequence,
        # the backward half has only read the last token
        last_output = output[-1]           # [batch, hidden*2]
        out = self.fc(last_output)         # [batch, vocab_size]
        return out
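Taking `output[-1]` gives a backward state that has seen only the last token. A common alternative, sketched here as an assumption rather than as the notebook's method, is to concatenate the final hidden state of each direction from `hn`, so both halves have read the full sequence. The class name `BiLSTMFinalStateModel` is hypothetical.

```python
import torch
import torch.nn as nn

class BiLSTMFinalStateModel(nn.Module):  # hypothetical variant of BiLSTMModel
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, x):
        emb = self.embedding(x)                      # [seq_len, batch, emb_dim]
        _, (hn, _) = self.lstm(emb)                  # hn: [2, batch, hidden] for one layer
        # hn[0]: forward state after the last token
        # hn[1]: backward state after the first token
        summary = torch.cat([hn[0], hn[1]], dim=1)   # [batch, hidden*2]
        return self.fc(summary)                      # [batch, vocab_size]

model = BiLSTMFinalStateModel(vocab_size=6, embedding_dim=8, hidden_dim=16)
x = torch.tensor([[0], [1], [2]])                    # [seq_len=3, batch=1]
logits = model(x)
print(logits.shape)
```

This indexing of `hn` assumes a single-layer LSTM. With more layers, the last layer's two directions are the final two entries of `hn`.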

In [8]:
# Parameters
embedding_dim = 8
hidden_dim = 16
model = BiLSTMModel(len(vocab), embedding_dim, hidden_dim)

# Optimizer and loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

In [9]:
for epoch in range(100):
    total_loss = 0

    # One sequence per step (batch size 1)
    for x, y in zip(input_seqs, target_seqs):
        x = x.unsqueeze(1)                      # [seq_len] → [seq_len, 1]

        optimizer.zero_grad()
        output = model(x)                       # [1, vocab_size]
        loss = loss_fn(output, y.unsqueeze(0))  # y: scalar → shape [1]
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    # Report the summed loss over the three examples every 20 epochs
    if (epoch + 1) % 20 == 0:
        print(f'{epoch + 1}/100, Loss: {total_loss:.4f}')

20/100, Loss: 0.0376
40/100, Loss: 0.0113
60/100, Loss: 0.0064
80/100, Loss: 0.0042
100/100, Loss: 0.0030


Output


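The loss, printed every 20 epochs, falls steadily from 0.0376 to 0.0030.
The model learns to predict the next word from a three-word context that the LSTM reads in both directions. The "future" context here is the later words of the input itself, not the word being predicted.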
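To check what the model has learned, the logits can be turned back into a word with `argmax` and `idx_to_word`. The helper below, `predict_next_word`, is a hypothetical addition and not part of the original notebook. To keep the sketch self-contained it rebuilds an untrained stand-in model, so its prediction is arbitrary. In the notebook, the trained `model` would be passed in instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = {'i': 0, 'love': 1, 'deep': 2, 'learning': 3, 'very': 4, 'much': 5}
idx_to_word = {idx: word for word, idx in vocab.items()}

class BiLSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, x):
        output, _ = self.lstm(self.embedding(x))  # [seq_len, batch, hidden*2]
        return self.fc(output[-1])                # [batch, vocab_size]

def predict_next_word(model, words):
    """Return the highest-scoring next word for a list of context words."""
    x = torch.tensor([vocab[w] for w in words], dtype=torch.long).unsqueeze(1)  # [seq_len, 1]
    model.eval()
    with torch.no_grad():                         # no gradients needed at inference
        logits = model(x)                         # [1, vocab_size]
    return idx_to_word[logits.argmax(dim=1).item()]

# Stand-in model: untrained, so the predicted word is arbitrary here
model = BiLSTMModel(len(vocab), embedding_dim=8, hidden_dim=16)
prediction = predict_next_word(model, ['i', 'love', 'deep'])
print(prediction)
```

With only three training examples, a correct prediction on a training sequence shows memorization rather than generalization, so a check like this is best read as a sanity test.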