# Advanced NLP: Dynamic Decisions and Bi-LSTM CRF 🇦🇺

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/pytorch-mastery/blob/main/examples/pytorch-nlp/04_advanced_nlp.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/pytorch-mastery/blob/main/examples/pytorch-nlp/04_advanced_nlp.ipynb)

Master advanced NLP techniques in PyTorch using a bidirectional LSTM with a Conditional Random Field (CRF) layer. The notebook covers Named Entity Recognition (NER) on Australian tourism text, with English and Vietnamese support and dynamic decision making for complex sequence labeling tasks.

## Learning Objectives

By the end of this notebook, you will:

- 🧠 **Master Bi-LSTM CRF architecture** for advanced sequence labeling
- 🏷️ **Implement Australian NER** for locations, organizations, and landmarks
- 🔄 **Apply dynamic computation graphs** for flexible sequence processing
- 🎯 **Handle complex decision making** with CRF layer constraints
- 🇦🇺 **Build Australian-specific NER** models with tourism entities
- 🌏 **Support multilingual NER** for English and Vietnamese text
- 🤗 **Bridge to Hugging Face** transformers for modern NLP

## What You'll Build

1. **Australian Tourism NER** - Extract locations, attractions, and organizations
2. **Bi-LSTM CRF Model** - A strong, widely used sequence labeling architecture
3. **Dynamic Decision Engine** - Handle variable-length sequences efficiently
4. **Multilingual Entity Extractor** - Process English and Vietnamese tourism text
5. **Hugging Face Integration** - Connect PyTorch models to modern transformers

## Key Concepts

### Bi-directional LSTM
- **Forward LSTM**: Processes the sequence left-to-right to capture past (left) context
- **Backward LSTM**: Processes the sequence right-to-left to capture future (right) context
- **Combined representation**: Concatenates both directions for rich features
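
As a minimal sketch of the three points above, `nn.LSTM` with `bidirectional=True` runs both directions and concatenates their outputs along the last dimension. The sizes below are illustrative assumptions, not the settings used later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumed for this sketch)
batch_size, seq_len, embed_dim, hidden_dim = 2, 5, 8, 16

bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
embeddings = torch.randn(batch_size, seq_len, embed_dim)

outputs, (h_n, c_n) = bilstm(embeddings)

# The last dimension holds forward features first, then backward features
forward_feats = outputs[..., :hidden_dim]
backward_feats = outputs[..., hidden_dim:]

print(outputs.shape)         # (batch_size, seq_len, 2 * hidden_dim)
print(forward_feats.shape)   # (batch_size, seq_len, hidden_dim)
print(backward_feats.shape)  # (batch_size, seq_len, hidden_dim)
```

The forward direction has seen the whole sentence at the last time step, while the backward direction has seen it at the first time step, which is why each token's concatenated vector carries context from both sides.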

### Conditional Random Fields (CRF)
- **Sequence-level scoring**: Scores the whole label sequence jointly instead of labeling each token independently
- **Transition scores**: Learns scores that favor valid label transitions (e.g., B-LOC → I-LOC) and penalize invalid ones (e.g., O → I-LOC)
- **Viterbi decoding**: Finds most likely sequence of labels
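
To make the role of transition scores concrete, here is a small pure-Python Viterbi sketch on a toy three-tag problem. The tag set, the hand-picked scores, and the helper name `viterbi` are assumptions made for illustration; in a trained model these scores are learned.

```python
def viterbi(emissions, transitions, start):
    """Return the highest-scoring path of tag indices.

    emissions[t][j]   : score of tag j at step t
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of starting in tag j
    """
    num_tags = len(start)
    scores = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this step
            best_prev = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_prev)
            step_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = step_scores
        backpointers.append(step_back)
    # Follow the backpointers from the best final tag
    best = max(range(num_tags), key=lambda j: scores[j])
    path = [best]
    for step_back in reversed(backpointers):
        best = step_back[best]
        path.append(best)
    return path[::-1]


tags = ["O", "B-LOC", "I-LOC"]
start = [0.0, 0.0, -100.0]      # a sequence should not start with I-LOC
transitions = [
    [0.0, 0.0, -100.0],         # O -> I-LOC is heavily penalized
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 0.0, 0.0],
    [0.5, 1.0, 1.5],            # per-token argmax would pick I-LOC here
    [0.0, 0.5, 2.0],
]

greedy = [tags[max(range(len(tags)), key=lambda j: row[j])] for row in emissions]
decoded = [tags[i] for i in viterbi(emissions, transitions, start)]
print(greedy)   # ['O', 'I-LOC', 'I-LOC']
print(decoded)  # ['O', 'B-LOC', 'I-LOC']
```

Picking the best tag per token yields the invalid sequence O → I-LOC, whereas Viterbi trades a slightly lower emission score at the second token for a valid B-LOC → I-LOC path.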

### Australian NER Tags
- **LOC**: Locations (Sydney, Melbourne, Great Barrier Reef)
- **ORG**: Organizations (Qantas, Commonwealth Bank, Tourism Australia)
- **MISC**: Miscellaneous (Opera House, Harbour Bridge, Uluru)
- **PER**: Persons (Australian personalities, Aboriginal leaders)
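
The B-LOC → I-LOC example above suggests a BIO labeling scheme, so the following sketch assumes one: each entity type gets a `B-` (begin) and `I-` (inside) tag, and `O` marks tokens outside any entity. The helper `spans_to_bio` and the example sentence are hypothetical, not taken from the notebook's dataset.

```python
ENTITY_TYPES = ["LOC", "ORG", "MISC", "PER"]

# Tag vocabulary: "O" plus a B-/I- pair per entity type
tags = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
tag2idx = {tag: idx for idx, tag in enumerate(tags)}


def spans_to_bio(tokens, spans):
    """Convert (start, end, type) token spans, with end exclusive, to BIO labels."""
    labels = ["O"] * len(tokens)
    for start, end, etype in spans:
        labels[start] = f"B-{etype}"
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"
    return labels


tokens = ["Qantas", "flies", "to", "the", "Great", "Barrier", "Reef"]
spans = [(0, 1, "ORG"), (4, 7, "LOC")]
labels = spans_to_bio(tokens, spans)

print(tags)
print(list(zip(tokens, labels)))
```

With four entity types this gives nine tags in total, which would be the `num_tags` passed to a CRF layer under this scheme.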

---

In [None]:
# Environment Detection and Setup
import sys
import subprocess
import os
import time

# Detect the runtime environment
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules or "kaggle" in os.environ.get('KAGGLE_URL_BASE', '')
IS_LOCAL = not (IS_COLAB or IS_KAGGLE)

print(f"🔍 Advanced NLP Environment Detection:")
print(f"   Local Development: {IS_LOCAL}")
print(f"   Google Colab: {IS_COLAB}")
print(f"   Kaggle Notebooks: {IS_KAGGLE}")

# Platform-specific system setup
if IS_COLAB:
    print("\n⚙️  Setting up Google Colab for advanced NLP...")
    !apt update -qq
    !apt install -y -qq software-properties-common
elif IS_KAGGLE:
    print("\n⚙️  Setting up Kaggle for advanced NLP...")
else:
    print("\n⚙️  Setting up local environment for advanced NLP...")

In [None]:
# Install required packages for advanced NLP
required_packages = [
    "torch",
    "transformers",
    "datasets",
    "tokenizers",
    "pandas",
    "seaborn",
    "matplotlib",
    "scikit-learn",
    "tensorboard",
    "nltk",
    "spacy",  # For advanced NLP features
    "plotly",
    "pytorch-crf",  # CRF implementation for PyTorch (imported as `torchcrf`)
]

print("📦 Installing packages for advanced NLP...")
for package in required_packages:
    if IS_COLAB or IS_KAGGLE:
        !pip install -q {package}
    else:
        try:
            subprocess.run([sys.executable, "-m", "pip", "install", "-q", package], 
                          capture_output=True, check=True)
        except subprocess.CalledProcessError:
            print(f"   ⚠️  {package} installation failed (check the package name and your network connection)")
            continue
    print(f"   ✅ {package}")

# Special handling for torchcrf if not available
try:
    import torchcrf
    print("   ✅ torchcrf available")
except ImportError:
    print("   📝 torchcrf not available - will implement CRF from scratch")

print("\n🎉 Advanced NLP package installation completed!")

In [None]:
# Import essential libraries for advanced NLP
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence
from torch.utils.tensorboard import SummaryWriter

# Data handling and visualization
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

# Machine learning and evaluation
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Text processing
import re
import string
from collections import Counter, defaultdict
import random
from itertools import zip_longest
import json

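# CRF Implementation
# Note: torchcrf.CRF (from the pytorch-crf package) defaults to batch_first=False, its forward()
# returns the log likelihood (negate it to get a loss), and decode() returns a list of tag lists.
# The fallback class below instead returns the negative log likelihood and a tensor of tags.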
try:
    from torchcrf import CRF
    CRF_AVAILABLE = True
    print("✅ Using torchcrf library")
except ImportError:
    CRF_AVAILABLE = False
    print("📝 Implementing CRF from scratch")
    
    class CRF(nn.Module):
        """Conditional Random Field implementation from scratch."""
        
        def __init__(self, num_tags, batch_first=True):
            super(CRF, self).__init__()
            self.num_tags = num_tags
            self.batch_first = batch_first
            
            # Transition parameters: transitions[i, j] = score of transitioning from tag i to tag j
            self.transitions = nn.Parameter(torch.randn(num_tags, num_tags))
            
            # Start and end transitions
            self.start_transitions = nn.Parameter(torch.randn(num_tags))
            self.end_transitions = nn.Parameter(torch.randn(num_tags))
            
        def forward(self, emissions, tags, mask=None):
            """Compute negative log likelihood."""
            if self.batch_first:
                emissions = emissions.transpose(0, 1)
                tags = tags.transpose(0, 1)
                if mask is not None:
                    mask = mask.transpose(0, 1)
            
            # Compute log likelihood
            log_likelihood = self._compute_log_likelihood(emissions, tags, mask)
            return -log_likelihood
        
        def decode(self, emissions, mask=None):
            """Find most likely tag sequence using Viterbi algorithm."""
            if self.batch_first:
                emissions = emissions.transpose(0, 1)
                if mask is not None:
                    mask = mask.transpose(0, 1)
            
            return self._viterbi_decode(emissions, mask)
        
        def _compute_log_likelihood(self, emissions, tags, mask):
            """Compute log likelihood of tag sequence."""
            seq_length, batch_size, num_tags = emissions.shape
            
            if mask is None:
                # Create the default mask on the same device as the emissions
                mask = torch.ones(seq_length, batch_size, dtype=torch.bool, device=emissions.device)
            
            # Compute score of given tag sequence
            score = self._compute_score(emissions, tags, mask)
            
            # Compute partition function (sum over all possible tag sequences)
            partition = self._compute_partition_function(emissions, mask)
            
            return score - partition
        
        def _compute_score(self, emissions, tags, mask):
            """Compute score of given tag sequence."""
            seq_length, batch_size = tags.shape
            batch_idx = torch.arange(batch_size, device=emissions.device)
            score = torch.zeros(batch_size, device=emissions.device)
            
            # Add start transition scores
            score += self.start_transitions[tags[0]]
            
            # Add emission scores (padded steps contribute nothing)
            for i in range(seq_length):
                score += emissions[i, batch_idx, tags[i]] * mask[i]
            
            # Add transition scores, indexed as transitions[previous tag, current tag]
            # so that they match the forward algorithm and Viterbi decoding
            for i in range(1, seq_length):
                score += self.transitions[tags[i-1], tags[i]] * mask[i]
            
            # Add end transition scores at the last unmasked position of each sequence
            last_tag_indices = mask.long().sum(0) - 1
            score += self.end_transitions[tags[last_tag_indices, batch_idx]]
            
            return score.sum()
        
        def _compute_partition_function(self, emissions, mask):
            """Compute partition function using forward algorithm."""
            seq_length, batch_size, num_tags = emissions.shape
            
            # Initialize forward variables: (batch_size, num_tags)
            forward_var = self.start_transitions + emissions[0]
            
            # Forward algorithm
            for i in range(1, seq_length):
                # Broadcast for all possible transitions
                emit_score = emissions[i].unsqueeze(1)  # (batch_size, 1, num_tags)
                trans_score = self.transitions.unsqueeze(0)  # (1, num_tags, num_tags)
                next_tag_var = forward_var.unsqueeze(2) + trans_score + emit_score  # (batch_size, num_tags, num_tags)
                
                # Log-sum-exp over the previous tag (dim=1)
                next_forward_var = torch.logsumexp(next_tag_var, dim=1)
                
                # Apply mask: padded steps keep the previous forward variables
                if mask is not None:
                    step_mask = mask[i].bool().unsqueeze(1)
                    forward_var = torch.where(step_mask, next_forward_var, forward_var)
                else:
                    forward_var = next_forward_var
            
            # Add end transitions
            terminal_var = forward_var + self.end_transitions
            
            return torch.logsumexp(terminal_var, dim=1).sum()
        
        def _viterbi_decode(self, emissions, mask):
            """Viterbi algorithm for finding most likely tag sequence."""
            seq_length, batch_size, num_tags = emissions.shape
            
            if mask is None:
                mask = torch.ones(seq_length, batch_size, dtype=torch.bool, device=emissions.device)
            mask = mask.bool()
            
            # Initialize
            viterbi_var = self.start_transitions + emissions[0]
            path = []
            # Identity backpointers, used at padded steps so the path stays on the same tag
            identity = torch.arange(num_tags, device=emissions.device).expand(batch_size, num_tags)
            
            # Forward pass
            for i in range(1, seq_length):
                next_tag_var = viterbi_var.unsqueeze(2) + self.transitions.unsqueeze(0)
                best_scores, best_tag_ids = torch.max(next_tag_var, dim=1)
                step_mask = mask[i].unsqueeze(1)
                # Only advance sequences that have a real token at step i
                path.append(torch.where(step_mask, best_tag_ids, identity))
                viterbi_var = torch.where(step_mask, best_scores + emissions[i], viterbi_var)
            
            # Add end transitions
            terminal_var = viterbi_var + self.end_transitions
            best_last_tags = torch.argmax(terminal_var, dim=1)
            
            # Backward pass
            batch_idx = torch.arange(batch_size, device=emissions.device)
            best_tags = [best_last_tags]
            for best_tag_ids in reversed(path):
                best_last_tags = best_tag_ids[batch_idx, best_last_tags]
                best_tags.append(best_last_tags)
            
            # Reverse to get correct order
            best_tags.reverse()
            
            # Padded positions repeat the last real tag; ignore them using the mask
            return torch.stack(best_tags, dim=0).transpose(0, 1) if self.batch_first else torch.stack(best_tags, dim=0)

# Set style and seeds
sns.set_style("whitegrid")
sns.set_palette("Set2")
plt.rcParams['figure.figsize'] = (14, 8)

torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

print(f"🧠 Advanced NLP Environment Ready!")
print(f"   PyTorch version: {torch.__version__}")
print(f"   CRF implementation: {'torchcrf library' if CRF_AVAILABLE else 'custom implementation'}")
print(f"   Ready for Bi-LSTM CRF modeling")

In [None]:
import platform

def detect_device():
    """Detect optimal device for advanced NLP training."""
    if torch.cuda.is_available():
        device = torch.device("cuda")
        gpu_name = torch.cuda.get_device_name(0)
        gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
        
        print(f"🚀 CUDA GPU detected: {gpu_name}")
        print(f"   GPU Memory: {gpu_memory:.1f} GB")
        print(f"   Excellent for Bi-LSTM CRF training")
        
        return device
    
    elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        device = torch.device("mps")
        system_info = platform.uname()
        
        print(f"🍎 Apple Silicon MPS detected: {system_info.machine}")
        print(f"   Optimized for M1/M2/M3 chips")
        print(f"   Good performance for advanced NLP models")
        
        return device
    
    else:
        device = torch.device("cpu")
        cpu_count = torch.get_num_threads()
        
        print(f"💻 CPU mode: {platform.processor()}")
        print(f"   Threads: {cpu_count}")
        print(f"   💡 Tip: Use smaller models and shorter sequences")
        
        return device

# Detect and set device
DEVICE = detect_device()
print(f"\n✅ Device selected: {DEVICE}")

# Set device-specific parameters for advanced models
if DEVICE.type == 'cuda':
    BATCH_SIZE = 32
    LSTM_HIDDEN = 256
    EMBED_DIM = 300
    MAX_SEQ_LEN = 128
    NUM_EPOCHS = 15
elif DEVICE.type == 'mps':
    BATCH_SIZE = 16
    LSTM_HIDDEN = 128
    EMBED_DIM = 200
    MAX_SEQ_LEN = 100
    NUM_EPOCHS = 10
else:  # CPU
    BATCH_SIZE = 8
    LSTM_HIDDEN = 64
    EMBED_DIM = 100
    MAX_SEQ_LEN = 64
    NUM_EPOCHS = 5

print(f"\n⚙️  Advanced NLP parameters (device-optimized):")
print(f"   Batch size: {BATCH_SIZE}")
print(f"   LSTM hidden dimension: {LSTM_HIDDEN}")
print(f"   Embedding dimension: {EMBED_DIM}")
print(f"   Max sequence length: {MAX_SEQ_LEN}")
print(f"   Training epochs: {NUM_EPOCHS}")