# Deep Learning for NLP with PyTorch 🇦🇺

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/pytorch-mastery/blob/main/examples/pytorch-nlp/01_deep_learning_nlp.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/pytorch-mastery/blob/main/examples/pytorch-nlp/01_deep_learning_nlp.ipynb)

A comprehensive introduction to deep learning concepts for Natural Language Processing using PyTorch, featuring Australian tourism examples and English-Vietnamese multilingual support.

## Learning Objectives

By the end of this notebook, you will:

- 🏗️ **Master PyTorch fundamentals** for NLP applications
- 📊 **Build neural networks** for Australian text classification
- 🇦🇺 **Process tourism data** with practical Australian examples
- 🌏 **Handle multilingual text** with English-Vietnamese examples
- 🔄 **Compare with TensorFlow** to ease the transition
- 📈 **Implement TensorBoard logging** for training visualization

## What You'll Build

1. **Australian Tourism Sentiment Classifier** - Analyze reviews of Sydney Opera House, Melbourne coffee, Perth beaches
2. **Multilingual Text Processor** - Handle English and Vietnamese tourism content
3. **Neural Network from Scratch** - Build and train your first PyTorch NLP model
4. **Cross-Platform Solution** - Code that runs locally, on Colab, and on Kaggle

---

In [None]:
# Environment Detection and Setup
import sys
import subprocess
import os
import time

# Detect the runtime environment
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules or "kaggle" in os.environ.get('KAGGLE_URL_BASE', '')
IS_LOCAL = not (IS_COLAB or IS_KAGGLE)

print(f"Environment detected:")
print(f"  - Local: {IS_LOCAL}")
print(f"  - Google Colab: {IS_COLAB}")
print(f"  - Kaggle: {IS_KAGGLE}")

# Platform-specific system setup
if IS_COLAB:
    print("\nSetting up Google Colab environment...")
    !apt update -qq
    !apt install -y -qq software-properties-common
elif IS_KAGGLE:
    print("\nSetting up Kaggle environment...")
    # Kaggle usually has most packages pre-installed
else:
    print("\nSetting up local environment...")

In [None]:
# Install required packages for this notebook
required_packages = [
    "torch",
    "transformers",
    "datasets", 
    "tokenizers",
    "pandas",
    "seaborn",
    "matplotlib",
    "tensorboard",
    "scikit-learn"
]

print("Installing required packages...")
for package in required_packages:
    if IS_COLAB or IS_KAGGLE:
        !pip install -q {package}
    else:
        try:
            subprocess.run([sys.executable, "-m", "pip", "install", "-q", package], 
                          capture_output=True, check=True)
        except subprocess.CalledProcessError as err:
            # pip exits non-zero only when the install fails; already-installed packages succeed
            print(f"⚠️ {package} installation failed (pip exit code {err.returncode})")
            continue
    print(f"✓ {package}")

print("\n📦 Package installation completed!")
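
The loop above calls `pip` for every package on each run. A minimal sketch of checking which packages are actually missing first; `missing_packages` and the `PIP_TO_MODULE` mapping are hypothetical helpers, needed because pip names and import names can differ (as with `scikit-learn` and `sklearn`):

```python
import importlib.util

# Hypothetical mapping: pip package name -> importable module name.
# Only names that differ need an entry; others fall back to the pip name.
PIP_TO_MODULE = {"scikit-learn": "sklearn"}


def missing_packages(packages, name_map=PIP_TO_MODULE):
    """Return the pip names whose top-level module cannot be found."""
    missing = []
    for pip_name in packages:
        module_name = name_map.get(pip_name, pip_name)
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


# "json" ships with Python; the second name is deliberately made up.
print(missing_packages(["json", "definitely_not_installed_pkg_123"]))
```

Passing `required_packages` to this helper would give the subset that still needs installing.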

In [None]:
# Import essential libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter

# Data handling and visualization
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix

# Text processing
import re
import string
from collections import Counter, defaultdict
import random

# Set style for better notebook aesthetics
sns.set_style("whitegrid")
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

# Set random seeds for reproducibility
torch.manual_seed(16)
np.random.seed(16)
random.seed(16)

print(f"✅ PyTorch {torch.__version__} ready!")
print(f"📊 Libraries imported successfully!")

In [None]:
import platform

def detect_device():
    """
    Detect the best available PyTorch device with comprehensive hardware support.
    
    Priority order:
    1. CUDA (NVIDIA GPUs) - Best performance for deep learning
    2. MPS (Apple Silicon) - Optimized for M1/M2/M3 Macs  
    3. CPU (Universal) - Always available fallback
    
    Returns:
        torch.device: The optimal device for PyTorch operations
        str: Human-readable device description for logging
    """
    # Check for CUDA (NVIDIA GPU)
    if torch.cuda.is_available():
        device = torch.device("cuda")
        gpu_name = torch.cuda.get_device_name(0)
        device_info = f"CUDA GPU: {gpu_name}"
        
        # Additional CUDA info for optimization
        cuda_version = torch.version.cuda
        gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
        
        print(f"🚀 Using CUDA acceleration")
        print(f"   GPU: {gpu_name}")
        print(f"   CUDA Version: {cuda_version}")
        print(f"   GPU Memory: {gpu_memory:.1f} GB")
        
        return device, device_info
    
    # Check for MPS (Apple Silicon)
    elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        device = torch.device("mps")
        device_info = "Apple Silicon MPS"
        
        # Get system info for Apple Silicon
        system_info = platform.uname()
        
        print(f"🍎 Using Apple Silicon MPS acceleration")
        print(f"   System: {system_info.system} {system_info.release}")
        print(f"   Machine: {system_info.machine}")
        print(f"   Processor: {system_info.processor}")
        
        return device, device_info
    
    # Fallback to CPU
    else:
        device = torch.device("cpu")
        device_info = "CPU (No GPU acceleration available)"
        
        # Get CPU info for optimization guidance
        cpu_count = torch.get_num_threads()
        system_info = platform.uname()
        
        print(f"💻 Using CPU (no GPU acceleration detected)")
        print(f"   Processor: {system_info.processor}")
        print(f"   PyTorch Threads: {cpu_count}")
        print(f"   System: {system_info.system} {system_info.release}")
        
        # Provide optimization suggestions for CPU-only setups
        print(f"\n💡 CPU Optimization Tips:")
        print(f"   • Reduce batch size to prevent memory issues")
        print(f"   • Consider using smaller models for faster training")
        print(f"   • Tune the intra-op thread count with torch.set_num_threads(n) if needed (currently {cpu_count})")
        
        return device, device_info

# Usage in all PyTorch notebooks
device, device_info = detect_device()
print(f"\n✅ PyTorch device selected: {device}")
print(f"📊 Device info: {device_info}")

# Set global device for the notebook
DEVICE = device

In [None]:
# Australian Tourism Dataset with English-Vietnamese Examples
print("🇦🇺 Creating Australian Tourism Dataset with Multilingual Support")
print("=" * 70)

# Australian tourism reviews with English-Vietnamese pairs
australian_tourism_data = {
    'english': [
        # Positive reviews
        "The Sydney Opera House is absolutely breathtaking! Worth every dollar.",
        "Melbourne's coffee culture is world-class. Amazing baristas everywhere!",
        "Uluru at sunset was a spiritual experience I'll never forget.",
        "Perth beaches are perfect for families with young children.",
        "Brisbane's South Bank is a fantastic place for weekend activities.",
        "Adelaide's food scene exceeded all my expectations. Incredible wines!",
        "Darwin's tropical climate and laid-back vibe are absolutely perfect.",
        "Hobart's MONA museum is mind-blowing and thought-provoking.",
        "Canberra's national galleries showcase Australia's rich cultural heritage.",
        "The Great Barrier Reef snorkeling was the highlight of my trip.",
        "Blue Mountains scenic railway offers spectacular mountain views.",
        "Gold Coast theme parks provide endless fun for the whole family.",
        
        # Neutral reviews
        "Sydney Harbour Bridge climb was okay, but quite expensive for what it is.",
        "Melbourne weather is unpredictable, pack clothes for all seasons.",
        "Perth is very isolated but has decent shopping and dining options.",
        "Brisbane can be quite humid during summer months, plan accordingly.",
        "Adelaide is smaller than expected but has its own unique charm.",
        "Darwin has limited attractions but the markets are interesting.",
        
        # Negative reviews
        "Sydney accommodation prices are absolutely outrageous and unreasonable.",
        "Melbourne trams are constantly delayed and overcrowded during peak hours.",
        "Perth nightlife is disappointing and closes way too early.",
        "Brisbane's public transport system needs significant improvements.",
        "Adelaide becomes very quiet after 6 PM, limited evening entertainment.",
        "Darwin is extremely expensive for basic necessities and groceries."
    ],
    
    'vietnamese': [
        # Positive reviews (Vietnamese)
        "Nhà hát Opera Sydney thật ngoạn mục! Xứng đáng từng đồng tiền.",
        "Văn hóa cà phê Melbourne đẳng cấp thế giới. Barista tuyệt vời ở khắp nơi!",
        "Uluru lúc hoàng hôn là trải nghiệm tâm linh tôi sẽ không bao giờ quên.",
        "Bãi biển Perth hoàn hảo cho các gia đình có con nhỏ.",
        "South Bank Brisbane là nơi tuyệt vời cho hoạt động cuối tuần.",
        "Ẩm thực Adelaide vượt xa mong đợi của tôi. Rượu vang tuyệt vời!",
        "Khí hậu nhiệt đới và không khí thư giãn ở Darwin thật hoàn hảo.",
        "Bảo tàng MONA Hobart thật ấn tượng và kích thích tư duy.",
        "Các phòng trưng bày quốc gia Canberra thể hiện di sản văn hóa Úc.",
        "Lặn ngắm san hô Great Barrier Reef là điểm nhấn chuyến đi.",
        "Đường sắt Blue Mountains mang đến tầm nhìn núi non tuyệt đẹp.",
        "Công viên giải trí Gold Coast mang lại niềm vui cho cả gia đình.",
        
        # Neutral reviews (Vietnamese)
        "Leo cầu Sydney Harbour Bridge cũng được, nhưng khá đắt so với giá trị.",
        "Thời tiết Melbourne khó đoán, hãy mang quần áo cho mọi mùa.",
        "Perth rất biệt lập nhưng có lựa chọn mua sắm và ăn uống tốt.",
        "Brisbane có thể khá ẩm ướt vào mùa hè, hãy lên kế hoạch phù hợp.",
        "Adelaide nhỏ hơn mong đợi nhưng có nét quyến rũ riêng.",
        "Darwin có ít điểm tham quan nhưng các khu chợ khá thú vị.",
        
        # Negative reviews (Vietnamese)
        "Giá chỗ ở Sydney thật phi lý và quá đắt đỏ.",
        "Tàu điện Melbourne liên tục chậm trễ và quá tải giờ cao điểm.",
        "Cuộc sống về đêm Perth thất vọng và đóng cửa quá sớm.",
        "Hệ thống giao thông công cộng Brisbane cần cải thiện đáng kể.",
        "Adelaide trở nên rất yên tĩnh sau 6 giờ chiều, ít giải trí buổi tối.",
        "Darwin cực kỳ đắt đỏ cho những nhu cầu cơ bản và thực phẩm."
    ],
    
    'labels': [
        # Positive labels (1)
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        # Neutral labels (0) 
        0, 0, 0, 0, 0, 0,
        # Negative labels (-1)
        -1, -1, -1, -1, -1, -1
    ]
}

# Create DataFrame for easy manipulation
tourism_df = pd.DataFrame({
    'english_text': australian_tourism_data['english'],
    'vietnamese_text': australian_tourism_data['vietnamese'], 
    'sentiment': australian_tourism_data['labels']
})

# Add sentiment labels
sentiment_map = {1: 'positive', 0: 'neutral', -1: 'negative'}
tourism_df['sentiment_label'] = tourism_df['sentiment'].map(sentiment_map)

print(f"📊 Dataset Statistics:")
print(f"   Total reviews: {len(tourism_df)}")
print(f"   Languages: English, Vietnamese")
print(f"   Sentiment distribution:")
print(tourism_df['sentiment_label'].value_counts().to_string())

# Display sample data
print(f"\n🌟 Sample Australian Tourism Reviews:")
for i in range(3):
    row = tourism_df.iloc[i]
    print(f"\n{i+1}. Sentiment: {row['sentiment_label'].upper()}")
    print(f"   🇬🇧 English: {row['english_text']}")
    print(f"   🇻🇳 Vietnamese: {row['vietnamese_text']}")

tourism_df.head()
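
The dataset is imbalanced: half of the reviews are positive. A quick sketch of the majority-class baseline that any trained classifier should beat, with the label layout copied from the cell above:

```python
from collections import Counter

# Same label layout as the dataset above: 12 positive, 6 neutral, 6 negative
labels = [1] * 12 + [0] * 6 + [-1] * 6

counts = Counter(labels)
majority_label, majority_count = counts.most_common(1)[0]

# Accuracy obtained by always predicting the most frequent class
baseline_accuracy = majority_count / len(labels)

print(dict(counts))
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

A validation accuracy near 0.50 therefore means the model has learned little beyond "always predict positive".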

In [None]:
class AustralianTextPreprocessor:
    """
    Text preprocessing pipeline for Australian tourism content with multilingual support.
    
    Handles both English and Vietnamese text with Australian-specific considerations:
    - Australian English spelling and terminology
    - Vietnamese diacritics and tone marks
    - Tourism-specific vocabulary (Sydney, Melbourne, etc.)
    
    TensorFlow equivalent:
        tf.keras.preprocessing.text.Tokenizer with custom filters
    """
    
    def __init__(self, max_vocab_size=10000, max_length=128):
        self.max_vocab_size = max_vocab_size
        self.max_length = max_length
        self.word_to_idx = {'<PAD>': 0, '<UNK>': 1, '<START>': 2, '<END>': 3}
        self.idx_to_word = {0: '<PAD>', 1: '<UNK>', 2: '<START>', 3: '<END>'}
        self.vocab_size = 4
        
        # Australian cities and landmarks for special handling
        self.australian_entities = {
            'cities': ['sydney', 'melbourne', 'brisbane', 'perth', 'adelaide', 'darwin', 'hobart', 'canberra'],
            'landmarks': ['opera', 'harbour', 'bridge', 'uluru', 'reef', 'mountains'],
            'states': ['nsw', 'vic', 'qld', 'wa', 'sa', 'nt', 'tas', 'act']
        }
    
    def clean_text(self, text):
        """
        Clean and normalize text while preserving Australian and Vietnamese characteristics.
        """
        if not isinstance(text, str):
            return ""
        
        # Convert to lowercase
        text = text.lower()
        
        # Expand common English contractions (normalise curly apostrophes first)
        text = text.replace("’", "'")
        contractions = {
            "can't": "cannot",
            "won't": "will not",
            "i'm": "i am",
            "you're": "you are",
            "it's": "it is",
            "that's": "that is",
            "there's": "there is",
            "we're": "we are",
            "they're": "they are"
        }
        
        for contraction, expansion in contractions.items():
            text = text.replace(contraction, expansion)
        
        # Replace all remaining punctuation with spaces (word characters and whitespace are kept)
        text = re.sub(r'[^\w\s]', ' ', text)
        
        # Handle multiple spaces
        text = re.sub(r'\s+', ' ', text)
        
        return text.strip()
    
    def build_vocabulary(self, texts):
        """
        Build vocabulary from Australian tourism texts.
        """
        word_freq = Counter()
        
        print("🔤 Building vocabulary from Australian tourism corpus...")
        
        for text in texts:
            cleaned = self.clean_text(text)
            words = cleaned.split()
            word_freq.update(words)
        
        # Get most common words, excluding special tokens
        most_common = word_freq.most_common(self.max_vocab_size - 4)
        
        # Add words to vocabulary
        for word, freq in most_common:
            if word not in self.word_to_idx:
                self.word_to_idx[word] = self.vocab_size
                self.idx_to_word[self.vocab_size] = word
                self.vocab_size += 1
        
        print(f"   📊 Vocabulary size: {self.vocab_size}")
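        # Count exact matches only; substring matching would count e.g. 'was' as the state code 'wa'
        entity_terms = {ent for entities in self.australian_entities.values() for ent in entities}
        entities_found = sum(1 for w in self.word_to_idx if w in entity_terms)
        print(f"   🇦🇺 Australian entities found: {entities_found}")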
        
        # Show most common Australian tourism words
        tourism_words = [word for word, freq in most_common[:20]]
        print(f"   🌟 Top tourism words: {', '.join(tourism_words[:10])}")
    
    def encode_text(self, text):
        """
        Convert text to sequence of token IDs.
        """
        cleaned = self.clean_text(text)
        words = cleaned.split()
        
        # Convert words to indices
        indices = [self.word_to_idx.get(word, self.word_to_idx['<UNK>']) for word in words]
        
        # Add start and end tokens
        indices = [self.word_to_idx['<START>']] + indices + [self.word_to_idx['<END>']]
        
        # Pad or truncate to max_length, keeping <END> as the final real token
        if len(indices) > self.max_length:
            indices = indices[:self.max_length - 1] + [self.word_to_idx['<END>']]
        else:
            indices.extend([self.word_to_idx['<PAD>']] * (self.max_length - len(indices)))
        
        return indices
    
    def decode_text(self, indices):
        """
        Convert sequence of token IDs back to text.
        """
        words = []
        for idx in indices:
            word = self.idx_to_word.get(idx, '<UNK>')
            if word not in ['<PAD>', '<START>', '<END>']:
                words.append(word)
        return ' '.join(words)

# Initialize preprocessor
preprocessor = AustralianTextPreprocessor(max_vocab_size=5000, max_length=64)

# Build vocabulary from both English and Vietnamese texts
all_texts = tourism_df['english_text'].tolist() + tourism_df['vietnamese_text'].tolist()
preprocessor.build_vocabulary(all_texts)

print(f"\n🔧 Text Preprocessing Pipeline Ready!")
print(f"   Max vocabulary size: {preprocessor.max_vocab_size}")
print(f"   Max sequence length: {preprocessor.max_length}")
print(f"   Actual vocabulary size: {preprocessor.vocab_size}")
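
The `[^\w\s]` pattern in `clean_text` keeps Vietnamese letters intact only when they are stored as precomposed (NFC) characters. In decomposed (NFD) form the combining tone marks are not word characters, so they are replaced by spaces and the words break apart. A small sketch of the effect, using a hypothetical `strip_punctuation` helper that mirrors the two regex steps of `clean_text`; normalising with `unicodedata.normalize("NFC", ...)` first is one possible safeguard, not something the notebook currently does:

```python
import re
import unicodedata


def strip_punctuation(text):
    """Mirror clean_text's regex steps: drop non-word characters, collapse spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


phrase = "Tuyệt vời!"
nfc = unicodedata.normalize("NFC", phrase)  # precomposed letters
nfd = unicodedata.normalize("NFD", phrase)  # base letters + combining marks

print(strip_punctuation(nfc).split())  # two tokens
print(strip_punctuation(nfd).split())  # words are split where the marks were
print(strip_punctuation(unicodedata.normalize("NFC", nfd)).split())  # two tokens again
```

Text typed directly into a notebook is usually already NFC, so this matters mainly for reviews loaded from external files or scraped pages.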

In [None]:
# Prepare training data with both English and Vietnamese texts
print("📊 Preparing Australian Tourism Training Data")
print("=" * 50)

# Encode texts
english_encoded = [preprocessor.encode_text(text) for text in tourism_df['english_text']]
vietnamese_encoded = [preprocessor.encode_text(text) for text in tourism_df['vietnamese_text']]

# Combine English and Vietnamese data for multilingual training
all_encoded_texts = english_encoded + vietnamese_encoded
all_labels = tourism_df['sentiment'].tolist() * 2  # Duplicate labels for both languages

# Convert to tensors
X = torch.tensor(all_encoded_texts, dtype=torch.long)
y = torch.tensor(all_labels, dtype=torch.long)

# Adjust labels for classification (0, 1, 2 instead of -1, 0, 1)
y = y + 1  # Now: 0=negative, 1=neutral, 2=positive

print(f"📈 Dataset prepared:")
print(f"   Input shape: {X.shape}")
print(f"   Labels shape: {y.shape}")
print(f"   Vocabulary size: {preprocessor.vocab_size}")
print(f"   Sequence length: {preprocessor.max_length}")

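# Split data for training and validation.
# Caveat: each review appears in both languages, so a translated pair can land on
# opposite sides of the split, which may make validation accuracy optimistic.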
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\n🔄 Train/Validation Split:")
print(f"   Training samples: {len(X_train)}")
print(f"   Validation samples: {len(X_val)}")

# Create data loaders for efficient training
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

batch_size = 16 if DEVICE.type == 'cpu' else 32  # Adjust batch size based on device
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

print(f"\n⚡ Data loaders created:")
print(f"   Batch size: {batch_size} (optimized for {DEVICE.type.upper()})")
print(f"   Training batches: {len(train_loader)}")
print(f"   Validation batches: {len(val_loader)}")

# Show example of encoded text
sample_idx = 0
sample_text = tourism_df['english_text'].iloc[sample_idx]
sample_encoded = english_encoded[sample_idx]
sample_decoded = preprocessor.decode_text(sample_encoded)

print(f"\n🔍 Encoding Example:")
print(f"   Original: {sample_text}")
print(f"   Encoded: {sample_encoded[:10]}... (showing first 10 tokens)")
print(f"   Decoded: {sample_decoded}")
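
To make the encoding step concrete, here is a minimal pure-Python trace of roughly what `encode_text` does to one short review. The five-word vocabulary and `max_length=8` are made up for illustration; only the special-token indices match the preprocessor above:

```python
# Special tokens use the same indices as AustralianTextPreprocessor
PAD, UNK, START, END = 0, 1, 2, 3

# Toy vocabulary (hypothetical; the real one is built from the corpus)
word_to_idx = {"perth": 4, "beaches": 5, "are": 6, "perfect": 7, "sydney": 8}


def encode(words, max_length=8):
    """Map words to IDs, wrap in <START>/<END>, then pad or truncate."""
    ids = [START] + [word_to_idx.get(w, UNK) for w in words] + [END]
    if len(ids) > max_length:
        return ids[:max_length]
    return ids + [PAD] * (max_length - len(ids))


encoded = encode("perth beaches are lovely".split())
print(encoded)  # [2, 4, 5, 6, 1, 3, 0, 0]  ("lovely" is out of vocabulary -> <UNK>)
```

Every sequence comes out with the same length, which is what allows the encoded reviews to be stacked into a single `torch.long` tensor.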

In [None]:
class AustralianTourismSentimentClassifier(nn.Module):
    """
    Neural network for Australian tourism sentiment analysis with multilingual support.
    
    Architecture:
    - Embedding layer: Maps vocabulary words to dense vectors
    - LSTM layer: Processes sequence of embeddings to capture context
    - Dropout: Prevents overfitting with regularization
    - Linear layers: Classify sentiment (negative, neutral, positive)
    
    TensorFlow equivalent:
        model = tf.keras.Sequential([
            tf.keras.layers.Embedding(vocab_size, embed_dim),
            tf.keras.layers.LSTM(hidden_dim, return_sequences=False),
            tf.keras.layers.Dropout(dropout_rate),
            tf.keras.layers.Dense(hidden_dim // 2, activation='relu'),
            tf.keras.layers.Dropout(dropout_rate),
            tf.keras.layers.Dense(num_classes, activation='softmax')  # PyTorch version returns raw logits instead
        ])
    
    Args:
        vocab_size (int): Size of vocabulary
        embed_dim (int): Dimension of embedding vectors
        hidden_dim (int): LSTM hidden dimension
        num_classes (int): Number of sentiment classes (3)
        dropout_rate (float): Dropout probability
    """
    
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=3, dropout_rate=0.3):
        super(AustralianTourismSentimentClassifier, self).__init__()
        
        # Store parameters
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        self.num_classes = num_classes
        
        # Embedding layer - converts token IDs to dense vectors
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        
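        # LSTM for sequence processing. nn.LSTM's `dropout` argument only acts
        # between stacked layers, so it is left unset for this single-layer LSTM
        # (setting it here would only trigger a warning).
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim, 
            batch_first=True, 
            bidirectional=False
        )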
        
        # Dropout for regularization
        self.dropout = nn.Dropout(dropout_rate)
        
        # Classification head
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(hidden_dim // 2, num_classes)
        )
        
        # Initialize weights
        self._init_weights()
        
        # Sentiment labels for interpretation
        self.sentiment_labels = ['negative', 'neutral', 'positive']
        
        # Australian cities for context
        self.australian_cities = ["Sydney", "Melbourne", "Brisbane", "Perth", 
                                "Adelaide", "Darwin", "Hobart", "Canberra"]
    
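    def _init_weights(self):
        """Apply Xavier/zero initialisation explicitly (PyTorch layers otherwise use their own defaults)."""
        for name, param in self.named_parameters():
            if 'weight' in name:
                if len(param.shape) > 1:
                    nn.init.xavier_uniform_(param)
                else:
                    nn.init.zeros_(param)
            elif 'bias' in name:
                nn.init.zeros_(param)
        
        # Xavier init also overwrote the embedding's padding row; reset it to zeros
        # so that <PAD> tokens contribute a zero vector as padding_idx intends
        with torch.no_grad():
            self.embedding.weight[self.embedding.padding_idx].zero_()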
    
    def forward(self, x):
        """
        Forward pass through the network.
        
        Args:
            x (torch.Tensor): Input token IDs [batch_size, seq_len]
        
        Returns:
            torch.Tensor: Logits for sentiment classification [batch_size, num_classes]
        """
        batch_size, seq_len = x.shape
        
        # Embedding lookup: [batch_size, seq_len] -> [batch_size, seq_len, embed_dim]
        embedded = self.embedding(x)
        
        # LSTM processing: [batch_size, seq_len, embed_dim] -> [batch_size, seq_len, hidden_dim]
        lstm_out, (hidden, cell) = self.lstm(embedded)
        
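        # Use the final hidden state for classification: [batch_size, hidden_dim].
        # Inputs are right-padded and not packed, so this state is taken after the trailing <PAD> steps.
        last_hidden = hidden[-1]  # Last layer's hidden state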
        
        # Apply dropout
        dropped = self.dropout(last_hidden)
        
        # Classification: [batch_size, hidden_dim] -> [batch_size, num_classes]
        logits = self.classifier(dropped)
        
        return logits
    
    def predict_sentiment(self, text_tensor, preprocessor):
        """
        Predict sentiment for Australian tourism text.
        
        Args:
            text_tensor (torch.Tensor): Encoded text tensor
            preprocessor: Text preprocessor for decoding
        
        Returns:
            dict: Prediction results with probabilities
        """
        self.eval()
        with torch.no_grad():
            if text_tensor.dim() == 1:
                text_tensor = text_tensor.unsqueeze(0)  # Add batch dimension
            
            logits = self(text_tensor.to(next(self.parameters()).device))
            probabilities = F.softmax(logits, dim=1)
            predicted_class = torch.argmax(probabilities, dim=1).item()
            confidence = probabilities[0][predicted_class].item()
            
            return {
                'sentiment': self.sentiment_labels[predicted_class],
                'confidence': confidence,
                'probabilities': {
                    label: prob.item() 
                    for label, prob in zip(self.sentiment_labels, probabilities[0])
                }
            }
    
    def get_model_info(self) -> dict:
        """Return model architecture information."""
        total_params = sum(p.numel() for p in self.parameters())
        trainable_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        
        return {
            'model_name': 'Australian Tourism Sentiment Classifier',
            'vocab_size': self.vocab_size,
            'embed_dim': self.embed_dim,
            'hidden_dim': self.hidden_dim,
            'num_classes': self.num_classes,
            'total_params': total_params,
            'trainable_params': trainable_params,
            'target_cities': ', '.join(self.australian_cities[:4]) + '...'
        }

# Initialize the model with device-aware setup
model = AustralianTourismSentimentClassifier(
    vocab_size=preprocessor.vocab_size,
    embed_dim=128,
    hidden_dim=256,
    num_classes=3,
    dropout_rate=0.3
).to(DEVICE)

# Display model information
model_info = model.get_model_info()
print("🏗️ Australian Tourism Sentiment Classifier")
print("=" * 50)
for key, value in model_info.items():
    print(f"   {key.replace('_', ' ').title()}: {value}")

print(f"\n📱 Model device: {next(model.parameters()).device}")
print(f"🎯 Target: Classify sentiment of Australian tourism reviews")
print(f"🌏 Languages: English and Vietnamese")

# Model summary (similar to TensorFlow model.summary())
print(f"\n🔧 Model Architecture:")
print(model)
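
The `total_params` figure reported by `get_model_info` can be checked by hand. The sketch below assumes the architecture defined above (one embedding table, a single-layer unidirectional LSTM, and the two-layer classifier head); `count_parameters` is a hypothetical helper, and the vocabulary size of 1,000 is only an example value:

```python
def count_parameters(vocab_size, embed_dim=128, hidden_dim=256, num_classes=3):
    """Hand-count the parameters of the embedding + LSTM + classifier stack."""
    embedding = vocab_size * embed_dim
    # One LSTM layer holds 4 gates, each with input weights, recurrent
    # weights, and two bias vectors (PyTorch keeps bias_ih and bias_hh).
    lstm = 4 * hidden_dim * (embed_dim + hidden_dim) + 8 * hidden_dim
    # Linear(hidden_dim, hidden_dim // 2) followed by Linear(hidden_dim // 2, num_classes)
    half = hidden_dim // 2
    head = (hidden_dim * half + half) + (half * num_classes + num_classes)
    return {"embedding": embedding, "lstm": lstm, "head": head,
            "total": embedding + lstm + head}


print(count_parameters(vocab_size=1000))
```

With these defaults the LSTM and head account for 428,547 parameters regardless of vocabulary size; the embedding table adds 128 per vocabulary entry.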

In [None]:
import time
from datetime import datetime

# Platform-specific TensorBoard log directory setup
def get_run_logdir(name="australian_sentiment"):
    """Create unique log directory for TensorBoard."""
    if IS_COLAB:
        root_logdir = "/content/tensorboard_logs"
    elif IS_KAGGLE:
        root_logdir = "./tensorboard_logs"
    else:
        root_logdir = "./tensorboard_logs"
    
    os.makedirs(root_logdir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, f"{name}_{timestamp}")

# Setup training configuration
class TrainingConfig:
    """Training configuration for Australian sentiment classifier."""
    
    def __init__(self, device):
        self.device = device
        
        # Adjust hyperparameters based on device
        if device.type == 'cuda':
            self.epochs = 20
            self.learning_rate = 0.001
            self.batch_size = 32
        elif device.type == 'mps':
            self.epochs = 15
            self.learning_rate = 0.001
            self.batch_size = 16
        else:  # CPU
            self.epochs = 10
            self.learning_rate = 0.002
            self.batch_size = 8
        
        self.weight_decay = 1e-4
        self.patience = 5  # Early stopping patience
        self.log_interval = 10  # Log every N batches

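config = TrainingConfig(DEVICE)
# The DataLoaders were built earlier with their own batch size; keep the config
# in sync so the summary printed below reports the value actually used
config.batch_size = train_loader.batch_size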

# Setup loss function and optimizer
criterion = nn.CrossEntropyLoss()  # TensorFlow: loss='sparse_categorical_crossentropy'
optimizer = optim.Adam(
    model.parameters(), 
    lr=config.learning_rate, 
    weight_decay=config.weight_decay
)  # TensorFlow: optimizer='adam'

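# Learning rate scheduler: halve the LR when validation accuracy stops improving.
# (`verbose` is deprecated in recent PyTorch releases; the training loop records the LR instead.)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.5, patience=3
)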

# TensorBoard setup
run_logdir = get_run_logdir("australian_tourism_sentiment")
writer = SummaryWriter(log_dir=run_logdir)

print("⚙️ Training Configuration")
print("=" * 40)
print(f"   Device: {config.device}")
print(f"   Epochs: {config.epochs}")
print(f"   Learning Rate: {config.learning_rate}")
print(f"   Batch Size: {config.batch_size}")
print(f"   Weight Decay: {config.weight_decay}")
print(f"   Early Stopping Patience: {config.patience}")

print(f"\n📊 TensorBoard Logging:")
print(f"   Log Directory: {run_logdir}")
print(f"   Logging Interval: Every {config.log_interval} batches")

# Device-specific optimizations
if DEVICE.type == 'cuda':
    torch.backends.cudnn.benchmark = True
    print(f"\n🔧 CUDA optimizations enabled")
elif DEVICE.type == 'mps':
    print(f"\n🔧 MPS device detected - optimizing for Apple Silicon")
else:
    # PyTorch chooses a default intra-op thread count; call torch.set_num_threads(n) only to override it
    print(f"\n🔧 CPU mode: using {torch.get_num_threads()} threads")

print(f"\n🚀 Ready to train Australian tourism sentiment classifier!")
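
`ReduceLROnPlateau` with `mode='max'`, `factor=0.5` and `patience=3` halves the learning rate once the monitored metric has failed to improve for more than three consecutive epochs. The following is a simplified pure-Python sketch of that rule for illustration only. It ignores the scheduler's cooldown and minimum-LR options, and `simulate_plateau_schedule` and the accuracy values are hypothetical:

```python
def simulate_plateau_schedule(metrics, lr=0.001, factor=0.5, patience=3, threshold=1e-4):
    """Return the learning rate in effect after each epoch's metric is seen."""
    best = float("-inf")
    bad_epochs = 0
    lrs = []
    for value in metrics:
        # "max" mode with a relative threshold: tiny gains do not count
        if value > best * (1 + threshold):
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce only once the run of bad epochs exceeds the patience
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs


val_acc = [0.50, 0.60, 0.60, 0.60, 0.60, 0.60]  # made-up validation accuracies
print(simulate_plateau_schedule(val_acc))
```

In this trace the accuracy stalls at 0.60 from the second epoch, and the rate drops from 0.001 to 0.0005 on the sixth epoch, after four epochs without improvement.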

In [None]:
def train_australian_sentiment_model(model, train_loader, val_loader, config, criterion, optimizer, scheduler, writer):
    """
    Train the Australian tourism sentiment classifier with comprehensive logging.
    
    TensorFlow equivalent:
        history = model.fit(
            train_data, epochs=epochs, 
            validation_data=val_data,
            callbacks=[tensorboard_callback]
        )
    
    Args:
        model: PyTorch model to train
        train_loader: Training data loader
        val_loader: Validation data loader
        config: Training configuration
        criterion: Loss function
        optimizer: Optimizer
        scheduler: Learning rate scheduler
        writer: TensorBoard writer
    
    Returns:
        dict: Training history with metrics
    """
    
    # Training history
    history = {
        'train_loss': [],
        'train_acc': [],
        'val_loss': [],
        'val_acc': [],
        'learning_rates': []
    }
    
    best_val_acc = 0.0
    patience_counter = 0
    
    print("🏋️ Training Australian Tourism Sentiment Classifier")
    print("=" * 60)
    print(f"📝 Sample predictions target:")
    print(f"   🇦🇺 'Sydney Opera House is amazing!' -> Positive")
    print(f"   🇻🇳 'Cà phê Melbourne đắt quá' -> Negative")
    print(f"   🇦🇺 'Perth beaches are okay' -> Neutral")
    print("\n" + "=" * 60)
    
    for epoch in range(config.epochs):
        start_time = time.time()
        
        # Training phase
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_total = 0
        
        for batch_idx, (data, target) in enumerate(train_loader):
            # Move data to device
            data, target = data.to(config.device), target.to(config.device)
            
            # Zero gradients (required in PyTorch, automatic in TensorFlow)
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(data)
            loss = criterion(outputs, target)
            
            # Backward pass (explicit in PyTorch)
            loss.backward()
            
            # Gradient clipping to prevent exploding gradients
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            # Update parameters
            optimizer.step()
            
            # Calculate metrics
            train_loss += loss.item()
            predicted = outputs.argmax(dim=1)
            train_total += target.size(0)
            train_correct += (predicted == target).sum().item()
            
            # Log to TensorBoard every N batches
            if batch_idx % config.log_interval == 0:
                step = epoch * len(train_loader) + batch_idx
                writer.add_scalar('Loss/Train_Batch', loss.item(), step)
                writer.add_scalar('Accuracy/Train_Batch', 
                                (predicted == target).float().mean().item(), step)
                
                # Log device-specific metrics
                if config.device.type == 'cuda':
                    gpu_memory_used = torch.cuda.memory_allocated(config.device) / 1024**3
                    writer.add_scalar('Memory/GPU_Used_GB', gpu_memory_used, step)
        
        # Calculate epoch training metrics
        train_loss = train_loss / len(train_loader)
        train_acc = train_correct / train_total
        
        # Validation phase
        model.eval()
        val_loss = 0.0
        val_correct = 0
        val_total = 0
        all_predictions = []
        all_targets = []
        
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(config.device), target.to(config.device)
                outputs = model(data)
                loss = criterion(outputs, target)
                
                val_loss += loss.item()
                predicted = outputs.argmax(dim=1)  # class index with the highest logit
                val_total += target.size(0)
                val_correct += (predicted == target).sum().item()
                
                # Store for detailed metrics
                all_predictions.extend(predicted.cpu().numpy())
                all_targets.extend(target.cpu().numpy())
        
        val_loss = val_loss / len(val_loader)
        val_acc = val_correct / val_total
        current_lr = optimizer.param_groups[0]['lr']
        
        # Update learning rate scheduler
        scheduler.step(val_acc)
        
        # Store history
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)
        history['learning_rates'].append(current_lr)
        
        # Log epoch metrics to TensorBoard
        writer.add_scalar('Loss/Train_Epoch', train_loss, epoch)
        writer.add_scalar('Loss/Validation', val_loss, epoch)
        writer.add_scalar('Accuracy/Train_Epoch', train_acc, epoch)
        writer.add_scalar('Accuracy/Validation', val_acc, epoch)
        writer.add_scalar('Learning_Rate', current_lr, epoch)
        
        # Log model parameters histogram
        for name, param in model.named_parameters():
            if param.grad is not None:
                writer.add_histogram(f'Parameters/{name}', param, epoch)
                writer.add_histogram(f'Gradients/{name}', param.grad, epoch)
        
        # Calculate epoch time
        epoch_time = time.time() - start_time
        
        # Print progress
        print(f'Epoch {epoch+1:2d}/{config.epochs}: '
              f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} | '
              f'Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f} | '
              f'LR: {current_lr:.6f} | Time: {epoch_time:.1f}s')
        
        # Early stopping check
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            patience_counter = 0
            # Save best model
            torch.save(model.state_dict(), 'best_australian_sentiment_model.pth')
            print(f'   🎯 New best validation accuracy: {best_val_acc:.4f}')
        else:
            patience_counter += 1
            if patience_counter >= config.patience:
                print(f'\n⏰ Early stopping triggered after {epoch+1} epochs')
                print(f'   Best validation accuracy: {best_val_acc:.4f}')
                break
    
    writer.close()
    
    print(f"\n🎉 Training completed!")
    print(f"   Final validation accuracy: {val_acc:.4f}")
    print(f"   Best validation accuracy: {best_val_acc:.4f}")
    print(f"   Model saved as: best_australian_sentiment_model.pth")
    
    return history

# Start training
print("🚀 Starting training of Australian Tourism Sentiment Classifier...")
training_history = train_australian_sentiment_model(
    model, train_loader, val_loader, config, criterion, optimizer, scheduler, writer
)
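
The early-stopping bookkeeping inside the training loop can be isolated into a small helper. The sketch below is a hypothetical refactoring, not part of the notebook: the class name `EarlyStopping` and the accuracy values are made up for illustration, and the logic mirrors the `best_val_acc` / `patience_counter` variables used above.

```python
class EarlyStopping:
    """Track the best validation score and signal when to stop.

    Hypothetical helper: the notebook keeps this state in plain variables
    (best_val_acc, patience_counter) inside the training function.
    """

    def __init__(self, patience):
        self.patience = patience
        self.best_score = float("-inf")
        self.counter = 0

    def update(self, score):
        """Return (improved, should_stop) for the latest validation score."""
        if score > self.best_score:
            self.best_score = score
            self.counter = 0
            return True, False
        self.counter += 1
        return False, self.counter >= self.patience


stopper = EarlyStopping(patience=2)
val_accuracies = [0.60, 0.72, 0.71, 0.70, 0.90]  # made-up values for illustration
stopped_at = None
for epoch, val_acc in enumerate(val_accuracies, start=1):
    improved, should_stop = stopper.update(val_acc)
    if should_stop:
        stopped_at = epoch
        break

print(f"Stopped after epoch {stopped_at}, best accuracy {stopper.best_score:.2f}")
```

With a patience of 2, training halts after the second consecutive epoch without improvement, so the later 0.90 is never reached. That is the usual trade-off: a small patience saves time but can stop before a late improvement.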

In [None]:
# Training Results Visualization
def plot_training_history(history):
    """
    Plot training metrics with Australian tourism context.
    
    Uses seaborn for better aesthetics as per repository guidelines.
    """
    
    # Create training metrics DataFrame for seaborn
    epochs = range(1, len(history['train_loss']) + 1)
    
    # Prepare data for seaborn
    metrics_data = []
    for epoch in epochs:
        idx = epoch - 1
        metrics_data.extend([
            {'Epoch': epoch, 'Metric': 'Loss', 'Dataset': 'Train', 'Value': history['train_loss'][idx]},
            {'Epoch': epoch, 'Metric': 'Loss', 'Dataset': 'Validation', 'Value': history['val_loss'][idx]},
            {'Epoch': epoch, 'Metric': 'Accuracy', 'Dataset': 'Train', 'Value': history['train_acc'][idx]},
            {'Epoch': epoch, 'Metric': 'Accuracy', 'Dataset': 'Validation', 'Value': history['val_acc'][idx]}
        ])
    
    df_metrics = pd.DataFrame(metrics_data)
    
    # Create subplots with seaborn styling
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # Loss plot
    loss_data = df_metrics[df_metrics['Metric'] == 'Loss']
    sns.lineplot(data=loss_data, x='Epoch', y='Value', hue='Dataset', ax=axes[0])
    axes[0].set_title('🇦🇺 Australian Tourism Sentiment Analysis - Training Loss', fontsize=14, fontweight='bold')
    axes[0].set_ylabel('Loss')
    axes[0].grid(True, alpha=0.3)
    
    # Accuracy plot
    acc_data = df_metrics[df_metrics['Metric'] == 'Accuracy']
    sns.lineplot(data=acc_data, x='Epoch', y='Value', hue='Dataset', ax=axes[1])
    axes[1].set_title('🎯 Model Accuracy Progress', fontsize=14, fontweight='bold')
    axes[1].set_ylabel('Accuracy')
    axes[1].grid(True, alpha=0.3)
    
    # Learning rate plot
    axes[2].plot(epochs, history['learning_rates'], 'o-', color='orange', linewidth=2, markersize=4)
    axes[2].set_title('📉 Learning Rate Schedule', fontsize=14, fontweight='bold')
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('Learning Rate')
    axes[2].grid(True, alpha=0.3)
    axes[2].set_yscale('log')
    
    plt.tight_layout()
    plt.show()
    
    # Print final metrics
    final_train_acc = history['train_acc'][-1]
    final_val_acc = history['val_acc'][-1]
    best_val_acc = max(history['val_acc'])
    
    print(f"\n📊 Final Training Results:")
    print(f"   Final Training Accuracy: {final_train_acc:.4f} ({final_train_acc*100:.2f}%)")
    print(f"   Final Validation Accuracy: {final_val_acc:.4f} ({final_val_acc*100:.2f}%)")
    print(f"   Best Validation Accuracy: {best_val_acc:.4f} ({best_val_acc*100:.2f}%)")
    
    # Model performance assessment
    if best_val_acc > 0.8:
        print(f"   🎉 Excellent performance for Australian tourism sentiment analysis!")
    elif best_val_acc > 0.7:
        print(f"   ✅ Good performance, ready for Australian tourism applications!")
    else:
        print(f"   ⚠️  Model may need more training or data for optimal performance.")

# Plot training results
plot_training_history(training_history)

In [None]:
# Test the trained model with new Australian tourism examples
def test_australian_sentiment_model():
    """
    Test the trained model with fresh Australian tourism examples.
    """
    
    # Load the best model
    model.load_state_dict(torch.load('best_australian_sentiment_model.pth', map_location=DEVICE))
    model.eval()
    
    # New test examples (not seen during training)
    test_examples = [
        # English examples
        {
            'text': "The Sydney Harbour Bridge climb was absolutely incredible! Spectacular views of the entire city!",
            'language': 'English',
            'expected': 'positive'
        },
        {
            'text': "Melbourne's weather ruined our entire vacation. Constantly raining and cold.",
            'language': 'English', 
            'expected': 'negative'
        },
        {
            'text': "Perth has some nice beaches but the city center is quite basic.",
            'language': 'English',
            'expected': 'neutral'
        },
        
        # Vietnamese examples
        {
            'text': "Thành phố Brisbane rất sạch sẽ và thân thiện với du khách!",
            'language': 'Vietnamese',
            'expected': 'positive'
        },
        {
            'text': "Giá cả ở Adelaide quá đắt đỏ, không đáng với chất lượng dịch vụ.",
            'language': 'Vietnamese',
            'expected': 'negative'
        },
        {
            'text': "Darwin có khí hậu ấm áp nhưng không có nhiều hoạt động giải trí.",
            'language': 'Vietnamese',
            'expected': 'neutral'
        }
    ]
    
    print("🧪 Testing Australian Tourism Sentiment Classifier")
    print("=" * 65)
    
    correct_predictions = 0
    total_predictions = len(test_examples)
    
    for i, example in enumerate(test_examples):
        # Encode the text
        encoded_text = preprocessor.encode_text(example['text'])
        text_tensor = torch.tensor([encoded_text], dtype=torch.long)
        
        # Get prediction
        prediction = model.predict_sentiment(text_tensor, preprocessor)
        
        # Check if prediction matches expected
        is_correct = prediction['sentiment'] == example['expected']
        if is_correct:
            correct_predictions += 1
        
        # Display results
        status_emoji = "✅" if is_correct else "❌"
        confidence_bar = "█" * int(prediction['confidence'] * 10)
        
        print(f"\n{i+1}. {status_emoji} {example['language']} Text Analysis:")
        print(f"   📝 Text: {example['text'][:60]}{'...' if len(example['text']) > 60 else ''}")
        print(f"   🎯 Expected: {example['expected'].capitalize()}")
        print(f"   🤖 Predicted: {prediction['sentiment'].capitalize()} ({prediction['confidence']:.3f})")
        print(f"   📊 Confidence: {confidence_bar} {prediction['confidence']*100:.1f}%")
        
        # Show all probabilities
        print(f"   📈 All Probabilities:")
        for sentiment, prob in prediction['probabilities'].items():
            print(f"      {sentiment.capitalize()}: {prob:.3f}")
    
    # Final test accuracy
    test_accuracy = correct_predictions / total_predictions
    print(f"\n🎯 Test Accuracy: {correct_predictions}/{total_predictions} = {test_accuracy:.3f} ({test_accuracy*100:.1f}%)")
    
    if test_accuracy >= 0.8:
        print(f"🌟 Excellent! Model performs very well on Australian tourism sentiment analysis.")
    elif test_accuracy >= 0.6:
        print(f"👍 Good performance! Model shows solid understanding of Australian tourism sentiment.")
    else:
        print(f"⚠️  Model needs improvement for reliable Australian tourism sentiment classification.")
    
    return test_accuracy

# Run model testing
test_accuracy = test_australian_sentiment_model()
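
The confidence and probability values printed above would typically come from a softmax over the model's raw outputs. `predict_sentiment` is defined earlier in the notebook, so the following is only an illustrative sketch of that computation in plain Python. The label order and the logits are assumptions, not values taken from the trained model.

```python
import math

# Assumed label order; the notebook's actual index-to-label mapping may differ.
LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def summarise(logits):
    """Build a result dict shaped like the one printed in the test loop."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "sentiment": LABELS[best],
        "confidence": probs[best],
        "probabilities": dict(zip(LABELS, probs)),
    }


result = summarise([-1.2, 0.3, 2.5])  # made-up logits for one review
print(result["sentiment"], f"{result['confidence']:.3f}")
```

The confidence is simply the largest probability, so it says how peaked the distribution is rather than how likely the prediction is to be correct.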

In [None]:
# TensorBoard Visualization Instructions
print("📊 TENSORBOARD VISUALIZATION")
print("=" * 60)
print(f"Log directory: {run_logdir}")
print("\n🚀 To view TensorBoard:")

if IS_COLAB:
    print("   In Google Colab:")
    print("   1. Run: %load_ext tensorboard")
    print(f"   2. Run: %tensorboard --logdir {run_logdir}")
    print("   3. TensorBoard will appear inline in the notebook")
    
    # Auto-load TensorBoard in Colab
    try:
        %load_ext tensorboard
        %tensorboard --logdir {run_logdir}
    except Exception:
        print("   Note: Run the commands above manually in Colab")
        
elif IS_KAGGLE:
    print("   In Kaggle:")
    print(f"   1. Download logs from: {run_logdir}")
    print("   2. Run locally: tensorboard --logdir ./tensorboard_logs")
    print("   3. Open http://localhost:6006 in browser")
else:
    print("   Locally:")
    print(f"   1. Run: tensorboard --logdir {run_logdir}")
    print("   2. Open http://localhost:6006 in browser")

print("\n📈 Available visualizations:")
print("   • Scalars: Loss, accuracy, learning rate over time")
print("   • Histograms: Model parameter distributions")
print("   • Training Progress: Batch-level and epoch-level metrics")
print("   • Memory Usage: GPU memory utilization (if available)")
print("   • Australian Context: Tourism sentiment analysis progress")

print("\n🔍 Key metrics to examine:")
print("   📉 Loss curves: Should decrease over time")
print("   📈 Accuracy curves: Should increase and converge")
print("   🎛️  Learning rate: Should adapt based on validation performance")
print("   📊 Parameter histograms: Should show healthy distributions")

print("=" * 60)
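
The training loop calls `scheduler.step(val_acc)`, which suggests a plateau-based scheduler that watches validation accuracy. The scheduler is created earlier in the notebook, so the sketch below only assumes it is `ReduceLROnPlateau` in `mode="max"`. The `factor`, `patience`, and accuracy values are illustrative, not the notebook's settings.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# A single dummy parameter is enough to build an optimizer for the demo
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)

# mode="max" because the monitored metric (accuracy) should increase
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=1)

lrs = []
for val_acc in [0.50, 0.50, 0.50]:  # made-up plateau in validation accuracy
    scheduler.step(val_acc)
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

The first call sets the baseline, the second counts as one epoch without improvement, and the third exceeds the patience of 1, so the learning rate is halved. On the TensorBoard `Learning_Rate` chart this appears as a step down.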

## 🔄 TensorFlow vs PyTorch: Key Differences Summary

This section summarizes the key differences between TensorFlow and PyTorch approaches demonstrated in this notebook, helping with the transition from TensorFlow to PyTorch for NLP applications.

### Model Definition

**TensorFlow (Keras)**:
```python
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(128, return_sequences=False),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
```

**PyTorch**:
```python
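class SentimentClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, 128, batch_first=True)
        self.dropout = nn.Dropout(0.3)
        self.hidden = nn.Linear(128, 64)  # mirrors Dense(64, activation='relu')
        self.relu = nn.ReLU()
        self.classifier = nn.Linear(64, 3)

    def forward(self, x):
        embedded = self.embedding(x)          # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)  # hidden: (num_layers, batch, 128)
        features = self.relu(self.hidden(self.dropout(hidden[-1])))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so there is no softmax layer here (unlike the Keras model)
        return self.classifier(features)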
```

### Training Loop

**TensorFlow**:
```python
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_data, epochs=10, validation_data=val_data)
```

**PyTorch**:
```python
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

for epoch in range(epochs):
    model.train()
    for data, target in train_loader:      # each batch is a (data, target) pair
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(data)              # forward pass returns logits
        loss = criterion(outputs, target)
        loss.backward()                    # compute gradients
        optimizer.step()                   # update parameters
```
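
Keras runs validation for you through `validation_data`. In PyTorch the counterpart is a separate pass under `model.eval()` and `torch.no_grad()`. The following is a minimal sketch of such a pass, modelled on the validation phase of this notebook's training function. The `evaluate` helper, the toy linear model, and the random batches are stand-ins so the snippet runs on its own.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, criterion, device):
    """Return (mean batch loss, accuracy) over a loader of (data, target) pairs."""
    model.eval()  # switch dropout and batch norm to inference behaviour
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():  # no autograd graph is recorded during evaluation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            outputs = model(data)
            total_loss += criterion(outputs, target).item()
            correct += (outputs.argmax(dim=1) == target).sum().item()
            total += target.size(0)
    return total_loss / len(loader), correct / total


# Toy stand-ins so the sketch runs on its own (not the notebook's model or data)
device = torch.device("cpu")
toy_model = nn.Linear(4, 3).to(device)
toy_loader = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(2)]

val_loss, val_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss(), device)
print(f"val_loss={val_loss:.4f}, val_acc={val_acc:.4f}")
```

Remember to call `model.train()` again before the next training epoch, since `model.eval()` stays in effect until it is switched back.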

### Key Differences Table

| Aspect | TensorFlow | PyTorch |
|--------|------------|----------|
| **Model Definition** | Sequential/Functional API | Object-oriented with `nn.Module` |
| **Training** | `model.fit()` (automatic) | Manual training loop |
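| **Gradients** | Handled inside `model.fit()`, or recorded explicitly with `tf.GradientTape` | Autograd, triggered explicitly with `loss.backward()` |
| **Graph Execution** | Static graph (TF 1.x) / eager by default, graphs via `tf.function` (TF 2.x) | Dynamic graph, eager by default (`torch.compile` is optional) |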
| **Device Management** | `with tf.device()` context | `.to(device)` method calls |
| **Debugging** | More challenging (especially TF 1.x) | Easier with standard Python debugging |
| **Flexibility** | High-level API focus | More explicit control |
| **Learning Curve** | Easy for simple Keras models, steeper for custom training logic | Gentler, more intuitive for Python developers |
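
The device-management row can be made concrete with a short sketch. The `pick_device` helper is hypothetical (the notebook sets up its own device earlier). It only uses standard PyTorch availability checks and falls back to CPU when no accelerator is present.

```python
import torch
import torch.nn as nn


def pick_device():
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
layer = nn.Linear(4, 2).to(device)    # Module.to moves the parameters in place
batch = torch.randn(3, 4).to(device)  # Tensor.to returns a tensor; keep the result
output = layer(batch)
print(device.type, tuple(output.shape))
```

The model and every input batch must live on the same device. Forgetting to reassign the result of `tensor.to(device)` is a common source of device-mismatch errors.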

### Advantages of Each Framework

**TensorFlow Advantages**:
- 🚀 **Production Ready**: TensorFlow Serving, TensorFlow Lite
- 📱 **Mobile/Edge**: Better mobile and edge deployment
- 🔧 **High-level APIs**: Keras makes simple models very easy
- 📊 **Visualization**: Native TensorBoard integration
- 🏭 **Ecosystem**: Mature production ecosystem

**PyTorch Advantages**:
- 🐍 **Pythonic**: More intuitive for Python developers
- 🔬 **Research**: Preferred for research and experimentation
- 🐛 **Debugging**: Easier debugging with standard Python tools
- ⚡ **Dynamic**: Dynamic computational graphs
- 🎯 **Control**: More explicit control over training process

### When to Choose PyTorch

Choose PyTorch for Australian NLP projects when:
- 🔬 **Research & Experimentation**: Trying new architectures or techniques
- 🎓 **Learning**: Understanding deep learning fundamentals
- 🤗 **NLP Focus**: Working with Hugging Face transformers ecosystem
- 🐍 **Python Preference**: Team prefers explicit, Pythonic code
- 🔧 **Custom Models**: Building complex, custom neural architectures

### Transition Tips

For TensorFlow developers moving to PyTorch:

1. **Start with `nn.Module`**: Think of it as a more explicit version of Keras layers
2. **Manual Training Loops**: Initially more code, but more control and transparency
3. **Device Management**: Explicitly move tensors and models to GPU/CPU
4. **Gradient Management**: PyTorch accumulates gradients across backward passes, so call `optimizer.zero_grad()` before each `loss.backward()`
5. **Debugging**: Use standard Python debugging tools and `print()` statements
6. **Hugging Face**: PyTorch integrates seamlessly with modern NLP libraries
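
Tip 4 is easiest to see on a single scalar parameter. This is a minimal sketch using only core autograd calls. The variable names are illustrative, and the manual reset stands in for what `optimizer.zero_grad()` does for every parameter it manages.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()        # d(3w)/dw = 3

(3 * w).backward()           # second backward pass without clearing
accumulated = w.grad.item()  # gradients add up: 3 + 3 = 6

w.grad = None                # manual reset of the stored gradient
(3 * w).backward()
after_reset = w.grad.item()  # a fresh gradient of 3 again

print(first, accumulated, after_reset)
```

The accumulation is deliberate: it lets you sum gradients over several small batches before a single `optimizer.step()`. In an ordinary loop, though, stale gradients from the previous batch would corrupt each update.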


## 🎉 Congratulations! Deep Learning NLP with PyTorch Complete

You've successfully completed a comprehensive introduction to deep learning for NLP using PyTorch with Australian context examples and English-Vietnamese multilingual support!

### 🏆 What You've Accomplished

✅ **PyTorch Fundamentals**: Mastered tensors, models, and training loops for NLP

✅ **Australian Context**: Built a sentiment classifier for Australian tourism reviews

✅ **Multilingual NLP**: Handled both English and Vietnamese text processing

✅ **Neural Networks**: Implemented LSTM-based architecture from scratch

✅ **Training Pipeline**: Created complete training loop with validation and early stopping

✅ **Visualization**: Integrated TensorBoard for comprehensive training monitoring

✅ **Device Optimization**: Implemented device-aware training (CUDA/MPS/CPU)

✅ **TensorFlow Transition**: Learned key differences and migration strategies

### 📊 Model Performance

Your Australian Tourism Sentiment Classifier can now:
- 🇦🇺 Analyze sentiment in Australian tourism reviews
- 🇻🇳 Process Vietnamese translations and reviews
- 🎯 Classify text as positive, neutral, or negative sentiment
- 📱 Run efficiently on various devices (GPU, Apple Silicon, CPU)
- 📊 Provide confidence scores and probability distributions

### 🚀 Next Steps in Your PyTorch NLP Journey

Continue your learning with the remaining notebooks in this series:

#### 1. 🔤 Word Embeddings (`02_word_embeddings_nlp.ipynb`)
- **Focus**: Deep dive into word representation techniques
- **Australian Context**: Train embeddings on Australian tourism corpus
- **Multilingual**: English-Vietnamese word alignment and cross-lingual embeddings
- **Techniques**: Word2Vec, GloVe, FastText, and visualization

#### 2. 🔄 Sequence Models (`03_sequence_models_nlp.ipynb`)
- **Focus**: Advanced LSTM, GRU, and attention mechanisms
- **Australian Context**: Part-of-speech tagging and named entity recognition
- **Multilingual**: Sequence-to-sequence translation models
- **Techniques**: Bidirectional RNNs, attention, and seq2seq architectures

#### 3. 🚀 Advanced NLP (`04_advanced_nlp.ipynb`)
- **Focus**: Bi-LSTM CRF and state-of-the-art techniques
- **Australian Context**: Named entity recognition for Australian locations
- **Integration**: Bridge to Hugging Face transformers
- **Techniques**: CRF layers, dynamic computation graphs, and modern architectures

### 🛠️ Practical Applications

Apply your new skills to real Australian NLP projects:

- **Tourism Analysis**: Analyze TripAdvisor reviews for Australian destinations
- **Social Media**: Monitor sentiment about Australian events and locations
- **Customer Service**: Build multilingual chatbots for Australian businesses
- **News Analysis**: Process Australian news articles in multiple languages
- **E-commerce**: Analyze product reviews for Australian retailers

### 📚 Additional Resources

Expand your PyTorch NLP knowledge:

- 🔗 [PyTorch NLP Tutorials](https://pytorch.org/tutorials/beginner/nlp/)
- 🤗 [Hugging Face Transformers](https://huggingface.co/transformers/)
- 📖 [Natural Language Processing with PyTorch](https://www.oreilly.com/library/view/natural-language-processing/9781491978221/)
- 🎓 [Stanford CS224N: NLP with Deep Learning](http://web.stanford.edu/class/cs224n/)
- 🇦🇺 [Australian Text Analytics Platform](https://www.atap.edu.au/)

### 🤝 Community and Support

Join the PyTorch and NLP community:

- 💬 [PyTorch Forums](https://discuss.pytorch.org/)
- 🐦 [PyTorch Twitter](https://twitter.com/pytorch)
- 📧 [Hugging Face Discord](https://discord.gg/JfAtkvEtRb)
- 📱 [r/MachineLearning](https://www.reddit.com/r/MachineLearning/)

### 🌟 Keep Experimenting!

The best way to master PyTorch NLP is through hands-on practice:

1. **Modify the Model**: Try different architectures, hyperparameters, and optimizers
2. **Expand the Dataset**: Add more Australian tourism data or other domains
3. **Add Languages**: Incorporate other languages beyond English and Vietnamese
4. **Deploy Models**: Create APIs and web applications with your trained models
5. **Contribute**: Share your Australian NLP models and datasets with the community

---

**🎊 Congratulations on completing your first PyTorch NLP project with Australian flair! You're now ready to tackle real-world natural language processing challenges with confidence. 🇦🇺🚀**