# Explanation


## Workshop 3 — RNN: From-Scratch Concept and Text Classification (Kaggle)

**Author:** Arman Golbidi  
**Course:** BM20A6100 Advanced Data Analysis and Machine Learning  
**Period:** 1 Sep 2025 – 12 Dec 2025

---

## 1) Introduction

This notebook implements a compact text classification pipeline with a vanilla **PyTorch RNN**. We (i) review the core mechanics of an RNN conceptually, from scratch, and (ii) train a practical classifier on a small Kaggle news-style dataset. Hyperparameters are tuned with **Optuna** (20 trials). We then retrain with the best settings and report accuracy, learning curves, and a confusion matrix.

---

## 2) Dataset

- **Source:** *Text classification documentation* (Kaggle).  
- **Size:** 2,225 labeled documents.  
- **Labels (5):** `0=Politics`, `1=Sport`, `2=Technology`, `3=Entertainment`, `4=Business`.  
- **Split:** Train 80% (1780), Val 10% (222), Test 10% (223).

### Preprocessing
- Lowercase text and drop empty rows.
- Build a frequency-based **vocabulary of 5,000** entries, including the special tokens `<PAD>=0` and `<UNK>=1`. This keeps about 8.25% of the 60,616 unique whitespace-separated tokens; everything else maps to `<UNK>`.
- Convert each document to a **fixed-length sequence of 50** token IDs via truncation/padding.
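
The three preprocessing steps can be sketched on a toy corpus. This is a minimal illustration that assumes whitespace tokenization, as in the notebook; the helper names `build_vocab` and `encode`, the toy sentences, and the tiny sizes are hypothetical and chosen only for the example.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"

def build_vocab(texts, vocab_size):
    """Map the most frequent tokens to ids, reserving 0 for PAD and 1 for UNK."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {PAD: 0, UNK: 1}
    for token, _ in counts.most_common(vocab_size - 2):
        vocab[token] = len(vocab)
    return vocab

def encode(text, vocab, seq_len):
    """Truncate to seq_len tokens, map tokens to ids, then right-pad with PAD."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()[:seq_len]]
    return ids + [vocab[PAD]] * (seq_len - len(ids))

# Toy corpus (hypothetical); the notebook uses vocab_size=5000 and seq_len=50.
texts = ["the match ended the season", "the market fell as the market closed"]
vocab = build_vocab(texts, vocab_size=4)      # PAD, UNK + the 2 most frequent tokens
encoded = encode("The market rallied", vocab, seq_len=5)
print(vocab)
print(encoded)
```

Here `rallied` is not in the vocabulary, so it falls back to the `<UNK>` id, and the two unused positions are filled with the `<PAD>` id.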

---

## 3) Methods

### Model
`Embedding → nn.RNN (tanh) → Dropout → Linear` using the **final hidden state** as sequence representation.
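
As a from-scratch view of what the recurrent layer computes, the sketch below runs the tanh recurrence `h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)` in plain Python and returns the final hidden state, which is what the classifier head consumes. The weights and inputs are hypothetical toy values, and the single bias `b_h` is a simplification: PyTorch's `nn.RNN` keeps two bias vectors that are added together.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One tanh RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run the recurrence over a sequence and return the final hidden state."""
    h = [0.0] * len(b_h)              # initial state h_0 = 0
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h

# Hypothetical toy weights: input size 2, hidden size 2.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0]]         # two time steps of "embedded" tokens
h_final = rnn_forward(xs, W_xh, W_hh, b_h)
print(h_final)
```

Because every step passes through `tanh`, each component of the hidden state stays strictly between -1 and 1, and the same weights are reused at every time step.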

### Training
- **Loss:** Cross-Entropy, with label smoothing when the installed PyTorch version supports it (otherwise plain cross-entropy).
- **Optimizer:** Adam + weight decay; **LR scheduler:** `ReduceLROnPlateau`.
- **Regularization:** Dropout, gradient clipping (max-norm 5.0), early stopping on validation loss.
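
To make the label-smoothing term in the loss concrete, here is a minimal plain-Python sketch. It assumes the usual definition in which the one-hot target is mixed with a uniform distribution over the `K` classes, `q = (1 - eps) * one_hot + eps / K`; the function name and the example logits are hypothetical.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.0):
    """Cross-entropy against a smoothed target: (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    m = max(logits)                                    # stabilise the log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / k + (1.0 - smoothing) * (1.0 if i == target else 0.0)
        loss -= q * lp
    return loss

logits = [2.0, 0.5, 0.1, -1.0, 0.3]    # hypothetical scores for the 5 classes
plain = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
smooth = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
print(plain, smooth)
```

When the model already favours the correct class, the smoothed loss is larger than the plain one, which discourages over-confident logits.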

### Hyperparameter Tuning (Optuna, 20 trials)
Search over:
- `embedding_dim ∈ {32, 64, 128}`
- `hidden_size ∈ {32, 64, 96, 128, 160}`
- `num_layers ∈ {1, 2, 3}`
- `dropout ∈ [0.1, 0.6]`
- `batch_size ∈ {16, 24, 32, 48, 64}`
- `learning_rate ∈ [1e-4, 5e-3]` (sampled on a log scale)
- `weight_decay ∈ [1e-6, 1e-3]` (sampled on a log scale)
- `label_smoothing ∈ [0.0, 0.2]`
- `tune_epochs ∈ [25, 60]`

**Best validation accuracy:** `0.6667` (trial 16).  
The model is then retrained with the best configuration for up to **150 epochs**, with early stopping on validation loss (patience 20).
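
The learning rate and weight decay are searched on a log scale (`log=True` in the notebook's `suggest_float` calls). The sketch below shows what log-uniform sampling means using only the standard library. It illustrates the shape of the search space, not Optuna's TPE proposals, and the helper name `sample_log_uniform` is hypothetical.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)   # fixed seed for repeatability
draws = [sample_log_uniform(rng, 1e-4, 5e-3) for _ in range(1000)]
below_1e3 = sum(d < 1e-3 for d in draws) / len(draws)
print(f"share of draws below 1e-3: {below_1e3:.2f}")
```

On a log scale, the decade from `1e-4` to `1e-3` takes up `log(10) / log(50)`, or roughly 59%, of the range, whereas uniform sampling on a linear scale would place only about 18% of draws there.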

---

## 4) Results

- Final **train/val/test accuracies** are printed after training.
- **Learning curves** (loss & accuracy) and **confusion matrix** are saved to:
  - `rnn_pytorch_training_results.png`
- **Per-class test accuracy** is saved to:
  - `rnn_pytorch_per_class_accuracy.png`
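
Per-class accuracy here means the share of each true class's test samples that were predicted correctly, which equals the diagonal of the confusion matrix divided by its row totals. A minimal sketch with hypothetical labels (the helper names are illustrative and are not taken from scikit-learn):

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal count divided by row total (0.0 for classes with no samples)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

# Hypothetical labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
cm = build_confusion_matrix(y_true, y_pred, n_classes=3)
acc = per_class_accuracy(cm)
print(cm)
print(acc)
```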

> Quick interpretation: The model performs best on *Sport* and *Technology*, and is weaker on *Politics*. There is a noticeable train–val gap indicating mild overfitting.

**Figures**  
(If running in Jupyter, display with `IPython.display.Image` or just open the files)
- `rnn_pytorch_training_results.png`: training/validation curves and test confusion matrix  
- `rnn_pytorch_per_class_accuracy.png`: per-class accuracy bars

---

## 5) Discussion & Next Steps

- The vanilla RNN baseline reaches a best validation accuracy of `0.6667` during tuning, which leaves clear room for improvement on this small dataset.
- Potential upgrades:
  - Replace RNN with **BiLSTM or GRU**.
  - Use **pretrained subword embeddings** (e.g., fastText) to handle out-of-vocabulary (OOV) words better.
  - Mitigate the mild class imbalance (386 to 511 documents per class) with **class weighting** or **focal loss**.
  - Try hybrid features (e.g., TF-IDF + neural) or shallow ensembling.
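
As an illustration of the class-weighting idea, the sketch below computes inverse-frequency weights from the label counts printed by the notebook, using the common "balanced" heuristic `n_samples / (n_classes * count_c)`. The function name is hypothetical, and whether weighting helps here is untested.

```python
# Label counts as printed in the notebook's "Label distribution" output.
counts = {0: 417, 1: 511, 2: 401, 3: 386, 4: 510}

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

weights = balanced_class_weights(counts)
for label, w in weights.items():
    print(label, round(w, 3))
```

The resulting weights stay within roughly 0.87 to 1.15, which reflects how mild the imbalance is. In PyTorch they could be passed, ordered by label id, as the `weight` tensor of `nn.CrossEntropyLoss`.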

---

## 6) Reproducibility

- Best hyperparameters saved to `optuna_best_params.json`.
- All trials exported to `optuna_trials.csv`.
- Trained model + metadata saved as `rnn_pytorch_model.pt`.

> To reproduce, place `df_file.csv` (with columns `Text`, `Label`) next to the notebook, then run all cells. Figures will be generated automatically.


# Code part

# -*- coding: utf-8 -*-
"""
PyTorch RNN Text Classification + Optuna (20 trials) — Verbose, compatible with older torch
Dataset: df_file.csv with columns ['Text', 'Label'] and 5 classes

Pipeline:
- Preprocess -> vocab (5000) -> indexify (seq_len=50)
- PyTorch RNN with Embedding + nn.RNN
- Optuna tunes: embedding_dim, hidden_size, num_layers, dropout, batch_size,
                learning_rate, weight_decay, label_smoothing, tune_epochs
- Retrain final model with best hyperparameters (max 150 epochs)
- Saves:
    * optuna_trials.csv
    * optuna_best_params.json
    * rnn_pytorch_training_results.png
    * rnn_pytorch_per_class_accuracy.png
    * rnn_pytorch_model.pt
"""

# -----------------------------
# Standard library imports
# -----------------------------
import os
import sys
import subprocess
import json
import warnings
warnings.filterwarnings("ignore")  # silence ALL warnings for cleaner logs (remove while debugging)

# -----------------------------
# Third-party imports
# -----------------------------
import numpy as np
import pandas as pd
import matplotlib; matplotlib.use("Agg")  # use non-interactive backend for saving figures
import matplotlib.pyplot as plt
from collections import Counter

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# -----------------------------
# Optuna import / auto-install
# -----------------------------
try:
    import optuna  # hyperparameter optimization framework
except ImportError:
    print("[Setup] Optuna not found. Installing optuna ...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "optuna"])
    import optuna  # import again after installation

optuna.logging.set_verbosity(optuna.logging.INFO)  # INFO is Optuna's default level; use WARNING to reduce log noise

# -----------------------------
# Config / reproducibility
# -----------------------------
GLOBAL_SEED = 42  # global random seed
np.random.seed(GLOBAL_SEED)
torch.manual_seed(GLOBAL_SEED)

N_TRIALS = 20           # number of Optuna trials
SEQ_LEN = 50            # fixed sequence length (tokens per sample)
VOCAB_SIZE = 5000       # vocabulary cap (top-k most frequent tokens + PAD/UNK)
FINAL_MAX_EPOCHS = 150  # upper bound for final training epochs

PRINT_LINE = "=" * 70  # pretty print separator
def hr(msg: str):
    """Print a big header line for readability."""
    print("\n" + PRINT_LINE)
    print(msg)
    print(PRINT_LINE)

# -----------------------------
# Device selection (no emojis)
# -----------------------------
def get_device():
    """Select best available device: MPS (Apple GPU) > CUDA > CPU."""
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        print("Using Mac GPU (MPS)")
        return torch.device("mps")
    if torch.cuda.is_available():
        print(f"Using CUDA GPU: {torch.cuda.get_device_name(0)}")
        return torch.device("cuda")
    print("Using CPU")
    return torch.device("cpu")

device = get_device()  # resolve device once and reuse
print(f"Device: {device}\n")

# -----------------------------
# Load data
# -----------------------------
hr("LOADING DATASET")
# from google.colab import files
# uploaded = files.upload()
# df = pd.read_csv(list(uploaded.keys())[0])

df = pd.read_csv('df_file.csv')  # expects columns: ['Text', 'Label']

print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"Number of classes: {df['Label'].nunique()}")
print("\nLabel distribution:")
print(df['Label'].value_counts().sort_index())

# fixed label to class-name mapping
class_names = {0: 'Politics', 1: 'Sport', 2: 'Technology', 3: 'Entertainment', 4: 'Business'}
print("\nClass mapping:")
for label, name in class_names.items():
    print(f"  {label}: {name} ({len(df[df['Label'] == label])} samples)")
print(PRINT_LINE)

# -----------------------------
# Preprocess
# -----------------------------
hr("PREPROCESSING")
print("[Step] Lowercasing and dropping empty rows ...")
df['Text'] = df['Text'].astype(str).str.lower()     # normalize to lowercase strings
df = df[df['Text'].str.len() > 0].reset_index(drop=True)  # drop empties and reindex
print(f"[Done] Dataset shape after preprocessing: {df.shape}")

# -----------------------------
# Vocabulary
# -----------------------------
hr("VOCABULARY")
print("[Step] Counting token frequencies ...")
all_words = []
for text in df['Text'].values:
    all_words.extend(text.split())  # whitespace tokenization

word_counts = Counter(all_words)  # frequency of each token
print(f"[Info] Total unique tokens: {len(word_counts)}")

print(f"[Step] Building vocab size={VOCAB_SIZE} with <PAD>=0, <UNK>=1 ...")
vocab = {'<PAD>': 0, '<UNK>': 1}  # reserve indices for padding and unknowns
for w, _ in word_counts.most_common(VOCAB_SIZE - 2):
    vocab[w] = len(vocab)  # assign next available index
coverage = (len(vocab) / max(1, len(word_counts))) * 100  # share of unique token types kept (type coverage, not occurrence coverage)
print(f"[Done] Vocab size: {len(vocab)} | Coverage: {coverage:.2f}%")

inverse_vocab = {idx: word for word, idx in vocab.items()}  # for human-readable previews

# -----------------------------
# Sequences
# -----------------------------
hr("SEQUENCE ENCODING")
print(f"[Info] Sequence length: {SEQ_LEN}")
print("[Step] Converting texts to index sequences ...")

X_sequences = []
for text in df['Text'].values:
    words = text.split()[:SEQ_LEN]  # truncate to max length
    seq = [vocab.get(w, vocab['<UNK>']) for w in words]  # map to ids w/ UNK fallback
    if len(seq) < SEQ_LEN:
        seq += [vocab['<PAD>']] * (SEQ_LEN - len(seq))  # right-pad with PAD tokens
    X_sequences.append(seq)

X_sequences = np.array(X_sequences, dtype=np.int64)  # shape: (N, T)
y_labels = df['Label'].astype(int).values            # shape: (N,)

print(f"[Done] X_sequences shape: {X_sequences.shape}")
print(f"[Done] y_labels shape:  {y_labels.shape}")

# -----------------------------
# Split
# -----------------------------
hr("DATA SPLIT")
# 80/10/10 stratified split: first hold out 20%, then split evenly into val/test
X_train, X_temp, y_train, y_temp = train_test_split(
    X_sequences, y_labels, test_size=0.2, random_state=GLOBAL_SEED, stratify=y_labels
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=GLOBAL_SEED, stratify=y_temp
)

print(f"[Info] Training set:   {X_train.shape[0]} samples ({X_train.shape[0]/len(X_sequences)*100:.1f}%)")
print(f"[Info] Validation set: {X_val.shape[0]} samples ({X_val.shape[0]/len(X_sequences)*100:.1f}%)")
print(f"[Info] Test set:       {X_test.shape[0]} samples ({X_test.shape[0]/len(X_sequences)*100:.1f}%)")

# -----------------------------
# Dataset class
# -----------------------------
class TextDataset(Dataset):
    """Thin torch Dataset wrapper around (sequences, labels)."""
    def __init__(self, sequences, labels):
        self.sequences = torch.LongTensor(sequences)  # (N, T)
        self.labels = torch.LongTensor(labels)        # (N,)
    def __len__(self):
        return len(self.sequences)
    def __getitem__(self, idx):
        return self.sequences[idx], self.labels[idx]

# Instantiate datasets for each split
train_dataset = TextDataset(X_train, y_train)
val_dataset   = TextDataset(X_val,   y_val)
test_dataset  = TextDataset(X_test,  y_test)

# -----------------------------
# Model
# -----------------------------
class RNNClassifier(nn.Module):
    """
    Embedding -> RNN -> Dropout -> Linear
    A simple classifier using the last hidden state of an nn.RNN.
    """
    def __init__(self, vocab_size, embedding_dim, hidden_size, output_size,
                 num_layers=2, dropout=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)  # PAD ignored in gradients
        self.rnn = nn.RNN(
            input_size=embedding_dim,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,                     # input/output as (B, T, *)
            dropout=dropout if num_layers > 1 else 0.0,  # internal dropout only if L>1
            nonlinearity='tanh'
        )
        self.dropout = nn.Dropout(dropout)        # regularization on last hidden state
        self.fc = nn.Linear(hidden_size, output_size)  # maps to class logits

    def forward(self, x):
        embedded = self.embedding(x)           # (B, T, E)
        out, h_n = self.rnn(embedded)          # out: (B, T, H); h_n: (L, B, H)
        # Top-layer final state: (B, H). Inputs are right-padded, so for short
        # documents this is the state after the trailing PAD steps.
        last_hidden = h_n[-1]
        dropped = self.dropout(last_hidden)    # apply dropout before classifier
        logits = self.fc(dropped)              # (B, C)
        return logits

# -----------------------------
# Train / Eval helpers
# -----------------------------
def current_lr(optimizer):
    """Read current learning rate from optimizer."""
    return float(optimizer.param_groups[0]['lr'])

def train_epoch(model, dataloader, criterion, optimizer, device):
    """One training epoch over a dataloader; returns (loss, accuracy)."""
    model.train()
    total_loss, correct, total = 0.0, 0, 0
    for sequences, labels in dataloader:
        sequences, labels = sequences.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(sequences)
        loss = criterion(logits, labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # stabilize training
        optimizer.step()

        total_loss += loss.item() * sequences.size(0)
        _, pred = torch.max(logits, 1)
        total += labels.size(0)
        correct += (pred == labels).sum().item()
    return total_loss / total, correct / total

def evaluate(model, dataloader, criterion, device):
    """Evaluation loop without gradient tracking; returns (loss, accuracy)."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for sequences, labels in dataloader:
            sequences, labels = sequences.to(device), labels.to(device)
            logits = model(sequences)
            loss = criterion(logits, labels)
            total_loss += loss.item() * sequences.size(0)
            _, pred = torch.max(logits, 1)
            total += labels.size(0)
            correct += (pred == labels).sum().item()
    return total_loss / total, correct / total

def make_criterion(label_smoothing_value: float):
    """Create CrossEntropyLoss; falls back if older torch lacks label_smoothing."""
    try:
        return nn.CrossEntropyLoss(label_smoothing=float(label_smoothing_value))
    except TypeError:
        return nn.CrossEntropyLoss()

def train_model(model, train_loader, val_loader, criterion, optimizer,
                scheduler, num_epochs, device, patience=10, log_prefix=""):
    """
    Train with early stopping on validation loss and ReduceLROnPlateau scheduler.
    Returns history dict with curves.
    """
    history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
    best_val_loss = float('inf')
    patience_counter = 0
    best_state = None

    print(f"{log_prefix}[Train] epochs={num_epochs}, batch_size={train_loader.batch_size}, "
          f"lr={current_lr(optimizer):.6f}, patience={patience}")

    for epoch in range(1, num_epochs + 1):
        tr_loss, tr_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        val_loss, val_acc = evaluate(model, val_loader, criterion, device)

        # Scheduler step with explicit LR-change log (no 'verbose' arg)
        lr_before = current_lr(optimizer)
        scheduler.step(val_loss)
        lr_after = current_lr(optimizer)
        lr_note = ""
        if lr_after < lr_before:
            lr_note = f" (LR reduced from {lr_before:.6f} to {lr_after:.6f})"

        history['train_loss'].append(tr_loss)
        history['train_acc'].append(tr_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        print(f"{log_prefix}Epoch {epoch:3d}/{num_epochs} | "
              f"Train Loss: {tr_loss:.4f} | Train Acc: {tr_acc:.4f} | "
              f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f} | "
              f"LR: {current_lr(optimizer):.6f}{lr_note}")

        # Track best weights by validation loss
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
        else:
            patience_counter += 1

        if patience_counter >= patience:
            print(f"{log_prefix}[EarlyStopping] triggered at epoch {epoch}")
            break

    if best_state is not None:
        model.load_state_dict(best_state)  # restore best weights
        print(f"{log_prefix}[Info] Loaded best model weights")

    return history

# -----------------------------
# Optuna Objective
# -----------------------------
def objective(trial: optuna.trial.Trial) -> float:
    """Define search space, train/validate, and return validation accuracy."""
    # Hyperparameter search space
    embedding_dim = trial.suggest_categorical("embedding_dim", [32, 64, 128])
    hidden_size   = trial.suggest_categorical("hidden_size",   [32, 64, 96, 128, 160])
    num_layers    = trial.suggest_categorical("num_layers",    [1, 2, 3])
    dropout       = trial.suggest_float("dropout", 0.1, 0.6)
    batch_size    = trial.suggest_categorical("batch_size",    [16, 24, 32, 48, 64])
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 5e-3, log=True)
    weight_decay  = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
    label_smooth  = trial.suggest_float("label_smoothing", 0.0, 0.2)
    tune_epochs   = trial.suggest_int("tune_epochs", 25, 60)
    patience      = 10  # early-stopping patience during tuning

    tnum = trial.number + 1
    header = f"[Trial {tnum:02d}/{N_TRIALS}] "

    print("\n" + "-" * 70)
    print(f"{header}START")
    print(f"{header}Params: embedding_dim={embedding_dim}, hidden_size={hidden_size}, "
          f"num_layers={num_layers}, dropout={dropout:.2f}, batch_size={batch_size}, "
          f"lr={learning_rate:.6f}, weight_decay={weight_decay:.1e}, "
          f"label_smoothing={label_smooth:.2f}, tune_epochs={tune_epochs}")

    # DataLoaders for this trial
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader   = DataLoader(val_dataset,   batch_size=batch_size, shuffle=False)

    # Model instantiation per trial
    model = RNNClassifier(
        vocab_size=len(vocab),
        embedding_dim=embedding_dim,
        hidden_size=hidden_size,
        output_size=len(class_names),
        num_layers=num_layers,
        dropout=dropout
    ).to(device)

    # Loss/optimizer/scheduler
    criterion = make_criterion(label_smooth)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    # Important: no 'verbose' argument for compatibility
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5)

    # Train (verbose per-epoch with prefix)
    _ = train_model(
        model, train_loader, val_loader, criterion, optimizer, scheduler,
        num_epochs=int(tune_epochs), device=device, patience=patience, log_prefix=header
    )

    # Validation accuracy (objective to maximize)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)
    print(f"{header}Done | Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}")

    trial.set_user_attr("val_acc", float(val_acc))  # store for analysis
    return float(val_acc)

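def trial_callback(total_trials: int):
    """Progress callback to print per-trial summary."""
    def _cb(study: optuna.study.Study, trial: optuna.trial.FrozenTrial):
        # Failed or pruned trials carry no value, and study.best_value raises
        # ValueError until at least one trial has completed, so guard first.
        if trial.value is None:
            print(f"[Optuna] Trial {trial.number + 1}/{total_trials} ended without a value "
                  f"(state: {trial.state.name})")
            return
        print(f"[Optuna] Completed Trial {trial.number + 1}/{total_trials} | "
              f"Value: {trial.value:.4f} | Best so far: {study.best_value:.4f}")
    return _cb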

# -----------------------------
# Run Optuna
# -----------------------------
hr(f"OPTUNA STUDY ({N_TRIALS} TRIALS)")
sampler = optuna.samplers.TPESampler(seed=GLOBAL_SEED)  # TPE sampler with fixed seed
study = optuna.create_study(direction="maximize", sampler=sampler)  # maximize val accuracy
study.optimize(objective, n_trials=N_TRIALS, callbacks=[trial_callback(N_TRIALS)], show_progress_bar=True)

print("\n[Optuna] Best trial:")
best_trial = study.best_trial
print(f"  Value (Val Acc): {best_trial.value:.4f}")
print("  Params:")
for k, v in best_trial.params.items():
    print(f"    {k}: {v}")

# Save trials dataframe
try:
    df_trials = study.trials_dataframe()
    df_trials.to_csv("optuna_trials.csv", index=False)
    print("[Output] Saved optuna_trials.csv")
except Exception as e:
    print(f"[Warn] Could not save trials dataframe: {e}")

# Save best params for reproducibility
with open("optuna_best_params.json", "w", encoding="utf-8") as f:
    json.dump(best_trial.params, f, indent=2)
print("[Output] Saved optuna_best_params.json")

# -----------------------------
# Final training with best params
# -----------------------------
hr("FINAL TRAINING WITH BEST HYPERPARAMETERS")

bp = best_trial.params  # shorthand
embedding_dim = int(bp["embedding_dim"])
hidden_size   = int(bp["hidden_size"])
num_layers    = int(bp["num_layers"])
dropout       = float(bp["dropout"])
batch_size    = int(bp["batch_size"])
learning_rate = float(bp["learning_rate"])
weight_decay  = float(bp["weight_decay"])
label_smooth  = float(bp["label_smoothing"])
num_epochs    = FINAL_MAX_EPOCHS
patience      = 20  # longer patience for final fit

print("[Final Config]")
print(f"  Vocab size:       {len(vocab)}")
print(f"  Embedding dim:    {embedding_dim}")
print(f"  Hidden size:      {hidden_size}")
print(f"  Num layers:       {num_layers}")
print(f"  Dropout:          {dropout}")
print(f"  Output size:      {len(class_names)}")
print(f"  Batch size:       {batch_size}")
print(f"  Learning rate:    {learning_rate}")
print(f"  Weight decay:     {weight_decay}")
print(f"  Label smoothing:  {label_smooth}")
print(f"  Max epochs:       {num_epochs}")
print(f"  Sequence length:  {SEQ_LEN}")

# Build final DataLoaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader   = DataLoader(val_dataset,   batch_size=batch_size, shuffle=False)
test_loader  = DataLoader(test_dataset,  batch_size=batch_size, shuffle=False)

# Final model with best hyperparameters
model = RNNClassifier(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    hidden_size=hidden_size,
    output_size=len(class_names),
    num_layers=num_layers,
    dropout=dropout
).to(device)

print("\nModel architecture:")
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

criterion = make_criterion(label_smooth)
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
# Important: no 'verbose' argument for compatibility
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5)

# Train and collect history for plotting
history = train_model(
    model, train_loader, val_loader, criterion, optimizer,
    scheduler, num_epochs, device, patience=patience, log_prefix="[Final] "
)

# -----------------------------
# Evaluation
# -----------------------------
hr("MODEL EVALUATION")
train_loss, train_acc = evaluate(model, train_loader, criterion, device)
val_loss, val_acc     = evaluate(model, val_loader,   criterion, device)
test_loss, test_acc   = evaluate(model, test_loader,  criterion, device)

print("\nFinal Accuracy Scores:")
print(f"  Training Accuracy:   {train_acc:.4f} ({train_acc*100:.2f}%)")
print(f"  Validation Accuracy: {val_acc:.4f} ({val_acc*100:.2f}%)")
print(f"  Test Accuracy:       {test_acc:.4f} ({test_acc*100:.2f}%)")

# Collect test predictions (for report/plots)
model.eval()
all_predictions, all_labels = [], []
with torch.no_grad():
    for sequences, labels in test_loader:
        sequences = sequences.to(device)
        logits = model(sequences)
        _, pred = torch.max(logits, 1)
        all_predictions.extend(pred.cpu().numpy())
        all_labels.extend(labels.numpy())

y_test_pred = np.array(all_predictions)
y_test_true = np.array(all_labels)

print("\n" + PRINT_LINE)
print("DETAILED CLASSIFICATION REPORT (Test Set)")
print(PRINT_LINE + "\n")
target_names = [class_names[i] for i in range(len(class_names))]
print(classification_report(y_test_true, y_test_pred, target_names=target_names))

# -----------------------------
# Visualizations
# -----------------------------
hr("PLOTTING & SAVING FIGURES")
print("[Plot] Training/Validation curves + Confusion Matrix -> rnn_pytorch_training_results.png")

plt.figure(figsize=(15, 5))
# Loss curves
plt.subplot(1, 3, 1)
plt.plot(history['train_loss'], label='Training Loss', linewidth=2)
plt.plot(history['val_loss'],   label='Validation Loss', linewidth=2)
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Training and Validation Loss')
plt.legend(); plt.grid(True, alpha=0.3)

# Accuracy curves
plt.subplot(1, 3, 2)
plt.plot(history['train_acc'], label='Training Accuracy', linewidth=2)
plt.plot(history['val_acc'],   label='Validation Accuracy', linewidth=2)
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Training and Validation Accuracy')
plt.legend(); plt.grid(True, alpha=0.3)

# Confusion matrix on test set
plt.subplot(1, 3, 3)
cm = confusion_matrix(y_test_true, y_test_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=target_names, yticklabels=target_names)
plt.xlabel('Predicted'); plt.ylabel('True'); plt.title('Confusion Matrix (Test Set)')
plt.xticks(rotation=45, ha='right'); plt.yticks(rotation=0)
plt.tight_layout()
plt.savefig('rnn_pytorch_training_results.png', dpi=300, bbox_inches='tight')
plt.close()

print("[Plot] Per-class accuracy -> rnn_pytorch_per_class_accuracy.png")
plt.figure(figsize=(10, 6))
per_class_acc = []
for i in range(len(class_names)):
    mask = (y_test_true == i)
    acc = np.mean(y_test_pred[mask] == y_test_true[mask]) if np.sum(mask) > 0 else 0.0
    per_class_acc.append(acc)
bars = plt.bar(range(len(class_names)), per_class_acc)
plt.xlabel('Class'); plt.ylabel('Accuracy'); plt.title('Per-Class Accuracy on Test Set')
plt.xticks(range(len(class_names)), target_names, rotation=45, ha='right')
plt.ylim([0, 1.1]); plt.grid(True, alpha=0.3, axis='y')
for bar, acc in zip(bars, per_class_acc):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
             f'{acc:.3f}', ha='center', va='bottom', fontsize=10, fontweight='bold')
plt.tight_layout()
plt.savefig('rnn_pytorch_per_class_accuracy.png', dpi=300, bbox_inches='tight')
plt.close()

print("[Output] Saved: rnn_pytorch_training_results.png, rnn_pytorch_per_class_accuracy.png")

# -----------------------------
# Sample predictions
# -----------------------------
hr("SAMPLE PREDICTIONS")
np.random.seed(GLOBAL_SEED)
k = min(5, len(X_test))  # show up to 5 samples
sample_indices = np.random.choice(len(X_test), size=k, replace=False)
model.eval()
with torch.no_grad():
    for idx in sample_indices:
        seq_tensor = torch.LongTensor(X_test[idx]).unsqueeze(0).to(device)  # (1, T)
        logits = model(seq_tensor)
        probs = torch.softmax(logits, dim=1).cpu().numpy()[0]
        pred_label = int(np.argmax(probs))
        true_label = int(y_test[idx])

        # reconstruct a short text preview (ignore PAD tokens)
        tokens = [inverse_vocab.get(int(tok), "<UNK>") for tok in X_test[idx] if int(tok) != vocab['<PAD>']]
        text_preview = " ".join(tokens[:20]) if tokens else "N/A"

        print(f"Text preview: {text_preview}...")
        print(f"True Label: {class_names[true_label]}")
        print(f"Predicted:  {class_names[pred_label]}")
        print(f"Confidence: {probs[pred_label]*100:.2f}%")
        print("Result: " + ("Correct" if true_label == pred_label else "Incorrect"))
        print("-" * 70)

# -----------------------------
# Save model
# -----------------------------
print("\nSaving model and metadata ...")
torch.save({
    'model_state_dict': model.state_dict(),  # learned weights
    'vocab': vocab,                          # token->id mapping
    'class_names': class_names,              # id->class mapping
    'hyperparameters': {                     # training configuration
        'embedding_dim': embedding_dim,
        'hidden_size': hidden_size,
        'num_layers': num_layers,
        'dropout': dropout,
        'batch_size': batch_size,
        'learning_rate': learning_rate,
        'weight_decay': weight_decay,
        'label_smoothing': label_smooth,
        'seq_len': SEQ_LEN,
        'vocab_size': len(vocab)
    }
}, 'rnn_pytorch_model.pt')
print("Saved model as: rnn_pytorch_model.pt")

print("\n" + PRINT_LINE)
print("TRAINING COMPLETED SUCCESSFULLY!")
print(PRINT_LINE)
print(f"\nFinal Test Accuracy: {test_acc*100:.2f}%")
print(f"Device used: {device}")


Using CUDA GPU: Tesla T4
Device: cuda


LOADING DATASET
Dataset shape: (2225, 2)
Columns: ['Text', 'Label']
Number of classes: 5

Label distribution:
Label
0    417
1    511
2    401
3    386
4    510
Name: count, dtype: int64

Class mapping:
  0: Politics (417 samples)
  1: Sport (511 samples)
  2: Technology (401 samples)
  3: Entertainment (386 samples)
  4: Business (510 samples)

PREPROCESSING
[Step] Lowercasing and dropping empty rows ...
[Done] Dataset shape after preprocessing: (2225, 2)

VOCABULARY
[Step] Counting token frequencies ...
[Info] Total unique tokens: 60616
[Step] Building vocab size=5000 with <PAD>=0, <UNK>=1 ...
[Done] Vocab size: 5000 | Coverage: 8.25%

SEQUENCE ENCODING
[Info] Sequence length: 50
[Step] Converting texts to index sequences ...


[I 2025-11-10 12:15:37,834] A new study created in memory with name: no-name-65c89984-5b88-4c82-ba52-c4fb69512272


[Done] X_sequences shape: (2225, 50)
[Done] y_labels shape:  (2225,)

DATA SPLIT
[Info] Training set:   1780 samples (80.0%)
[Info] Validation set: 222 samples (10.0%)
[Info] Test set:       223 samples (10.0%)

OPTUNA STUDY (20 TRIALS)


  0%|          | 0/20 [00:00<?, ?it/s]


----------------------------------------------------------------------
[Trial 01/20] START
[Trial 01/20] Params: embedding_dim=64, hidden_size=160, num_layers=2, dropout=0.58, batch_size=16, lr=0.000779, weight_decay=2.0e-05, label_smoothing=0.06, tune_epochs=47
[Trial 01/20] [Train] epochs=47, batch_size=16, lr=0.000779, patience=10
[Trial 01/20] Epoch   1/47 | Train Loss: 1.6375 | Train Acc: 0.2112 | Val Loss: 1.6080 | Val Acc: 0.2432 | LR: 0.000779
[Trial 01/20] Epoch   2/47 | Train Loss: 1.5669 | Train Acc: 0.2978 | Val Loss: 1.6171 | Val Acc: 0.2748 | LR: 0.000779
[Trial 01/20] Epoch   3/47 | Train Loss: 1.4799 | Train Acc: 0.3764 | Val Loss: 1.6572 | Val Acc: 0.2838 | LR: 0.000779
[Trial 01/20] Epoch   4/47 | Train Loss: 1.3715 | Train Acc: 0.4562 | Val Loss: 1.5620 | Val Acc: 0.3108 | LR: 0.000779
[Trial 01/20] Epoch   5/47 | Train Loss: 1.2385 | Train Acc: 0.5258 | Val Loss: 1.5671 | Val Acc: 0.3919 | LR: 0.000779
[Trial 01/20] Epoch   6/47 | Train Loss: 1.1619 | Train Acc: 0.