# 2. Baseline Model: HOG Features + SVM

## Overview

Before diving into deep learning, we'll build a **traditional machine learning baseline** using:

1. **HOG (Histogram of Oriented Gradients)** - Hand-crafted feature extraction
2. **SVM (Support Vector Machine)** - Classic classifier

### Why Build a Baseline First?

1. **Benchmark**: We need something to compare our fancy deep learning models against
2. **Sanity check**: If deep learning doesn't beat HOG+SVM, something is likely wrong with the training setup
3. **Understanding**: Helps us appreciate what deep learning automates
4. **Faster iteration**: HOG+SVM trains in minutes, not hours

### What is HOG?

HOG captures the **distribution of gradient directions** in local regions of an image:

1. Compute the gradient magnitude and direction at every pixel
2. Divide the image into small cells (e.g., 8x8 pixels)
3. Build a histogram of gradient directions for each cell, weighting each pixel's vote by its gradient magnitude
4. Normalize the histograms over larger blocks of cells to reduce sensitivity to illumination changes
5. Concatenate all normalized histograms into a single feature vector

**Key insight**: Facial expressions involve muscle movements that create distinct edge/gradient patterns!
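
To make the steps above concrete, here is a minimal NumPy sketch of the per-cell orientation histograms. It is a simplified illustration, not what `skimage.feature.hog` does internally: the function name `cell_orientation_histograms` and the toy image are hypothetical, and block normalization is left out.

```python
import numpy as np

def cell_orientation_histograms(image, cell_size=8, n_bins=9):
    """Unnormalized per-cell histograms of unsigned gradient orientation."""
    image = np.asarray(image, dtype=float)
    # Finite-difference gradients along rows (gy) and columns (gx)
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    n_rows = image.shape[0] // cell_size
    n_cols = image.shape[1] // cell_size
    bin_width = 180.0 / n_bins
    hist = np.zeros((n_rows, n_cols, n_bins))

    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * cell_size, (r + 1) * cell_size)
            cols = slice(c * cell_size, (c + 1) * cell_size)
            bins = (orientation[rows, cols] // bin_width).astype(int) % n_bins
            # Each pixel votes for its orientation bin, weighted by its magnitude
            np.add.at(hist[r, c], bins.ravel(), magnitude[rows, cols].ravel())
    return hist

# Toy image (hypothetical): a single vertical edge, so every gradient points horizontally
toy_image = np.zeros((16, 16))
toy_image[:, 8:] = 1.0
toy_hist = cell_orientation_histograms(toy_image)
print(toy_hist.shape)            # (2, 2, 9)
print(toy_hist[0, 0].argmax())   # 0, the bin that contains 0 degrees
```

The sketch stops at the per-cell histograms; block normalization and concatenation are handled by `skimage.feature.hog` in the rest of this notebook.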

## Step 1: Import Libraries

In [None]:
# Standard imports
import numpy as np
import pandas as pd
from pathlib import Path
import pickle
from collections import Counter

# Image processing
from PIL import Image
from skimage.feature import hog
from skimage import exposure

# Machine learning
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import (
    classification_report, confusion_matrix, accuracy_score,
    precision_recall_fscore_support
)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Progress bar
from tqdm import tqdm

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

## Step 2: Load Processed Data

We'll use the data splits from Notebook 1.

In [None]:
# Load processed data from notebook 1
try:
    with open('processed_data.pkl', 'rb') as f:
        data = pickle.load(f)
    
    train_df = data['train_df']
    val_df = data['val_df']
    test_df = data['test_df']
    EMOTION_CLASSES = data['emotion_classes']
    EMOTION_TO_IDX = data['emotion_to_idx']
    IDX_TO_EMOTION = data['idx_to_emotion']
    
    print("Data loaded from processed_data.pkl")
    print(f"Training samples: {len(train_df)}")
    print(f"Test samples: {len(test_df)}")
    
except FileNotFoundError:
    print("Please run Notebook 1 first to generate processed_data.pkl")
    raise

In [None]:
# For baseline, we combine train and val sets (we'll use cross-validation instead)
# This gives us more training data
train_full_df = pd.concat([train_df, val_df], ignore_index=True)

print(f"Full training set: {len(train_full_df)} samples")
print(f"Test set: {len(test_df)} samples")

## Step 3: Understanding HOG Feature Extraction

Let's visualize what HOG features look like on a sample image.

In [None]:
# HOG parameters
# These control how features are extracted

HOG_PARAMS = {
    'orientations': 9,         # Number of gradient direction bins (0-180 degrees)
    'pixels_per_cell': (8, 8), # Cell size for local histograms
    'cells_per_block': (2, 2), # Cells per normalization block
    'block_norm': 'L2-Hys',    # Block normalization method
    'visualize': True,         # Return visualization image
    'feature_vector': True     # Return as 1D vector
}

# Image size for baseline (smaller = faster)
BASELINE_IMG_SIZE = 64

print("HOG Parameters:")
for key, value in HOG_PARAMS.items():
    print(f"  {key}: {value}")

In [None]:
def load_and_preprocess_image(image_path, size=BASELINE_IMG_SIZE):
    """
    Load an image and preprocess it for HOG extraction.
    
    Steps:
    1. Load image
    2. Convert to grayscale (HOG works on intensity gradients)
    3. Resize to fixed size
    4. Convert to numpy array
    
    Args:
        image_path: Path to the image file
        size: Target size (width = height)
        
    Returns:
        Grayscale image as 2D numpy array
    """
    # Load image
    img = Image.open(image_path)
    
    # Convert to grayscale
    # L = 0.299*R + 0.587*G + 0.114*B (perceptual luminance)
    img = img.convert('L')
    
    # Resize to fixed size
    img = img.resize((size, size))
    
    # Convert to numpy array and normalize to [0, 1]
    img_array = np.array(img) / 255.0
    
    return img_array

# Test on a sample image
sample_path = train_df.iloc[0]['image_path']
sample_img = load_and_preprocess_image(sample_path)

print(f"Sample image shape: {sample_img.shape}")
print(f"Value range: [{sample_img.min():.3f}, {sample_img.max():.3f}]")

In [None]:
# Visualize HOG features
def visualize_hog(image, title=""):
    """
    Extract and visualize HOG features for an image.
    """
    # Extract HOG features with visualization
    features, hog_image = hog(
        image,
        orientations=HOG_PARAMS['orientations'],
        pixels_per_cell=HOG_PARAMS['pixels_per_cell'],
        cells_per_block=HOG_PARAMS['cells_per_block'],
        block_norm=HOG_PARAMS['block_norm'],
        visualize=True,
        feature_vector=True
    )
    
    # Rescale HOG image for better visualization
    hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 10))
    
    # Create visualization
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    
    axes[0].imshow(image, cmap='gray')
    axes[0].set_title('Original (Grayscale)')
    axes[0].axis('off')
    
    axes[1].imshow(hog_image_rescaled, cmap='gray')
    axes[1].set_title('HOG Visualization')
    axes[1].axis('off')
    
    # Show feature vector histogram
    axes[2].hist(features, bins=50, color='steelblue', edgecolor='white')
    axes[2].set_xlabel('Feature Value')
    axes[2].set_ylabel('Count')
    axes[2].set_title(f'Feature Distribution (n={len(features)})')
    
    if title:
        plt.suptitle(title, fontsize=14, y=1.02)
    
    plt.tight_layout()
    plt.show()
    
    return features

# Visualize for sample image
sample_features = visualize_hog(sample_img, title="HOG Feature Extraction")
print(f"\nFeature vector length: {len(sample_features)}")

In [None]:
# Compare HOG features across different emotions
fig, axes = plt.subplots(len(EMOTION_CLASSES), 2, figsize=(8, 3*len(EMOTION_CLASSES)))

for i, emotion in enumerate(EMOTION_CLASSES):
    # Get a sample image for this emotion
    sample = train_df[train_df['emotion'] == emotion].iloc[0]
    img = load_and_preprocess_image(sample['image_path'])
    
    # Extract HOG
    features, hog_image = hog(
        img,
        orientations=HOG_PARAMS['orientations'],
        pixels_per_cell=HOG_PARAMS['pixels_per_cell'],
        cells_per_block=HOG_PARAMS['cells_per_block'],
        block_norm=HOG_PARAMS['block_norm'],
        visualize=True,
        feature_vector=True
    )
    hog_image = exposure.rescale_intensity(hog_image, in_range=(0, 10))
    
    # Plot
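    axes[i, 0].imshow(img, cmap='gray')
    axes[i, 0].set_ylabel(emotion.upper(), fontsize=12, rotation=0, ha='right', va='center')
    # Hide only the ticks: axis('off') would also hide the emotion label set above
    axes[i, 0].set_xticks([])
    axes[i, 0].set_yticks([])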
    
    axes[i, 1].imshow(hog_image, cmap='gray')
    axes[i, 1].axis('off')

axes[0, 0].set_title('Original', fontsize=12)
axes[0, 1].set_title('HOG Features', fontsize=12)

plt.suptitle('HOG Features by Emotion Class', fontsize=14, y=1.01)
plt.tight_layout()
plt.show()

### Observations:

HOG captures the gradient structure of faces:
- **Eyes and eyebrows** create strong horizontal/vertical gradients
- **Mouth** shape affects gradients in lower face
- **Different expressions** produce different gradient patterns

However, HOG is a **fixed** feature extractor - it can't learn what's most important for our specific task. Deep learning can!

## Step 4: Extract Features for All Images

Now we'll extract HOG features from all training and test images.
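
Before running the extraction, the expected length of each feature vector can be worked out from the HOG parameters defined earlier (64x64 input, 8x8 pixel cells, 2x2 cell blocks, 9 orientations). The sketch below assumes square images and blocks that slide one cell at a time; `hog_feature_length` is a hypothetical helper, not part of scikit-image.

```python
def hog_feature_length(img_size, pixels_per_cell, cells_per_block, orientations):
    """Expected HOG vector length for a square image with one-cell block stride."""
    cells_per_side = img_size // pixels_per_cell
    blocks_per_side = cells_per_side - cells_per_block + 1
    values_per_block = cells_per_block * cells_per_block * orientations
    return blocks_per_side * blocks_per_side * values_per_block

# 64 / 8 = 8 cells per side, 8 - 2 + 1 = 7 blocks per side, 2 * 2 * 9 = 36 values per block
expected_length = hog_feature_length(64, 8, 2, 9)
print(expected_length)  # 7 * 7 * 36 = 1764
```

If the feature matrix printed after extraction has a different number of columns, the image size or HOG parameters probably differ from the ones assumed here.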

In [None]:
def extract_hog_features(image):
    """
    Extract HOG features from a preprocessed grayscale image.
    
    Args:
        image: 2D numpy array (grayscale image)
        
    Returns:
        1D feature vector
    """
    features = hog(
        image,
        orientations=HOG_PARAMS['orientations'],
        pixels_per_cell=HOG_PARAMS['pixels_per_cell'],
        cells_per_block=HOG_PARAMS['cells_per_block'],
        block_norm=HOG_PARAMS['block_norm'],
        visualize=False,
        feature_vector=True
    )
    return features


def extract_features_from_dataframe(df, desc="Extracting features"):
    """
    Extract HOG features from all images in a dataframe.
    
    Args:
        df: DataFrame with 'image_path' and 'label' columns
        desc: Description for progress bar
        
    Returns:
        X: Feature matrix (n_samples, n_features)
        y: Label vector (n_samples,)
    """
    features_list = []
    labels_list = []
    
    for _, row in tqdm(df.iterrows(), total=len(df), desc=desc):
        # Load and preprocess image
        img = load_and_preprocess_image(row['image_path'])
        
        # Extract HOG features
        features = extract_hog_features(img)
        
        features_list.append(features)
        labels_list.append(row['label'])
    
    return np.array(features_list), np.array(labels_list)

In [None]:
# Extract features from training and test sets
print("Extracting features from training set...")
X_train, y_train = extract_features_from_dataframe(train_full_df, "Training set")

print("\nExtracting features from test set...")
X_test, y_test = extract_features_from_dataframe(test_df, "Test set")

print("\n" + "=" * 60)
print("FEATURE EXTRACTION COMPLETE")
print("=" * 60)
print(f"\nTraining features shape: {X_train.shape}")
print(f"Test features shape: {X_test.shape}")
print(f"\nFeature vector size: {X_train.shape[1]}")

## Step 5: Feature Scaling

SVM is sensitive to feature scales. The RBF kernel is computed from Euclidean distances between samples, so features with larger ranges can dominate the decision.

We'll use **StandardScaler** to normalize features to zero mean and unit variance.

**Important**: Fit the scaler on training data only, then apply to test data!
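
As a rough illustration of what "fit on training data only" means, standardization can be written out by hand. The arrays below are made-up toy data, and all names ending in `_demo` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # toy "training" features
X_te_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))   # toy "test" features

# Statistics come from the training split only
mean_demo = X_tr_demo.mean(axis=0)
std_demo = X_tr_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_tr_scaled_demo = (X_tr_demo - mean_demo) / std_demo
X_te_scaled_demo = (X_te_demo - mean_demo) / std_demo  # reuse the training statistics

print(X_tr_scaled_demo.mean(axis=0).round(6))
print(X_tr_scaled_demo.std(axis=0).round(6))
```

The scaled training columns have zero mean and unit variance by construction. The scaled test columns generally do not, which is expected: the test set is treated as unseen data.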

In [None]:
# Create and fit scaler on training data
scaler = StandardScaler()

# Fit on training data and transform
X_train_scaled = scaler.fit_transform(X_train)

# Transform test data (using training statistics!)
X_test_scaled = scaler.transform(X_test)

print("Feature scaling complete!")
print(f"\nBefore scaling:")
print(f"  Training mean: {X_train.mean():.4f}, std: {X_train.std():.4f}")
print(f"\nAfter scaling:")
print(f"  Training mean: {X_train_scaled.mean():.4f}, std: {X_train_scaled.std():.4f}")

## Step 6: Train SVM Classifier

### About SVM:

A Support Vector Machine (SVM) finds the hyperplane that separates the classes with the largest margin.

**Key parameters:**
- **kernel**: Similarity function between samples. The RBF (Radial Basis Function) kernel lets the SVM learn non-linear decision boundaries.
- **C**: Regularization parameter. Higher = less regularization (fewer training errors tolerated), so the model might overfit.
- **gamma**: RBF kernel width parameter. Higher = each training sample has a more local influence, giving more complex boundaries.

We'll use `class_weight='balanced'` to handle class imbalance.
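
The quantities behind these settings can be sketched in a few lines of NumPy. The formulas follow the scikit-learn documentation for `gamma='scale'` and `class_weight='balanced'`; the helper names and toy data are hypothetical, and the class-weight helper assumes integer labels 0..K-1 with every class present.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel between two vectors: exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def gamma_scale(X):
    """Value documented for gamma='scale': 1 / (n_features * X.var())."""
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())

def balanced_class_weights(y):
    """Weights documented for class_weight='balanced': n_samples / (n_classes * count)."""
    y = np.asarray(y)
    counts = np.bincount(y)  # assumes integer labels 0..K-1, all present
    return len(y) / (len(counts) * counts)

# Toy data (hypothetical)
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
y_toy = np.array([0, 0, 0, 1])

g_toy = gamma_scale(X_toy)
k_same = rbf_kernel_value(X_toy[0], X_toy[0], g_toy)  # identical points give 1.0
k_near = rbf_kernel_value(X_toy[0], X_toy[1], g_toy)  # squared distance 2
k_far = rbf_kernel_value(X_toy[0], X_toy[2], g_toy)   # squared distance 4
w_toy = balanced_class_weights(y_toy)

print(g_toy, k_same, k_near, k_far)
print(w_toy)
```

The kernel value shrinks as samples move apart, and a larger `gamma` makes it shrink faster. The rare class (one sample out of four) gets a weight of 2.0 while the common class gets about 0.67, so errors on minority classes cost more during training.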

In [None]:
# Create SVM classifier
svm_classifier = SVC(
    kernel='rbf',           # Radial Basis Function kernel
    C=10,                   # Regularization parameter
    gamma='scale',          # 1 / (n_features * X.var())
    class_weight='balanced', # Handle class imbalance
    random_state=42,
    verbose=True
)

print("SVM Classifier Configuration:")
print(f"  Kernel: {svm_classifier.kernel}")
print(f"  C (regularization): {svm_classifier.C}")
print(f"  Gamma: {svm_classifier.gamma}")
print(f"  Class weight: {svm_classifier.class_weight}")

In [None]:
# Train the classifier
print("Training SVM classifier...")
print("(This may take a few minutes)\n")

svm_classifier.fit(X_train_scaled, y_train)

print("\nTraining complete!")
print(f"Number of support vectors: {svm_classifier.n_support_.sum()}")
print(f"Support vectors per class: {dict(zip(EMOTION_CLASSES, svm_classifier.n_support_))}")

## Step 7: Evaluate the Model

Let's see how well our baseline performs!
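
The evaluation in this step reports accuracy, per-class precision, recall and F1, and a confusion matrix. As a reminder of how these relate, the sketch below derives the metrics from a small, made-up confusion matrix (rows are true classes and columns are predicted classes, the same convention as `sklearn.metrics.confusion_matrix`). All `_toy` names are hypothetical.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
cm_toy = np.array([
    [8, 2, 0],
    [1, 6, 3],
    [1, 0, 9],
])

true_pos_toy = np.diag(cm_toy)
precision_toy = true_pos_toy / cm_toy.sum(axis=0)  # correct / everything predicted as the class
recall_toy = true_pos_toy / cm_toy.sum(axis=1)     # correct / everything truly in the class
f1_toy = 2 * precision_toy * recall_toy / (precision_toy + recall_toy)
accuracy_toy = true_pos_toy.sum() / cm_toy.sum()

print("precision:", precision_toy)  # [0.8  0.75 0.75]
print("recall:   ", recall_toy)     # [0.8 0.6 0.9]
print("f1:       ", f1_toy.round(3))
print("accuracy: ", round(accuracy_toy, 3))
```

Accuracy alone hides per-class differences: in this toy matrix the second class has the lowest recall (0.6) even though overall accuracy is about 0.77.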

In [None]:
# Make predictions
y_train_pred = svm_classifier.predict(X_train_scaled)
y_test_pred = svm_classifier.predict(X_test_scaled)

# Calculate accuracies
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

print("=" * 60)
print("BASELINE MODEL RESULTS")
print("=" * 60)
print(f"\nTraining Accuracy: {train_accuracy:.4f} ({train_accuracy*100:.2f}%)")
print(f"Test Accuracy:     {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

# Check for overfitting
gap = train_accuracy - test_accuracy
print(f"\nGeneralization gap: {gap:.4f}")
if gap > 0.1:
    print("Warning: Model may be overfitting (large gap between train and test)")
else:
    print("Train/test gap is within 10 percentage points")

In [None]:
# Detailed classification report
print("\n" + "=" * 60)
print("CLASSIFICATION REPORT (Test Set)")
print("=" * 60 + "\n")

print(classification_report(y_test, y_test_pred, target_names=EMOTION_CLASSES))

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_test_pred)
cm_normalized = cm.astype('float') / cm.sum(axis=1, keepdims=True)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Raw counts
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=EMOTION_CLASSES, yticklabels=EMOTION_CLASSES,
            ax=axes[0])
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('True')
axes[0].set_title('Confusion Matrix (Counts)')

# Normalized
sns.heatmap(cm_normalized, annot=True, fmt='.1%', cmap='Blues',
            xticklabels=EMOTION_CLASSES, yticklabels=EMOTION_CLASSES,
            ax=axes[1])
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('True')
axes[1].set_title('Confusion Matrix (Normalized)')

plt.suptitle('HOG + SVM Baseline Model', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

In [None]:
# Per-class performance visualization
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, y_test_pred, average=None
)

# Create bar chart
x = np.arange(len(EMOTION_CLASSES))
width = 0.25

fig, ax = plt.subplots(figsize=(12, 5))

bars1 = ax.bar(x - width, precision, width, label='Precision', color='steelblue')
bars2 = ax.bar(x, recall, width, label='Recall', color='darkorange')
bars3 = ax.bar(x + width, f1, width, label='F1-Score', color='forestgreen')

ax.set_xlabel('Emotion Class')
ax.set_ylabel('Score')
ax.set_title('Per-Class Performance (HOG + SVM Baseline)')
ax.set_xticks(x)
ax.set_xticklabels(EMOTION_CLASSES, rotation=45, ha='right')
ax.legend()
ax.set_ylim(0, 1.0)
ax.grid(True, alpha=0.3, axis='y')

# Add value labels
for bars in [bars1, bars2, bars3]:
    for bar in bars:
        height = bar.get_height()
        ax.annotate(f'{height:.2f}',
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3), textcoords="offset points",
                    ha='center', va='bottom', fontsize=8)

plt.tight_layout()
plt.show()

## Step 8: Error Analysis

Let's look at some misclassified examples to understand where the model struggles.

In [None]:
# Find misclassified samples
misclassified_idx = np.where(y_test != y_test_pred)[0]

print(f"Total misclassified: {len(misclassified_idx)} out of {len(y_test)} ({100*len(misclassified_idx)/len(y_test):.1f}%)")

# Analyze common mistakes
mistakes = []
for idx in misclassified_idx:
    true_label = IDX_TO_EMOTION[y_test[idx]]
    pred_label = IDX_TO_EMOTION[y_test_pred[idx]]
    mistakes.append((true_label, pred_label))

mistake_counts = Counter(mistakes)

print("\nMost Common Mistakes (True -> Predicted):")
for (true_label, pred_label), count in mistake_counts.most_common(10):
    print(f"  {true_label:12s} -> {pred_label:12s}: {count} times")

In [None]:
# Visualize some misclassified examples
test_df_reset = test_df.reset_index(drop=True)

n_show = min(8, len(misclassified_idx))
sample_misclassified = np.random.choice(misclassified_idx, n_show, replace=False)

# Round the column count up so an odd n_show still fits in two rows
n_cols = max(1, (n_show + 1) // 2)
fig, axes = plt.subplots(2, n_cols, figsize=(15, 6), squeeze=False)
axes = axes.flatten()

for i, idx in enumerate(sample_misclassified):
    # Load original image
    img = Image.open(test_df_reset.iloc[idx]['image_path'])

    true_label = IDX_TO_EMOTION[y_test[idx]]
    pred_label = IDX_TO_EMOTION[y_test_pred[idx]]

    # cmap applies to single-channel images; it is ignored for RGB images
    axes[i].imshow(img, cmap='gray')
    axes[i].set_title(f'True: {true_label}\nPred: {pred_label}',
                      fontsize=10, color='red')
    axes[i].axis('off')

# Hide any unused panels
for ax in axes[n_show:]:
    ax.axis('off')

plt.suptitle('Misclassified Examples', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

## Step 9: Cross-Validation (Optional)

For a more robust estimate of performance, let's use k-fold cross-validation.
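
For integer `cv` values and a classifier, scikit-learn's `cross_val_score` uses stratified folds, so each held-out fold keeps roughly the same class proportions as the full set. The sketch below shows this on made-up labels; the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (hypothetical): 10 samples of class 0 and 5 samples of class 1
y_demo = np.array([0] * 10 + [1] * 5)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

skf = StratifiedKFold(n_splits=5)
fold_counts = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X_demo, y_demo)):
    counts = np.bincount(y_demo[val_idx], minlength=2).tolist()
    fold_counts.append(counts)
    print(f"fold {fold}: held-out class counts = {counts}")
```

Every held-out fold contains two samples of class 0 and one of class 1, matching the 2:1 ratio of the full toy set. This matters for imbalanced emotion classes, where a plain split could leave a fold with very few minority samples.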

In [None]:
# 5-fold cross-validation
print("Running 5-fold cross-validation...")
print("(This may take several minutes)\n")

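from sklearn.pipeline import make_pipeline

# Refit the scaler inside each fold so held-out folds never influence the scaling statistics
cv_pipeline = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf', C=10, gamma='scale', class_weight='balanced', random_state=42)
)

cv_scores = cross_val_score(
    cv_pipeline,
    X_train,
    y_train,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)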

print("Cross-Validation Results:")
print(f"  Fold scores: {cv_scores}")
print(f"  Mean accuracy: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")

## Step 10: Save the Model

Let's save our trained model for future use.
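
When a saved model is reused, new features must pass through the same fitted scaler before reaching the classifier, which is why both are stored together. The sketch below demonstrates the round trip on made-up data with a temporary file; the `_demo` names and the file name `demo_model.pkl` are hypothetical, and raw feature vectors stand in for HOG features.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy two-class data (hypothetical)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(3, 1, (20, 4))])
y_demo = np.array([0] * 20 + [1] * 20)

demo_scaler = StandardScaler().fit(X_demo)
demo_clf = SVC(kernel='rbf').fit(demo_scaler.transform(X_demo), y_demo)

# Save and reload the classifier together with its scaler
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'demo_model.pkl'
    with open(demo_path, 'wb') as f:
        pickle.dump({'classifier': demo_clf, 'scaler': demo_scaler}, f)
    with open(demo_path, 'rb') as f:
        bundle = pickle.load(f)

# Inference on new samples: scale first, then predict
new_X = rng.normal(3, 1, (5, 4))
reloaded_pred = bundle['classifier'].predict(bundle['scaler'].transform(new_X))
original_pred = demo_clf.predict(demo_scaler.transform(new_X))
print(reloaded_pred, original_pred)
```

The reloaded bundle reproduces the predictions of the in-memory model. For the real baseline, the stored HOG parameters and image size would also need to be applied to each new image before scaling.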

In [None]:
# Save model and scaler
baseline_model = {
    'classifier': svm_classifier,
    'scaler': scaler,
    'hog_params': HOG_PARAMS,
    'img_size': BASELINE_IMG_SIZE,
    'test_accuracy': test_accuracy,
    'emotion_classes': EMOTION_CLASSES
}

with open('baseline_model.pkl', 'wb') as f:
    pickle.dump(baseline_model, f)

print("Model saved to baseline_model.pkl")

## Summary

### What We Built:
- **Feature Extractor**: HOG (Histogram of Oriented Gradients)
- **Classifier**: SVM with RBF kernel

### Results:
| Metric | Value |
|--------|-------|
| Training Accuracy | ~XX% |
| Test Accuracy | ~XX% |

### Key Insights:

1. **HOG captures gradient structure** but can't learn task-specific features
2. **Some emotions are harder** (fear, disgust often confused with others)
3. **Class imbalance hurts** minority class performance
4. **This is our baseline** - deep learning should beat this!

### Next Steps:
- **Notebook 3**: Build a custom CNN that learns features automatically
- **Notebook 4**: Use transfer learning for even better results

In [None]:
# Save results for comparison in later notebooks
baseline_results = {
    'model_name': 'HOG + SVM',
    'train_accuracy': train_accuracy,
    'test_accuracy': test_accuracy,
    'cv_scores': cv_scores.tolist() if 'cv_scores' in globals() else None,  # Step 9 is optional
    'confusion_matrix': cm.tolist(),
    'per_class_f1': f1.tolist()
}

with open('baseline_results.pkl', 'wb') as f:
    pickle.dump(baseline_results, f)

print("Results saved for comparison!")