<a href="https://colab.research.google.com/github/yadavroshankumar/Chatbot-for-university/blob/main/CSIT599_module2_CNN_and_Advanced_Computer_Vision_Exercise_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CSIT 599 Module 2 - CNN and Advanced Computer Vision

## CNN Architecture Evolution: From LeNet to ResNet
### Exercise for Students - SEQUENTIAL LEARNING APPROACH

This exercise demonstrates the evolution of CNN architectures:
1. Traditional CNN (Basic vanilla CNN)
2. LeNet-5 (1998) - The pioneer
3. AlexNet (2012) - Deep learning breakthrough
4. VGG-16 (2014) - Deep and uniform
5. Inception/GoogLeNet (2014) - Multi-scale features
6. ResNet (2015) - Skip connections revolution

Dataset: CIFAR-10 (32x32 color images, 10 classes)
- airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck

Instructions:
1. Fill in the blanks marked with "# TODO: STUDENT FILL IN"
2. Run each architecture sequentially
3. Compare performance and training characteristics
4. Understand key innovations in each architecture

Key Concepts:
- Depth vs Width
- Skip connections (ResNet)
- Multi-scale features (Inception)
- Architectural innovations over time


In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt
import time
from sklearn.metrics import classification_report

# Set random seed for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

print("TensorFlow version:", tf.__version__)
print("GPU Available:", tf.config.list_physical_devices('GPU'))

## STEP 1: DATA PREPARATION - CIFAR-10

In [None]:
# ============================================================================
# STEP 1: DATA PREPARATION - CIFAR-10
# ============================================================================

def load_and_prepare_cifar10():
    """
    Load and preprocess CIFAR-10 dataset.

    CIFAR-10 contains 60,000 32x32 color images in 10 classes:
    - 50,000 training images
    - 10,000 test images

    Returns:
        tuple: (x_train, y_train, x_test, y_test, class_names)
    """
    print("STEP 1: LOADING CIFAR-10 DATASET")
    print("="*50)

    # Load CIFAR-10 dataset
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

    # Class names
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

    print(f"Training data shape: {x_train.shape}")
    print(f"Training labels shape: {y_train.shape}")
    print(f"Test data shape: {x_test.shape}")
    print(f"Number of classes: {len(class_names)}")
    print(f"Classes: {class_names}")

    # TODO: STUDENT FILL IN
    # Normalize pixel values to [0, 1] range
    x_train = x_train.astype('float32') / ________
    x_test = x_test.astype('float32') / ________

    # TODO: STUDENT FILL IN
    # Convert labels to categorical (one-hot encoding)
    y_train = keras.utils.to_categorical(y_train, ________)  # 10 classes
    y_test = keras.utils.to_categorical(y_test, ________)

    print(f"\nAfter preprocessing:")
    print(f"Training data shape: {x_train.shape}")
    print(f"Training labels shape: {y_train.shape}")
    print(f"Pixel value range: [{x_train.min():.2f}, {x_train.max():.2f}]")

    return x_train, y_train, x_test, y_test, class_names

def visualize_cifar10_samples(x_train, y_train, class_names):
    """Visualize sample images from CIFAR-10."""
    plt.figure(figsize=(12, 6))
    for i in range(20):
        plt.subplot(4, 5, i + 1)
        plt.imshow(x_train[i])
        plt.title(class_names[np.argmax(y_train[i])], fontsize=8)
        plt.axis('off')
    plt.suptitle('CIFAR-10 Sample Images')
    plt.tight_layout()
    plt.show()

In [None]:
# Load data
x_train, y_train, x_test, y_test, class_names = load_and_prepare_cifar10()
visualize_cifar10_samples(x_train, y_train, class_names)
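
To make the label handling in Step 1 concrete, here is a minimal NumPy sketch of one-hot encoding and its inverse, which is what `np.argmax` recovers in `visualize_cifar10_samples`. The three-class toy labels and the helper name `one_hot` are illustrative assumptions, not part of the exercise; `keras.utils.to_categorical` is assumed to produce the same row layout for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels into one-hot rows."""
    labels = np.asarray(labels).ravel()          # (N, 1) -> (N,), the shape CIFAR-10 labels use
    return np.eye(num_classes, dtype="float32")[labels]

toy_labels = np.array([[0], [2], [1], [2]])      # toy data with 3 classes
encoded = one_hot(toy_labels, num_classes=3)
decoded = np.argmax(encoded, axis=1)             # inverse: position of the 1 in each row

print(encoded)
print(decoded)
```

Each row contains a single 1 at the index of its class, so `argmax` along axis 1 returns the original integer labels.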

## STEP 2: TRADITIONAL CNN (BASELINE)

In [None]:
# ============================================================================
# STEP 2: TRADITIONAL CNN (BASELINE)
# ============================================================================

def create_traditional_cnn():
    """
    Create a traditional/vanilla CNN as baseline.

    Simple architecture:
    - 3 Conv blocks (Conv2D + MaxPooling)
    - Dense layers
    - No special techniques

    Returns:
        keras.Model: Compiled traditional CNN
    """
    print("\nSTEP 2: BUILDING TRADITIONAL CNN (BASELINE)")
    print("="*50)

    model = keras.Sequential(name="Traditional_CNN")

    # TODO: STUDENT FILL IN
    # First Conv Block: 32 filters, (3,3) kernel, 'relu', input_shape=(32,32,3)
    model.add(layers.Conv2D(________, (3, 3), activation='________',
                           input_shape=(________, ________, ________), padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # TODO: STUDENT FILL IN
    # Second Conv Block: 64 filters, (3,3) kernel, 'relu'
    model.add(layers.Conv2D(________, (3, 3), activation='________', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # TODO: STUDENT FILL IN
    # Third Conv Block: 128 filters, (3,3) kernel, 'relu'
    model.add(layers.Conv2D(________, (3, 3), activation='________', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # TODO: STUDENT FILL IN
    # Flatten and Dense layers
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='________'))
    model.add(layers.Dense(________, activation='________'))  # 10 classes, softmax

    # TODO: STUDENT FILL IN
    # Compile: optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy']
    model.compile(optimizer='________',
                 loss='________',
                 metrics=['________'])

    print("\nTraditional CNN Architecture:")
    model.summary()

    return model
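
The parameter counts printed by `model.summary()` can be checked by hand. The sketch below assumes the usual counting rule for a convolution with bias, (kernel_h x kernel_w x in_channels + 1) x filters, and that each 2x2 max-pool halves the spatial size; the helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_features + 1) * units

# First block described in the TODO comment: 32 filters, 3x3 kernel, RGB input.
first_conv = conv2d_params(3, 3, 3, 32)

# Spatial size after three 'same' convolutions, each followed by 2x2 pooling.
size = 32
for _ in range(3):
    size //= 2          # 'same' padding keeps the size; pooling halves it

flattened = size * size * 128   # 128 channels after the third block
print(first_conv, size, flattened, dense_params(flattened, 128))
```

Comparing these numbers with the summary output is a quick way to confirm that the blanks were filled in as intended.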

## STEP 3: LeNet-5 (1998) - THE PIONEER

In [None]:
# ============================================================================
# STEP 3: LeNet-5 (1998) - THE PIONEER
# ============================================================================

def create_lenet5():
    """
    Create LeNet-5 architecture (adapted for CIFAR-10).

    Original LeNet-5 (1998) by Yann LeCun:
    - Designed for handwritten digit recognition (MNIST)
    - One of the first successful CNN architectures
    - Used sigmoid/tanh activation (before ReLU era)

    Key features:
    - 2 Conv layers
    - Average pooling (we'll use MaxPooling for better performance)
    - Small network by modern standards

    Returns:
        keras.Model: Compiled LeNet-5 model
    """
    print("\nSTEP 3: BUILDING LeNet-5 (1998)")
    print("="*50)
    print("Innovation: One of the first successful CNN architectures!")

    model = keras.Sequential(name="LeNet5")

    # First Conv Block (C1: 6 filters, 5x5)
    model.add(layers.Conv2D(6, (5, 5), activation='tanh',
                           input_shape=(32, 32, 3), padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Second Conv Block (C3: 16 filters, 5x5)
    model.add(layers.Conv2D(16, (5, 5), activation='tanh'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Fully connected layers
    model.add(layers.Flatten())
    model.add(layers.Dense(120, activation='tanh'))
    model.add(layers.Dense(84, activation='tanh'))
    model.add(layers.Dense(10, activation='softmax'))

    model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])

    print("\nLeNet-5 Architecture:")
    model.summary()
    print("Note: Using tanh activation (original) instead of ReLU")

    return model
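
The feature-map sizes in this LeNet-5 adaptation follow from the padding mode of each convolution. The sketch below traces them, assuming stride-1 convolutions and non-overlapping 2x2 pooling as in the cell above; `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel, padding):
    """Output width of a stride-1 convolution."""
    if padding == "same":
        return size
    if padding == "valid":
        return size - kernel + 1
    raise ValueError(f"unknown padding: {padding}")

def pool_out(size, pool=2):
    """Output width of non-overlapping pooling (stride equals pool size)."""
    return size // pool

size = 32
size = conv_out(size, 5, "same")    # C1 in this notebook
size = pool_out(size)
size = conv_out(size, 5, "valid")   # C3 uses the Keras default padding ('valid')
size = pool_out(size)

flattened = size * size * 16        # 16 feature maps feed the first Dense layer
print(size, flattened)
```

With `"valid"` padding in both convolutions, as in the 1998 design, the same trace gives 28, 14, 10, and 5.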

## STEP 4: AlexNet (2012) - DEEP LEARNING BREAKTHROUGH

In [None]:
# ============================================================================
# STEP 4: AlexNet (2012) - DEEP LEARNING BREAKTHROUGH
# ============================================================================

def create_alexnet():
    """
    Create AlexNet architecture (adapted for CIFAR-10).

    Original AlexNet (2012) by Krizhevsky, Sutskever, Hinton:
    - Won ImageNet 2012 competition
    - Sparked the deep learning revolution
    - Popularized ReLU activation in deep CNNs
    - Early large-scale use of dropout regularization
    - Originally designed for 224x224 images

    Key innovations:
    - ReLU activation (faster training)
    - Dropout for regularization
    - Local response normalization (we'll skip)
    - Overlapping pooling (3x3 windows, stride 2; this adaptation uses 2x2, stride 2)

    Returns:
        keras.Model: Compiled AlexNet model
    """
    print("\nSTEP 4: BUILDING AlexNet (2012)")
    print("="*50)
    print("Innovation: ReLU activation + Dropout + Deep network!")

    model = keras.Sequential(name="AlexNet")

    # Block 1
    model.add(layers.Conv2D(96, (3, 3), activation='relu',
                           input_shape=(32, 32, 3), padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=2))

    # Block 2
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=2))

    # Block 3, 4, 5
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(384, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=2))

    # Fully connected layers with Dropout
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dropout(0.5))  # KEY: Dropout regularizes the large dense layers
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation='softmax'))

    model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])

    print("\nAlexNet Architecture:")
    model.summary()
    print("Key innovations: ReLU + Dropout!")

    return model
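
Dropout, listed above as one of AlexNet's key ingredients, can be sketched in a few lines of NumPy. This is a minimal illustration of the "inverted" variant, where surviving units are scaled by 1 / (1 - rate) during training so that no rescaling is needed at inference; the function name `inverted_dropout` and the toy activations are assumptions made for the example.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero a random fraction `rate` of units and rescale the survivors."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True = unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 64), dtype="float32")     # toy activations, all equal to 1
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

print("fraction zeroed:", float((dropped == 0).mean()))
print("mean activation:", float(dropped.mean()))
```

Roughly half of the units are zeroed while the mean activation stays close to 1, which is why the layer can simply be switched off at test time.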

## STEP 5: VGG-16 (2014) - DEEP AND UNIFORM

In [None]:
# ============================================================================
# STEP 5: VGG-16 (2014) - DEEP AND UNIFORM
# ============================================================================

def create_vgg16():
    """
    Create a VGG-16-style architecture (simplified for CIFAR-10: 7 conv + 3 dense layers).

    Original VGG-16 (2014) by Simonyan and Zisserman:
    - Very deep network (16 weight layers)
    - Simple and uniform architecture
    - Only 3x3 convolutions throughout
    - Multiple conv layers before pooling

    Key innovations:
    - Depth is important
    - Small 3x3 filters work well
    - Uniform architecture (easy to understand)

    Returns:
        keras.Model: Compiled VGG-16 model
    """
    print("\nSTEP 5: BUILDING VGG-16 (2014)")
    print("="*50)
    print("Innovation: Very deep network + Uniform 3x3 filters!")

    model = keras.Sequential(name="VGG16")

    # Block 1
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                           input_shape=(32, 32, 3)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 2
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 3
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Fully connected layers
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation='softmax'))

    model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])

    print("\nVGG-16 Architecture:")
    model.summary()
    print("Key innovation: Very deep with uniform 3x3 convolutions!")

    return model
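
The case for small 3x3 filters can be checked with simple arithmetic: stacking stride-1 3x3 convolutions grows the receptive field like a larger kernel, but with fewer weights. The sketch below assumes equal input and output channel counts and ignores biases; both helper names are hypothetical.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel, channels):
    """Weights (no biases) of one conv layer with `channels` in and out."""
    return kernel * kernel * channels * channels

channels = 256  # as in Block 3 above
for n, big_kernel in [(2, 5), (3, 7)]:
    field = stacked_receptive_field(n)
    stacked = n * conv_weights(3, channels)
    single = conv_weights(big_kernel, channels)
    print(f"{n} x 3x3: field {field}x{field}, "
          f"{stacked:,} weights vs {single:,} for one {big_kernel}x{big_kernel}")
```

The stacked version also inserts an extra non-linearity between the layers, which a single large kernel does not have.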

## STEP 6: Inception/GoogLeNet (2014) - MULTI-SCALE FEATURES

In [None]:
# ============================================================================
# STEP 6: Inception/GoogLeNet (2014) - MULTI-SCALE FEATURES
# ============================================================================

def inception_module(x, filters_1x1, filters_3x3_reduce, filters_3x3,
                     filters_5x5_reduce, filters_5x5, filters_pool_proj):
    """
    Create an Inception module.

    Inception module concept:
    - Process input at multiple scales simultaneously
    - 1x1, 3x3, 5x5 convolutions in parallel
    - Concatenate all outputs
    - 1x1 convs for dimensionality reduction

    This is the KEY innovation of Inception/GoogLeNet!
    """
    # 1x1 convolution branch
    conv_1x1 = layers.Conv2D(filters_1x1, (1, 1), padding='same', activation='relu')(x)

    # 3x3 convolution branch
    conv_3x3 = layers.Conv2D(filters_3x3_reduce, (1, 1), padding='same', activation='relu')(x)
    conv_3x3 = layers.Conv2D(filters_3x3, (3, 3), padding='same', activation='relu')(conv_3x3)

    # 5x5 convolution branch
    conv_5x5 = layers.Conv2D(filters_5x5_reduce, (1, 1), padding='same', activation='relu')(x)
    conv_5x5 = layers.Conv2D(filters_5x5, (5, 5), padding='same', activation='relu')(conv_5x5)

    # Max pooling branch
    pool_proj = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    pool_proj = layers.Conv2D(filters_pool_proj, (1, 1), padding='same', activation='relu')(pool_proj)

    # Concatenate all branches
    output = layers.concatenate([conv_1x1, conv_3x3, conv_5x5, pool_proj], axis=-1)

    return output

def create_inception():
    """
    Create Inception/GoogLeNet architecture (simplified for CIFAR-10).

    Original Inception/GoogLeNet (2014) by Szegedy et al:
    - Winner of ImageNet 2014
    - Introduced "Inception module"
    - Multi-scale feature extraction
    - Efficient: fewer parameters than VGG

    Key innovation:
    - Inception module: parallel convolutions at different scales
    - 1x1 convolutions for dimensionality reduction
    - Network in network concept

    Returns:
        keras.Model: Compiled Inception model
    """
    print("\nSTEP 6: BUILDING Inception/GoogLeNet (2014)")
    print("="*50)
    print("Innovation: Multi-scale feature extraction with Inception modules!")

    input_layer = layers.Input(shape=(32, 32, 3))

    # Initial convolution
    x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(input_layer)
    x = layers.MaxPooling2D((2, 2), strides=2)(x)

    # Inception modules
    x = inception_module(x, 64, 96, 128, 16, 32, 32)
    x = inception_module(x, 128, 128, 192, 32, 96, 64)
    x = layers.MaxPooling2D((2, 2), strides=2)(x)

    x = inception_module(x, 192, 96, 208, 16, 48, 64)

    # Global average pooling
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.4)(x)

    # Output layer
    output_layer = layers.Dense(10, activation='softmax')(x)

    model = models.Model(inputs=input_layer, outputs=output_layer, name="Inception")

    model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])

    print("\nInception Architecture:")
    model.summary()
    print("Key innovation: Parallel multi-scale convolutions!")

    return model
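
The saving from the 1x1 "reduce" convolutions can be worked out for the 5x5 branch of the first `inception_module` call above. This is a rough weight count that ignores biases; `conv_weights` is a hypothetical helper, and the channel numbers are read off the code in this step.

```python
def conv_weights(kernel, in_channels, filters):
    """Weights (biases ignored) of one convolution layer."""
    return kernel * kernel * in_channels * filters

in_channels = 64                      # output of the initial 64-filter convolution
reduce_filters, out_filters = 16, 32  # 5x5 branch of the first inception_module call

direct = conv_weights(5, in_channels, out_filters)
reduced = (conv_weights(1, in_channels, reduce_filters)
           + conv_weights(5, reduce_filters, out_filters))

# Concatenation stacks the four branches along the channel axis.
module_channels = 64 + 128 + 32 + 32  # 1x1, 3x3, 5x5, pool projection

print(direct, reduced, round(direct / reduced, 2), module_channels)
```

Here the reduced branch needs roughly a quarter of the weights of a direct 5x5 convolution, and the module as a whole emits 256 channels.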

## STEP 7: ResNet (2015) - SKIP CONNECTIONS REVOLUTION

In [None]:
# ============================================================================
# STEP 7: ResNet (2015) - SKIP CONNECTIONS REVOLUTION
# ============================================================================

def residual_block(x, filters, kernel_size=3, stride=1):
    """
    Create a residual block.

    Residual block concept:
    - Add skip connection from input to output
    - Allows training very deep networks (100+ layers)
    - Eases gradient flow, mitigating the vanishing-gradient/degradation problem
    - F(x) + x instead of just F(x)

    This is the KEY innovation of ResNet!

    Args:
        x: Input tensor
        filters: Number of filters
        kernel_size: Size of conv kernel
        stride: Stride for conv

    Returns:
        Output tensor with residual connection
    """
    # Save the input for the skip connection
    shortcut = x

    # TODO: STUDENT FILL IN
    # First conv layer: filters, kernel_size, padding='same', activation='relu'
    x = layers.Conv2D(filters, kernel_size, strides=stride,
                     padding='________', activation='________')(x)

    # TODO: STUDENT FILL IN
    # Second conv layer: filters, kernel_size, padding='same', NO activation
    x = layers.Conv2D(________, kernel_size, padding='________')(x)

    # Project the shortcut with a 1x1 conv if the spatial size or channel count changed
    if stride != 1 or shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, (1, 1), strides=stride, padding='same')(shortcut)

    # TODO: STUDENT FILL IN
    # Add skip connection: x + shortcut
    x = layers.add([________, ________])

    # Activation after addition
    x = layers.Activation('relu')(x)

    return x

def create_resnet():
    """
    Create ResNet architecture (simplified for CIFAR-10).

    Original ResNet (2015) by He et al:
    - Winner of ImageNet 2015
    - Revolutionary skip connections
    - Enabled training of very deep networks (152 layers!)
    - Addressed the degradation problem seen when plain networks get deeper

    Key innovation:
    - Residual/skip connections: x + F(x)
    - Identity mapping
    - Can train much deeper networks

    Returns:
        keras.Model: Compiled ResNet model
    """
    print("\nSTEP 7: BUILDING ResNet (2015)")
    print("="*50)
    print("Innovation: Skip connections enable very deep networks!")

    input_layer = layers.Input(shape=(32, 32, 3))

    # Initial convolution
    x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(input_layer)

    # Residual blocks
    x = residual_block(x, 64)
    x = residual_block(x, 64)

    x = residual_block(x, 128, stride=2)
    x = residual_block(x, 128)

    x = residual_block(x, 256, stride=2)
    x = residual_block(x, 256)

    # Global average pooling
    x = layers.GlobalAveragePooling2D()(x)

    # Output layer
    output_layer = layers.Dense(10, activation='softmax')(x)

    model = models.Model(inputs=input_layer, outputs=output_layer, name="ResNet")

    model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])

    print("\nResNet Architecture:")
    model.summary()
    print("Key innovation: Skip connections (x + F(x))!")

    return model
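
Why the identity shortcut helps can be seen in a toy NumPy version of the block. In this sketch, matrix multiplications stand in for the convolutions (an assumption made to keep the example small), and the weights are set to zero to mimic layers that have not learned anything useful yet; all names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_block(x, w1, w2):
    """Two stacked layers without a shortcut: relu(F(x))."""
    return relu(relu(x @ w1) @ w2)

def residual_block_np(x, w1, w2):
    """Same two layers plus the identity shortcut: relu(F(x) + x)."""
    return relu(relu(x @ w1) @ w2 + x)

rng = np.random.default_rng(0)
x = rng.random((4, 8))               # toy non-negative features
w_zero = np.zeros((8, 8))            # layers that contribute nothing

print(np.allclose(plain_block(x, w_zero, w_zero), 0.0))      # signal is lost
print(np.allclose(residual_block_np(x, w_zero, w_zero), x))  # input passes through
```

A residual block only has to learn a correction to its input, so adding more blocks does not have to make the network worse than a shallower one.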

## TRAINING AND EVALUATION FUNCTIONS

In [None]:

# ============================================================================
# TRAINING AND EVALUATION FUNCTIONS
# ============================================================================

def train_model(model, x_train, y_train, x_test, y_test, epochs=10, batch_size=128):
    """Train a model and return history and training time."""
    print(f"\nTraining {model.name}...")
    print(f"Epochs: {epochs}, Batch size: {batch_size}")

    start_time = time.time()

    history = model.fit(
        x_train, y_train,
        batch_size=batch_size,
        epochs=epochs,
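        # Note: the test set doubles as validation data here for simplicity,
        # so these curves should not be used for model selection.
        validation_data=(x_test, y_test),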
        verbose=1
    )

    training_time = time.time() - start_time

    print(f"{model.name} training completed in {training_time:.2f} seconds")

    return history, training_time

def evaluate_model(model, x_test, y_test, class_names):
    """Evaluate model and return metrics."""
    print(f"\nEvaluating {model.name}...")

    test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)

    print(f"{model.name} Results:")
    print(f"  Test Loss: {test_loss:.4f}")
    print(f"  Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

    return test_loss, test_accuracy
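
Overall accuracy hides which classes a model confuses. As a possible extension of `evaluate_model`, the sketch below builds a confusion matrix and per-class accuracy from one-hot labels and predicted scores using NumPy only. The function name and the toy scores are made up for illustration and are not model output.

```python
import numpy as np

def per_class_accuracy(y_true_onehot, y_prob, num_classes):
    """Return (confusion matrix, per-class accuracy) from one-hot labels and scores."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(confusion, (y_true, y_pred), 1)      # rows: true class, cols: predicted
    totals = confusion.sum(axis=1)
    accuracy = np.divide(np.diag(confusion), totals,
                         out=np.zeros(num_classes), where=totals > 0)
    return confusion, accuracy

# Toy example with 3 classes (made-up scores).
y_true_onehot = np.eye(3)[[0, 0, 1, 2, 2]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.2, 0.2]])
confusion, accuracy = per_class_accuracy(y_true_onehot, y_prob, num_classes=3)
print(confusion)
print(accuracy)
```

In the notebook, the same function could be called with `y_test` and `model.predict(x_test)`; the `classification_report` function imported in the first cell gives a comparable per-class summary.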

## COMPARISON AND VISUALIZATION

In [None]:
# ============================================================================
# COMPARISON AND VISUALIZATION
# ============================================================================

def compare_all_models(results):
    """
    Compare all models with comprehensive analysis.

    Args:
        results: Dictionary with model results
    """
    print("\n" + "="*60)
    print("STEP 8: COMPREHENSIVE MODEL COMPARISON")
    print("="*60)

    print("\n📊 PERFORMANCE SUMMARY:")
    print("-" * 60)
    print(f"{'Model':<20} {'Accuracy':<12} {'Loss':<12} {'Time (s)':<12} {'Params':<12}")
    print("-" * 60)

    for name, data in results.items():
        print(f"{name:<20} {data['accuracy']:<12.4f} {data['loss']:<12.4f} "
              f"{data['time']:<12.2f} {data['params']:<12,}")

    print("\n🏆 RANKINGS:")
    print("-" * 40)

    # Sort by accuracy
    sorted_by_acc = sorted(results.items(), key=lambda x: x[1]['accuracy'], reverse=True)
    print("\nBy Accuracy:")
    for i, (name, data) in enumerate(sorted_by_acc, 1):
        print(f"  {i}. {name}: {data['accuracy']:.4f}")

    # Sort by parameters (efficiency)
    sorted_by_params = sorted(results.items(), key=lambda x: x[1]['params'])
    print("\nBy Efficiency (fewer parameters):")
    for i, (name, data) in enumerate(sorted_by_params, 1):
        print(f"  {i}. {name}: {data['params']:,} params")

def plot_architecture_comparison(results):
    """Plot comparison of all architectures."""
    model_names = list(results.keys())  # avoid shadowing the imported keras `models` module
    accuracies = [results[m]['accuracy'] for m in model_names]
    params = [results[m]['params'] / 1e6 for m in model_names]  # in millions
    times = [results[m]['time'] for m in model_names]

    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

    # Accuracy comparison
    colors = ['red', 'orange', 'yellow', 'lightgreen', 'cyan', 'blue']
    ax1.bar(model_names, accuracies, color=colors)
    ax1.set_title('Test Accuracy Comparison')
    ax1.set_ylabel('Accuracy')
    ax1.set_ylim([0, 1])
    ax1.tick_params(axis='x', rotation=45)
    for i, v in enumerate(accuracies):
        ax1.text(i, v + 0.02, f'{v:.3f}', ha='center')

    # Parameters comparison
    ax2.bar(model_names, params, color=colors)
    ax2.set_title('Model Complexity (Parameters)')
    ax2.set_ylabel('Parameters (Millions)')
    ax2.tick_params(axis='x', rotation=45)
    for i, v in enumerate(params):
        ax2.text(i, v + 0.1, f'{v:.2f}M', ha='center')

    # Training time comparison
    ax3.bar(model_names, times, color=colors)
    ax3.set_title('Training Time (10 epochs)')
    ax3.set_ylabel('Time (seconds)')
    ax3.tick_params(axis='x', rotation=45)
    for i, v in enumerate(times):
        ax3.text(i, v + 5, f'{v:.1f}s', ha='center')

    # Accuracy vs Parameters scatter
    ax4.scatter(params, accuracies, c=range(len(model_names)), cmap='viridis', s=200)
    for i, name in enumerate(model_names):
        ax4.annotate(name, (params[i], accuracies[i]),
                    xytext=(5, 5), textcoords='offset points', fontsize=8)
    ax4.set_xlabel('Parameters (Millions)')
    ax4.set_ylabel('Accuracy')
    ax4.set_title('Accuracy vs Model Complexity')
    ax4.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('cnn_architecture_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()

    print("\n📈 Comparison plot saved as 'cnn_architecture_comparison.png'")

def analyze_evolution():
    """Analyze the evolution of CNN architectures."""
    print("\n🔬 ARCHITECTURAL EVOLUTION ANALYSIS:")
    print("="*60)

    print("\n1998 - LeNet-5:")
    print("  • Pioneer of CNNs")
    print("  • Small network, tanh activation")
    print("  • Designed for simple tasks (MNIST)")

    print("\n2012 - AlexNet:")
    print("  • Deep learning revolution")
    print("  • ReLU activation (faster training)")
    print("  • Dropout regularization")
    print("  • Won ImageNet competition")

    print("\n2014 - VGG-16:")
    print("  • Very deep (16 layers)")
    print("  • Uniform 3x3 convolutions")
    print("  • Simple but effective architecture")

    print("\n2014 - Inception:")
    print("  • Multi-scale feature extraction")
    print("  • Parallel convolutions (1x1, 3x3, 5x5)")
    print("  • More efficient than VGG")

    print("\n2015 - ResNet:")
    print("  • Skip connections revolution")
    print("  • Enables very deep networks (100+ layers)")
    print("  • Eased gradient flow in very deep networks")
    print("  • State-of-the-art ImageNet results in 2015")

    print("\n🎯 KEY LESSONS:")
    print("  1. Depth matters (but needs skip connections)")
    print("  2. ReLU usually trains faster than tanh/sigmoid")
    print("  3. Regularization prevents overfitting")
    print("  4. Multi-scale features improve performance")
    print("  5. Skip connections enable deeper networks")

## MAIN EXECUTION - SEQUENTIAL TRAINING

In [None]:
# ============================================================================
# MAIN EXECUTION - SEQUENTIAL TRAINING
# ============================================================================

print("CNN ARCHITECTURE EVOLUTION: LeNet to ResNet")
print("="*60)



# Dictionary to store results
results = {}

# Train each architecture sequentially
architectures = [
    ("Traditional CNN", create_traditional_cnn),
    ("LeNet-5", create_lenet5),
    ("AlexNet", create_alexnet),
    ("VGG-16", create_vgg16),
    ("Inception", create_inception),
    ("ResNet", create_resnet)
]

EPOCHS = 10
BATCH_SIZE = 128

for name, create_func in architectures:
    print(f"\n{'='*60}")
    print(f"TRAINING: {name}")
    print(f"{'='*60}")

    # Create model
    model = create_func()

    # Train model
    history, training_time = train_model(model, x_train, y_train, x_test, y_test,
                                        epochs=EPOCHS, batch_size=BATCH_SIZE)

    # Evaluate model
    test_loss, test_accuracy = evaluate_model(model, x_test, y_test, class_names)

    # Store results
    results[name] = {
        'accuracy': test_accuracy,
        'loss': test_loss,
        'time': training_time,
        'params': model.count_params(),
        'history': history
    }

    print(f"\n✅ {name} completed!")

# Compare all models
compare_all_models(results)
plot_architecture_comparison(results)
analyze_evolution()

print("\n🎉 EXERCISE COMPLETED!")
print("You've explored the evolution of CNN architectures from 1998 to 2015!")




## 🎓 DISCUSSION QUESTIONS:

1. Why did AlexNet outperform LeNet-5?
   Hint: Think about activation functions and depth

2. Why does VGG-16 have so many parameters?
   Hint: Look at the fully connected layers

3. How does Inception achieve efficiency?
   Hint: Compare parameters with VGG-16

4. Why can ResNet train much deeper networks?
   Hint: Think about gradient flow

5. What are the trade-offs among accuracy, parameter count, and speed?
   Which architecture would you choose for:
   - Mobile device (limited resources)
   - Cloud server (unlimited resources)
   - Real-time application (speed critical)

## 🔬 EXPERIMENTS TO TRY:

1. Train for more epochs - which improves most?
2. Add data augmentation - impact on each architecture?
3. Reduce/increase model depth - what happens?
4. Try different optimizers (SGD vs Adam)
5. Implement batch normalization in older architectures
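
For experiment 5, it may help to see what batch normalization computes before adding it to a model. The sketch below is a training-mode forward pass in NumPy for feature maps in NHWC layout. It leaves out the running statistics used at inference, and `batch_norm_forward` is a hypothetical name. In Keras, the corresponding layer is `layers.BatchNormalization()`, commonly placed between a convolution and its activation.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over (batch, height, width) per channel."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta       # learnable scale and shift per channel

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(16, 8, 8, 4))   # toy NHWC feature maps
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=(0, 1, 2)).round(3))
print(out.std(axis=(0, 1, 2)).round(3))
```

Each channel comes out with approximately zero mean and unit standard deviation, regardless of the scale of the incoming activations.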

## 📚 FURTHER LEARNING:

- EfficientNet (2019): Compound scaling
- Vision Transformer (2020): Attention mechanisms
- ConvNeXt (2022): Modern CNN design
