# AlexNet

## Objectives
- **Improving Image Classification Performance**: AlexNet aimed to outperform existing models on large-scale image classification benchmarks, in particular the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), whose training set contains about 1.2 million labeled images across 1000 categories

- **Leveraging GPU Computing for Training**: Training deep neural networks requires significant computational resources. AlexNet was one of the first models to effectively leverage GPUs for training (the original network was split across two GPUs), which drastically reduced training time and made deeper, more complex models practical to explore

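- **Adopting ReLU Non-linearity**: AlexNet used ReLU activations, f(x) = max(0, x), in place of saturating functions such as tanh or sigmoid. Because ReLU does not saturate for positive inputs, it mitigates the vanishing gradient problem and speeds up training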
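A toy illustration of the vanishing-gradient point, in plain Python. During backpropagation, each layer multiplies the gradient by its activation's derivative. The sketch ignores weights entirely and evaluates every layer at the same input (both simplifying assumptions made purely for illustration), so it isolates that one factor. The helper names are hypothetical.

```python
import math

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, x, depth):
    """Multiply the activation derivative across `depth` layers.

    Toy model: weights are ignored and every layer sees the same input x,
    so only the activation's contribution to the backpropagated gradient remains.
    """
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(x)
    return g

for depth in (1, 5, 10):
    # Sigmoid at its best case (x = 0); ReLU at an arbitrary positive input
    print(depth, chained_grad(sigmoid_grad, 0.0, depth), chained_grad(relu_grad, 1.0, depth))
```

Even in the sigmoid's best case the factor shrinks to about one millionth over ten layers, whereas ReLU passes the gradient through unchanged for active units (and blocks it entirely for inactive ones).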

## Architecture

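AlexNet is a CNN with eight learned layers: five convolutional layers and three fully connected layers, with three max-pooling layers interleaved among the convolutions. Every convolutional and hidden fully connected layer is followed by a ReLU. Counting the input, the full pipeline has the 12 stages below: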

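1. **Input Layer**: A color image with 3 channels (RGB). The original paper states 224x224 pixels, but the 55x55 output of the first convolution only works out for a 227x227 input without padding, or a 224x224 input padded by 2 pixels on each side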
2. **First Convolutional Layer**: This layer applies 96 convolutional filters with a kernel size of 11x11 and a stride of 4. This results in an output volume of 55x55x96
3. **First Pooling Layer**: Max-pooling with 3x3 window, stride of 2, reducing the spatial dimensions to 27x27x96
4. **Second Convolutional Layer**: Applies 256 convolutional filters with a kernel size of 5x5, stride of 1, and padding of 2, resulting in an output volume of 27x27x256
5. **Second Pooling Layer**: Max-pooling with 3x3 window, stride of 2, reducing the spatial dimensions to 13x13x256
6. **Third Convolutional Layer**: Applies 384 convolutional filters with a kernel size of 3x3, stride of 1, and padding of 1, resulting in an output volume of 13x13x384
7. **Fourth Convolutional Layer**: Similar to the third convolutional layer, it applies 384 filters with the same parameters, resulting in an output volume of 13x13x384
8. **Fifth Convolutional Layer**: Applies 256 convolutional filters with a kernel size of 3x3, stride of 1, and padding of 1, resulting in an output volume of 13x13x256
9. **Third Pooling Layer**: Max-pooling with 3x3 window, stride of 2, reducing the spatial dimensions to 6x6x256
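10. **First Fully Connected Layer**: The 6x6x256 volume is flattened into 9216 values and mapped to 4096 neurons, followed by ReLU and dropout
11. **Second Fully Connected Layer**: 4096 neurons, again followed by ReLU and dropout
12. **Third Fully Connected Layer (Output Layer)**: Dense layer with 1000 neurons, one for each class in the ILSVRC subset of ImageNet, followed by a softmax that turns the outputs into class probabilities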
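The spatial sizes in the list follow from the standard output-size formula for convolution and pooling, `floor((n + 2p - k) / s) + 1` for input size `n`, kernel `k`, stride `s` and padding `p`. The plain-Python helpers below (hypothetical names `conv_out` and `trace_shapes`) replay the layer settings listed above; they assume square inputs and a 227x227 input by default, the size for which the first convolution produces 55x55.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (square input, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, output channels); pooling keeps the channel count
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]

def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after each layer in LAYERS."""
    shapes = []
    for name, k, s, p, out_ch in LAYERS:
        size = conv_out(size, k, s, p)
        if out_ch is not None:
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes

for name, h, w, c in trace_shapes():
    print(f"{name}: {h}x{w}x{c}")
```

Calling `trace_shapes(224)` instead gives 54x54 after the first convolution, which is why implementations that keep a 224x224 input add 2 pixels of padding to that layer.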

### When to Use AlexNet

- On medium-scale image classification tasks with moderate computational resources
- In legacy systems where compatibility is necessary
- For transfer learning with pretrained models for less complex tasks

### When to Avoid AlexNet

- When state-of-the-art performance and high accuracy are required
- On very large-scale datasets where newer models excel
- In resource-constrained environments where more efficient models are needed
- For cutting-edge research and development exploring new deep learning techniques
- When more advanced models can provide better feature extraction and performance in transfer learning
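The resource cost can be made concrete by counting parameters for the layer settings in the architecture list. This plain-Python sketch assumes a single, ungrouped network with a bias on every layer (the original two-GPU model split some layers into groups, which lowers the count somewhat); the helper names are illustrative. It shows roughly 62 million parameters, about 94% of them in the three fully connected layers.

```python
def conv_params(in_ch, out_ch, k):
    """Weights (out_ch filters of size k x k x in_ch) plus one bias per filter."""
    return out_ch * in_ch * k * k + out_ch

def fc_params(n_in, n_out):
    """Weight matrix plus one bias per output neuron."""
    return n_in * n_out + n_out

# Layer settings from the architecture list (single-tower, ungrouped layout)
conv = {
    "conv1": conv_params(3, 96, 11),
    "conv2": conv_params(96, 256, 5),
    "conv3": conv_params(256, 384, 3),
    "conv4": conv_params(384, 384, 3),
    "conv5": conv_params(384, 256, 3),
}
fc = {
    "fc1": fc_params(6 * 6 * 256, 4096),
    "fc2": fc_params(4096, 4096),
    "fc3": fc_params(4096, 1000),
}

total = sum(conv.values()) + sum(fc.values())
fc_share = sum(fc.values()) / total
print(f"convolutional: {sum(conv.values()):,}")
print(f"fully connected: {sum(fc.values()):,} ({fc_share:.1%})")
print(f"total: {total:,} (~{total * 4 / 1e6:.0f} MB as float32)")
```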

### Libraries

```python
import torch
from torch import nn
from d2l import torch as d2l
```
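With those imports available, the architecture can be sketched in plain PyTorch (no `d2l` helpers are used). Assumptions: a 227x227 input with no padding in the first convolution, so the shapes match the architecture list; dropout with p = 0.5 on the first two fully connected layers; and no local response normalization, which the original paper applied after the first two convolutional layers but which is omitted here for brevity. The function name `alexnet_sketch` is hypothetical. The model runs on a GPU when one is available, otherwise on the CPU.

```python
import torch
from torch import nn

def alexnet_sketch(num_classes=1000):
    """Single-tower AlexNet-style network matching the layer list (no LRN)."""
    return nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Flatten(),  # 6x6x256 -> 9216
        nn.Linear(6 * 6 * 256, 4096), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(4096, num_classes),
    )

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = alexnet_sketch().to(device)

# Push one random image through layer by layer and print each output shape
x = torch.randn(1, 3, 227, 227, device=device)
for layer in model:
    x = layer(x)
    print(f"{layer.__class__.__name__:<10} {tuple(x.shape)}")
```

Training would still need a loss such as `nn.CrossEntropyLoss` (which applies the softmax internally, so the model outputs raw logits) and an optimizer.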