# Deep Learning 1
We will give a detailed explanation based on the handwritten digit recognition program introduced in the second lecture.
First, here is the handwritten digit recognition program.

In [2]:
from PIL import Image # Import image processing module
import matplotlib.pyplot as plt
import torch # Import PyTorch library for deep learning
import torchvision # Import TorchVision library for image processing

# Define the preprocessing of image data.
# Image data is converted into tensors.
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor()])

# Download training data and test data in MNIST dataset
train_mnist = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_mnist = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Define DataLoaders that return the data in randomly shuffled mini-batches of 100 samples.
train_loader = torch.utils.data.DataLoader(train_mnist, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_mnist, batch_size=100, shuffle=True)

# Define neural network
class CNN(torch.nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 32, 3) # 28x28x1 -> 26x26x32 (CNN)
        self.conv2 = torch.nn.Conv2d(32, 64, 3) # 26x26x32 -> 24x24x64  (CNN)
        self.pool = torch.nn.MaxPool2d(2, 2) # 24x24x64 -> 12x12x64 (Max Pooling)
        self.fc1 = torch.nn.Linear(12 * 12 * 64, 128) # Fully-connected layer
        self.fc2 = torch.nn.Linear(128, 10) # Fully-connected layer

    def forward(self, x):
        h = torch.nn.functional.relu(self.conv1(x)) # Apply the convolution process to the input data
        h = torch.nn.functional.relu(self.conv2(h)) # Apply the convolution process to the hidden data
        h = self.pool(h) # Apply max pooling to the hidden data
        h = torch.flatten(h, start_dim=1) # Convert the tensor to a vector
        h = torch.nn.functional.relu(self.fc1(h)) # Apply the fully-connected layer to the vector
        y = self.fc2(h) # Apply the fully-connected layer to the hidden data
        return y

# Instantiate the defined neural network
model = CNN()
# Define a loss function
criterion = torch.nn.CrossEntropyLoss()
# Define an optimizer which minimizes the loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

# Training loop
epochs = 5
for epoch in range(epochs):
    total_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        # Reset the gradients accumulated in the previous iteration
        optimizer.zero_grad()
        
        # Predict the target
        outputs = model(inputs)
        # Calculate the error between the prediction and the target
        loss = criterion(outputs, labels)
        # Compute the gradients by backpropagation and update the parameters
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        if i == 0 or (i+1) % 100 == 0:
            print('{0}, {1}, Loss:{2}'.format(epoch+1, i+1, total_loss/(i+1)))
    print('Total loss:{}'.format(total_loss/(i+1)))
    
    total_loss = 0.0
    

1, 1, Loss:2.299196720123291
1, 100, Loss:1.503427717089653
1, 200, Loss:0.9781293166428804
1, 300, Loss:0.7688145390152932
1, 400, Loss:0.6506744086369872
1, 500, Loss:0.5764600720852614
1, 600, Loss:0.5216090929880738
Total loss:0.5216090929880738
2, 1, Loss:0.2686532735824585
2, 100, Loss:0.22512469820678235
2, 200, Loss:0.2149828828126192
2, 300, Loss:0.20465209330121675
2, 400, Loss:0.1972691385820508
2, 500, Loss:0.18863634123653172
2, 600, Loss:0.1826807174521188
Total loss:0.1826807174521188
3, 1, Loss:0.0929945632815361
3, 100, Loss:0.1276093866676092
3, 200, Loss:0.1263842006586492
3, 300, Loss:0.12236206198732058
3, 400, Loss:0.11847318078391254
3, 500, Loss:0.11574019915610552
3, 600, Loss:0.11186994348652661
Total loss:0.11186994348652661
4, 1, Loss:0.11827472597360611
4, 100, Loss:0.08581236591562628
4, 200, Loss:0.08788658922538162
4, 300, Loss:0.08460415255899231
4, 400, Loss:0.08205918801715598
4, 500, Loss:0.0811321620810777
4, 600, Loss:0.07944747865200043
Total loss

## Convolutional Layer
In the convolutional layer, filters are used to extract local features from an image.
Since the purpose of the filter is to capture only local features of the image, it is usually not set very large.
Small sizes such as 3x3 or 5x5 are commonly used.

In the convolutional layer, a filter $h(i,j)$ is applied to a patch of the image $u$ as follows.
$$ u'(x, y) = \sum_i\sum_j h(i, j) u(x+i, y+j)$$
This operation multiplies the filter $h(i,j)$ element-wise with a patch of the image and sums the results, so it can be regarded as a dot product between the two.
The result therefore indicates how similar the patch is to the filter.
In other words, the convolution layer can be seen as a process that checks where the pattern represented by the filter exists on the image.
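As a minimal sketch of the sum above, the following compares a hand-written double loop with `torch.nn.functional.conv2d`. The 5x5 image `u`, the 3x3 filter `h`, and the helper name `filter_response` are made up for illustration and are not part of the program above.

```python
import torch

def filter_response(u, h, x, y):
    """Sum of element-wise products between filter h and the patch of u starting at (x, y)."""
    total = 0.0
    for i in range(h.shape[0]):
        for j in range(h.shape[1]):
            total += h[i, j] * u[x + i, y + j]
    return total

torch.manual_seed(0)
u = torch.rand(5, 5)  # made-up 5x5 single-channel image
h = torch.rand(3, 3)  # made-up 3x3 filter

# Slide the filter over every valid position (stride 1, no padding): 3x3 outputs
manual = torch.stack([
    torch.stack([filter_response(u, h, x, y) for y in range(3)])
    for x in range(3)
])

# The same computation with PyTorch: inputs are (batch, channel, height, width)
library = torch.nn.functional.conv2d(
    u.reshape(1, 1, 5, 5), h.reshape(1, 1, 3, 3)
).reshape(3, 3)

print(torch.allclose(manual, library, atol=1e-6))
```

Both results should agree up to floating-point rounding, which suggests that a convolutional layer without bias computes exactly this patch-wise dot product.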

This computation is repeated while sliding the filter over the input image, starting from the top-left corner, until the entire image has been processed.
The distance the filter moves at each step is called the "stride".
A larger stride reduces the overlap between neighboring patches, that is, how much image information is shared between adjacent similarity values.

Each filter represents one pattern.
Therefore, if you want to detect multiple patterns from an image, such as straight lines, curves, and corners, you need to prepare multiple filters.
Conceptually, each filter produces one "channel".
Therefore, you need to specify the number of filters. Theoretically, the more filters you use, the more diverse patterns you can detect.

````
self.conv1 = torch.nn.Conv2d(1, 32, 3) # 28x28x1 -> 26x26x32 (CNN)
````
In this setting, the input information has one channel.
By using 32 types of filters, it outputs 32 channels.
The size of the filter is set to 3x3.
While the stride is not explicitly mentioned, it defaults to 1, which means the filter processes the image by shifting one unit at a time.
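To check the shapes in the comment, here is a small sketch that passes a dummy image through the layer. Without padding, an input of width $W$, a filter of size $K$, and a stride $S$ give an output width of $\lfloor (W-K)/S \rfloor + 1$. The `stride=2` variant is a hypothetical comparison and is not used in the program above.

```python
import torch

# One dummy grayscale image: (batch, channel, height, width)
x = torch.zeros(1, 1, 28, 28)

conv_stride1 = torch.nn.Conv2d(1, 32, 3)            # stride defaults to 1
conv_stride2 = torch.nn.Conv2d(1, 32, 3, stride=2)  # hypothetical variant for comparison

out1 = conv_stride1(x)
out2 = conv_stride2(x)
print(out1.shape)  # torch.Size([1, 32, 26, 26]) since (28 - 3) / 1 + 1 = 26
print(out2.shape)  # torch.Size([1, 32, 13, 13]) since floor((28 - 3) / 2) + 1 = 13
```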

## Pooling Layer
In image recognition, it's desirable to overlook minor rotations, translations, and scaling.
For instance, in handwritten digits, the features identified by filters might slightly shift due to these variations.
However, when making the final prediction, it's preferable to ignore such minute changes.
To absorb such minor variations, the pooling layer extracts a single representative value from each predefined region.

The purpose of the pooling layer is to determine a value that represents the specified region.
For this reason, common pooling methods include "Max Pooling" and "Average Pooling".
Due to its simplicity in processing, many systems using CNNs frequently employ "Max Pooling".

````
self.pool = torch.nn.MaxPool2d(2, 2) # 24x24x64 -> 12x12x64 (Max Pooling)
````
In this setting, the maximum value is extracted from a 2x2 area for each channel.
Since each channel contains a similarity measure of patterns extracted by the filters, even if there are minor changes within a 2x2 range, they will be absorbed through the processing of the pooling layer.
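The following sketch illustrates this absorption on a made-up 4x4 feature map. The tensors `a` and `b` are illustrative values only; `b` is the same map with one response shifted by a single pixel inside its 2x2 window.

```python
import torch

pool = torch.nn.MaxPool2d(2, 2)

# Made-up 4x4 feature map, reshaped to (batch, channel, height, width)
a = torch.tensor([[0., 0., 0., 0.],
                  [0., 9., 0., 0.],
                  [0., 0., 0., 0.],
                  [0., 0., 0., 5.]]).reshape(1, 1, 4, 4)

# Same map with the 9 moved one pixel up, still inside the same 2x2 window
b = torch.tensor([[0., 9., 0., 0.],
                  [0., 0., 0., 0.],
                  [0., 0., 0., 0.],
                  [0., 0., 0., 5.]]).reshape(1, 1, 4, 4)

pooled_a = pool(a)
pooled_b = pool(b)
print(pooled_a.reshape(2, 2))           # tensor([[9., 0.], [0., 5.]])
print(torch.equal(pooled_a, pooled_b))  # True
```

Note that this only holds while the shift stays inside one pooling window; a shift that crosses a window boundary would change the pooled output.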

## Classifier
Using the features obtained from the convolutional and pooling layers, the fully-connected layers make the final classification.
The number of outputs matches the number of distinct categories to be classified, and the prediction is made based on the largest output value.
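As a sketch of how the prediction could be read off, the snippet below takes the index of the largest score for each image. The `outputs` and `labels` tensors are made-up stand-ins for `model(inputs)` and the true labels, so the accuracy shown is only illustrative.

```python
import torch

# Made-up scores for a batch of 2 images and 10 classes (stand-in for model(inputs))
outputs = torch.tensor([[0.1, 2.5, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2, 0.0, 3.4]])
labels = torch.tensor([1, 7])  # made-up correct classes

# The predicted class is the index of the largest output value per image
predicted = outputs.argmax(dim=1)
accuracy = (predicted == labels).float().mean().item()

print(predicted)  # tensor([1, 9])
print(accuracy)   # 0.5
```

Applied to each batch of `test_loader`, the same two lines would give the fraction of correctly classified test images.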

## Training
We have now designed a CNN.
Next, we train it using the training data.
The basic flow is to define a loss function that measures the discrepancy between the predictions and the correct labels.
The parameters of the neural network, specifically the filter weights of the convolutional layers and the weights of the fully-connected layers, are then adjusted to minimize this loss function.

### Loss function
In this program, the loss function is defined as follows.
````
criterion = torch.nn.CrossEntropyLoss()
````
Here, cross-entropy is used, which is defined as follows:
$$ -\sum_i t_i \log p_i $$
${\mathbf t}$ is a one-hot vector in which only the element for the correct class is 1, and ${\mathbf p}$ is the vector of class probabilities predicted by the CNN.
However, the output ${\mathbf o}$ of the CNN is just a vector of scores, so the following softmax function is used to convert it into probabilities.
$$ p_i = \frac{e^{o_i}}{\sum_j e^{o_j}}$$
`torch.nn.CrossEntropyLoss` applies this softmax internally, which is why `forward` returns the raw scores.
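As a small check of these definitions, the sketch below computes softmax followed by the negative sum $-\sum_i t_i \log p_i$ by hand and compares it with `torch.nn.CrossEntropyLoss`, which takes the raw scores and a class index and, by default, averages over the batch. The scores `o` and the label are made-up values.

```python
import torch

o = torch.tensor([[2.0, 1.0, 0.1]])  # made-up scores for one sample and 3 classes
label = torch.tensor([0])            # made-up correct class index

# One-hot target vector t: 1 for the correct class, 0 elsewhere
t = torch.nn.functional.one_hot(label, num_classes=3).float()

# Softmax: convert scores to probabilities
p = torch.exp(o) / torch.exp(o).sum(dim=1, keepdim=True)

# Cross-entropy with the leading minus sign, averaged over the batch
manual_loss = -(t * torch.log(p)).sum(dim=1).mean()

# CrossEntropyLoss takes the raw scores and applies softmax internally
library_loss = torch.nn.CrossEntropyLoss()(o, label)

print(torch.allclose(manual_loss, library_loss))
```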

By differentiating this loss function with respect to the parameters, we obtain the gradient, which tells us in which direction to change the parameters to reduce the loss.
Using the chain rule, the gradient of the whole network can be written as a product of the derivatives of the individual layers.
This allows the derivative of each layer to be implemented as a separate function and the gradient to be propagated from the output back to the input, which is referred to as backpropagation.
In the actual program, the gradient is computed with "loss.backward()", and the parameters are updated with "optimizer.step()".
For the update rule, we use a widely used method called Adam.
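To see the same three calls in isolation, here is a minimal sketch on a toy problem rather than the CNN. It assumes a single hypothetical parameter `w` and a made-up loss $(w-3)^2$; the learning rate of 0.1 is also chosen only for this illustration.

```python
import torch

# A single hypothetical parameter, initialized to 0
w = torch.tensor([0.0], requires_grad=True)
optimizer = torch.optim.Adam([w], lr=0.1)

losses = []
for step in range(100):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = ((w - 3.0) ** 2).sum()  # made-up loss, minimized at w = 3
    loss.backward()                # compute d(loss)/dw = 2 * (w - 3)
    optimizer.step()               # update w using the gradient
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss should shrink as `w` approaches 3, mirroring how the average loss falls over the epochs in the training log above.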

# Assignment
Using the CNN network created here, please classify the FashionMNIST data.

In [5]:
from PIL import Image # Import image processing module
import matplotlib.pyplot as plt
import torch # Import PyTorch library for deep learning
import torchvision # Import TorchVision library for image processing

# Define the preprocessing of image data.
# Image data is converted into tensors.
transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor()])

# Download training data and test data in FashionMNIST dataset
train_fashionmnist = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_fashionmnist = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

# Define DataLoaders that return the data in randomly shuffled mini-batches of 100 samples.
train_loader = torch.utils.data.DataLoader(train_fashionmnist, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_fashionmnist, batch_size=100, shuffle=True)

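# torch.Tensor.transpose() swaps exactly two dimensions, so transpose(1, 2, 0) raises
# "TypeError: transpose() received an invalid combination of arguments".
# Use permute(1, 2, 0) to reorder (C, H, W) -> (H, W, C), or, since the image has a
# single channel, drop the channel dimension with squeeze(0).
plt.imshow(train_fashionmnist[0][0].squeeze(0), cmap='gray')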
