# 05 CNN Forward Method - PyTorch Deep Learning Implementation
## Review

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

The `forward()` method is the **actual network transformation**. <br>The forward method is the mapping that maps an **input tensor** to a prediction **output tensor**. Let's see how this is done.

This means that the forward method implementation will use **all of the layers** we defined inside the constructor. In this way, the **forward method** explicitly defines the **network's transformation**.

In [None]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6,out_channels=12,kernel_size = 5)
        
        self.fc1 = nn.Linear(in_features = 12*4*4,out_features = 120)
        self.fc2 = nn.Linear(in_features = 120,out_features = 60)
        self.out = nn.Linear(in_features = 60,out_features = 10)
        
    def forward(self,t):
        # implement the forward pass
        return t

## Input Layer 1

The **input layer** of any neural network is determined by the **input data**. 

```python
def forward(self,t):
    # (1) input layer
    t = t
```

## Hidden Convolutional Layers: Layers 2 And 3

To perform the convolution operation, we pass the tensor to the forward method of the first convolutional layer, `self.conv1`. We've learned how all PyTorch neural network modules have `forward()` methods, and when we call the `forward()` method of an `nn.Module`, there is a special way that we make the call.

When we want to call the `forward()` method of an `nn.Module` instance, we call the actual instance instead of calling the `forward()` method directly.

Instead of doing this `self.conv1.forward(tensor)`, we do this `self.conv1(tensor)`.
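One practical reason for this convention is that calling the instance goes through `nn.Module.__call__`, which runs any registered hooks around `forward()`, while calling `forward()` directly skips them. The following is a minimal sketch of that difference; the standalone `conv` layer, the random input, and the `calls` list are illustrative names, not part of our network.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
calls = []  # illustrative: records the output shape each time the hook fires

# A forward hook receives (module, inputs, output) after forward() has run.
handle = conv.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

t = torch.rand(1, 1, 28, 28)  # random stand-in for a batch holding one 28 x 28 image

out_call = conv(t)             # goes through __call__, so the hook fires
out_forward = conv.forward(t)  # bypasses __call__, so the hook does not fire

print(torch.allclose(out_call, out_forward))  # the computed values agree
print(len(calls))                             # the hook ran only once

handle.remove()  # detach the hook again
```

Both calls compute the same values, but only the instance call triggers the extra machinery PyTorch attaches to a module.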

```python
# (2) hidden conv layer
t = self.conv1(t)
t = F.relu(t)
t = F.max_pool2d(t, kernel_size=2, stride=2)

# (3) hidden conv layer
t = self.conv2(t)
t = F.relu(t)
t = F.max_pool2d(t, kernel_size=2, stride=2)
```

As we can see here, our input tensor is transformed as we move through the convolutional layers. The first convolutional layer has a convolution operation, followed by a **relu activation** operation whose output is then passed to a max pooling operation with `kernel_size=2` and `stride=2`. (Here, the kernel size is the size of the filter.)

The output tensor `t` of the first convolutional layer is then passed to the next convolutional layer, which is identical except for the fact that we call `self.conv2()` instead of `self.conv1()`.
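To see the transformation concretely, we can track the tensor's shape after each convolutional layer. This is a small sketch that assumes a random `28 x 28` single-channel input in a batch of one and uses standalone layers configured like `conv1` and `conv2`; the `shapes` list is only there for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

t = torch.rand(1, 1, 28, 28)  # random stand-in for one grayscale image
shapes = [tuple(t.shape)]

# (2) first hidden conv layer: convolution, relu, max pooling
t = F.max_pool2d(F.relu(conv1(t)), kernel_size=2, stride=2)
shapes.append(tuple(t.shape))

# (3) second hidden conv layer: same pattern with conv2
t = F.max_pool2d(F.relu(conv2(t)), kernel_size=2, stride=2)
shapes.append(tuple(t.shape))

for shape in shapes:
    print(shape)
# (1, 1, 28, 28)
# (1, 6, 12, 12)
# (1, 12, 4, 4)
```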

Each of these layers is comprised of a collection of weights (data) and a collection of operations (code). The weights are encapsulated inside the `nn.Conv2d()` class instance. The `relu()` and the `max_pool2d()` calls are just **pure operations**. Neither of these has weights, and this is why we call them directly from the `nn.functional` API.

Sometimes we may see `pooling operations` referred to as pooling layers. Sometimes we may even hear activation operations called activation layers.

**However**, what makes a layer **distinct** from an operation is that layers have **weights**. Since `pooling` operations and `activation` functions **do not have weights**, we will refer to them as **operations** and view them as being added to the collection of layer operations.
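We can check this distinction by counting learnable parameters. PyTorch also provides module wrappers for these operations (`nn.ReLU` and `nn.MaxPool2d`), and the sketch below uses them only so that we can ask each object for its parameters; `count_params` is a hypothetical helper written for this example.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2, stride=2)

def count_params(module):
    """Total number of learnable values held by a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())

print(tuple(conv.weight.shape))  # (6, 1, 5, 5): six 5 x 5 filters over one input channel
print(count_params(conv))        # 156 = 6*1*5*5 weights + 6 biases
print(count_params(relu))        # 0
print(count_params(pool))        # 0
```

The convolutional layer carries weights, while the activation and pooling operations carry none.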

**In short, whether an operation can be called a layer depends on whether it has weights.**

Don't let these terms obscure the fact that the whole network is simply a composition of functions, and what we are doing now is defining this composition inside our `forward()` method.

## Hidden Linear Layers: Layers 4 And 5
Before we pass our input to the first hidden linear layer, we must `reshape()` or `flatten` our tensor. This will be the case any time we are passing output from a convolutional layer as input to a linear layer.

Since the fourth layer is the first linear layer, we will include our reshaping operation as a part of the fourth layer.

```python
# (4) hidden linear layer
t = t.reshape(-1, 12 * 4 * 4)
t = self.fc1(t)
t = F.relu(t)

# (5) hidden linear layer
t = self.fc2(t)
t = F.relu(t)
```

The number `12` in the reshaping operation is determined by the number of output channels coming from the previous convolutional layer (`out_channels=12`).

The `4 * 4` is actually the height and width of each of the 12 output channels.

The height and width dimensions have been reduced from `28 x 28` to `4 x 4` by the convolution and pooling operations.
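The reduction from `28` to `4` can be worked out by hand. The sketch below uses the standard output-size formula `(n + 2 * padding - kernel_size) // stride + 1`, assuming square inputs and no dilation; `output_size` is a hypothetical helper, not a PyTorch function.

```python
def output_size(n, kernel_size, stride=1, padding=0):
    """Side length after a convolution or pooling operation on an n x n input."""
    return (n + 2 * padding - kernel_size) // stride + 1

n = 28
n = output_size(n, kernel_size=5)            # conv1:    28 -> 24
n = output_size(n, kernel_size=2, stride=2)  # max pool: 24 -> 12
n = output_size(n, kernel_size=5)            # conv2:    12 -> 8
n = output_size(n, kernel_size=2, stride=2)  # max pool:  8 -> 4

print(n)           # 4
print(12 * n * n)  # 192, the in_features of fc1
```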

After the tensor is reshaped, we pass the flattened tensor to the linear layer and pass this result to the `relu()` activation function.
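As a side note, the reshaping step can also be written with `flatten()`. This sketch assumes a random tensor shaped like the output of the second convolutional layer for a batch of three images; both calls keep the batch dimension and merge the rest.

```python
import torch

t = torch.rand(3, 12, 4, 4)  # random stand-in: batch of 3, 12 channels, 4 x 4 each

a = t.reshape(-1, 12 * 4 * 4)  # -1 lets PyTorch infer the batch dimension
b = t.flatten(start_dim=1)     # flatten everything after the batch dimension

print(a.shape)            # torch.Size([3, 192])
print(torch.equal(a, b))  # True
```

The `reshape()` form makes the expected feature count explicit, while `flatten(start_dim=1)` does not need to know it.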

## Output Layer 6

The sixth and last layer of our network is a linear layer we call the **output layer**. When we pass our tensor to the output layer, the result will be the **prediction tensor**. Since our data has ten prediction classes, we know our output tensor will have ten elements.

```python
# (6) output layer
t = self.out(t)
#t = F.softmax(t, dim=1)
```
Inside the network we usually use `relu()` as our `non-linear activation function`, but for the output layer, whenever we have a single category that we are trying to predict, we use `softmax()`. The `softmax function` returns a positive probability for each of the prediction classes, and the probabilities sum to `1`.

**However**, in our case, we won't use `softmax()` because the loss function that we'll use, `F.cross_entropy()`, implicitly performs the `softmax()` operation on its input, so we'll just return the result of the last linear transformation.

The implication of this is that our network will be trained using the softmax operation but will not need to compute the additional operation when the network is used for inference after the training process is complete.
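A small sketch can make this concrete. It assumes made-up logits and targets for two samples and three classes, and compares `F.cross_entropy()` with applying `log_softmax()` followed by `F.nll_loss()` by hand. It also checks that `softmax()` does not change which class has the highest score.

```python
import torch
import torch.nn.functional as F

# Made-up raw network outputs (logits) and class labels, for illustration only
logits = torch.tensor([[0.5, -1.0, 2.0],
                       [1.5,  0.2, -0.3]])
targets = torch.tensor([2, 0])

loss_direct = F.cross_entropy(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_direct, loss_manual))  # True

# softmax preserves the ordering of the scores, so argmax is unchanged
same_argmax = torch.equal(
    logits.argmax(dim=1),
    F.softmax(logits, dim=1).argmax(dim=1),
)
print(same_argmax)  # True
```

Because the predicted class is the same with or without `softmax()`, returning the raw output of the last linear layer is enough when we only need the class prediction.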

In [None]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6,out_channels=12,kernel_size = 5)
        
        self.fc1 = nn.Linear(in_features = 12*4*4,out_features = 120)
        self.fc2 = nn.Linear(in_features = 120,out_features = 60)
        self.out = nn.Linear(in_features = 60,out_features = 10)
        
    def forward(self,t):
        # (1) input layer
        t = t

        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (3) hidden conv layer
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (4) hidden linear layer
        t = t.reshape(-1, 12 * 4 * 4)
        t = self.fc1(t)
        t = F.relu(t)

        # (5) hidden linear layer
        t = self.fc2(t)
        t = F.relu(t)

        # (6) output layer
        t = self.out(t)
        #t = F.softmax(t, dim=1)

        return t

## Quiz 05
Q1:In neural network programming, the forward method of a network instance explicitly defines the network's ______________<br>
A1:transformation

Q2:In neural network programming, the forward method of a network instance is the mapping that maps an input tensor to a prediction output tensor.<br>
A2:**True**

Q3:The input layer of any neural network is determined by the input data. For this reason, we can think of the input layer as the identity transformation. Mathematically, this is the function,$$f(x)=x$$
A3:**True**

Q4:In neural network programming, all layers that are not the input or output layers are called hidden layers.<br>
A4:**True**

Q5:In neural network programming, the operations that are defined using _______________ are called layers.<br>
A5:weights

Q6:In the most general sense, neural networks are mathematical functions. Terms like layers, activation functions, and weights, are just used to help describe the different parts.<br>
A6:**True**

[Reference](https://deeplizard.com/learn/video/MasG7tZj-hw)

---
---

# 06 CNN Image Prediction With PyTorch - Forward Propagation Explained

## What Is Forward Propagation?

*Forward propagation* is the process of transforming an input tensor to an output tensor. <br>
At its core, a neural network is a function that maps an input tensor to an output tensor, and *forward propagation* is just a special name for the process of **passing an input to the network and receiving the output** from the network.

For our network, what this means is simply passing our input tensor to the network and receiving the output tensor. To do this, we pass our sample data to the network's `forward()` method.

This is why the `forward()` method has the name *forward*: the execution of **`forward()` is the process of forward propagation**.

The word *forward* is pretty *straight forward*. ;)

However, the word *propagate* means to move or transmit *through some medium*. In the case of **neural networks**, data propagates through the **layers of the network**.

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

torch.set_printoptions(linewidth=120)

In [13]:
train_set = torchvision.datasets.FashionMNIST(
    root = './data/FashionMNIST'
    ,train = True
    ,download = True
    ,transform = transforms.Compose([
        transforms.ToTensor()
    ])
)

In [29]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels = 1,out_channels = 6,kernel_size = 5)
        self.conv2 = nn.Conv2d(in_channels = 6,out_channels = 12,kernel_size = 5)
        
        self.fc1 = nn.Linear(in_features = 12*4*4,out_features=120)
        self.fc2 = nn.Linear(in_features = 120,out_features=60)
        self.out = nn.Linear(in_features = 60,out_features=10)
        
    def forward(self,t):
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size = 2,stride = 2)
        
        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size = 2,stride = 2)
        
        t = F.relu(self.fc1(t.reshape(-1,12*4*4)))
        t = F.relu(self.fc2(t))
        t = self.out(t)
        
        return t

## Predicting With The Network: Forward Pass

Before we begin, we are going to **turn off PyTorch’s gradient calculation feature**. This will stop PyTorch from automatically building a computation graph as our tensor flows through the network.
![pytorch neuron](https://deeplizard.com/images/neural%20network%202%203%202.png)

The computation graph keeps track of the network's mapping by tracking each computation that happens. The graph is used during the training process to calculate the derivative (gradient) of the loss function with respect to the network’s weights.

**Since we are not training the network yet, we aren’t planning on updating the weights, and so we don’t require gradient calculations. We will turn this back on when training begins.**

This process of tracking calculations happens in real-time, as the calculations occur. Turning it off isn’t strictly necessary, but having the feature turned off does reduce memory consumption since the graph isn't stored in memory. This code will turn the feature off.
```python
torch.set_grad_enabled(False) 
```
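If we would rather not change the global setting, PyTorch also provides the `torch.no_grad()` context manager, which disables tracking only inside its block. The following is a minimal sketch using a small stand-in linear layer (`layer` and `x` are illustrative names); `torch.enable_grad()` is used for the comparison so that the result does not depend on the global setting.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=2)  # stand-in for a network
x = torch.rand(1, 4)

with torch.no_grad():
    y_untracked = layer(x)  # no computation graph is built here

with torch.enable_grad():
    y_tracked = layer(x)    # the graph is built because the weights require gradients

print(y_untracked.requires_grad)  # False
print(y_tracked.requires_grad)    # True
```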

### Passing A Single Image To The Network


In [30]:
torch.set_grad_enabled(False)

<torch.autograd.grad_mode.set_grad_enabled at 0x216a5bc1ca0>

In [31]:
network = Network()

Next, we’ll procure a **single** sample from our training set, unpack the image and the label, and verify the image’s shape:

In [32]:
sample = next(iter(train_set))
image,label = sample
image.shape
#The image tensor’s shape indicates that we have a single channel image 
#that is 28 in height and 28 in width. Cool, this is what we expect.

torch.Size([1, 28, 28])

In [33]:
label

9

Now, there's a second step we must perform before simply passing this tensor to our network. When we pass a tensor to our network, the network is expecting a **batch**, so even if we want to pass a single image, we still need a batch.

This is no problem. We can create **a batch that contains a single image**. All of this will be packaged into a single **four dimensional tensor** that reflects the following dimensions: `(batch_size, in_channels, height, width)`.

This requirement of the network **arises from** the fact that the `forward()` methods in the `nn.Conv2d` convolutional layer classes expect their tensors to have **4 dimensions**. This is pretty standard as most neural network implementations deal with batches of input samples rather than single samples.

To put our single sample image tensor into a batch with a size of 1, we just need to `unsqueeze()` the tensor to add an additional dimension. [squeeze tutorial](https://deeplizard.com/learn/video/fCVuiW9AFzY)

In [34]:
# Inserts an additional dimension that represents a batch of size 1
image.unsqueeze(0).shape

torch.Size([1, 1, 28, 28])
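For reference, there are a few equivalent ways to add the batch dimension. This sketch uses a random tensor named `img` as a stand-in for the real image, so it can be run without touching the variables used above.

```python
import torch

img = torch.rand(1, 28, 28)  # random stand-in for a single-channel 28 x 28 image

batch_a = img.unsqueeze(0)    # insert a new dimension at index 0
batch_b = img[None]           # indexing with None does the same thing
batch_c = torch.stack([img])  # stack a list that holds one image

print(batch_a.shape)  # torch.Size([1, 1, 28, 28])
print(torch.equal(batch_a, batch_b) and torch.equal(batch_a, batch_c))  # True
print(batch_a.squeeze(0).shape)  # torch.Size([1, 28, 28]), back to a single image
```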

In [35]:
pred = network(image.unsqueeze(0)) 
# image shape needs to be (batch_size × in_channels × H × W)

In [36]:
pred

tensor([[ 0.0207, -0.0841, -0.0833,  0.1248, -0.0734, -0.0905, -0.0102, -0.0013,  0.0937, -0.0687]])

And we did it! We've used our forward method to get a prediction from the network. The network has returned a prediction tensor that contains a prediction value for each of the ten categories of clothing.

In [37]:
pred.shape

torch.Size([1, 10])

In [38]:
pred.argmax(dim=1)

tensor([3])

In [39]:
label

9

For each input in the batch, and for each prediction class, we have a prediction value. If we wanted these values to be probabilities, we could just use the `softmax()` function from the `nn.functional` package.

In [40]:
F.softmax(pred, dim=1)

tensor([[0.1036, 0.0933, 0.0933, 0.1149, 0.0943, 0.0927, 0.1004, 0.1013, 0.1114, 0.0947]])

In [41]:
F.softmax(pred, dim=1).sum()

tensor(1.)

In [42]:
print(pred.argmax(dim=1))
print(label)

tensor([3])
9


The label for the first image in our training set is `9`, and using the `argmax()` function we can see that the highest value in our prediction tensor occurred at the class represented by index `3`.

The prediction in this case is **incorrect**, **which is what we expect** because the weights in the network were generated randomly.
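To make the indices easier to read, we can map them to the Fashion-MNIST class names. The sketch below hard-codes the ten names in label order (torchvision exposes the same list as `train_set.classes`) and reuses the prediction values printed above; `example_pred` and `example_label` are illustrative names.

```python
import torch

# Fashion-MNIST class names in label order (index 0 through 9)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Prediction values copied from the output above
example_pred = torch.tensor([[0.0207, -0.0841, -0.0833, 0.1248, -0.0734,
                              -0.0905, -0.0102, -0.0013, 0.0937, -0.0687]])
example_label = 9

predicted_index = example_pred.argmax(dim=1).item()
print(class_names[predicted_index])  # Dress
print(class_names[example_label])    # Ankle boot
```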

In [43]:
pred

tensor([[ 0.0207, -0.0841, -0.0833,  0.1248, -0.0734, -0.0905, -0.0102, -0.0013,  0.0937, -0.0687]])

### Network Weights Are Randomly Generated
There are a couple of important things we need to point out about these results. Most of the probabilities came in close to `10%`, and this makes sense because our network is guessing and we have ten prediction classes coming from a balanced dataset.
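One way to see where the `10%` comes from: when all ten output values are equal, `softmax()` returns exactly `0.1` for every class, and values that are all close to zero stay close to that. This sketch assumes a tensor of zeros for the first case and reuses the prediction values printed earlier for the second; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Equal outputs give a perfectly uniform distribution over ten classes
uniform_probs = F.softmax(torch.zeros(1, 10), dim=1)
print(uniform_probs)  # every entry is 0.1

# Prediction values copied from the output above: small and close to zero
small_logits = torch.tensor([[0.0207, -0.0841, -0.0833, 0.1248, -0.0734,
                              -0.0905, -0.0102, -0.0013, 0.0937, -0.0687]])
near_uniform_probs = F.softmax(small_logits, dim=1)
print(near_uniform_probs.min().item(), near_uniform_probs.max().item())
```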

Another implication of the randomly generated weights is that **each time we create a new instance of our network**, the **weights** within the network will be **different**. This means that the predictions we get will be different if we create different networks. Keep this in mind. Your predictions will be different from what we see here.

In [44]:
net1 = Network()
net2 = Network()

In [45]:
net1(image.unsqueeze(0))

tensor([[ 0.0299,  0.0675, -0.0123,  0.0595,  0.0621,  0.0469, -0.1198,  0.0559,  0.1360,  0.1458]])

In [46]:
net2(image.unsqueeze(0))

tensor([[ 0.0902,  0.0108,  0.0641,  0.0207, -0.0602,  0.0682, -0.0044, -0.0218,  0.0343,  0.0474]])
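If we want repeatable weights, for example to compare results across runs, one option is to seed PyTorch's random number generator before creating the network. This is a minimal sketch using a small stand-in linear layer; `make_layer` is a hypothetical helper, and note that `torch.manual_seed()` sets the seed globally.

```python
import torch
import torch.nn as nn

def make_layer(seed):
    """Create a small layer after fixing the random seed (hypothetical helper)."""
    torch.manual_seed(seed)  # fix the generator before the weights are initialized
    return nn.Linear(in_features=4, out_features=2)

layer_a = make_layer(0)
layer_b = make_layer(0)
layer_c = make_layer(1)

print(torch.equal(layer_a.weight, layer_b.weight))  # True: same seed, same weights
print(torch.equal(layer_a.weight, layer_c.weight))  # False: different seed
```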

## Quiz 06
Q1:_____________ is the process of transforming an input tensor to an output tensor.<br>A1:Forward propagation

Q2:The concept of forward propagation is used to indicate that the input tensor data is transmitted through the network in the backward direction.<br>
A2:**False**

Q3:In neural network programming, the computational graph keeps track of the network's mapping by tracking each computation that happens. The graph is used during the training process to calculate the derivative (gradient) of the loss function with respect to the network’s ______________<br>
A3:weights

Q4:Suppose we have a balanced dataset with ten different classes. Choose an image from this dataset and pass the image to a CNN that has randomly initialized weights. In this situation, what approximate prediction probability should we expect to see across all the prediction classes?<br>
A4:10%

Q5:Suppose we have the pred tensor below. What will be the output of the sum() of the softmax()?
```python
> pred
tensor([[0.0991, 0.0916, 0.0907, 0.0949, 0.1013, 0.0922, 0.0990, 0.1130, 0.1107, 0.1074]])

> F.softmax(pred, dim=1).sum()
```
A5:tensor(1.)

[Reference](https://deeplizard.com/learn/video/6vweQjouLEE)