# 01 Build PyTorch CNN - Object Oriented Neural Networks

In [1]:
class Lizard: #class declaration
    def __init__(self, name): #class constructor (code)
        self.name = name #attribute (data)

    def set_name(self, name): #method declaration (code)
        self.name = name #method implementation (code)

The first line declares the class and specifies the class name, which in this case is Lizard.

The second line defines a special method called the class constructor. Class constructors are called when a new instance of the class is created. As parameters, we have self and name.

The self parameter gives us the ability to create attribute values that are stored or encapsulated within the object. When we call this constructor or any of the other methods, we don't pass the self parameter. Python does this for us automatically.

Argument values for any other parameters are passed by the caller, and these values can be used in a calculation inside the method or saved and accessed later using self.

After we're done with the constructor, we can create any number of specialized methods, like set_name() here, which allows a caller to change the name value that was stored in self. All we have to do is call the method and pass a new value for the name.
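As a brief usage sketch, creating an instance and calling the method might look like the following. The variable name and the example name strings are illustrative assumptions, not part of the original listing. Note that self is never passed explicitly when the method is called on the instance:

```python
class Lizard:
    def __init__(self, name):
        self.name = name

    def set_name(self, name):
        self.name = name


# Python passes the new object as `self` automatically; we only supply `name`.
lizard = Lizard('deep')
print(lizard.name)  # deep

lizard.set_name('lizard')
print(lizard.name)  # lizard

# Calling the method through the class makes the hidden `self` argument visible.
Lizard.set_name(lizard, 'net')
print(lizard.name)  # net
```

The last call is equivalent to `lizard.set_name('net')`; it is shown only to make the role of self explicit.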

## Building A Neural Network In PyTorch
We now have enough information to provide an outline for building neural networks in PyTorch. The steps are as follows:

Short version:

1. Extend the nn.Module base class.
2. Define layers as class attributes.
3. Implement the forward() method.

In [None]:
#let’s create a simple class to represent a neural network.
class Network:
    def __init__(self):
        self.layer = None
    
    def forward(self,t):
        t = self.layer(t)
        return t

This gives us a simple network class that has a single dummy layer inside the constructor and a dummy implementation for the forward function.

The implementation for the <code>forward()</code> function takes in a tensor <code>t</code> and transforms it using the dummy layer. After the tensor is transformed, the new tensor is returned.

This is a good start, but the class hasn’t yet extended the <code>nn.Module</code> class. To make our <code>Network</code> class extend <code>nn.Module</code>, we must do two additional things:

1. Specify the <code>nn.Module</code> class in parentheses on line 1.<br/>
2. Insert a call to the super class constructor on line 3 inside the constructor.

This gives us:

In [2]:
import torch.nn as nn

In [None]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = None
    def forward(self,t):
        t = self.layer(t)
        return t

These changes transform our simple neural network into a PyTorch neural network because we are now extending PyTorch's <code>nn.Module</code> base class.

With this, we are done! Now we have a <code>Network</code> class that has all of the functionality of the PyTorch <code>nn.Module</code> class.
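To make "all of the functionality" a little more concrete, here is a minimal sketch. It assumes <code>nn.Identity</code> as a stand-in for the <code>None</code> placeholder, purely so that the forward pass can run; that substitution is not part of the original class.

```python
import torch
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumption: nn.Identity replaces the None placeholder so forward() can run.
        self.layer = nn.Identity()

    def forward(self, t):
        t = self.layer(t)
        return t


network = Network()
print(isinstance(network, nn.Module))  # True

# nn.Module keeps track of layers that are assigned as attributes.
print([name for name, _ in network.named_children()])  # ['layer']

# Calling the network runs forward(); Identity returns the input unchanged.
t = torch.tensor([1.0, 2.0, 3.0])
print(network(t))
```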

## Define The Network’s Layers As Class Attributes
At the moment, our Network class has a single dummy layer as an attribute. Let’s replace this now with some real layers that come pre-built for us from PyTorch's <code>nn</code> library. We’re building a CNN, so the two types of layers we'll use are linear layers and convolutional layers.

In [5]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6,out_channels=12,kernel_size = 5)
        
        self.fc1 = nn.Linear(in_features = 12*4*4,out_features = 120)
        self.fc2 = nn.Linear(in_features = 120,out_features = 60)
        self.out = nn.Linear(in_features = 60,out_features = 10)
        
    def forward(self,t):
        # implement the forward pass
        return t

In [6]:
network = Network()
network

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

## Quiz 01
Q1:When building a basic convolutional neural network, the goal is to model or approximate a function that maps image inputs to the correct ____________<br/>
A1:output class

Q2:The words _______________ and network mean the same thing.<br/>
A2:model

Q3:Neural networks and layers in PyTorch extend the _______________ class. This means that we must extend the _______________ class when building a new layer or neural network in PyTorch.<br/>
A3:nn.Module

Q4:When we pass a tensor to our network as input, the tensor flows _______________ through each layer transformation until the tensor reaches the output layer. This process of a tensor flowing _______________ through the network is known as a forward pass.<br/>
A4:forward

Q5:In the code below, we used the abbreviation fc in fc1 and fc2 because linear layers are also called fully connected layers. Linear layers also have a third name that is sometimes used. This name is _______________ layers.

<code>class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
    def forward(self, t):
        # implement the forward pass
        return t</code>
A5:dense

# 02 CNN Layers - Deep Neural Network Architecture

## Quiz 02
Q1:Parameters are used in function definitions as place-holders while _______________ are the actual values that are passed to the function.<br/>
A1:arguments

Q2:In general, _______________ are parameters whose values are chosen manually and arbitrarily.<br/>
A2:hyperparameters

Q3:As neural network programmers, we choose hyperparameter values mainly based on _______________ and increasingly by utilizing values that have proven to work well in the past.<br/>
A3:trial and error

Q4:In neural network programming, the words kernel and _______________ are interchangeable.<br/>
A4:filter

Q5:With a basic CNN, the <code>out_channels</code> parameter of a convolutional layer sets the number of ____________<br/>
A5:filters

Q6:Data dependent hyperparameters are parameters whose values are dependent on data. Two data dependent hyperparameters that stick out with basic CNNs are the <code>in_channels</code> of the first convolutional layer, and the <code>out_features</code> of the ____________<br/>
A6:output layer
    
Q7:In a basic CNN, the <code>in_channels</code> of the first convolutional layer depend on the number of ____________<br/>
A7:color channels in the images of the training set

Q8:In a basic CNN, the <code>out_features</code> coming from the output layer depend on the number of ____________<br/>
A8:classes in the training set
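The two data dependent hyperparameters can be read directly off a description of the dataset. The sketch below uses a hypothetical image shape and hypothetical class names, chosen only for illustration:

```python
# Hypothetical dataset description; the shape and class names are assumptions.
image_shape = (1, 28, 28)  # (color channels, height, width) of one grayscale image
class_names = ['class_a', 'class_b', 'class_c']

in_channels = image_shape[0]     # first conv layer: one input channel per color channel
out_features = len(class_names)  # output layer: one output per class

print(in_channels, out_features)  # 1 3
```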

# 03 CNN Weights - Learnable Parameters in Neural Networks

In [7]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

### Learnable Parameters
Learnable parameters are parameters whose values are learned during the training process.

With learnable parameters, we typically start out with a set of arbitrary values, and these values then get updated in an iterative fashion as the network learns.

In fact, when we say that a network is learning, we specifically mean that the network is learning the appropriate values for the learnable parameters. Appropriate values are values that minimize the loss function.
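The idea of starting from an arbitrary value and updating it iteratively to reduce the loss can be sketched without any network at all. The example below is a deliberately tiny illustration, not how PyTorch trains a model: the one-parameter loss function, the starting value, and the learning rate are all assumptions made for the sketch.

```python
# Hypothetical one-parameter "network": the loss is smallest when w == 3.
def loss(w):
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the loss with respect to w.
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting value
learning_rate = 0.1  # assumed step size
history = [loss(w)]

for _ in range(50):
    w = w - learning_rate * grad(w)  # step against the gradient
    history.append(loss(w))

print(w)                        # close to 3 after the updates
print(history[0], history[-1])  # the loss shrinks as w is updated
```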

In [8]:
network = Network()

In [9]:
print(network)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)


The <code>print()</code> function prints a string representation of our network to the console. Looking closely, we can see that the printed output details our network’s architecture, listing the network’s layers and showing the values that were passed to the layer constructors.




In [10]:
class Network():
    def __init__(self):
        #super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t


All Python classes automatically extend the object class. If we want to provide a custom string representation for our object, we can do it, but we need to introduce another object oriented concept called overriding.

When we extend a class, we get all of its functionality, and to complement this, we can add additional functionality. However, we can also override existing functionality by changing it to behave differently.

We can override Python’s default string representation using the <code>__repr__</code> function. This name is short for representation.

Similarly, PyTorch's own <code>__repr__</code> is defined in module.py.

In [12]:
network = Network()
print(network)

<__main__.Network object at 0x000001DA2B4B65B0>


In [13]:
class Network():
    def __init__(self):
        #super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t
    def __repr__(self):
        return "lizardnet"

In [14]:
network = Network()
print(network)

lizardnet


In [15]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t
    
network = Network()
print(network)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)


### Accessing The Network's Layers
In Python and many other programming languages, we access attributes and methods of objects using dot notation.

In [16]:
network.conv1

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

In [17]:
network.conv2

Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))

In [18]:
network.fc1

Linear(in_features=192, out_features=120, bias=True)

In [19]:
network.fc2

Linear(in_features=120, out_features=60, bias=True)

In [20]:
network.out

Linear(in_features=60, out_features=10, bias=True)

In [21]:
network

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)

In [22]:
network.conv1.weight

Parameter containing:
tensor([[[[ 0.0516,  0.0247,  0.0753,  0.1139,  0.0294],
          [ 0.0286, -0.1143,  0.0187, -0.1602,  0.0889],
          [-0.0862,  0.1164, -0.0797, -0.0881, -0.0659],
          [-0.0644,  0.0318, -0.0891,  0.1878, -0.1816],
          [-0.1455,  0.1939,  0.0116, -0.0983,  0.0895]]],


        [[[ 0.0798, -0.0607,  0.1528,  0.0290,  0.1208],
          [-0.0843,  0.0786,  0.1338, -0.0950,  0.0589],
          [-0.0068,  0.1232,  0.1197, -0.0473, -0.0312],
          [ 0.0098, -0.1048, -0.1724,  0.0827,  0.1873],
          [-0.0016,  0.0097,  0.1008, -0.1015,  0.0546]]],


        [[[ 0.0906,  0.1924,  0.1485, -0.1521,  0.1248],
          [-0.1287,  0.1772, -0.1020,  0.1167, -0.0005],
          [ 0.0478, -0.1350,  0.0539, -0.1415, -0.0640],
          [ 0.1415, -0.1464,  0.1800,  0.0145,  0.1695],
          [ 0.1828, -0.0337,  0.1776,  0.0602,  0.0546]]],


        [[[ 0.0084,  0.1880,  0.0287,  0.0334,  0.0222],
          [-0.1886, -0.0883, -0.1730,  0.1352,  0.1244

### PyTorch Parameter Class
To keep track of all the weight tensors inside the network, PyTorch has a special class called <code>Parameter</code>. The <code>Parameter</code> class extends the tensor class, and so the weight tensor inside every layer is an instance of this <code>Parameter</code> class. This is why we see the <code>Parameter containing</code> text at the top of the string representation output.

We can see in the PyTorch source code that the <code>Parameter</code> class overrides the <code>__repr__</code> function by prepending the text <code>Parameter containing:</code> to the regular tensor class representation output.

<code>
def __repr__(self):
    return 'Parameter containing:\n' + super(Parameter, self).__repr__()
</code>
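The relationship between <code>Parameter</code> and the tensor class can be checked directly. This is a small sketch; the 2x2 zero tensor is an arbitrary example value.

```python
import torch
import torch.nn as nn

# Parameter is a subclass of Tensor.
print(issubclass(nn.Parameter, torch.Tensor))  # True

p = nn.Parameter(torch.zeros(2, 2))
print(isinstance(p, torch.Tensor))  # True
print(p.requires_grad)              # True: parameters track gradients by default

# The overridden __repr__ adds the prefix seen in the outputs above.
print(repr(p).startswith('Parameter containing:'))  # True
```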

### Weight Tensor Shape
Remember, the shape of a tensor really encodes all the information we need to know about the tensor.

In [23]:
network.conv1

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

In [24]:
network.conv1.weight.shape

torch.Size([6, 1, 5, 5])

The shape of the weight tensor for the first convolutional layer shows us that we have a rank-4 weight tensor. The first axis has a length of 6, and this accounts for the 6 filters.

The second axis has a length of 1 which accounts for the single input channel, and the last two axes account for the height and width of the filter.

The way to think about this is as if we are packaging all of our filters into a single tensor.

Now, the second conv layer has 12 filters, and instead of convolving a single input channel, there are 6 input channels coming from the previous layer.

In [25]:
network.conv2.weight.shape

torch.Size([12, 6, 5, 5])

Think of this value of 6 here as giving each of the filters some depth. Instead of having a filter that convolves all of the channels iteratively, our filter has a depth that matches the number of channels.

The two main takeaways about these convolutional layers are that our filters are represented using a single tensor and that each filter inside the tensor also has a depth that accounts for the input channels that are being convolved.

1. All filters are represented using a single tensor.<br/>
2. Filters have depth that accounts for the input channels.<br/>

Our tensors are rank-4 tensors. The first axis represents the number of filters. The second axis represents the depth of each filter which corresponds to the number of input channels being convolved.

The last two axes represent the height and width of each filter. We can pull out any single filter by indexing into the weight tensor’s first axis.

<code>(Number of filters, Depth, Height, Width)</code>
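Pulling out a single filter by indexing the first axis might look like the following sketch. It rebuilds the two convolutional layers on their own rather than through the <code>Network</code> class, which is a simplification for the example.

```python
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

# Indexing the first axis selects one filter: (Depth, Height, Width).
print(conv1.weight[0].shape)  # torch.Size([1, 5, 5])
print(conv2.weight[0].shape)  # torch.Size([6, 5, 5])

# The length of the first axis is the number of filters.
print(len(conv2.weight))  # 12
```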

In [28]:
network.fc1

Linear(in_features=192, out_features=120, bias=True)

In [29]:
network.fc1.weight

Parameter containing:
tensor([[ 0.0409, -0.0097,  0.0448,  ..., -0.0019,  0.0157,  0.0264],
        [-0.0116, -0.0508,  0.0406,  ...,  0.0037,  0.0210, -0.0232],
        [-0.0150,  0.0057, -0.0048,  ...,  0.0416,  0.0090,  0.0353],
        ...,
        [-0.0205,  0.0179,  0.0074,  ..., -0.0415,  0.0231, -0.0640],
        [-0.0280, -0.0567, -0.0046,  ..., -0.0166,  0.0321,  0.0230],
        [-0.0591, -0.0495, -0.0082,  ...,  0.0347, -0.0485,  0.0498]],
       requires_grad=True)

In [30]:
network.fc1.weight.shape

torch.Size([120, 192])

In [31]:
network.fc2.weight.shape

torch.Size([60, 120])

In [32]:
network.out.weight.shape

torch.Size([10, 60])
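The pattern in these shapes is that each linear layer's weight matrix has shape (out_features, in_features), which is what lets a matrix multiplication map an input of length in_features to an output of length out_features. A minimal sketch, using a random tensor as a stand-in for real input features:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
print(fc1.weight.shape)  # torch.Size([120, 192])

# Stand-in input; the values are arbitrary.
t = torch.rand(192)

# (120 x 192) matrix times a length-192 vector gives a length-120 vector.
print(fc1.weight.matmul(t).shape)  # torch.Size([120])
```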

### Accessing The Network's Parameters

There are two ways to iterate over the network's parameters. The first, <code>network.parameters()</code>, is the most common, and we’ll use it to iterate over our weights when we update them during the training process.

In [33]:
for param in network.parameters():
    print(param.shape)

torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([12, 6, 5, 5])
torch.Size([12])
torch.Size([120, 192])
torch.Size([120])
torch.Size([60, 120])
torch.Size([60])
torch.Size([10, 60])
torch.Size([10])


The second way, <code>network.named_parameters()</code>, shows each parameter's name as well. This reveals something that we won’t cover in detail: the bias is also a learnable parameter. Each layer has a bias by default, so for each layer we have a weight tensor and a bias tensor.

In [34]:
for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)

conv1.weight 		 torch.Size([6, 1, 5, 5])
conv1.bias 		 torch.Size([6])
conv2.weight 		 torch.Size([12, 6, 5, 5])
conv2.bias 		 torch.Size([12])
fc1.weight 		 torch.Size([120, 192])
fc1.bias 		 torch.Size([120])
fc2.weight 		 torch.Size([60, 120])
fc2.bias 		 torch.Size([60])
out.weight 		 torch.Size([10, 60])
out.bias 		 torch.Size([10])
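These shapes also tell us how many learnable values the network holds in total. The sketch below simply multiplies out the shapes copied from the output above, so it runs without PyTorch:

```python
from math import prod

# Shapes copied from the named_parameters() output above.
shapes = {
    'conv1.weight': (6, 1, 5, 5),
    'conv1.bias': (6,),
    'conv2.weight': (12, 6, 5, 5),
    'conv2.bias': (12,),
    'fc1.weight': (120, 192),
    'fc1.bias': (120,),
    'fc2.weight': (60, 120),
    'fc2.bias': (60,),
    'out.weight': (10, 60),
    'out.bias': (10,),
}

# The number of elements in a tensor is the product of its axis lengths.
counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

print(counts['fc1.weight'])  # 23040
print(total)                 # 32998
```

With the network object at hand, <code>sum(p.numel() for p in network.parameters())</code> should give the same total.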


## Quiz 03

Q1:Hyperparameter values are chosen arbitrarily.<br/>
A1:**True**

Q2:Learnable parameters are parameters whose values are learned during the training process.<br/>
A2:**True**

Q3:With _________ we typically start out with a set of arbitrary values, and these values then get updated in an iterative fashion as the network learns.<br/>
A3:**learnable parameters**

Q4:Weights are learnable parameters that live inside each layer of the network.<br/>
A4:**True**

Q5:As a network is trained, the weight values inside the network are updated in such a way that the loss function output is minimized.<br/>
A5:**True**

Q6:For a convolutional layer, the weight values live inside the filters, and when implemented in code, the filters are actually represented using ____________<br/>
A6:**a single tensor**

Q7:In neural network programming, all convolutional filters for a given conv layer are represented using a single tensor.<br/>
A7:**True**

Q8:In neural network programming, the filter tensor, a.k.a. weight tensor, of a given conv layer has a depth that accounts for the input channels.<br/>
A8:**True**

Q9:Inside a convolutional layer, the depth dimension of the convolutional filter corresponds to the number of _______________ coming into the layer.<br/>
A9:**input channels**


# 04 Callable Neural Networks - Linear Layers in Depth

In [5]:
import torch

In [6]:
in_features = torch.tensor([1,2,3,4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1,2,3,4],
    [2,3,4,5],
    [3,4,5,6]
], dtype=torch.float32)


In [7]:
weight_matrix.matmul(in_features)

tensor([30., 40., 50.])
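The result above can be checked by hand: each output element is the dot product of one row of the weight matrix with the input features. A plain-Python sketch of the same arithmetic, using lists instead of tensors:

```python
in_features = [1, 2, 3, 4]

weight_matrix = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
]

# Row 0: 1*1 + 2*2 + 3*3 + 4*4 = 30, and likewise for the other rows.
out = [sum(w * x for w, x in zip(row, in_features)) for row in weight_matrix]
print(out)  # [30, 40, 50]
```

Three rows of weights produce three outputs, so the matrix maps a 4-dimensional input to a 3-dimensional output.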

In [8]:
import torch.nn as nn

In [9]:
fc = nn.Linear(in_features=4,out_features=3)

In [10]:
fc(in_features)

tensor([-0.4863, -2.2903,  0.7294], grad_fn=<AddBackward0>)

We can call the object instance like this because PyTorch neural network modules are <code>callable Python objects</code>. We'll look at this important detail more closely in a minute, but first, check out this output. We did indeed get a 1-dimensional tensor with three elements. However, different values were produced.

This is because PyTorch creates a weight matrix and initializes it with random values. This means that the linear functions from the two examples are different, so we are using different functions to produce these outputs.

**Remember the values inside the weight matrix define the linear function.** This demonstrates how the network's mapping changes as the weights are updated during the training process.

In [11]:
fc.weight = nn.Parameter(weight_matrix)

In [12]:
fc(in_features)

tensor([30.3760, 40.4606, 49.9037], grad_fn=<AddBackward0>)

This time we are much closer to the <code>30, 40, and 50</code> values. **However**, we're not exact. Why is this? Well, the result is not exact because the linear layer is adding a <code>bias</code> tensor to the output. Watch what happens when we turn the bias off. We do this by passing <code>bias=False</code> to the constructor.

In [13]:
fc = nn.Linear(in_features=4, out_features=3, bias=False)

In [14]:
fc.weight = nn.Parameter(weight_matrix)

In [15]:
fc(in_features)

tensor([30., 40., 50.], grad_fn=<SqueezeBackward3>)
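The role of the bias can also be checked without turning it off. In the sketch below, the layer's output is compared against the matrix product plus the layer's own randomly initialized bias. The variable name <code>manual</code> is introduced only for this example.

```python
import torch
import torch.nn as nn

in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

fc = nn.Linear(in_features=4, out_features=3)  # bias=True by default
fc.weight = nn.Parameter(weight_matrix)

# The layer output should equal the matrix product plus the bias.
manual = weight_matrix.matmul(in_features) + fc.bias
print(torch.allclose(fc(in_features), manual))  # True
```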

What makes it possible to call a layer like a function is that PyTorch module classes implement another special Python function called <code>__call__()</code>. If a class implements the <code>__call__()</code> method, that method is invoked any time the object instance is called.

This fact is an important PyTorch concept because of the way the <code>__call__()</code> method interacts with the <code>forward()</code> method for our layers and networks.

Instead of calling the <code>forward()</code> method directly, we call the object instance. After the object instance is called, the <code>__call__()</code> method is invoked under the hood, and <code>__call__()</code> in turn invokes the <code>forward()</code> method. This applies to all PyTorch neural network modules, namely, networks and layers.
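The mechanism itself is plain Python and can be sketched without PyTorch. The class below is a hypothetical stand-in, not PyTorch's implementation; it only shows how calling an instance routes through <code>__call__()</code>, which can do extra work around <code>forward()</code>.

```python
class TinyModule:
    """Hypothetical stand-in for a module; not PyTorch's actual code."""

    def __init__(self):
        self.calls = []

    def __call__(self, *args):
        self.calls.append('pre-forward work')   # e.g. work done before forward
        result = self.forward(*args)
        self.calls.append('post-forward work')  # e.g. work done after forward
        return result

    def forward(self, x):
        self.calls.append('forward')
        return x * 2


module = TinyModule()
print(module(3))     # 6
print(module.calls)  # ['pre-forward work', 'forward', 'post-forward work']
```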

<code>
# torch/nn/modules/module.py (version 1.0.1)

def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        hook(self, input)
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            raise RuntimeError(
                "forward hooks should never return any values, but '{}'"
                "didn't return None".format(hook))
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
</code>

The extra code that PyTorch runs inside the <code>__call__()</code> method is why we never invoke the <code>forward()</code> method directly. If we did, the additional PyTorch code would not be executed. As a result, any time we want to invoke our <code>forward()</code> method, we call the object instance. This applies to both layers and networks because they are both PyTorch neural network modules.
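One way to observe the skipped code is with a forward hook. In this sketch the hook function <code>record</code> and the <code>events</code> list are names made up for the example; <code>register_forward_hook</code> is the module method used to attach the hook.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)
events = []


def record(module, inputs, output):
    # Forward hooks receive the module, its positional inputs, and its output.
    events.append('hook ran')


handle = fc.register_forward_hook(record)
t = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

fc(t)               # goes through __call__(), so the hook fires
print(len(events))  # 1

fc.forward(t)       # bypasses __call__(), so the hook is skipped
print(len(events))  # 1

handle.remove()     # detach the hook again
```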

## Quiz 04
Q1:When input features are received by a linear layer, they are received in the form of a flattened 1-dimensional tensor and are then multiplied by ______<br/>
A1:the weight matrix

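Q2:Suppose we have the <code>in_features</code> and <code>weight_matrix</code> tensors defined at the start of this section. What will be the output of <code>weight_matrix.matmul(in_features)</code>?<br/>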
A2:tensor([30., 40., 50.])

Q3:Linear layers map an <code>in_feature</code> space to an <code>out_feature</code> space using a weight matrix.<br/>
A3:True

Q4:The weight matrix inside a linear layer is initialized with random values.<br/>
A4:True

Q5:The linear layer operation can be expressed mathematically as $ y = Ax + b $ <br/>In this equation, which symbol represents the weight matrix?<br/>
A5:A
