<h1>Building a Neural Network with PyTorch</h1>
<h3>Here model == network</h3>
<b>Main goal:</b> build a model (a function approximator) that maps image inputs to the correct output class.<br>
The process for implementing a neural network (NN):
<ol>
<li>Prepare the data</li>
<li><b>Build the model</b></li>
<li>Train the model</li>
<li>Analyze the model's results</li>
</ol>
Source with table and more info: https://deeplizard.com/learn/video/IKOHHItzukk

In [39]:
# torch.nn contains the building blocks (layers, loss functions) for neural networks
import torch.nn as nn

<h3>Extend PyTorch's nn.Module Class</h3>
Create a Python class called Network<br>
Extend PyTorch's nn.Module class<br>
Define the model's layers as class attributes:<br>
a CNN with convolutional layers and linear layers

<h3>Components of a class</h3>
<i>Methods</i> (code)<br>
<i>Attributes</i> (data)<br>
good quick overview of classes: https://stats.stackexchange.com/questions/154798/difference-between-kernel-and-filter-in-cnn/188216

The network class below extends the nn.Module class

The convolutional layers do not specify a stride, so it defaults to (1,1)<br>
    - stride is how far the filter moves after each operation<br>
    -- (1,1) => move one unit when sliding right and one unit when sliding down
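
To see the effect of the stride, here is a minimal sketch that compares the default stride with stride=2. The 28x28 single-channel input is an assumption made for illustration; the notes above do not state the image size.

```python
import torch
import torch.nn as nn

# Hypothetical input: one grayscale 28x28 image, shaped (batch, channels, height, width)
image = torch.zeros(1, 1, 28, 28)

default_stride = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
stride_two = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=2)

print(default_stride.stride)  # (1, 1)

out_default = default_stride(image)
out_two = stride_two(image)

# A 5x5 filter fits in 24 positions per axis when it moves one unit at a time,
# and in 12 positions per axis when it moves two units at a time.
print(out_default.shape)  # torch.Size([1, 6, 24, 24])
print(out_two.shape)      # torch.Size([1, 6, 12, 12])
```

A larger stride skips positions, so the output shrinks faster.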

In [40]:
class Network(nn.Module):  # extend the nn.Module base class
    def __init__(self):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        # Linear layers are fully connected layers (also called dense layers);
        # PyTorch calls them "linear"
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        # Output layer: one output feature per class
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # Placeholder: the forward pass is not implemented yet,
        # so the input is returned unchanged
        return t
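
The forward pass is left unimplemented above. As a rough sketch only, one possible implementation is shown below in a separate class (the name NetworkSketch is made up for this example). It assumes 28x28 grayscale inputs, ReLU activations, and 2x2 max pooling after each convolution; none of these are stated in the notes, but under these assumptions the tensor reaching fc1 has exactly 12 * 4 * 4 values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NetworkSketch(nn.Module):
    """Same layers as Network above, plus one possible forward pass."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # Assumed input shape: (batch, 1, 28, 28)
        t = F.relu(self.conv1(t))                    # (batch, 6, 24, 24)
        t = F.max_pool2d(t, kernel_size=2, stride=2)  # (batch, 6, 12, 12)
        t = F.relu(self.conv2(t))                    # (batch, 12, 8, 8)
        t = F.max_pool2d(t, kernel_size=2, stride=2)  # (batch, 12, 4, 4)
        t = t.reshape(-1, 12 * 4 * 4)                # flatten to (batch, 192)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        return self.out(t)                           # raw class scores, (batch, 10)


sketch = NetworkSketch()
batch = torch.zeros(2, 1, 28, 28)  # hypothetical batch of two images
scores = sketch(batch)
print(scores.shape)  # torch.Size([2, 10])
```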

<h2>Parameters</h2>
<b>Parameters</b> are placeholders for a value (the names used in a function definition), while
<b>Arguments</b> are the actual values passed in
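
A quick illustration of the distinction. The function scale is a made-up helper for this example; the nn.Linear call mirrors the fc2 layer above.

```python
import torch.nn as nn


def scale(value, factor):
    # 'value' and 'factor' are parameters: placeholders in the definition
    return value * factor


# 3 and 2 are arguments: the actual values supplied at call time
result = scale(3, factor=2)
print(result)  # 6

# Same idea for a layer: in_features and out_features are parameters,
# 120 and 60 are the arguments passed for them
layer = nn.Linear(in_features=120, out_features=60)
print(layer.in_features, layer.out_features)  # 120 60
```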

Convolutional layer (3 parameters used here):<br>
<ul>
<li>in_channels</li>
<li>out_channels</li>
<li>kernel_size</li>
</ul>
Linear layer (2 parameters used here):<br>
<ul>
<li>in_features</li>
<li>out_features</li>
</ul>

<h3>Hyperparameters</h3>
values that are chosen manually and arbitrarily<br>

We choose hyperparameter values mainly through <b>trial and error</b> and, increasingly, by <b>reusing values</b> that have proven to <b>work well in the past</b>.

<h4>Hyperparameters in CNNs</h4>
These are the hyperparameters we use:
<ul>
<li>kernel_size (sets the height and width)</li>
<li>out_channels (sets the number of filters; one filter produces one output channel)</li>
<li>out_features (sets the size of the output tensor)</li>
</ul>
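
As a small sketch of how these choices show up in a layer, the two convolutional layers below differ only in their hyperparameters. The second set of values (8 and 3) is arbitrary and chosen just for comparison.

```python
import torch.nn as nn

# Values used in the network above
conv_a = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
# Arbitrary alternative values, for comparison only
conv_b = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)

# Weight shape is (out_channels, in_channels, kernel height, kernel width)
print(conv_a.weight.shape)  # torch.Size([6, 1, 5, 5])
print(conv_b.weight.shape)  # torch.Size([8, 1, 3, 3])
```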

<h3>Data Dependent Hyperparameters</h3>
The values of these parameters depend on the data
<ul>
<li>in_channels of the first convolutional layer (depends on the number of color channels)</li>
<li>out_features of the output layer (depends on the number of classes)</li>
<li>in_features (depends on the output of the previous layer)</li>
</ul>
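
A sketch of how these dependencies chain together, with each layer's input size derived from the data or from the previous layer. The variable names are made up for this example, and the 4 * 4 spatial size is an assumption (it holds for 28x28 inputs with 2x2 pooling after each convolution).

```python
import torch.nn as nn

num_color_channels = 1  # assumption: grayscale images (3 for RGB)
num_classes = 10        # matches out_features of the output layer above

conv1 = nn.Conv2d(in_channels=num_color_channels, out_channels=6, kernel_size=5)
# conv2 receives the output channels of conv1
conv2 = nn.Conv2d(in_channels=conv1.out_channels, out_channels=12, kernel_size=5)
# fc1 receives the flattened conv2 output: channels * height * width (assumed 4 x 4)
fc1 = nn.Linear(in_features=conv2.out_channels * 4 * 4, out_features=120)
fc2 = nn.Linear(in_features=fc1.out_features, out_features=60)
out = nn.Linear(in_features=fc2.out_features, out_features=num_classes)

print(fc1.in_features)   # 192
print(out.out_features)  # 10
```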

<h4>SIDENOTE: Kernel vs Filter</h4>
<i>Kernel:</i> 2D tensor<br>
<i>Filter:</i> 3D tensor containing a collection of kernels<br>
More info: https://stats.stackexchange.com/questions/154798/difference-between-kernel-and-filter-in-cnn/188216
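
Using this terminology, a filter and a kernel can be picked out of a convolutional layer's weight tensor by indexing. A minimal sketch with a standalone layer that has the same settings as conv2:

```python
import torch.nn as nn

conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

# Indexing the first axis selects one filter: a 3D tensor (one kernel per input channel)
first_filter = conv2.weight[0]
print(first_filter.shape)  # torch.Size([6, 5, 5])

# Indexing again selects one kernel inside that filter: a 2D tensor
first_kernel = conv2.weight[0][0]
print(first_kernel.shape)  # torch.Size([5, 5])
```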

<h3>Learnable Parameters</h3>
Parameters whose <i>values are learned</i> during training<br><br>
Generally we <i>start with <b>random values</b></i> that are <i>updated iteratively</i> as the network learns<br><br>
When a network is <b>learning</b>, it is simply <b>updating its learnable parameters</b> to find values that <i>minimize</i> the loss function
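
To make "updating learnable parameters" concrete, here is a minimal sketch of a single update step on a tiny linear layer. The random data, the mean squared error loss, the SGD optimizer, and the learning rate are all assumptions chosen for illustration, not the training setup of this network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random start reproducible

layer = nn.Linear(in_features=3, out_features=1)  # weights start at random values
inputs = torch.randn(8, 3)    # made-up data
targets = torch.randn(8, 1)   # made-up targets
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

weights_before = layer.weight.detach().clone()
loss_before = nn.functional.mse_loss(layer(inputs), targets)

optimizer.zero_grad()
loss_before.backward()  # compute gradients of the loss w.r.t. the parameters
optimizer.step()        # nudge the parameters against the gradient

weights_after = layer.weight.detach().clone()
loss_after = nn.functional.mse_loss(layer(inputs), targets)

print(torch.equal(weights_before, weights_after))  # False: the weights changed
print(loss_before.item(), loss_after.item())
```

Training repeats this step many times, each time moving the parameters toward values with a lower loss.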


<h4>Make an instance of the Network class and inspect its weights</h4>

In [41]:
network = Network()

In [42]:
print(network)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)


<h4>Accessing the Network's Layers</h4>
For more info and possible customizations: https://deeplizard.com/learn/video/stWU37L91Yc

In [43]:
network.conv1

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))

In [44]:
network.fc2

Linear(in_features=120, out_features=60, bias=True)

In [45]:
network.out

Linear(in_features=60, out_features=10, bias=True)

In [46]:
#Accessing Layer Weights
network.conv1.weight

Parameter containing:
tensor([[[[-1.7289e-01, -3.6889e-02, -2.2785e-02, -1.4952e-01,  1.9414e-01],
          [ 2.1457e-02,  1.4025e-01, -2.9079e-02, -1.3774e-01,  1.3644e-01],
          [ 1.2203e-01,  7.5122e-02,  3.5353e-02, -1.6008e-02,  1.7124e-01],
          [-1.5510e-01, -6.4722e-02,  1.0430e-01, -1.1301e-01, -8.7242e-02],
          [ 7.7930e-02, -1.6106e-01, -1.3908e-01, -1.0758e-01, -3.7452e-02]]],


        [[[ 5.2286e-02,  5.5730e-02, -7.9570e-02,  8.6542e-02,  2.2734e-03],
          [-7.0686e-02,  1.8413e-01,  4.0059e-03, -1.4064e-01, -1.4402e-01],
          [-4.4736e-02,  1.9903e-01,  1.2061e-01, -1.1757e-01,  1.8986e-01],
          [ 5.9690e-02,  1.5190e-01,  1.3827e-01,  6.6847e-02,  2.3341e-02],
          [-1.9387e-01, -7.4476e-03,  5.3592e-02, -1.1056e-01,  1.4097e-01]]],


        [[[-1.7661e-01,  2.3881e-03, -4.2553e-03,  1.3229e-01, -4.7485e-02],
          [ 9.2290e-02, -1.2766e-01,  1.9569e-01, -4.7710e-02, -1.8479e-01],
          [ 9.9628e-02, -1.3248e-01, -4.1997e-

<h4>Weight Tensor shape</h4>
CONVOLUTIONAL LAYERS
<ol>
<li>All filters are represented using a single tensor</li>
<li>Filters have a depth that matches the number of input channels being convolved</li>
</ol>

In [47]:
# Inspect the shape: a rank-4 weight tensor
#   first axis has length 6 (the 6 filters, i.e. out_channels)
#   second axis has length 1 (the single input channel)
#   third and fourth axes are the kernel height and width
network.conv1.weight.shape

torch.Size([6, 1, 5, 5])

In [48]:
# Note the second axis (input channels) is 6 here: conv2 receives conv1's 6 output channels
network.conv2.weight.shape

torch.Size([12, 6, 5, 5])

FULLY CONNECTED (LINEAR) LAYERS<br>Linear layers take flattened rank-1 tensors as input and produce rank-1 tensors as output<br>
A linear layer <b>transforms</b> the <b>in_features</b> into the <b>out_features</b> by <b>using</b> a rank-2 tensor that is commonly called a <b>weight matrix</b>.

note: height = out_features, width = in_features
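
The transformation can be checked by hand: multiplying the weight matrix by the input and adding the bias gives the same result as calling the layer. A minimal sketch with a small standalone layer (sizes 4 and 3 are arbitrary, chosen to keep the example readable):

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])  # rank-1 input with 4 features

# Weight matrix: height = out_features, width = in_features
print(fc.weight.shape)  # torch.Size([3, 4])

# Matrix-vector product plus bias, computed manually
manual = fc.weight.matmul(x) + fc.bias
# The same computation done by calling the layer
from_layer = fc(x)

print(from_layer.shape)                     # torch.Size([3])
print(torch.allclose(manual, from_layer))   # True
```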

In [50]:
network.fc1.weight.shape

torch.Size([120, 192])

In [51]:
network.fc2.weight.shape

torch.Size([60, 120])

In [52]:
network.out.weight.shape

torch.Size([10, 60])

Final note:
To see all the parameters (weights and biases) of the network, iterate over them

In [None]:
for param in network.parameters():
    print(param.shape)

In [None]:
for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)
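
The same loop can also be used to count the learnable values. A minimal sketch follows; the Network class is repeated only so the snippet runs on its own.

```python
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        return t  # placeholder, as above


network = Network()

# numel() gives the number of values in each weight or bias tensor
counts = {name: param.numel() for name, param in network.named_parameters()}
for name, count in counts.items():
    print(name, '\t\t', count)

total = sum(counts.values())
print(total)  # 32998
```

Most of the values sit in fc1, whose weight matrix alone holds 120 * 192 = 23040 entries.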