# PyTorch Basics - Stacking Layers

By [Akshaj Verma](https://akshajverma.com/)

This notebook takes you through four ways of stacking layers in a neural net: `nn.Module`, `nn.Sequential`, `nn.ModuleList`, and `nn.ModuleDict`.

## `nn.Module` vs `nn.functional`

PyTorch has two major ways of defining layers: `nn.Module` and `nn.functional`. `nn.Module` is the object-oriented way of defining an architecture, while `nn.functional` is the functional way.

The major difference between the two is the ability to maintain state. For a layer with trainable parameters (weights and biases), such as a convolutional or fully connected layer, we use the object-oriented way because those parameters need to be tracked. For operations without trainable parameters, such as dropout and activation functions, we can go the functional way. Note that we can still define things like dropout or activations the object-oriented way.

`nn.Module` is stateful while `nn.functional` is stateless. Under the hood, `nn.Module` layers call `nn.functional` for a lot of operations, such as [activation functions](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/activation.py), with the added advantage of initializing and maintaining parameters for you.
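To make the stateful/stateless split concrete, here is a minimal sketch. The tensor sizes and variable names are made up for illustration; the functional API lives in `torch.nn.functional`, conventionally imported as `F`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8)  # hypothetical batch: 4 samples, 8 features

# Stateful: nn.Linear creates and owns its weight and bias.
linear = nn.Linear(in_features=8, out_features=3)
param_names = [name for name, _ in linear.named_parameters()]

# Stateless: F.linear only computes; the parameters must be passed in.
out_module = linear(x)
out_functional = F.linear(x, linear.weight, linear.bias)
same_linear = torch.allclose(out_module, out_functional)

# Parameter-free operations work either way: nn.ReLU is a thin wrapper over F.relu.
same_relu = torch.equal(nn.ReLU()(x), F.relu(x))

# The module form also tracks train/eval mode: in eval mode dropout is a no-op.
drop = nn.Dropout(p=0.5)
drop.eval()
eval_is_identity = torch.equal(drop(x), x)

print(param_names, same_linear, same_relu, eval_is_identity)
```

With the functional form of dropout, the training flag has to be passed by hand (`F.dropout(x, p=0.5, training=self.training)`), which is one reason to prefer the module form for it.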


For more detailed information about `nn.Module` vs `nn.functional`, have a look at [this answer](https://discuss.pytorch.org/t/beginner-should-relu-sigmoid-be-called-in-the-init-method/18689/6).


`torch.nn.Module` => [What is torch.nn?](https://pytorch.org/tutorials/beginner/nn_tutorial.html)

Now, let's dig into the difference between `nn.Module`, `nn.Sequential`, `nn.ModuleList`, and `nn.ModuleDict` and how to use them. We will take the example of a CNN here, because why not.

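Task: Assume we are creating a neural net for image classification where each input image has the shape `[28 x 28 x 1]`, i.e. height x width x channels. (*MNIST much?*) Note that PyTorch's convolution layers expect channels-first batches, so the tensor we actually feed in has the shape `[N, 1, 28, 28]`.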

We will first define a network with minimal PyTorch abstractions and then gradually make it better.
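The input described in the task can be mocked with a random tensor. This is only a sketch: the batch size of 4 and the variable names are arbitrary. It also shows the conversion from a channels-last layout to the channels-first `(N, C, H, W)` layout that `nn.Conv2d` expects.

```python
import torch

# Hypothetical batch of 4 grayscale images stored channels-last: (N, H, W, C).
images_hwc = torch.rand(4, 28, 28, 1)

# Conv layers want (N, C, H, W), so move the channel axis forward.
images_chw = images_hwc.permute(0, 3, 1, 2).contiguous()

print(images_chw.shape)
```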

# Import libraries

In [1]:
import torch.nn as nn

# `nn.Module`

The easiest way of defining a neural net in PyTorch is using the `nn.Module`. Let's do that now. 

In [2]:
class ModuleClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super(ModuleClassifier, self).__init__()
        
        # Block 1
        self.conv_layer_1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.batchnorm_layer_1 = nn.BatchNorm2d(num_features=16)
        self.dropout_layer_1 = nn.Dropout2d(p=0.1)
        self.maxpool_layer_1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Block 2
        self.conv_layer_2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)
        self.batchnorm_layer_2 = nn.BatchNorm2d(32)
        self.dropout_layer_2 = nn.Dropout2d(0.2)
        self.maxpool_layer_2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Block 3
        self.conv_layer_3 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)
        self.batchnorm_layer_3 = nn.BatchNorm2d(64)
        self.dropout_layer_3 = nn.Dropout2d(0.3)
        self.maxpool_layer_3 = nn.MaxPool2d(kernel_size=2, stride=2)
        

        self.relu_layer = nn.ReLU()
        self.fc = nn.Linear(9*64, num_classes)
    
    def forward(self, x):
        # Block 1: conv -> batchnorm -> dropout -> ReLU -> maxpool
        x = self.conv_layer_1(x)
        x = self.batchnorm_layer_1(x)
        x = self.dropout_layer_1(x)
        x = self.relu_layer(x)
        x = self.maxpool_layer_1(x)
        
        # Block 2
        x = self.conv_layer_2(x)
        x = self.batchnorm_layer_2(x)
        x = self.dropout_layer_2(x)
        x = self.relu_layer(x)
        x = self.maxpool_layer_2(x)
        
        # Block 3
        x = self.conv_layer_3(x)
        x = self.batchnorm_layer_3(x)
        x = self.dropout_layer_3(x)
        x = self.relu_layer(x)
        x = self.maxpool_layer_3(x)
        
        
        out = x.reshape(x.size(0), -1)  # flatten to (batch_size, 9*64)
        out = self.fc(out)
        
        return out

In [3]:
model_with_module = ModuleClassifier()
print(model_with_module)

ModuleClassifier(
  (conv_layer_1): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (batchnorm_layer_1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout_layer_1): Dropout2d(p=0.1, inplace=False)
  (maxpool_layer_1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv_layer_2): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (batchnorm_layer_2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout_layer_2): Dropout2d(p=0.2, inplace=False)
  (maxpool_layer_2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv_layer_3): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (batchnorm_layer_3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout_layer_3): Dropout2d(p=0.3, inplace=False)
  (maxpool_layer_3): MaxPool2d(kernel_size=2, stride=2, padding=0, dila

The issue with the above code is apparent: every new layer has to be initialized in the `__init__()` method and then called separately in the `forward()` method, which gets cumbersome and error-prone as the network grows.
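One detail in the classifier above that is easy to get wrong is the `9*64` passed to `nn.Linear`. The sketch below traces the spatial size by hand using the standard output-size formula for convolution and pooling layers (dilation 1, floor rounding), then cross-checks it against real layers. The helper `out_size` is a hypothetical name used only here.

```python
import torch
import torch.nn as nn


def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or max-pool layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# By hand: each block is a 5x5 conv (stride 1, padding 2) followed by a 2x2 max-pool.
size = 28
for _ in range(3):
    size = out_size(size, kernel_size=5, stride=1, padding=2)  # conv keeps the size
    size = out_size(size, kernel_size=2, stride=2)             # max-pool halves it
flat_features = size * size * 64  # 64 channels leave the last block

# Cross-check with actual layers on a dummy batch of two images.
x = torch.randn(2, 1, 28, 28)
channels = [1, 16, 32, 64]
with torch.no_grad():
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        x = nn.Conv2d(c_in, c_out, kernel_size=5, stride=1, padding=2)(x)
        x = nn.MaxPool2d(kernel_size=2, stride=2)(x)

print(size, flat_features, tuple(x.shape))
```

The spatial size goes 28 → 14 → 7 → 3, so the flattened feature vector has 3 * 3 * 64 = 9 * 64 entries.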


Let's use `nn.Sequential` to mitigate some of these issues in the section below.

# `nn.Sequential`

In [4]:
class SequentialClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super(SequentialClassifier, self).__init__()
        
        # Block 1
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.Dropout2d(0.1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        
        # Block 2
        self.block2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.Dropout2d(0.2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        
        # Block 3
        self.block3 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.Dropout2d(0.2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        
        self.fc = nn.Linear(9*64, num_classes)
        
    def forward(self, x):
        out = self.block1(x)
        out = self.block2(out)
        out = self.block3(out)
        
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        
        return out

In [5]:
model_with_sequential = SequentialClassifier()
print(model_with_sequential)

SequentialClassifier(
  (block1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Dropout2d(p=0.1, inplace=False)
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Dropout2d(p=0.2, inplace=False)
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block3): Sequential(
    (0): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Dropout2d(p=0.2, inplace=False)
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, c

`nn.Sequential` has enabled us to abstract away repeating layers into chunks of code. Not to mention, there's significantly less code in the `forward()` method. 
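As a small sanity check of what `nn.Sequential` does, the sketch below compares it with calling the same layers by hand. Batch norm and dropout are left out on purpose so that both paths are deterministic; the layer sizes are borrowed from block 1 and the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# nn.Sequential registers the layers and calls them in the order given.
block = nn.Sequential(conv, relu, pool)

x = torch.randn(2, 1, 28, 28)
with torch.no_grad():
    chained = pool(relu(conv(x)))  # manual chaining, as in a hand-written forward()
    sequential = block(x)          # the same thing in a single call

same = torch.equal(chained, sequential)
first_is_conv = block[0] is conv   # layers stay reachable by index

print(same, first_is_conv, len(block), tuple(sequential.shape))
```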


In the above code, we're still repeating a lot of code by defining and calling all the different layers. If you look closely, all three chunks contain the same layers with different values, namely `nn.Conv2d` => `nn.BatchNorm2d` => `nn.Dropout2d` => `nn.ReLU` => `nn.MaxPool2d`. We can abstract this away into a method which returns an `nn.Sequential` object. Let's see how to do that.

In [6]:
class BetterSequentialClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super(BetterSequentialClassifier, self).__init__()
        
        # Block 1
        self.block1 = self.cnn_chunk(channel_ip=1, channel_op=16, dropout_val=0.1, 
                                     kernel_size=5, stride=1, padding=2
                                    )
        
        # Block 2
        self.block2 = self.cnn_chunk(channel_ip=16, channel_op=32, dropout_val=0.2, 
                                     kernel_size=5, stride=1, padding=2
                                    )        
        
        # Block 3
        self.block3 = self.cnn_chunk(channel_ip=32, channel_op=64, dropout_val=0.3, 
                                     kernel_size=5, stride=1, padding=2
                                    )
        
        self.fc = nn.Linear(9*64, num_classes)
        
    def forward(self, x):
        out = self.block1(x)
        out = self.block2(out)
        out = self.block3(out)
        
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        
        return out
    
    
    def cnn_chunk(self, channel_ip, channel_op, dropout_val, **kwargs):
        
        # Yes, chonker like a fat cat. 
        chonker = nn.Sequential(
            nn.Conv2d(channel_ip, channel_op, **kwargs),
            nn.BatchNorm2d(channel_op),
            nn.Dropout2d(dropout_val),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        return chonker

In [7]:
better_model_with_sequential = BetterSequentialClassifier()
print(better_model_with_sequential)

BetterSequentialClassifier(
  (block1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Dropout2d(p=0.1, inplace=False)
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Dropout2d(p=0.2, inplace=False)
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (block3): Sequential(
    (0): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Dropout2d(p=0.3, inplace=False)
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilatio

Let's take it up a notch. Let's define a classifier based on the following architecture, where p=padding, k=kernel size, and s=stride. Shapes are written as `(height, width, channels)`.


`Input` - `(28, 28, 1)` ==========> 

`p=1,k=3,s=1` - `Conv1 + BatchNorm` - `(28, 28, 8)`

`p=2,k=5,s=1` - `Conv2 + Dropout` - `(28, 28, 16)`

`p=0,k=2,s=2` - `Maxpool` - `(14, 14, 16)`

`p=1,k=3,s=1` - `Conv3 + BatchNorm` - `(14, 14, 32)`

`p=1,k=3,s=1` - `Conv4 + Dropout` - `(14, 14, 64)` ==========> `FC` - `(14 * 14 * 64, 10)`

In [20]:
class EvenBetterSequentialClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super(EvenBetterSequentialClassifier, self).__init__()
        

        # Padding is chosen so that every convolution preserves the spatial size.
        self.block1 = nn.Sequential(
            self.conv_batchnorm(channel_ip=1, channel_op=8, padding=1, kernel_size=3, stride=1),
            self.conv_dropout(channel_ip=8, channel_op=16,  dropout_val=0.1, padding=2, kernel_size=5, stride=1)
        )
        
        self.block2 = nn.Sequential(
            self.conv_batchnorm(channel_ip=16, channel_op=32, padding=1, kernel_size=3, stride=1),
            self.conv_dropout(channel_ip=32, channel_op=64, dropout_val=0.2, padding=1, kernel_size=3, stride=1)
        )
        
        # A 2x2 max-pool with stride 2 halves the spatial size: 28 -> 14.
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.linear = nn.Linear(in_features=14*14*64, out_features=num_classes)
        
    
    def forward(self, x):
        x = self.block1(x)
        x = self.maxpool(x)
        x = self.block2(x)
        
        # Flatten to (batch_size, 14*14*64) before the fully connected layer.
        x = x.view(x.size(0), -1)
        
        out = self.linear(x)
        
        return out
        
   
    def conv_batchnorm(self, channel_ip, channel_op, **kwargs):
        chonker = nn.Sequential(
            nn.Conv2d(in_channels=channel_ip, out_channels=channel_op, **kwargs),
            nn.ReLU(),
            nn.BatchNorm2d(channel_op)
        )
        
        return chonker
    
    
    def conv_dropout(self, channel_ip, channel_op, dropout_val, **kwargs):
        chonker = nn.Sequential(
            nn.Conv2d(in_channels=channel_ip, out_channels=channel_op, **kwargs),
            nn.ReLU(),
            nn.Dropout2d(dropout_val)
        )
        
        return chonker

In [21]:
even_better_model_with_sequential = EvenBetterSequentialClassifier()
print(even_better_model_with_sequential)

EvenBetterSequentialClassifier(
  (block1): Sequential(
    (0): Sequential(
      (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU()
      (2): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): Sequential(
      (0): Conv2d(8, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (1): ReLU()
      (2): Dropout2d(p=0.1, inplace=False)
    )
  )
  (block2): Sequential(
    (0): Sequential(
      (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU()
      (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): Sequential(
      (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU()
      (2): Dropout2d(p=0.2, inplace=False)
    )
  )
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (linear): Linear(in_features=12544, out_features=10, bias=True)
)


`nn.Sequential` is a powerful tool to use as you've seen above.
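One more `nn.Sequential` feature worth knowing: instead of the numeric indices seen in the printouts above, layers can be given names by passing an `OrderedDict`. The sketch below uses illustrative layer names and sizes that are not taken from the classifiers above.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Passing an OrderedDict gives every layer a readable name instead of an index.
block = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1)),
    ("relu", nn.ReLU()),
    ("batchnorm", nn.BatchNorm2d(8)),
]))

# Named layers are reachable as attributes and still by position.
conv_by_name_is_first = block.conv is block[0]

# Layers can also be appended after construction.
block.add_module("pool", nn.MaxPool2d(kernel_size=2, stride=2))
names = [name for name, _ in block.named_children()]

with torch.no_grad():
    out = block(torch.randn(2, 1, 28, 28))

print(names, tuple(out.shape))
```

Named layers make `print(model)` output and `state_dict()` keys easier to read, which helps once a model has more than a handful of blocks.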

# `nn.ModuleList`

ModuleList helps