In [1]:
#export
import numpy as np
import torch
import pandas as pd
from pathlib import Path
import imageio
from skimage import io, transform
import torchvision
from torchvision import transforms

from scripts.dataloader import Dataset, Transforms, Resize, ToTorch, Sampler, collate, DataLoader
from functools import partial

import torch.nn as nn
import torch.nn.functional as F

# Step 0 - Imports from Part 1

Now that we have a dataloader, we can start to build a model that uses that data.  Before we begin, let us create a dataset and dataloader from which we'll extract the minibatches required to train the model.

In [2]:
df_path = r'data/processed_dataframe.csv'
img_col = 'filename'
cont_cols = ['followers', 'following', 'engagement_factor_std', 'month', 'year', 'day_name', 'hour']
cat_cols = []
target_col = 'engagement_factor_moving_avg'
image_path = Path(r'data/Images')
tfms = Transforms([Resize(256), ToTorch()])

ds_train = Dataset(df_path, 
                   img_col = img_col,
                   cont_cols = cont_cols, 
                   cat_cols = cat_cols, 
                   target_col = target_col, 
                   image_path = image_path, 
                   transforms = tfms)

dl_train = DataLoader(dataset = ds_train,
                      sampler = Sampler(ds_train, bs = 16),
                      collate_func = collate)

In [3]:
for i, (xb,yb) in enumerate(dl_train):
    print (f"Minibatch {i}, with target shape {yb.shape}")
    if i>5: break

Minibatch 0, with target shape torch.Size([16])
Minibatch 1, with target shape torch.Size([16])
Minibatch 2, with target shape torch.Size([16])
Minibatch 3, with target shape torch.Size([16])
Minibatch 4, with target shape torch.Size([16])
Minibatch 5, with target shape torch.Size([16])
Minibatch 6, with target shape torch.Size([16])


In [4]:
iterator = iter(dl_train)
xb, yb = next(iterator)
x_image, x_tab = xb
x_image = x_image.float()

In [5]:
x_image.shape, x_tab.shape, yb.shape

(torch.Size([16, 3, 256, 256]), torch.Size([16, 7]), torch.Size([16]))

This gives us a single batch to work with as we test our network.

# Step 1 - Investigating the components of an nn.Module

PyTorch models are created using nn.Module as a base class.  This imparts many features onto the model and its subcomponents, but at the core it allows a few things to happen:
- You can initialize the module with __init__ to store all the key variables, along with a call to nn.Module.__init__ via super.  That call sets up many of the core instance variables of the module (including the parameters that will be tracked and updated during training)
- You can define how an input is processed by the model and what it outputs (usually a torch tensor).  This is the "forward" method and we will use it extensively when constructing our models
- You can register forward or backward hooks (extra code that will be run during the forward or backward pass).  These (and callbacks in general) will be the subject of a future installment
- You get other useful methods such as zero_grad, and the ability to set the state to training (train) or evaluation (eval)

Note: rather than defining a backward method, PyTorch uses an autograd system to compute the gradients for all the relevant parameters, and an optimizer class to update the parameters themselves.  We will go over these features in a future installment
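As a minimal sketch of the points above, the following module registers its parameters automatically and can be switched between training and evaluation modes.  The class name `TinyRegressor` and its layer sizes are made up for illustration and are not part of this project.

```python
import torch
import torch.nn as nn

class TinyRegressor(nn.Module):
    """Hypothetical two-layer model used only to illustrate nn.Module."""
    def __init__(self, in_features, hidden):
        super().__init__()  # sets up parameter and submodule tracking
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyRegressor(in_features=7, hidden=4)

# Parameters of the submodules are registered on the parent automatically
param_names = [name for name, _ in model.named_parameters()]
print(param_names)

# Autograd fills in .grad during backward(); zero_grad() clears it again
loss = model(torch.rand(16, 7)).mean()
loss.backward()
has_grad = model.fc1.weight.grad is not None
model.zero_grad()

# eval() and train() toggle the .training flag on the module and its children
model.eval()
is_training_after_eval = model.training
model.train()
```

Nothing here defines a backward pass: calling `loss.backward()` is enough for autograd to populate the gradients.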

For instance, one of the basic modules is the nn.Linear class, which performs the operation wx + b, where w is the weight parameter and b is the bias.  Note that for historical reasons the operation is performed as x.(w.T) + b, so the weight is stored with shape (out_features, in_features), the transpose of what you might expect

```python
class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:y = xA^T + b"""
    __constants__ = ['bias', 'in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
```

The F.linear is defined as:


```python 
def linear(input, weight, bias=None):
    if input.dim() == 2 and bias is not None:
        # fused op is marginally faster
        ret = torch.addmm(bias, input, weight.t())
    else:
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        ret = output
    return ret

```

Which is essentially x.(w.T) + b for our purposes
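A quick check of that claim, as a sketch on an arbitrary layer: the output of nn.Linear should match a manual x.(w.T) + b computed from the layer's own parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(20, 5)
x = torch.rand(16, 20)

# weight is stored as (out_features, in_features), so it is transposed before the matmul
manual = x @ layer.weight.t() + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)
print(layer.weight.shape, matches)
```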

Here we can see that initializing the module creates the appropriate weight and bias tensors, then initializes them using kaiming_uniform.  If you want to initialize your parameters in a different way, you can either modify them after the layer has been created (which we will discuss later), or create another Linear class that overrides the reset_parameters method.
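The second option could look something like the sketch below.  The class name `XavierLinear` and the choice of Xavier-uniform weights with a zero bias are illustrative assumptions, not something this project uses.

```python
import torch
import torch.nn as nn

class XavierLinear(nn.Linear):
    """Hypothetical nn.Linear variant: Xavier-uniform weights and zero bias."""
    def reset_parameters(self):
        # Called by nn.Linear.__init__ after weight and bias have been created
        nn.init.xavier_uniform_(self.weight)
        if self.bias is not None:
            nn.init.zeros_(self.bias)

layer = XavierLinear(20, 5)
bias_is_zero = bool(torch.all(layer.bias == 0))

# Xavier-uniform samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
bound = (6 / (20 + 5)) ** 0.5
within_bound = bool(layer.weight.abs().max() <= bound + 1e-6)
print(bias_is_zero, within_bound)
```

Because only reset_parameters changes, the forward pass and the parameter shapes are inherited from nn.Linear unchanged.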

We can explore the input and output of the linear module using a test_batch

In [6]:
test_batch = torch.rand(16, 20)
test_linear_layer = nn.Linear(20, 5)
results = test_linear_layer(test_batch)
test_batch.shape, results.shape

(torch.Size([16, 20]), torch.Size([16, 5]))

We see that we input a tensor with a batch size of 16 and 20 elements per sample, and that after going through the linear layer we are left with 5 elements for each of our 16 input samples, as expected

# Step 2 - nn.Module Building Blocks



### Some useful modules 

The torch.nn package provides many ready-made modules.  A few that we will use:




- nn.Linear() - The basic building block of a fully-connected network.  It will take an input with shape (batch_size, in_features) and will produce an output with shape (batch_size, out_features)
    * in_features - the number of features coming into the module (independent of batch size) 
    * out_features - the number of desired features leaving the module
    * Note: nn.Linear will have up to two parameters:
        - A weight matrix of shape (out_features, in_features).
        - A bias vector of shape (out_features)


- nn.Conv2d() - The basic building block of a convolutional neural net.  It takes in an input with shape (batch_size, in_channels, height, width), and will produce an output with shape (batch_size, out_channels, out_height, out_width), where the output height and width depend on the kernel size, stride, padding and dilation
    * in_channels - The number of channels coming into the layer.  
    * out_channels - The number of channels leaving the layer
    * kernel_size,
    * stride=1,
    * padding=0,
    * dilation=1,
    * groups=1,
    * bias=True,
    * padding_mode='zeros',
    * Note: nn.Conv2d has up to two types of parameters:
        - The kernel with shape (out_channels, in_channels, kernel_size, kernel_size).  Another way to think about this is that there are "out_channel" kernels of shape (in_channels, kernel_size, kernel_size), which each produce a single channel of the output
        - A bias vector of shape (out_channels).  Each channel will receive a single bias, which will be applied to each pixel of the channel.
    
    
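- nn.ReLU() - An activation function that applies max(0, x) elementwise.  It has no parameters and does not change the shape of its input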


In [7]:
def param_shapes(module):
    for p in module.parameters():
        print (p.shape)

print ('Linear node')
param_shapes(nn.Linear(5, 10))
print('')

print ('Conv2d')
param_shapes(nn.Conv2d(10,20,3))

Linear node
torch.Size([10, 5])
torch.Size([10])

Conv2d
torch.Size([20, 10, 3, 3])
torch.Size([20])


We can also create our own modules by inheriting from the nn.Module class.  The key requirements when inheriting from nn.Module are that you call the superclass's initialization and that you define a forward() method.  Generally, this forward method accepts a tensor as an input and returns a tensor as an output.  Thanks to the autograd feature of PyTorch, you can do whatever you want within those two requirements, as tensors will track their own gradients throughout the process.

A useful class to make is a Lambda module, which accepts a function during class initialization and applies that function to the input tensor.  

In [8]:
#export
class Lambda(nn.Module):
    def __init__(self, func):
        super(Lambda, self).__init__()
        self.func = func
        
    def forward(self, x):
        return self.func(x)
    
def flatten(x):
    return x.view(x.shape[0], -1)

def simulate_fc_output(x, n):
    return torch.rand((x.shape[0], n))
    


# Step 3 - Creating a Custom Network

## Building with the end in mind

We don't have all the components that we need for the final model, but we can construct our mixed model using some simulated components.  Those components are:
- The tabular model
- The CNN/image model
- The mixed model (that combines the two outputs)

Once we have all of these models, we can combine them into a MixedInputModel class, which calls each submodel in turn, passing in the appropriate inputs for each

In [9]:
#export
class MixedInputModel(nn.Module):
    def __init__(self, cnn_model,  tabular_model, mixed_model):
        super(MixedInputModel, self).__init__()
        
        self.cnn_model = cnn_model
        self.tabular_model = tabular_model
        self.mixed_model = mixed_model
        
    
    def forward(self, x):
        #unpack the x_batch tuple into the image and tabular components
        x_image, x_tab = x
        x_image = x_image.float()
        x_tab = x_tab.float()
        
        #run each component separately through their respective models
        cnn_output = self.cnn_model(x_image)
        tabular_output = self.tabular_model(x_tab)
        
        #concatenate the outputs from both networks and pass it through the mixed model output
        concat_outputs = torch.cat((cnn_output, tabular_output), dim = 1)
        mixed_model_output = self.mixed_model(concat_outputs) 
        
        return(mixed_model_output)

In [10]:
cnn_model = Lambda(partial(simulate_fc_output, n = 10))
tabular_model = Lambda(partial(simulate_fc_output, n = 5))
mixed_model = Lambda(partial(simulate_fc_output, n = 1))

In [11]:
cnn_output = cnn_model(x_image)
tabular_output = tabular_model(x_tab)
concat_outputs = torch.cat((cnn_output, tabular_output), dim = 1)
mixed_model_output = mixed_model(concat_outputs)

(cnn_output.shape, tabular_output.shape, concat_outputs.shape, mixed_model_output.shape)

(torch.Size([16, 10]),
 torch.Size([16, 5]),
 torch.Size([16, 15]),
 torch.Size([16, 1]))

In [12]:
final_model = MixedInputModel(cnn_model = cnn_model,
                              tabular_model = tabular_model,
                              mixed_model = mixed_model
                             )

In [13]:
final_model(xb).shape

torch.Size([16, 1])

That's all there is to mixing the two inputs!  Unfortunately, we won't get useful results from this model since the placeholder submodels just return random outputs.  We now need to create the three models to replace our placeholders: the cnn_model, the tabular_model, and the mixed_model

## A Simple Tabular Model

In [14]:
class BasicTabularModel(nn.Module):
    h1 = 10
    h2 = 5    
    
    def __init__(self, in_features, out_features):
        super(BasicTabularModel, self).__init__()
        self.layer_1 = nn.Linear(in_features, self.h1)
        self.layer_2 = nn.Linear(self.h1, self.h2)
        self.layer_3 = nn.Linear(self.h2, out_features)
        
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = self.layer_1(x)
        x = self.relu(x)
        
        x = self.layer_2(x)
        x = self.relu(x)
        
        x = self.layer_3(x)

        return x

In [15]:
bs, in_features = x_tab.shape
basic_model = BasicTabularModel(in_features, 1)

In [16]:
basic_model(x_tab)

tensor([[-11109.4121],
        [-46031.8281],
        [-13607.8604],
        [-11813.8057],
        [ -4535.3691],
        [ -1341.2977],
        [ -3102.7595],
        [-11813.0830],
        [ -2798.6069],
        [ -3874.8674],
        [ -9163.5762],
        [ -4943.7153],
        [ -7840.6665],
        [-46031.1914],
        [ -9162.2314],
        [ -3874.8804]], grad_fn=<AddmmBackward>)

Alternatively, we can define a complete model during the class initialization and then use that during the forward pass.  This approach will make things easier when we put everything together later on in the notebook.  We can also make the model more flexible by accepting the tabular model's hidden and final layer sizes as inputs

In [17]:
#export
class TabularModel(nn.Module):    
    def __init__(self, layer_sizes):
        super(TabularModel, self).__init__()
        
        layers = []      
        
        for i in range(len(layer_sizes)-1):
            layers.append(nn.Linear(layer_sizes[i], layer_sizes[i+1]))
            layers.append(nn.ReLU())
        
        
        self.model = nn.Sequential(*layers[:-1]) #ignore the last nn.ReLU
        
    def forward(self, x):
        return self.model(x)

In [18]:
bs, in_features = x_tab.shape
tab_model = TabularModel([in_features, 10, 5, 1])

In [19]:
tab_model(x_tab)

tensor([[-11450.5615],
        [-47384.1445],
        [-14024.9053],
        [-12180.5898],
        [ -4680.1562],
        [ -1380.0170],
        [ -3189.9885],
        [-12180.7090],
        [ -2890.3040],
        [ -4001.6138],
        [ -9402.6016],
        [ -5090.2539],
        [ -8087.6436],
        [-47384.3242],
        [ -9402.9619],
        [ -4001.6382]], grad_fn=<AddmmBackward>)

Starting with a known number of inputs, we have created a tabular model that produces a single output value.  Notably, these values are much larger than the -1 to 1 range that we usually expect.  This is likely because the tabular inputs (columns such as followers and year) have not been scaled, so even reasonably initialized weights produce large outputs.  We will handle network initialization later.

## A Simple CNN Model

In [20]:
bs, c, h, w = x_image.shape  #image batches are laid out as (batch, channels, height, width)
bs, c, h, w

(16, 3, 256, 256)

In [21]:
class SimpleCNNModel(nn.Module):
    c1 = 4
    c2 = 8
    c3 = 12
    
    def __init__(self):
        super(SimpleCNNModel, self).__init__()
        self.model = nn.Sequential(nn.Conv2d(3, self.c1, kernel_size = 3, padding = 1, stride = 2), nn.ReLU(),
                                  nn.Conv2d(self.c1, self.c2, kernel_size = 3, padding = 1, stride = 2), nn.ReLU(), 
                                  nn.Conv2d(self.c2, self.c3, kernel_size = 3, padding = 1, stride = 2), nn.ReLU())
        
    def forward(self, x):
        return self.model(x)

In [22]:
simple_cnn_model = SimpleCNNModel()
simple_cnn_model(x_image).shape

torch.Size([16, 12, 32, 32])

We see here that we have the desired 12 output channels.  The height and width of our output image have also changed.  After each convolutional layer, the output height and width change in the following manner (rounding down to the nearest integer):

$$ output\_size = \left\lfloor \frac{input\_size + 2*padding - dilation*(kernel\_size-1) -1}{stride} + 1 \right\rfloor $$

Although this is not something that we need to concern ourselves with at the input stage, it's necessary to understand how these values change so that when we flatten the image to feed a fully connected layer, we know how many input features there are.  Tracking the size also lets us customize the CNN component such that it brings the final image down to an appropriate size.
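The size formula can be sketched as a small helper and cross-checked against a real layer.  The function name `conv_output_size` is made up for this illustration; the kernel size, stride and padding mirror the settings used in SimpleCNNModel.

```python
import torch
import torch.nn as nn

def conv_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial dimension (floored)."""
    return (input_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

size = 256
sizes = []
for _ in range(3):  # three stride-2 convolutions, as in SimpleCNNModel
    size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)
print(sizes)  # [128, 64, 32]

# Cross-check the first step against an actual layer
conv = nn.Conv2d(3, 4, kernel_size=3, stride=2, padding=1)
out = conv(torch.rand(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 4, 128, 128])
```

The final value of 32 agrees with the (16, 12, 32, 32) output shape seen above.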

## Adding in the Fully Connected Layers

In [23]:
#export 

class CNNModel(nn.Module):
    h = [5, 7, 10, 14] #hidden layer channels
    def __init__(self, img_channels, img_size):
        super(CNNModel, self).__init__()
        self.img_channels, self.size = img_channels, img_size
        
        current_channels = img_channels
        output_size = img_size
        
        #one stride-2 convolution per entry in h, tracking the spatial size as we go
        cnn_model_components = []
        for new_channels in self.h:
            layer, output_size = self.get_cnn_layer(current_channels, new_channels, 3, 1, 2, output_size)
            current_channels = new_channels
            cnn_model_components.append(layer)
        
        #NOTE: the layers created in this loop are never appended to cnn_model_components,
        #so they are not part of the model printed below; only output_size is updated
        while output_size > 5:
            layer, output_size = self.get_cnn_layer(self.h[-1], self.h[-1], 3, 1, 2, input_size = output_size)
            
        #adaptive pooling reduces each channel to a single value, whatever the spatial size
        cnn_model_components.append(nn.AdaptiveAvgPool2d(1))
        cnn_model_components.append(Lambda(flatten))
        
        fcc_model_components = nn.Sequential(nn.Linear(self.h[-1], 20), nn.ReLU(),
                                            nn.Linear(20, 10), nn.ReLU(),
                                            nn.Linear(10, 1))      
        
        self.model = nn.Sequential(*cnn_model_components, fcc_model_components)
        
    def forward(self, x):
        return self.model(x)
        
    def get_cnn_layer(self, inp_chs, out_chs, kernel_size = 3, padding = 1, stride = 2, input_size = None):
        """
        Creates a Conv2d layer.  If input_size is given, also returns the expected output
        size (height and width) of that layer.
        
        We can keep track of the final size of our network based on the initial size.  The formula for output
        size is the floor of O = ((Input_size + 2*padding - dilation*(kernel_size-1) -1)/stride) + 1.  For instance, with
        input size = 256, kernel_size = 3, padding = 1 and stride = 2, we get (256+2*1-1*(3-1)-1)/2 + 1 = 128.5, whose floor
        is 128.  We therefore expect the output height and width to be 128
        
        """
        
        cnn_layer = nn.Conv2d(inp_chs, out_chs, kernel_size, stride = stride, padding = padding, dilation = 1)
        
        if input_size is None: return cnn_layer
        else:
            #integer floor division keeps the tracked size an int (dilation is fixed at 1)
            new_size = (input_size + 2*padding - 1*(kernel_size - 1) - 1)//stride + 1
            return cnn_layer, new_size


In [24]:
cnn_model = CNNModel(3, h)
cnn_model

CNNModel(
  (model): Sequential(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): Conv2d(5, 7, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (2): Conv2d(7, 10, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (3): Conv2d(10, 14, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (4): AdaptiveAvgPool2d(output_size=1)
    (5): Lambda()
    (6): Sequential(
      (0): Linear(in_features=14, out_features=20, bias=True)
      (1): ReLU()
      (2): Linear(in_features=20, out_features=10, bias=True)
      (3): ReLU()
      (4): Linear(in_features=10, out_features=1, bias=True)
    )
  )
)

In [25]:
predictions = cnn_model(x_image)
print (predictions,'\n')
print (predictions.shape)

tensor([[-0.1497],
        [-0.1461],
        [-0.1499],
        [-0.1471],
        [-0.1471],
        [-0.1465],
        [-0.1489],
        [-0.1483],
        [-0.1487],
        [-0.1499],
        [-0.1481],
        [-0.1490],
        [-0.1484],
        [-0.1472],
        [-0.1468],
        [-0.1491]], grad_fn=<AddmmBackward>) 

torch.Size([16, 1])


## Adapting a Pretrained Model

We are now able to output a single prediction from each image of our image batch.  However, although you can get great results from a simple tabular model, it's unlikely that we will be able to get good predictive results from a network this simple for image processing.  Fortunately, there are many existing CNN models that we can use.  An added bonus is the ability to use pretrained networks, which allows us to benefit from weights that have been trained on huge datasets of images.  Although we may not ultimately use our model for the same application, many of the earlier kernels detect more universal features.

Now that we know how to build a model from components, we can start to mix these preconstructed and pretrained CNN models into our final model.  torchvision.models provides many models to choose from.  We will use resnet34.

In [26]:
base_model = torchvision.models.resnet34(pretrained=True)
base_model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Going through the model, we see that there are several types of nn.Module used.  Some are familiar to us and others are new.

Known:
- Conv2d
- Linear
- ReLU

New:
- BatchNorm2d
- BasicBlock

We can see from our base model that the final output is a fully-connected layer with 1000 outputs.  That is a very useful output for our purposes, since our final prediction requires only a single output.  That means we have a lot of flexibility between the 1000 inputs and the single output to implement our model.  Before we begin, let us test that this model handles our image batch correctly

In [27]:
results = base_model(x_image)
results.shape

torch.Size([16, 1000])

As expected, our output is of the form (batch_size, 1000).  That means we're good to go in terms of incorporating Resnet34 into our final cnn model.  As before, we'll add in some Linear layers in order to gradually bring the final number of nodes down to 1.

In [28]:
#export
class CustomResnet(nn.Module):
    def __init__(self, base_model, connected_layer_sizes):
        super(CustomResnet, self).__init__()
        self.connected_layer_sizes = connected_layer_sizes
        
        #build Linear + ReLU pairs that step down through connected_layer_sizes
        #(unlike TabularModel, the final ReLU is kept, so the outputs are non-negative)
        connected_layers = []
        for i in range(len(connected_layer_sizes)-1):
            h1 = connected_layer_sizes[i]
            h2 = connected_layer_sizes[i+1]
            
            connected_layers.append(nn.Linear(h1, h2))
            connected_layers.append(nn.ReLU())
            
        connected_model = nn.Sequential(*connected_layers)
        
        
        self.model = nn.Sequential(base_model, connected_model)
        
    def forward(self, x):
        return self.model(x)
        

In [29]:
cnn_model_resnet = CustomResnet(torchvision.models.resnet34(pretrained = True), [1000, 50, 20, 1])
cnn_model_resnet(x_image)

tensor([[0.0000],
        [0.0174],
        [0.0665],
        [0.0586],
        [0.0000],
        [0.0862],
        [0.0000],
        [0.1083],
        [0.1043],
        [0.0000],
        [0.0604],
        [0.0000],
        [0.0099],
        [0.3023],
        [0.2319],
        [0.1932]], grad_fn=<ReluBackward0>)

## Side Note - Further Modifications

We were fortunate that the output of the base model was conveniently sized, but what if this was not the case?  For instance, the model might have a fully connected component that tailors the output to a very specific application.  Fortunately, we can inspect the model's children and keep only the ones that we need.

In [30]:
for i, c in enumerate(base_model.children()):
    print('Child:', i)
    print (c, '\n')

Child: 0
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) 

Child: 1
BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 

Child: 2
ReLU(inplace=True) 

Child: 3
MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) 

Child: 4
Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (1): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel

We can convert these children into a list and select only the children that we want.  Note that each nn.Sequential (or module) will be grouped as a single child.  Each of these children may have children of its own.  For instance, Child 4 has three children, each a BasicBlock.  Each of these basic blocks has 5 children of their own.
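To make the nesting concrete, here is a sketch on a small hand-built model (a hypothetical stand-in, not the ResNet above).  children() returns only the direct children, while modules() walks the whole tree.

```python
import torch.nn as nn

# Hypothetical nested model used only to illustrate children() and modules()
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Sequential(
        nn.Conv2d(8, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
    ),
)

# The inner nn.Sequential counts as a single child of the outer model
top_level = [type(c).__name__ for c in toy.children()]
nested = [type(c).__name__ for c in toy[2].children()]

# modules() visits the root itself plus every descendant
n_all = len(list(toy.modules()))
print(top_level, nested, n_all)
```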

Groups of 

In [31]:
children = list(base_model.children())
new_model = nn.Sequential(*children[:-2])
results = new_model(x_image)
results.shape

torch.Size([16, 512, 8, 8])

In [32]:
new_model

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Con

By extracting children of this pretrained model, we can use only the components that we need.  Since this is a pretrained model, however, it makes more sense to cut out layers from the end rather than the beginning.  The middle layers have been trained to accept the inputs from their previous layers.  If those earlier layers are removed, you may lose the benefit of the pretraining.
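The same cut-from-the-end idea can be sketched without downloading any weights.  The network below is a made-up stand-in for a pretrained model; its layers and sizes are assumptions chosen only to keep the example small.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: conv body plus a task-specific head
pretrained_like = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1000),
)

# Keep everything except the final Linear, then attach a new single-output head
body = nn.Sequential(*list(pretrained_like.children())[:-1])
new_model = nn.Sequential(body, nn.Linear(32, 1))

out = new_model(torch.rand(4, 3, 64, 64))
print(out.shape)  # torch.Size([4, 1])
```

The layers kept in `body` are the same objects as in the original model, so any trained weights they hold carry over to the new model.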

# Step 4 - Putting it all Together

Now that we have all the components in place, we will construct our model from the following components:
- A tabular model (fully connected) that accepts the tabular inputs and outputs 4 values
- A CNN model that accepts the image inputs and outputs 10 values
- Another fully connected model that accepts the concatenated outputs from the tabular and CNN models (14 total), and results in a single predictive output


In [33]:
bs, ch_img, h_img, w_img = x_image.shape
bs, tab_inputs = x_tab.shape

num_cnn_outputs = 10
num_tabular_outputs = 4

num_mixed_inputs = num_cnn_outputs + num_tabular_outputs


input_cnn_model = CustomResnet(torchvision.models.resnet34(pretrained = True), [1000,50,20, num_cnn_outputs])
input_tabular_model = TabularModel([tab_inputs, 10, num_tabular_outputs])
input_mixed_model = TabularModel([num_mixed_inputs, 7, 1])

mixed_model = MixedInputModel(input_cnn_model, input_tabular_model, input_mixed_model)


In [34]:
mixed_model(xb)

tensor([[2210.7942],
        [9096.5645],
        [2698.2668],
        [2344.1638],
        [ 908.9642],
        [ 277.5974],
        [ 628.1006],
        [2344.1633],
        [ 564.4508],
        [ 777.1080],
        [1837.3689],
        [ 992.0399],
        [1563.8647],
        [9096.4697],
        [1837.2316],
        [ 777.0434]], grad_fn=<AddmmBackward>)

If we enumerate through the children, we see that each of our models is a separate child

In [35]:
for i, c in enumerate(mixed_model.children()):
    print('Child:', i)
    print (c, '\n')

Child: 0
CustomResnet(
  (model): Sequential(
    (0): ResNet(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): 

# Step 5 - A Note on Initializing our Network

The first few cycles of training are crucial, as improper starting weights can cause the gradients to vanish (go to zero) or explode (go towards infinity).  This occurs when the outputs from each layer tend away from a mean of zero and a std of 1.  There are a few methods that models use to address this.  In many CNN models, there is a BatchNorm layer, which essentially resets the batch to have a mean of 0 and std of 1, then scales and shifts it using some learned parameters.
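A small sketch of that normalization, using nn.BatchNorm1d on made-up, badly scaled activations (the scale of 50 and offset of 300 are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 10) * 50 + 300   # deliberately badly scaled activations

bn = nn.BatchNorm1d(10)              # modules start in training mode
y = bn(x)

# Per-feature statistics across the batch, before and after normalization
print(x.mean(dim=0)[:3], x.std(dim=0)[:3])
print(y.mean(dim=0)[:3], y.std(dim=0, unbiased=False)[:3])
```

At initialization the learned scale is 1 and the learned shift is 0, so the output of a freshly created layer is simply the standardized batch.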

Another helpful way to reduce these issues is to ensure that you initialize your parameters correctly based on an understanding of how the mean and standard deviation change after each layer.  We saw earlier that the linear component of our network is initialized using kaiming_uniform and that the CNN component is built using transfer learning.  This suggests that we should be alright in terms of getting things trained.

If you do want to do your own initialization, you can either create your own versions of the layers (as we discussed earlier with the nn.Linear class and its reset_parameters() method), or re-initialize any of the parameters once your model has been created.
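One common way to re-initialize after creation, sketched here on a toy model, is to pass a function to the module's apply() method, which calls it on every submodule.  The function name `init_linear` and the choice of kaiming_normal_ with zero biases are assumptions for illustration.

```python
import torch
import torch.nn as nn

def init_linear(module):
    """Re-initialize only nn.Linear layers; other module types are left untouched."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(14, 20), nn.ReLU(), nn.Linear(20, 1))
model.apply(init_linear)  # apply() visits every submodule recursively

biases_zero = all(bool(torch.all(m.bias == 0)) for m in model if isinstance(m, nn.Linear))
print(biases_zero)
```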

If we take our cnn_model, for example, we can look at all of the children it contains

In [36]:
cnn_model.model

Sequential(
  (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): Conv2d(5, 7, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (2): Conv2d(7, 10, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (3): Conv2d(10, 14, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (4): AdaptiveAvgPool2d(output_size=1)
  (5): Lambda()
  (6): Sequential(
    (0): Linear(in_features=14, out_features=20, bias=True)
    (1): ReLU()
    (2): Linear(in_features=20, out_features=10, bias=True)
    (3): ReLU()
    (4): Linear(in_features=10, out_features=1, bias=True)
  )
)

The last child (6) is the one we're interested in.  We can extract that by converting the children into a list and taking only the last one.

In [37]:
fcc_components = list(cnn_model.model.children())[-1]
all_children = list(fcc_components.children())

If we wanted to reinitialize all of the weights from this section, we can access them through the .weight attribute of each child.

In [38]:
for c in all_children:
    print(c, end = ': ')
    if hasattr(c, 'weight'): print("Has a weight parameter")
    else: print("Does not have a weight parameter")

Linear(in_features=14, out_features=20, bias=True): Has a weight parameter
ReLU(): Does not have a weight parameter
Linear(in_features=20, out_features=10, bias=True): Has a weight parameter
ReLU(): Does not have a weight parameter
Linear(in_features=10, out_features=1, bias=True): Has a weight parameter


In [39]:
all_weights = []
for c in all_children:
    if hasattr(c, 'weight'): all_weights.append(c.weight)

Let's take a quick look at the original weights

In [40]:
all_weights

[Parameter containing:
 tensor([[-0.0086,  0.2395,  0.1832,  0.0250,  0.1219, -0.0510,  0.0065,  0.2465,
          -0.0095,  0.0606, -0.2030,  0.0678, -0.1657, -0.1405],
         [ 0.1449, -0.1771,  0.0197, -0.0646,  0.0891,  0.0353,  0.0960, -0.0454,
           0.0648,  0.1256, -0.0635,  0.1252,  0.2633,  0.1042],
         [ 0.1193,  0.1443, -0.2558, -0.2162,  0.1685,  0.2045, -0.0755,  0.0240,
           0.2509,  0.0289, -0.2601,  0.1930,  0.0518,  0.1046],
         [-0.1411, -0.2553,  0.2507,  0.2606, -0.1996,  0.0155, -0.1708,  0.2156,
          -0.1029, -0.1964,  0.2318,  0.0528,  0.0619,  0.0811],
         [ 0.0150,  0.0394,  0.0984, -0.1106,  0.2386, -0.0659, -0.1350,  0.1376,
          -0.0171,  0.0051, -0.2357, -0.1509, -0.0033, -0.0018],
         [ 0.2660, -0.1156,  0.1181,  0.1123,  0.0391, -0.2664, -0.0678, -0.0529,
           0.0100,  0.2422, -0.2064,  0.0430, -0.2255,  0.2311],
         [ 0.2470, -0.1270,  0.1958, -0.0086, -0.1246, -0.1845, -0.0673,  0.2231,
           0.

As an example, we can reinitialize all of the weights to 1

In [41]:
for w in all_weights:
    nn.init.ones_(w)
all_weights

[Parameter containing:
 tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1

If we then go back to the original model, we can see that the parameters have indeed been changed!  The init.ones_ method is likely a poor choice for the weights matrix, but there are many others available in torch.nn.init

In [42]:
list(cnn_model.parameters())[-5:]

[Parameter containing:
 tensor([-0.1895, -0.0506, -0.0046, -0.0836, -0.2458,  0.1487,  0.1709, -0.2504,
         -0.0911, -0.0970, -0.0023,  0.0088,  0.2596, -0.2128,  0.0139, -0.0676,
         -0.2070,  0.2148,  0.1451,  0.0008], requires_grad=True),
 Parameter containing:
 tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
          1., 1.],
         [1., 1., 

If you want, you can also set the parameters yourself or use your own custom function.

In [43]:
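def set_to_zero(tensor):
    # Replace the parameter's values with zeros, keeping its shape, dtype and device
    tensor.data = torch.zeros_like(tensor)

with torch.no_grad():
    set_to_zero(all_weights[-3])  # all_weights[-3] is the weight of the first Linear layer
    #all_weights[-1] = torch.zeros(w, h)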

In [44]:
list(cnn_model.parameters())[-6:]

[Parameter containing:
 tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0
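
The manual steps above can also be condensed with `Module.apply`, which calls a function on every submodule.  The sketch below is one way to do it, assuming a stand-in `head` with the same shapes as our fully connected section and a hypothetical helper `init_linear`; `nn.init.kaiming_normal_` and `nn.init.zeros_` are two of the initializers in torch.nn.init.

```python
import torch
import torch.nn as nn

def init_linear(m):
    """Hypothetical helper: Kaiming-initialize every nn.Linear layer it is applied to."""
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.zeros_(m.bias)

# Stand-in with the same shapes as the fully connected section shown earlier
head = nn.Sequential(
    nn.Linear(14, 20), nn.ReLU(),
    nn.Linear(20, 10), nn.ReLU(),
    nn.Linear(10, 1),
)

head.apply(init_linear)   # apply() calls init_linear on every submodule, then on head itself

for name, p in head.named_parameters():
    print(name, tuple(p.shape), p.abs().mean().item())
```

The ReLU layers are skipped by the `isinstance` check, so there is no need to test for a weight attribute by hand.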

# Step 6 - Using the GPU

So far, we have been using the CPU to make predictions using our models.  This is alright for a one-off prediction, but it is slow:

In [45]:
%timeit mixed_model(xb)

1.87 s ± 125 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [46]:
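mixed_model.cuda()  # moves the model's parameters to the GPU in place
xb = tuple(map(lambda x: x.cuda(), xb))
yb = yb.cuda()  # Tensor.cuda() returns a copy, so the result must be reassigned
yb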

tensor([1.1538, 0.4972, 1.2127, 0.7262, 0.8618, 1.0807, 0.8220, 1.0631, 0.8150,
        0.8680, 1.1262, 0.9052, 1.2725, 0.6386, 0.5025, 0.8337],
       device='cuda:0')

In [47]:
%timeit mixed_model(xb)

16.9 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


For my machine, we see that the time for CPU inference is 1.87 s and the time for GPU inference is 16.9 ms.  That means that the GPU is roughly 110 times faster for this model, which is why most of deep learning is performed on the GPU!
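
One caveat when timing GPU code: CUDA operations are launched asynchronously, so a measurement can be skewed unless we wait for the GPU to finish before reading the clock.  The sketch below shows one way to do that with `torch.cuda.synchronize()`.  The small `model`, the random `batch` and the helper `time_forward` are stand-ins made up for illustration, not the mixed_model from this notebook, and the code falls back to the CPU if no GPU is available.

```python
import time
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Small stand-in model and batch (assumptions, not the mixed_model from this notebook)
model = nn.Sequential(nn.Linear(7, 20), nn.ReLU(), nn.Linear(20, 1)).to(device)
batch = torch.randn(16, 7, device=device)

def time_forward(model, batch, n_runs=20):
    """Average seconds per forward pass, waiting for queued GPU work before reading the clock."""
    with torch.no_grad():
        model(batch)                      # warm-up run, excluded from the timing
        if batch.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(batch)
        if batch.is_cuda:
            torch.cuda.synchronize()      # make sure all queued kernels have finished
    return (time.perf_counter() - start) / n_runs

print(f"{device}: {time_forward(model, batch) * 1e3:.3f} ms per forward pass")
```

Using `.to(device)` instead of `.cuda()` also keeps the same cell runnable on machines without a GPU.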

# Notebook Script Export

In [48]:
!python scripts/notebook2script.py "Part 2 - Model Construction and Initialization.ipynb" 'custom_models.py'

Converted Part 2 - Model Construction and Initialization.ipynb to scripts\custom_models.py
