### MobileNetV2

[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/pdf/1801.04381)

*2024/12/05*

The MobileNetV2 authors at Google observed that applying ReLU in a low-dimensional space destroys a significant amount of information, which keeps the model from making full use of its features. To address this, each block first applies a pointwise (1×1) convolution before the depthwise convolution to expand the number of channels, so the depthwise convolution extracts features in a higher-dimensional space. A second pointwise convolution then projects the result back down to a low-dimensional bottleneck. This final projection uses a linear activation (that is, no nonlinearity) to minimize information loss.

$$
\text{MobileNet} \qquad\qquad\qquad \rightarrow \overset{3 \times 3}{\text{DW}} \xrightarrow{\text{ReLU}} \overset{1 \times 1}{\text{PW}} \xrightarrow{\text{ReLU}}
$$
$$
\text{MobileNetV2} \rightarrow \overset{1 \times 1}{\text{PW}} \xrightarrow{\text{ReLU6}} \overset{3 \times 3}{\text{DW}} \xrightarrow{\text{ReLU6}} \overset{1 \times 1}{\text{PW}} \xrightarrow{\text{Linear}}
$$
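To make the cost of the three convolutions in the MobileNetV2 block concrete, here is a small sketch that counts their weights. The helper name `inverted_residual_conv_params` is made up for illustration. It assumes bias-free convolutions and ignores BatchNorm parameters, which matches how the block is implemented in the code in this section. For a 16 → 24 block with expansion factor 6, the expansion and depthwise counts agree with the 1,536 and 864 reported in the `torchsummary` output.

```python
def inverted_residual_conv_params(in_channels, out_channels, expansion_factor, kernel_size=3):
    """Count conv weights (no bias, no BatchNorm) in one inverted residual block."""
    hidden = round(in_channels * expansion_factor)
    # 1x1 expansion: skipped entirely when the expansion factor is 1
    expand = in_channels * hidden if expansion_factor != 1 else 0
    # 3x3 depthwise: one k x k filter per hidden channel
    depthwise = hidden * kernel_size * kernel_size
    # 1x1 linear projection back to the bottleneck width
    project = hidden * out_channels
    return {"expand": expand, "depthwise": depthwise, "project": project}


counts = inverted_residual_conv_params(16, 24, expansion_factor=6)
print(counts)  # {'expand': 1536, 'depthwise': 864, 'project': 2304}
```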
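Furthermore, each block adds a shortcut connection from its input to its output. Because the shortcut joins the narrow bottleneck ends rather than the wide expanded layers, the structure is called an inverted residual. The shortcut can only be used when the block keeps the spatial size and channel count unchanged (stride 1 and equal input and output channels). Also worth mentioning is the use of ReLU6 instead of ReLU. ReLU6 clips activations to the range [0, 6], which keeps them bounded and makes the network more robust under low-precision arithmetic, where large activations could otherwise overflow or lose precision.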
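
As a quick illustration of the clipping behaviour, the sketch below applies ReLU6 to a few hand-picked values and compares it with an explicit clamp to [0, 6]. The input values are arbitrary and chosen only for demonstration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, 0.5, 6.0, 42.0])

relu6_out = F.relu6(x)                          # min(max(x, 0), 6)
manual_out = torch.clamp(x, min=0.0, max=6.0)   # the same operation written explicitly

print(relu6_out.tolist())                 # [0.0, 0.5, 6.0, 6.0]
print(torch.equal(relu6_out, manual_out)) # True
```

Negative inputs are zeroed as with ReLU, while anything above 6 saturates at 6.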

*Code*

In [6]:
import torch
import torch.nn as nn
from torchsummary import summary


class InvertedResidual(nn.Module):
    def __init__(self, in_channels, out_channels, stride, expansion_factor):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        hidden_dim = round(in_channels * expansion_factor)

        layers = []
        if expansion_factor != 1:
            # pw: 1x1 expansion to hidden_dim channels (skipped when expansion_factor == 1)
            layers.append(nn.Conv2d(in_channels, hidden_dim, kernel_size=1, bias=False))
            layers.append(nn.BatchNorm2d(hidden_dim))
            layers.append(nn.ReLU6(inplace=True))
        
        # dw: 3x3 depthwise conv (groups == channels, so one filter per channel)
        layers.append(nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, stride=stride, padding=1, groups=hidden_dim, bias=False))
        layers.append(nn.BatchNorm2d(hidden_dim))
        layers.append(nn.ReLU6(inplace=True))
        
        # pw-linear: 1x1 projection to out_channels, with no activation afterwards
        layers.append(nn.Conv2d(hidden_dim, out_channels, kernel_size=1, bias=False))
        layers.append(nn.BatchNorm2d(out_channels))

        self.conv = nn.Sequential(*layers)
        self.use_res_connect = self.stride == 1 and in_channels == out_channels

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, dropout=0.2):
        super(MobileNetV2, self).__init__()
        
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU6(inplace=True),
            
            InvertedResidual(32, 16, stride=1, expansion_factor=1),
            InvertedResidual(16, 24, stride=2, expansion_factor=6),
            InvertedResidual(24, 24, stride=1, expansion_factor=6),
            InvertedResidual(24, 32, stride=2, expansion_factor=6),
            InvertedResidual(32, 32, stride=1, expansion_factor=6),
            InvertedResidual(32, 32, stride=1, expansion_factor=6),
            InvertedResidual(32, 64, stride=2, expansion_factor=6),
            InvertedResidual(64, 64, stride=1, expansion_factor=6),
            InvertedResidual(64, 64, stride=1, expansion_factor=6),
            InvertedResidual(64, 64, stride=1, expansion_factor=6),
            InvertedResidual(64, 96, stride=1, expansion_factor=6),
            InvertedResidual(96, 96, stride=1, expansion_factor=6),
            InvertedResidual(96, 96, stride=1, expansion_factor=6),
            InvertedResidual(96, 160, stride=2, expansion_factor=6),
            InvertedResidual(160, 160, stride=1, expansion_factor=6),
            InvertedResidual(160, 160, stride=1, expansion_factor=6),
            InvertedResidual(160, 320, stride=1, expansion_factor=6),
            
            nn.Conv2d(320, 1280, kernel_size=1, bias=False),
            nn.BatchNorm2d(1280),
            nn.ReLU6(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout),
            nn.Linear(1280, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x
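
The `groups=hidden_dim` argument is what turns an ordinary `nn.Conv2d` into a depthwise convolution. As a rough check, the sketch below compares the weight count of a standard 3×3 convolution with its depthwise counterpart at 96 channels. The channel count is chosen to match the first expanded block, and `num_params` is a helper defined here only for illustration.

```python
import torch.nn as nn

channels = 96

# Standard conv: every output channel looks at every input channel.
standard = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
# Depthwise conv: groups == channels, so each channel gets its own 3x3 filter.
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels, bias=False)


def num_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


print(num_params(standard))   # 82944  (96 * 96 * 3 * 3)
print(num_params(depthwise))  # 864    (96 * 1 * 3 * 3)
```

The depthwise layer needs 96 times fewer weights here, which is why the expensive channel mixing is left to the 1×1 pointwise convolutions.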

In [7]:
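# Fall back to the CPU when no GPU is available; torchsummary defaults to "cuda".
device = "cuda" if torch.cuda.is_available() else "cpu"
model = MobileNetV2().to(device)
summary(model, (3, 224, 224), device=device)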

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 112, 112]             864
       BatchNorm2d-2         [-1, 32, 112, 112]              64
             ReLU6-3         [-1, 32, 112, 112]               0
            Conv2d-4         [-1, 32, 112, 112]             288
       BatchNorm2d-5         [-1, 32, 112, 112]              64
             ReLU6-6         [-1, 32, 112, 112]               0
            Conv2d-7         [-1, 16, 112, 112]             512
       BatchNorm2d-8         [-1, 16, 112, 112]              32
  InvertedResidual-9         [-1, 16, 112, 112]               0
           Conv2d-10         [-1, 96, 112, 112]           1,536
      BatchNorm2d-11         [-1, 96, 112, 112]             192
            ReLU6-12         [-1, 96, 112, 112]               0
           Conv2d-13           [-1, 96, 56, 56]             864
      BatchNorm2d-14           [-1, 96,