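# Image Classification with MobileNet V1

## Introduction to MobileNet

MobileNet is a family of efficient models for mobile and embedded vision applications. It is built on a streamlined architecture that uses depthwise separable convolutions to construct lightweight deep neural networks. MobileNet introduces two global hyperparameters, the width multiplier and the resolution multiplier, to trade off inference speed against accuracy. Because this trade-off can be tuned flexibly, MobileNets suit a wide range of applications, including object detection, fine-grained classification, face attribute recognition, and large-scale geo-localization.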


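## Algorithm Analysis

MobileNet's main strengths are efficiency and broad applicability. Its basic building block, the depthwise separable convolution, keeps the model efficient, while the width multiplier and the resolution multiplier make it adaptable.

### Depthwise Separable Convolution

A depthwise separable convolution is a form of factorized convolution that splits a standard convolution into a depthwise convolution and a pointwise convolution.

![Figure 1](./images/Depthwise_Separable_Conv.png)
<center><i>Figure 1</i></center>

#### Depthwise Convolution (Depthwise Conv)

A depthwise convolution (Figure 1(b)) applies a single filter to each input channel: every input channel is convolved with its own filter, and the per-channel outputs are then stacked together. Its computational cost is: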

$$
Cost_{dw-conv} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F \tag{1}
$$

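#### Pointwise Convolution (Pointwise Conv)

A pointwise convolution (Figure 1(c)) is a convolution with a 1x1 kernel whose depth equals the number of input channels. Its computational cost is: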

$$
Cost_{pw-conv} = M \cdot N \cdot D_F \cdot D_F \tag{2}
$$

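A standard convolution processes channel and spatial information in a single step, whereas a depthwise separable convolution splits the computation into two steps: the depthwise convolution applies one filter to each input channel, and the pointwise convolution linearly combines the depthwise outputs. Figure 2 compares the standard convolution with the depthwise separable convolution.

![Figure 2](./images/Depthwise_Separable_Structure.png)
<center><i>Figure 2</i></center>

![Figure 3](./images/Standard_Conv_vs_Depthwise_Separable_Conv.png)
<center><i>Figure 3</i></center>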
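The two-step computation can be made concrete with a small pure-Python sketch. It is an illustration only, not the MindSpore implementation: it assumes stride 1, no padding, and nested lists in channel-first layout, and the helper names `depthwise_conv` and `pointwise_conv` are hypothetical. Each function also counts its multiply-accumulate operations so that the counts can be compared with Eq. (1) and Eq. (2).

```python
def depthwise_conv(x, dw_kernels):
    """Apply one K x K filter per channel. x: [M][H][W], dw_kernels: [M][K][K]."""
    m, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(dw_kernels[0])
    out_h, out_w = h - k + 1, w - k + 1  # stride 1, no padding
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(m)]
    macs = 0
    for c in range(m):  # each channel only sees its own filter
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        acc += x[c][i + di][j + dj] * dw_kernels[c][di][dj]
                        macs += 1
                out[c][i][j] = acc
    return out, macs


def pointwise_conv(x, pw_weights):
    """1x1 convolution mixing channels. x: [M][H][W], pw_weights: [N][M]."""
    m, h, w = len(x), len(x[0]), len(x[0][0])
    n = len(pw_weights)
    out = [[[0.0] * w for _ in range(h)] for _ in range(n)]
    macs = 0
    for o in range(n):  # each output channel is a linear combination of inputs
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for c in range(m):
                    acc += pw_weights[o][c] * x[c][i][j]
                    macs += 1
                out[o][i][j] = acc
    return out, macs


# Toy sizes: M input channels, N output channels, K x K kernel, H x H input.
M, N, K, H = 2, 3, 3, 5
x = [[[float(c + i + j) for j in range(H)] for i in range(H)] for c in range(M)]
dw_kernels = [[[1.0] * K for _ in range(K)] for _ in range(M)]  # box filters
pw_weights = [[1.0] * M for _ in range(N)]                      # sum over channels

dw_out, dw_macs = depthwise_conv(x, dw_kernels)
pw_out, pw_macs = pointwise_conv(dw_out, pw_weights)

D_F = H - K + 1  # output feature map size
print("depthwise MACs:", dw_macs, "formula:", K * K * M * D_F * D_F)
print("pointwise MACs:", pw_macs, "formula:", M * N * D_F * D_F)
```

The counted operations match the two cost formulas, with $D_F$ taken as the output feature map size.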

The computational cost of a standard convolution is:

$$
Cost_{standard-conv} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F \tag{3}
$$

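where $D_K$ is the kernel size of the convolution, $M$ is the number of input channels, $N$ is the number of output channels, and $D_F$ is the spatial size (width and height) of the feature map.

The computational cost of a depthwise separable convolution is: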

$$
Cost_{depthwise-separable-conv} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F \tag{4}
$$

Putting these together, we can compare the cost of a depthwise separable convolution with that of a standard convolution:

$$
\cfrac{Cost_{depthwise-separable-conv} }{Cost_{standard-conv}} = \cfrac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \cfrac{1}{N} + \cfrac{1}{D_K^2} \tag{5}
$$

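The number of output channels $N$ is usually large (64 to 1024 in MobileNet V1), so the $\cfrac{1}{N}$ term contributes little, and the ratio simplifies to: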

$$
\cfrac{Cost_{depthwise-separable-conv} }{Cost_{standard-conv}} \approx \cfrac{1}{D_K^2} \tag{6}
$$

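Because MobileNet uses a kernel size of 3, a depthwise separable convolution costs roughly 1/9 of a standard convolution; the paper [1] reports 8 to 9 times less computation. At the same time, because spatial positions in typical images are highly correlated while different channels are relatively independent, this factorization has only a small impact on accuracy.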
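As a quick numeric check of Eq. (3) to Eq. (6), the following sketch evaluates both costs for one layer. The layer size (3x3 kernel, 512 input and 512 output channels, 14x14 feature map) is an illustrative choice, and the function names are hypothetical.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution, Eq. (3)."""
    return d_k * d_k * m * n * d_f * d_f


def depthwise_separable_cost(d_k, m, n, d_f):
    """Multiply-adds of depthwise plus pointwise convolution, Eq. (4)."""
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f


# Illustrative layer: D_K = 3, M = N = 512 channels, D_F = 14.
d_k, m, n, d_f = 3, 512, 512, 14
ratio = depthwise_separable_cost(d_k, m, n, d_f) / standard_conv_cost(d_k, m, n, d_f)
closed_form = 1 / n + 1 / d_k ** 2  # Eq. (5)

print(f"cost ratio      : {ratio:.4f}")
print(f"1/N + 1/D_K^2   : {closed_form:.4f}")
print(f"approx. 1/D_K^2 : {1 / d_k ** 2:.4f}")
print(f"speed-up factor : {1 / ratio:.2f}x")
```

For this layer the exact ratio is slightly above 1/9, which is why the saving falls between 8x and 9x rather than being exactly 9x.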

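### Model Scaling

MobileNet scales the model mainly through the width multiplier and the resolution multiplier.

#### Width Multiplier

The baseline MobileNet is already small and fast, but a specific use case or application often requires an even smaller and faster model. To build such models, MobileNet introduces a very simple parameter $\alpha$, called the width multiplier. The width multiplier uniformly reduces the number of input and output channels of every layer, producing a smaller and faster model.

With the width multiplier $\alpha$, the computational cost of a depthwise separable convolution becomes: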

$$
Cost_{\alpha-depthwise-separable-conv} = D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F \tag{7}
$$

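where $\alpha \in (0, 1]$, with typical settings $\alpha \in \{0.25, 0.50, 0.75, 1.00\}$.

#### Resolution Multiplier

The resolution multiplier is a reduction factor applied to the input size of every module. Put simply, it shrinks the input image and therefore every feature map produced inside the network. Combining the width multiplier $\alpha$ and the resolution multiplier $\rho$, the computational cost of a depthwise separable convolution is: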

$$
Cost_{\alpha-\rho-depthwise-separable-conv} = D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F \tag{8}
$$

where $\rho \in (0, 1]$ is usually set implicitly by choosing the network input resolution from $\{224, 192, 160, 128\}$.
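To see how the two multipliers interact, the sketch below evaluates the scaled cost formula for a single layer. The baseline layer (3x3 kernel, 512 channels in and out, 14x14 feature map at a 224 input) is an illustrative assumption, the function name `scaled_cost` is hypothetical, and scaled sizes are rounded to the nearest integer, which is a simplification.

```python
def scaled_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Depthwise separable cost with width multiplier alpha and resolution multiplier rho."""
    m_s = round(alpha * m)    # thinned input channels
    n_s = round(alpha * n)    # thinned output channels
    d_s = round(rho * d_f)    # reduced feature map size
    return d_k * d_k * m_s * d_s * d_s + m_s * n_s * d_s * d_s


d_k, m, n, d_f = 3, 512, 512, 14
base = scaled_cost(d_k, m, n, d_f)

for alpha in (1.0, 0.75, 0.5, 0.25):
    for resolution in (224, 160):
        rho = resolution / 224
        cost = scaled_cost(d_k, m, n, d_f, alpha, rho)
        print(f"alpha={alpha:<4} input={resolution}: "
              f"{cost / base:.3f} of baseline, alpha^2*rho^2={alpha**2 * rho**2:.3f}")

# One configuration kept for inspection: alpha = 0.5, input resolution 160.
half_width_160 = scaled_cost(d_k, m, n, d_f, alpha=0.5, rho=160 / 224)
```

Because the pointwise term dominates, the cost falls roughly as $\alpha^2 \rho^2$; the depthwise term scales only linearly in $\alpha$, which explains the small gap between the two printed columns.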


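## Model Structure

Below we use the MindSpore Vision suite to dissect the structure of MobileNet. The relevant modules are all available as ready-to-use APIs in the Vision suite.

### ConvNormActivation Structure

ConvNormActivation is the most basic building block of convolutional networks. It consists of a convolution layer (Conv or Depthwise Conv), a normalization layer (BN), and an activation function. The sub-blocks in Figure 2 that fit this pattern are Depthwise Conv + BN + ReLU6 and Pointwise Conv + BN + ReLU6.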

```python
from typing import Optional

from mindspore import nn


class ConvNormActivation(nn.Cell):
    """
    Convolution/Depthwise fused with normalization and activation blocks definition.
    """
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm: Optional[nn.Cell] = nn.BatchNorm2d,
                 activation: Optional[nn.Cell] = nn.ReLU
                 ) -> None:
        super(ConvNormActivation, self).__init__()
        # Pad so that an odd kernel keeps the spatial size at stride 1.
        padding = (kernel_size - 1) // 2
        # Convolution layer; groups == in_planes makes it a depthwise convolution.
        layers = [
            nn.Conv2d(
                in_planes,
                out_planes,
                kernel_size,
                stride,
                pad_mode='pad',
                padding=padding,
                group=groups
            )
        ]
        # Optionally append the normalization layer.
        if norm:
            layers.append(norm(out_planes))
        # Optionally append the activation function.
        if activation:
            layers.append(activation())

        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        output = self.features(x)
        return output
```
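The `groups` argument and the padding rule in the block above can be checked with plain arithmetic. The sketch below is a framework-independent illustration with hypothetical helper names; it assumes the usual grouped-convolution weight layout of `out_planes x (in_planes / groups) x k x k` and ignores bias terms.

```python
def conv_weight_count(in_planes, out_planes, kernel_size, groups=1):
    """Number of convolution weights (no bias) for a grouped convolution."""
    if in_planes % groups or out_planes % groups:
        raise ValueError("in_planes and out_planes must be divisible by groups")
    return out_planes * (in_planes // groups) * kernel_size * kernel_size


def conv_output_size(size, kernel_size, stride):
    """Spatial output size with padding = (kernel_size - 1) // 2, as in the block above."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


standard = conv_weight_count(32, 32, 3)              # ordinary 3x3 convolution
depthwise = conv_weight_count(32, 32, 3, groups=32)  # one 3x3 filter per channel
pointwise = conv_weight_count(32, 64, 1)             # 1x1 channel mixing

print("standard 3x3 weights :", standard)
print("depthwise 3x3 weights:", depthwise)
print("pointwise 1x1 weights:", pointwise)
print("112 -> stride 1:", conv_output_size(112, 3, 1))
print("112 -> stride 2:", conv_output_size(112, 3, 2))
```

Setting `groups` equal to the channel count shrinks the 3x3 weights by a factor of `in_planes`, and the padding rule keeps the spatial size at stride 1 while halving it at stride 2.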

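### Baseline Model Structure

The parameters of the MobileNetV1 backbone are listed in Figure 4.

![Figure 4](./images/MobileNet_BackBone_Architeture.png)
<center><i>Figure 4</i></center>

Following the parameters in Figure 4, we build the MobileNetV1 backbone as shown in the code below.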


```python
from mindspore import nn, ops

from mindvision.classification.models.classifiers import BaseClassifier
from mindvision.classification.models.blocks import ConvNormActivation


class MobileNetV1(nn.Cell):
    """
    MobileNet V1 backbone.
    """

    def __init__(self):
        super(MobileNetV1, self).__init__()
        self.layers = [
            ConvNormActivation(3, 32, 3, 2, activation=nn.ReLU6),  # Conv0

            ConvNormActivation(32, 32, 3, 1, groups=32, activation=nn.ReLU6),  # Conv1_depthwise
            ConvNormActivation(32, 64, 1, 1, activation=nn.ReLU6),  # Conv1_pointwise
            ConvNormActivation(64, 64, 3, 2, groups=64, activation=nn.ReLU6),  # Conv2_depthwise
            ConvNormActivation(64, 128, 1, 1, activation=nn.ReLU6),  # Conv2_pointwise

            ConvNormActivation(128, 128, 3, 1, groups=128, activation=nn.ReLU6),  # Conv3_depthwise
            ConvNormActivation(128, 128, 1, 1, activation=nn.ReLU6),  # Conv3_pointwise
            ConvNormActivation(128, 128, 3, 2, groups=128, activation=nn.ReLU6),  # Conv4_depthwise
            ConvNormActivation(128, 256, 1, 1, activation=nn.ReLU6),  # Conv4_pointwise

            ConvNormActivation(256, 256, 3, 1, groups=256, activation=nn.ReLU6),  # Conv5_depthwise
            ConvNormActivation(256, 256, 1, 1, activation=nn.ReLU6),  # Conv5_pointwise
            ConvNormActivation(256, 256, 3, 2, groups=256, activation=nn.ReLU6),  # Conv6_depthwise
            ConvNormActivation(256, 512, 1, 1, activation=nn.ReLU6),  # Conv6_pointwise

            ConvNormActivation(512, 512, 3, 1, groups=512, activation=nn.ReLU6),  # Conv7_depthwise
            ConvNormActivation(512, 512, 1, 1, activation=nn.ReLU6),  # Conv7_pointwise
            ConvNormActivation(512, 512, 3, 1, groups=512, activation=nn.ReLU6),  # Conv8_depthwise
            ConvNormActivation(512, 512, 1, 1, activation=nn.ReLU6),  # Conv8_pointwise
            ConvNormActivation(512, 512, 3, 1, groups=512, activation=nn.ReLU6),  # Conv9_depthwise
            ConvNormActivation(512, 512, 1, 1, activation=nn.ReLU6),  # Conv9_pointwise
            ConvNormActivation(512, 512, 3, 1, groups=512, activation=nn.ReLU6),  # Conv10_depthwise
            ConvNormActivation(512, 512, 1, 1, activation=nn.ReLU6),  # Conv10_pointwise
            ConvNormActivation(512, 512, 3, 1, groups=512, activation=nn.ReLU6),  # Conv11_depthwise
            ConvNormActivation(512, 512, 1, 1, activation=nn.ReLU6),  # Conv11_pointwise

            ConvNormActivation(512, 512, 3, 2, groups=512, activation=nn.ReLU6),  # Conv12_depthwise
            ConvNormActivation(512, 1024, 1, 1, activation=nn.ReLU6),  # Conv12_pointwise
            ConvNormActivation(1024, 1024, 3, 1, groups=1024, activation=nn.ReLU6),  # Conv13_depthwise
            ConvNormActivation(1024, 1024, 1, 1, activation=nn.ReLU6),  # Conv13_pointwise
        ]

        self.features = nn.SequentialCell(self.layers)

    def construct(self, x):
        """Forward pass"""
        output = self.features(x)
        return output


class PoolDenseHead(nn.Cell):
    """
    Global average pooling followed by a dense classifier.

    Assumption: the backbone returns an (N, 1024, H, W) feature map, which must be
    pooled to (N, 1024) before nn.Dense, matching the average-pooling layer that
    precedes the fully connected layer in the paper's architecture table.
    """

    def __init__(self, in_channels: int, num_classes: int):
        super(PoolDenseHead, self).__init__()
        self.mean = ops.ReduceMean(keep_dims=False)
        self.dense = nn.Dense(in_channels, num_classes)

    def construct(self, x):
        x = self.mean(x, (2, 3))  # average over the spatial axes
        return self.dense(x)


def mobilenetv1(num_classes: int):
    backbone = MobileNetV1()
    head = PoolDenseHead(1024, num_classes)
    model = BaseClassifier(backbone, head)

    return model
```
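As a sanity check on the backbone listed above, the per-layer cost formulas can be summed over all layers. The sketch below is a pure-Python estimate under stated assumptions: a 224x224 input, 3x3 kernels with padding 1, a 1000-class fully connected layer, and normalization and activation costs ignored. The names `BLOCKS`, `out_size`, and `mobilenet_v1_mult_adds` are hypothetical.

```python
# (output channels, depthwise stride) of the 13 depthwise separable blocks,
# read off the backbone definition above.
BLOCKS = (
    [(64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2)]
    + [(512, 1)] * 5
    + [(1024, 2), (1024, 1)]
)


def out_size(size, stride, kernel_size=3):
    """Spatial size after a convolution with padding (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


def mobilenet_v1_mult_adds(resolution=224, num_classes=1000):
    """Estimate multiply-adds of the convolution and dense layers."""
    size = out_size(resolution, stride=2)
    channels = 32
    total = 3 * 3 * 3 * channels * size * size         # Conv0: standard 3x3 conv
    for out_channels, stride in BLOCKS:
        size = out_size(size, stride)
        total += 3 * 3 * channels * size * size         # depthwise, Eq. (1)
        total += channels * out_channels * size * size  # pointwise, Eq. (2)
        channels = out_channels
    total += channels * num_classes                     # dense classifier
    return total


total_mult_adds = mobilenet_v1_mult_adds()
print(f"estimated mult-adds: {total_mult_adds / 1e6:.1f} M")
```

Under these assumptions the estimate lands close to the roughly 569 million multiply-adds reported for the baseline model in the paper [1].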

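## Model Training and Inference

This case study uses the GPU version of MindSpore and runs model training and validation on a single GPU.

First import the required modules, configure the hyperparameters, and load the dataset. The Vision suite provides ready-to-use APIs for all of this; see https://gitee.com/mindspore/vision for details.

The dataset can be downloaded from http://image-net.org/.

Define the dataset path before loading, and make sure the dataset directory has the following structure.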

```text
.ImageNet/
    ├── ILSVRC2012_devkit_t12.tar.gz
    ├── train/
    ├── val/
    └── mobilenet_infer.png
```

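### Model Training

Before training, set the loss function, optimizer, and callbacks according to the parameters given in the paper. The MindSpore Vision suite provides the corresponding interfaces, as shown in the code below.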


```python
import argparse

from mindspore import context, nn
from mindspore.common import set_seed
from mindspore.communication import init, get_rank, get_group_size
from mindspore.context import ParallelMode
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor

from mindvision.classification.dataset import ImageNet
from mindvision.engine.loss import CrossEntropySmooth

set_seed(1)


def str2bool(value):
    """Parse a boolean flag; argparse's type=bool treats every non-empty string as True."""
    return str(value).lower() in ("1", "true", "yes", "y")


def mobilenet_v1_train(args_opt):
    """MobileNetV1 train."""
    context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target)

    # Data Pipeline.
    if args_opt.run_distribute:
        init("nccl")
        rank_id = get_rank()
        device_num = get_group_size()
        context.set_auto_parallel_context(device_num=device_num,
                                          parallel_mode=ParallelMode.DATA_PARALLEL,
                                          gradients_mean=True)
        dataset = ImageNet(args_opt.data_url,
                           split="train",
                           num_parallel_workers=args_opt.num_parallel_workers,
                           shuffle=True,
                           resize=args_opt.resize,
                           num_shards=device_num,
                           shard_id=rank_id,
                           batch_size=args_opt.batch_size,
                           repeat_num=args_opt.repeat_num)
        ckpt_save_dir = args_opt.ckpt_save_dir + "_ckpt_" + str(rank_id) + "/"
    else:
        dataset = ImageNet(args_opt.data_url,
                           split="train",
                           num_parallel_workers=args_opt.num_parallel_workers,
                           shuffle=True,
                           resize=args_opt.resize,
                           batch_size=args_opt.batch_size,
                           repeat_num=args_opt.repeat_num)
        ckpt_save_dir = args_opt.ckpt_save_dir

    dataset_train = dataset.run()
    step_size = dataset_train.get_dataset_size()

    # Create model (mobilenetv1 is defined in the previous code cell).
    network = mobilenetv1(args_opt.num_classes)

    # Set lr scheduler.
    if args_opt.lr_decay_mode == 'cosine_decay_lr':
        lr = nn.cosine_decay_lr(min_lr=args_opt.min_lr, max_lr=args_opt.max_lr,
                                total_step=args_opt.epoch_size * step_size, step_per_epoch=step_size,
                                decay_epoch=args_opt.decay_epoch)
    elif args_opt.lr_decay_mode == 'piecewise_constant_lr':
        lr = nn.piecewise_constant_lr(args_opt.milestone, args_opt.learning_rates)
    else:
        raise ValueError(f"Unsupported lr_decay_mode: {args_opt.lr_decay_mode}")

    # Define optimizer.
    network_opt = nn.Momentum(network.trainable_params(), lr, args_opt.momentum)

    # Define loss function.
    network_loss = CrossEntropySmooth(sparse=True, reduction="mean", smooth_factor=args_opt.smooth_factor,
                                      classes_num=args_opt.num_classes)

    # Define metrics.
    metrics = {'acc'}

    # Set the callbacks: timing, loss logging, and one checkpoint per epoch.
    time_cb = TimeMonitor(data_size=step_size)
    loss_cb = LossMonitor()
    cb = [time_cb, loss_cb]
    ckpt_config = CheckpointConfig(
        save_checkpoint_steps=step_size,
        keep_checkpoint_max=args_opt.keep_checkpoint_max)
    ckpt_cb = ModelCheckpoint(prefix="mobilenetv1", directory=ckpt_save_dir, config=ckpt_config)
    cb += [ckpt_cb]

    # Init the model.
    model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics=metrics)

    # Begin to train.
    model.train(args_opt.epoch_size,
                dataset_train,
                callbacks=cb,
                dataset_sink_mode=args_opt.dataset_sink_mode)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='MobileNetV1 train.')
    parser.add_argument('--device_target', type=str, default="GPU", choices=["Ascend", "GPU", "CPU"])
    parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
    parser.add_argument('--epoch_size', type=int, default=200, help='Train epoch size.')
    parser.add_argument('--keep_checkpoint_max', type=int, default=10, help='Max number of checkpoint files.')
    parser.add_argument('--ckpt_save_dir', type=str, default="./mobilenet_v1", help='Location of training outputs.')
    parser.add_argument('--num_parallel_workers', type=int, default=8, help='Number of parallel workers.')
    parser.add_argument('--batch_size', type=int, default=64, help='Number of batch size.')
    parser.add_argument('--repeat_num', type=int, default=1, help='Number of repeat.')
    parser.add_argument('--num_classes', type=int, default=1001, help='Number of classification.')
    # --resize was read by the training function but never declared; 224 is an assumed default.
    parser.add_argument('--resize', type=int, default=224, help='Input image size.')
    parser.add_argument('--lr_decay_mode', type=str, default="cosine_decay_lr", help='Learning rate decay mode.')
    parser.add_argument('--min_lr', type=float, default=0.0, help='The minimum learning rate.')
    parser.add_argument('--max_lr', type=float, default=0.1, help='The maximum learning rate.')
    parser.add_argument('--decay_epoch', type=int, default=200, help='Number of decay epochs.')
    # nargs='+' parses space-separated values; type=list would split a string into characters.
    parser.add_argument('--milestone', type=int, nargs='+', default=None, help='A list of milestone.')
    parser.add_argument('--learning_rates', type=float, nargs='+', default=None, help='A list of learning rates.')
    parser.add_argument('--momentum', type=float, default=0.9, help='Momentum for the moving average.')
    parser.add_argument('--smooth_factor', type=float, default=0.1, help='Label smoothing factor.')
    parser.add_argument('--dataset_sink_mode', type=str2bool, default=True, help='The dataset sink mode.')
    parser.add_argument('--run_distribute', type=str2bool, default=True, help='Distributed parallel training.')

    args = parser.parse_known_args()[0]
    mobilenet_v1_train(args)
```

```text
epoch: 89 step: 1251, loss is 2.44095
Epoch time: 322114.519, per step time: 257.486
epoch: 90 step: 1251, loss is 2.2521682
Epoch time: 320744.265, per step time: 256.390
```
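The default learning-rate schedule in the training script is `nn.cosine_decay_lr`. The sketch below re-implements, in plain Python, the per-epoch cosine formula as we understand it from the MindSpore documentation; treat it as an approximation for building intuition rather than the library's exact code. The function name `cosine_decay_schedule` is hypothetical, and the toy sizes are chosen only to keep the output short.

```python
import math


def cosine_decay_schedule(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Per-step learning rates that follow a cosine curve, updated once per epoch."""
    schedule = []
    for step in range(total_step):
        # The learning rate changes once per epoch and stays flat after decay_epoch.
        epoch = min(step // step_per_epoch, decay_epoch)
        cosine = 1 + math.cos(math.pi * epoch / decay_epoch)
        schedule.append(min_lr + 0.5 * (max_lr - min_lr) * cosine)
    return schedule


# Toy run: 4 epochs of 3 steps each, using the script's default min_lr and max_lr.
lr_values = cosine_decay_schedule(min_lr=0.0, max_lr=0.1, total_step=12,
                                  step_per_epoch=3, decay_epoch=4)
for step, value in enumerate(lr_values):
    print(f"step {step:2d}: lr = {value:.4f}")
```

The schedule starts at `max_lr`, stays constant within each epoch, and decays smoothly toward `min_lr` as training approaches `decay_epoch`.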


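### Model Validation

Validation is similar to training. The difference is that validation needs no optimizer but does need evaluation metrics.

To load the ImageNet validation set, simply set the `split` argument of the dataset interface to "val", as shown in the code below.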

```python
import argparse

import mindspore as ms
from mindspore import context, nn
from mindspore.train import Model

from mindvision.classification.dataset import ImageNet
from mindvision.engine.loss import CrossEntropySmooth


def mobilenet_v1_eval(args_opt):
    """MobileNetV1 eval."""
    context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target)

    # Data pipeline.
    dataset_path = args_opt.data_url

    dataset = ImageNet(dataset_path,
                       split="val",
                       num_parallel_workers=args_opt.num_parallel_workers,
                       resize=args_opt.resize,
                       batch_size=args_opt.batch_size)

    dataset_eval = dataset.run()

    # Create model (mobilenetv1 is defined in an earlier code cell).
    network = mobilenetv1(args_opt.num_classes)

    # Load the trained weights into the network (a Cell), not into the Model wrapper.
    param_dict = ms.load_checkpoint(args_opt.checkpoint_path)
    ms.load_param_into_net(network, param_dict)
    network.set_train(False)

    # Define loss function.
    network_loss = CrossEntropySmooth(sparse=True, reduction="mean",
                                      smooth_factor=args_opt.smooth_factor,
                                      classes_num=args_opt.num_classes)

    # Define eval metrics.
    eval_metrics = {'Top_1_Accuracy': nn.Top1CategoricalAccuracy(),
                    'Top_5_Accuracy': nn.Top5CategoricalAccuracy()}

    # Init the model.
    model = Model(network, network_loss, metrics=eval_metrics)

    # Begin to eval
    result = model.eval(dataset_eval)
    print(result)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='MobileNetV1 eval.')
    parser.add_argument('--device_target', type=str, default="GPU", choices=["Ascend", "GPU", "CPU"])
    parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
    parser.add_argument('--checkpoint_path', required=True, default=None, help='Path of checkpoint file.')
    parser.add_argument('--num_parallel_workers', type=int, default=8, help='Number of parallel workers.')
    parser.add_argument('--batch_size', type=int, default=64, help='Number of batch size.')
    parser.add_argument('--num_classes', type=int, default=1001, help='Number of classification.')
    # --resize was read by the eval function but never declared; 224 is an assumed default.
    parser.add_argument('--resize', type=int, default=224, help='Input image size.')
    parser.add_argument('--smooth_factor', type=float, default=0.1, help='The smooth factor.')

    args = parser.parse_known_args()[0]
    mobilenet_v1_eval(args)
```

```text
{'Top_1_Accuracy': 0.71292, 'Top_5_Accuracy': 0.90112}
```
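For readers unfamiliar with the two metrics, the following pure-Python sketch shows what Top-1 and Top-k accuracy measure. It is an illustration with made-up scores, not the MindSpore metric implementation, and the function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Made-up class scores for 3 samples over 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # best class 1, runner-up 2
    [0.5, 0.1, 0.3, 0.1],  # best class 0, runner-up 2
    [0.3, 0.1, 0.5, 0.1],  # best class 2, runner-up 0
]
labels = [1, 2, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"Top-1 accuracy: {top1:.3f}")
print(f"Top-2 accuracy: {top2:.3f}")
```

Top-k accuracy can never be lower than Top-1 accuracy, which is consistent with the Top-5 figure above exceeding the Top-1 figure.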

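## Summary

This case study explained in detail the depthwise separable convolution and the model scaling method proposed in the MobileNet paper, giving the reader a complete analysis of the algorithm's core ideas.

Using the MindSpore Vision suite, it also dissected the main modules and the backbone of MobileNetV1, and demonstrated training, validation, and inference of the MobileNetV1 model on the ImageNet dataset.

## References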

[1] Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
