### About MTCNN (Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks)
* Repository: https://github.com/kpzhang93/MTCNN_face_detection_alignment
* The method is a classic, though by now a fairly old one

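* protobuf is a serialization framework developed by Google; it serializes structured data so that it can be stored or transmitted
* A caffemodel file is Caffe's format for storing a trained model: the weights, serialized in binary
* A prototxt file is Caffe's format for storing the network graph, in plain text. The graph is built from the prototxt, the matching weights are filled in from the caffemodel, and the network can then run inference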

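# Caveats
1. The model was trained in MATLAB, so it expects images in column-major layout (i.e. the image must be transposed), and the cellx and celly coordinates must likewise be swapped
2. The image channel order is RGB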
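
The two caveats above can be sketched in pure Python. This is only a minimal sketch under stated assumptions, not the original code: `to_matlab_layout` and `cell_index_to_xy` are hypothetical helpers, the image is assumed to be a nested list of rows in H x W x C order with BGR pixels (as OpenCV would load it), and the network output is assumed to be indexed as `[row][col]`.

```python
def to_matlab_layout(image_hwc_bgr):
    """Transpose H and W and reverse BGR to RGB (hypothetical helper).

    image_hwc_bgr: list of rows, each row a list of [b, g, r] pixels.
    Returns a list of W columns, each holding H [r, g, b] pixels.
    """
    # zip(*rows) regroups the pixels column by column, i.e. transposes H and W
    return [[list(pixel[::-1]) for pixel in column]
            for column in zip(*image_hwc_bgr)]


def cell_index_to_xy(row, col, transposed=True):
    """Turn an output-map index into (x, y) in the original image orientation.

    With a transposed input, the row index runs along the original x axis,
    so x and y trade places compared with the usual (x=col, y=row) reading.
    """
    return (row, col) if transposed else (col, row)


# 2 x 3 image (H=2, W=3); each pixel is [b, g, r]
image = [
    [[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    [[30, 31, 32], [40, 41, 42], [50, 51, 52]],
]
transposed = to_matlab_layout(image)
print(len(transposed), len(transposed[0]))  # 3 2
print(transposed[2][1])                     # [52, 51, 50]
```

With NumPy arrays the same two steps would normally be a transpose of the first two axes and a reversal of the channel axis; nested lists are used here only to keep the sketch dependency-free.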

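# Process
1. Data layout and channel order
    - Layouts include BCHW, BHWC, HWC, and others
        - The layout describes the order in which the dimensions are stored in memory
    - Channel orders include BGR, RGB, and others
        - The channel order describes how the channels are arranged along the C dimension when C is 3
2. NMS, non-maximum suppression
    - When many overlapping candidates describe the same object, keeping only the local maximum and discarding the rest is called non-maximum suppression. A typical use is filtering the boxes produced by a detector
    - ![](nms.png)
3. MaxPooling and PReLU
    - Max pooling
    - PReLU is an improved ReLU: in the negative region it multiplies the input by a coefficient instead of setting it to 0, and the coefficient is a learnable parameter
4. FCN, fully convolutional networks, and why convolution acts as a sliding window
    - For a fully convolutional network, the size of the output feature map changes with the size of the input image, which is why it can be used to implement a sliding window
5. Implement PNet inference
6. Implement CropResizeTo
    - Using warpAffine
    - Using coordinate computation and resize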
* Reproduce it
* [Netron, a network visualization tool](https://github.com/lutzroeder/Netron)
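
As a starting point for reproducing PNet inference, the sliding-window view of a fully convolutional network can be made concrete: each cell of the output map corresponds to one fixed-size window of the input. The sketch below assumes the usual PNet layout (three 3x3 convolutions with stride 1 and no padding, plus one 2x2 max pooling with stride 2 after the first convolution, rounded up as Caffe does). Only the kernel size and stride of conv1 are visible in the dump in this notebook, so the remaining layers, the constants, and both function names are assumptions. It also ignores the MATLAB transposition mentioned in the caveats.

```python
import math

CELL_SIZE = 12  # assumed window size: a 12x12 input yields a 1x1 output
STRIDE = 2      # assumed overall stride, caused by the single 2x2 pooling


def pnet_output_size(size):
    """Output length along one axis for an input of `size` pixels."""
    size = size - 2                       # conv 3x3, stride 1, no padding
    size = math.ceil((size - 2) / 2) + 1  # max pool 2x2, stride 2, rounded up
    size = size - 2                       # conv 3x3
    size = size - 2                       # conv 3x3
    return size


def cell_to_window(row, col, scale=1.0):
    """Window (x1, y1, x2, y2) in the original image for output cell (row, col).

    `scale` is the factor by which the image was resized before inference.
    """
    x1 = col * STRIDE / scale
    y1 = row * STRIDE / scale
    x2 = (col * STRIDE + CELL_SIZE) / scale
    y2 = (row * STRIDE + CELL_SIZE) / scale
    return x1, y1, x2, y2


print(pnet_output_size(12))             # 1
print(pnet_output_size(24))             # 7
print(cell_to_window(0, 0))             # (0.0, 0.0, 12.0, 12.0)
print(cell_to_window(3, 5, scale=0.5))  # (20.0, 12.0, 44.0, 36.0)
```

Running the network once on a 24x24 image therefore scores 7 x 7 windows of 12x12 pixels in a single pass, which is the same result a hand-written sliding window with step 2 would give.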

In [1]:
import mtcnn.caffe_pb2 as pb

# A .caffemodel file is a binary-serialized NetParameter message
net = pb.NetParameter()
with open("mtcnn/det1.caffemodel", "rb") as f:
    net.ParseFromString(f.read())

In [2]:
# Weights of the layer at index 4 (conv1), stored as a flat list of floats
net.layer[4].blobs[0].data

[-0.08164715766906738, -0.015041245147585869, 0.12457386404275894, 0.8359154462814331, 0.23798520863056183, -0.41081759333610535, 1.246496558189392, -1.3452640771865845, -0.5807721018791199, -0.2663098871707916, 0.29752546548843384, 0.10301049053668976, 0.8521553874015808, 0.32018500566482544, -0.33211761713027954, 0.5174388885498047, -1.0906494855880737, -0.49633175134658813, -0.13688956201076508, 0.10161298513412476, -0.17933745682239532, 0.016411680728197098, 0.4359000027179718, -0.10341845452785492, 0.03367944434285164, -0.21637864410877228, 0.05970483273267746, 0.48790067434310913, 0.32513341307640076, 0.7851399779319763, -0.8748639822006226, -0.7899603247642517, 0.2713397145271301, -2.402355670928955, -0.7243988513946533, 0.2544996440410614, -0.49456819891929626, -0.13129429519176483, -0.2500744163990021, 0.27103665471076965, -0.31626617908477783, -0.1839706003665924, 1.1370490789413452, 0.5234616994857788, 0.6913951635360718, -0.29579049348831177, 0.2303909808397293, -0.60144764
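
The values above come back as one flat list. Caffe stores convolution weights in (num_output, channels, kernel_h, kernel_w) order, so the list can be folded back into that shape. The sketch below is pure Python and uses synthetic numbers in place of the real weights; `reshape_flat` is a hypothetical helper, and the shape (10, 3, 3, 3) is an assumption for conv1 (10 outputs and a 3x3 kernel appear in the network dump, while 3 input channels are assumed for an RGB input).

```python
def reshape_flat(values, dims):
    """Fold a flat, row-major list into nested lists with the given dims."""
    values = list(values)
    expected = 1
    for dim in dims:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    # Group from the innermost dimension outward
    for dim in reversed(dims[1:]):
        values = [values[i:i + dim] for i in range(0, len(values), dim)]
    return values


# Synthetic stand-in for net.layer[4].blobs[0].data
flat = [float(i) for i in range(10 * 3 * 3 * 3)]
weights = reshape_flat(flat, (10, 3, 3, 3))
print(len(weights), len(weights[0]), len(weights[0][0]), len(weights[0][0][0]))
# 10 3 3 3
print(weights[1][2][0][1])  # 46.0
```

Depending on how the model was saved, the actual dimensions may also be recorded in the blob itself, in which case they should be preferred over a hard-coded shape.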

In [12]:
# Drop the weight blobs so that only the network structure is printed
for layer in net.layer:
    layer.ClearField("blobs")

net

name: "12Net"
layer {
  name: "data12"
  type: "HDF5Data"
  top: "label12face"
  top: "data12face"
  include {
    phase: TRAIN
  }
  phase: TRAIN
  hdf5_data_param {
    source: "/home_sdb/kpzhang/WIDER07/data12-face4/train-12rgb.txt"
    batch_size: 2000
  }
}
layer {
  name: "slicer_label"
  type: "Slice"
  bottom: "label12face"
  top: "label1"
  top: "label2"
  top: "label3"
  phase: TRAIN
  slice_param {
    slice_point: 1
    slice_point: 5
    axis: 1
  }
}
layer {
  name: "label1_slicer_label_0_split"
  type: "Split"
  bottom: "label1"
  top: "label1_slicer_label_0_split_0"
  top: "label1_slicer_label_0_split_1"
  phase: TRAIN
}
layer {
  name: "silence"
  type: "Silence"
  bottom: "label3"
  phase: TRAIN
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data12face"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  phase: TRAIN
  convolution_param {
    num_output: 10
    kernel_size: 3
    stride: 1
