# Model Introduction

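This is a Deformable DETR solution for the third round of the Paper Reproduction Challenge, reaching an mAP of 0.446. One highlight of the project is its exploration of implementing the deformable transformer as a custom operator in PaddlePaddle C++.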

**DEFORMABLE TRANSFORMERS FOR END-TO-END OBJECT DETECTION**

Code: [https://github.com/fundamentalvision/Deformable-DETR](https://github.com/fundamentalvision/Deformable-DETR)

Paper: [https://arxiv.org/pdf/2010.04159.pdf](https://arxiv.org/pdf/2010.04159.pdf)

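- Builds on DETR and optimizes it. DETR needs a very long training schedule (10-20x slower to converge than Faster R-CNN) and performs relatively poorly on small objects.
- DETR attends over every position of the feature map, whereas Deformable DETR attends to sampled features: each query searches for a few informative positions and uses them as keys.
- It changes how keys are obtained and how the attention (contribution) map is generated: the attention weights are regressed directly from the query feature instead of being computed from query-key dot products.
- The backbone can be ResNeXt-101 with DCN plus extra tricks, which improves small-object detection.
- At initialization, DETR's attention weights are spread almost uniformly over all pixels, so long training is needed before they focus on object locations. This paper therefore no longer starts the encoder from a uniform attention distribution.
- Deformable convolution is an efficient way to focus on sparse spatial locations. It adds learnable offsets to the convolution's sampling positions; the offsets are learned from the data, so each sampling point of the kernel can shift. This changes the receptive field into an irregular (polygonal) shape that adapts to both scale and rotation changes.
- Deformable Attention Module: the query feature is projected to 3MK channels (M heads, K sampling points). The first 2MK channels encode the sampling offsets, which decide which keys the query is matched with; the remaining MK channels go through a softmax to give the attention weights of those keys.
- The deformable attention module is extended to multi-scale feature maps, mainly to address small objects. Instead of sampling K keys on a single level, each query samples K points on every level (LK points for L levels), which fuses features across levels, so no FPN is needed.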
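The offset/weight split described above can be sketched in NumPy for a single query. This is a minimal illustration, not the project's code: `deformable_attn_params` is a hypothetical name, the projection weights are random placeholders, and one reference point is shared by all levels for simplicity. Following the official PyTorch code as we understand it, the 3MLK-channel projection is written as two linear layers (sampling offsets and attention weights), which is equivalent to one projection cut into two pieces. Offsets are in pixels of each level and are divided by that level's (W, H) so the sampling locations stay in normalized [0, 1] coordinates.

```python
import numpy as np


def deformable_attn_params(query, w_off, b_off, w_attn, b_attn,
                           reference_point, spatial_shapes, n_heads, n_points):
    """Sampling locations and attention weights for ONE query (hypothetical helper).

    query:           (d_model,) query feature z_q
    w_off, b_off:    (d_model, M*L*K*2), (M*L*K*2,) -> sampling offsets in pixels
    w_attn, b_attn:  (d_model, M*L*K), (M*L*K,)     -> attention logits
    reference_point: (2,) normalized (x, y) in [0, 1], shared by all levels here
    spatial_shapes:  list of (H_l, W_l) for the L feature levels
    returns: locations (M, L, K, 2) and weights (M, L, K)
    """
    n_levels = len(spatial_shapes)
    # the "2MK" part: one (dx, dy) pixel offset per head, level and sampling point
    offsets = (query @ w_off + b_off).reshape(n_heads, n_levels, n_points, 2)
    # the "MK" part: softmax over all L*K samples of each head
    logits = (query @ w_attn + b_attn).reshape(n_heads, n_levels * n_points)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = (weights / weights.sum(axis=1, keepdims=True)).reshape(n_heads, n_levels, n_points)
    # divide pixel offsets by each level's (W, H) to stay in normalized coordinates
    normalizer = np.array([[w, h] for h, w in spatial_shapes], dtype=np.float64)  # (L, 2)
    locations = reference_point[None, None, None, :] + offsets / normalizer[None, :, None, :]
    return locations, weights


# Usage with random placeholder weights: 2 heads, 3 points, 2 levels
rng = np.random.default_rng(0)
d_model, M, K = 8, 2, 3
shapes = [(4, 6), (2, 3)]
L = len(shapes)
q = rng.standard_normal(d_model)
w_off, b_off = np.zeros((d_model, M * L * K * 2)), np.zeros(M * L * K * 2)
w_attn, b_attn = rng.standard_normal((d_model, M * L * K)), np.zeros(M * L * K)
loc, w = deformable_attn_params(q, w_off, b_off, w_attn, b_attn,
                                np.array([0.25, 0.75]), shapes, M, K)
```

With zero offset weights every sampling point sits on the reference point; a learned offset of one pixel in x on a level with W=6 moves the location by 1/6.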

# About the COCO2017 Dataset

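COCO stands for Common Objects in Context, an image recognition dataset provided by a Microsoft team. The images in MS COCO are split into training, validation, and test sets. Its standing in the field needs no introduction, so this section just summarizes what the dataset contains. The figure below shows the downloadable datasets listed on the official website (updated 2020-01-09); as it shows, the data consists mainly of annotated and unannotated images.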



![](https://ai-studio-static-online.cdn.bcebos.com/fc21793a8cd6402a9c9d8b9e630fc06d6696c4fc98504c2b982840445aef44a3)

```
# Unzip the dataset
%cd /home/aistudio/data/data7122/
!unzip train2017.zip
!unzip val2017.zip
!unzip annotations_trainval2017.zip
```

```
# Load the dataset
%cd ~/my_detr
!python coco_dataset.py
```
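`coco_dataset.py` is not shown here. For orientation, DETR-style pipelines convert COCO's `[x, y, w, h]` pixel boxes to `[x1, y1, x2, y2]`, clamp them to the image, drop degenerate boxes, and finally express them as normalized `[cx, cy, w, h]`, the format the box loss and the matcher expect. The sketch below assumes the author's loader follows the same convention; `coco_to_detr_boxes` is a hypothetical name.

```python
import numpy as np


def coco_to_detr_boxes(boxes_xywh, img_w, img_h):
    """COCO [x, y, w, h] pixel boxes -> normalized [cx, cy, w, h] (hypothetical helper).

    Returns the converted boxes and a boolean mask of which input boxes were kept.
    """
    boxes = np.asarray(boxes_xywh, dtype=np.float32).reshape(-1, 4).copy()
    boxes[:, 2:] += boxes[:, :2]                       # xywh -> xyxy
    boxes[:, 0::2] = boxes[:, 0::2].clip(0, img_w)     # clamp x to the image
    boxes[:, 1::2] = boxes[:, 1::2].clip(0, img_h)     # clamp y to the image
    keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])  # drop empty boxes
    boxes = boxes[keep]
    cxcy = (boxes[:, :2] + boxes[:, 2:]) / 2
    wh = boxes[:, 2:] - boxes[:, :2]
    scale = np.array([img_w, img_h, img_w, img_h], dtype=np.float32)
    return np.concatenate([cxcy, wh], axis=1) / scale, keep


# Example: one valid box and one zero-width box in a 100x200 (w x h) image
boxes, keep = coco_to_detr_boxes([[10, 20, 30, 40], [5, 5, 0, 10]], img_w=100, img_h=200)
```

Normalizing by the image size makes the targets independent of later resizing, which is why the matcher and L1 loss can compare them directly with the sigmoid box outputs.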

# Model Structure

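1. Backbone: modified from ResNet-50, with support for the DC5 variant
2. Build the transformer
3. Implement the Hungarian matching algorithm (assigns predictions to ground-truth objects)
4. Loss function
5. Post-processing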
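Item 3, Hungarian matching, assigns each ground-truth object to exactly one query by minimizing a combined cost. Below is a NumPy/SciPy sketch for a single image. It assumes `scipy.optimize.linear_sum_assignment` as the solver (as the official DETR code uses), a focal-style class cost with alpha=0.25 and gamma=2, plus L1 and GIoU box costs; the cost weights 2/5/2 are assumed defaults, and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def box_cxcywh_to_xyxy(b):
    cx, cy, w, h = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)


def generalized_iou(a, b):
    """Pairwise GIoU between xyxy boxes a (N, 4) and b (M, 4) -> (N, M)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    union = area_a[:, None] + area_b[None, :] - inter
    iou = inter / union
    # smallest enclosing box
    wh_c = np.maximum(a[:, None, 2:], b[None, :, 2:]) - np.minimum(a[:, None, :2], b[None, :, :2])
    area_c = wh_c[..., 0] * wh_c[..., 1]
    return iou - (area_c - union) / area_c


def hungarian_match(pred_logits, pred_boxes, tgt_labels, tgt_boxes,
                    cost_class=2.0, cost_bbox=5.0, cost_giou=2.0, alpha=0.25, gamma=2.0):
    """pred_logits (Q, C), pred_boxes (Q, 4) cxcywh; tgt_labels (T,), tgt_boxes (T, 4) cxcywh.
    Returns (pred_idx, tgt_idx) of the minimum-cost one-to-one assignment."""
    prob = 1.0 / (1.0 + np.exp(-pred_logits))  # per-class sigmoid
    neg = (1 - alpha) * prob ** gamma * -np.log(1 - prob + 1e-8)
    pos = alpha * (1 - prob) ** gamma * -np.log(prob + 1e-8)
    c_class = (pos - neg)[:, tgt_labels]                                       # (Q, T)
    c_bbox = np.abs(pred_boxes[:, None, :] - tgt_boxes[None, :, :]).sum(-1)    # L1, (Q, T)
    c_giou = -generalized_iou(box_cxcywh_to_xyxy(pred_boxes), box_cxcywh_to_xyxy(tgt_boxes))
    cost = cost_bbox * c_bbox + cost_class * c_class + cost_giou * c_giou
    return linear_sum_assignment(cost)
```

Queries left unmatched (here Q > T) are trained towards "no object"; only matched pairs contribute to the box losses.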


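**The core code consists mainly of:**
* model.py
* position_encoding.py, only the sine encoding is used for now
* resnet.py, dilation is not used for now, and the returned layers differ from the original
* backbone.py
* deforamble_transformer.py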
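Since `position_encoding.py` only uses the sine variant, here is a NumPy sketch of DETR-style sine position encoding (`normalize=True`, scale 2π, temperature 10000). It assumes no padding mask, so every pixel is valid, and omits other details of the official implementation; `sine_position_encoding` is a hypothetical name. With `hidden_dim=256`, each axis gets 128 channels.

```python
import numpy as np


def sine_position_encoding(h, w, num_pos_feats=128, temperature=10000.0, normalize=True):
    """Return a (2 * num_pos_feats, h, w) sine/cosine map: y channels first, then x."""
    # 1-based row / column indices, i.e. the cumsum of an all-valid mask
    y_embed = np.repeat(np.arange(1, h + 1, dtype=np.float64)[:, None], w, axis=1)
    x_embed = np.repeat(np.arange(1, w + 1, dtype=np.float64)[None, :], h, axis=0)
    if normalize:
        eps, scale = 1e-6, 2 * np.pi
        y_embed = y_embed / (y_embed[-1:, :] + eps) * scale
        x_embed = x_embed / (x_embed[:, -1:] + eps) * scale
    # channel pairs (2i, 2i+1) share the frequency temperature ** (2i / num_pos_feats)
    dim_t = np.arange(num_pos_feats, dtype=np.float64)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
    pos_x = x_embed[:, :, None] / dim_t
    pos_y = y_embed[:, :, None] / dim_t
    # sin on even channels, cos on odd channels, interleaved
    pos_x = np.stack((np.sin(pos_x[:, :, 0::2]), np.cos(pos_x[:, :, 1::2])), axis=3).reshape(h, w, -1)
    pos_y = np.stack((np.sin(pos_y[:, :, 0::2]), np.cos(pos_y[:, :, 1::2])), axis=3).reshape(h, w, -1)
    return np.concatenate((pos_y, pos_x), axis=2).transpose(2, 0, 1)


pos = sine_position_encoding(5, 7)  # shape (256, 5, 7)
```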

![](https://ai-studio-static-online.cdn.bcebos.com/7542cd01503e46d1a80a33fbbd746d2d3d6c9fbc37ee44e7899c709b3701c1e2)

## Running the PyTorch Code Locally

`./configs/r50_deformable_detr.sh --eval`

The results match those published by the original authors.

![](https://ai-studio-static-online.cdn.bcebos.com/536cbb7fdd2b4f1896918c561184cf67502b4e0fce9c455993d0d4bbbe24e81a)



## Converting the Torch Code to Paddle with x2paddle

```
pip install x2paddle --index https://pypi.python.org/simple/
x2paddle --convert_torch_project --project_dir=Deformable-DETR --save_dir=paddle_project
```

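**Many packages and operators turned out to be unsupported, so this approach was set aside for now and the model was built manually instead.**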

![](https://ai-studio-static-online.cdn.bcebos.com/87d4a80c2bd54b538eda17523e39e3e50eb8414930544145a13a075879787fc7)



# Adding a Custom Operator

See the official documentation: [Custom External Operators](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op_cn.html)

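Following the PyTorch implementation of the operator, the main work is writing the forward and backward computation logic. The interface wrapper must strictly follow the official documentation, e.g. the Attribute declarations and the definitions of the returned tensors' shape and dtype.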

The operator can either be installed into the pip environment via setup, or compiled just-in-time (JIT) at runtime; this project uses the latter.

**The following command feeds the same mock inputs used on the torch side into Paddle for verification; the forward and backward outputs match.**

```
%cd ~/my_df_detr/ops
!python ms_deform_attn_test.py
```

```
/home/aistudio/my_df_detr/ops
Compiling user custom op, it will cost a few seconds.....
2021-05-28 14:35:58,059 - INFO - Re-Compiling custom_jit_ops.so, because specified cflags have been changed. New signature c362299a2c5efc8bf87b48534e812379 has been saved into /home/aistudio/.cache/paddle_extensions/custom_jit_ops/version.txt.
value type <class 'paddle.VarBase'>
shapes type <class 'paddle.VarBase'>
level_start_index type <class 'paddle.VarBase'>
sampling_locations type <class 'paddle.VarBase'>
attention_weights type <class 'paddle.VarBase'>
im2col_step type <class 'int'>
W0528 14:36:09.28
```
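For reference, the computation the custom op's forward pass performs can be written as a slow but readable NumPy loop, modeled on the pure-PyTorch fallback `ms_deform_attn_core_pytorch` in the official repository (which samples with `F.grid_sample`, `align_corners=False`, zero padding). A reference like this is handy as a third opinion when comparing the Paddle op with torch. It is a sketch, not the project's test code, and the function names are hypothetical.

```python
import numpy as np


def bilinear_zero_pad(feat, x, y):
    """Bilinearly sample feat (H, W, D) at pixel coords (x, y); outside pixels count as 0."""
    H, W, D = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    out = np.zeros(D, dtype=feat.dtype)
    corners = ((y0, x0, (1 - dx) * (1 - dy)), (y0, x0 + 1, dx * (1 - dy)),
               (y0 + 1, x0, (1 - dx) * dy), (y0 + 1, x0 + 1, dx * dy))
    for yy, xx, wgt in corners:
        if 0 <= yy < H and 0 <= xx < W:
            out += wgt * feat[yy, xx]
    return out


def ms_deform_attn_reference(value, spatial_shapes, level_start_index,
                             sampling_locations, attention_weights):
    """Loop version of multi-scale deformable attention.

    value:              (N, S, M, D) with S = sum_l H_l * W_l
    spatial_shapes:     (L, 2) rows of (H_l, W_l)
    level_start_index:  (L,) offset of each level inside S
    sampling_locations: (N, Lq, M, L, P, 2) normalized (x, y) in [0, 1]
    attention_weights:  (N, Lq, M, L, P)
    returns:            (N, Lq, M * D)
    """
    N, S, M, D = value.shape
    _, Lq, _, L, P, _ = sampling_locations.shape
    out = np.zeros((N, Lq, M, D), dtype=value.dtype)
    for n in range(N):
        for l in range(L):
            H, W = spatial_shapes[l]
            start = level_start_index[l]
            feat = value[n, start:start + H * W].reshape(H, W, M, D)
            for q in range(Lq):
                for m in range(M):
                    for p in range(P):
                        lx, ly = sampling_locations[n, q, m, l, p]
                        # pixel centers sit at (i + 0.5) / W, as with align_corners=False
                        v = bilinear_zero_pad(feat[:, :, m], lx * W - 0.5, ly * H - 0.5)
                        out[n, q, m] += attention_weights[n, q, m, l, p] * v
    return out.reshape(N, Lq, M * D)
```

Sampling exactly at a pixel center returns that pixel's value, and locations far outside the map contribute zero, which makes a few hand-checkable cases easy to build.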

# Building the Model

The core modules include the backbone, the transformer, the position encoding, and the Hungarian matcher (HungarianMatcher). Run the following command to print the network structure:

```
%cd ~/my_df_detr
!python model.py
```

```
/home/aistudio/my_df_detr
Compiling user custom op, it will cost a few seconds.....
2021-05-28 14:41:18,674 - INFO - Re-Compiling custom_jit_ops.so, because specified cflags have been changed. New signature cae118c821a2806df1b811169cc15250 has been saved into /home/aistudio/.cache/paddle_extensions/custom_jit_ops/version.txt.
Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='/f/dataset/COCO2017', dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', di
```

# Weight Conversion

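Converting the PyTorch weights to Paddle requires attention to a few points:
- Split the fused QKV weights of multi-head attention into separate q/k/v projections
- Rename the BN layer parameters
- Transpose the weights of fully connected (Linear) layers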

```
import argparse
import pickle
import random

import numpy as np
import torch

# Helpers from the official Deformable-DETR repository;
# run this script from the root of that repository.
import util.misc as utils
from main import get_args_parser
from models import build_model

# Paddle's BatchNorm names its running statistics `_mean` / `_variance`
BN_SUFFIX = {
    "weight": "weight",
    "bias": "bias",
    "running_mean": "_mean",
    "running_var": "_variance",
}


def rename_bn(name_list, v, renamed_state_dict):
    """Map one torch BN tensor to the Paddle name used by this project."""
    ender = BN_SUFFIX.get(name_list[-1])
    if ender is None:
        # e.g. `num_batches_tracked`, which has no Paddle counterpart
        return
    new_k = name_list[:-1] + ["module", ender]
    renamed_state_dict['.'.join(new_k)] = v.numpy()


def split_qkv(name_list, v, renamed_state_dict):
    """Split nn.MultiheadAttention's fused in_proj tensor into q/k/v projections."""
    is_bias = name_list[-1][-4:] == "bias"
    if is_bias:
        parts = np.split(v.numpy(), 3)
    else:
        # torch stores in_proj_weight as (3E, E); Paddle Linear expects (in, out),
        # so transpose to (E, 3E) first and then split along the output axis
        parts = np.split(v.numpy().transpose((1, 0)), 3, axis=1)
    suffix = "bias" if is_bias else "weight"
    for proj, part in zip(("q_proj", "k_proj", "v_proj"), parts):
        renamed_state_dict['.'.join(name_list[:-1] + [proj, suffix])] = part


def convert_param_dict(model_dict, trans_weights):
    renamed_state_dict = {}
    for k, v in model_dict.items():
        name_list = k.split('.')
        # torch nn.Linear weights are (out, in); Paddle's are (in, out)
        if k in trans_weights:
            renamed_state_dict[k] = v.numpy().transpose((1, 0))
            continue

        if len(name_list) > 2 and name_list[-2][:2] == 'bn':
            rename_bn(name_list, v, renamed_state_dict)

        elif len(name_list) > 2 and name_list[-2][:6] == 'linear' and name_list[-1] == "weight":
            renamed_state_dict[k] = v.numpy().transpose((1, 0))

        # *_embed weights are transposed, except those starting with 'q' (query_embed is an nn.Embedding)
        elif len(name_list) >= 2 and name_list[0][-5:] == "embed" and name_list[0][0] != 'q' \
                and name_list[-1] == "weight":
            renamed_state_dict[k] = v.numpy().transpose((1, 0))

        elif len(name_list) > 2 and name_list[-2] in ('self_attn', 'multihead_attn'):
            split_qkv(name_list, v, renamed_state_dict)

        # parameters of sub-layers inside attention modules (e.g. out_proj)
        elif len(name_list) > 2 and name_list[-3] in ('self_attn', 'multihead_attn'):
            if name_list[-1][-4:] == "bias":
                renamed_state_dict[k] = v.numpy()
            else:
                renamed_state_dict[k] = v.numpy().transpose((1, 0))

        # downsample.1 is the BN layer on the ResNet shortcut branch
        elif len(name_list) > 3 and name_list[-3] == 'downsample' and name_list[-2] == '1':
            rename_bn(name_list, v, renamed_state_dict)

        else:
            renamed_state_dict[k] = v.numpy()

    return renamed_state_dict


args = argparse.ArgumentParser('Deformable DETR Detector', parents=[get_args_parser()]).parse_args()

device = torch.device(args.device)
seed = args.seed + utils.get_rank()
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
model, criterion, postprocessors = build_model(args)
model.to(device)

if args.resume:
    if args.resume.startswith('https'):
        checkpoint = torch.hub.load_state_dict_from_url(
            args.resume, map_location='cpu', check_hash=True)
    else:
        checkpoint = torch.load(args.resume, map_location='cpu')
    model.load_state_dict(checkpoint['model'])

# Names of all nn.Linear weights; these need transposing
trans_weights = [name + ".weight" for name, layer in model.named_modules()
                 if isinstance(layer, torch.nn.Linear)]

# .numpy() only works on CPU tensors, so move the state dict off the GPU first
cpu_state_dict = {k: v.cpu() for k, v in model.state_dict().items()}
model_state_dict = convert_param_dict(cpu_state_dict, trans_weights)
with open("df_wts_dc5.pkl", "wb") as f:
    pickle.dump(model_state_dict, f)
```
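To see why the QKV branch transposes before splitting along axis 1: torch's `nn.MultiheadAttention` stores `in_proj_weight` as a (3E, E) matrix with the q, k, v rows stacked and computes `x @ W.T`, while `paddle.nn.Linear` stores its weight as (in_features, out_features) and computes `x @ W`. The small NumPy check below (random data, hypothetical helper name) confirms that transposing first and then splitting along the output axis reproduces the torch projections.

```python
import numpy as np


def split_in_proj_weight(in_proj_weight):
    """torch (3E, E) fused weight -> three Paddle-style (E, E) (in, out) weights."""
    return np.split(in_proj_weight.transpose((1, 0)), 3, axis=1)


E = 4
rng = np.random.default_rng(0)
in_proj_weight = rng.standard_normal((3 * E, E))  # torch layout: rows stacked as [W_q; W_k; W_v]
x = rng.standard_normal((2, E))

q_w, k_w, v_w = split_in_proj_weight(in_proj_weight)
# torch: q = x @ W_q.T (weight stored as (out, in)); Paddle Linear: q = x @ W (stored as (in, out))
assert np.allclose(x @ in_proj_weight[:E].T, x @ q_w)
assert np.allclose(x @ in_proj_weight[E:2 * E].T, x @ k_w)
assert np.allclose(x @ in_proj_weight[2 * E:].T, x @ v_w)
```

Splitting the untransposed (3E, E) matrix along axis 0 and transposing each piece afterwards would give the same result; the code above simply does the transpose once.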

# Validating with the Pretrained Model

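Evaluating the model with the converted weights roughly matches the torch accuracy: **44.6, slightly higher than the 44.5 reported by the original authors.**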

```
%cd ~/my_df_detr
!python train_val.py --eval
```

```
/home/aistudio/my_df_detr
Compiling user custom op, it will cost a few seconds.....
2021-05-28 14:55:53,645 - INFO - Re-Compiling custom_jit_ops.so, because specified cflags have been changed. New signature c362299a2c5efc8bf87b48534e812379 has been saved into /home/aistudio/.cache/paddle_extensions/custom_jit_ops/version.txt.
Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_p
```

# Aligning Data Preprocessing

After preprocessing a sample image, the data fed into the network is essentially identical in torch and Paddle.

**Torch output:**

![](https://ai-studio-static-online.cdn.bcebos.com/4a6a0aa74eec42d098020f3abc053101bb48cf86036f4ce48fd9bc5265d045f0)


**Paddle output:**

```
%cd ~/my_df_detr
!python check_image_pre.py
```

```
/home/aistudio/my_df_detr
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
W0528 15:16:05.932473  6457 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0528 15:16:05.937321  6457 device_context.cc:372] device: 0, cuDNN Version: 7.6.
[Tensor(shape=[3, 800, 1199], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
       [[[ 0.00556555, -0.01155920, -0.04580871, ..., -1.96378114, -1.92953163, -1.91240688],
         [-0.04580871, -0.0458
```
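The 3×800×1199 tensor above presumably comes from DETR-style evaluation preprocessing: resize so the shorter side is 800 while capping the longer side at 1333, then scale to [0, 1] and standardize with the ImageNet mean/std. The sketch below mirrors the size rule of `get_size_with_aspect_ratio` in DETR's `datasets/transforms.py` as we recall it; `normalize` and the constants are our own illustrative names. Matching this size rule exactly (including the `int()` truncation) is one of the easiest places for the two frameworks to drift by a pixel.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def get_size_with_aspect_ratio(image_size, size, max_size=None):
    """image_size is (w, h); returns the resized (h, w)."""
    w, h = image_size
    if max_size is not None:
        min_original_size = float(min(w, h))
        max_original_size = float(max(w, h))
        # shrink the target short side if the long side would exceed max_size
        if max_original_size / min_original_size * size > max_size:
            size = int(round(max_size * min_original_size / max_original_size))
    if (w <= h and w == size) or (h <= w and h == size):
        return h, w
    if w < h:
        ow, oh = size, int(size * h / w)
    else:
        oh, ow = size, int(size * w / h)
    return oh, ow


def normalize(img_hwc_uint8):
    """HWC uint8 image -> CHW float32, scaled to [0, 1] then standardized per channel."""
    img = img_hwc_uint8.astype(np.float32) / 255.0
    img = (img - IMAGENET_MEAN) / IMAGENET_STD
    return img.transpose(2, 0, 1)


print(get_size_with_aspect_ratio((640, 427), 800, max_size=1333))  # (800, 1199)
```

For example, a 640×427 (w×h) image maps to 800×1199, a 640×480 image to 800×1066, and a very wide 2000×500 image is capped to 333×1332.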

# Aligning the Forward Output

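Feed a simulated 1×3×800×1199 input (all ones) into both torch and Paddle and compare the outputs. They were compared in a debugger and are essentially identical; `pred_logits` is shown here as an example.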

**Torch output:**

![](https://ai-studio-static-online.cdn.bcebos.com/9ad659fb46db498eb958359bd70104eb39636bbf2c51444c84ed7d122bdbae67)


```
%cd ~/my_df_detr
!python model.py
```

```
/home/aistudio/my_df_detr
Compiling user custom op, it will cost a few seconds.....
Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='/home/aistudio/data/data7122', dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, dropout=0.1, enc_layers=6, enc_n_points=4, epochs=5, eval=True, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0002, lr_backbone=2e-05, lr_backbone_names=['backbo
```
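"Essentially identical" becomes easier to track if each comparison is reduced to a few numbers. A minimal sketch, assuming each framework dumps the same output (e.g. `pred_logits`) to a NumPy array; `compare_outputs` and its tolerances are our own choices, not part of either framework.

```python
import numpy as np


def compare_outputs(ref, out, atol=1e-5, rtol=1e-4):
    """Summarize how far `out` (e.g. Paddle) deviates from `ref` (e.g. torch)."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    abs_diff = np.abs(ref - out)
    rel_diff = abs_diff / (np.abs(ref) + 1e-12)  # guard against division by zero
    return {
        "max_abs": float(abs_diff.max()),
        "max_rel": float(rel_diff.max()),
        "allclose": bool(np.allclose(out, ref, atol=atol, rtol=rtol)),
    }
```

In practice each side would save its tensor with `np.save` (file names up to you), and both files are loaded with `np.load` before calling `compare_outputs`. Tracking `max_abs` per layer also makes it easier to find where two implementations start to diverge.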

# Aligning Loss and Post-processing

For the same model input, the results computed in torch and in Paddle are essentially identical.

**Torch output:**

![](https://ai-studio-static-online.cdn.bcebos.com/bbfca0aab066400cb56c070e8ac70b8f6c77793327584692bffcb6aafa1cffef)

**Paddle output:**


```
%cd ~/my_df_detr
!python model.py
```

```
/home/aistudio/my_df_detr
Compiling user custom op, it will cost a few seconds.....
Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='/home/aistudio/data/data7122', dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, dropout=0.1, enc_layers=6, enc_n_points=4, epochs=5, eval=True, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0002, lr_backbone=2e-05, lr_backbone_names=['backbo
```
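For the post-processing side, Deformable DETR (as we understand its `PostProcess`) applies a per-class sigmoid, takes the top-k (query, class) pairs over all queries jointly, and converts the selected boxes from normalized cxcywh to pixel xyxy. A single-image NumPy sketch under that assumption; `post_process` and `num_select=100` are illustrative.

```python
import numpy as np


def post_process(pred_logits, pred_boxes, img_h, img_w, num_select=100):
    """pred_logits (Q, C) raw logits; pred_boxes (Q, 4) normalized cxcywh.
    Returns scores, labels and pixel xyxy boxes for the top `num_select` (query, class) pairs."""
    prob = 1.0 / (1.0 + np.exp(-pred_logits))    # sigmoid per class
    num_classes = prob.shape[1]
    flat = prob.reshape(-1)
    k = min(num_select, flat.size)
    topk = np.argsort(-flat, kind="stable")[:k]  # indices of the k highest scores
    scores = flat[topk]
    query_idx = topk // num_classes              # which query each score came from
    labels = topk % num_classes                  # which class
    cx, cy, w, h = pred_boxes[query_idx].T
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    boxes = boxes * np.array([img_w, img_h, img_w, img_h], dtype=np.float64)
    return scores, labels, boxes
```

Because the top-k runs over queries and classes jointly, the same query can appear several times with different labels; this differs from the softmax-plus-argmax post-processing of the original DETR, which is worth keeping in mind when aligning results.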

# Aligning Training

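Training alignment requires fixed inputs and weights. Because fixing the (randomly initialized) weights is cumbersome, Paddle and torch instead load the same pretrained weights for the comparison. The losses they output still differ somewhat; this remains to be optimized.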

**Torch output:**

![](https://ai-studio-static-online.cdn.bcebos.com/3756b709b2db44289fa74a1baf8d4a057f7c2eb7873946cd92c3fe58fbfe7987)


**Note: the learning rate differs between layers here, which makes aligning the training loss problematic.**

```
%cd ~/my_df_detr
!python train_val.py
```

```
/home/aistudio/my_df_detr
Compiling user custom op, it will cost a few seconds.....
Namespace(aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='/home/aistudio/data/data7122', dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, dropout=0.1, enc_layers=6, enc_n_points=4, epochs=5, eval=False, focal
```
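The per-layer learning rates mentioned above can be made explicit by mapping every parameter name to a group. Following the official Deformable DETR `main.py` as we understand it, there are three groups: the backbone (`lr_backbone`, 2e-05 in the Namespace above), the `reference_points` and `sampling_offsets` projections (`lr` times `lr_linear_proj_mult`, assumed here to default to 0.1), and everything else (`lr`, 0.0002). Matching is by substring, as the Namespace's `lr_backbone_names` suggests. The helper names are hypothetical, and how the groups are fed to the Paddle optimizer is not shown.

```python
def param_group(name, backbone_names=("backbone.0",),
                linear_proj_names=("reference_points", "sampling_offsets")):
    """Assign a parameter name to one of three learning-rate groups by substring match."""
    if any(key in name for key in backbone_names):
        return "backbone"
    if any(key in name for key in linear_proj_names):
        return "linear_proj"
    return "main"


def group_lrs(lr=2e-4, lr_backbone=2e-5, lr_linear_proj_mult=0.1):
    """Learning rate of each group."""
    return {"main": lr, "backbone": lr_backbone, "linear_proj": lr * lr_linear_proj_mult}


names = [
    "backbone.0.body.layer1.0.conv1.weight",
    "transformer.encoder.layers.0.self_attn.sampling_offsets.weight",
    "transformer.decoder.layers.0.linear1.weight",
]
lrs = group_lrs()
for n in names:
    print(n, param_group(n), lrs[param_group(n)])
```

If the Paddle side puts every parameter in a single group, the first optimizer step already diverges from torch even with identical weights and inputs, which matches the loss gap observed above.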

# Summary

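The model has largely been reproduced, but there are still several shortcomings:
- Some steps consume much more GPU memory than in torch and need optimization
- Support for batch sizes greater than one has not been fully verified
- Not yet integrated into the high-level API or the PaddleDetection suite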

```
#!git clone https://github.com.cnpmjs.org/PaddlePaddle/PaddleDetection -b release/2.0 --depth 1
#%cd work/PaddleDetection
#!python tools/dfdetr_train.py  -c configs/deformable_detr.yml

# bug1: outputs['pred_logits'] becomes NaN
# bug2: an error occurs when cropping boxes
```
