<a href="https://colab.research.google.com/github/xiaochengJF/DeepLearning/blob/DeepLearning/Faster_RCNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font face=STCAIYUN size=10 color=purple>Building Faster RCNN</font>


**<font face=楷体 color=skyblue size=5>Overall flow of training Faster RCNN</font>**

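<font face=楷体 color=skyblue>(1) Extract features from the image  
(2) Generate anchors, then filter and transform them to obtain the training labels for the RPN  
(3) The RPN outputs predicted locations and positive/negative objectness scores (Anchors $\Longrightarrow$ ROIs)  
(4) Take the top N ROIs as proposals (about 2000 proposals)  
(5) Select n samples from the proposals and feed them to the Fast R-CNN network as proposal targets  
(6) Obtain predictions from the Fast RCNN classification and box-regression layers  
(7) Use steps 2 and 3 to compute rpn_cls_loss and rpn_reg_loss  
(8) Use steps 5 and 6 to compute roi_cls_loss and roi_reg_loss  
</font> 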


<img src="https://img-blog.csdnimg.cn/20200217201316137.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzcxMTU1NA==,size_16,color_FFFFFF,t_70" width="80%" height="80%">

<font face=楷体 size=5 color=green>Reference links:</font>  
【1】[Guide to build Faster RCNN in PyTorch](https://medium.com/@fractaldle/guide-to-build-faster-rcnn-in-pytorch-95b10c273439)

## Feature extraction
<font face=楷体 color=skyblue>An example that defines one image, two bounding boxes (bbox), and their labels</font>

In [88]:
import torch
image = torch.zeros((1, 3, 800, 800)).float()

bbox = torch.FloatTensor([[20, 30, 400, 500], [300, 400, 500, 600]]) # [y1, x1, y2, x2] format
labels = torch.LongTensor([6, 8]) # 0 represents background
sub_sample = 16

<font face=楷体 color=skyblue>Create a dummy image</font>



In [89]:
import torchvision
dummy_img = torch.zeros((1, 3, 800, 800)).float()
print(dummy_img.shape)

torch.Size([1, 3, 800, 800])


<font face=楷体 color=skyblue>List all layers of VGG16</font>

In [90]:
# model = torchvision.models.vgg16(pretrained=True)
model = torchvision.models.vgg16(pretrained=False)
fe = list(model.features)
for layer in fe:
    print (layer)

Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True)
Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
ReLU(inplace=True

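<font face=楷体 color=skyblue>Feed the image through the network layer by layer and keep the layers whose output is still at least 800//16 = 50 pixels wide</font>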

In [91]:
req_features = []
k = dummy_img.clone()
for i in fe:
    k = i(k)
    if k.size()[2] < 800//16:
        break
    req_features.append(i)
    out_channels = k.size()[1]
print(len(req_features)) #30
print(out_channels) # 512

30
512


<font face=楷体 color=skyblue>Convert the list into a Sequential module</font>

In [92]:
faster_rcnn_fe_extractor = torch.nn.Sequential(*req_features)

<font face=楷体 color=skyblue>Extract the feature map with the truncated VGG16 network</font>

In [93]:
out_map = faster_rcnn_fe_extractor(image)
print(out_map.size())

torch.Size([1, 512, 50, 50])
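
The 50×50 size follows from the layers kept above: the 3×3 convolutions with padding 1 preserve the spatial size, and each of the four remaining max-pool layers halves it. A minimal sketch of that arithmetic, where the helper name `feature_map_size` is hypothetical and not part of the notebook:

```python
def feature_map_size(input_size, n_pool_layers, pool_stride=2):
    """Spatial size after n stride-2 poolings (3x3 convs with padding 1 keep the size)."""
    size = input_size
    for _ in range(n_pool_layers):
        size = size // pool_stride
    return size

# The truncated VGG16 keeps 4 max-pool layers, so the total stride is 2**4 = 16.
print(feature_map_size(800, 4))
```

With a fifth pooling layer the map would shrink to 25×25, which is why the loop above stops before the last max-pool.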


## Anchor boxes
<font face=楷体>

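- Map every pixel position of the feature map back to the input image to serve as an anchor centre
- Generate anchors at all positions corresponding to the feature map
- Assign to each anchor a label and the location of its target relative to the anchor

Parameters used: anchor_scales=[8, 16, 32], ratio=[0.5, 1, 2], subsampling=16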
</font>

<font face=楷体 color=skyblue>Each pixel position of the feature map generates 9 anchor boxes; each pixel corresponds to a 16×16 pixel region of the input image</font>

In [94]:
import numpy as np
ratios = [0.5, 1, 2]
anchor_scales = [8, 16, 32]

anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4), dtype=np.float32)

print(anchor_base)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


<font face=楷体 color=skyblue>Nine anchors are generated at one position</font>

<img src="https://miro.medium.com/max/971/1*cPidpSRVUVgv3YeY9Fc11Q.png" width="25%">


In [95]:
ctr_y = sub_sample / 2.
ctr_x = sub_sample / 2.
print(ctr_y, ctr_x)

for i in range(len(ratios)):
 for j in range(len(anchor_scales)):
   h = sub_sample * anchor_scales[j] * np.sqrt(ratios[i])    # each feature-map pixel corresponds to 16x16 pixels in the image
   w = sub_sample * anchor_scales[j] * np.sqrt(1./ ratios[i])  # the ratio sets the anchor's height/width proportion (same area)

   index = i * len(anchor_scales) + j

   anchor_base[index, 0] = ctr_y - h / 2.
   anchor_base[index, 1] = ctr_x - w / 2.
   anchor_base[index, 2] = ctr_y + h / 2.
   anchor_base[index, 3] = ctr_x + w / 2.
print(anchor_base)

8.0 8.0
[[ -37.254833  -82.50967    53.254833   98.50967 ]
 [ -82.50967  -173.01933    98.50967   189.01933 ]
 [-173.01933  -354.03867   189.01933   370.03867 ]
 [ -56.        -56.         72.         72.      ]
 [-120.       -120.        136.        136.      ]
 [-248.       -248.        264.        264.      ]
 [ -82.50967   -37.254833   98.50967    53.254833]
 [-173.01933   -82.50967   189.01933    98.50967 ]
 [-354.03867  -173.01933   370.03867   189.01933 ]]


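<font face=楷体 color=skyblue>Every pixel position of the feature map generates 9 anchor boxes. Each position is mapped back to the input image and used as the anchor centre; each pixel corresponds to a 16×16 pixel region of the input image</font>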

In [96]:
fe_size = (800//16)
ctr_x = np.arange(16, (fe_size+1) * 16, 16)
ctr_y = np.arange(16, (fe_size+1) * 16, 16)
print (ctr_x)

[ 16  32  48  64  80  96 112 128 144 160 176 192 208 224 240 256 272 288
 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576
 592 608 624 640 656 672 688 704 720 736 752 768 784 800]


<font face=楷体 color=skyblue>Generate the anchor centre positions on the original image</font>  

<img src="https://miro.medium.com/max/969/1*f-AxsYA9ys5wtiY9NDZh9Q.png" width="25%">

In [97]:
index = 0
ctr  = np.zeros(shape=(len(ctr_x)*len(ctr_y),2))
for x in range(len(ctr_x)):
   for y in range(len(ctr_y)):
       ctr[index, 1] = ctr_x[x] - 8  # ctr_x and ctr_y hold the far edge of each 16-pixel cell; subtract 8 to get the cell centre
       ctr[index, 0] = ctr_y[y] - 8
       index +=1

<font face=楷体 color=skyblue>Generate 9 anchors at every position</font>





In [98]:
anchors = np.zeros(shape=(fe_size * fe_size * 9, 4))  # 2500 anchor centres, 9 boxes per centre, 4 coordinates per box
index = 0
for c in ctr:
 ctr_y, ctr_x = c
 for i in range(len(ratios)):
   for j in range(len(anchor_scales)):
     h = sub_sample * anchor_scales[j] * np.sqrt(ratios[i])
     w = sub_sample * anchor_scales[j] * np.sqrt(1./ ratios[i])
     anchors[index, 0] = ctr_y - h / 2.
     anchors[index, 1] = ctr_x - w / 2.
     anchors[index, 2] = ctr_y + h / 2.
     anchors[index, 3] = ctr_x + w / 2.
     index += 1
print(anchors.shape)

(22500, 4)
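
The nested loops above can also be expressed with NumPy broadcasting. The following is a sketch rather than the notebook's own method; the function name `generate_anchors` and its default arguments are assumptions chosen to mirror the values used above, and the ordering (x in the outer loop, y in the inner loop, ratio-major base anchors) is kept the same so the rows line up with `anchors`.

```python
import numpy as np

def generate_anchors(fe_size=50, sub_sample=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return (fe_size * fe_size * 9, 4) anchors in [y1, x1, y2, x2] format."""
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    # Heights and widths of the 9 base anchors, ratio-major like the loops above.
    h = (sub_sample * scales[None, :] * np.sqrt(ratios[:, None])).ravel()
    w = (sub_sample * scales[None, :] * np.sqrt(1.0 / ratios[:, None])).ravel()
    # Cell centres in image coordinates: 8, 24, ..., 792.
    centres = np.arange(fe_size) * sub_sample + sub_sample / 2.0
    # "ij" indexing makes x the slow (outer) index and y the fast (inner) one.
    cx, cy = np.meshgrid(centres, centres, indexing="ij")
    cy = cy.ravel()[:, None]
    cx = cx.ravel()[:, None]
    boxes = np.stack([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2], axis=-1)
    return boxes.reshape(-1, 4)

vectorised_anchors = generate_anchors()
print(vectorised_anchors.shape)
print(vectorised_anchors[0])
```

The first row should agree with the first base anchor printed earlier, since both are centred on (8, 8).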


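<font face=楷体 color=skyblue size=5>Assigning labels and locations to anchors</font>  
<font face=楷体>
a) The anchor with the highest IoU with a ground-truth box is labelled positive (1)  
b) Anchors whose IoU with a ground-truth box is greater than 0.7 are labelled positive (1)  
c) Anchors whose IoU with every ground-truth box is less than 0.3 are labelled negative (0)  
d) All remaining anchors are neither positive nor negative and do not contribute to training (-1)  
<font color=red>Note:</font> a single ground-truth object can assign positive labels to several anchors  
</font>
<font face=楷体 color=skyblue>Labels and locations are assigned to the anchor boxes as follows:</font>   
<font face=楷体>
(1) Find the indices of the valid anchor boxes and build an index array; create a label array with the same shape as the index array, filled with -1  
(2) Check whether one of the conditions a, b, c above holds and fill in the label accordingly. For a positive anchor box (label 1), record which ground-truth object produced it.  
(3) Compute the location (loc) of the ground-truth box relative to each anchor box.  
(4) Reorganise all anchor boxes by filling -1 for every invalid anchor box and the computed values for every valid anchor.  
(5) The outputs should be labels as an (N, 1) array and locs as an (N, 4) array.  
(6) Find the indices of all valid anchor boxes
</font>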

<font face=楷体 color=skyblue>Define two ground-truth boxes and their labels</font>

In [99]:
bbox = np.asarray([[20, 30, 400, 500], [300, 400, 500, 600]], dtype=np.float32) # [y1, x1, y2, x2] format
labels = np.asarray([6, 8], dtype=np.int8) # 0 represents background 

<font face=楷体 color=skyblue>Select the anchors that do not extend beyond the image boundary</font>


<img src="https://miro.medium.com/max/968/1*6E4EXMoTvSLZTlHLWpS3uA.png" width="25%">

In [100]:
inside_index = np.where(
       (anchors[:, 0] >= 0) &
       (anchors[:, 1] >= 0) &
       (anchors[:, 2] <= 800) &
       (anchors[:, 3] <= 800)
   )[0]
print(inside_index.shape)

(8940,)


<font face=楷体 color=skyblue>Initialise the labels to -1</font>

In [101]:
label = np.empty((len(inside_index), ), dtype=np.int32)
label.fill(-1)
print(label.shape)

(8940,)


<font face=楷体 color=skyblue>Get the coordinates of the valid anchors (those inside the image)</font>

In [102]:
valid_anchor_boxes = anchors[inside_index]
print(valid_anchor_boxes.shape)

(8940, 4)


<font face=楷体 color=skyblue>Compute the IoU of every anchor box with all ground-truth boxes (two here)</font>

In [103]:
ious = np.empty((len(valid_anchor_boxes), 2), dtype=np.float32)
ious.fill(0)
print("The two ground-truth boxes are:\n{}".format(bbox))
for num1, i in enumerate(valid_anchor_boxes):
   ya1, xa1, ya2, xa2 = i 
   anchor_area = (ya2 - ya1) * (xa2 - xa1)  # area of the valid anchor
   for num2, j in enumerate(bbox):
       yb1, xb1, yb2, xb2 = j
       box_area = (yb2- yb1) * (xb2 - xb1)  # area of the ground-truth box

       inter_x1 = max([xb1, xa1])
       inter_y1 = max([yb1, ya1])
       inter_x2 = min([xb2, xa2])
       inter_y2 = min([yb2, ya2])
       if (inter_x1 < inter_x2) and (inter_y1 < inter_y2):
           iter_area = (inter_y2 - inter_y1) * (inter_x2 - inter_x1)
           iou = iter_area / (anchor_area + box_area - iter_area)
       else:
           iou = 0.

       ious[num1, num2] = iou
print(ious.shape)

The two ground-truth boxes are:
[[ 20.  30. 400. 500.]
 [300. 400. 500. 600.]]
(8940, 2)
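
The double loop is easy to follow but slow for thousands of anchors. As an illustrative alternative, the same IoU matrix can be computed with broadcasting. The function name `pairwise_iou` is an assumption introduced here; boxes are taken to be in the [y1, x1, y2, x2] format used throughout.

```python
import numpy as np

def pairwise_iou(boxes_a, boxes_b):
    """IoU between every box in boxes_a (N, 4) and boxes_b (K, 4), [y1, x1, y2, x2]."""
    boxes_a = np.asarray(boxes_a, dtype=np.float64)
    boxes_b = np.asarray(boxes_b, dtype=np.float64)
    # Top-left and bottom-right corners of each pairwise intersection, shape (N, K, 2).
    tl = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    br = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    # Non-overlapping pairs get a clipped side of 0 and therefore zero area.
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(boxes_a[:, 2:] - boxes_a[:, :2], axis=1)
    area_b = np.prod(boxes_b[:, 2:] - boxes_b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

example_iou = pairwise_iou([[0, 0, 10, 10]],
                           [[0, 0, 10, 10], [5, 5, 15, 15], [20, 20, 30, 30]])
print(example_iou)
```

In this toy case the three pairs are identical, partially overlapping (25 / 175), and disjoint.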


<font face=楷体 color=skyblue>For each gt_box, find the anchor box with the highest IoU (two in total)</font> 

In [104]:
gt_argmax_ious = ious.argmax(axis=0)
print(gt_argmax_ious)
gt_max_ious = ious[gt_argmax_ious, np.arange(ious.shape[1])]
print(gt_max_ious)

[2262 5620]
[0.68130493 0.61035156]


<font face=楷体 color=skyblue>For each anchor box, get its highest IoU over all ground-truth boxes</font> 


In [105]:
argmax_ious = ious.argmax(axis=1)
print(argmax_ious.shape)
print(argmax_ious)
max_ious = ious[np.arange(len(inside_index)), argmax_ious]
print(max_ious.shape)
print(max_ious)

(8940,)
[0 0 0 ... 0 0 0]
(8940,)
[0.06811669 0.07083762 0.07083762 ... 0.         0.         0.        ]


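<font face=楷体 color=skyblue>Using each ground-truth box's maximum IoU found above, collect the indices of all anchors whose IoU equals that value (an anchor is generally not the best match for both ground-truth boxes at once). These indices are used when setting positive and negative samples</font>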

<font face=楷体 color=green size=5>Reference links</font>  
<font face=楷体>
【1】[Detailed guide to numpy.where()](https://www.cnblogs.com/massquantity/p/8908859.html)

In [106]:
gt_argmax_ious = np.where(ious == gt_max_ious)[0]
print(ious[np.where(ious == gt_max_ious)])
print(gt_argmax_ious)

[0.68130493 0.68130493 0.61035156 0.61035156 0.61035156 0.61035156
 0.61035156 0.61035156 0.61035156 0.61035156 0.61035156 0.61035156
 0.61035156 0.61035156 0.61035156 0.61035156 0.61035156 0.61035156]
[2262 2508 5620 5628 5636 5644 5866 5874 5882 5890 6112 6120 6128 6136
 6358 6366 6374 6382]


<font face=楷体 color=skyblue>Set the positive and negative IoU thresholds</font>

In [107]:
pos_iou_threshold  = 0.7
neg_iou_threshold = 0.3

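label[max_ious < neg_iou_threshold] = 0  # IoU below 0.3: negative sample
label[gt_argmax_ious] = 1  # anchors with the highest IoU for a ground-truth box: positive sample
label[max_ious >= pos_iou_threshold] = 1  # IoU of at least 0.7: positive sample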
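
The order of the three assignments matters: rule (a) is applied after the negative threshold, so an anchor that is the best match for some ground-truth box becomes positive even when its IoU is below 0.3. A small sketch on made-up IoU values illustrates this; the function name `assign_labels` and the toy matrix are assumptions for illustration only.

```python
import numpy as np

def assign_labels(ious, pos_thr=0.7, neg_thr=0.3):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored."""
    label = np.full(ious.shape[0], -1, dtype=np.int32)
    max_ious = ious.max(axis=1)
    label[max_ious < neg_thr] = 0
    # Rule (a): every anchor that ties for the best IoU of some ground-truth box.
    gt_max = ious.max(axis=0)
    label[np.where(ious == gt_max)[0]] = 1
    # Rule (b): anchors at or above the positive threshold.
    label[max_ious >= pos_thr] = 1
    return label

# Rows are anchors, columns are two ground-truth boxes (values invented for the demo).
toy_ious = np.array([[0.10, 0.00],
                     [0.50, 0.20],
                     [0.80, 0.10],
                     [0.00, 0.25],
                     [0.40, 0.05]])
toy_labels = assign_labels(toy_ious)
print(toy_labels)
```

Here the fourth anchor has an IoU of only 0.25 but is still labelled positive, because no other anchor matches the second box better.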


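<font face=楷体 color=skyblue size=4>Setting the positive/negative ratio and the total sample count </font>

<font face=楷体> 
Each image yields a large number of samples, and in most cases negatives dominate. A fixed number of samples is therefore drawn at random to form a mini-batch, keeping the ratio of positive to negative samples balanced.</font>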

In [108]:
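pos_ratio = 0.5  # fraction of positive samples
n_sample = 256  # total number of samples

n_pos = int(pos_ratio * n_sample)  # number of positive samples; cast to int so it can be used as a sample size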

<font face=楷体 color=skyblue>If there are more than n_pos positive samples, keep n_pos of them chosen at random</font>

In [109]:
pos_index = np.where(label == 1)[0]
print(np.where(label == 1))
print(pos_index)
if len(pos_index) > n_pos:
   disable_index = np.random.choice(pos_index, size=(len(pos_index) - n_pos), replace=False)
   label[disable_index] = -1

(array([2262, 2508, 5620, 5628, 5636, 5644, 5866, 5874, 5882, 5890, 6112,
       6120, 6128, 6136, 6358, 6366, 6374, 6382]),)
[2262 2508 5620 5628 5636 5644 5866 5874 5882 5890 6112 6120 6128 6136
 6358 6366 6374 6382]


In [110]:
n_neg = n_sample - np.sum(label == 1)
print(n_neg)
neg_index = np.where(label == 0)[0]
if len(neg_index) > n_neg:
   disable_index = np.random.choice(neg_index, size=(len(neg_index) - n_neg), replace = False)
   label[disable_index] = -1

238
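
The two sampling cells can be wrapped into one helper to make the bookkeeping explicit. This is a sketch under assumptions: the name `subsample_labels`, the `seed` argument, and the toy label counts (18 positives, 1000 negatives, 50 ignored) are invented for illustration, with the 18 positives chosen to match the count seen above.

```python
import numpy as np

def subsample_labels(label, n_sample=256, pos_ratio=0.5, seed=0):
    """Keep at most n_sample labelled anchors; surplus ones are reset to -1."""
    rng = np.random.default_rng(seed)
    label = label.copy()
    n_pos = int(pos_ratio * n_sample)
    pos_index = np.where(label == 1)[0]
    if len(pos_index) > n_pos:
        drop = rng.choice(pos_index, size=len(pos_index) - n_pos, replace=False)
        label[drop] = -1
    # Negatives fill whatever the positives leave free.
    n_neg = n_sample - int(np.sum(label == 1))
    neg_index = np.where(label == 0)[0]
    if len(neg_index) > n_neg:
        drop = rng.choice(neg_index, size=len(neg_index) - n_neg, replace=False)
        label[drop] = -1
    return label

toy_label = np.array([1] * 18 + [0] * 1000 + [-1] * 50)
sampled = subsample_labels(toy_label)
print((sampled == 1).sum(), (sampled == 0).sum())
```

With only 18 positives available, the remaining 256 - 18 = 238 slots go to negatives, the same count printed by the cell above.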


In [111]:
print(bbox)  
print(argmax_ious)  # column index of the ground-truth box with the highest IoU for each anchor. 0: [ 20.  30. 400. 500.], 1: [300. 400. 500. 600.]
max_iou_bbox = bbox[argmax_ious]
print(max_iou_bbox)
print(max_iou_bbox.shape)

[[ 20.  30. 400. 500.]
 [300. 400. 500. 600.]]
[0 0 0 ... 0 0 0]
[[ 20.  30. 400. 500.]
 [ 20.  30. 400. 500.]
 [ 20.  30. 400. 500.]
 ...
 [ 20.  30. 400. 500.]
 [ 20.  30. 400. 500.]
 [ 20.  30. 400. 500.]]
(8940, 4)



<font face=楷体 color=skyblue>Centre and size of each valid anchor: ctr_x, ctr_y, width, height  
Centre and size of the ground-truth box matched to each valid anchor: base_ctr_x, base_ctr_y, base_width, base_height</font>

In [112]:
height = valid_anchor_boxes[:, 2] - valid_anchor_boxes[:, 0]
width = valid_anchor_boxes[:, 3] - valid_anchor_boxes[:, 1]
ctr_y = valid_anchor_boxes[:, 0] + 0.5 * height
ctr_x = valid_anchor_boxes[:, 1] + 0.5 * width

base_height = max_iou_bbox[:, 2] - max_iou_bbox[:, 0]
base_width = max_iou_bbox[:, 3] - max_iou_bbox[:, 1]
base_ctr_y = max_iou_bbox[:, 0] + 0.5 * base_height
base_ctr_x = max_iou_bbox[:, 1] + 0.5 * base_width

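<font face=楷体 color=yellow size=5>Question:</font>  

<font face=楷体 color=yellow>Coefficients that map a valid anchor onto its ground-truth box (dy and dx are translation coefficients; dh and dw are scaling coefficients).  
Why not use absolute values?</font>   
<font face=楷体>
- 1) The $\log/\exp$ transform prevents negative sizes; the network learns the transform coefficients from <font color=skyblue>proposal box $\to$ gt box</font>
- 2) If the true coordinates were learned directly, the loss would not reflect prediction quality well: a large object predicted accurately can still incur a much larger loss than a small object predicted poorly (YOLO v1 takes the square root of w and h, which mitigates the problem but does not solve it)
</font> 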

$$\begin{matrix}
&t_{x} = (x - x_{a})/w_{a} 
&t_{y} = (y - y_{a})/h_{a}\\
&t_{w} = \log(w/ w_a)
&t_{h} = \log(h/ h_a)
\end{matrix}$$

<font face=楷体 color=green size=5>Reference links:</font>  
<font face=楷体>
【1】[python numpy np.finfo() function and eps](https://blog.csdn.net/Dontla/article/details/103062246)
</font> 

In [113]:
eps = np.finfo(height.dtype).eps  # machine epsilon (gap between 1.0 and the next larger float); guards against dividing by zero
height = np.maximum(height, eps)
width = np.maximum(width, eps)
dy = (base_ctr_y - ctr_y) / height
dx = (base_ctr_x - ctr_x) / width
dh = np.log(base_height / height)
dw = np.log(base_width / width)
anchor_locs = np.vstack((dy, dx, dh, dw)).transpose()
print(anchor_locs)

[[ 0.5855728   2.30914558  0.7415674   1.64727602]
 [ 0.49718446  2.30914558  0.7415674   1.64727602]
 [ 0.40879611  2.30914558  0.7415674   1.64727602]
 ...
 [-2.50801936 -5.29225232  0.7415674   1.64727602]
 [-2.59640771 -5.29225232  0.7415674   1.64727602]
 [-2.68479606 -5.29225232  0.7415674   1.64727602]]
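
To check the parameterisation, the offsets can be encoded and then decoded again. The sketch below uses two hypothetical helpers, `encode_boxes` and `decode_boxes`, which are not part of the notebook. The example anchor (centre (104, 56), ratio 2, scale 8) was worked out by hand so that, paired with the first ground-truth box, it should reproduce the first row of `anchor_locs` printed above.

```python
import numpy as np

def encode_boxes(anchors, gt_boxes):
    """Offsets (dy, dx, dh, dw) that map each anchor onto its ground-truth box."""
    h = anchors[:, 2] - anchors[:, 0]
    w = anchors[:, 3] - anchors[:, 1]
    cy = anchors[:, 0] + 0.5 * h
    cx = anchors[:, 1] + 0.5 * w
    gh = gt_boxes[:, 2] - gt_boxes[:, 0]
    gw = gt_boxes[:, 3] - gt_boxes[:, 1]
    gcy = gt_boxes[:, 0] + 0.5 * gh
    gcx = gt_boxes[:, 1] + 0.5 * gw
    return np.stack([(gcy - cy) / h, (gcx - cx) / w,
                     np.log(gh / h), np.log(gw / w)], axis=1)

def decode_boxes(anchors, locs):
    """Inverse of encode_boxes: apply the offsets to the anchors."""
    h = anchors[:, 2] - anchors[:, 0]
    w = anchors[:, 3] - anchors[:, 1]
    cy = anchors[:, 0] + 0.5 * h + locs[:, 0] * h
    cx = anchors[:, 1] + 0.5 * w + locs[:, 1] * w
    new_h = np.exp(locs[:, 2]) * h
    new_w = np.exp(locs[:, 3]) * w
    return np.stack([cy - 0.5 * new_h, cx - 0.5 * new_w,
                     cy + 0.5 * new_h, cx + 0.5 * new_w], axis=1)

half_h, half_w = 64 * np.sqrt(2), 32 * np.sqrt(2)
anchor = np.array([[104 - half_h, 56 - half_w, 104 + half_h, 56 + half_w]])
gt = np.array([[20.0, 30.0, 400.0, 500.0]])
locs = encode_boxes(anchor, gt)
print(locs)
print(decode_boxes(anchor, locs))
```

Decoding the encoded offsets returns the ground-truth box, which is the same operation the RPN output goes through later when anchors are turned into proposals.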


<font face=楷体 color=skyblue>Label of each anchor box
- -1: invalid or ignored anchor,
- 0: valid negative anchor,
- 1: valid positive anchor
</font>

In [114]:
anchor_labels = np.empty((len(anchors),), dtype=label.dtype)
print(anchor_labels)
anchor_labels.fill(-1)
print(anchor_labels)
print(anchor_labels.shape)
print(label)
print(label.shape)
anchor_labels[inside_index] = label
print(anchor_labels.shape)

[-1768030208           0  -859807206 ...  1081065155  1717580045
  1082267608]
[-1 -1 -1 ... -1 -1 -1]
(22500,)
[-1 -1 -1 ... -1 -1 -1]
(8940,)
(22500,)


<font face=楷体 color=skyblue>Location coefficients of the ground-truth box matched to each anchor box</font>

In [115]:
anchor_locations = np.empty((len(anchors),) + anchors.shape[1:], dtype=anchor_locs.dtype)  # (22500,) + (4,) = (22500, 4); anchors.shape[1] = 4, anchors.shape[1:] = (4,)
anchor_locations.fill(0)
anchor_locations[inside_index, :] = anchor_locs
print(anchor_locs.shape)
print(anchor_locations.shape)

(8940, 4)
(22500, 4)


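<font face =楷体 color=yellow>Positive and negative labels are assigned to the selected anchors by IoU filtering, and the location coefficients between the ground-truth boxes and the selected anchors serve as location labels $\color{red}\Longrightarrow$ the training labels for the RPN</font>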

## RPN


<font face=楷体 color=skyblue>Build the RPN network</font>  

<font face=楷体>【1】[nn.Conv2d and its padding strategy](https://blog.csdn.net/g11d111/article/details/82665265)</font>

In [116]:
import torch.nn as nn
mid_channels = 512
in_channels = 512  # depends on the output feature map. in vgg 16 it is equal to 512
n_anchor = 9     # Number of anchors at each location
conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
reg_layer = nn.Conv2d(mid_channels, n_anchor *4, 1, 1, 0)
cls_layer = nn.Conv2d(mid_channels, n_anchor *2, 1, 1, 0) # softmax is used here, hence 2 channels per anchor; with a sigmoid, 1 channel per anchor would be enough

<font face=楷体 color=skyblue>Initialisation</font>

In [117]:
# conv sliding layer
conv1.weight.data.normal_(0, 0.01)
conv1.bias.data.zero_()

# Regression layer
reg_layer.weight.data.normal_(0, 0.01)
reg_layer.bias.data.zero_()

# classification layer
cls_layer.weight.data.normal_(0, 0.01)
cls_layer.bias.data.zero_()

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [118]:
x = conv1(out_map) # out_map is obtained in section 1
pred_anchor_locs = reg_layer(x)
pred_cls_scores = cls_layer(x)

print(pred_cls_scores.shape, pred_anchor_locs.shape)

torch.Size([1, 18, 50, 50]) torch.Size([1, 36, 50, 50])


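<font face=楷体 color=yellow>Question:</font>  
<font face=楷体>
(1) Why does objectness_score = pred_cls_scores.view(1, 50, 50, 9, 2)[:, :, :, :, 1] take index 1 rather than [:, :, :, :, 0]? Positive anchors carry label 1, so channel 1 of each pair is the one the cross-entropy loss trains as the foreground score.  
</font>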

In [119]:
pred_anchor_locs = pred_anchor_locs.permute(0, 2, 3, 1).contiguous().view(1, -1, 4)
print(pred_anchor_locs.shape)

#Out: torch.Size([1, 22500, 4])

pred_cls_scores = pred_cls_scores.permute(0, 2, 3, 1).contiguous()
print(pred_cls_scores.shape)
#Out torch.Size([1, 50, 50, 18])

objectness_score = pred_cls_scores.view(1, 50, 50, 9, 2)[:, :, :, :, 1].contiguous().view(1, -1)  # take the positive (foreground) score
print(objectness_score.shape)
#Out torch.Size([1, 22500])

pred_cls_scores  = pred_cls_scores.view(1, -1, 2)
print(pred_cls_scores.shape)
# Out torch.size([1, 22500, 2])

torch.Size([1, 22500, 4])
torch.Size([1, 50, 50, 18])
torch.Size([1, 22500])
torch.Size([1, 22500, 2])


#### Generating proposals to feed to Fast RCNN


<font face=楷体 color=skyblue>Set the parameters</font>

In [120]:
nms_thresh = 0.7
n_train_pre_nms = 12000
n_train_post_nms = 2000
n_test_pre_nms = 6000
n_test_post_nms = 300
min_size = 16  # the conv5 feature map is 1/16 the size of the input, so its resolution is only 16 pixels

<font face=楷体 color=skyblue>Convert the anchors from y1, x1, y2, x2 format to ctr_x, ctr_y, h, w </font>

In [121]:
anc_height = anchors[:, 2] - anchors[:, 0]
anc_width = anchors[:, 3] - anchors[:, 1]
anc_ctr_y = anchors[:, 0] + 0.5 * anc_height
anc_ctr_x = anchors[:, 1] + 0.5 * anc_width

<font face=楷体 color=skyblue>Using the four predicted coefficients, translate and scale each anchor into a predicted box</font>

In [122]:
pred_anchor_locs_numpy = pred_anchor_locs[0].data.numpy()
objectness_score_numpy = objectness_score[0].data.numpy()

dy = pred_anchor_locs_numpy[:, 0::4]  # 0::4 keeps the column dimension (shape (N, 1)); indexing column 0 directly would give a 1-D array
dx = pred_anchor_locs_numpy[:, 1::4]
dh = pred_anchor_locs_numpy[:, 2::4]
dw = pred_anchor_locs_numpy[:, 3::4]

ctr_y = dy * anc_height[:, np.newaxis] + anc_ctr_y[:, np.newaxis]
ctr_x = dx * anc_width[:, np.newaxis] + anc_ctr_x[:, np.newaxis]
h = np.exp(dh) * anc_height[:, np.newaxis]
w = np.exp(dw) * anc_width[:, np.newaxis]

<font face=楷体 color=skyblue>Convert the predicted boxes to [y1, x1, y2, x2] format</font>

In [123]:
roi = np.zeros(pred_anchor_locs_numpy.shape, dtype=anchor_locs.dtype)
roi[:, 0::4] = ctr_y - 0.5 * h
roi[:, 1::4] = ctr_x - 0.5 * w
roi[:, 2::4] = ctr_y + 0.5 * h
roi[:, 3::4] = ctr_x + 0.5 * w
print(roi)

[[ -37.254834    -82.50966799   53.254834     98.50966799]
 [ -82.50966799 -173.01933598   98.50966799  189.01933598]
 [-173.01933598 -354.03867197  189.01933598  370.03867197]
 ...
 [ 701.49033201  746.745166    882.50966799  837.254834  ]
 [ 610.98066402  701.49033201  973.01933598  882.50966799]
 [ 429.96132803  610.98066402 1154.03867197  973.01933598]]


<font face=楷体 color=skyblue>Clip the predicted boxes: any part that falls outside the input image is moved to the image boundary</font>  
<font face=楷体 color=green size=5>Reference links:
</font>  
<font face=楷体>
【1】[Python slice() function](https://zixuephp.net/manual-python3-1716.html)  
【2】[np.clip clipping function](https://www.cnblogs.com/cloud-ken/p/9946341.html)
</font> 

In [124]:
img_size = (800, 800) #Image size
roi[:, slice(0, 4, 2)] = np.clip(
           roi[:, slice(0, 4, 2)], 0, img_size[0])  # slice(start, stop, step) clip(a, a_min, a_max, out=None)
roi[:, slice(1, 4, 2)] = np.clip(
   roi[:, slice(1, 4, 2)], 0, img_size[1])

print(roi)

[[  0.           0.          53.254834    98.50966799]
 [  0.           0.          98.50966799 189.01933598]
 [  0.           0.         189.01933598 370.03867197]
 ...
 [701.49033201 746.745166   800.         800.        ]
 [610.98066402 701.49033201 800.         800.        ]
 [429.96132803 610.98066402 800.         800.        ]]


<font face=楷体 color=skyblue>Remove predicted boxes whose height or width is below min_size (this discards very small boxes)</font>

In [125]:
hs = roi[:, 2] - roi[:, 0]
ws = roi[:, 3] - roi[:, 1]
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
roi = roi[keep, :]
score = objectness_score_numpy[keep]

print(score.shape)

(22500,)


In [126]:
order = score.ravel().argsort()[::-1]  # sort all scores from high to low
print(order)
print(order.shape)

[22499  7502  7494 ... 15003 15004     0]
(22500,)


<font face=楷体 color=skyblue>Keep the top pre_nms_topN predicted boxes (12000 during training, 6000 at test time, per the parameters set above)</font>

In [127]:
order = order[:n_train_pre_nms]
roi = roi[order, :]

print(roi.shape)
print(roi)

(12000, 4)
[[429.96132803 610.98066402 800.         800.        ]
 [280.           8.         792.         520.        ]
 [429.49033201 218.745166   610.50966799 309.254834  ]
 ...
 [  0.         514.98066402 562.03867197 800.        ]
 [125.49033201 514.98066402 306.50966799 800.        ]
 [  0.         333.96132803 317.01933598 800.        ]]


#### NMS
<font face=楷体>Remove boxes whose IoU with the highest-scoring box exceeds 0.7 (deduplication), keeping boxes that score highly and barely overlap</font>


In [128]:
y1 = roi[:, 0]
x1 = roi[:, 1]
y2 = roi[:, 2]
x2 = roi[:, 3]

areas = (x2 - x1 + 1) * (y2 - y1 + 1)
print(order.shape)
score = score[order]
order = score.argsort()[::-1]  
print(order)
keep = []
while order.size > 0:
   i = order[0]
   keep.append(i)
   xx1 = np.maximum(x1[i], x1[order[1:]])
   yy1 = np.maximum(y1[i], y1[order[1:]])
   xx2 = np.minimum(x2[i], x2[order[1:]])
   yy2 = np.minimum(y2[i], y2[order[1:]])
  
   w = np.maximum(0.0, xx2 - xx1 + 1)
   h = np.maximum(0.0, yy2 - yy1 + 1)    
   inter = w * h
   ovr = inter / (areas[i] + areas[order[1:]] - inter)
   
   inds = np.where(ovr <= nms_thresh)[0]
   order = order[inds + 1]  # the top box is kept; IoU was computed from the second box on, so inds is shifted by 1 to index order


keep = keep[:n_train_post_nms]  # while training/testing , use accordingly
roi = roi[keep]  # the final region proposals
print(roi.shape)

(12000,)
[11999  3996  4005 ...  7995  7994     0]
(1758, 4)
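
The suppression loop is easier to verify on a handful of boxes. Below is a compact sketch of greedy NMS as a function; the name `nms` and the three toy boxes are assumptions for illustration. One deliberate difference from the cell above: areas are computed without the `+ 1` pixel convention, to match the IoU computation used elsewhere in this notebook.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS on [y1, x1, y2, x2] boxes; returns kept indices, best score first."""
    y1, x1, y2, x2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (y2 - y1) * (x2 - x1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Overlap of the current best box with every remaining box.
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        inter = h * w
        iou = inter / (areas[i] + areas[rest] - inter)
        # Only boxes that overlap the kept box by at most iou_thresh survive.
        order = rest[iou <= iou_thresh]
    return keep

toy_boxes = np.array([[0, 0, 100, 100],
                      [5, 5, 105, 105],
                      [200, 200, 300, 300]], dtype=np.float64)
toy_scores = np.array([0.9, 0.8, 0.7])
print(nms(toy_boxes, toy_scores))
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed at the 0.7 threshold while the distant third box is kept.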


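<font face=楷体><font color=skyblue>n_sample</font>: number of samples drawn from the rois, 128 by default  
<font color=skyblue>pos_ratio</font>: fraction of positive samples within n_sample, 0.25 by default   
<font color=skyblue>pos_iou_thresh</font>: minimum IoU between a region proposal and a ground-truth object for the proposal to count as positive  
<font color=skyblue>[neg_iou_thresh_lo, neg_iou_thresh_hi]</font> : [0.0, 0.5], the IoU range in which a proposal counts as negative (background)  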

In [129]:
n_sample = 128
pos_ratio = 0.25
pos_iou_thresh = 0.5
neg_iou_thresh_hi = 0.5
neg_iou_thresh_lo = 0.0  # adjustable, e.g. 0.01, which selects harder negatives for training (samples close to an object)

<font face=楷体 color=skyblue>Compute the IoU between each region proposal and each ground-truth object</font> 


In [130]:
ious = np.empty((len(roi), 2), dtype=np.float32)
ious.fill(0)
for num1, i in enumerate(roi):
   ya1, xa1, ya2, xa2 = i  
   anchor_area = (ya2 - ya1) * (xa2 - xa1)
   for num2, j in enumerate(bbox):
       yb1, xb1, yb2, xb2 = j
       box_area = (yb2- yb1) * (xb2 - xb1)

       inter_x1 = max([xb1, xa1])
       inter_y1 = max([yb1, ya1])
       inter_x2 = min([xb2, xa2])
       inter_y2 = min([yb2, ya2])

       if (inter_x1 < inter_x2) and (inter_y1 < inter_y2):
           iter_area = (inter_y2 - inter_y1) * (inter_x2 - inter_x1)
           iou = iter_area / (anchor_area + box_area - iter_area)
       else:
           iou = 0.

       ious[num1, num2] = iou
print(ious.shape)

(1758, 2)



<font face=楷体 color=skyblue>For each region proposal, find the ground truth with the highest IoU, giving its column index and the corresponding IoU</font>   

In [131]:
gt_assignment = ious.argmax(axis=1)
max_iou = ious.max(axis=1)
print(gt_assignment.shape)
print(gt_assignment)
print(max_iou.shape)
print(max_iou)

(1758,)
[0 0 1 ... 0 0 0]
(1758,)
[0.17802154 0.17926688 0.04676318 ... 0.         0.         0.        ]


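#### Assigning a label to each proposal

<font face=楷体 color=skyblue>Classification labels</font>   
<font face=楷体>Class labels are assigned according to the IoU between the ROIs and the ground-truth boxes</font>  
<font face=楷体><font face=楷体 color=red>Note:</font> background is labelled 0 by default here</font>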

In [132]:
gt_roi_label = labels[gt_assignment]
print(gt_roi_label)

[6 6 8 ... 6 6 6]


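<font face=楷体>
Select the foreground rois using pos_iou_thresh. Only n_sample × pos_ratio (128 × 0.25 = 32) foreground samples are kept: if fewer than 32 positives are found, all of them are kept; if more than 32 are found, 32 are sampled from them
</font>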

In [133]:
pos_roi_per_image = 32
pos_index = np.where(max_iou >= pos_iou_thresh)[0]
pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
if pos_index.size > 0:
   pos_index = np.random.choice(
       pos_index, size=pos_roi_per_this_image, replace=False)
print(pos_roi_per_this_image)
print(pos_index)

19
[1206  584  130  805   34  895  429  751  369  137  487  135  217   37
  762  637  427  409  569]


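<font face=楷体>
Negative (background) region proposals are handled similarly: a region proposal whose IoU with its assigned ground-truth object lies between neg_iou_thresh_lo and neg_iou_thresh_hi is given label 0, and n region proposals are sampled from these negatives, where n = n_sample - number of positives (128 - 32 = 96 when all 32 positive slots are filled; here only 19 positives were found, so 128 - 19 = 109)
</font>  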

In [134]:
neg_index = np.where((max_iou < neg_iou_thresh_hi) &
            (max_iou >= neg_iou_thresh_lo))[0]
neg_roi_per_this_image = n_sample - pos_roi_per_this_image
neg_roi_per_this_image = int(min(neg_roi_per_this_image, neg_index.size))
if  neg_index.size > 0 :
   neg_index = np.random.choice(
       neg_index, size=neg_roi_per_this_image, replace=False)
print(neg_roi_per_this_image)
print(neg_index)

109
[ 933 1018 1338 1433  269   13 1750 1681  994  614 1387   23  556  834
 1268  164 1687 1711 1239 1107  868  256 1498  266  950  454  607  464
 1076 1197  891  133  807 1480 1429 1289  186  491 1081 1137 1038 1207
  957 1625  557  783  626  816 1638 1580  867  490 1751 1175  822  428
  244  764  713 1134 1021   44 1172  725  579 1380  365  821 1409  629
  265  179 1082 1157  242 1312 1402 1695  808  687  897   51  481  431
 1191  354  889  886 1294  554 1348  523  168  809 1242  380  323  371
 1205 1055 1104  645  766  798  169  448  196 1253  338]


<font face=楷体 color=skyblue>Combine the positive and negative sample indices, together with their labels and region proposals</font>






In [135]:
keep_index = np.append(pos_index, neg_index)
gt_roi_labels = gt_roi_label[keep_index]
gt_roi_labels[pos_roi_per_this_image:] = 0  # negative labels --> 0
sample_roi = roi[keep_index]
print(sample_roi.shape)

(128, 4)


<font face=楷体 color=skyblue>After choosing the ground-truth object for each sample_roi, parameterise it the same way locations were assigned to the anchor boxes</font>
$$\begin{matrix}
&t_{x} = (x - x_{a})/w_{a} 
&t_{y} = (y - y_{a})/h_{a}\\
&t_{w} = \log(w/ w_a)
&t_{h} = \log(h/ h_a)
\end{matrix}$$



In [136]:
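# From each sampled roi and its ground-truth box, compute the offsets (translation: dy, dx; scaling: dh, dw)
bbox_for_sampled_roi = bbox[gt_assignment[keep_index]]  # ground-truth box matched to each sampled roi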
print(bbox_for_sampled_roi.shape)
#Out
#(128, 4)
height = sample_roi[:, 2] - sample_roi[:, 0]
width = sample_roi[:, 3] - sample_roi[:, 1]
ctr_y = sample_roi[:, 0] + 0.5 * height
ctr_x = sample_roi[:, 1] + 0.5 * width
base_height = bbox_for_sampled_roi[:, 2] - bbox_for_sampled_roi[:, 0]
base_width = bbox_for_sampled_roi[:, 3] - bbox_for_sampled_roi[:, 1]
base_ctr_y = bbox_for_sampled_roi[:, 0] + 0.5 * base_height
base_ctr_x = bbox_for_sampled_roi[:, 1] + 0.5 * base_width

(128, 4)


In [137]:
eps = np.finfo(height.dtype).eps
height = np.maximum(height, eps)
width = np.maximum(width, eps)

dy = (base_ctr_y - ctr_y) / height
dx = (base_ctr_x - ctr_x) / width
dh = np.log(base_height / height)
dw = np.log(base_width / width)

gt_roi_locs = np.vstack((dy, dx, dh, dw)).transpose()
print(gt_roi_locs.shape)

(128, 4)


In [138]:
rois = torch.from_numpy(sample_roi).float()

In [139]:
roi_indices = 0 * np.ones((len(rois),), dtype=np.int32) # image index; there is only one image here, so it is always 0
roi_indices = torch.from_numpy(roi_indices).float()
print(rois.shape, roi_indices.shape)

torch.Size([128, 4]) torch.Size([128])


<font face=楷体>Concatenate rois and roi_indices to get a tensor of shape [N, 5], reordered to (index, x1, y1, x2, y2), <font color=yellow>which is used in the next step to train Fast RCNN</font>  
</font>

In [140]:
indices_and_rois = torch.cat([roi_indices[:, None], rois], dim=1)
print(indices_and_rois)
xy_indices_and_rois = indices_and_rois[:, [0, 2, 1, 4, 3]]  # reorder columns: (index, y1, x1, y2, x2) -> (index, x1, y1, x2, y2)
print(xy_indices_and_rois)
indices_and_rois = xy_indices_and_rois.contiguous()
print(xy_indices_and_rois.shape)

tensor([[  0.0000,   0.0000,   2.9807, 466.0387, 365.0193],
        [  0.0000,   0.0000,  40.0000, 360.0000, 552.0000],
        [  0.0000,   0.0000, 146.9807, 562.0387, 509.0193],
        [  0.0000,  82.9807,   0.0000, 445.0193, 594.0387],
        [  0.0000,   0.0000, 136.0000, 408.0000, 648.0000],
        [  0.0000,   0.0000,  34.9807, 578.0387, 397.0193],
        [  0.0000,   0.0000,   0.0000, 360.0000, 328.0000],
        [  0.0000,  24.0000,   8.0000, 536.0000, 520.0000],
        [  0.0000,  98.9807,   0.0000, 461.0193, 434.0387],
        [  0.0000,  34.9807,   0.0000, 397.0193, 690.0387],
        [  0.0000,   0.0000,  56.0000, 264.0000, 568.0000],
        [  0.0000,   0.0000,  72.0000, 488.0000, 584.0000],
        [  0.0000,   0.0000, 178.9807, 386.0387, 541.0193],
        [  0.0000,   0.0000,  29.9613, 333.0193, 754.0387],
        [  0.0000,   0.0000,  82.9807, 466.0387, 445.0193],
        [  0.0000,   0.0000,  98.9807, 626.0387, 461.0193],
        [  0.0000,   0.0000,   0.0000, 2

<font face=楷体 color=skyblue>Map the rois onto the feature map, crop the feature-map region covered by each roi, and pass it to the roi_pooling layer (7 x 7)</font>

<font face=楷体 color=green size=5>Reference links:</font>

<font face=楷体>【1】[The torch.narrow() function in PyTorch](https://www.cnblogs.com/qinduanyinghua/p/11862641.html)  
【2】[[...,1] versus [...,1:2] in Python](https://blog.csdn.net/weixin_41637329/article/details/88362345)
</font>


In [None]:
size = (7, 7)
# output_size is passed as one (H, W) tuple; a second positional argument would be read as return_indices
adaptive_max_pool = torch.nn.AdaptiveMaxPool2d(size)
output = []
rois = indices_and_rois.data.float()
rois[:, 1:].mul_(1/16.0)  # Subsampling ratio
rois = rois.long()  # truncate to integer feature-map coordinates
num_rois = rois.size(0)
for i in range(num_rois):
   roi = rois[i]
   im_idx = roi[0]  # tensor(0)
   # [..., roi[2]:(roi[4]+1), roi[1]:(roi[3]+1)]: crop the part of the feature map covered by the roi
   im = out_map.narrow(0, im_idx, 1)[..., roi[2]:(roi[4]+1), roi[1]:(roi[3]+1)]  # out_map: torch.Size([1, 512, 50, 50]), from out_map = faster_rcnn_fe_extractor(image)
   output.append(adaptive_max_pool(im))  # each pooled crop has shape [1, 512, 7, 7]
output = torch.cat(output, 0)
print(output.size())
#Out:
# torch.Size([128, 512, 7, 7])
# Reshape the tensor so that we can pass it through the feed forward layer.
k = output.view(output.size(0), -1)
print(k.shape)
#Out:
# torch.Size([128, 25088])
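
What the pooling layer does to one cropped region can be spelled out in NumPy. This is an illustrative sketch only: `roi_max_pool` is a hypothetical helper, and the bin boundaries use a floor/ceil rule that, to my understanding, matches how adaptive pooling divides its input; treat that correspondence as an assumption.

```python
import numpy as np

def roi_max_pool(feature, out_size=7):
    """Adaptive max pooling of a (C, H, W) array to (C, out_size, out_size)."""
    c, h, w = feature.shape
    out = np.empty((c, out_size, out_size), dtype=feature.dtype)
    for i in range(out_size):
        # Bin i covers rows floor(i*H/out) .. ceil((i+1)*H/out) - 1, so it is never empty.
        y0 = (i * h) // out_size
        y1 = ((i + 1) * h + out_size - 1) // out_size
        for j in range(out_size):
            x0 = (j * w) // out_size
            x1 = ((j + 1) * w + out_size - 1) // out_size
            out[:, i, j] = feature[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out

# A 2-channel 14x21 crop with increasing values, so each bin's maximum is its last element.
toy_feature = np.arange(2 * 14 * 21, dtype=np.float32).reshape(2, 14, 21)
pooled = roi_max_pool(toy_feature)
print(pooled.shape)
```

Whatever the size of the crop, the output is always 7×7 per channel, which is what lets rois of different sizes share the same fully connected layers.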

<font face=楷体 color=skyblue>Define the Fast RCNN classification and box-regression networks </font> 


In [142]:
roi_head_classifier = nn.Sequential(*[nn.Linear(25088, 4096),
                    nn.Linear(4096, 4096)])
cls_loc = nn.Linear(4096, 21 * 4) # VOC: 20 classes + 1 background, 4 box coordinates per class
cls_loc.weight.data.normal_(0, 0.01)
cls_loc.bias.data.zero_()
score = nn.Linear(4096, 21) # (VOC 20 classes + 1 background)

<font face=楷体 color=skyblue>Pass the roi-pooling output through the network defined above</font>

In [143]:
k = roi_head_classifier(k)
roi_cls_loc = cls_loc(k)
roi_cls_score = score(k)
print(roi_cls_loc.shape, roi_cls_score.shape)

torch.Size([128, 84]) torch.Size([128, 21])


## Loss functions

#### RPN loss

In [144]:
print(pred_anchor_locs.shape)  # location coefficients predicted by the RPN
print(pred_cls_scores.shape)  # class scores predicted by the RPN
print(anchor_locations.shape)   # target location coefficients for each anchor
print(anchor_labels.shape)  # target label for each anchor

torch.Size([1, 22500, 4])
torch.Size([1, 22500, 2])
(22500, 4)
(22500,)


<font face=楷体 color=skyblue>Rearrange so that predictions and targets line up row by row</font>

In [145]:
rpn_loc = pred_anchor_locs[0]
rpn_score = pred_cls_scores[0]
gt_rpn_loc = torch.from_numpy(anchor_locations)
gt_rpn_score = torch.from_numpy(anchor_labels)
print(rpn_loc.shape, rpn_score.shape, gt_rpn_loc.shape, gt_rpn_score.shape)

torch.Size([22500, 4]) torch.Size([22500, 2]) torch.Size([22500, 4]) torch.Size([22500])


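<font face=楷体 color=skyblue>pred_cls_scores and anchor_labels are the RPN's predicted and target objectness values.  
Cross-entropy loss is used for classification, where $y$ is the one-hot target and $\hat{y}$ is the softmax of the predicted scores $z$:</font>
$$H(y, \hat{y}) = -\sum_i y_i \log(\hat{y}_i)\\
\hat{y}_i = e^{z_i} / \sum_{j} e^{z_j}$$

<font face=楷体>Computing the loss with PyTorch:  
rpn_cls_loss = F.cross_entropy(rpn_score, gt_rpn_score.long(), ignore_index = -1)</font>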

In [146]:
import torch.nn.functional as F
#gt_rpn_score = torch.autograd.Variable(gt_rpn_score.long())
rpn_cls_loss = F.cross_entropy(rpn_score, gt_rpn_score.long(), ignore_index = -1)
print(rpn_cls_loss)

tensor(0.6931, grad_fn=<NllLossBackward>)
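
The printed value is very close to ln 2 ≈ 0.6931, which is what a two-class softmax gives when both scores are (nearly) equal, as is plausible for a freshly initialised RPN with small weights and zero biases. A NumPy sketch of cross-entropy with an ignored label shows this; the function name `cross_entropy_ignore` and the toy inputs are assumptions for illustration, not the PyTorch implementation.

```python
import numpy as np

def cross_entropy_ignore(logits, targets, ignore_index=-1):
    """Mean softmax cross-entropy over rows whose target is not ignore_index."""
    keep = targets != ignore_index
    logits, targets = logits[keep], targets[keep]
    # Log-softmax, with the row maximum subtracted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(targets)), targets].mean()

toy_logits = np.zeros((5, 2))              # equal scores for background and foreground
toy_targets = np.array([1, 0, -1, -1, 0])  # two anchors are ignored
toy_loss = cross_entropy_ignore(toy_logits, toy_targets)
print(toy_loss)
```

Anchors labelled -1 contribute nothing, so only the sampled positives and negatives shape the classification loss.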


<font face=楷体>
Regression uses the smooth L1 loss:
$$\begin{aligned}
L_{loc}(t^u,v)&=\sum_{i\in \{x,y,w,h\}}\mathrm{smooth}_{L_1}(t^u_i-v_i)\\
\mathrm{smooth}_{L_1}(x) &=\begin{cases} 0.5\,x^2 & \text{if } |x|<1\\
|x|-0.5 & \text{otherwise}\end{cases}
\end{aligned}$$

Smooth L1 is used rather than L2 because the values predicted by the RPN regression head are unbounded. The regression loss is applied only to boxes with a positive label:
</font>
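
The piecewise definition above can be sketched for a single box in plain Python. The function names `smooth_l1` and `loc_loss` and the sample coefficients are hypothetical, chosen only for illustration.

```python
def smooth_l1(x):
    """Smooth L1 of a single residual x (transition point at |x| = 1)."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1 else ax - 0.5

def loc_loss(pred, target):
    """Sum of smooth L1 over the four box coefficients."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

pred = [0.2, -0.5, 2.0, 0.0]    # hypothetical predicted coefficients
target = [0.0, 0.0, 0.0, 0.0]   # hypothetical target coefficients
print(loc_loss(pred, target))
```

Small residuals (0.2 and -0.5) are penalised quadratically, while the large residual (2.0) is penalised linearly, so a single outlier does not dominate the loss.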


In [147]:
pos = gt_rpn_score > 0  # anchors with label > 0 (positive anchors)
print(pos.shape)
mask = pos.unsqueeze(1).expand_as(rpn_loc)  # unsqueeze(1) adds a dimension so the mask covers all 4 coordinates
print(mask.shape)
print(mask)

torch.Size([22500])
torch.Size([22500, 4])
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False],
        ...,
        [False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


<font face=楷体 color=skyblue>Select the boxes with positive labels:</font>  


In [148]:
mask_loc_preds = rpn_loc[mask].view(-1, 4)
mask_loc_targets = gt_rpn_loc[mask].view(-1, 4)
print(mask_loc_preds.shape, mask_loc_targets.shape)

torch.Size([18, 4]) torch.Size([18, 4])


<font face=楷体 color=skyblue>Regression loss</font>

In [149]:
x = torch.abs(mask_loc_targets - mask_loc_preds)
rpn_loc_loss = ((x < 1).float() * 0.5 * x**2) + ((x >= 1).float() * (x-0.5))
print(rpn_loc_loss.sum())

tensor(1.1629, dtype=torch.float64, grad_fn=<SumBackward0>)


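<font face=楷体 color=skyblue>Combine rpn_cls_loss and rpn_reg_loss (rpn_loc_loss in the code). The classification loss is applied to all labeled anchors and the cross-entropy is already averaged, whereas the regression loss is applied only to positive anchors, so it is divided by the number of positive anchors. rpn_lambda is a hyperparameter that weights the regression term more heavily:</font>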

In [150]:
rpn_lambda = 10.  
N_reg = (gt_rpn_score >0).float().sum()
rpn_loc_loss = rpn_loc_loss.sum() / N_reg
rpn_loss = rpn_cls_loss + (rpn_lambda * rpn_loc_loss)
print(rpn_loss)

tensor(1.3392, dtype=torch.float64, grad_fn=<AddBackward0>)
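
The combination can be checked by hand from the rounded values printed in the cells above. The helper `combined_loss` is a hypothetical name used only for this check. The count of 18 positive anchors is taken from the printed shape `torch.Size([18, 4])`.

```python
def combined_loss(cls_loss, loc_loss_sum, n_pos, lam=10.0):
    """cls_loss + lam * (summed regression loss / number of positives)."""
    return cls_loss + lam * (loc_loss_sum / n_pos)

# Values copied from the printed outputs above (rounded to 4 decimals).
rpn_loss_check = combined_loss(0.6931, 1.1629, 18)
print(round(rpn_loss_check, 4))  # 1.3392
```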


#### Fast RCNN loss

<font face=楷体 color=skyblue>Fast RCNN predictions</font>

In [151]:
print(roi_cls_loc.shape)
print(roi_cls_score.shape)

torch.Size([128, 84])
torch.Size([128, 21])


<font face=楷体 color=skyblue>Training targets </font>

In [152]:
print(gt_roi_locs.shape)
print(gt_roi_labels.shape)

(128, 4)
(128,)


<font face=楷体 color=skyblue>Convert to Torch tensors</font>

In [153]:
gt_roi_loc = torch.from_numpy(gt_roi_locs)
gt_roi_label = torch.from_numpy(np.float32(gt_roi_labels)).long()
print(gt_roi_loc.shape, gt_roi_label.shape)

torch.Size([128, 4]) torch.Size([128])


<font face=楷体 color=skyblue>Classification loss</font>

In [154]:
#gt_roi_label = torch.autograd.Variable(gt_roi_label)
roi_cls_loss = F.cross_entropy(roi_cls_score, gt_roi_label, ignore_index=-1)
print(roi_cls_loss)

tensor(3.0366, grad_fn=<NllLossBackward>)


<font face=楷体 color=skyblue>Regression loss  
Each RoI has 21 (num_classes + background) predicted bounding boxes. Only the boxes with a positive label (P_i^*) are used to compute the loss</font>

In [155]:
n_sample = roi_cls_loc.shape[0]
roi_loc = roi_cls_loc.view(n_sample, -1, 4)
print(roi_loc.shape)
#Out:
#torch.Size([128, 21, 4])
roi_loc = roi_loc[torch.arange(0, n_sample).long(), gt_roi_label]
print(roi_loc.shape)
#Out:
#torch.Size([128, 4])

torch.Size([128, 21, 4])
torch.Size([128, 4])
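
The indexing `roi_loc[torch.arange(0, n_sample).long(), gt_roi_label]` picks, for each RoI, the four coefficients predicted for that RoI's ground-truth class. A minimal sketch of the same selection on nested lists, with made-up sizes (2 RoIs, 3 classes) and hypothetical names:

```python
# Toy stand-in for roi_cls_loc.view(n_sample, -1, 4): 2 RoIs, 3 classes, 4 coefficients.
roi_loc_all = [
    [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]],   # RoI 0
    [[3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5]],   # RoI 1
]
labels = [2, 0]  # hypothetical ground-truth class per RoI

# For RoI i, keep only the row belonging to class labels[i].
selected = [roi_loc_all[i][c] for i, c in enumerate(labels)]
print(selected)  # [[2, 2, 2, 2], [3, 3, 3, 3]]
```

The result has one row of four coefficients per RoI, mirroring the change from `[128, 21, 4]` to `[128, 4]` in the output above.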


<font face=楷体 color=skyblue>
Compute the regression loss the same way as for the RPN
</font>

In [156]:
# Compute the regression loss the same way as for the RPN
# roi_loc_loss = REGLoss(roi_loc, gt_roi_loc)

pos = gt_roi_label.data > 0  # the regression loss is applied only to RoIs with a positive label
mask = pos.unsqueeze(1).expand_as(roi_loc)
print(mask.shape)  # (128, 4)

# Select the boxes with positive labels
mask_loc_preds = roi_loc[mask].view(-1, 4)
mask_loc_targets = gt_roi_loc[mask].view(-1, 4)
print(mask_loc_preds.shape, mask_loc_targets.shape)  # both (n_pos, 4)

# Note: from here on the loss is computed in NumPy, so no gradient flows back through it
x = np.abs(mask_loc_targets.numpy() - mask_loc_preds.data.numpy())
print (x.shape)  # (n_pos, 4)

roi_loc_loss = ((x < 1) * 0.5 * x**2) + ((x >= 1) * (x-0.5))
print(roi_loc_loss.sum())  # summed smooth L1 loss over the positive RoIs

N_reg = (gt_roi_label > 0).float().sum()
N_reg = np.squeeze(N_reg.data.numpy())
roi_loc_loss = roi_loc_loss.sum() / N_reg
roi_loc_loss = np.float32(roi_loc_loss)
print (roi_loc_loss)  # mean loss per positive RoI
# roi_loc_loss = torch.autograd.Variable(torch.from_numpy(roi_loc_loss))

torch.Size([128, 4])
torch.Size([19, 4]) torch.Size([19, 4])
(19, 4)
1.4763964655548227
0.07770508


<font face=楷体 color=skyblue>Total RoI loss</font>

In [157]:
roi_lambda = 10.
#roi_cls_loss = np.squeeze(roi_cls_loss.data.numpy())
roi_loss = roi_cls_loss + (roi_lambda * roi_loc_loss)
print(roi_loss)

tensor(3.8137, grad_fn=<AddBackward0>)


#### Total loss: RPN + Fast RCNN

In [158]:
total_loss = rpn_loss + roi_loss
print(total_loss)

tensor(5.1529, dtype=torch.float64, grad_fn=<AddBackward0>)
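
The RoI and total losses can be re-derived from the values printed in this run. The sketch below copies those printed numbers (19 positive RoIs, summed regression loss 1.4763964655548227, roi_cls_loss 3.0366, rpn_loss 1.3392). Because the inputs are rounded, small deviations in the last digit are possible.

```python
roi_lambda = 10.0

roi_loc_loss = 1.4763964655548227 / 19          # summed loss / number of positive RoIs
roi_loss = 3.0366 + roi_lambda * roi_loc_loss   # roi_cls_loss + roi_lambda * roi_loc_loss
total_loss = 1.3392 + roi_loss                  # rpn_loss + roi_loss

print(round(roi_loc_loss, 8), round(roi_loss, 4), round(total_loss, 4))
```

The three results agree with the printed 0.07770508, 3.8137 and 5.1529 to the displayed precision.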
