<a href="https://colab.research.google.com/github/xiaochengJF/DeepLearning/blob/master/darknet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font face=STCAIYUN color=purple size=8>YOLOv3</font>
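<font face=楷体 color=skyblue size=4>YOLO stands for You Only Look Once</font>  
<font face=楷体 color=skyblue size=4>**Fully convolutional network** </font>  
<font face=楷体>
YOLO uses only convolutional layers, which makes it a fully convolutional network (an FCN is insensitive to the size of the input image). It has 75 convolutional layers, together with skip connections and upsampling layers. It uses no form of pooling: convolutional layers with stride 2 downsample the feature maps instead, which avoids the loss of low-level features that pooling causes.
    
The catch: if we want to process images in batches (the GPU processes the images of a batch in parallel, which is faster), all images must share a fixed height and width, because the images are concatenated into one large batch (many PyTorch tensors are merged into one).</font>  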
 


<font face=楷体 color=skyblue size=4>**YOLOv3 network architecture** </font>  

![YOLOv3 network architecture](https://img-blog.csdnimg.cn/20190824145218799.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MzcxMTU1NA==,size_16,color_FFFFFF,t_70)  

<font face=楷体 color=skyblue size=4>**Network output**</font>   

![Network output](https://blog.paperspace.com/content/images/2018/04/yolo-5.png)  

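<font face=楷体 color=skyblue size=4>Anchor boxes</font>  
<font face=楷体>
Predicting the width and height of the bounding box directly seems the most natural choice, but in practice it leads to unstable gradients during training. Most modern object detectors therefore predict log-space transforms, or offsets relative to predefined default boxes called anchors.  
These transforms are applied to the anchor boxes to obtain the predictions. YOLO v3 uses three anchors per detection scale, so every cell predicts 3 bounding boxes.  
</font>   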

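<font face=楷体 color=skyblue size=4>Center coordinates</font>    
<font face=楷体>
The center-coordinate predictions are passed through a sigmoid function, which forces the output to lie between 0 and 1.  
YOLO does not predict the absolute coordinates of the box center; it predicts offsets that are:  


*   relative to the top-left corner of the grid cell that predicts the object
*   normalized by the cell size of the feature map. In the figure above, the prediction for the center of the dog is (0.4, 0.7), so the center lies at (6.4, 6.7) on the 13 x 13 feature map  
</font>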
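
The two rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the notebook's code: the helper names `sigmoid` and `center_on_grid` and the raw values `t_x`, `t_y` are assumptions, chosen so that the sigmoid reproduces the (0.4, 0.7) offsets of the dog example.

```python
import math


def sigmoid(t):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def center_on_grid(t_x, t_y, cell_x, cell_y):
    """Center in feature-map units: sigmoid offset plus the cell's top-left corner."""
    return sigmoid(t_x) + cell_x, sigmoid(t_y) + cell_y


# Hypothetical raw outputs: the logits of 0.4 and 0.7
t_x = math.log(0.4 / 0.6)
t_y = math.log(0.7 / 0.3)

# The responsible cell is (6, 6) on the 13 x 13 feature map
b_x, b_y = center_on_grid(t_x, t_y, cell_x=6, cell_y=6)
print(round(b_x, 3), round(b_y, 3))
```

Because the sigmoid output never leaves (0, 1), the predicted center always stays inside the cell that is responsible for the object.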


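<font face=楷体 color=skyblue size=4>Predictions</font>  
<font face=楷体>Each bounding box predicts 5 values: $\color{pink}{t_x, t_y, t_w, t_h, t_o}$ ($t_o$ plays the role of the confidence in YOLOv1)  

*   $\color{pink}{t_x, t_y}$: after the sigmoid they lie between 0 and 1, which makes training more stable  
*   $\color{pink}{c_x, c_y}$: the horizontal and vertical distance of the cell from the top-left corner of the image  
*   $\color{pink}{p_w, p_h}$: the width and height of the anchor box (the bounding box prior)  
</font>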

 
$$
\begin{aligned} b_{x} &=\sigma\left(t_{x}\right)+c_{x} \\ b_{y} &=\sigma\left(t_{y}\right)+c_{y} \\ b_{w} &=p_{w} e^{t_{w}} \\ b_{h} &=p_{h} e^{t_{h}} \\ \operatorname{Pr}(\text { object }) * I O U(b, \text { object }) &=\sigma\left(t_{o}\right)\end{aligned}
$$
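
As a hedged illustration, the five equations above translate almost literally into Python. The function name `decode_box` and the sample anchor size are assumptions made for this sketch; all lengths are expressed in feature-map cells.

```python
import math


def decode_box(t_x, t_y, t_w, t_h, t_o, c_x, c_y, p_w, p_h):
    """Apply the five equations above to one raw prediction."""
    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    b_x = sigmoid(t_x) + c_x        # center x: offset inside the cell plus cell position
    b_y = sigmoid(t_y) + c_y        # center y
    b_w = p_w * math.exp(t_w)       # width: anchor width scaled by exp(t_w)
    b_h = p_h * math.exp(t_h)       # height: anchor height scaled by exp(t_h)
    objectness = sigmoid(t_o)       # Pr(object) * IoU(b, object)
    return b_x, b_y, b_w, b_h, objectness


# With all raw outputs equal to zero, the box sits in the middle of its cell,
# has exactly the size of the (hypothetical) anchor and an objectness of 0.5
box = decode_box(0.0, 0.0, 0.0, 0.0, 0.0, c_x=6, c_y=6, p_w=3.5, p_h=2.0)
print(box)  # (6.5, 6.5, 3.5, 2.0, 0.5)
```

The exponential keeps the predicted width and height positive for any raw output, which is one reason the size is predicted in log space.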

<font face=楷体 color=yellow>In Faster R-CNN:</font>
$$\begin{aligned}
t_x &= (x - x_a) / w_a, & t_y &= (y - y_a) / h_a \\[1ex]
t_w &= \log(w / w_a), & t_h &= \log(h / h_a) \\[1ex]
t_x^* &= (x^* - x_a) / w_a, & t_y^* &= (y^* - y_a) / h_a \\[1ex]
t_w^* &= \log(w^* / w_a), & t_h^* &= \log(h^* / h_a)
\end{aligned}$$  

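<font face=楷体 color=skyblue size=4>Bounding box dimensions</font>   
<font face=楷体>
The dimensions of the bounding box are predicted by applying a log-space transform to the output and multiplying by the anchor: the raw outputs $t_w, t_h$ are exponentiated and then scaled by the anchor dimensions $p_w, p_h$.  

</font>
![Bounding box regression](https://blog.paperspace.com/content/images/2018/04/yolo-regression-1.png)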


# Darknet.py

### 1.1 Parsing the configuration file

In [0]:
from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np
import cv2

#from util import * 

<font face=楷体>
Define a function parse_cfg that takes the path of the configuration file as input.  
    
It returns the <font color=skyblue size=4>**Net, Convolutional, Shortcut, Upsample, Route and YOLO**</font> blocks as a list called blocks.
</font>
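
To make the shape of the returned list concrete, here is a minimal sketch that parses a short excerpt written in the cfg format. `parse_cfg_text` and `SAMPLE_CFG` are hypothetical names introduced for this illustration; the function works on a string instead of a file path, so it is a simplified stand-in for parse_cfg rather than the notebook's implementation.

```python
SAMPLE_CFG = """\
[net]
height=416
width=416

# first convolutional block
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
"""


def parse_cfg_text(text):
    """Hypothetical string-based variant of parse_cfg, for illustration only."""
    blocks, block = [], {}
    for line in (raw.strip() for raw in text.split("\n")):
        if not line or line.startswith("#"):
            continue                        # skip blank lines and comments
        if line.startswith("["):            # a new block starts here
            if block:
                blocks.append(block)        # store the finished block
            block = {"type": line[1:-1].strip()}
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)                # store the last block
    return blocks


blocks = parse_cfg_text(SAMPLE_CFG)
print(blocks[0])
print(blocks[1]["type"], blocks[1]["filters"])
```

Every value stays a string at this stage, which is why the module-building code later converts entries such as filters and stride with int().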

In [0]:
def parse_cfg(cfgfile):
    """
    Takes a configuration file
    Returns a list of blocks. Each blocks describes a block in the neural
    network to be built. Block is represented as a dictionary in the list    
    """
    with open(cfgfile, 'r') as file:
        lines = file.read().split('\n')                    # store the lines in a list
    lines = [x.strip() for x in lines]                     # get rid of fringe whitespace
    lines = [x for x in lines if len(x) > 0]               # get rid of the empty lines
    lines = [x for x in lines if x[0] != '#']              # get rid of comments
    
    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            if len(block) != 0:          # If block is not empty, implies it is storing values of previous block.
                blocks.append(block)     # add it to the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()     
        else:
            key,value = line.split("=") 
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)

    return blocks

In [0]:
#blocks = parse_cfg("/content/gdrive/My Drive/YOLO/v3/cfg/yolov3.cfg")
#blocks

### 1.2 Building the PyTorch modules

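<font face=楷体 color=skyblue size=4>**The create_modules function builds the network modules from the blocks list returned by parse_cfg:**</font>

<font face=楷体>

*   First define the variable net_info, which stores information about the network

*  When an nn.ModuleList is added as a member of an nn.Module object (i.e. when modules are added to the network), the parameters of all nn.Module objects inside the nn.ModuleList are also registered as parameters of the enclosing nn.Module object (the network)

*   The depth of a convolution kernel is determined by the number of filters (the feature-map depth) of the previous layer, so we have to keep track of the number of filters of the layer the convolution is applied to. The variable prev_filters does this (it is initialized to 3 for the 3 RGB channels)

*  A route layer takes feature maps from earlier layers, so we need the number of filters not only of the previous layer but of every preceding layer. While iterating, the number of output filters of each block is appended to the list output_filters 

*  The nn.Sequential class executes a series of nn.Module objects in order. A block may contain several layers; they are chained together in an nn.Sequential through its add_module method
    
</font>  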
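
The filter bookkeeping described in the list above can be sketched without PyTorch. `track_filters` and `toy_blocks` are hypothetical names for this illustration; the block list is a toy example, not an excerpt from yolov3.cfg, and only the depth (not the spatial size) of each output is tracked.

```python
def track_filters(blocks, in_channels=3):
    """Return the output depth of every block, mirroring prev_filters / output_filters."""
    prev_filters = in_channels          # an RGB image has 3 channels
    output_filters = []
    for index, block in enumerate(blocks):
        if block["type"] == "convolutional":
            filters = int(block["filters"])
        elif block["type"] == "route":
            refs = [int(r) for r in block["layers"].split(",")]
            # positive references are absolute layer indices, negative ones are relative
            refs = [r if r > 0 else index + r for r in refs]
            # concatenation along the depth axis adds up the depths
            filters = sum(output_filters[r] for r in refs)
        else:
            filters = prev_filters      # shortcut, upsample and yolo keep the depth
        output_filters.append(filters)
        prev_filters = filters
    return output_filters


toy_blocks = [
    {"type": "convolutional", "filters": "32"},
    {"type": "convolutional", "filters": "64"},
    {"type": "convolutional", "filters": "64"},
    {"type": "shortcut", "from": "-2"},
    {"type": "route", "layers": "-1,1"},
    {"type": "convolutional", "filters": "255"},
]
print(track_filters(toy_blocks))  # [32, 64, 64, 64, 128, 255]
```

The route block is the reason a single prev_filters variable is not enough: it needs the depth of arbitrary earlier layers, here layers 3 and 1.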

<font face=楷体 color=yellow size=4>Open questions:</font>   

1. anchors = [anchors[i] for i in mask]   
2. detection = DetectionLayer(anchors)  
3. The route layer  
4. The mask of the YOLO layer  
line107:  anchors = [anchors[i] for i in mask]   

<font face=楷体 color=green size=4>**Useful links:**</font>  

[1. nn.Conv2d, nn.BatchNorm2d](https://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-nn/#torchnn)  
[2. torch.nn.Upsample](https://pytorch.org/docs/stable/nn.html?highlight=nn%20upsample#torch.nn.Upsample)  
[3. Description of all darknet layer types](https://blog.csdn.net/zhuiqiuk/article/details/88187034)   







In [0]:
def create_modules(blocks):
    net_info = blocks[0]  # Captures the information about the input and pre-processing
    module_list = nn.ModuleList()
    index = 0  # indexing blocks helps with implementing route  layers (skip connections)
    prev_filters = 3
    output_filters = []
    
    # Iterate over the list of blocks and create a PyTorch module for each block
    for index, x in enumerate(blocks[1:]): 
        module = nn.Sequential()
        #check the type of block
        #create a new module for the block
        #append to module_list
        
        # Create the convolutional layer
        # If it's a convolutional layer
        if (x["type"] == "convolutional"):
            # Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except:
                batch_normalize = 0
                bias = True

            filters = int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            # Add the convolutional layer
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
            module.add_module("conv_{0}".format(index), conv)
            
            # Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            # Check the activation.
            # It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace=True)
                module.add_module("leaky_{0}".format(index), activn)

        # Build the upsampling layer
        # If it's an upsampling layer
        # We use nearest-neighbour upsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor=2, mode="nearest")
            module.add_module("upsample_{}".format(index), upsample)
            

        # Route layer
        # If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')

            # Start  of a route
            start = int(x["layers"][0])

            # end, if there exists one.
            try:
                end = int(x["layers"][1])
            except:
                end = 0

            # Positive annotation (absolute layer index): convert it to an offset relative to this layer
            if start > 0:
                start = start - index

            if end > 0:
                end = end - index

            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)

            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters = output_filters[index + start]
                
        # Shortcut layer
        # shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)    
            
            
            
        # YOLO layer
        # Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",") 
            mask = [int(x) for x in mask] 

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in mask]  # keep only the anchors selected by this layer's mask

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)  
            
        # End of the loop body: do some bookkeeping
        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)
        
    return (net_info, module_list)

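## Defining the network


<font face=楷体 color=skyblue size=4>
Building a custom architecture in PyTorch with nn.Module:  
    
*   Define Darknet as a subclass of nn.Module and initialize it with the members blocks, net_info and module_list  

*   Override the forward method of nn.Module to implement the forward pass of the network. It does two things: 1) compute the output; 2) transform the output detection feature maps so that they are easier to process, e.g. so that the detection maps of several scales can be concatenated (the first element of self.blocks is the net block, which is not part of the forward pass, so we iterate over self.blocks[1:])

*   As in create_modules, module_list contains the modules of the network, appended in the same order as in the configuration file. This means we can obtain the output simply by running the input through each module in turn.
*   A route layer has to concatenate two feature maps. We call torch.cat with 1 as its second argument, because the feature maps are concatenated along the depth (channel) dimension  
    
</font>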
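
The difference between the route and the shortcut operation mentioned above can be seen on two small tensors. This is a minimal sketch; the tensor shapes are hypothetical and only chosen to resemble feature maps of the form (batch, channels, height, width).

```python
import torch

# Two hypothetical feature maps with the same spatial size but different depth
map1 = torch.zeros(1, 256, 26, 26)   # (batch, channels, height, width)
map2 = torch.ones(1, 512, 26, 26)

# Route layer: concatenate along dimension 1, the channel (depth) axis
routed = torch.cat((map1, map2), 1)
print(routed.shape)    # torch.Size([1, 768, 26, 26])

# Shortcut layer: element-wise sum, so both inputs must have identical shapes
shortcut = map2 + torch.ones(1, 512, 26, 26)
print(shortcut.shape)  # torch.Size([1, 512, 26, 26])
```

Concatenation adds up the channel counts, which is why create_modules sums the entries of output_filters for a route block, while a shortcut leaves the depth unchanged.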



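### Loading the weights
<font face=楷体 color=skyblue size=4>How the weights are stored:</font>  

<font face=楷体>
Weights belong to only two types of layers, <font color=skyblue>batch norm layers and convolutional layers</font>, and they are stored in exactly the same order as the layers are defined in the configuration file.  
    
The figure below shows how the weights are stored:
</font>

![How the weights are stored](https://img-blog.csdn.net/20180507102704610?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3UwMTE1MjA1MTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)  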
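
As a rough check of this layout, the sketch below counts how many float32 values one convolutional block consumes from the weights file, following the order used by load_weights (batch norm biases, weights, running means and running variances, or the convolution biases when there is no batch norm, followed by the convolution weights). The function name `conv_block_num_floats` is an assumption made for this illustration.

```python
def conv_block_num_floats(in_channels, filters, kernel_size, batch_normalize):
    """Number of float32 values one convolutional block reads from the weights file."""
    if batch_normalize:
        # batch norm biases, weights, running means and running variances
        head = 4 * filters
    else:
        # only the biases of the convolution
        head = filters
    conv_weights = filters * in_channels * kernel_size * kernel_size
    return head + conv_weights


# e.g. a 3 x 3 convolution from 3 to 32 channels with batch norm
print(conv_block_num_floats(3, 32, 3, batch_normalize=True))        # 992
# e.g. a 1 x 1 convolution from 1024 to 255 channels without batch norm
print(conv_block_num_floats(1024, 255, 1, batch_normalize=False))   # 261375
```

Summing this count over all convolutional blocks gives the number of values that should follow the header, which is a convenient sanity check for the read pointer after loading.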

<font face=楷体 color=yellow size=4>Open questions:</font>  
~~~
# Convolutional and upsampling layers
if module_type == "convolutional" or module_type == "upsample":
    x = self.module_list[i](x)
    outputs[i] = x 
line74: batch_normalize = int(self.blocks[i + 1]["batch_normalize"])    
line102:  bn_biases = bn_biases.view_as(bn.bias.data)
                
~~~


In [0]:
class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)
        
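    # Forward pass
    def forward(self, x, CUDA):
        detections = []
        modules = self.blocks[1:]
        outputs = {}  # We cache the outputs for the route layer    
        
        write = 0
        for i in range(len(modules)):
            module_type = (modules[i]["type"])    
            
            # Convolutional and upsampling layers
            if module_type == "convolutional" or module_type == "upsample":
                x = self.module_list[i](x)
                outputs[i] = x     
                
            # Route layer
            elif module_type == "route":
                layers = modules[i]["layers"]
                layers = [int(a) for a in layers]

                if (layers[0]) > 0:
                    layers[0] = layers[0] - i

                if len(layers) == 1:
                    x = outputs[i + (layers[0])]

                else:
                    if (layers[1]) > 0:
                        layers[1] = layers[1] - i

                    map1 = outputs[i + layers[0]]
                    map2 = outputs[i + layers[1]]

                    x = torch.cat((map1, map2), 1)  # concatenate along the depth (channel) dimension
                outputs[i] = x
                
            # Shortcut layer (skip connection)
            elif module_type == "shortcut":
                from_ = int(modules[i]["from"])
                x = outputs[i - 1] + outputs[i + from_]
                outputs[i] = x

            # YOLO (detection) layer
            elif module_type == "yolo":
                anchors = self.module_list[i][0].anchors
                inp_dim = int(self.net_info["height"])       # input resolution
                num_classes = int(modules[i]["classes"])     # number of classes of this yolo block
                x = predict_transform(x.data, inp_dim, anchors, num_classes, CUDA)
                if not write:                                # first detection map
                    detections = x
                    write = 1
                else:                                        # append the maps of the other scales
                    detections = torch.cat((detections, x), 1)
                outputs[i] = x

        return detections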
                
    def load_weights(self, weightfile):
        # Open the weights file
        fp = open(weightfile, "rb")

        # The first 160 bits (20 bytes) of the weights file hold 5 int32 values that make up the file header
        # The first 5 values are header information
        # 1. Major version number
        # 2. Minor Version Number
        # 3. Subversion number
        # 4,5. Images seen by the network (during training)
        header = np.fromfile(fp, dtype=np.int32, count=5)
        self.header = torch.from_numpy(header)
        self.seen = self.header[3]        
        
        # The rest of the values are the weights
        # Let's load them up
        weights = np.fromfile(fp, dtype=np.float32)
        fp.close()
        
        # Load the weights into the modules of the network, one module at a time
        ptr = 0
        for i in range(len(self.module_list)):
            module_type = self.blocks[i + 1]["type"]

            if module_type == "convolutional":        
                
                model = self.module_list[i]
                try:
                    batch_normalize = int(self.blocks[i + 1]["batch_normalize"])  # blocks includes the net block, module_list holds only the network layers, hence i + 1
                except:
                    batch_normalize = 0

                conv = model[0]
                
                # If batch_normalize is true, load the weights as follows
                if (batch_normalize):
                    bn = model[1]

                    # Get the number of weights of Batch Norm Layer
                    num_bn_biases = bn.bias.numel()

                    # Load the weights
                    bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                    ptr += num_bn_biases

                    bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases

                    bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases

                    bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases

                    # Cast the loaded weights into dims of model weights.
                    bn_biases = bn_biases.view_as(bn.bias.data)
                    bn_weights = bn_weights.view_as(bn.weight.data)
                    bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                    bn_running_var = bn_running_var.view_as(bn.running_var)

                    # Copy the data to model
                    bn.bias.data.copy_(bn_biases)
                    bn.weight.data.copy_(bn_weights)
                    bn.running_mean.copy_(bn_running_mean)
                    bn.running_var.copy_(bn_running_var)     
                    
                # If batch_normalize is not true, only the biases of the convolutional layer need to be loaded
                else:
                    # Number of biases
                    num_biases = conv.bias.numel()

                    # Load the weights
                    conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases])
                    ptr = ptr + num_biases

                    # reshape the loaded weights according to the dims of the model weights
                    conv_biases = conv_biases.view_as(conv.bias.data)

                    # Finally copy the data
                    conv.bias.data.copy_(conv_biases)
                    
                # Let us load the weights for the Convolutional layers
                num_weights = conv.weight.numel()

                # Do the same as above for weights
                conv_weights = torch.from_numpy(weights[ptr:ptr + num_weights])
                ptr = ptr + num_weights

                conv_weights = conv_weights.view_as(conv.weight.data)
                conv.weight.data.copy_(conv_weights)                    

                    


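Creating the route layer: first we extract the value of the layers attribute, cast it to integers and store it in a list.

 Then we create a new layer called EmptyLayer, which, as the name suggests, is just an empty layer.  
 
 <font color=red size=4>**Why do we need an empty layer?**</font>  
 
 To give the Route block a layer of its own, we would have to build an nn.Module object initialized with the value of the layers attribute as its member, then write the code that concatenates the feature maps and feeds them forward in that object's forward function, and finally call this layer from the forward function of the network. 
 
 But the concatenation code is very short (a call to torch.cat on the feature maps), so designing a layer in this way would add unnecessary abstraction and boilerplate. Instead, we put a dummy layer in place of the route layer and perform the concatenation directly in the forward function of the nn.Module object that represents darknet.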

In [0]:
class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

### Define DetectionLayer to hold the anchors used to detect bounding boxes
<font color=red size=4>nn.Module</font>

In [0]:
class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

In [0]:
def get_test_input():
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (416, 416))
    img_ = img[:, :, ::-1].transpose((2, 0, 1))
    img_ = img_[np.newaxis, :, :, :] / 255.0
    img_ = torch.from_numpy(img_).float()
    img_ = Variable(img_)
    return img_

# util.py

In [0]:
from __future__ import division

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np
import cv2
import matplotlib.pyplot as plt

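<font face=楷体>
The output of YOLO is a convolutional feature map that contains the bounding box attributes along its depth. The attributes of the boxes predicted by a cell are stacked one after another, so to access the second bounding box of the cell at (5,6) you have to index it as map[5,6, (5+C): 2*(5+C)]. This layout is inconvenient for output processing such as thresholding by object confidence, adding the grid offsets to the centers, or applying the anchors.

A second problem is that detection happens at three scales, so the prediction maps have different dimensions. Although the three feature maps differ in size, the output processing applied to them is the same, and it is much nicer to perform these operations on a single tensor than on three separate ones.

To solve these problems we introduce the function predict_transform.</font>

![Transformation performed by predict_transform](https://blog.paperspace.com/content/images/2018/04/bbox_-2.png#pic_center)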
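
The numbers behind the two problems above can be worked out in plain Python. This is a sketch under stated assumptions: a 416 x 416 input, the three YOLOv3 strides 32, 16 and 8, three anchors per scale and C = 80 classes; the helper names `prediction_counts` and `box_slice` are hypothetical.

```python
def prediction_counts(inp_dim=416, strides=(32, 16, 8), num_anchors=3):
    """Grid size and number of predicted boxes for each detection scale."""
    counts = []
    for stride in strides:
        grid = inp_dim // stride
        counts.append((grid, grid * grid * num_anchors))
    return counts


def box_slice(box_index, num_classes):
    """Depth range holding the attributes of the box_index-th box (0-based) of a cell."""
    attrs = 5 + num_classes
    return box_index * attrs, (box_index + 1) * attrs


counts = prediction_counts()
print(counts)                         # [(13, 507), (26, 2028), (52, 8112)]
print(sum(n for _, n in counts))      # 10647
print(box_slice(1, num_classes=80))   # (85, 170)
```

The total of 10647 boxes is the size of the second dimension of the single tensor obtained once the three transformed maps are concatenated.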


<font face=楷体 color=skyblue size=4>**predict_transform takes 5 parameters:**</font>  

<font face=楷体 color=pink>prediction (the output), inp_dim (the input image dimension), anchors, num_classes and an optional CUDA flag</font>  
<font face=楷体 color=yellow size=4>Open questions:</font>  
prediction
~~~
line5 : stride = inp_dim // prediction.size(2)  isn't the input image dimension multi-dimensional?
line12: prediction = prediction.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)
line14: anchors = [(a[0] / stride, a[1] / stride) for a in anchors]
line32: x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0) 
line34: prediction[:, :, :2] += x_y_offset
line42: anchors = anchors.repeat(grid_size * grid_size, 1).unsqueeze(0)
line43: prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4]) * anchors  
line48: prediction[:, :, :4] *= stride

~~~
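
Several of these lines become easier to follow with concrete numbers. The sketch below is an illustration in plain Python, not part of predict_transform: the helper names `flat_row` and `channel_range` are hypothetical, and the anchor (116, 90) is used as an example value. It assumes, as the detector code does, that inp_dim is a single integer because the network input is square.

```python
def flat_row(cell_y, cell_x, anchor, grid_size, num_anchors):
    """Row of a box in the (grid_size * grid_size * num_anchors, bbox_attrs) layout."""
    cell = cell_y * grid_size + cell_x      # cells are numbered row by row
    return cell * num_anchors + anchor      # the anchors of one cell are adjacent rows


def channel_range(anchor, bbox_attrs):
    """Channels of the raw feature map that hold the attributes of this anchor."""
    return anchor * bbox_attrs, (anchor + 1) * bbox_attrs


inp_dim, grid_size, num_anchors, num_classes = 416, 13, 3, 80
stride = inp_dim // grid_size               # 32 input pixels per feature-map cell
print(stride)

# Anchors are given in input pixels and divided by the stride to get feature-map units
anchor_px = (116, 90)
anchor_cells = (anchor_px[0] / stride, anchor_px[1] / stride)
print(anchor_cells)                          # (3.625, 2.8125)

# Second anchor (index 1) of the cell in row 6, column 6
print(flat_row(6, 6, 1, grid_size, num_anchors))   # 253
print(channel_range(1, 5 + num_classes))           # (85, 170)
```

Multiplying the first four attributes by stride at the end of predict_transform undoes the division above, so the final boxes are expressed in pixels of the network input.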

In [0]:
def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA=True):
    
    # Turn the figure above into code
    batch_size = prediction.size(0)  # size(0) is the length of the first dimension, e.g. the number of rows when prediction is 2-D
    stride = inp_dim // prediction.size(2) 
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)

    prediction = prediction.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
    prediction = prediction.transpose(1, 2).contiguous()
    prediction = prediction.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)
    
    anchors = [(a[0] / stride, a[1] / stride) for a in anchors]
    
    # Sigmoid the  centre_X, centre_Y. and object confidencce
    prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0])
    prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1])
    prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4])
    
    # Add the center offsets
    grid_len = np.arange(grid_size)
    a, b = np.meshgrid(grid_len, grid_len)  # a and b are both grid_size x grid_size matrices and are transposes of each other

    x_offset = torch.FloatTensor(a).view(-1, 1)
    y_offset = torch.FloatTensor(b).view(-1, 1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0) 

    prediction[:, :, :2] += x_y_offset    
    
    # log space transform height and the width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size * grid_size, 1).unsqueeze(0)
    prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4]) * anchors   
    
    # Apply the sigmoid to the class scores
    prediction[:, :, 5: 5 + num_classes] = torch.sigmoid((prediction[:, :, 5: 5 + num_classes])) 
    
    prediction[:, :, :4] *= stride  # rescale the box from feature-map units to input-image pixels
    
    return prediction    

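<font face=楷体 color=skyblue>The output has to pass an objectness score threshold and non-maximum suppression (NMS) to obtain what the rest of this notebook calls the "true" detections.  
We create a function named write_results for this.</font>  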
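
The two filtering steps can be sketched in plain Python for the detections of one image and one class. This is a simplified illustration, not the tensor-based write_results: the names `iou` and `filter_detections` and the sample boxes are assumptions, and the IoU here uses continuous coordinates without the +1 that bbox_iou adds.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2, ...) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def filter_detections(dets, confidence=0.5, nms_conf=0.4):
    """dets: list of (x1, y1, x2, y2, objectness) for one image and one class."""
    # 1) objectness threshold
    dets = [d for d in dets if d[4] > confidence]
    # 2) NMS: visit boxes by decreasing objectness and drop every box
    #    that overlaps an already kept box too much
    dets.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d, k) < nms_conf for k in kept):
            kept.append(d)
    return kept


dets = [
    (100, 100, 200, 200, 0.9),   # best box
    (105, 105, 205, 205, 0.8),   # near-duplicate of the best box
    (300, 300, 400, 400, 0.7),   # separate object
    (0, 0, 50, 50, 0.3),         # below the objectness threshold
]
kept = filter_detections(dets)
print(kept)
```

Only the best box and the separate object survive: the near-duplicate has an IoU of about 0.82 with the best box, well above the NMS threshold of 0.4.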

<font face=楷体 color=yellow size=4>Open questions:</font>
~~~
line5: conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
line6: prediction = prediction * conf_mask   
line32: max_conf, max_conf_score = torch.max(image_pred[:, 5:5 + num_classes], 1)
line34: max_conf, max_conf_score = torch.max(image_pred[:, 5:5 + num_classes], 1)
line35: max_conf = max_conf.float().unsqueeze(1)
line37: seq = (image_pred[:, :5], max_conf, max_conf_score)   
line38: image_pred = torch.cat(seq, 1)
line44: image_pred_ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7)  # ????????????
line47: img_classes = unique(image_pred_[:, -1])
~~~

In [0]:
def write_results(prediction, confidence, num_classes, nms=True, nms_conf=0.4):

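    # The prediction tensor holds information about B x 10647 bounding boxes
    # For every box whose objectness score is below the threshold, set all of its attributes (the whole row representing the box) to zero
    conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)  
    # e.g. torch.Size([3, 4, 5]) -> [:, :, 4] -> torch.Size([3, 4]) -> unsqueeze(2) -> torch.Size([3, 4, 1])
    prediction = prediction * conf_mask   
    
    # At this point a box is described by its center coordinates, width and height
    # The IoU of two boxes is easier to compute from the coordinates of two opposite corners
    # Convert (center x, center y, width, height) into (top-left x, top-left y, bottom-right x, bottom-right y)
    box_a = prediction.new(prediction.shape)  # new tensor of the same type, used as a temporary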
    
    box_a[:, :, 0] = (prediction[:, :, 0] - prediction[:, :, 2] / 2)
    box_a[:, :, 1] = (prediction[:, :, 1] - prediction[:, :, 3] / 2)
    box_a[:, :, 2] = (prediction[:, :, 0] + prediction[:, :, 2] / 2)
    box_a[:, :, 3] = (prediction[:, :, 1] + prediction[:, :, 3] / 2)
    prediction[:, :, :4] = box_a[:, :, :4]
    
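    # The number of "true" detections may differ from image to image
    # e.g. a batch of size 3 with images 1, 2 and 3 may have 5, 2 and 4 "true" detections respectively
    # Confidence thresholding and NMS therefore have to be done one image at a time: the operations cannot be vectorized across the batch, and we must loop over the first dimension of prediction (the index of the image within the batch)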
    batch_size = prediction.size(0)
    write = False
    for ind in range(batch_size):
        # select the image from the batch
        image_pred = prediction[ind]
        # confidence threshholding
        # NMS    

        # Get the class having maximum score, and the index of that class
        # Get rid of num_classes softmax scores
        # Add the class index and the class score of class having maximum score
        max_conf, max_conf_score = torch.max(image_pred[:, 5:5 + num_classes], 1)  # max over dim 1 (the class scores of each box): returns (max score, class index)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_score = max_conf_score.float().unsqueeze(1)
        seq = (image_pred[:, :5], max_conf, max_conf_score)
        image_pred = torch.cat(seq, 1)   
        
        # Get rid of the zero entries
        non_zero_ind = (torch.nonzero(image_pred[:, 4]))

        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7)  # keep only the rows with non-zero objectness
        except:
            continue  # no detections in this image: skip to the next one
            
        # Get the various classes detected in the image
        img_classes = unique(image_pred_[:, -1])     
        
        # WE will do NMS classwise
        for cls in img_classes:
        
            # get the detections with one particular class
            cls_mask = image_pred_ * (image_pred_[:, -1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:, -2]).squeeze()

            image_pred_class = image_pred_[class_mask_ind].view(-1, 7)

            # sort the detections such that the entry with the maximum objectness
            # confidence is at the top
            conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1]
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.size(0)  # number of detections of this class
            
            if nms:
                # Perform NMS
                # For each detection
                for i in range(idx):
                    # Get the IOUs of all boxes that come after the one we are looking at
                    # in the loop
                    try:
                        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i + 1:])
                    except ValueError:
                        break

                    except IndexError:
                        break

                    # Zero out all the detections that have IoU > threshold
                    iou_mask = (ious < nms_conf).float().unsqueeze(1)
                    image_pred_class[i + 1:] *= iou_mask

                    # Remove the zeroed-out entries
                    non_zero_ind = torch.nonzero(image_pred_class[:, 4]).squeeze()
                    image_pred_class = image_pred_class[non_zero_ind].view(-1, 7)
                    
            # Append the resulting detections to the output tensor
            batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
            seq = batch_ind, image_pred_class
            if not write:
                output = torch.cat(seq, 1)
                write = True
            else:
                out = torch.cat(seq, 1)
                output = torch.cat((output, out))     
                
    try:
        return output
    except:
        return 0               

<font face=楷体 color=skyblue>The same class may have several "true" detections, so we use the unique function to get the classes present in any given image</font>

Open questions

In [0]:
def unique(tensor):
    tensor_np = tensor.cpu().numpy()
    unique_np = np.unique(tensor_np)
    unique_tensor = torch.from_numpy(unique_np)

    tensor_res = tensor.new(unique_tensor.shape)
    tensor_res.copy_(unique_tensor)
    return tensor_res

**Computing IoU**

Open questions:  


line19: inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape).cuda()) * torch.max(
            inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape).cuda())
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1) # why +1?
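
One common reading of the +1, offered here as an assumption rather than the author's stated intent, is that the corner coordinates are treated as inclusive pixel indices: a box from x1 = 2 to x2 = 5 then covers the four pixels 2, 3, 4 and 5. The sketch below compares the formula with an explicit pixel count; the function names are hypothetical.

```python
def covered_pixels(x1, y1, x2, y2):
    """Count the integer pixels (x, y) with x1 <= x <= x2 and y1 <= y <= y2."""
    return sum(1 for x in range(x1, x2 + 1) for y in range(y1, y2 + 1))


def area_plus_one(x1, y1, x2, y2):
    """Area formula with the +1 used in bbox_iou."""
    return (x2 - x1 + 1) * (y2 - y1 + 1)


box = (2, 3, 5, 7)
print(covered_pixels(*box), area_plus_one(*box))   # 20 20

# A box whose two corners coincide still covers one pixel under this convention
print(area_plus_one(4, 4, 4, 4))                   # 1
```

For the floating-point boxes produced by the network the +1 is only a convention; it changes the IoU slightly, most noticeably for very small boxes.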

In [0]:
def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes


    """
    # Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    # get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    # Intersection area
    if torch.cuda.is_available():
        inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape).cuda()) * torch.max(
            inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape).cuda())
    else:
        inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape)) * torch.max(
            inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape))

    # Union Area
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area)

    return iou

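<font face=楷体 color=skyblue>Map the index of each class to its name string; the names are returned as a list in which the position of a name is its class index</font>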

In [0]:
def load_classes(namesfile):
    with open(namesfile, "r") as fp:
        names = fp.read().split("\n")[:-1]  # drop the empty string after the trailing newline
    return names

In [0]:
def get_test_input():
    img = cv2.imread("img/dog-cycle-car.png")
    img = cv2.resize(img, (416, 416))
    img_ = img[:, :, ::-1].transpose((2, 0, 1))
    img_ = img_[np.newaxis, :, :, :] / 255.0
    img_ = torch.from_numpy(img_).float()
    img_ = Variable(img_)
    return img_

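<font face=楷体 color=skyblue>OpenCV loads an image as a numpy array with BGR as the channel order. PyTorch expects its image input as <font color=pink>**[batch x channels x height x width], with channels in RGB order**</font>.  
Therefore the function prep_image converts the numpy array into PyTorch's input format</font>  

<font face=楷体 color=yellow size=4>Open questions</font>  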
~~~
line9: img = img[:,:,::-1].transpose((2,0,1)).copy()
~~~

In [0]:
def prep_image(img, inp_dim):
    """
    Prepare image for inputting to the neural network. 
    
    Returns a float tensor of shape (1, 3, inp_dim, inp_dim) with values in [0, 1]
    """

    img = cv2.resize(img, (inp_dim, inp_dim))
    img = img[:,:,::-1].transpose((2,0,1)).copy()
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
    return img

The function letterbox_image resizes an image while keeping its aspect ratio unchanged and fills the uncovered area with the color (128,128,128)
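
The geometry of this resize can be checked without OpenCV. The sketch below only computes the size of the resized image and where it is pasted on the canvas; `letterbox_geometry` is a hypothetical helper and the 832 x 416 image is an example value.

```python
def letterbox_geometry(img_w, img_h, inp_dim):
    """Size of the resized image and its top-left position on the padded canvas."""
    w, h = inp_dim
    scale = min(w / img_w, h / img_h)    # the tighter side decides the scale
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    left, top = (w - new_w) // 2, (h - new_h) // 2
    return new_w, new_h, left, top


# A wide 832 x 416 image placed on a 416 x 416 canvas
print(letterbox_geometry(832, 416, (416, 416)))   # (416, 208, 0, 104)
```

The image is halved to 416 x 208 and centered vertically, leaving two gray bands of 104 rows each above and below it.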

In [0]:
def letterbox_image(img, inp_dim):
    '''resize image with unchanged aspect ratio using padding'''
    img_w, img_h = img.shape[1], img.shape[0]
    w, h = inp_dim
    new_w = int(img_w * min(w/img_w, h/img_h))
    new_h = int(img_h * min(w/img_w, h/img_h))
    resized_image = cv2.resize(img, (new_w,new_h), interpolation = cv2.INTER_CUBIC)
    
    canvas = np.full((inp_dim[1], inp_dim[0], 3), 128)

    canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w,  :] = resized_image
    
    return canvas

# detector.py  
Required arguments: python detect.py --images dog-cycle-car.png --det det

In [0]:
from __future__ import division
import time
import torch 
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import cv2 
#from util import *
import argparse
import os 
import os.path as osp
#from darknet import Darknet
import pickle as pkl
import pandas as pd
import random

**The detection script needs command-line arguments, which we parse with Python's argparse module**

In [0]:
def arg_parse():
    """
    Parse arguments to the detect module
    
    """
    
    parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')
    
    parser.add_argument("--images", dest = 'images', help = 
                        "Image / Directory containing images to perform detection upon",
                        default = "imgs", type = str)
    parser.add_argument("--det", dest = 'det', help = 
                        "Image / Directory to store detections to",
                        default = "det", type = str)
    parser.add_argument("--bs", dest = "bs", help = "Batch size", default = 1)
    parser.add_argument("--confidence", dest = "confidence", help = "Object Confidence to filter predictions", default = 0.5)
    parser.add_argument("--nms_thresh", dest = "nms_thresh", help = "NMS Threshhold", default = 0.4)
    parser.add_argument("--cfg", dest = 'cfgfile', help = 
                        "Config file",
                        default = "cfg/yolov3.cfg", type = str)
    parser.add_argument("--weights", dest = 'weightsfile', help = 
                        "weightsfile",
                        default = "yolov3.weights", type = str)
    parser.add_argument("--reso", dest = 'reso', help = 
                        "Input resolution of the network. Increase to increase accuracy. Decrease to increase speed",
                        default = "416", type = str)
    
    return parser.parse_args()
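
A quick way to try such a parser without a command line, for example inside a notebook, is to pass the arguments as a list. The sketch below uses a reduced parser with three of the flags defined above; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Reduced parser with three of the flags defined above."""
    parser = argparse.ArgumentParser(description="YOLO v3 Detection Module")
    parser.add_argument("--images", dest="images", default="imgs", type=str)
    parser.add_argument("--bs", dest="bs", default=1)
    parser.add_argument("--confidence", dest="confidence", default=0.5)
    return parser


# Passing a list instead of letting argparse read sys.argv
args = build_parser().parse_args(["--images", "dog-cycle-car.png", "--bs", "4"])
print(args.images, repr(args.bs), repr(args.confidence))
# dog-cycle-car.png '4' 0.5
```

Flags declared without a type come back as strings when given on the command line and keep the type of their default otherwise, which is why the detector converts them explicitly with int() and float().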

In [0]:
args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()

**Load the class file**

In [0]:
num_classes = 80    #For COCO
classes = load_classes("data/coco.names")

**Initialize the network and load the weights**

In [0]:
#Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0 
assert inp_dim > 32

#If there's a GPU available, put the model on GPU
if CUDA:
    model.cuda()

#Set the model in evaluation mode
model.eval()

**Read the input images**

In [0]:
read_dir = time.time()
#Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img) for img in os.listdir(images)]
except NotADirectoryError:
    imlist = []
    imlist.append(osp.join(osp.realpath('.'), images))
except FileNotFoundError:
    print ("No file or directory with the name {}".format(images))
    exit()

Create the directory for saving the detections (defined by the det flag) if it does not exist yet

In [0]:
if not os.path.exists(args.det):
    os.makedirs(args.det)

Load the images with OpenCV

In [0]:
load_batch = time.time()
loaded_ims = [cv2.imread(x) for x in imlist]

In addition to the transformed images, we keep the list of original images and im_dim_list, which holds the dimensions of the original images.

In [0]:
#PyTorch Variables for images
im_batches = list(map(prep_image, loaded_ims, [inp_dim for x in range(len(imlist))]))

#List containing dimensions of original images
im_dim_list = [(x.shape[1], x.shape[0]) for x in loaded_ims]
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1,2)

if CUDA:
    im_dim_list = im_dim_list.cuda()

Create the batches

In [0]:
leftover = 0
if (len(im_dim_list) % batch_size):
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover            
    im_batches = [torch.cat((im_batches[i*batch_size : min((i +  1)*batch_size,
                       len(im_batches))]))  for i in range(num_batches)]  
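
The `leftover` logic above amounts to a ceiling division: a final, smaller batch is added when the image count is not a multiple of the batch size. A minimal sketch on plain Python lists (the helper name `make_batches` is hypothetical; in the notebook each chunk is additionally joined with `torch.cat`):

```python
def make_batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    num_batches = (len(items) + batch_size - 1) // batch_size  # ceiling division
    # Slicing past the end of a list is safe, so the last chunk may be shorter
    return [items[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]


print(make_batches(["a", "b", "c", "d", "e"], 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```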

In [0]:
write = 0
start_det_loop = time.time()
for i, batch in enumerate(im_batches):
    #load the image 
    start = time.time()
    if CUDA:
        batch = batch.cuda()

    # volatile=True has had no effect since PyTorch 0.4; torch.no_grad() disables autograd instead
    with torch.no_grad():
        prediction = model(batch, CUDA)

    prediction = write_results(prediction, confidence, num_classes, nms_conf = nms_thesh)

    end = time.time()

    if type(prediction) == int:

        for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", ""))
            print("----------------------------------------------------------")
        continue

    prediction[:,0] += i*batch_size    #transform the attribute from index in batch to index in imlist 

    if not write:                      #If we haven't initialised output
        output = prediction  
        write = 1
    else:
        output = torch.cat((output,prediction))

    for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
        im_id = i*batch_size + im_num
        objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
        print("{0:20s} predicted in {1:6.3f} seconds".format(image.split("/")[-1], (end - start)/batch_size))
        print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
        print("----------------------------------------------------------")

    if CUDA:
        torch.cuda.synchronize()           

**Draw the bounding boxes on the images**


In [0]:
try:
    output
except NameError:
    print ("No detections were made")
    exit()

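At this point, the predictions in the output tensor conform to the input size of the network, not the original size of each image. Before drawing the bounding boxes, we therefore have to transform the corner attributes of each box to the original dimensions of the image.  
Moreover, the predictions were made on the padded image, not on the original one, so simply rescaling them to the dimensions of the input image will not work. We first need to shift the box coordinates so that they are measured relative to the boundaries of the region of the padded image that contains the original image.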

In [0]:
im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())

scaling_factor = torch.min(inp_dim/im_dim_list,1)[0].view(-1,1)


output[:,[1,3]] -= (inp_dim - scaling_factor*im_dim_list[:,0].view(-1,1))/2
output[:,[2,4]] -= (inp_dim - scaling_factor*im_dim_list[:,1].view(-1,1))/2

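The coordinates now conform to the dimensions of the image inside the padded area. However, the function letterbox_image resized both dimensions of the image by a common scaling factor (to preserve the aspect ratio). We now undo this rescaling to obtain the coordinates of the bounding boxes on the original image.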

In [0]:
output[:,1:5] /= scaling_factor
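
To make the two steps concrete, here is a small worked sketch for a single box in plain Python. It assumes the image is centered in the square input, as the padding expressions above imply, and it ignores the integer rounding that the real resize performs. The helper name `unletterbox_box` is hypothetical.

```python
def unletterbox_box(box, orig_w, orig_h, inp_dim):
    """Map (x1, y1, x2, y2) from the padded square input back to the original image."""
    scale = min(inp_dim / orig_w, inp_dim / orig_h)
    pad_x = (inp_dim - scale * orig_w) / 2  # horizontal padding on each side
    pad_y = (inp_dim - scale * orig_h) / 2  # vertical padding on each side
    x1, y1, x2, y2 = box
    # Step 1: subtract the padding; step 2: undo the common scaling factor
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


# An 832x416 image in a 416x416 input: scale = 0.5, 104 px of padding above and below
print(unletterbox_box((104, 104, 312, 312), 832, 416, 416))
# (208.0, 0.0, 624.0, 416.0)
```

The box that spans the full height of the letterboxed region maps back to the full height of the original image, which is the expected behavior.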

Let us now clip any bounding boxes that may have boundaries outside the image to the edges of our image.

In [0]:
for i in range(output.shape[0]):
    output[i, [1,3]] = torch.clamp(output[i, [1,3]], 0.0, im_dim_list[i,0])
    output[i, [2,4]] = torch.clamp(output[i, [2,4]], 0.0, im_dim_list[i,1])

Draw the boxes in different colors

In [0]:
output_recast = time.time()    # end of detection; referenced by the time summary
class_load = time.time()
colors = pkl.load(open("pallete", "rb"))

Start drawing the boxes

In [0]:
draw = time.time()

def write(x, results, color):
    # Convert the corner coordinates to plain Python ints, as OpenCV expects
    c1 = tuple(int(v) for v in x[1:3])
    c2 = tuple(int(v) for v in x[3:5])
    img = results[int(x[0])]
    cls = int(x[-1])
    label = "{0}".format(classes[cls])
    # Outline of the detected object
    cv2.rectangle(img, c1, c2, color, 1)
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    # Filled rectangle behind the class label
    cv2.rectangle(img, c1, c2, color, -1)
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1)
    return img

In [0]:
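import random

# write() takes a color argument, so pick one at random from the palette for each box
list(map(lambda x: write(x, loaded_ims, random.choice(colors)), output))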

Each image is saved under its original name with the prefix "det_". We build a list of output paths to which the detection images will be saved.

In [0]:
det_names = pd.Series(imlist).apply(lambda x: "{}/det_{}".format(args.det,x.split("/")[-1]))
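
Splitting on "/" assumes POSIX-style paths. A minimal sketch of a more portable way to build the same names with `os.path` (the helper name `det_path` is hypothetical):

```python
import os.path as osp


def det_path(det_dir, image_path):
    """Build the output path <det_dir>/det_<original file name>."""
    return osp.join(det_dir, "det_" + osp.basename(image_path))


print(det_path("det", osp.join("imgs", "dog-cycle-car.png")))
```

With this helper, `[det_path(args.det, p) for p in imlist]` would produce the same list as the pandas expression above on Linux and macOS.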

Finally, write the images with their detections to the paths in det_names.

In [0]:
list(map(cv2.imwrite, det_names, loaded_ims))
end = time.time()

**Print the time summary**

In [0]:
print("SUMMARY")
print("----------------------------------------------------------")
print("{:25s}: {}".format("Task", "Time Taken (in seconds)"))
print()
print("{:25s}: {:2.3f}".format("Reading addresses", load_batch - read_dir))
print("{:25s}: {:2.3f}".format("Loading batch", start_det_loop - load_batch))
print("{:25s}: {:2.3f}".format("Detection (" + str(len(imlist)) +  " images)", output_recast - start_det_loop))
print("{:25s}: {:2.3f}".format("Output Processing", class_load - output_recast))
print("{:25s}: {:2.3f}".format("Drawing Boxes", end - draw))
print("{:25s}: {:2.3f}".format("Average time_per_img", (end - load_batch)/len(imlist)))
print("----------------------------------------------------------")


torch.cuda.empty_cache()

# Tests

<div id="测试一">Test 1</div>

In [0]:
!mkdir cfg
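# Note: `!cd` runs in a throwaway subshell, so the working directory does not change;
# yolov3.cfg is therefore downloaded to, and parsed from, the current directory.
!cd cfg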
!wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
blocks = parse_cfg("yolov3.cfg")
print(create_modules(blocks))

mkdir: cannot create directory ‘cfg’: File exists
--2019-08-22 02:51:16--  https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8342 (8.1K) [text/plain]
Saving to: ‘yolov3.cfg.2’


2019-08-22 02:51:16 (124 MB/s) - ‘yolov3.cfg.2’ saved [8342/8342]

({'type': 'net', 'batch': '64', 'subdivisions': '16', 'width': '608', 'height': '608', 'channels': '3', 'momentum': '0.9', 'decay': '0.0005', 'angle': '0', 'saturation': '1.5', 'exposure': '1.5', 'hue': '.1', 'learning_rate': '0.001', 'burn_in': '1000', 'max_batches': '500200', 'policy': 'steps', 'steps': '400000,450000', 'scales': '.1,.1'}, ModuleList(
  (0): Sequential(
    (conv_0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bia

Test 1

In [0]:
!wget https://pjreddie.com/media/files/yolov3.weights
model = Darknet("yolov3.cfg")
model.load_weights("yolov3.weights")

--2019-08-22 03:00:24--  https://pjreddie.com/media/files/yolov3.weights
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 248007048 (237M) [application/octet-stream]
Saving to: ‘yolov3.weights’


2019-08-22 03:00:29 (55.2 MB/s) - ‘yolov3.weights’ saved [248007048/248007048]



In [0]:
!wget https://github.com/ayooshkathuria/pytorch-yolo-v3/raw/master/dog-cycle-car.png

In [0]:
!wget https://pjreddie.com/media/files/yolov3.weights

In [0]:
model = Darknet("yolov3.cfg")
inp = get_test_input()
pred = model(inp, torch.cuda.is_available())
print (pred)

In [0]:
model = Darknet("yolov3.cfg")
model.load_weights("yolov3.weights")

In [0]:
!mkdir data

In [0]:
!cd data

In [0]:
# -P saves the file into data/, where load_classes("data/coco.names") expects it
!wget -P data https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names

In [0]:
st0= 'iisongiiihuaniiiigongi'

print(st0.split('i'))

['', '', 'song', '', '', 'huan', '', '', '', 'gong', '']


In [0]:
text = "hello boy<[www.doiido.com]>byebye"  # avoid shadowing the built-in str
st = text.split("[")[1]
print(st)
print(type(st))
s = st.split("]")[0]
print(s)

www.doiido.com]>byebye
<class 'str'>
www.doiido.com


In [0]:
s[1:-1]

'ww.doiido.co'

In [0]:
s[-1]

'm'

In [0]:
anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
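
The line above pairs up a flat list of anchor values. As a self-contained sketch of the whole step, assuming the `anchors` and `mask` strings are taken from a `[yolo]` block of the yolov3.cfg downloaded earlier (the helper name `parse_anchors` is hypothetical):

```python
def parse_anchors(anchor_str, mask_str):
    """Turn cfg strings into the list of (width, height) anchors a layer uses."""
    values = [int(v) for v in anchor_str.split(",")]  # int() tolerates the spaces
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    mask = [int(m) for m in mask_str.split(",")]
    # The mask selects which of the nine anchor pairs belong to this detection layer
    return [pairs[m] for m in mask]


anchors = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
print(parse_anchors(anchors, "6,7,8"))  # [(116, 90), (156, 198), (373, 326)]
```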

In [0]:
a = []
for i in range(0, 9,8):
    a.append(i)
print(a)

torch.Size([3, 2, 4, 5])