A Crooked Walkthrough of Faster R-CNN (Keras Version) (Part 1)

First, the overall flowchart:

[Figure: overall Faster R-CNN pipeline]

The input image first goes through conv layers to extract a feature map; the conv layers can be VGG16 or some other convolutional backbone. The feature map then flows into two branches, one of which is the RPN (region proposal network). Let's look at that branch first.

[Figure: the RPN branch]

Anchors take two parameters; let's look at the code (the implementation I picked up isn't great: quite a few places are baffling and needlessly obscure).

First, the two anchor parameters are passed in from upstream:


def calc_rpn(C, img_data, width, height, resized_width, resized_height):

   downscale = float(C.rpn_stride)
   anchor_sizes = C.anchor_box_scales
   anchor_ratios = C.anchor_box_ratios

# anchor box scales
self.anchor_box_scales = [64, 128, 256, 512]

# anchor box ratios
self.anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]
The anchor defaults are visible in the config file (though some articles say the default scales are the three values 128, 256, 512; this implementation has four).

[Figure: k anchors of different scales and aspect ratios around one base point]

Look at the figure: at each point of the feature map we predict multiple region proposals. Concretely: map each feature point back to the center of its receptive field in the original image and treat that as a base point, then place k anchors with different scales and aspect ratios around that base point.
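That layout can be sketched standalone (the helper name `anchors_at` is mine, not the repo's; the config values are the defaults quoted above):

```python
# Sketch of per-cell anchor generation, using the repo's default config values.
anchor_box_scales = [64, 128, 256, 512]
anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]
rpn_stride = 16  # shrink factor from image to feature map

def anchors_at(ix, jy):
    """Return (x1, y1, x2, y2) for every anchor centered on feature cell (ix, jy)."""
    # Map the feature-map cell back to resized-image coordinates (cell center)
    cx = rpn_stride * (ix + 0.5)
    cy = rpn_stride * (jy + 0.5)
    boxes = []
    for size in anchor_box_scales:
        for rx, ry in anchor_box_ratios:
            w, h = size * rx, size * ry
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

boxes = anchors_at(10, 10)
# k = 4 scales * 3 ratios = 12 anchors per feature point
```

So with this config k = 12, not the k = 9 of the original paper (which uses three scales).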

Back in the code, downscale is the shrink factor from the original image down to the feature map:

# stride at the RPN (this depends on the network configuration)
self.rpn_stride = 16
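Given that stride, the feature-map (output) size follows directly from the resized image size. A quick sanity check (hypothetical helper, not from the repo):

```python
rpn_stride = 16  # VGG16-style backbone: four 2x poolings

def output_size(resized_width, resized_height):
    # Feature-map (output) dimensions for a stride-16 backbone
    return resized_width // rpn_stride, resized_height // rpn_stride

print(output_size(600, 800))  # (37, 50)
```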

# size to resize the smallest side of the image
self.im_size = 300

# get image dimensions for resizing
resized_width, resized_height, _ = get_new_img_size(width, height, C.im_size)
def get_new_img_size(width, height, img_min_side=600):
   """
   Get the resized shape, keeping the same ratio
   """
   if width <= height:
      f = float(img_min_side) / width
      resized_height = int(f * height)
      resized_width = img_min_side
   else:
      f = float(img_min_side) / height
      resized_width = int(f * width)
      resized_height = img_min_side

   return resized_width, resized_height, f
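A couple of worked calls make the behavior clear (the function is restated so the snippet runs standalone):

```python
def get_new_img_size(width, height, img_min_side=600):
    # Scale so that the SMALLER side equals img_min_side, keeping the aspect ratio
    if width <= height:
        f = float(img_min_side) / width
        resized_height = int(f * height)
        resized_width = img_min_side
    else:
        f = float(img_min_side) / height
        resized_width = int(f * width)
        resized_height = img_min_side
    return resized_width, resized_height, f

# Landscape 800x600: the smaller side (height) is already 600, so nothing changes
print(get_new_img_size(800, 600))  # (800, 600, 1.0)
# A smaller 400x300 image is scaled UP so its smaller side reaches 600
print(get_new_img_size(400, 300))  # (800, 600, 2.0)
```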
I don't yet see why the width and height need to be reshaped here; the effect is that the smaller side ends up equal to the value you set. Then:

# resize the image so that the smallest side has length = 600px
x_img = cv2.resize(x_img, (resized_width, resized_height), interpolation=cv2.INTER_CUBIC)

y_rpn_cls, y_rpn_regr = calc_rpn(C, img_data_aug, width, height, resized_width, resized_height)
Both the original image's width/height and the resized width/height are passed in here.
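One reason both sets of dimensions are needed: the ground-truth boxes have to be scaled from original to resized coordinates. The excerpt doesn't show how the gta array is built, but judging from the iou(...) call further down, its column layout is (x1, x2, y1, y2) in resized-image coordinates. A sketch of that scaling step, with assumed field names:

```python
import numpy as np

def scale_gt_boxes(bboxes, width, height, resized_width, resized_height):
    # gta columns are (x1, x2, y1, y2), scaled into resized-image coordinates
    gta = np.zeros((len(bboxes), 4))
    for i, bbox in enumerate(bboxes):
        gta[i, 0] = bbox['x1'] * (resized_width / float(width))
        gta[i, 1] = bbox['x2'] * (resized_width / float(width))
        gta[i, 2] = bbox['y1'] * (resized_height / float(height))
        gta[i, 3] = bbox['y2'] * (resized_height / float(height))
    return gta

# A 400x300 image resized to 800x600 doubles every coordinate
gta = scale_gt_boxes([{'x1': 100, 'x2': 200, 'y1': 50, 'y2': 150}],
                     width=400, height=300, resized_width=800, resized_height=600)
```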

The bounding boxes (bbox) are computed against the resized width and height (x1, y1, x2, y2 are the top-left and bottom-right corners), as the code shows:

# rpn ground truth
for anchor_size_idx in range(len(anchor_sizes)):
   for anchor_ratio_idx in range(n_anchratios):
      anchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]
      anchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]  
      
      for ix in range(output_width):             
         # x-coordinates of the current anchor box  
         x1_anc = downscale * (ix + 0.5) - anchor_x / 2
         x2_anc = downscale * (ix + 0.5) + anchor_x / 2 
         
         # ignore boxes that go across image boundaries             
         if x1_anc < 0 or x2_anc > resized_width:
            continue
            
         for jy in range(output_height):

            # y-coordinates of the current anchor box
            y1_anc = downscale * (jy + 0.5) - anchor_y / 2
            y2_anc = downscale * (jy + 0.5) + anchor_y / 2

            # ignore boxes that go across image boundaries
            if y1_anc < 0 or y2_anc > resized_height:
               continue

            # bbox_type indicates whether an anchor should be a target 
            bbox_type = 'neg'

            # this is the best IOU for the (x,y) coord and the current anchor
            # note that this is different from the best IOU for a GT bbox
            best_iou_for_loc = 0.0

            for bbox_num in range(num_bboxes):
               
               # get IOU of the current GT box and the current anchor box
               curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]], [x1_anc, y1_anc, x2_anc, y2_anc])
               # calculate the regression targets if they will be needed
               if curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:
                  cx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0
                  cy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0
                  cxa = (x1_anc + x2_anc)/2.0
                  cya = (y1_anc + y2_anc)/2.0

                  tx = (cx - cxa) / (x2_anc - x1_anc)
                  ty = (cy - cya) / (y2_anc - y1_anc)
                  tw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))
                  th = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))
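The iou helper called above isn't part of the excerpt; a minimal sketch of box IoU over (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    # Intersection rectangle of the two boxes
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    iw = max(0.0, ix2 - ix1)
    ih = max(0.0, iy2 - iy1)
    inter = iw * ih
    # Union = sum of areas minus intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0 (identical boxes)
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0 (disjoint boxes)
```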
               

Then, based on the output width and height, the anchors' top-left and bottom-right coordinates are computed, and anchors that cross the image boundary are dropped. (At first I found it odd that the loops run over the output size rather than width and height, since the output dimensions have already changed. But it does check out: the loops run over feature-map cells, one set of anchors per cell, and downscale * (ix + 0.5) maps each cell back into resized-image coordinates, which is exactly what the boundary checks against resized_width and resized_height expect. I'll still compare with other implementations to be sure.)

Next, the IoU (intersection over union) between each anchor and the bboxes in the hand-annotated samples (the so-called ground truth) is computed. If that value is greater than

best_iou_for_bbox[bbox_num] or C.rpn_max_overlap (which defaults to 0.7),

(Here I was puzzled again: in this code best_iou_for_bbox is initialized to all zeros, so wouldn't every anchor go through the computation below? Not quite: curr_iou must be strictly greater, so anchors with zero overlap with a GT box are skipped, and best_iou_for_bbox is presumably updated further down in the full function.)

then the centers of the anchor and of the GT bbox are computed, followed by the center offsets (tx, ty) and the log width/height scalings (tw, th).
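A worked example of those four regression targets, using the same formulas as the code above (all values made up):

```python
import numpy as np

# Made-up ground-truth box and anchor, both in resized-image coordinates
gt = {'x1': 100.0, 'x2': 200.0, 'y1': 100.0, 'y2': 200.0}   # 100x100 GT box
x1_anc, y1_anc, x2_anc, y2_anc = 88.0, 120.0, 216.0, 184.0  # 128x64 anchor

# Centers of the GT box and of the anchor
cx = (gt['x1'] + gt['x2']) / 2.0
cy = (gt['y1'] + gt['y2']) / 2.0
cxa = (x1_anc + x2_anc) / 2.0
cya = (y1_anc + y2_anc) / 2.0

# Center offsets normalized by anchor size; log-scale width/height ratios
tx = (cx - cxa) / (x2_anc - x1_anc)
ty = (cy - cya) / (y2_anc - y1_anc)
tw = np.log((gt['x2'] - gt['x1']) / (x2_anc - x1_anc))
th = np.log((gt['y2'] - gt['y1']) / (y2_anc - y1_anc))

print(tx, ty, tw, th)  # tx = -0.015625, ty = -0.03125, tw ≈ -0.247, th ≈ 0.446
```

Note the normalization: the offsets are expressed in anchor widths/heights, and the size ratios are in log space, so the targets stay well-scaled regardless of the anchor's absolute size.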


Reference articles:

https://zhuanlan.zhihu.com/p/28585873

https://zhuanlan.zhihu.com/p/24916624

http://blog.****.net/shenxiaolu1984/article/details/51152614