Notes on Convolutional Neural Networks (from deeplearning.ai) WEEK 3

Week 3: Object Detection

1. Object Localization

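  • In parallel with the softmax classification layer, add a layer that outputs the four parameters of the object's bounding box: the position of its center point and its height and width.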
  • The training labels are defined as follows (the prediction $\hat{y}$ has the same layout):
    $$y=\left[\begin{matrix} p_c\\ b_x\\ b_y\\ b_h\\ b_w\\ c_1\\ c_2\\ c_3 \end{matrix}\right]$$
  • Here $p_c$ indicates whether the image contains an object: 1 if it does, 0 if it does not. $b_x, b_y, b_h, b_w$ are the bounding-box parameters and $c_1, c_2, c_3$ are the class indicators.
  • The loss function then becomes:
    $$L(\hat{y},y)=\begin{cases} (\hat{y}_1-y_1)^2+(\hat{y}_2-y_2)^2+\dots+(\hat{y}_8-y_8)^2, & \text{if } y_1=1\\ (\hat{y}_1-y_1)^2, & \text{if } y_1=0 \end{cases}$$
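
The case split in the loss can be sketched in a few lines of NumPy. This is only an illustration under the assumption that one example is an 8-element vector in the order $[p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$; the function name `localization_loss` is made up for this note.

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss for one example laid out as
    [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3] (hypothetical helper)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    if y[0] == 1:
        # Object present: all eight components contribute.
        return float(np.sum((y_hat - y) ** 2))
    # No object: only the p_c term is penalized, the rest is "don't care".
    return float((y_hat[0] - y[0]) ** 2)
```

When $y_1 = 0$, whatever the network predicts for the box and the classes has no effect on the loss.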

2. Landmark Detection

3. Object Detection

3.1 Sliding Windows Detection

  • Sliding windows: feeding every cropped window into the ConvNet for classification, one at a time, incurs a very large computational cost.

3.2 Convolutional implementation of sliding window

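  • Flattening the $5\times5\times16$ activation volume gives a vector of size $400$, and one forward pass through a fully connected layer with $400$ units is equivalent to a single convolution with $400$ kernels of size $5\times5\times16$. Both learn the same number of weights, $400\times400$.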
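
The equivalence can be checked numerically. The sketch below uses random weights and ignores biases; the variable names are illustrative and not taken from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 16))          # one 5x5x16 activation volume
W = rng.standard_normal((400, 5 * 5 * 16))   # FC weights: 400 units x 400 inputs

# Fully connected layer: flatten the volume, then a matrix-vector product.
fc_out = W @ x.reshape(-1)

# Convolution: view each row of W as one 5x5x16 kernel. On a 5x5 input a
# 5x5 kernel fits at exactly one position, so the output is 1x1x400.
kernels = W.reshape(400, 5, 5, 16)
conv_out = np.einsum("khwc,hwc->k", kernels, x)

same = np.allclose(fc_out, conv_out)
```

`W.size` is $400\times400 = 160000$ in both views, since the kernels are just a reshaped copy of the same weights.
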
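Why convert the fully connected layers into convolutional layers?
  • Because convolution is translation-equivariant, whereas a fully connected layer is not;
  • Translation equivariance: translating the image and then convolving gives the same result as convolving first and then translating;
  • So we can reason as follows: sliding the window first and feeding each crop into the ConvNet gives the same results as feeding the whole image into the ConvNet and then sliding over its output.
  • This is illustrated in: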

[Sermanet et al., 2014, OverFeat: Integrated recognition, localization and detection using convolutional networks]

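  • Therefore, compared with feeding each cropped window through the ConvNet separately, this approach computes the results for all windows in a single pass; all that remains is to look through the output for the bounding boxes with a positive detection.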
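
A toy check of this claim, assuming a made-up two-layer network (a 3x3 convolution, ReLU, then a second 3x3 convolution standing in for the converted fully connected layer) and a window size of 5x5 with stride 1. None of these sizes come from the course; they are chosen only to keep the example small.

```python
import numpy as np

def conv_valid(x, k):
    """Plain 'valid' 2-D cross-correlation with stride 1."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def tiny_net(x, k1, k2):
    """conv -> ReLU -> conv; a 5x5 input yields a single 1x1 score."""
    return conv_valid(np.maximum(conv_valid(x, k1), 0.0), k2)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))

# One pass over the whole image: a 4x4 map with one score per window.
score_map = tiny_net(image, k1, k2)

# Explicit sliding windows: crop each 5x5 window and run the net on it.
windowed = np.array([[tiny_net(image[i:i + 5, j:j + 5], k1, k2)[0, 0]
                      for j in range(4)] for i in range(4)])

same = np.allclose(score_map, windowed)
```

The single pass reuses the first-layer activations that neighbouring windows share, which is where the savings come from.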

3.2 Bounding box predictions

  • How can the accuracy of the bounding boxes be improved? - YOLO algorithm

[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]

  • $b_x, b_y \in [0,1]$
  • $b_h, b_w \in [0,+\infty)$
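
These ranges follow if the box is expressed relative to the grid cell that contains its center, which is the encoding assumed in the sketch below. The function name `encode_box`, the argument order, and the choice of image-normalized inputs in $[0,1]$ are illustrative assumptions, not something stated in these notes.

```python
def encode_box(cx, cy, w, h, grid=3):
    """Encode a box given in image-normalized coordinates (all in [0, 1])
    relative to the grid cell containing its center (hypothetical helper)."""
    # Index of the cell holding the center; clamp so cx == 1.0 stays in range.
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    # Center offset inside that cell: always within [0, 1].
    b_x = cx * grid - col
    b_y = cy * grid - row
    # Size measured in cell units: exceeds 1 when the box spans several cells.
    b_h = h * grid
    b_w = w * grid
    return row, col, b_x, b_y, b_h, b_w
```

For example, on a 4x4 grid a box centered at (0.625, 0.125) with width 0.5 and height 0.125 is assigned to row 0, column 2, with $b_x = b_y = 0.5$, $b_h = 0.5$ and $b_w = 2$.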

3.3 Intersection over union (IoU) - a measure of overlap

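  • IoU is a function for evaluating whether your object localization algorithm is accurate: the area of the intersection of the predicted and ground-truth boxes divided by the area of their union.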
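
A minimal IoU sketch, assuming boxes are given as corner coordinates `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`. This corner format is a choice made for the example; the notes themselves describe boxes by center, height and width.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is $1/(4+4-1) = 1/7$. A common convention is to count a prediction as correct when its IoU with the ground truth is at least 0.5.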

3.4 Non-max Suppression - keeping a single detection per object

  • Highlight the bounding box with the maximum $p_c$;
  • Suppress/darken the bounding boxes with non-maximum $p_c$ that overlap it heavily;
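
The two steps can be sketched as a greedy loop. The corner box format `(x1, y1, x2, y2)`, the helper `_iou`, and the default thresholds (0.6 on $p_c$, 0.5 on IoU) are assumptions made for this example, not values fixed by these notes.

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes (helper for this sketch)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return indices of the boxes that survive, highest p_c first."""
    # Drop low-confidence boxes, then sort the rest by descending p_c.
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    while candidates:
        best = candidates.pop(0)   # highlight the box with maximum p_c
        keep.append(best)
        # Suppress the remaining boxes that overlap it too much.
        candidates = [i for i in candidates
                      if _iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

With several classes, one option is to run this loop once per class so that boxes of different classes do not suppress each other.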

3.5 Anchor Boxes - handling overlapping objects

4. YOLO Algorithm - Summary

  • Training:
  • Prediction:
  • Non-max suppression

5. Region Proposals - R-CNN - Fast R-CNN - Faster R-CNN

[Girshick et al., 2013, Rich feature hierarchies for accurate object detection and semantic segmentation]

[Girshick, 2015. Fast R-CNN]
[Ren et al., 2016. Faster R-CNN: Towards real-time object detection with region proposal networks]