Notes on Convolutional Neural Networks (from deeplearning.ai) WEEK 3

Category: Articles • 2024-11-06 16:38:28

Week3: Object Detection

Contents

Week3: Object Detection

1. Object Localization
2. Landmark Detection
3. Object Detection

3.1 Sliding Windows Detection
3.2 Convolutional implementation of sliding window

Why convert fully connected layers into convolutional layers?

3.2 Bounding box predictions
3.3 Intersection over union (IoU) - a measure of overlap
3.4 Non-max Suppression - obtaining a single detection per object
3.5 Anchor Boxes - handling overlapping objects

4. YOLO Algorithm - Summary
5. Region Proposals - R-CNN - Fast R-CNN - Faster R-CNN

1. Object Localization

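Alongside the softmax classification output, add a parallel output layer that predicts the four parameters of the object's bounding box: the position of its center $(b_x, b_y)$ and its height and width $(b_h, b_w)$.
The training labels are defined as follows:
$y=\left[\begin{matrix} p_c\\ b_x\\ b_y\\ b_h\\ b_w\\ c_1\\ c_2\\ c_3 \end{matrix}\right]$
where $p_c$ indicates whether the image contains an object (1 if it does, 0 if it does not) and $c_1, c_2, c_3$ are the class indicators.
The loss function then becomes:
$L(\hat{y},y)=\begin{cases} (\hat{y}_1-y_1)^2+(\hat{y}_2-y_2)^2+\cdots+(\hat{y}_8-y_8)^2, & \text{if } y_1=1\\ (\hat{y}_1-y_1)^2, & \text{if } y_1=0 \end{cases}$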
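
The piecewise loss above can be sketched in a few lines of NumPy. This is only an illustration of the formula: the function name `localization_loss` and the sample vectors are made up for the example, and the 8-entry layout $[p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$ is taken from the label definition above.

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss from the notes (hypothetical helper name).

    Both vectors use the layout [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].
    """
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    if y[0] == 1:
        # An object is present: every component contributes to the loss.
        return float(np.sum((y_hat - y) ** 2))
    # No object: only the p_c term matters, the other entries are "don't care".
    return float((y_hat[0] - y[0]) ** 2)

# Illustrative values only.
y_obj = [1, 0.5, 0.5, 0.25, 0.25, 0, 1, 0]
y_none = [0, 0, 0, 0, 0, 0, 0, 0]
y_hat = [0.5, 0.5, 0.5, 0.5, 0.25, 0, 1, 0]

loss_obj = localization_loss(y_hat, y_obj)    # p_c and b_h terms contribute
loss_none = localization_loss(y_hat, y_none)  # only the p_c term contributes
print(loss_obj, loss_none)
```

When $y_1=0$ the predicted box and class entries are ignored, which is why the second call depends only on the first component.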

2. Landmark Detection

3. Object Detection

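3.1 Sliding Windows Detection

Sliding windows detection crops the image at every window position and feeds each crop through the ConvNet for classification, which is very computationally expensive.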
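
To get a feel for the cost, the sketch below simply counts how many crops a naive sliding window produces. The image size, window size and stride are arbitrary values chosen for the example, and `sliding_windows` is a hypothetical helper, not part of any library.

```python
def sliding_windows(img_h, img_w, win, stride):
    """Yield the (top, left) corner of every win x win window in the image."""
    for top in range(0, img_h - win + 1, stride):
        for left in range(0, img_w - win + 1, stride):
            yield top, left

# Illustrative sizes: a 28x28 image, a 14x14 window, stride 2.
n_windows = sum(1 for _ in sliding_windows(28, 28, 14, 2))
print(n_windows)  # each window would need its own forward pass
```

Even this tiny setting needs 64 separate forward passes; larger images, smaller strides and several window sizes multiply that number quickly.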

3.2 Convolutional implementation of sliding window

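Flattening the $5\times5\times16$ activation volume gives a vector of size $400$, which is then passed through a fully connected layer with $400$ units. This is equivalent to convolving the volume directly with $400$ filters of size $5\times5\times16$. Both versions learn the same number of weights, $400\times400$ (ignoring biases).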
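
The equivalence can be checked numerically. The sketch below uses random weights and a random $5\times5\times16$ volume; the variable names are illustrative, and biases are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((5, 5, 16))   # activation volume, 5*5*16 = 400 values
W = rng.standard_normal((400, 400))        # FC weights: 400 inputs -> 400 units

# Fully connected view: flatten the volume, then one matrix-vector product.
fc_out = W @ volume.reshape(-1)

# Convolutional view: reshape each row of W into one 5x5x16 filter.
# Applying a 5x5x16 filter to a 5x5x16 volume yields a single number,
# so 400 filters give a 1x1x400 output.
filters = W.reshape(400, 5, 5, 16)
conv_out = np.einsum('fhwc,hwc->f', filters, volume)

same = bool(np.allclose(fc_out, conv_out))
print(same)
```

Both paths use exactly the same 160,000 weights; only the way they are indexed differs.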

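Why convert fully connected layers into convolutional layers?

Because convolution is translation-equivariant, whereas a fully connected layer is not.
Translation equivariance means that translating the image and then convolving gives the same result as convolving first and then translating.
One can therefore picture it this way: sliding the window first and feeding each crop into the ConvNet separately gives the same results as feeding the whole image into the ConvNet once and then sliding over the output.
This is illustrated in:

[Sermanet et al., 2014, OverFeat: Integrated recognition, localization and detection using convolutional networks]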

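Therefore, instead of feeding each cropped window into the ConvNet separately, this approach computes the results for all windows in a single pass. All that remains is to look through the output for the windows in which an object was detected.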
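
A toy check of this claim, under simplifying assumptions: a single-channel network with one $3\times3$ convolution, a ReLU, and a final $3\times3$ "fully connected as convolution" layer, so that a $5\times5$ window maps to one score. The helpers `conv2d_valid` and `tiny_net` are written for this example only.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' cross-correlation of a 2-D array x with a 2-D kernel k."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def tiny_net(x, k1, k2):
    """Conv 3x3 -> ReLU -> 'FC as conv' 3x3; a 5x5 input gives a 1x1 output."""
    return conv2d_valid(np.maximum(conv2d_valid(x, k1), 0.0), k2)

rng = np.random.default_rng(0)
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))
image = rng.standard_normal((7, 7))

# One pass over the whole 7x7 image yields a 3x3 grid of window scores.
full_pass = tiny_net(image, k1, k2)

# Reference: crop every 5x5 window (stride 1) and run the network on each crop.
per_crop = np.array([[tiny_net(image[i:i + 5, j:j + 5], k1, k2)[0, 0]
                      for j in range(3)] for i in range(3)])

matches = bool(np.allclose(full_pass, per_crop))
print(matches)
```

The single pass reuses the intermediate feature map for overlapping windows, whereas the per-crop loop recomputes it nine times.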

3.2 Bounding box predictions

How can bounding boxes be made more accurate? - The YOLO algorithm

[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]

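In YOLO, the box coordinates are expressed relative to the grid cell that contains the object's center, so the center always lies inside the cell while the height and width may exceed one cell:
$b_x,b_y\in [0,1]$
$b_h,b_w \in[0,+\infty)$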
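
As a rough illustration of these ranges, the sketch below converts a box given as fractions of the image size into values relative to one grid cell. The function `encode_box`, its argument names, and the $4\times4$ grid are assumptions made for this example; they are not taken from the YOLO paper or from any library.

```python
def encode_box(cx, cy, w, h, grid_size):
    """Convert an image-normalized box to grid-cell-relative values.

    cx, cy, w, h are fractions of the image size in [0, 1].
    Returns (row, col, b_x, b_y, b_h, b_w) for the cell holding the center.
    """
    # Cell that contains the center (clamped so cx == 1.0 stays in the last cell).
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    # Center offset inside that cell: always within [0, 1].
    b_x = cx * grid_size - col
    b_y = cy * grid_size - row
    # Size measured in cell units: can be larger than 1.
    b_w = w * grid_size
    b_h = h * grid_size
    return row, col, b_x, b_y, b_h, b_w

# Illustrative box: centered at (0.625, 0.375), half the image wide, a quarter tall.
target = encode_box(cx=0.625, cy=0.375, w=0.5, h=0.25, grid_size=4)
print(target)
```

Here the box is two cells wide, so $b_w = 2$ even though the center offsets stay at $0.5$.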

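3.3 Intersection over union (IoU) - a measure of overlap

IoU is used to evaluate how accurately an object localization algorithm places its bounding box: it is the area of the intersection of the predicted and ground-truth boxes divided by the area of their union.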
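
A minimal sketch of the computation, assuming axis-aligned boxes given as corner coordinates `(x1, y1, x2, y2)`; that box format and the function name `iou` are choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
print(overlap)
```

Identical boxes give an IoU of 1 and disjoint boxes give 0, so the value can be compared against a fixed threshold.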

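3.4 Non-max Suppression - obtaining a single detection per object

Highlight the bounding box with the maximum $p_c$;
Suppress (darken) the boxes with non-maximum $p_c$ that overlap it heavily (high IoU), then repeat with the remaining boxes;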
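
The two steps can be sketched as a greedy loop. This is a simplified single-class version: the box format `(x1, y1, x2, y2)`, the helper names, and both threshold values are assumptions chosen for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.6):
    """Return the indices of the boxes that survive greedy non-max suppression."""
    # Drop low-confidence boxes, then visit the rest from highest p_c down.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)          # highlight the box with maximum p_c
        keep.append(best)
        # Suppress the remaining boxes that overlap the highlighted one too much.
        candidates = [i for i in candidates
                      if box_iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Illustrative data: boxes 0 and 1 cover the same object, box 2 is elsewhere,
# and box 3 has a p_c below the score threshold.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = non_max_suppression(boxes, scores)
print(kept)
```

With several classes, the same loop is usually run once per class.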

3.5 Anchor Boxes - handling overlapping objects

4. YOLO Algorithm - Summary

Training:
Prediction:
Non-max suppression

5. Region Proposals - R-CNN - Fast R-CNN - Faster R-CNN

[Girshick et al., 2013, Rich feature hierarchies for accurate object detection and semantic segmentation]

[Girshick, 2015, Fast R-CNN]
[Ren et al., 2016, Faster R-CNN: Towards real-time object detection with region proposal networks]