Faster R-CNN (object detection) implemented by Keras for custom data from Google’s Open Images Dataset V4
Faster R-CNN: object detection
- common benchmarks: PASCAL VOC 2007, 2012, and MS COCO
- custom classes used here: “Person”, “Car” and “Mobile phone” (Google’s Open Images Dataset V4)
Brief explanation:
- R-CNN (selective search):
  - uses ~2,000 proposed regions (rectangular boxes) from selective search
  - the 2,000 regions are passed to a pre-trained CNN model
  - the outputs (extracted features) are passed to an SVM for classification
- Fast R-CNN:
  - passes the original image to a pre-trained CNN model
  - the selective search algorithm is computed based on the output feature map of the previous step
  - an ROI pooling layer is used to ensure a standard, pre-defined output size (valid outputs are passed to a fully connected layer as inputs)
  - two output vectors are used to:
    - predict the observed object with a softmax classifier
    - adapt bounding box localisations with a linear regressor
- Faster R-CNN:
  1. an RPN (Region Proposal Network) replaces selective search; the RPN is built on a Conv layer with 3x3 filters, padding 1 and 512 output channels
  2. similar to Fast R-CNN, ROI pooling is used for these proposed regions (ROIs)
  3. a softmax function for classification and a linear regression to refine the boxes’ locations
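A minimal Keras sketch of such an RPN head, assuming 9 anchors per feature-map position and the sigmoid objectness output this repo uses (layer names are illustrative, not the repo's exact code):

```python
from tensorflow.keras import Input, Model, layers

num_anchors = 9  # assumed: 3 scales x 3 aspect ratios per position

# Backbone feature map, e.g. the last conv output of VGG-16 (512 channels)
feature_map = Input(shape=(None, None, 512))

# Shared 3x3 conv with "same" padding (padding 1) and 512 output channels
x = layers.Conv2D(512, (3, 3), padding="same", activation="relu")(feature_map)

# Objectness score per anchor (sigmoid, matching this repo's binary setup)
rpn_cls = layers.Conv2D(num_anchors, (1, 1), activation="sigmoid")(x)

# Four box-regression values (tx, ty, tw, th) per anchor
rpn_regr = layers.Conv2D(num_anchors * 4, (1, 1), activation="linear")(x)

rpn = Model(inputs=feature_map, outputs=[rpn_cls, rpn_regr])
```

Because both heads are 1x1 convolutions, the same weights slide over every feature-map position, which is what lets the RPN score all 4,050 anchors in one forward pass.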
Code explanation:
Part 1: Extract annotation for custom classes from Google’s Open Images Dataset v4 (Bounding Boxes)
Download and load three .csv files (from Figure Eight):
1. class-descriptions-boxable.csv
2. train-annotations-bbox.csv
3. train-images-boxable.csv
- train-images-boxable.csv
  - boxable image names
  - their URL links
- class-descriptions-boxable.csv
  - class names and their corresponding class LabelName
- train-annotations-bbox.csv
  - one bounding box (bbox for short) coordinates per row
  - the bbox’s LabelName and the current image’s ID (ImageID + ’.jpg’ = Image_name)
  - (XMin, YMin) is the top-left point of this bbox, (XMax, YMax) is the bottom-right point; the coordinates are normalized to [0, 1]
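A minimal pandas sketch of loading these files and recovering image names (the file paths, and the assumption that class-descriptions-boxable.csv ships without a header row, are mine):

```python
import pandas as pd

# Assumed local paths; adjust to wherever the files were downloaded
images_boxable = pd.read_csv("train-images-boxable.csv")
annotations_bbox = pd.read_csv("train-annotations-bbox.csv")
class_descriptions = pd.read_csv("class-descriptions-boxable.csv",
                                 header=None, names=["LabelName", "ClassName"])

# Look up the LabelName code for one class, e.g. "Person"
person_label = class_descriptions.loc[
    class_descriptions["ClassName"] == "Person", "LabelName"].iloc[0]

# All bbox rows for that class; ImageID + ".jpg" gives the image file name
person_bboxes = annotations_bbox[annotations_bbox["LabelName"] == person_label]
person_image_names = [i + ".jpg" for i in person_bboxes["ImageID"].unique()]
```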
Get a subset of the whole dataset:
- 1,000 images each for ‘Person’, ‘Mobile phone’ and ‘Car’
1. Download the 3,000 images and save their annotations in a .txt file; each row contains:
- file_path: absolute file path
- (x1,y1) and (x2,y2): top-left and bottom-right pixel coordinates in the original image
- class_name: class name of the current bounding box
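A sketch of producing such a row, assuming the bbox arrives with normalized [0, 1] coordinates from train-annotations-bbox.csv (the sample path and values are placeholders):

```python
from PIL import Image

# Placeholder row: (file_path, XMin, YMin, XMax, YMax, class_name),
# coordinates normalized to [0, 1] as stored in train-annotations-bbox.csv
rows = [("/data/train/000002b66c9c498e.jpg", 0.01, 0.15, 0.75, 0.90, "Person")]

with open("annotation.txt", "w") as f:
    for file_path, xmin, ymin, xmax, ymax, class_name in rows:
        width, height = Image.open(file_path).size
        # Convert normalized coordinates to pixel coordinates of the original image
        x1, y1 = int(xmin * width), int(ymin * height)
        x2, y2 = int(xmax * width), int(ymax * height)
        f.write(f"{file_path},{x1},{y1},{x2},{y2},{class_name}\n")
```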
Training: 80%
Test: 20%
The expected numbers of training and testing images are:
3x800 -> 2,400 and 3x200 -> 600 (some images may appear in more than one class)
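A minimal split sketch, assuming image_names holds the 1,000 sampled names of one class (placeholder names below):

```python
import random

random.seed(1)
image_names = [f"img_{i}.jpg" for i in range(1000)]  # placeholder names

random.shuffle(image_names)
train_names = image_names[:800]  # 80% for training
test_names = image_names[800:]   # 20% for testing
```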
Part 2: Faster R-CNN code
Rebuild the structure of VGG-16 and load the pre-trained model (nn_base)
Prepare training data and training labels (get_anchor_gt)
- input data: the annotation.txt file
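A simplified sketch of parsing annotation.txt into per-image records, a stand-in for the parsing that feeds get_anchor_gt (field order follows the format above):

```python
from collections import defaultdict

def parse_annotations(path="annotation.txt"):
    """Group bounding boxes by image: one record per image file."""
    data = defaultdict(list)
    with open(path) as f:
        for line in f:
            file_path, x1, y1, x2, y2, class_name = line.strip().split(",")
            data[file_path].append(
                {"x1": int(x1), "y1": int(y1),
                 "x2": int(x2), "y2": int(y2),
                 "class": class_name})
    return data
```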
Calculate RPN targets for each image (calc_rpn)
- feature map shape: 18x25 = 450 positions
- anchor sizes per position: 9
- potential anchors: 450x9 = 4,050
- an anchor is set to positive if its IOU with a ground-truth box is > 0.7
- the RPN has many more negative than positive regions, so some of the negative regions are turned off
- the total number of positive and negative regions is limited to 256
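A numpy sketch of this balancing step (the random masks and variable names are placeholders; the repo's calc_rpn does the equivalent with its own bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions = 256  # cap on positive + negative anchors per image

# Boolean masks over all 4,050 anchors (illustrative random labels)
positive = rng.random(4050) > 0.99
negative = rng.random(4050) > 0.50
negative &= ~positive

pos_idx = np.flatnonzero(positive)
neg_idx = np.flatnonzero(negative)

# At most half of the 256 regions may be positive
if len(pos_idx) > num_regions // 2:
    drop = rng.choice(pos_idx, len(pos_idx) - num_regions // 2, replace=False)
    positive[drop] = False
    pos_idx = np.flatnonzero(positive)

# Turn off surplus negatives so positives + negatives == 256
if len(neg_idx) + len(pos_idx) > num_regions:
    drop = rng.choice(neg_idx, len(neg_idx) + len(pos_idx) - num_regions,
                      replace=False)
    negative[drop] = False
```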
y_is_box_valid: whether this anchor is usable for training (clearly positive or negative rather than neutral)
y_rpn_overlap: whether this anchor overlaps with a ground-truth bounding box
y_rpn_cls has shape (1, 18, 25, 18):
- feature map size: 18x25
- the fourth dimension, 18, comes from 9x2:
  - 9 anchors
  - each anchor has 2 values, for y_is_box_valid and y_rpn_overlap respectively
y_rpn_regr has shape (1, 18, 25, 72):
- feature map size: 18x25
- the fourth dimension, 72, comes from 9x4x2:
  - 9 anchors, each with 4 values for tx, ty, tw and th respectively
  - each of the 4 values is paired with its own y_is_box_valid and y_rpn_overlap flags
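A numpy sketch that makes these shapes concrete (the exact channel ordering is an assumption; the repo concatenates the flags and regression targets along the last axis):

```python
import numpy as np

h, w, num_anchors = 18, 25, 9

y_is_box_valid = np.zeros((1, h, w, num_anchors))
y_rpn_overlap = np.zeros((1, h, w, num_anchors))
regr_targets = np.zeros((1, h, w, num_anchors * 4))  # tx, ty, tw, th per anchor

# Classification target: validity flags next to overlap flags -> 9x2 = 18
y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=-1)
print(y_rpn_cls.shape)   # (1, 18, 25, 18)

# Regression target: each of the 4 values carries its own flag -> 9x4x2 = 72
flags = np.repeat(y_rpn_overlap, 4, axis=-1)  # one flag per regression value
y_rpn_regr = np.concatenate([flags, regr_targets], axis=-1)
print(y_rpn_regr.shape)  # (1, 18, 25, 72)
```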
Calculate regions of interest from the RPN output (rpn_to_roi)
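Inside rpn_to_roi, the predicted deltas are applied to each anchor before non-max suppression prunes the result. A sketch of the standard (tx, ty, tw, th) parameterization (variable names are mine):

```python
import numpy as np

def apply_regr(anchor, deltas):
    """Apply (tx, ty, tw, th) deltas to an anchor given as (x, y, w, h)."""
    x, y, w, h = anchor
    tx, ty, tw, th = deltas
    cx, cy = x + w / 2.0, y + h / 2.0  # anchor centre
    cx1 = tx * w + cx                  # shifted centre
    cy1 = ty * h + cy
    w1 = np.exp(tw) * w                # rescaled width/height
    h1 = np.exp(th) * h
    return (cx1 - w1 / 2.0, cy1 - h1 / 2.0, w1, h1)

print(apply_regr((10, 10, 16, 16), (0.1, -0.2, 0.0, 0.3)))
```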
RoIPooling layer and Classifier layer (RoiPoolingConv, classifier_layer)
- RoIPooling layer: processes each ROI into a fixed-size output by max pooling
  - the input ROI is divided into sub-cells
  - max pooling is applied to each sub-cell
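A simplified numpy sketch of RoI max pooling on one feature map (the pool size of 7 and the cell-split scheme are illustrative; the repo's RoiPoolingConv is a Keras layer achieving the same effect):

```python
import numpy as np

def roi_max_pool(feature_map, roi, pool_size=7):
    """feature_map: (H, W, C); roi: (x, y, w, h) in feature-map coordinates."""
    x, y, w, h = roi
    output = np.zeros((pool_size, pool_size, feature_map.shape[2]))
    for i in range(pool_size):
        for j in range(pool_size):
            # Boundaries of sub-cell (i, j) inside the ROI
            y0 = y + int(np.floor(i * h / pool_size))
            y1 = y + int(np.ceil((i + 1) * h / pool_size))
            x0 = x + int(np.floor(j * w / pool_size))
            x1 = x + int(np.ceil((j + 1) * w / pool_size))
            output[i, j, :] = feature_map[y0:y1, x0:x1, :].max(axis=(0, 1))
    return output

pooled = roi_max_pool(np.random.rand(18, 25, 512), roi=(3, 2, 10, 8))
print(pooled.shape)  # (7, 7, 512)
```

Whatever the ROI's size, the output is always pool_size x pool_size x C, which is what lets ROIs of different shapes feed the same fully connected layers.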
Classifier layer: the final layer of the whole model, placed just behind the RoIPooling layer
- predicts the class name for each input ROI
- regresses its bounding box coordinates
First, the pooled output is flattened.
Then, it is followed by two fully connected layers, each with 0.5 dropout.
Finally, there are two output layers:
# out_class: softmax activation function for classifying the class name of the object
# out_regr: linear activation function for bbox coordinate regression
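A Keras sketch of this head for a single pooled ROI (the 4096-unit FC sizes follow VGG-16; nb_classes and the layer names are assumptions):

```python
from tensorflow.keras import Input, Model, layers

nb_classes = 4                       # assumed: 3 object classes + background
pooled = Input(shape=(7, 7, 512))    # output of the RoIPooling layer for one ROI

x = layers.Flatten()(pooled)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dropout(0.5)(x)

# out_class: softmax over object classes (including background)
out_class = layers.Dense(nb_classes, activation="softmax", name="out_class")(x)
# out_regr: 4 linear regression values per non-background class
out_regr = layers.Dense(4 * (nb_classes - 1), activation="linear",
                        name="out_regr")(x)

classifier = Model(inputs=pooled, outputs=[out_class, out_regr])
```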
Dataset
The number of bounding boxes for ‘Car’, ‘Mobile phone’ and ‘Person’ is 2383, 1108 and 3745 respectively.
Parameters
Environment
Google Colab with Tesla K80 GPU acceleration for training.
Training time
- each epoch: 1,000 iterations
- total number of epochs trained: 114
- each epoch takes around 700 seconds, for a total training time of about 22 hours
Result
There are two loss functions per model, four losses in total.
The RPN model has two outputs: cls (objectness classification) and regression.
The cls loss is already low from around epoch 20.
Reason:
- the accuracy for objectness is already high in the early stage of training
- the accuracy of the bounding boxes’ coordinates is still low and needs more time to learn
The classifier model’s losses show a similar tendency and even similar loss values, since the model is predicting quite similar targets; note that predicting objectness is easier than predicting the class name of a bbox.
The total loss is the sum of the four losses above.
mAP (mean average precision) doesn’t increase as the loss decreases:
- epoch 60: 0.15
- epoch 87: 0.19
- epoch 114: 0.13
Reason: the small number of training images leads to overfitting of the model.
Other things we could tune:
1. The image is resized to 300 on its shorter side, so the anchor_box_scales are [64, 128, 256].
2. VGG-16 has a simple structure, but ResNet-50 would be a better backbone.
3. rpn_max_overlap=0.7 and rpn_min_overlap=0.3 define the IOU range that differentiates ‘positive’, ‘neutral’ and ‘negative’ for each anchor; overlap_thresh=0.7 is the threshold for non-max suppression.
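A sketch of how these thresholds classify an anchor, using a standard IOU computation (function names are mine; the threshold names mirror the config values above):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def label_anchor(best_iou, rpn_max_overlap=0.7, rpn_min_overlap=0.3):
    """Classify an anchor by its best IOU against all ground-truth boxes."""
    if best_iou > rpn_max_overlap:
        return "positive"
    if best_iou < rpn_min_overlap:
        return "negative"
    return "neutral"  # ignored during RPN training

print(label_anchor(iou((0, 0, 10, 10), (5, 5, 15, 15))))  # 'negative'
```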