Faster R-CNN (object detection) implemented in Keras for custom data from Google’s Open Images Dataset V4

Faster R-CNN: object detection

  • The original paper evaluates on PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO.
  • This project instead trains on three custom classes, “Person”, “Car” and “Mobile phone”, from Google’s Open Images Dataset V4.

Brief explanation:

  • R-CNN (selective search):
  1. Selective search proposes about 2,000 candidate regions (rectangular boxes).
  2. Each of the 2,000 regions is passed through a pre-trained CNN model.
  3. The outputs (feature maps) are passed to an SVM for classification.
  • Fast R-CNN:
  1. The original image is passed once through a pre-trained CNN model.
  2. The selective search proposals are computed and projected onto the output feature map of the previous step.
  3. An ROI pooling layer brings every proposal to a standard, pre-defined output size (the pooled outputs are passed to a fully connected layer as inputs).
  4. Two output vectors are used to
    1. predict the observed object with a softmax classifier
    2. adapt bounding box localisations with a linear regressor
  • Faster R-CNN:
  1. A Region Proposal Network (RPN) replaces selective search for generating region proposals.


The RPN is connected to a conv layer with 3x3 filters, padding 1 and 512 output channels (a minimal Keras sketch of this RPN head follows this list).


2. Similar to Fast R-CNN, ROI pooling is applied to these proposed regions (ROIs).

3. A softmax function is used for classification and a linear regressor to refine the boxes’ locations.
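Below is a minimal Keras sketch of the RPN head described above; the function and layer names (rpn_head, rpn_conv1, …) are illustrative assumptions, not necessarily the exact code of this project.

```python
from tensorflow.keras import layers

def rpn_head(base_layers, num_anchors=9):
    # Shared 3x3 conv with 512 output channels; 'same' padding (padding 1 for a
    # 3x3 kernel) keeps the spatial size of the backbone feature map
    x = layers.Conv2D(512, (3, 3), padding='same', activation='relu',
                      name='rpn_conv1')(base_layers)
    # Objectness score per anchor at each feature-map position (1 sigmoid value per anchor)
    x_class = layers.Conv2D(num_anchors, (1, 1), activation='sigmoid',
                            name='rpn_out_class')(x)
    # Box-regression output per anchor (tx, ty, tw, th -> 4 linear values per anchor)
    x_regr = layers.Conv2D(num_anchors * 4, (1, 1), activation='linear',
                           name='rpn_out_regress')(x)
    return [x_class, x_regr]
```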


 

 

Code explanation:

Part 1: Extract annotations for the custom classes from Google’s Open Images Dataset V4 (bounding boxes)

 

Download and load three .csv files


1. Class names come from class-descriptions-boxable.csv.

2. The files are downloaded from Figure Eight.

3. Also download train-annotations-bbox.csv and train-images-boxable.csv.

  • train-images-boxable.csv
    •  boxable image names
    •  their URL links
  • class-descriptions-boxable.csv
    • human-readable class names corresponding to each class LabelName
  • train-annotations-bbox.csv
    • one row per bounding box (bbox for short) of an image
    •  the bbox’s LabelName and the current image’s ID (ImageID + ’.jpg’ = Image_name)
      • XMin, YMin is the top-left point of this bbox
      •  XMax, YMax is the bottom-right point of this bbox (coordinates are normalised to the image size)
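A small sketch of loading the three .csv files with pandas (the local file paths are assumptions; adjust them to wherever the files were downloaded):

```python
import pandas as pd

# Boxable image names and their URL links
images_boxable = pd.read_csv('train-images-boxable.csv')
# LabelName code -> human-readable class name
class_descriptions = pd.read_csv('class-descriptions-boxable.csv',
                                 header=None, names=['LabelName', 'ClassName'])
# One row per bounding box: ImageID, LabelName, XMin, XMax, YMin, YMax (normalised)
annotations_bbox = pd.read_csv('train-annotations-bbox.csv')
```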


 

Get the subset of the whole dataset

  • 1000 images per class
  • for ‘Person’, ‘Mobile phone’ and ‘Car’ respectively.
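Continuing from the loading sketch above, a hedged sketch of selecting the subset: look up each class’s LabelName code and take 1000 distinct images that contain that class (the exact sampling strategy in the original code may differ, e.g. random sampling):

```python
# Assumed sketch: 1000 image IDs per class
subset_image_ids = {}
for cls in ['Person', 'Mobile phone', 'Car']:
    label = class_descriptions.loc[class_descriptions['ClassName'] == cls,
                                   'LabelName'].values[0]
    bboxes = annotations_bbox[annotations_bbox['LabelName'] == label]
    # Unique images containing this class, then take the first 1000
    subset_image_ids[cls] = bboxes['ImageID'].unique()[:1000]
```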

 

1. Download the 3,000 images and write their annotations to a .txt file.

Each row contains:

file_path -- absolute file path

(x1,y1) and (x2,y2) -- top-left and bottom-right pixel coordinates in the original image

class_name -- class name of the current bounding box
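A minimal sketch of writing one such row; Open Images bbox coordinates are normalised to [0, 1], so they are scaled back to pixel coordinates of the original image here (write_annotation_row is a hypothetical helper name):

```python
from PIL import Image

def write_annotation_row(f, file_path, bbox_row, class_name):
    # Scale the normalised Open Images coordinates back to pixels
    width, height = Image.open(file_path).size
    x1 = int(bbox_row['XMin'] * width)
    y1 = int(bbox_row['YMin'] * height)
    x2 = int(bbox_row['XMax'] * width)
    y2 = int(bbox_row['YMax'] * height)
    # One row per bounding box: file_path,x1,y1,x2,y2,class_name
    f.write(f'{file_path},{x1},{y1},{x2},{y2},{class_name}\n')
```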

 

training: 80%

test: 20%

The expected numbers of training and testing images are therefore:

3 x 800 -> 2,400 and 3 x 200 -> 600      (some images may overlap between classes)

 

 

Part 2: Faster R-CNN code

Rebuild the structure of VGG-16 and load the pre-trained weights (nn_base)
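A sketch of the VGG-16 backbone as typically rebuilt for this purpose (convolutional blocks only, with the last pooling layer dropped so the stride is 16); the Keras-style layer names are an assumption that allows pre-trained VGG-16 weights to be loaded by name:

```python
from tensorflow.keras import layers

def nn_base(img_input):
    # VGG-16 convolutional blocks only: the fully connected layers are dropped and
    # the final pooling layer is omitted, so the backbone stride is 16
    # (a 300x400 input gives roughly an 18x25 feature map).
    x = img_input
    for block, filters in enumerate([64, 128, 256, 512, 512], start=1):
        n_convs = 2 if block <= 2 else 3
        for conv in range(1, n_convs + 1):
            x = layers.Conv2D(filters, (3, 3), activation='relu', padding='same',
                              name=f'block{block}_conv{conv}')(x)
        if block < 5:  # no pooling after the last block
            x = layers.MaxPooling2D((2, 2), strides=(2, 2), name=f'block{block}_pool')(x)
    return x
```

The pre-trained ImageNet weights can then be loaded into these layers by name, e.g. with model.load_weights(weights_path, by_name=True).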

 


Prepare training data and training labels (get_anchor_gt)

 input data: the annotation.txt file generated in Part 1
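A minimal sketch of turning annotation.txt into the per-image records that the generator can iterate over (the dictionary keys are assumptions about the expected format):

```python
# Parse annotation.txt (file_path,x1,y1,x2,y2,class_name per row) into one
# record per image, each holding all of that image's bounding boxes.
def parse_annotation_file(path):
    all_img_data = {}
    with open(path) as f:
        for line in f:
            file_path, x1, y1, x2, y2, class_name = line.strip().split(',')
            record = all_img_data.setdefault(file_path,
                                             {'filepath': file_path, 'bboxes': []})
            record['bboxes'].append({'class': class_name,
                                     'x1': int(x1), 'y1': int(y1),
                                     'x2': int(x2), 'y2': int(y2)})
    return list(all_img_data.values())
```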


 

Calculate the RPN targets for each image (calc_rpn)

feature map shape: 18x25 = 450 positions

anchors per position (scales x ratios): 9

potential anchors: 450x9 = 4050

An anchor is set to positive if its IoU with a ground-truth box is > 0.7.

The RPN produces many more negative than positive regions, so some of the negative regions are turned off.

The total number of positive and negative regions is limited to 256.

y_is_box_valid: whether this anchor is a valid training sample (positive or negative, not neutral)

y_rpn_overlap: whether this anchor overlaps a ground-truth bounding box (i.e. contains an object)
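A hedged numpy sketch of the balancing step, assuming channels-last target arrays of shape (1, height, width, num_anchors); the real implementation may use a different axis order:

```python
import numpy as np

def balance_anchors(y_is_box_valid, y_rpn_overlap, num_regions=256):
    # Positive anchors: overlap flag set and still valid; negatives: valid but no overlap
    pos_locs = np.where(np.logical_and(y_rpn_overlap[0] == 1, y_is_box_valid[0] == 1))
    neg_locs = np.where(np.logical_and(y_rpn_overlap[0] == 0, y_is_box_valid[0] == 1))
    num_pos = len(pos_locs[0])
    # Keep at most 128 positives
    if num_pos > num_regions // 2:
        drop = np.random.choice(num_pos, num_pos - num_regions // 2, replace=False)
        y_is_box_valid[0][pos_locs[0][drop], pos_locs[1][drop], pos_locs[2][drop]] = 0
        num_pos = num_regions // 2
    # Switch off surplus negatives so positives + negatives <= 256
    num_neg = len(neg_locs[0])
    if num_neg + num_pos > num_regions:
        drop = np.random.choice(num_neg, num_neg + num_pos - num_regions, replace=False)
        y_is_box_valid[0][neg_locs[0][drop], neg_locs[1][drop], neg_locs[2][drop]] = 0
    return y_is_box_valid
```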


y_rpn_cls has shape (1, 18, 25, 18):

feature map size: 18x25

the fourth dimension, 18, comes from 9x2:

 . 9 anchors

    each anchor has 2 values, for y_is_box_valid and y_rpn_overlap respectively.

 

y_rpn_regr has shape (1, 18, 25, 72):

feature map size: 18x25

the fourth dimension, 72, comes from 9x4x2:

   9 anchors, each with 4 regression values for tx, ty, tw and th

   each of the 4 values is paired with its own flag (the overlap mask repeated 4 times), so 4x2 = 8 values per anchor
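A small numpy sketch (channels-last, which may differ from the actual code) showing how the two target tensors reach those shapes:

```python
import numpy as np

h, w, num_anchors = 18, 25, 9
y_is_box_valid = np.zeros((1, h, w, num_anchors))      # anchor used in training?
y_rpn_overlap  = np.zeros((1, h, w, num_anchors))      # anchor contains an object?
regr_targets   = np.zeros((1, h, w, num_anchors * 4))  # tx, ty, tw, th per anchor

# classification target: valid flag + overlap flag -> 9 * 2 = 18 channels
y_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=-1)
# regression target: overlap flag repeated 4x as a mask, then the 4 deltas -> 9 * 4 * 2 = 72 channels
y_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=-1), regr_targets], axis=-1)

print(y_rpn_cls.shape)   # (1, 18, 25, 18)
print(y_rpn_regr.shape)  # (1, 18, 25, 72)
```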

 

 

 

Calculate regions of interest from the RPN output (rpn_to_roi)
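rpn_to_roi applies the predicted regression deltas to the anchors and then runs non-max suppression to keep only the best proposals. A hedged sketch of a standard IoU-based NMS step, using the overlap_thresh=0.7 mentioned later and an assumed cap of 300 boxes:

```python
import numpy as np

def non_max_suppression(boxes, probs, overlap_thresh=0.7, max_boxes=300):
    # boxes: (N, 4) as x1, y1, x2, y2; probs: (N,) objectness scores
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    order = np.argsort(probs)          # ascending; highest score is last
    picked = []
    while len(order) > 0:
        i = order[-1]                  # current best box
        picked.append(i)
        # intersection of the best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[:-1]])
        yy1 = np.maximum(y1[i], y1[order[:-1]])
        xx2 = np.minimum(x2[i], x2[order[:-1]])
        yy2 = np.minimum(y2[i], y2[order[:-1]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (area[i] + area[order[:-1]] - inter)
        # keep only boxes whose IoU with the picked box is below the threshold
        order = order[:-1][iou <= overlap_thresh]
        if len(picked) >= max_boxes:
            break
    return boxes[picked], probs[picked]
```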

 


RoIPooling layer and Classifier layer (RoiPoolingConv, classifier_layer)

RoIPooling layer: processes each ROI into a fixed-size output by max pooling.

     the input ROI is divided into sub-cells

     max pooling is applied to each sub-cell
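A plain-numpy sketch of that idea (the actual RoiPoolingConv layer does the equivalent with Keras/TensorFlow tensor ops, but the mechanics are the same):

```python
import numpy as np

def roi_pool(feature_map, roi, pool_size=7):
    # feature_map: (H, W, C); roi: (x, y, w, h) in feature-map coordinates.
    # The ROI is split into pool_size x pool_size sub-cells and each sub-cell
    # is max-pooled, so every ROI ends up with the same fixed output size.
    x, y, w, h = roi
    region = feature_map[y:y + h, x:x + w, :]
    rows = np.array_split(np.arange(h), pool_size)   # row indices per sub-cell
    cols = np.array_split(np.arange(w), pool_size)   # column indices per sub-cell
    output = np.zeros((pool_size, pool_size, feature_map.shape[-1]))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            if len(r) == 0 or len(c) == 0:
                continue  # ROI smaller than the pooling grid; cell stays zero
            sub_cell = region[r[0]:r[-1] + 1, c[0]:c[-1] + 1, :]
            output[i, j, :] = sub_cell.max(axis=(0, 1))
    return output
```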


Classifier layer: the final layer of the whole model, placed just behind the RoIPooling layer.

       It predicts the class name for each input anchor (proposal)

       and the regression of its bounding box.


First, the pooling layer output is flattened.
Then, it is followed by two fully connected layers with 0.5 dropout.
Finally, there are two output layers:
# out_class: softmax activation function for classifying the class name of the object
# out_regr: linear activation function for bbox coordinate regression
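A hedged Keras sketch of this classifier head; the 4096-unit layer size and the convention of regressing no box for the background class are assumptions in line with common Faster R-CNN implementations:

```python
from tensorflow.keras import layers

def classifier_layer(pooled_rois, nb_classes):
    # pooled_rois: output of the RoIPooling layer, shape (1, num_rois, pool, pool, channels)
    out = layers.TimeDistributed(layers.Flatten())(pooled_rois)
    out = layers.TimeDistributed(layers.Dense(4096, activation='relu'))(out)
    out = layers.TimeDistributed(layers.Dropout(0.5))(out)
    out = layers.TimeDistributed(layers.Dense(4096, activation='relu'))(out)
    out = layers.TimeDistributed(layers.Dropout(0.5))(out)
    # out_class: softmax activation function for classifying the class name of the object
    out_class = layers.TimeDistributed(layers.Dense(nb_classes, activation='softmax'),
                                       name='dense_class')(out)
    # out_regr: linear activation function for bbox coordinate regression
    # (4 values per non-background class; no box is regressed for 'bg')
    out_regr = layers.TimeDistributed(layers.Dense(4 * (nb_classes - 1), activation='linear'),
                                      name='dense_regress')(out)
    return [out_class, out_regr]
```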

 

Dataset

The number of bounding boxes for ‘Car’, ‘Mobile phone’ and ‘Person’ is 2383, 1108 and 3745 respectively.

Parameters


Environment

Google Colab with Tesla K80 GPU acceleration for training.

Training time

each epoch: 1000 training steps

the total number of epochs I trained is 114

Every epoch takes around 700 seconds, so the total training time is about 22 hours.

 

Result

Each model has two loss functions.

The RPN model has two outputs: cls (objectness classification) and regr (bounding-box regression), each with its own loss.


The cls loss is already low from around epoch 20.

Reason:

the accuracy for objectness is already high at the early stage of training,

while the accuracy of the bounding boxes’ coordinates is still low and needs more time to learn.

 

 


The two curves show a similar tendency, and even similar loss values, i.e. the model is predicting quite similar values throughout.

Predicting objectness is easier than predicting the class name of a bbox.

 


The total loss is the sum of the four losses above.

mAP (mean average precision) does not increase as the loss decreases:

epoch 60: 0.15

epoch 87: 0.19

epoch 114: 0.13

Reason: the small number of training images leads to overfitting of the model.

Other things we could tune

1. For images resized to 300 (shorter side), the anchor box scales are [64, 128, 256].

2. VGG-16 has a simple structure, but ResNet-50 is a better (deeper) backbone.

3. rpn_max_overlap=0.7 and rpn_min_overlap=0.3 define the range that differentiates ‘positive’, ‘neutral’ and ‘negative’ anchors. overlap_thresh=0.7 is the threshold for non-max suppression.
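A sketch collecting these tunable parameters in one place (field names follow the text above; the grouping into a Config class is an assumption):

```python
class Config:
    im_size = 300                       # shorter side of the resized image
    anchor_box_scales = [64, 128, 256]  # anchor sizes on the resized image
    anchor_box_ratios = [[1, 1], [1, 2], [2, 1]]  # 3 scales x 3 ratios = 9 anchors
    rpn_max_overlap = 0.7               # IoU above this -> positive anchor
    rpn_min_overlap = 0.3               # IoU below this -> negative anchor
    overlap_thresh = 0.7                # non-max-suppression threshold in rpn_to_roi
    network = 'vgg16'                   # backbone; ResNet-50 is a deeper alternative
```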
