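Training and Testing Fast R-CNN

1. Preparation

1.1 Software requirements

First, install Caffe and pycaffe.

Official Caffe installation guide: http://caffe.berkeleyvision.org/installation.html

Note: Python layer support must be enabled in the Makefile.config configuration file.

# In your Makefile.config, make sure to have this line uncommented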

WITH_PYTHON_LAYER := 1
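Before building, it can help to confirm that the flag really is uncommented. The following is a minimal Python 3 sketch; the helper name has_python_layer is hypothetical, and it only inspects the text of Makefile.config, not Caffe itself.

```python
import re

# Matches an uncommented "WITH_PYTHON_LAYER := 1" line, optionally followed by a comment.
_FLAG = re.compile(r"^\s*WITH_PYTHON_LAYER\s*:=\s*1\s*(#.*)?$")


def has_python_layer(config_text):
    """Return True if the config text enables the Python layer."""
    return any(_FLAG.match(line) for line in config_text.splitlines())


if __name__ == "__main__":
    sample = "# WITH_PYTHON_LAYER := 1\nUSE_CUDNN := 1\n"
    # The flag is still commented out in this sample, so the check fails.
    print(has_python_layer(sample))
```

To check a real file, read Makefile.config into a string and pass it to the function.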

Second, you may need the Python packages cython, python-opencv, and easydict.

Start by installing pip, the Python package manager:

sudo apt-get install python-pip

Then install the three packages:

sudo pip install cython

# python-opencv is an apt package name, so install it with apt-get rather than pip

sudo apt-get install python-opencv

sudo pip install easydict

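Third, you may need MATLAB, mainly for evaluation on the PASCAL VOC dataset.

1.2 Hardware requirements

Training the smaller networks (CaffeNet, VGG_CNN_M_1024) requires a GPU with at least 3 GB of memory (for example Titan, K20, K40).

Training VGG16 requires at least a K40 (about 11 GB of memory), so it is not considered here.

**2. Installation (for the demo)**

2.1 Clone Fast R-CNN from GitHub. It is best to clone exactly as shown rather than downloading an archive by hand, which is more trouble; the --recursive flag also pulls in the caffe-fast-rcnn submodule.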

# Make sure to clone with --recursive

git clone --recursive https://github.com/rbgirshick/fast-rcnn.git

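2.2 Build the Cython modules (below, fast-rcnn always refers to the directory where the repository was cloned)

cd fast-rcnn/lib

make

2.3 Build Caffe and pycaffe

cd fast-rcnn/caffe-fast-rcnn

The Caffe version bundled with Fast R-CNN is fairly old, so before building caffe-fast-rcnn some cuDNN-related files have to be replaced with the corresponding files from the latest Caffe:

(1) Replace the following files in fast-rcnn with the versions from the latest Caffe source: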

include/caffe/layers/cudnn_relu_layer.hpp,

src/caffe/layers/cudnn_relu_layer.cpp,

src/caffe/layers/cudnn_relu_layer.cu

include/caffe/layers/cudnn_sigmoid_layer.hpp,

src/caffe/layers/cudnn_sigmoid_layer.cpp,

src/caffe/layers/cudnn_sigmoid_layer.cu

include/caffe/layers/cudnn_tanh_layer.hpp,

src/caffe/layers/cudnn_tanh_layer.cpp,

src/caffe/layers/cudnn_tanh_layer.cu

src/caffe/layers/cudnn_conv_layer.cpp,

src/caffe/layers/cudnn_conv_layer.cu

(2) Replace this file in fast-rcnn with the version from the Caffe source:

include/caffe/util/cudnn.hpp

(3) In fast-rcnn's src/caffe/layers/cudnn_conv_layer.cu, rename every occurrence of the following functions:

cudnnConvolutionBackwardData_v3 becomes cudnnConvolutionBackwardData
cudnnConvolutionBackwardFilter_v3 becomes cudnnConvolutionBackwardFilter
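The rename in step (3) can be scripted rather than done by hand. This is a minimal Python 3 sketch under the assumption that a plain text substitution is sufficient; the helper name drop_v3_suffix is hypothetical, and the function works on a string so it can be tried out before touching the real file.

```python
# Old cuDNN function names and the names they should be replaced with.
RENAMES = {
    "cudnnConvolutionBackwardData_v3": "cudnnConvolutionBackwardData",
    "cudnnConvolutionBackwardFilter_v3": "cudnnConvolutionBackwardFilter",
}


def drop_v3_suffix(source_text):
    """Return the source text with the two _v3 function names renamed."""
    for old, new in RENAMES.items():
        source_text = source_text.replace(old, new)
    return source_text


if __name__ == "__main__":
    line = "CUDNN_CHECK(cudnnConvolutionBackwardData_v3(handle, alpha));"
    print(drop_v3_suffix(line))
```

To apply it, read src/caffe/layers/cudnn_conv_layer.cu, pass its contents through the function, and write the result back (keeping a backup copy is a sensible precaution).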

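To run tools/demo.py, the USE_CUDNN := 1 line in caffe-fast-rcnn/Makefile.config has to be commented out before building. If it is left enabled, the demo always fails with a floating point exception. I could not find a definitive fix for this error online. Fortunately the GPU in this machine is fast enough that training and testing on the datasets below remain quick even without cuDNN acceleration.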
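Commenting out the cuDNN flag can also be done programmatically. The sketch below (Python 3, hypothetical helper name disable_cudnn) prefixes any active USE_CUDNN line with "# " and leaves everything else untouched; it operates on the config text only.

```python
import re

# An active (uncommented) USE_CUDNN assignment, written with ":=" or "=".
_ACTIVE = re.compile(r"^\s*USE_CUDNN\s*:?=\s*1")


def disable_cudnn(config_text):
    """Return the config text with any active USE_CUDNN line commented out."""
    lines = []
    for line in config_text.splitlines():
        if _ACTIVE.match(line):
            line = "# " + line
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(disable_cudnn("USE_CUDNN := 1\nWITH_PYTHON_LAYER := 1\n"), end="")
```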

After these changes, build caffe-fast-rcnn:

make -j8 && make pycaffe

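2.4 Download the required archives

The data/scripts folder contains three scripts, fetch_fast_rcnn_models.sh, fetch_imagenet_models.sh, and fetch_selective_search_data.sh, which download the three archives needed for training and testing. Run them from the fast-rcnn directory:

./data/scripts/fetch_fast_rcnn_models.sh

./data/scripts/fetch_imagenet_models.sh

./data/scripts/fetch_selective_search_data.sh

fetch_fast_rcnn_models.sh downloads the trained Fast R-CNN detectors;

fetch_imagenet_models.sh downloads the author's pre-trained ImageNet models;

fetch_selective_search_data.sh downloads object proposals precomputed with Selective Search.

Note: downloading these three archives from the command line can be very slow or fail outright. In that case, open the .sh files, copy the URLs, download the archives in a browser (through a proxy if necessary), and extract them into the data folder.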

3. Running the demo

3.1 Python version

cd fast-rcnn

./tools/demo.py

To run in CPU mode:

cd fast-rcnn

./tools/demo.py --cpu

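The demo runs detection with a VGG16 network trained on PASCAL VOC 2007. This model is fairly large; if it crashes Caffe, switch to a smaller network, which is actually a little faster, for example: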

./tools/demo.py --net caffenet

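or

./tools/demo.py --net vgg_cnn_m_1024

Alternatively, just use CPU mode.

3.2 MATLAB version (I have not found a prebuilt Caffe for this; Caffe's MATLAB interface most likely has to be built first)

Start MATLAB from the matlab folder. The path below is where MATLAB is installed on my machine.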

cd fast-rcnn/matlab

/usr/local/MATLAB/R2016b/bin/matlab # wait for matlab to start...

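Copy the caffe folder from fast-rcnn/caffe-fast-rcnn/matlab into fast-rcnn/matlab. To avoid running out of memory, CaffeNet is used here as well: change every occurrence of VGG16 in fast_rcnn_demo.m to CaffeNet. Then enter the following at the MATLAB prompt: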

>> fast_rcnn_demo

3.3 Code for some object proposal algorithms

Selective Search: original MATLAB code, Python wrapper

EdgeBoxes: MATLAB code

GOP and LPO: Python code

MCG: MATLAB code

RIGOR: MATLAB code

4. Preparing the dataset

4.1 First download the training, validation, and test sets; VOC2007 is used as the example. The files are hosted outside China, so a Baidu Cloud address will be provided.

wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtrainval_06-Nov-2007.tar

wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtest_06-Nov-2007.tar

wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCdevkit_08-Jun-2007.tar

4.2 Extract all the archives into a single folder, referred to below as VOCdevkit.

tar xvf VOCtrainval_06-Nov-2007.tar

tar xvf VOCtest_06-Nov-2007.tar

tar xvf VOCdevkit_08-Jun-2007.tar

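The following basic directories should exist:

VOCdevkit/ # development kit

VOCdevkit/VOCcode/ # VOC utility code

Note: the VOCcode folder must exist and contain files. It can go missing if the archives were not downloaded properly; in that case download them again, otherwise testing will report errors.

VOCdevkit/VOC2007 # image sets, annotations, etc.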
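The directory requirements listed above can be checked with a few lines of Python. This is a minimal Python 3 sketch; the helper name missing_voc_dirs is hypothetical, and it only tests that the two subdirectories exist, not that their contents are complete.

```python
import os

# Subdirectories that the layout above expects inside VOCdevkit.
REQUIRED_DIRS = ("VOCcode", "VOC2007")


def missing_voc_dirs(devkit_root):
    """Return the required subdirectories that are absent under devkit_root."""
    return [name for name in REQUIRED_DIRS
            if not os.path.isdir(os.path.join(devkit_root, name))]


if __name__ == "__main__":
    missing = missing_voc_dirs("VOCdevkit")
    print("missing:", missing if missing else "nothing")
```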

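4.3 Create a symlink to the VOC2007 dataset, which links the fast-rcnn directory to the VOC2007 data.

cd fast-rcnn/data

# replace /path/to/VOCdevkit with the folder the archives were extracted into
ln -s /path/to/VOCdevkit VOCdevkit2007

This approach works very well: other projects may use the same dataset, so the data does not have to be copied several times, which saves a lot of disk space. It is not as readily available on Windows.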
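The same link can be created from Python, which is convenient inside a setup script. This is a minimal Python 3 sketch assuming a POSIX system; the helper name link_devkit and its default link name argument are illustrative only.

```python
import os


def link_devkit(devkit_path, data_dir, link_name="VOCdevkit2007"):
    """Create data_dir/link_name as a symlink to devkit_path and return the link path."""
    target = os.path.abspath(devkit_path)
    link = os.path.join(data_dir, link_name)
    if not os.path.islink(link):  # leave an existing link untouched
        os.symlink(target, link)
    return link


if __name__ == "__main__":
    # Only link when both directories actually exist on this machine.
    if os.path.isdir("VOCdevkit") and os.path.isdir("fast-rcnn/data"):
        print(link_devkit("VOCdevkit", "fast-rcnn/data"))
```

Using an absolute target avoids a dangling link when the link and the dataset live in different directories.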

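4.4 If needed, the VOC2010 and VOC2012 datasets can be set up in the same way.

5. Training and testing models

5.1 Training a model

To train a Fast R-CNN detector, take training a CaffeNet network on VOC2007 as the example.

./tools/train_net.py --gpu 0 --solver models/CaffeNet/solver.prototxt --weights data/imagenet_models/CaffeNet.v2.caffemodel

At this point I got the error "EnvironmentError: MATLAB command 'matlab' not found. Please add 'matlab' to your PATH.", which means the MATLAB directory has not been added to the PATH environment variable. The following command sets it (adjust the path to your own MATLAB installation):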

export PATH=$PATH:"/usr/local/MATLAB/R2014a/bin"
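Whether the matlab command can be found may be checked before launching a long training run. This is a minimal Python 3 sketch using the standard library's shutil.which; the helper name find_matlab and the extra_dirs parameter are hypothetical.

```python
import os
import shutil


def find_matlab(extra_dirs=()):
    """Return the full path of the matlab executable, or None if it is not found.

    extra_dirs are searched after the directories already on PATH,
    mirroring the export PATH=$PATH:... command.
    """
    search_path = os.pathsep.join([os.environ.get("PATH", "")] + list(extra_dirs))
    return shutil.which("matlab", path=search_path)


if __name__ == "__main__":
    print(find_matlab(["/usr/local/MATLAB/R2014a/bin"]))
```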

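Next came "ImportError: No module named yaml", so install that as well:

sudo apt-get install python-yaml

Running the command again then works. If it reports that GPU memory is insufficient, nvidia-smi shows the current memory usage at any time. A model snapshot is written every 10000 iterations, and the results are stored in the output folder.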
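For monitoring from a script, nvidia-smi can print memory figures as CSV. The sketch below (Python 3) parses that output; the function names are hypothetical, and gpu_memory only works on a machine where nvidia-smi is installed.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, as bare CSV rows.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory(csv_text):
    """Turn lines such as '1024, 11441' into (used, total) integer tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi and return one (used, total) tuple per GPU."""
    return parse_memory(subprocess.check_output(QUERY).decode())


if __name__ == "__main__":
    # Parsing demo on a hand-written sample, so no GPU is needed to run this.
    for used, total in parse_memory("1024, 11441\n300, 3072\n"):
        print("%d MiB used of %d MiB" % (used, total))
```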

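Training the VGG_CNN_M_1024 network may report insufficient GPU memory. In that case, edit config.py under fast-rcnn/lib/fast_rcnn and change the number of images per minibatch (the TRAIN.IMS_PER_BATCH setting) from 2 to 1. If that still fails, the GPU memory is simply too small and a different GPU is needed.

./tools/train_net.py --gpu 0 --solver models/VGG_CNN_M_1024/solver.prototxt --weights data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel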

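For VGG16, according to the author, even with the images per minibatch reduced from 2 to 1 training still needs close to 5 GB of GPU memory. GPUs with more than 3 GB can give it a try, and cuDNN may reduce the memory use to some extent.

Note 1: training on a dataset calls into MATLAB through its path, so MATLAB has to be installed beforehand (Caffe's MATLAB interface does not need to be built) and its directory added to PATH.

Note 2: errors reported shortly after training starts may be related to the paths of the solver.prototxt under models and of the pre-trained weights under imagenet_models. Using full paths is recommended.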

5.2 Testing a model

If your own model has not finished training yet, or is not trained well enough, try the model provided by the author:

./tools/test_net.py --gpu 0 --def models/CaffeNet/test.prototxt --net data/fast_rcnn_models/caffenet_fast_rcnn_iter_40000.caffemodel

Then test your own model:

./tools/test_net.py --gpu 0 --def models/CaffeNet/test.prototxt --net output/default/voc_2007_trainval/caffenet_fast_rcnn_iter_40000.caffemodel

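The test results are also written to the output folder.

Note 1: the model and prototxt files named in the test command should be given as full paths as well, otherwise errors are reported.

Note 2: the directory VOCdevkit/results/VOC2007/Main has to be created by hand under fast-rcnn/data.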
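Creating the results directory can be folded into a setup script. This is a minimal Python 3 sketch; the helper name ensure_results_dir is hypothetical, and the path components are taken from the note above.

```python
import os


def ensure_results_dir(data_dir):
    """Create data_dir/VOCdevkit/results/VOC2007/Main if needed and return its path."""
    path = os.path.join(data_dir, "VOCdevkit", "results", "VOC2007", "Main")
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


if __name__ == "__main__":
    # Only act when run from a directory that contains a fast-rcnn checkout.
    if os.path.isdir("fast-rcnn/data"):
        print(ensure_results_dir("fast-rcnn/data"))
```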

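5.3 Compressing the Fast R-CNN model with truncated SVD of the fully connected layers

./tools/compress_net.py --def models/CaffeNet/test.prototxt --def-svd models/CaffeNet/compressed/test.prototxt --net output/default/voc_2007_trainval/caffenet_fast_rcnn_iter_40000.caffemodel

The compressed model is stored next to the uncompressed one, in the corresponding folder under output, with a different file name. Now test the compressed model: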

./tools/test_net.py --gpu 0 --def models/CaffeNet/compressed/test.prototxt --net output/default/voc_2007_trainval/caffenet_fast_rcnn_iter_40000_svd_fc6_1024_fc7_256.caffemodel
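To get a feel for what the compression buys, the parameter counts can be worked out by hand. Factoring a u x v weight matrix at rank t replaces it with two layers holding t(u + v) weights in total. The sketch below (Python 3) evaluates this for the ranks suggested by the file name suffix svd_fc6_1024_fc7_256; the layer dimensions used (fc6 as 4096 x 9216, fc7 as 4096 x 4096) are assumptions about CaffeNet, not values taken from this post.

```python
def svd_param_counts(out_dim, in_dim, rank):
    """Return (original, compressed) weight counts for a rank-truncated FC layer."""
    original = out_dim * in_dim
    compressed = rank * (out_dim + in_dim)
    return original, compressed


if __name__ == "__main__":
    # (layer name, output size, input size, rank); sizes are assumed, see above.
    layers = [("fc6", 4096, 9216, 1024), ("fc7", 4096, 4096, 256)]
    for name, out_dim, in_dim, rank in layers:
        before, after = svd_param_counts(out_dim, in_dim, rank)
        print("%s: %d -> %d weights (%.1fx smaller)"
              % (name, before, after, before / float(after)))
```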

Note: full (absolute) paths are recommended for all of these commands.
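Since the notes in this section repeatedly recommend absolute paths, one option is to assemble the command in a small helper. This is a minimal Python 3 sketch; the function name train_command is hypothetical, while the flags and relative locations are the ones used in the training commands of section 5.1.

```python
import os


def train_command(root, net, weights, gpu=0):
    """Build the train_net.py argument list with every path made absolute."""
    root = os.path.abspath(root)
    return [
        os.path.join(root, "tools", "train_net.py"),
        "--gpu", str(gpu),
        "--solver", os.path.join(root, "models", net, "solver.prototxt"),
        "--weights", os.path.join(root, "data", "imagenet_models", weights),
    ]


if __name__ == "__main__":
    cmd = train_command("fast-rcnn", "CaffeNet", "CaffeNet.v2.caffemodel")
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.call once the printed command looks right.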

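Note: the experiments directory holds the configuration files and the run logs. It also contains a scripts directory for fetching the ImageNet models, the author's trained Fast R-CNN models, and the corresponding PASCAL VOC data. Running a .sh file from scripts writes a record of the whole run into the log folder, which is a convenient way to study the full process, for example:

./experiments/scripts/all_vgg_cnn_m_1024.sh 0 VGG_CNN_M_1024 pascal_voc

(absolute paths recommended)

Converting the KITTI dataset to VOC format:

# Under data/, create the two nested directories KITTIdevkit/KITTI and place the required files inside KITTI/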

Annotations/
└── 000000.xml
ImageSets/
└── Main/
    ├── trainval.txt
    └── test.txt              # and so on
JPEGImages/
└── 000000.png
Labels/
└── 000000.txt                # custom folder holding the raw annotations to be converted to xml; not part of the VOC format
create_train_test_txt.py      # three Python tools, described in detail below
modify_annotations_txt.py
txt_to_xml.py

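Convert the .png images used by KITTI into the .jpg images used by VOC and place them in the JPEGImages folder:

ls -1 *.png | xargs -n 1 bash -c 'convert "$0" "${0%.png}.jpg"'

rm -rf *.png   (run this in the directory where the images were converted; the command above does not delete the .png files automatically)

1. Converting the KITTI categories

PASCAL VOC has 20 classes in total, which is more than a specific scenario needs. Here I set up three classes for the dataset: 'Car', 'Cyclist', and 'Pedestrian'. The annotations also contain other kinds of vehicles and people, and simply skipping them would be wasteful, so 'Van', 'Truck', and 'Tram' are merged into 'Car' and 'Person_sitting' is merged into 'Pedestrian' ('Misc' and 'DontCare' are ignored). Note that the script as listed also folds 'Cyclist' into the pedestrian class and writes lowercase names, so it ends up with two classes, 'car' and 'pedestrian'. The tool used is modify_annotations_txt.py, whose source is: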

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# modify_annotations_txt.py

import glob

txt_list = glob.glob('./Labels/*.txt')  # paths of all txt files in the Labels folder


def show_category(txt_list):
    """Print the set of category names found in the given label files."""
    category_list = []
    for item in txt_list:
        try:
            with open(item) as tdf:
                for each_line in tdf:
                    labeldata = each_line.strip().split(' ')  # strip surrounding whitespace and split into fields
                    category_list.append(labeldata[0])  # keep only the first field, the category
        except IOError as ioerr:
            print('File error:' + str(ioerr))
    print(set(category_list))


def merge(line):
    """Join the fields with single spaces (no trailing space) and end with a newline."""
    return ' '.join(line) + '\n'


print('before modify categories are:\n')
show_category(txt_list)

for item in txt_list:
    new_txt = []
    try:
        with open(item, 'r') as r_tdf:
            for each_line in r_tdf:
                labeldata = each_line.strip().split(' ')
                if labeldata[0] in ['Truck', 'Van', 'Tram', 'Car']:  # merge the vehicle classes
                    labeldata[0] = 'car'
                if labeldata[0] in ['Person_sitting', 'Cyclist', 'Pedestrian']:  # merge the person classes
                    labeldata[0] = 'pedestrian'
                if labeldata[0] in ['DontCare', 'Misc']:  # skip the DontCare and Misc classes
                    continue
                new_txt.append(merge(labeldata))
        with open(item, 'w') as w_tdf:  # truncate the original file and write the new content
            for temp in new_txt:
                w_tdf.write(temp)
    except IOError as ioerr:
        print('File error:' + str(ioerr))

print('\nafter modify categories are:\n')
show_category(txt_list)

Run the script with: python modify_annotations_txt.py
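The merging rule used by the script above can also be written as a lookup table, which makes the class mapping easier to audit and change. This is a minimal Python 3 sketch with hypothetical names (CATEGORY_MAP, remap_line). Unlike the script, it drops every category that is not in the table, which for the nine categories named in this post ('Misc' and 'DontCare' being the two left out) gives the same result.

```python
# Original KITTI category -> merged class name, as in modify_annotations_txt.py.
CATEGORY_MAP = {
    "Car": "car", "Van": "car", "Truck": "car", "Tram": "car",
    "Pedestrian": "pedestrian", "Person_sitting": "pedestrian",
    "Cyclist": "pedestrian",
}


def remap_line(line):
    """Return the label line with its category merged, or None if it is dropped."""
    fields = line.strip().split(" ")
    new_name = CATEGORY_MAP.get(fields[0])
    if new_name is None:  # Misc, DontCare, or anything unknown
        return None
    fields[0] = new_name
    return " ".join(fields) + "\n"


if __name__ == "__main__":
    for sample in ("Van 0.00 0 -1.5 10.0 20.0 110.0 220.0\n",
                   "DontCare -1 -1 -10 5.0 5.0 9.0 9.0\n"):
        print(repr(remap_line(sample)))
```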

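**2. Converting the txt annotations to XML**

After the txt files have been processed as above, the annotations have to be converted from txt to xml. The unused parts of each annotation are dropped so that only the merged classes remain (the original text says three; the script below keeps the two classes 'pedestrian' and 'car'), the coordinates are converted from float to int, and all generated xml files are stored in the Annotations folder. The tool used is txt_to_xml.py: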

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# txt_to_xml.py
# Build a VOC-style XML annotation from scratch as a DOM tree

from xml.dom.minidom import Document
import os

import cv2


def add_text_element(doc, parent, tag, text):
    """Append <tag>text</tag> to parent and return the new element."""
    elem = doc.createElement(tag)
    elem.appendChild(doc.createTextNode(str(text)))
    parent.appendChild(elem)
    return elem


def generate_xml(name, split_lines, img_size, class_ind):
    doc = Document()  # create the DOM document object
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)

    add_text_element(doc, annotation, 'folder', 'VOC2007')
    add_text_element(doc, annotation, 'filename', name + '.jpg')

    source = doc.createElement('source')
    annotation.appendChild(source)
    add_text_element(doc, source, 'database', 'The VOC2007 Database')
    add_text_element(doc, source, 'annotation', 'PASCAL VOC2007')

    # img_size is the (height, width, depth) shape returned by cv2.imread
    size = doc.createElement('size')
    annotation.appendChild(size)
    add_text_element(doc, size, 'width', img_size[1])
    add_text_element(doc, size, 'height', img_size[0])
    add_text_element(doc, size, 'depth', img_size[2])

    for split_line in split_lines:
        line = split_line.strip().split()
        if line and line[0] in class_ind:
            obj = doc.createElement('object')
            annotation.appendChild(obj)
            add_text_element(doc, obj, 'name', line[0])
            add_text_element(doc, obj, 'difficult', '0')
            bndbox = doc.createElement('bndbox')
            obj.appendChild(bndbox)
            # fields 4..7 hold the box coordinates; convert float to int
            for tag, value in zip(('xmin', 'ymin', 'xmax', 'ymax'), line[4:8]):
                add_text_element(doc, bndbox, tag, int(float(value)))

    # write the DOM object to a file
    with open('Annotations/' + name + '.xml', 'w') as f:
        f.write(doc.toprettyxml(indent=''))


if __name__ == '__main__':
    class_ind = ('pedestrian', 'car')
    cur_dir = os.getcwd()
    labels_dir = os.path.join(cur_dir, 'Labels')
    # os.walk yields the current directory, its subdirectories, and its files
    for parent, dirnames, filenames in os.walk(labels_dir):
        for file_name in filenames:
            full_path = os.path.join(parent, file_name)  # full path of the label file
            with open(full_path) as f:
                split_lines = f.readlines()
            name = file_name[:-4]  # drop the 4-character extension .txt
            img_path = os.path.join('./JPEGImages', name + '.jpg')  # adjust this path as needed
            img = cv2.imread(img_path)
            if img is None:  # cv2.imread returns None when the image cannot be read
                print('Cannot read image: ' + img_path)
                continue
            generate_xml(name, split_lines, img.shape, class_ind)
    print('all txts have been converted into xmls')

Run the script with: python txt_to_xml.py
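After the conversion it is worth reading a few of the generated files back to confirm that the boxes survived. This is a minimal Python 3 sketch using the standard library's xml.etree.ElementTree; the helper name read_boxes is hypothetical, and it assumes the element names written by the script above.

```python
import xml.etree.ElementTree as ET


def read_boxes(xml_text):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # int() tolerates the surrounding whitespace that pretty-printing may add
        coords = tuple(int(bndbox.find(tag).text)
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text.strip(),) + coords)
    return boxes


if __name__ == "__main__":
    sample = ("<annotation><object><name>car</name><bndbox>"
              "<xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax>"
              "</bndbox></object></annotation>")
    print(read_boxes(sample))
```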

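3. Generating the trainval and test lists

Finally, in the same directory create the folder ImageSets with the subfolders Main, Layout, and Segmentation, then run create_train_test_txt.py with python3 to generate the txt files in Main. The Main subfolder holds the list files for the trainval and test sets.

The code of create_train_test_txt.py is as follows: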

# encoding:utf-8
# create_train_test_txt.py

import pdb
import glob
import os
import random
import math


def get_sample_value(txt_name, category_name):
    """Return ' 1' if the label file mentions the category, otherwise '-1'."""
    label_path = './Labels/'
    txt_path = label_path + txt_name + '.txt'
    try:
        with open(txt_path) as r_tdf:
            if category_name in r_tdf.read():
                return ' 1'
            else:
                return '-1'
    except IOError as ioerr:
        print('File error:' + str(ioerr))
        return '-1'  # treat an unreadable label file as not containing the category


txt_list_path = glob.glob('./Labels/*.txt')
txt_list = []
for item in txt_list_path:
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    txt_list.append(temp1)
txt_list.sort()
print(txt_list, end='\n\n')

# 9/10 of all samples form trainval; the remainder is the test set
num_trainval = random.sample(txt_list, math.floor(len(txt_list) * 9 / 10.0))
num_trainval.sort()
print(num_trainval, end='\n\n')

# 8/9 of trainval form train; the remainder is val
num_train = random.sample(num_trainval, math.floor(len(num_trainval) * 8 / 9.0))
num_train.sort()
print(num_train, end='\n\n')

num_val = list(set(num_trainval).difference(set(num_train)))
num_val.sort()
print(num_val, end='\n\n')

num_test = list(set(txt_list).difference(set(num_trainval)))
num_test.sort()
print(num_test, end='\n\n')

pdb.set_trace()  # debugging breakpoint: type c and press Enter to continue

Main_path = './ImageSets/Main/'
train_test_name = ['trainval', 'train', 'val', 'test']
splits = {'trainval': num_trainval, 'train': num_train, 'val': num_val, 'test': num_test}
# lowercase names, matching what modify_annotations_txt.py writes into the label files
category_name = ['car', 'pedestrian']

# write trainval, train, val, and test in turn
for item_train_test_name in train_test_name:
    split_items = splits[item_train_test_name]
    train_test_txt_name = Main_path + item_train_test_name + '.txt'
    try:
        # write the plain list file, one image id per line
        with open(train_test_txt_name, 'w') as w_tdf:
            for item in split_items:
                w_tdf.write(item + '\n')
        # write one file per category, marking each image id with 1 or -1
        for item_category_name in category_name:
            category_txt_name = Main_path + item_category_name + '_' + item_train_test_name + '.txt'
            with open(category_txt_name, 'w') as w_tdf:
                for item in split_items:
                    w_tdf.write(item + ' ' + get_sample_value(item, item_category_name) + '\n')
    except IOError as ioerr:
        print('File error:' + str(ioerr))

Run it with: python3 create_train_test_txt.py. If the pdb prompt appears while the program is running, press c and then Enter.
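Because the script samples at random, each run produces a different split. Where a reproducible split is preferred, the same 8:1:1 proportions can be obtained with a seeded generator. This is a minimal Python 3 sketch; the function name split_ids and the seed parameter are hypothetical, and it returns the lists instead of writing files.

```python
import random


def split_ids(ids, seed=0):
    """Split ids into train/val/test in roughly 8:1:1 proportions, reproducibly."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    ids = sorted(ids)
    trainval = set(rng.sample(ids, len(ids) * 9 // 10))
    train = set(rng.sample(sorted(trainval), len(trainval) * 8 // 9))
    return {
        "trainval": sorted(trainval),
        "train": sorted(train),
        "val": sorted(trainval - train),
        "test": sorted(set(ids) - trainval),
    }


if __name__ == "__main__":
    parts = split_ids(["%06d" % i for i in range(100)])
    for name in ("trainval", "train", "val", "test"):
        print(name, len(parts[name]))
```

Calling the function twice with the same ids and seed yields identical lists, which makes experiments easier to compare.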
