SSD: Installation, Training, and Testing (Ubuntu 14.04 + CUDA 7.5 + OpenCV 2.4.9)
Installation steps
1. Install git and download the SSD source
sudo apt-get install git
git clone https://github.com/weiliu89/caffe.git
cd caffe
git checkout ssd
The following commands make sure the required packages are installed:
sudo apt-get install python-pip
sudo apt-get install python-numpy
sudo apt-get install python-scipy
pip install cython -i http://pypi.douban.com/simple
pip install easydict
You can also clone into a directory of your choice: change to that directory first, then run the commands above.
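To confirm that the Python packages installed above can actually be imported, a small check like the following may help. It is only a sketch: the helper name missing_modules and the module list are my own choices (note that the pip package cython is imported as Cython).

```python
import importlib

# Import names for the packages installed above.
# Assumption: pip's "cython" package is imported under the name "Cython".
REQUIRED_MODULES = ["numpy", "scipy", "Cython", "easydict"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing: {}".format(", ".join(missing) if missing else "none"))
```

Run it with the same python interpreter that will later run the SSD scripts; anything it reports as missing should be installed before building.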
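2. Edit Makefile.config
Copy Makefile.config.example in the repository root to Makefile.config
Adjust the following parameters to match your machine:
CUDA_ARCH:
BLAS:
MATLAB_DIR: (optional)
PYTHON_INCLUDE:
3. Build
Run the following commands in the repository root: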
make -j8
make py
make test -j8
make runtest -j8 (optional)
4. Build errors
Note: with CUDA 8.0, GCC must be upgraded to version 5, otherwise the build fails. See https://github.com/weiliu89/caffe/issues/237, which says:
The problem was that to make caffe with CUDA 8 it is necessary a 5.3 or 5.4 GCC version.
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
Error 1
With multiple GPUs, make runtest fails.
Solution: export CUDA_VISIBLE_DEVICES=0; make runtest -j8
If you see the error: Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
Solution: first make sure a specific GPU is actually being selected, or try
unset CUDA_VISIBLE_DEVICES
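Error 2
Compile errors when building your own code against Caffe
include and lib
Use the include and lib directories built on your own machine (caffe/build/lib, caffe/include).
Missing caffe.pb.h:
/home/xxx/caffe/include/caffe/blob.hpp:9:34: fatal error: caffe/proto/caffe.pb.h: No such file or directory
#include "caffe/proto/caffe.pb.h"
Solution: use protoc to generate caffe.pb.h and caffe.pb.cc from caffe/src/caffe/proto/caffe.proto. The header has to end up in include/caffe/proto/ to match the #include above.
~/caffe/src/caffe/proto$ mkdir -p /home/xxx/caffe/include/caffe/proto
~/caffe/src/caffe/proto$ protoc --cpp_out=/home/xxx/caffe/include/caffe/proto/ caffe.proto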
Error 3
stdc++
Linker error:
/usr/bin/ld: caffe_cnn_handler.o: undefined reference to symbol '...@GLIBCXX_3.4'
//usr/lib/x86_64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line
Solution: libstdc++.so.6 is not being passed to the linker; add the following to the Makefile:
LIBS += -L/usr/lib/x86_64-linux-gnu -lstdc++
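Testing steps
1. Download the models provided on the official site, extract them, and put them under caffe/models/
For example, models_VGGNet_VOC0712_SSD_300x300.tar.gz
extracts to a models folder; copy the VGGNet folder inside it to caffe/models/
2. Test
Run from the repository root:
python examples/ssd/score_ssd_pascal.py (the score is about 0.718) (used with older versions)
python examples/ssd/ssd_pascal_webcam.py
python examples/ssd/ssd_pascal_video.py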
3. Errors
Error 1
Message: No module named caffe
Add the following to the corresponding script (score_ssd_pascal.py, ssd_pascal_webcam.py, ssd_pascal_video.py, and so on):
import sys
sys.path.insert(0,'/home/xxx/caffe/python')
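Training steps
1. Build your own dataset (similar to Faster R-CNN; see my other post, "Faster R-CNN installation, training, and debugging")
① Create directories
(1) Under data/VOCdevkit/VOC2007, create Annotations, ImageSets/Main, and JPEGImages
Notes:
Annotations: the xml files converted from the txt labels
JPEGImages: the image files
ImageSets/Main: lists of file names (without extension)
Training set: train.txt
Training + validation set: trainval.txt
Test set: test.txt
Validation set: val.txt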
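The four list files under ImageSets/Main can be written by hand, but a short script is less error-prone. The sketch below is one possible way to do it, not the method used in the original VOC tooling: the function names split_ids and write_splits, the 20% test and 20% validation fractions, and the fixed random seed are all assumptions you should adapt.

```python
import os
import random


def split_ids(image_ids, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Split image ids (file names without extension) into VOC-style lists.

    The fractions are illustrative assumptions, not values from SSD or VOC.
    """
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_test = int(len(ids) * test_fraction)
    test, rest = ids[:n_test], ids[n_test:]
    n_val = int(len(rest) * val_fraction)
    val, train = rest[:n_val], rest[n_val:]
    return {
        "train": sorted(train),
        "val": sorted(val),
        "trainval": sorted(rest),             # trainval = train + val
        "test": sorted(test),
    }


def write_splits(splits, main_dir):
    """Write train.txt, val.txt, trainval.txt and test.txt, one id per line."""
    if not os.path.exists(main_dir):
        os.makedirs(main_dir)
    for name, ids in splits.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as f:
            f.write("\n".join(ids) + "\n")
```

A typical call would collect the ids from JPEGImages (for example with os.listdir and os.path.splitext) and then call write_splits(split_ids(ids), "data/VOCdevkit/VOC2007/ImageSets/Main").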
② Copy
Copy create_list.sh, create_data.sh, and labelmap_voc.prototxt from data/VOC0712 to data/VOCdevkit/VOC2007/
③ Edit the scripts
**create_list.sh**: 3 changes
1. root_dir=$HOME/data/VOCdevkit/
change to root_dir=$HOME/caffe/data/VOCdevkit/
2. for name in VOC2007 VOC2012
change to for name in VOC2007
3. $bash_dir/../../build/tools/get_image_size
change to $HOME/caffe/build/tools/get_image_size
**create_data.sh**: 5 changes
1. root_dir=$cur_dir/../..
change to root_dir=$HOME/caffe
2. data_root_dir="$HOME/data/VOCdevkit"
change to data_root_dir="$HOME/caffe/data/VOCdevkit"
3. dataset_name="VOC0712"
change to dataset_name="VOC2007"
4. mapfile="$root_dir/data/$dataset_name/labelmap_voc.prototxt"
change to mapfile="$root_dir/data/VOCdevkit/$dataset_name/labelmap_voc.prototxt"
5. python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name
change to
python $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $root_dir/data/VOCdevkit/$dataset_name/$subset.txt $data_root_dir/$dataset_name/$db/$dataset_name"_"$subset"_"$db examples/$dataset_name
**labelmap_voc.prototxt**
Note that the label names must be lowercase. Delete the labels you do not need, keep the background entry with label 0, and add the name and label of each class in your own data.
For example:
item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "face"
  label: 1
  display_name: "face"
}
item {
  name: "pedestrian"
  label: 2
  display_name: "pedestrian"
}
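If you have many classes, the label map can be generated instead of typed. The following is a minimal sketch under my own assumptions: make_labelmap is a hypothetical helper, it lowercases every name, and it uses the class name as display_name. As far as I know, num_classes in ssd_pascal.py counts the background entry too, so it should equal the number of items produced here.

```python
def make_labelmap(class_names):
    """Return labelmap prototxt text: background as label 0, then one item per class."""
    entries = [("none_of_the_above", 0, "background")]
    for index, name in enumerate(class_names, start=1):
        # Names are lowercased, as the note above requires.
        entries.append((name.lower(), index, name.lower()))
    blocks = []
    for name, label, display_name in entries:
        blocks.append(
            'item {{\n  name: "{}"\n  label: {}\n  display_name: "{}"\n}}'.format(
                name, label, display_name))
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(make_labelmap(["face", "pedestrian"]))
```

Redirect the output to data/VOCdevkit/VOC2007/labelmap_voc.prototxt and compare it with the hand-written example before relying on it.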
2. Convert to LMDB
Under caffe/examples, create a VOC2007 folder; it will hold the symlinks to the LMDB files.
Then run the edited sh scripts from the repository root:
./data/VOCdevkit/VOC2007/create_list.sh
./data/VOCdevkit/VOC2007/create_data.sh
If you see "No module named caffe" or "No module named caffe.proto",
enter this in the terminal: export PYTHONPATH=$PYTHONPATH:/home/**/caffe/python (replace ** with the name used on your server)
If that still does not work, open ./scripts/create_annoset.py
and add the following code after import sys:
import os.path as osp
def add_path(path):
    if path not in sys.path:
        sys.path.insert(0, path)
caffe_path = osp.join('/home/****/caffe/python')
add_path(caffe_path)
3. If you are using LMDB files that someone else has already built, you only need to create the symlinks.
In ./scripts, create a file named create_link.py and paste in the following code:
import argparse
import os
import shutil
import subprocess
import sys
from caffe.proto import caffe_pb2
from google.protobuf import text_format
example_dir = '/home/li/caffe/examples/VOC2007'
out_dir = '/home/***/caffe/data/VOCdevkit/VOC2007/lmdb'
lmdb_name = ['VOC2007_test_lmdb', 'VOC2007_trainval_lmdb']
# make sure example_dir exists
if not os.path.exists(example_dir):
    os.makedirs(example_dir)
for lmdb_sub in lmdb_name:
    link_dir = os.path.join(example_dir, lmdb_sub)
    # remove link_dir if it already exists
    if os.path.exists(link_dir):
        os.unlink(link_dir)
    os.symlink(os.path.join(out_dir, lmdb_sub), link_dir)
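4. Download the pretrained model
Download the pretrained model VGG_ILSVRC_16_layers_fc_reduced.caffemodel and put it under ./models/VGGNet/
5. Edit the ./examples/ssd/ssd_pascal.py script
Every line that needs a change is marked with ###### at its end.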
from __future__ import print_function
import sys  ######
sys.path.insert(0, '/XXX/caffe/python')  ###### add the SSD caffe/python path so that caffe can be found
import caffe
from caffe.model_libs import *
from google.protobuf import text_format
import math
import os
import shutil
import stat
import subprocess
# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).
def AddExtraLayers(net, use_batchnorm=True, lr_mult=1):
    use_relu = True
    # Add additional convolutional layers.
    # 19 x 19
    from_layer = net.keys()[-1]
    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.
    # 10 x 10
    out_layer = "conv6_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1,
        lr_mult=lr_mult)
    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2,
        lr_mult=lr_mult)
    # 5 x 5
    from_layer = out_layer
    out_layer = "conv7_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)
    from_layer = out_layer
    out_layer = "conv7_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2,
        lr_mult=lr_mult)
    # 3 x 3
    from_layer = out_layer
    out_layer = "conv8_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)
    from_layer = out_layer
    out_layer = "conv8_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
        lr_mult=lr_mult)
    # 1 x 1
    from_layer = out_layer
    out_layer = "conv9_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)
    from_layer = out_layer
    out_layer = "conv9_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
        lr_mult=lr_mult)
    return net
### Modify the following parameters accordingly ###
# The directory which contains the caffe code.
# We assume you are running the script at the CAFFE_ROOT.
caffe_root = os.getcwd()
# Set true if you want to start training right after generating all files.
run_soon = True
# Set true if you want to load from most recently saved snapshot.
# Otherwise, we will load from the pretrain_model defined below.
resume_training = True
# If true, Remove old model files.
remove_old_models = False
# The database file for training data. Created by data/VOC0712/create_data.sh
train_data = "examples/VOC2007/VOC2007_trainval_lmdb"  ######
# The database file for testing data. Created by data/VOC0712/create_data.sh
test_data = "examples/VOC2007/VOC2007_test_lmdb"  ######
# Specify the batch sampler.
resize_width = 300  ######
resize_height = 300  ######
resize = "{}x{}".format(resize_width, resize_height)
batch_sampler = [
    {
        'sampler': {
        },
        'max_trials': 1,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.1,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.3,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.5,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.7,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.9,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'max_jaccard_overlap': 1.0,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
]
train_transform_param = {
    'mirror': True,
    'mean_value': [104, 117, 123],
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [
            P.Resize.LINEAR,
            P.Resize.AREA,
            P.Resize.NEAREST,
            P.Resize.CUBIC,
            P.Resize.LANCZOS4,
        ],
    },
    'distort_param': {
        'brightness_prob': 0.5,
        'brightness_delta': 32,
        'contrast_prob': 0.5,
        'contrast_lower': 0.5,
        'contrast_upper': 1.5,
        'hue_prob': 0.5,
        'hue_delta': 18,
        'saturation_prob': 0.5,
        'saturation_lower': 0.5,
        'saturation_upper': 1.5,
        'random_order_prob': 0.0,
    },
    'expand_param': {
        'prob': 0.5,
        'max_expand_ratio': 4.0,
    },
    'emit_constraint': {
        'emit_type': caffe_pb2.EmitConstraint.CENTER,
    }
}
test_transform_param = {
    'mean_value': [104, 117, 123],
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [P.Resize.LINEAR],
    },
}
# If true, use batch norm for all newly added layers.
# Currently only the non batch norm version has been tested.
use_batchnorm = False
lr_mult = 1
# Use different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # A learning rate for batch_size = 1, num_gpus = 1.
    base_lr = 0.000004  ######
# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_VOC2007_{}".format(job_name)  ######
# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/VOC2007/{}".format(job_name)  ######
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/VOC2007/{}".format(job_name)  ######
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/VOC2007/{}".format(job_name)  ######
# Directory which stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)  ######
# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)
# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "data/VOCdevkit/VOC2007/test_name_size.txt"  ######
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"  ######
# Stores LabelMapItem.
label_map_file = "data/VOCdevkit/VOC2007/labelmap_voc.prototxt"  ######
# MultiBoxLoss parameters.
num_classes = 2  ######
share_location = True
background_label_id = 0
train_on_diff_gt = True
normalization_mode = P.Loss.VALID
code_type = P.PriorBox.CENTER_SIZE
ignore_cross_boundary_bbox = False
mining_type = P.MultiBoxLoss.MAX_NEGATIVE
neg_pos_ratio = 3.
loc_weight = (neg_pos_ratio + 1.) / 4.
multibox_loss_param = {
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,
    'overlap_threshold': 0.5,
    'use_prior_for_matching': True,
    'background_label_id': background_label_id,
    'use_difficult_gt': train_on_diff_gt,
    'mining_type': mining_type,
    'neg_pos_ratio': neg_pos_ratio,
    'neg_overlap': 0.5,
    'code_type': code_type,
    'ignore_cross_boundary_bbox': ignore_cross_boundary_bbox,
}
loss_param = {
    'normalization': normalization_mode,
}
# parameters for generating priors.
# minimum dimension of input image
min_dim = 300
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# conv9_2 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']
# in percent %
min_ratio = 20
max_ratio = 90
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
    min_sizes.append(min_dim * ratio / 100.)
    max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [min_dim * 20 / 100.] + max_sizes
steps = [8, 16, 32, 64, 100, 300]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
    prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
    prior_variance = [0.1]
flip = True
clip = False
# Solver parameters.
# Defining which GPUs to use.
gpus = "0"  ######
gpulist = gpus.split(",")
num_gpus = len(gpulist)
# Divide the mini-batch to different GPUs.
batch_size = 32  ######
accum_batch_size = 32  ######
iter_size = accum_batch_size / batch_size
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size
if num_gpus > 0:
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
    solver_mode = P.Solver.GPU
    device_id = int(gpulist[0])
if normalization_mode == P.Loss.NONE:
    base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
    base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:
    # Roughly there are 2000 prior bboxes per image.
    # TODO(weiliu89): Estimate the exact # of priors.
    base_lr *= 2000.
# Evaluate on whole test set.
num_test_image = 15439  ######
test_batch_size = 8  ######
test_iter = num_test_image / test_batch_size
solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "multistep",
    'stepvalue': [80000, 100000, 120000],
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 120000,
    'snapshot': 80000,
    'display': 10,
    'average_loss': 10,
    'type': "SGD",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 10000,
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': False,
}
# parameters for generating detection output.
det_out_param = {
    'num_classes': num_classes,
    'share_location': share_location,
    'background_label_id': background_label_id,
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},
    'save_output_param': {
        'output_directory': output_result_dir,
        'output_name_prefix': "comp4_det_test_",
        'output_format': "VOC",
        'label_map_file': label_map_file,
        'name_size_file': name_size_file,
        'num_test_image': num_test_image,
    },
    'keep_top_k': 200,
    'confidence_threshold': 0.01,
    'code_type': code_type,
}
# parameters for evaluating detection results.
det_eval_param = {
    'num_classes': num_classes,
    'background_label_id': background_label_id,
    'overlap_threshold': 0.5,
    'evaluate_difficult_gt': False,
    'name_size_file': name_size_file,
}
### Hopefully you don't need to change the following ###
# Check file.
check_if_exist(train_data)
check_if_exist(test_data)
check_if_exist(label_map_file)
check_if_exist(pretrain_model)
make_if_not_exist(save_dir)
make_if_not_exist(job_dir)
make_if_not_exist(snapshot_dir)
# Create train net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
    train=True, output_label=True, label_map_file=label_map_file,
    transform_param=train_transform_param, batch_sampler=batch_sampler)
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False)
AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
    use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
    aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
    num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
    prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)
# Create the MultiBoxLossLayer.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
    loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
    propagate_down=[True, True, False, False])
with open(train_net_file, 'w') as f:
    print('name: "{}_train"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)
# Create test net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,
    train=False, output_label=True, label_map_file=label_map_file,
    transform_param=test_transform_param)
VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False)
AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)
mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
    use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
    aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
    num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
    prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)
conf_name = "mbox_conf"
if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:
    reshape_name = "{}_reshape".format(conf_name)
    net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))
    softmax_name = "{}_softmax".format(conf_name)
    net[softmax_name] = L.Softmax(net[reshape_name], axis=2)
    flatten_name = "{}_flatten".format(conf_name)
    net[flatten_name] = L.Flatten(net[softmax_name], axis=1)
    mbox_layers[1] = net[flatten_name]
elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:
    sigmoid_name = "{}_sigmoid".format(conf_name)
    net[sigmoid_name] = L.Sigmoid(net[conf_name])
    mbox_layers[1] = net[sigmoid_name]
net.detection_out = L.DetectionOutput(*mbox_layers,
    detection_output_param=det_out_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))
net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,
    detection_evaluate_param=det_eval_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))
with open(test_net_file, 'w') as f:
    print('name: "{}_test"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(test_net_file, job_dir)
# Create deploy net.
# Remove the first and last layer from test net.
deploy_net = net
with open(deploy_net_file, 'w') as f:
    net_param = deploy_net.to_proto()
    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from test net.
    del net_param.layer[0]
    del net_param.layer[-1]
    net_param.name = '{}_deploy'.format(model_name)
    net_param.input.extend(['data'])
    net_param.input_shape.extend([
        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])
    print(net_param, file=f)
shutil.copy(deploy_net_file, job_dir)
# Create solver.
solver = caffe_pb2.SolverParameter(
    train_net=train_net_file,
    test_net=[test_net_file],
    snapshot_prefix=snapshot_prefix,
    **solver_param)
with open(solver_file, 'w') as f:
    print(solver, file=f)
shutil.copy(solver_file, job_dir)
max_iter = 0
# Find most recent snapshot.
for file in os.listdir(snapshot_dir):
    if file.endswith(".solverstate"):
        basename = os.path.splitext(file)[0]
        iter = int(basename.split("{}_iter_".format(model_name))[1])
        if iter > max_iter:
            max_iter = iter
train_src_param = '--weights="{}" \\\n'.format(pretrain_model)
if resume_training:
    if max_iter > 0:
        train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)
if remove_old_models:
    # Remove any snapshots smaller than max_iter.
    for file in os.listdir(snapshot_dir):
        if file.endswith(".solverstate"):
            basename = os.path.splitext(file)[0]
            iter = int(basename.split("{}_iter_".format(model_name))[1])
            if max_iter > iter:
                os.remove("{}/{}".format(snapshot_dir, file))
        if file.endswith(".caffemodel"):
            basename = os.path.splitext(file)[0]
            iter = int(basename.split("{}_iter_".format(model_name))[1])
            if max_iter > iter:
                os.remove("{}/{}".format(snapshot_dir, file))
# Create job file.
with open(job_file, 'w') as f:
    f.write('cd {}\n'.format(caffe_root))
    f.write('./build/tools/caffe train \\\n')
    f.write('--solver="{}" \\\n'.format(solver_file))
    f.write(train_src_param)
    if solver_param['solver_mode'] == P.Solver.GPU:
        f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))
    else:
        f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))
# Copy the python script to job_dir.
py_file = os.path.abspath(__file__)
shutil.copy(py_file, job_dir)
# Run the job.
os.chmod(job_file, stat.S_IRWXU)
if run_soon:
    subprocess.call(job_file, shell=True)
train.prototxt, test.prototxt, deploy.prototxt, and solver.prototxt are all generated automatically by running this script.
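The prior-box size arithmetic in the script above is easy to check on its own. The sketch below repeats the same formula in a standalone function; the name prior_box_sizes is mine, and range replaces xrange so that it also runs on Python 3.

```python
import math


def prior_box_sizes(min_dim=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Reproduce the min_sizes / max_sizes lists computed in ssd_pascal.py."""
    # The first layer (conv4_3) is handled separately, hence num_layers - 2 steps.
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    min_sizes = []
    max_sizes = []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio / 100.)
        max_sizes.append(min_dim * (ratio + step) / 100.)
    # conv4_3 uses 10% and 20% of the input size.
    min_sizes = [min_dim * 10 / 100.] + min_sizes
    max_sizes = [min_dim * 20 / 100.] + max_sizes
    return min_sizes, max_sizes


if __name__ == "__main__":
    mins, maxs = prior_box_sizes()
    print(mins)  # [30.0, 60.0, 111.0, 162.0, 213.0, 264.0]
    print(maxs)  # [60.0, 111.0, 162.0, 213.0, 264.0, 315.0]
```

If you change resize_width and resize_height, calling the function with the new min_dim shows how the box sizes scale.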
gpus = '0,1,2,3': if you have one GPU, delete 1,2,3; if you have two, delete 2,3.
If you have no GPU, comment out the following lines and the script will train on the CPU (this is also the fix for the cudaSuccess (10 vs. 0) error):
#if num_gpus > 0:
#    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
#    iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
#    solver_mode = P.Solver.GPU
#    device_id = int(gpulist[0])
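One reason the block has to be commented out rather than just setting gpus to an empty string: in Python, "".split(",") returns a list with one empty string, so num_gpus would still be 1 and int(gpulist[0]) would fail. The sketch below shows this and a stricter parser; parse_gpus is a hypothetical helper of mine, not part of the SSD script.

```python
def parse_gpus(gpus):
    """Return the GPU ids in a comma-separated string such as "0,1,2,3".

    Empty fields are skipped, so an empty string yields an empty list.
    """
    return [int(g) for g in gpus.split(",") if g.strip()]


if __name__ == "__main__":
    print(len("".split(",")))     # 1: the script would still see one "GPU"
    print(parse_gpus(""))         # []
    print(parse_gpus("0,1,2,3"))  # [0, 1, 2, 3]
```

With a parser like this, the script could test the length of the returned list instead of commenting code in and out, but that is a design suggestion rather than what the original script does.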
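6. Edit the ./examples/ssd/ssd_pascal_webcam.py script
Make the corresponding changes there as well.
7. Train
Run from the repository root:
python ./examples/ssd/ssd_pascal.py 2>&1 | tee ssd_train_log.txt
If you see cudaSuccess (2 vs. 0), the GPU has run out of memory. Reduce batch_size in caffe/examples/ssd/ssd_pascal.py from the default 32 to 16, 8, or 4.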
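When lowering batch_size, it can help to see what the script derives from it. The sketch below mirrors the script's own arithmetic in a standalone function (the name batching is mine). Keeping accum_batch_size at 32 while lowering batch_size is my suggestion, not something the steps above require: iter_size then grows, so gradients are accumulated over several smaller batches.

```python
import math


def batching(batch_size, accum_batch_size, num_gpus):
    """Return (batch_size_per_device, iter_size) as computed in ssd_pascal.py."""
    per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (per_device * num_gpus)))
    return per_device, iter_size


if __name__ == "__main__":
    print(batching(32, 32, 1))  # (32, 1): the settings used above
    print(batching(8, 32, 1))   # (8, 4): smaller batches, accumulated 4 times
    print(batching(32, 32, 4))  # (8, 1): 32 images spread over four GPUs
```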
8. Test a single image and display the box coordinates
# coding: utf-8
# Note: this file is expected to be in {caffe_root}/examples
# ### 1. Setup
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pylab
plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
caffe_root = '../'
import os
os.chdir(caffe_root)
import sys
sys.path.insert(0, '/home/lilai/LL/caffe/python')
import caffe
from google.protobuf import text_format
from caffe.proto import caffe_pb2
caffe.set_device(0)
caffe.set_mode_gpu()
labelmap_file = '/home/lilai/LL/caffe/data/VOC0712/labelmap_voc.prototxt'
file = open(labelmap_file, 'r')
labelmap = caffe_pb2.LabelMap()
text_format.Merge(str(file.read()), labelmap)
def get_labelname(labelmap, labels):
    num_labels = len(labelmap.item)
    labelnames = []
    if type(labels) is not list:
        labels = [labels]
    for label in labels:
        found = False
        for i in xrange(0, num_labels):
            if label == labelmap.item[i].label:
                found = True
                labelnames.append(labelmap.item[i].display_name)
                break
        assert found == True
    return labelnames
model_def = '/home/lilai/LL/caffe/models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt'
model_weights = '/home/lilai/LL/caffe/models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel'
net = caffe.Net(model_def, model_weights, caffe.TEST)
# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104, 117, 123]))  # mean pixel
transformer.set_raw_scale('data', 255)  # the reference model operates on images in [0,255] range instead of [0,1]
transformer.set_channel_swap('data', (2, 1, 0))  # the reference model has channels in BGR order instead of RGB
# ### 2. SSD detection
# Load an image.
image_resize = 300
net.blobs['data'].reshape(1, 3, image_resize, image_resize)
image = caffe.io.load_image('/home/lilai/LL/caffe/examples/images/fish-bike.jpg')
plt.imshow(image)
# Run the net and examine the top_k results
transformed_image = transformer.preprocess('data', image)
net.blobs['data'].data[...] = transformed_image
# Forward pass.
detections = net.forward()['detection_out']
# Parse the outputs.
det_label = detections[0, 0, :, 1]
det_conf = detections[0, 0, :, 2]
det_xmin = detections[0, 0, :, 3]
det_ymin = detections[0, 0, :, 4]
det_xmax = detections[0, 0, :, 5]
det_ymax = detections[0, 0, :, 6]
# Get detections with confidence higher than 0.6.
top_indices = [i for i, conf in enumerate(det_conf) if conf >= 0.6]
top_conf = det_conf[top_indices]
top_label_indices = det_label[top_indices].tolist()
top_labels = get_labelname(labelmap, top_label_indices)
top_xmin = det_xmin[top_indices]
top_ymin = det_ymin[top_indices]
top_xmax = det_xmax[top_indices]
top_ymax = det_ymax[top_indices]
# Plot the boxes
colors = plt.cm.hsv(np.linspace(0, 1, 21)).tolist()
currentAxis = plt.gca()
for i in xrange(top_conf.shape[0]):
    # bbox value
    xmin = int(round(top_xmin[i] * image.shape[1]))
    ymin = int(round(top_ymin[i] * image.shape[0]))
    xmax = int(round(top_xmax[i] * image.shape[1]))
    ymax = int(round(top_ymax[i] * image.shape[0]))
    # score
    score = top_conf[i]
    # label
    label = int(top_label_indices[i])
    label_name = top_labels[i]
    # display info: label score xmin ymin xmax ymax
    display_txt = '%s: %.2f %d %d %d %d' % (label_name, score, xmin, ymin, xmax, ymax)
    # display_bbox_value = '%d %d %d %d' % (xmin, ymin, xmax, ymax)
    coords = (xmin, ymin), xmax - xmin + 1, ymax - ymin + 1
    color = colors[label]
    currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor=color, linewidth=2))
    currentAxis.text(xmin, ymin, display_txt, bbox={'facecolor': color, 'alpha': 0.5})
    # currentAxis.text((xmin+xmax)/2, (ymin+ymax)/2, display_bbox_value, bbox={'facecolor': color, 'alpha': 0.5})
plt.imshow(image)
pylab.show()
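If you only need the box coordinates and not the plot, the parsing part of the script above can be isolated. The sketch below works on plain rows of [image_id, label, confidence, xmin, ymin, xmax, ymax] with normalized coordinates, which is the row layout the script indexes; the function name to_pixel_boxes and the sample rows are made up for illustration.

```python
def to_pixel_boxes(detections, width, height, threshold=0.6):
    """Keep detections with confidence >= threshold and scale boxes to pixels.

    Each row is [image_id, label, conf, xmin, ymin, xmax, ymax], with the
    coordinates normalized to [0, 1] as in the detection_out blob.
    """
    boxes = []
    for _, label, conf, xmin, ymin, xmax, ymax in detections:
        if conf < threshold:
            continue
        boxes.append((int(label), float(conf),
                      int(round(xmin * width)), int(round(ymin * height)),
                      int(round(xmax * width)), int(round(ymax * height))))
    return boxes


if __name__ == "__main__":
    # Made-up rows, only to show the input and output shapes.
    rows = [[0, 2, 0.9, 0.1, 0.2, 0.5, 0.6],
            [0, 7, 0.3, 0.0, 0.0, 1.0, 1.0]]
    print(to_pixel_boxes(rows, width=500, height=400))
```

With the real network output, something like to_pixel_boxes(detections[0, 0].tolist(), image.shape[1], image.shape[0]) should give the same numbers the plotting loop prints.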
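9. About aspect_ratios
What exactly does aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]] mean in the SSD algorithm?
[2, 3] means using default box of aspect ratio of 2 and 3. And since we set flip=True at here, it will also use default box of aspect ratio of 1/2 and 1/3.
For example, [2] means ar = {1, 2, 1/2} and [2, 3] means ar = {1, 2, 3, 1/2, 1/3}. For ar = 1, one additional default box is added.
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]] has 6 elements in total, one for each feature map.
The first element determines how many boxes each location on the first feature map has: [2] means ar = {1, 2, 1/2}, plus one extra box for ar = 1 (explained in the paper).
The second element works the same way: [2, 3] means ar = {1, 2, 3, 1/2, 1/3}, plus one extra box for ar = 1 (explained in the paper).
If this is still unclear, read prior_box_layer.cpp, which contains the actual implementation and makes it easy to follow.
Specifically: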
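The counting rule described above can be written out in a few lines. This is a simplified sketch of my reading of the rule, not a port of prior_box_layer.cpp: priors_per_location is a hypothetical name, duplicate ratios are not filtered, and each layer is assumed to have exactly one min_size and one max_size, as in the training script above. The feature map sizes are taken from the comments in that script.

```python
def priors_per_location(aspect_ratios, flip=True, has_max_size=True):
    """Count the default boxes at one feature-map location."""
    ratios = [1.0]                      # aspect ratio 1 is always present
    for ar in aspect_ratios:
        ratios.append(float(ar))
        if flip:
            ratios.append(1.0 / ar)     # flip=True also adds 1/ar
    # With a max_size, one extra box of aspect ratio 1 is added.
    return len(ratios) + (1 if has_max_size else 0)


if __name__ == "__main__":
    aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
    feature_map_sizes = [38, 19, 10, 5, 3, 1]  # conv4_3 ... conv9_2
    per_location = [priors_per_location(ar) for ar in aspect_ratios]
    print(per_location)  # [4, 6, 6, 6, 4, 4]
    total = sum(size * size * n
                for size, n in zip(feature_map_sizes, per_location))
    print(total)         # 8732 default boxes for a 300x300 input
```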