
Paper Reading Notes: AM-Softmax: Additive Margin Softmax for Face Verification

July 25, 2018, 11:15:44, ProYH

																							</div>
		</div>
	</div>
</div>
<article class="baidu_pl">
	<div id="article_content" class="article_content clearfix ****-tracking-statistics" data-pid="blog" data-mod="popu_307" data-dsm="post">
							<div class="article-copyright">
				Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.					https://blog.****.net/u010579901/article/details/81198950				</div>
							            <div id="content_views" class="markdown_views">
						<h1 id="论文阅读笔记am-softmax-additive-margin-softmax-for-face-verification"><a name="t0"></a>Paper Reading Notes: AM-Softmax: Additive Margin Softmax for Face Verification</h1>

Tags: Deep_Learning_Fundamental_Papers

This post covers the following:

Paper link
Code link
Reference blog


This paper comes from the University of Electronic Science and Technology of China (UESTC). Building on NormFace and A-Softmax, it proposes AM-Softmax.

Main Idea

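L-Softmax and A-Softmax introduced the notion of an angular margin to improve the conventional softmax loss, so that face features have larger inter-class distances and smaller intra-class distances. Inspired by these methods, the authors propose additive margin Softmax (AM-Softmax), which is more intuitive and easier to interpret. The paper also emphasizes and discusses the importance of feature normalization. Experiments show that AM-Softmax outperforms the earlier methods on LFW and MegaFace.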

Algorithm

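Both L-Softmax and A-Softmax introduce a factor m that turns the cosine between the weight W and the feature f into cos(mθ), and use m to adjust the distance between features. In the same spirit, AM-Softmax replaces cos(θ) with the expression below, which is monotonically decreasing in θ and simpler in both form and computation than the Ψ(θ) used by L-Softmax/A-Softmax.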

$$\Psi(\theta) = \cos(\theta) - m$$

Here s is a scale factor applied to the margin-adjusted cosine before the softmax; the paper fixes it at 30.
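For reference, the full AM-Softmax loss can be written out with both m and s in place. The notation below is an assumption made for this note: n is the batch size, y_i is the label of sample i, and θ_j is the angle between the normalized feature f_i and the normalized class weight W_j.

```latex
\mathcal{L}_{AMS}
  = -\frac{1}{n}\sum_{i=1}^{n}
    \log\frac{e^{\,s\,(\cos\theta_{y_i} - m)}}
             {e^{\,s\,(\cos\theta_{y_i} - m)} + \sum_{j \neq y_i} e^{\,s\cos\theta_j}}
```

The margin m only touches the target-class term, while s multiplies every logit.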
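Angular distance versus cosine distance: A-Softmax multiplies θ by m, whereas AM-Softmax subtracts m from cos θ. This is the biggest difference between the two: one is a margin in angle, the other a margin in cosine. The reason for choosing cos θ - m rather than cos(θ - m) is that the network gives us the inner product of W and f; optimizing cos(θ - m) would require an arccos operation, which is too expensive to compute.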
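The difference between the two kinds of margin can be sketched in a few lines of C++. This is only an illustration: the function names are hypothetical, and the multiplicative variant is the plain cos(mθ) form, without the piecewise extension that L-Softmax/A-Softmax use to keep Ψ(θ) monotonic.

```cpp
#include <cassert>
#include <cmath>

// Additive cosine margin (AM-Softmax style): works directly on the
// cosine that the normalized inner product already provides.
double additive_cosine_margin(double cos_theta, double m) {
  return cos_theta - m;
}

// Simplified multiplicative angular margin cos(m * theta).
// The angle has to be recovered from the cosine before m can be applied.
double multiplicative_angular_margin(double cos_theta, double m) {
  assert(cos_theta >= -1.0 && cos_theta <= 1.0);  // acos domain
  const double theta = std::acos(cos_theta);
  return std::cos(m * theta);
}
```

The additive form is a single subtraction on a value the network already has, which is the simplicity argument made above.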
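Feature normalization: features extracted from high-quality images tend to have a large norm, and those from low-quality images a small norm. Once features are normalized, the low-quality samples produce larger gradients, so the network pays more attention to them during training. Feature normalization is therefore better suited to datasets whose image quality is poor.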
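The gradient claim follows from differentiating the L2 normalization itself. As a short worked step (with x the raw feature, x̂ the normalized feature, and L the loss, all symbols chosen for this note):

```latex
\hat{x} = \frac{x}{\lVert x \rVert}, \qquad
\frac{\partial L}{\partial x}
  = \frac{1}{\lVert x \rVert}\left(I - \hat{x}\hat{x}^{\top}\right)
    \frac{\partial L}{\partial \hat{x}}
```

The factor 1/‖x‖ means that, for the same upstream gradient, a feature with a small norm receives a proportionally larger gradient than one with a large norm.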

Experimental Results

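Notably, on LFW the results without feature normalization are better than those with it. The authors attribute this to the high image quality of LFW.
The feature normalization setting here comes down to what the Scale layer's parameter s is: the fixed s is swapped for the scale given by each feature's own norm, so the amount of scaling differs from feature to feature.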

Summary

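With both features and weights normalized, the paper proposes an additive margin Softmax that is more intuitive and easier to interpret, and that also achieves better experimental results than A-Softmax.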

Code Implementation

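The code closely follows the NormFace code, with the corresponding modifications made on top of it.
On top of NormFace, a new layer, LabelSpecificAdd, is introduced. It is the core of AM-Softmax and subtracts m from cos θ.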

layer {
  name: "norm1"
  type: "Normalize"
  bottom: "fc5"
  top: "norm1"
}
layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param {
    lr_mult: 1
  }
  inner_product_param{
    num_output: 10516
    normalize: true
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param {
    bias: -0.35
  }
}
layer {
  name: "fc6_margin_scale"
  type: "Scale"
  bottom: "fc6_margin"
  top: "fc6_margin_scale"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler{
      type: "constant"
      value: 30
    }
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin_scale"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}
layer {
  name: "Accuracy"
  type: "Accuracy"
  bottom: "fc6"
  bottom: "label"
  top: "accuracy"
  include { 
    phase: TEST
  }
}
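To make the data flow of the layer stack above easier to follow, here is a minimal standalone sketch of the same pipeline for a single sample in plain C++: normalize the feature and the class weights, take the cosine, subtract the margin at the label, scale by s, and apply softmax cross-entropy. The function names and the use of std::vector are assumptions for illustration and are not part of the Caffe code. Unlike LabelSpecificAdd, the sketch subtracts the margin unconditionally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// L2-normalize a vector (mirrors the "Normalize" layer; needs a non-zero norm).
std::vector<double> l2_normalize(const std::vector<double>& v) {
  double sq = 0.0;
  for (double x : v) sq += x * x;
  const double norm = std::sqrt(sq);
  assert(norm > 0.0);
  std::vector<double> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] / norm;
  return out;
}

// AM-Softmax loss for one sample.
// feature: raw embedding; weights: one row per class; label: ground-truth class.
double am_softmax_loss(const std::vector<double>& feature,
                       const std::vector<std::vector<double>>& weights,
                       std::size_t label, double m = 0.35, double s = 30.0) {
  assert(label < weights.size());
  const std::vector<double> f = l2_normalize(feature);
  std::vector<double> logits(weights.size());
  for (std::size_t j = 0; j < weights.size(); ++j) {
    const std::vector<double> w = l2_normalize(weights[j]);
    double cos_theta = 0.0;
    for (std::size_t k = 0; k < f.size(); ++k) cos_theta += w[k] * f[k];
    if (j == label) cos_theta -= m;  // margin on the target class only
    logits[j] = s * cos_theta;       // fixed scale, as in the Scale layer
  }
  // Numerically stable softmax cross-entropy.
  double max_logit = logits[0];
  for (double z : logits) if (z > max_logit) max_logit = z;
  double sum_exp = 0.0;
  for (double z : logits) sum_exp += std::exp(z - max_logit);
  return -(logits[label] - max_logit - std::log(sum_exp));
}
```

The defaults m = 0.35 and s = 30 are taken from the prototxt above (bias: -0.35, Scale value: 30).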

label_specific_add_layer.hpp/label_specific_add_layer.cpp

label_specific_add_layer.hpp / label_specific_add_layer.cpp (performs the cos θ - m operation)
Formula:

$$\Psi(\theta) = \cos(\theta) - m$$

#ifndef CAFFE_LABEL_SPECIFIC_ADD_LAYER_HPP_
#define CAFFE_LABEL_SPECIFIC_ADD_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

namespace caffe {

template <typename Dtype>
class LabelSpecificAddLayer : public Layer<Dtype> {
 public:
  explicit LabelSpecificAddLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
                          const vector<Blob<Dtype>*>& top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
                       const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "LabelSpecificAdd"; }
  virtual inline int MinNumBottomBlobs() const { return 2; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

  Dtype bias_;
  bool transform_test_;
  bool anneal_bias_;
  Dtype bias_base_;
  Dtype bias_gamma_;
  Dtype bias_power_;
  Dtype bias_min_;
  Dtype bias_max_;
  int iteration_;
};

}  // namespace caffe

#endif  // CAFFE_LABEL_SPECIFIC_ADD_LAYER_HPP_

#include <algorithm>
#include <vector>

#include "caffe/layers/label_specific_add_layer.hpp"

namespace caffe {

  template <typename Dtype>
  void LabelSpecificAddLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
                                                    const vector<Blob<Dtype>*>& top) {
    const LabelSpecificAddParameter& param = this->layer_param_.label_specific_add_param();
    bias_ = param.bias();
    transform_test_ = param.transform_test() & (this->phase_ == TRAIN);
    anneal_bias_ = param.has_bias_base();
    bias_base_ = param.bias_base();
    bias_gamma_ = param.bias_gamma();
    bias_power_ = param.bias_power();
    bias_min_ = param.bias_min();
    bias_max_ = param.bias_max();
    iteration_ = param.iteration();
  }

  template <typename Dtype>
  void LabelSpecificAddLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
                                                    const vector<Blob<Dtype>*>& top) {
    if(top[0] != bottom[0]) top[0]->ReshapeLike(*bottom[0]);
    if (top.size() == 2) top[1]->Reshape({ 1 });
  }

template <typename Dtype>
void LabelSpecificAddLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                                                  const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* label_data = bottom[1]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();

  int num = bottom[0]->num();   // batch size
  int count = bottom[0]->count();   // total number of elements (batch size * number of classes)
  int dim = count / num;    // number of output classes per sample

  if (top[0] != bottom[0]) caffe_copy(count, bottom_data, top_data);

  if (!transform_test_ && this->phase_ == TEST) return;     // at test time, leave the scores unchanged

  if (anneal_bias_) {   // anneal the bias so that it changes gradually with the iteration count
    bias_ = bias_base_ + pow(((Dtype)1. + bias_gamma_ * iteration_), bias_power_) - (Dtype)1.;
    bias_ = std::max(bias_, bias_min_);
    bias_ = std::min(bias_, bias_max_);
    iteration_++;
  }
  if (top.size() == 2) {
    top[1]->mutable_cpu_data()[0] = bias_;
  }     // optionally expose the current bias as a second output

  for (int i = 0; i < num; ++i) {
    int gt = static_cast<int>(label_data[i]);
    if(top_data[i * dim + gt] > -bias_) top_data[i * dim + gt] += bias_;    // add bias at the ground-truth class, only when the score exceeds -bias
  }
}

template <typename Dtype>
void LabelSpecificAddLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
                                                   const vector<bool>& propagate_down,
                                                   const vector<Blob<Dtype>*>& bottom) {       // the layer only shifts values, so the gradient passes through unchanged and is simply copied
  if (top[0] != bottom[0] && propagate_down[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int count = bottom[0]->count();
    caffe_copy(count, top_diff, bottom_diff);
  }
}


#ifdef CPU_ONLY
STUB_GPU(LabelSpecificAddLayer);
#endif

INSTANTIATE_CLASS(LabelSpecificAddLayer);
REGISTER_LAYER_CLASS(LabelSpecificAdd);

}  // namespace caffe
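The two pieces of logic in Forward_cpu can be tried outside Caffe. The following is a minimal sketch on plain std::vector data: the annealing expression and the label-specific add are copied from the layer above, while the function names and the parameter values mentioned afterwards are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Annealed bias: base + (1 + gamma * iteration)^power - 1, clamped to [min, max].
double annealed_bias(double base, double gamma, double power, int iteration,
                     double bias_min, double bias_max) {
  double bias = base + std::pow(1.0 + gamma * iteration, power) - 1.0;
  bias = std::max(bias, bias_min);
  bias = std::min(bias, bias_max);
  return bias;
}

// Adds `bias` to the ground-truth entry of every row of a row-major
// [num x dim] score matrix, but only where that entry exceeds -bias.
void label_specific_add(std::vector<double>& scores,
                        const std::vector<int>& labels,
                        std::size_t dim, double bias) {
  assert(scores.size() == labels.size() * dim);
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const std::size_t gt = static_cast<std::size_t>(labels[i]);
    assert(gt < dim);
    double& target = scores[i * dim + gt];
    if (target > -bias) target += bias;
  }
}
```

With illustrative values base = 0, gamma = -0.001, power = 1, the bias starts at 0 and decreases linearly until it is clamped at bias_min, so the margin is introduced gradually rather than from the first iteration.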

					<link href="https://****img.cn/release/phoenix/mdeditor/markdown_views-a47e74522c.css" rel="stylesheet">
            </div>
				</article>
