【Computer Science】【2016.05】On the Integrity of Deep Learning Systems in Adversarial Settings

This is a master's thesis from The Pennsylvania State University, USA (author: Nicolas Papernot), 60 pages in total.

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them, like other machine learning techniques, vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing machine learning algorithms to misclassify.

In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified into specific targets by a DNN, with a 97% adversarial success rate while modifying on average only 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
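The crafting algorithms summarized above exploit the mapping between a DNN's inputs and outputs. The sketch below is a rough illustration of one way a Jacobian-based, saliency-guided targeted perturbation can work; it is a minimal sketch and not the thesis's actual algorithm or hyperparameters: the toy two-layer network, the greedy one-feature-per-step rule, and the step size `eps` are all assumptions made for illustration.

```python
# Minimal sketch (assumptions: toy 2-layer softmax network, greedy one-feature-per-step
# perturbation). It illustrates the Jacobian/saliency idea behind targeted adversarial
# crafting; it is not the thesis's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy network parameters: 16 input features, 8 hidden units, 3 classes.
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    """Return hidden activations and class probabilities for input x."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

def jacobian(x):
    """Forward derivative dF/dx with shape (num_classes, num_features)."""
    h, p = forward(x)
    dh_dx = (1.0 - h ** 2)[:, None] * W1.T        # (hidden, features)
    dlogits_dx = W2.T @ dh_dx                      # (classes, features)
    dp_dlogits = np.diag(p) - np.outer(p, p)       # softmax derivative
    return dp_dlogits @ dlogits_dx                 # (classes, features)

def craft(x, target, eps=0.1, max_iters=50):
    """Greedily perturb the most salient feature until the target class wins."""
    x = x.copy()
    for _ in range(max_iters):
        _, p = forward(x)
        if p.argmax() == target:
            break
        J = jacobian(x)
        # Saliency: reward features that raise the target probability while
        # lowering the combined probability of all other classes.
        target_grad = J[target]
        others_grad = J.sum(axis=0) - target_grad
        saliency = np.where((target_grad > 0) & (others_grad < 0),
                            target_grad * -others_grad, 0.0)
        if saliency.max() <= 0:
            break                                  # no usable feature left
        x[saliency.argmax()] += eps                # nudge the chosen feature
    return x

x0 = rng.normal(size=16)
x_adv = craft(x0, target=2)
print("original:", forward(x0)[1].argmax(), "adversarial:", forward(x_adv)[1].argmax())
```

Perturbing only the single highest-saliency feature at each step keeps the total number of modified features small, which is in the spirit of the abstract's figure of roughly 4% of input features changed per sample.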

  1. Introduction
  2. About deep learning
  3. A taxonomy of threat models in deep learning
  4. Attacking the integrity of deep neural networks at test time
  5. Validating the effectiveness of the attack
  6. Designing defense mechanisms based on an understanding of the attacks
  7. Related work
  8. Conclusions
