Reading Notes - Incremental Learning - LwF: Learning without Forgetting

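Learning without Forgetting (LwF) is a classic 2017 paper, and many later papers use it as a baseline in their experiments. The authors argue that the existing approaches that adapt a trained network by modifying its parameters all suffer from either limited performance or low efficiency. These approaches are Fine Tuning, Duplicating and Fine Tuning, Feature Extraction, and Joint Training. The paper's experiments show that the proposed LwF algorithm overcomes these shortcomings.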


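The core of how LwF achieves incremental learning is the way it updates parameters. The paper introduces and compares several classic approaches: Fine Tuning, Duplicating and Fine Tuning, Feature Extraction, and Joint Training. They are shown in the figure: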

[Figure: (a) original model, (b) fine-tuning, (c) feature extraction, (d) joint training, (e) LwF]

  • Taking a CNN as an example, θₛ in the figure denotes the shared parameters of the convolutional and fully connected layers, θₒ denotes the task-specific parameters of the previously learned tasks, and θₙ denotes the task-specific parameters of the new task.
  • (a) is the original model without incremental learning ability; none of its parameters are updated.
  • (b) is fine-tuning: in the incremental stage, θₒ is kept fixed, θₙ is randomly initialized, and θₛ and θₙ are updated during training.
  • (c) is feature extraction: in the incremental stage, θₛ and θₒ are kept fixed, and θₙ is trained and updated on the features extracted by the network learned on the old tasks.
  • (d) is joint training: in the incremental stage, θₛ, θₒ and θₙ are optimized jointly until convergence.
  • (e) is the proposed LwF: in the incremental stage, θₛ and θₒ are first frozen while θₙ is trained to convergence (a warm-up step). Then θₛ, θₒ and θₙ are optimized jointly until convergence.
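The panels above differ mainly in which parameter groups are trained in each stage. The following is a minimal sketch in plain Python that tabulates this. The labels `theta_s`, `theta_o`, `theta_n`, the `STRATEGIES` table and `frozen_groups` are hypothetical names for illustration, not code from the paper. Panel (a) is omitted because it has no new-task head to train.

```python
# Hypothetical labels for the three parameter groups in the figure.
SHARED, OLD, NEW = "theta_s", "theta_o", "theta_n"
ALL_GROUPS = frozenset({SHARED, OLD, NEW})

# For each strategy: a list of training stages, each stage being the set of
# parameter groups that are updated in that stage (panels (b) to (e)).
STRATEGIES = {
    "fine_tuning": [{SHARED, NEW}],            # (b) theta_o stays fixed
    "feature_extraction": [{NEW}],             # (c) theta_s and theta_o fixed
    "joint_training": [{SHARED, OLD, NEW}],    # (d) also needs old-task data
    "lwf": [{NEW}, {SHARED, OLD, NEW}],        # (e) warm-up, then joint step
}


def frozen_groups(strategy: str, stage: int) -> set:
    """Return the parameter groups kept fixed during one stage of a strategy."""
    return set(ALL_GROUPS - STRATEGIES[strategy][stage])


if __name__ == "__main__":
    for name, stages in STRATEGIES.items():
        for i, trainable in enumerate(stages):
            print(f"{name} stage {i}: train {sorted(trainable)}, "
                  f"freeze {sorted(frozen_groups(name, i))}")
```

In a deep learning framework, the frozen groups for a stage would typically be left out of the optimizer, or in PyTorch have `requires_grad` set to `False`.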

The LwF pseudocode is as follows:

[Figure: LwF pseudocode]

  • θₛ denotes the shared parameters of the convolutional and fully connected layers, θₒ denotes the task-specific parameters of the previously learned tasks, and Xₙ and Yₙ denote the new data and its labels.
  • Initialization: the model uses the old θₛ and θₒ to compute its predictions Yₒ on the new data Xₙ and records them. At the same time, the new-task-specific parameters θₙ are randomly initialized.
  • Training: this step repeats until the loss is minimized. In each iteration, the current θ̂ₛ and θ̂ₒ produce Ŷₒ, and θ̂ₛ and θ̂ₙ produce Ŷₙ. There is one loss between Yₒ and Ŷₒ and another between Yₙ and Ŷₙ. All three parameter groups are adjusted until the total loss converges.
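The training step above corresponds to the objective below, written with the notation of these notes as a reference sketch. The following details come from the paper rather than from the notes above:

- Lnew is the usual multinomial logistic (cross-entropy) loss.
- Lold is a knowledge-distillation loss in which both the recorded outputs and the current outputs are softened with a temperature T. The paper reports using T = 2.
- λₒ weights the old-task term.
- R is a regularizer (weight decay).

```latex
\begin{aligned}
\theta_s^*, \theta_o^*, \theta_n^* &\leftarrow \operatorname*{argmin}_{\hat\theta_s,\,\hat\theta_o,\,\hat\theta_n}
  \Big( \lambda_o\, \mathcal{L}_{old}(Y_o, \hat Y_o) + \mathcal{L}_{new}(Y_n, \hat Y_n)
  + \mathcal{R}(\hat\theta_s, \hat\theta_o, \hat\theta_n) \Big) \\
\mathcal{L}_{new}(y_n, \hat y_n) &= -\, y_n \cdot \log \hat y_n \\
\mathcal{L}_{old}(y_o, \hat y_o) &= -\sum_{i=1}^{l} y_o'^{(i)} \log \hat y_o'^{(i)},
\qquad
y_o'^{(i)} = \frac{\big(y_o^{(i)}\big)^{1/T}}{\sum_{j} \big(y_o^{(j)}\big)^{1/T}},
\quad
\hat y_o'^{(i)} = \frac{\big(\hat y_o^{(i)}\big)^{1/T}}{\sum_{j} \big(\hat y_o^{(j)}\big)^{1/T}}
\end{aligned}
```

Here l is the number of labels of the old task. Ŷₒ = CNN(Xₙ, θ̂ₛ, θ̂ₒ) and Ŷₙ = CNN(Xₙ, θ̂ₛ, θ̂ₙ) are the outputs of the network being trained.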

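Similarities and differences between LwF and joint training:

Both optimize θₛ, θₒ and θₙ jointly. The difference is the data they need: joint training requires the data and labels of the old tasks, whereas LwF uses only the new data Xₙ and the previous model's recorded outputs Yₒ on that data.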
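Because LwF replaces old-task labels with the recorded outputs Yₒ, the whole loss can be computed from new-task samples alone. The following is a minimal per-sample sketch in plain Python under stated assumptions:

- Outputs are already strictly positive probability vectors, such as softmax outputs.
- The temperature T = 2 follows the paper.
- λₒ defaults to 1.0 as a placeholder.
- The regularizer R is left out.

The function names are hypothetical, chosen for illustration.

```python
import math
from typing import List, Sequence


def soften(probs: Sequence[float], T: float) -> List[float]:
    """Temperature-scale a probability vector: p_i^(1/T) / sum_j p_j^(1/T)."""
    scaled = [p ** (1.0 / T) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def distillation_loss(y_o: Sequence[float], y_o_hat: Sequence[float],
                      T: float = 2.0) -> float:
    """L_old: cross-entropy between the softened recorded outputs Y_o (targets)
    and the softened outputs Y_o_hat of the current old-task head.
    Assumes every entry of y_o_hat is > 0 so the log is defined."""
    target = soften(y_o, T)
    pred = soften(y_o_hat, T)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def cross_entropy(label: int, y_n_hat: Sequence[float]) -> float:
    """L_new: multinomial logistic loss for an integer class label."""
    return -math.log(y_n_hat[label])


def lwf_loss(y_o: Sequence[float], y_o_hat: Sequence[float], label_n: int,
             y_n_hat: Sequence[float], lambda_o: float = 1.0,
             T: float = 2.0) -> float:
    """LwF objective for one new-task sample, without the regularizer R.
    No old-task data is needed: y_o was recorded from the old model on X_n."""
    return (lambda_o * distillation_loss(y_o, y_o_hat, T)
            + cross_entropy(label_n, y_n_hat))
```

For example, `lwf_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], 1, [0.2, 0.8])` combines two terms. The first is a distillation term that grows as the old head drifts away from its recorded outputs. The second is the new-task cross-entropy −log 0.8. Joint training would instead need the true old-task labels for old-task images.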


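In the experiments, the new tasks come from datasets different from the original training data. LwF achieves high classification accuracy on the new categories while effectively mitigating catastrophic forgetting of the old categories.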
