Continual Learning: Where Are We?
As the deep learning community aims to bridge the gap between human and machine intelligence, the need for agents that can adapt to continuously evolving environments is growing more than ever. This was evident at ICML 2020, which hosted two different workshop tracks on continual and lifelong learning. As an attendee, the key takeaways I amassed, and that I believe are going to shape the imminent developments in the field, are twofold: (a) experience replay (whether real or augmented) is integral for optimal performance, and (b) a temporally evolving agent must be aware of task semantics. Throughout this blog post, I will try to shed light on the effectiveness of these traits for continual learning (CL) agents.
[Although background on the above terms would help, this post is intended for readers with no prior knowledge of the continual learning literature.]
A quick refresher: In a task-incremental setup, a continually learning agent at time step t is trained to recognize tasks 1, …, t-1, t, while the data for tasks 1, …, t-1 may or may not be available. Such a learning dynamic has two main hurdles to overcome. The first of these is forward transfer (FT), which measures how learning incrementally up to task t influences the agent's knowledge about it. In terms of performance, a positive FT suggests that the agent should deliver better accuracy on task t if allowed to learn it incrementally through tasks 1, …, t-1.
The other desirable feature is backward transfer (BT), which measures the influence that learning a task t has on the performance of previous tasks. A positive BT means that learning a new task t increases the performance of the model on the previously learned tasks 1, …, t-1. The compromise between learning a new task while also preserving the knowledge of previously learned tasks is referred to as the plasticity-stability trade-off.
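To make these two metrics concrete, here is a minimal sketch of how FT and BT are typically computed from an accuracy matrix, in the style popularised by the GEM paper. The matrix R (accuracy on task j after training up to task i), the baseline accuracies b, and the numbers themselves are illustrative placeholders, not results from any of the works discussed here.

```python
import numpy as np

# R[i, j] = test accuracy on task j after the agent finished training on task i.
# b[j]    = accuracy on task j of a model that never saw tasks 1..j-1 (baseline).
# Both are illustrative placeholders; in practice they come from your evaluation loop.
R = np.array([[0.95, 0.40, 0.35],
              [0.90, 0.93, 0.42],
              [0.85, 0.88, 0.91]])
b = np.array([0.33, 0.33, 0.33])

T = R.shape[0]

# Backward transfer: how much learning later tasks changed accuracy on earlier ones.
# Negative values correspond to forgetting.
bwt = np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)])

# Forward transfer: how much learning tasks 1..j-1 helped task j before training on it.
fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])

print(f"BWT = {bwt:.3f}, FWT = {fwt:.3f}")
```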
Based on the steps taken while training on an incremental task, the continual learning literature comprises mainly two categories of agents for handling the aforementioned trade-off: (a) experience replay-based agents, which usually store a finite number of examples (either real or generative) from previous tasks and mix these with the training data of the new task, and (b) regularisation-based methods, which use additional loss terms to consolidate previous knowledge. Keeping these in mind, let us now dive into the real questions!
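As a rough illustration of these two families, the sketch below shows a tiny reservoir-style replay buffer that mixes stored examples into the current loss, and a generic quadratic penalty that keeps parameters close to values that mattered for earlier tasks (in the spirit of EWC). The class and function names, the buffer size, and the importance weights are all hypothetical; this is not the code of any specific method.

```python
import random
import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Tiny reservoir-sampling buffer holding (x, y) tensors from past tasks."""
    def __init__(self, capacity=200):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:                                   # reservoir sampling keeps a uniform sample
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, k):
        batch = random.sample(self.data, min(k, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def replay_loss(model, x_new, y_new, buffer, k=32):
    """(a) Replay: mix stored past examples with the new task's batch."""
    loss = F.cross_entropy(model(x_new), y_new)
    if buffer.data:
        x_old, y_old = buffer.sample(k)
        loss = loss + F.cross_entropy(model(x_old), y_old)
    return loss

def regularised_loss(model, x_new, y_new, old_params, importance, lam=10.0):
    """(b) Regularisation: penalise drift from parameters that mattered before.

    `old_params` and `importance` are hypothetical per-parameter snapshots/weights
    computed after the previous task (EWC-style)."""
    loss = F.cross_entropy(model(x_new), y_new)
    for name, p in model.named_parameters():
        loss = loss + lam * (importance[name] * (p - old_params[name]) ** 2).sum()
    return loss
```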
1. Why do memory rehearsal-based methods work better?
A spotlight in the field of CL at ICML 2020 was the work of Knoblauch et al., who show by means of set theory that an optimal continual learning algorithm needs to solve an NP-hard set-intersection decision problem: given two tasks A and B, it needs to discern the parameters that are common to learning both A and B (A ∩ B). However, determining this is at least as hard as determining whether A ∩ B is empty or not (and can possibly be thought of as a generalization of the Hitting Set Problem?), and the solution requires a perfect memory of previous task examples.
Such a perfect memory facilitates the reconstruction of an approximation to the joint distribution over all observed tasks, so that the algorithm now effectively learns to solve a single temporally distributed task, i.e., for a time step t, this amounts to finding common representations across the task distributions spanning 1:t. Our work from the CL workshop further advocates for the empirical effectiveness of replay-based methods in the context of human activity recognition [2].
Alongside set theory, the benefit of replay can also be looked at through the dynamics of parameter training, by treating continual learning as a credit assignment problem. As we know, gradient descent works by iteratively updating the parameters of a neural network with the objective of minimizing the overall loss on the training set. The training process can thus be viewed as a tug-of-war game where the objective function leads the value of each parameter to either increase or decrease, with a larger positive value indicating that the parameter should be assigned more credit and is more important.
At a given incremental time step, we can thus view each task as a team trying to pull the rope with a tension equivalent to the momentum that the training algorithm requires for minimizing the loss on that task. A repercussion of this is that at each incremental step, the model needs to be evaluated on all previous and current tasks so as to balance the tension. If a given task is absent at a particular instance, the parameter space of the model will be updated to be occupied by the remaining tasks. The simultaneous presence of data from all previous tasks in experience replay-based methods thus helps to better balance the tension among all sides of the tug-of-war, so that no single task objective fully dominates the training criterion.
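The tug-of-war picture can be made a bit more tangible by looking at per-task gradients on a shared model. The sketch below assumes a hypothetical dict of per-task minibatches and a PyTorch model whose parameters all receive gradients; it simply shows that the applied update is driven by the sum of whatever task gradients are present, so if an old task contributes no (replayed) data, its tension is missing from the sum.

```python
import torch
import torch.nn.functional as F

def per_task_gradients(model, task_batches):
    """Flattened gradient of each task's loss w.r.t. the shared parameters.

    `task_batches` is a hypothetical dict {task_id: (x, y)} of current minibatches."""
    grads = {}
    for task_id, (x, y) in task_batches.items():
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        grads[task_id] = torch.cat([p.grad.flatten() for p in model.parameters()])
    return grads

def combined_update_direction(grads):
    """The update actually applied follows the sum of the tensions (gradients) present.

    If a previous task has no replayed data, its gradient is simply absent here and
    the remaining tasks pull the shared parameters toward their own optima."""
    return sum(grads.values())
```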
2. How do task semantics affect the performance of a CL agent?
Yet another highlight from the CL workshop was the work of Ramasesh et al. (2020), investigating how the similarity between tasks influences the degree of forgetting. They conclude that a network forgets the most when the similarity of representations between a previous task and a subsequent task is intermediate.
To understand this, we need to think of the CL of subsequent tasks in terms of the components of the weight vectors learned by the model. For tasks that are unrelated, the learned weight vectors remain orthogonal to each other, while for those with high similarity, the weight vector components have minimal angular separation. The only component of the weight vector θ that is affected by gradient descent training is the component that lies in the training data subspace, while the component least affected by training is the one orthogonal to the training data subspace (see the figure below, adapted from their talk).
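A small numpy sketch (mine, not from the talk) makes this concrete for a linear model: the gradient of the loss is always a combination of the training inputs, so the component of the weight vector orthogonal to the training-data subspace is never touched by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 10, 3                       # input dim 10, only 3 training inputs -> 3-dim data subspace
X = rng.normal(size=(n, d))        # rows span the training-data subspace of the task
w = rng.normal(size=d)             # weight vector of a linear model f(x) = w @ x
y = rng.normal(size=n)

# Orthonormal basis of the data subspace and the projector onto its orthogonal complement.
Q, _ = np.linalg.qr(X.T)           # columns of Q span the row space of X
P_orth = np.eye(d) - Q @ Q.T

w_orth_before = P_orth @ w
for _ in range(100):               # plain gradient descent on squared error
    grad = X.T @ (X @ w - y) / n   # the gradient is a combination of the training inputs
    w -= 0.1 * grad
w_orth_after = P_orth @ w

# The component of w orthogonal to the data subspace is untouched by training.
print(np.allclose(w_orth_before, w_orth_after))   # True
```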
Ramasesh et al. offer two descriptive CL setups to support their hypothesis. In setup 1, the model is trained to classify ship-truck as the first task, and then either cat-horse or plane-car as the second task; the first task suffers more forgetting when the second task is cat-horse. In setup 2, the model is first trained to recognize deer-dog-ship-truck, followed by plane-car recognition, and the performance degrades the most for ship-truck.
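For reference, both setups are built from CIFAR-10 classes. A sketch of the splits and a helper for carving the tasks out of a labelled dataset might look as follows; the split construction is my own illustration, not the authors' code.

```python
# CIFAR-10 class names in their conventional index order.
CIFAR10 = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
NAME_TO_IDX = {name: i for i, name in enumerate(CIFAR10)}

# Setup 1: ship-truck first, then either cat-horse or plane-car (airplane-automobile).
setup_1 = [("ship", "truck"), ("cat", "horse")]
# Setup 2: a mixed animal/vehicle first task, followed by plane-car.
setup_2 = [("deer", "dog", "ship", "truck"), ("airplane", "automobile")]

def make_task(dataset, class_names):
    """Filter an iterable of (image, label) pairs down to one task, relabelled 0..k-1."""
    wanted = {NAME_TO_IDX[name]: new for new, name in enumerate(class_names)}
    return [(img, wanted[lbl]) for img, lbl in dataset if lbl in wanted]
```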
The authors point out that in setup 1, the model builds its representations only for vehicles, and thus the increasingly dissimilar representations for animals (cat-horse) in the second task cause more forgetting of the previously learned vehicle representations. Setup 2, however, involves training the model simultaneously on vehicles as well as animals, so the representations of animals now occupy a different region of the latent space than the vehicles. As a result, when presented with the latter task, the learned representations for animals are orthogonal to those for plane-car and suffer less degradation.
The rest of this section tries to explain this from a transfer-interference point of view. Riemer et al. (2019) were the first to look at continual learning from a transfer-interference trade-off viewpoint. To grasp this, let us first dive into the limitations of the stability-plasticity dilemma. As we saw before, the dilemma states that the stability of the learned model can be improved by reducing forgetting: so far, it keeps a check on the transfer of weights caused by learning the current task, while minimizing interference due to the sharing of weights that are important to the previous tasks.
However, since we have limited knowledge of what future tasks may look like, minimizing the weight sharing for previous tasks tackles only half the problem: a future task that is closely related to one of the previously learned tasks might demand further sharing of these weights, and the model must be able to do so without disrupting the performance on the previous tasks. There is thus an obvious need to extend the temporal limitations of the stability-plasticity dilemma so as to account for the uncertainty from future tasks.
The transfer-interference trade-off takes care of the backward interference due to learning an incremental task, while also keeping a check on the transfer of representations among weights so that they do not harm future learning. Riemer et al. thus show that tasks learned using the same weight components have a high potential for both interference and transfer between examples, while those learned using dissimilar components experience less transfer and interference.
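Riemer et al. formalise this with gradient dot products: for two examples, a positive dot product between their loss gradients means an update on one also helps the other (transfer), while a negative one means the update hurts it (interference). A minimal sketch, assuming a PyTorch classifier and single-example batches (shapes [1, ...] and [1]):

```python
import torch
import torch.nn.functional as F

def loss_gradient(model, x, y):
    """Flattened gradient of the loss on one example (x, y)."""
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

def transfer_or_interference(model, example_a, example_b):
    """Dot product of the two examples' gradients: > 0 indicates transfer,
    < 0 indicates interference between them."""
    g_a = loss_gradient(model, *example_a)
    g_b = loss_gradient(model, *example_b)
    return torch.dot(g_a, g_b).item()
```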
Keeping the above point of view in mind, let us now look at the two CL setups of Ramasesh et al. In setup 1, the ship-truck classification task is dissimilar to the incremental cat-horse task, and since the model tries to learn them using the same weight component, the high interference causes larger forgetting of the previous task.
In setup 2, however, the model is forced to represent deer-dog differently from ship-truck. Since the representations for plane-car are more similar to those for the ship-truck classification task and are to be learned using the same weight component, this catalyzes the transfer of weights between them, resulting in larger forgetting. On the other hand, the representations for deer and dog have components orthogonal to those of plane and car, and are thus largely unaffected, since the transfer of weights between them is inhibited.
Conclusion: In short, we saw how a continually learning agent faces a credit assignment problem at each training step and how experience replay reinforces the credibility of each task at hand. Further, the semantics of the tasks play an important role in the amount of forgetting an agent will suffer, and this can be explained from a transfer-interference point of view. As the field continues sprouting towards large-scale and domain-independent learning, a better understanding of these trade-offs is indeed the key to more advanced training strategies such as meta-learners [3].
Translated from: https://towardsdatascience.com/continual-learning-where-are-we-d5706e78a295