Eligibility Traces

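Eligibility traces are one of the basic mechanisms of reinforcement learning. The λ in the TD(λ) algorithm refers to the use of an eligibility trace. Almost any TD method, including Q-learning and Sarsa, can be combined with eligibility traces to obtain a more general method that often learns more efficiently.
Eligibility traces can be viewed in two ways: the forward (theoretical) view and the backward (mechanistic) view. As the names suggest, the forward view looks ahead from each state to the rewards and states that follow it, while the backward view looks back from the current step to the states visited earlier. The forward view needs information from future time steps and is expensive to compute directly, so practical implementations use the backward view.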
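To make the backward view more concrete, here is a minimal sketch of tabular TD(λ) prediction with accumulating traces. The function name `td_lambda_episode`, the `(state, reward, next_state, done)` tuple layout, and the use of a plain dict for the value table are assumptions made for this illustration; they are not taken from the original article.

```python
def td_lambda_episode(transitions, V, alpha, gamma, lam):
    """Run one episode of backward-view TD(lambda) on a tabular value function.

    transitions: list of (state, reward, next_state, done) tuples (assumed layout).
    V: dict mapping state -> estimated value; updated in place and returned.
    """
    traces = {}  # eligibility trace e(s) for every state visited so far
    for state, reward, next_state, done in transitions:
        # One-step TD error; the value of a terminal state is taken as 0.
        target = reward + (0.0 if done else gamma * V[next_state])
        delta = target - V[state]

        # Accumulating trace: bump the trace of the state just visited.
        traces[state] = traces.get(state, 0.0) + 1.0

        # Every previously visited state shares the TD error in proportion
        # to its trace, then the trace decays by gamma * lambda.
        for s in traces:
            V[s] += alpha * delta * traces[s]
            traces[s] *= gamma * lam
    return V


episode = [("A", 0.0, "B", False), ("B", 1.0, None, True)]
print(td_lambda_episode(episode, {"A": 0.0, "B": 0.0}, alpha=0.5, gamma=1.0, lam=1.0))
print(td_lambda_episode(episode, {"A": 0.0, "B": 0.0}, alpha=0.5, gamma=1.0, lam=0.0))
```

With `lam=1.0` the final reward also updates state A through its trace; with `lam=0.0` the trace of A has already decayed to zero, so only B is updated, which matches one-step TD.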

n-Step TD prediction

The figure shows the backup diagrams for TD (1-step) through TD (n-step) and Monte Carlo. The corresponding targets are:

$$G_t^{(1)} = R_{t+1} + \gamma V(S_{t+1})$$

$$G_t^{(2)} = R_{t+1} + \gamma R_{t+2} + \gamma^2 V(S_{t+2})$$

$$\vdots$$

$$G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^n V(S_{t+n})$$

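If the episode terminates in fewer than n steps, that is, if $t + n \ge T$ where $T$ is the terminal time step, the n-step return is truncated at termination and equals the ordinary full return: $G_t^{(n)} = G_t^{(T-t)} = G_t$.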
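The n-step target, including the truncation at the end of the episode, can be sketched as follows. This is a minimal illustration, not the article's own code: the function name `n_step_return` and the indexing convention (`rewards[k]` holds R_{k+1}, `values[k]` holds V(S_k)) are assumptions chosen for the example.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Compute the n-step return G_t^(n) for one recorded episode.

    rewards[k] holds R_{k+1}, the reward received after leaving S_k (assumed layout).
    values[k] holds the current estimate V(S_k) for k = 0 .. T-1.
    The episode has T = len(rewards) steps; the terminal state has value 0.
    """
    T = len(rewards)
    horizon = min(n, T - t)  # never look past the end of the episode

    # Discounted sum of the real rewards R_{t+1} .. R_{t+horizon}.
    g = sum(gamma**k * rewards[t + k] for k in range(horizon))

    # Bootstrap from V(S_{t+n}) only if that state is not terminal.
    if t + horizon < T:
        g += gamma**horizon * values[t + horizon]
    return g


rewards = [1.0, 2.0, 3.0]   # R_1, R_2, R_3
values = [0.5, 1.0, 1.5]    # V(S_0), V(S_1), V(S_2)
for n in (1, 2, 3, 10):
    print(n, n_step_return(rewards, values, t=0, n=n, gamma=0.5))
```

For `n = 3` and `n = 10` the function returns the same number, because both reach the end of the three-step episode and therefore reduce to the full Monte Carlo return.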
