Model-Free Learning

Reinforcement Learning: Model-Free Methods

Monte-Carlo & Temporal Difference; Q-learning

On-policy	Off-policy
Agent chooses its own actions	Agent cannot choose the actions
Most obvious setup	Learning with exploration, playing without exploration
Agent always follows its own policy	Learning from an expert (the expert is imperfect)
	Learning from sessions (recorded data)
Cannot learn from off-policy data	Can also learn from on-policy data
SARSA	Q-learning
more…	Expected Value SARSA
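
The three algorithms named in the table differ mainly in how they bootstrap from the next state. The following is a minimal sketch, not taken from the slides: the function names (`epsilon_greedy_probs`, `td_target`, `update`), the tabular `Q` dictionary keyed by `(state, action)`, and the choice of an epsilon-greedy target policy for Expected Value SARSA are all assumptions made for illustration.

```python
from collections import defaultdict


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_values."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def td_target(kind, Q, reward, next_state, next_action, n_actions,
              gamma=0.9, epsilon=0.1):
    """One-step TD target for the chosen algorithm (hypothetical helper)."""
    next_q = [Q[(next_state, a)] for a in range(n_actions)]
    if kind == "sarsa":
        # On-policy: bootstrap from the action the agent actually took next.
        bootstrap = next_q[next_action]
    elif kind == "q_learning":
        # Off-policy: bootstrap from the greedy action, whatever was taken.
        bootstrap = max(next_q)
    elif kind == "expected_sarsa":
        # Expectation over the target policy (assumed epsilon-greedy here).
        probs = epsilon_greedy_probs(next_q, epsilon)
        bootstrap = sum(p * q for p, q in zip(probs, next_q))
    else:
        raise ValueError(f"unknown algorithm: {kind}")
    return reward + gamma * bootstrap


def update(kind, Q, transition, n_actions, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Move Q(s, a) a step of size alpha toward the TD target."""
    s, a, r, s2, a2 = transition
    target = td_target(kind, Q, r, s2, a2, n_actions, gamma, epsilon)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


if __name__ == "__main__":
    Q = defaultdict(float)
    Q[(1, 0)], Q[(1, 1)] = 1.0, 3.0
    # Transition (s, a, r, s', a') where the next action is the non-greedy one.
    transition = (0, 0, 1.0, 1, 0)
    for kind in ("sarsa", "q_learning", "expected_sarsa"):
        print(kind, td_target(kind, Q, 1.0, 1, 0, n_actions=2))
```

With these numbers the SARSA target depends on the exploratory next action (about 1.9), the Q-learning target ignores it (about 3.7), and the Expected Value SARSA target lies in between (about 3.61). Setting `epsilon=0` in the expectation makes the Expected Value SARSA target coincide with the Q-learning target, which is one way to see why the table lists it in the off-policy column.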

(omitted)