Reinforcement Learning (4): Actor-Critic Methods
Category: Article
2025-02-24 23:16:11

Main idea:

Policy Network (Actor)

Value Network (Critic)

An intuitive analogy:

Train the Neural Networks

Detailed steps:

Update value network q using TD
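A minimal sketch of one TD update for the critic. It assumes a tabular stand-in for the value network (a dict keyed by state-action pairs) instead of a neural network, so the gradient step reduces to moving a single entry toward the TD target. The function name `td_update` and the parameters `gamma` (discount) and `alpha` (learning rate) are illustrative and do not come from the original post.

```python
def td_update(q, s, a, r, s_next, a_next, gamma=0.9, alpha=0.1):
    """One TD step on a tabular stand-in for the value network q(s, a)."""
    # TD target: observed reward plus the discounted estimate at the next step.
    td_target = r + gamma * q[(s_next, a_next)]
    # TD error: current prediction minus TD target.
    td_error = q[(s, a)] - td_target
    # Gradient descent on 0.5 * td_error**2 with respect to q[(s, a)].
    q[(s, a)] -= alpha * td_error
    return td_error


q = {(0, 0): 0.0, (1, 0): 1.0}
td_error = td_update(q, s=0, a=0, r=1.0, s_next=1, a_next=0)
```

With a neural network, the same TD error would multiply the gradient of q with respect to the network weights rather than update a single table entry.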

Update policy network π using policy gradient
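A minimal sketch of one policy gradient step for the actor. It assumes a softmax policy over a small table of action logits per state, which stands in for the policy network. For a softmax, the gradient of log π(a|s) with respect to logit i is (1 if i == a else 0) minus π(i|s). The names `policy_gradient_step`, `theta`, and `beta` (learning rate) are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(theta, s, a, q_value, beta=0.1):
    """Gradient ascent: theta <- theta + beta * q(s, a) * grad log pi(a | s)."""
    probs = softmax(theta[s])  # probabilities before the update
    for i in range(len(theta[s])):
        grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
        theta[s][i] += beta * q_value * grad_log_pi
    return softmax(theta[s])


theta = {0: [0.0, 0.0]}
new_probs = policy_gradient_step(theta, s=0, a=1, q_value=2.0)
```

Because the critic's score for the chosen action is positive here, the step raises the probability of that action; a negative score would lower it.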

Actor-Critic Method




Summary of Algorithm
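The two updates can be combined into one training loop. The following is only a sketch under stated assumptions, not the post's own implementation: the environment `toy_step` is a made-up one-step task in which action 1 pays reward 1 and action 0 pays 0, the critic is a table rather than a network, and the sampled next action is carried forward as the next action taken (a SARSA-style choice made here for brevity).

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_step(state, action):
    """Hypothetical one-step environment: action 1 pays 1, action 0 pays 0."""
    reward = 1.0 if action == 1 else 0.0
    return state, reward, True  # (next state, reward, episode finished)


def train_actor_critic(episodes=200, gamma=0.9, alpha=0.1, beta=0.1, seed=0):
    rng = random.Random(seed)
    actions = [0, 1]
    theta = {0: [0.0, 0.0]}          # actor: action logits per state
    q = {(0, 0): 0.0, (0, 1): 0.0}   # critic: tabular q(s, a)
    for _ in range(episodes):
        s, done = 0, False
        # Observe the state and sample an action from the current policy.
        a = rng.choices(actions, weights=softmax(theta[s]))[0]
        while not done:
            # Perform the action; observe the next state and the reward.
            s_next, r, done = toy_step(s, a)
            # Sample the next action (used in the TD target).
            a_next = rng.choices(actions, weights=softmax(theta[s_next]))[0]
            # Evaluate the critic at both steps (zero after the episode ends).
            q_t = q[(s, a)]
            q_next = 0.0 if done else q[(s_next, a_next)]
            # Critic update: gradient descent on the squared TD error.
            td_error = q_t - (r + gamma * q_next)
            q[(s, a)] -= alpha * td_error
            # Actor update: gradient ascent weighted by the critic's score q_t.
            probs = softmax(theta[s])
            for i in actions:
                grad_log_pi = (1.0 if i == a else 0.0) - probs[i]
                theta[s][i] += beta * q_t * grad_log_pi
            s, a = s_next, a_next
    return theta, q


theta, q = train_actor_critic()
final_probs = softmax(theta[0])
```

In this toy task the critic's estimate for action 1 rises toward 1 while the estimate for action 0 stays at 0, so the actor shifts probability toward action 1.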


Summary
Policy Network and Value Network


Training
