Is the description of the SARSA algorithm in Chapter 5 incorrect? #44

Open
txsniper opened this issue Jul 19, 2022 · 3 comments

Comments

@txsniper

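SARSA training procedure:
4. Sample from the current policy: $\tilde{a}_{t+1} \sim \pi_{\text{now}}(\cdot \mid s_{t+1})$. Note that $\tilde{a}_{t+1}$ is only a hypothetical action; the agent does not execute it.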

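According to other material I have read, after this iteration SARSA assigns $\tilde{a}_{t+1}$ to a (in other words, the next step always executes $\tilde{a}_{t+1}$ in $s_{t+1}$):
s = $s_{t+1}$
a = $\tilde{a}_{t+1}$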
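For reference, the carry-over variant described above can be sketched in tabular form as follows. This is only a minimal sketch under assumptions that are not in the thread: the five-state chain environment `chain_step`, the ε-greedy helper `epsilon_greedy`, and all hyperparameter values are hypothetical and chosen purely for illustration.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Sample an action from the epsilon-greedy policy induced by q."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties at random so an all-zero table does not always pick action 0.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def chain_step(state, action, n_states=5):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def sarsa_carry_over(n_episodes=200, n_states=5, n_actions=2,
                     alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA in which the sampled next action is also executed."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    for _ in range(n_episodes):
        s = 0
        a = epsilon_greedy(q, s, n_actions, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = chain_step(s, a, n_states)
            # Sample the next action from the policy BEFORE the update.
            a_next = epsilon_greedy(q, s_next, n_actions, epsilon, rng)
            target = r if done else r + gamma * q[(s_next, a_next)]
            q[(s, a)] += alpha * (target - q[(s, a)])
            # Carry-over step: the action used in the TD target is executed next.
            s, a = s_next, a_next
    return q
```

Calling `sarsa_carry_over()` returns the learned table as a dict keyed by `(state, action)`. The line `s, a = s_next, a_next` is the assignment in question: the action is drawn from the policy as it was before `q` was updated.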

@wangshusen
Owner

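That is not right. The policy can be updated at any time, so there is no guarantee that the action taken at time t+1 is $\tilde{a}_{t+1}$.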
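One case where this shows up can be sketched with a tiny example. The setup is entirely hypothetical and not taken from the book: a single state with a self-loop, a purely greedy policy derived from the Q table, and hand-picked values for `alpha`, `gamma`, and the reward. Because $s_{t+1} = s_t$ here, updating $Q(s_t, a_t)$ also changes the policy at $s_{t+1}$.

```python
def greedy(q_row):
    """Greedy action for one state's action values (first index wins ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


# Hypothetical single-state problem: every action leads back to state 0.
q = {0: [1.0, 0.9]}
alpha, gamma = 0.5, 0.9

# Observed transition (s_t, a_t, r_t, s_{t+1}); the values are made up.
s, a, r, s_next = 0, 0, -1.0, 0

# Action sampled from the policy BEFORE the update (greedy picks action 0).
a_tilde = greedy(q[s_next])

# SARSA update: target is about -0.1, so Q(0, 0) drops from 1.0 to about 0.45.
td_target = r + gamma * q[s_next][a_tilde]
q[s][a] += alpha * (td_target - q[s][a])

# Action chosen by the policy AFTER the update (greedy now picks action 1).
a_new = greedy(q[s_next])

print(a_tilde, a_new)  # prints "0 1"
```

In this sketch the hypothetical action `a_tilde` and the action the updated policy would pick, `a_new`, differ, so executing `a_tilde` at time t+1 would no longer follow the current policy. With function approximation the same effect can occur without a self-loop, since one update may shift the action values of many states.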

@txsniper
Author

txsniper commented Jul 19, 2022

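The last step of every iteration is precisely the assignment to s and a. By contrast, Q-learning is the algorithm in which the next action has to be determined by sampling again.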
[Image: v2-a7c02634548471ab0fd9df11c2597bda_1440w]

The implementation code is written this way as well:
https://hrl.boyuai.com/chapter/1/%E6%97%B6%E5%BA%8F%E5%B7%AE%E5%88%86%E7%AE%97%E6%B3%95#53-sarsa-%E7%AE%97%E6%B3%95
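To make the contrast concrete, the two TD targets can be written side by side. This is a minimal sketch with hypothetical function names and made-up numbers, not code from the linked page: the SARSA target uses the action sampled at $s_{t+1}$, while the Q-learning target takes the maximum over actions, so whatever action Q-learning executes next is chosen separately.

```python
def sarsa_target(q, r, s_next, a_next, gamma):
    """SARSA TD target: bootstraps from the sampled next action a_next."""
    return r + gamma * q[s_next][a_next]


def q_learning_target(q, r, s_next, gamma):
    """Q-learning TD target: bootstraps from the best next action."""
    return r + gamma * max(q[s_next])


# Made-up values for illustration only.
q = {1: [0.2, 0.8]}   # action values at the next state s_{t+1} = 1
r, gamma = 1.0, 0.5
a_next = 0            # suppose the policy happened to sample action 0

y_sarsa = sarsa_target(q, r, 1, a_next, gamma)   # 1.0 + 0.5 * 0.2
y_qlearn = q_learning_target(q, r, 1, gamma)     # 1.0 + 0.5 * 0.8
```

Here `y_sarsa` depends on which action was sampled, whereas `y_qlearn` does not.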

@wangshusen
Owner

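Their way of writing it really is not rigorous... It can only be implemented the way they do under the assumption that the policy does not change.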
