Add map_action_inverse for fixing error of storing random action #568

Merged
merged 2 commits into thu-ml:master on Mar 12, 2022

Conversation

Jimenius
Contributor

@Jimenius Jimenius commented Mar 12, 2022

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed all items in this Pull Request below

(Issue #512) Random start in Collector samples actions from the action space, while policies output actions in a range (typically [-1, 1]) that is then mapped to the action space. The buffer only stores unmapped actions, so the randomly initialized actions are stored incorrectly when the action range is not [-1, 1]. This may influence policy learning, and particularly model learning in model-based methods.

This PR fixes it by adding an inverse operation before adding random initial actions to the buffer.
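For context, here is a minimal sketch of the idea. The names, signatures, and exact formulas below are illustrative rather than the code merged in this PR: it assumes a Box action space with `action_scaling`, where `map_action` rescales a policy output from [-1, 1] to `[low, high]`, so the inverse maps a randomly sampled environment action back to [-1, 1] before it is written to the buffer.

```python
import numpy as np

# Illustrative sketch only -- not the exact implementation merged in this PR.
# Assumes a Box action space with bounds `low` and `high`, and a policy that
# outputs actions in [-1, 1] which map_action rescales to [low, high].

def map_action(act: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Rescale a policy action from [-1, 1] to the environment range [low, high]."""
    return low + (high - low) * (act + 1.0) / 2.0

def map_action_inverse(act: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Inverse of map_action: rescale an environment action back to [-1, 1]."""
    return 2.0 * (act - low) / (high - low) - 1.0

# In the Collector's random-start branch, a sampled environment action is
# inverse-mapped before being stored, so the buffer holds actions in the
# same (unmapped) space as regular policy outputs.
low, high = np.array([0.0]), np.array([10.0])
random_env_act = np.array([7.5])                             # sampled from the action space
stored_act = map_action_inverse(random_env_act, low, high)   # -> [0.5]
assert np.allclose(map_action(stored_act, low, high), random_env_act)
```

Storing the unmapped form keeps buffer contents consistent regardless of how the action was produced, so learning algorithms that replay stored actions see them in the same range as the policy's own outputs.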

@Jimenius Jimenius changed the title Fix #512 Fix (#512) Mar 12, 2022
@Jimenius Jimenius changed the title Fix (#512) Fix #512 Mar 12, 2022
@Trinkle23897 Trinkle23897 changed the title Fix #512 Add map_action_inverse for fixing error of storing random action Mar 12, 2022
@codecov-commenter

codecov-commenter commented Mar 12, 2022

Codecov Report

Merging #568 (d7db6fa) into master (9cb74e6) will decrease coverage by 0.16%.
The diff coverage is 31.25%.

@@            Coverage Diff             @@
##           master     #568      +/-   ##
==========================================
- Coverage   93.78%   93.62%   -0.17%     
==========================================
  Files          64       64              
  Lines        4376     4391      +15     
==========================================
+ Hits         4104     4111       +7     
- Misses        272      280       +8     
Flag        Coverage Δ
unittests   93.62% <31.25%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                      Coverage Δ
tianshou/policy/base.py             75.16% <28.57%> (-4.84%) ⬇️
tianshou/data/collector.py          93.82% <50.00%> (-0.35%) ⬇️
tianshou/policy/modelfree/trpo.py   93.44% <0.00%> (+4.91%) ⬆️


@Trinkle23897 Trinkle23897 merged commit 39f8391 into thu-ml:master Mar 12, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Add map_action_inverse for fixing error of storing random action (thu-ml#568)

Development

Successfully merging this pull request may close these issues.

Why is Collector mapping randomly sampled actions using map_action?
3 participants