Add map_action_inverse for fixing error of storing random action #568

Merged
merged 2 commits into thu-ml:master on Mar 12, 2022

Conversation

Jimenius
Contributor

@Jimenius Jimenius commented Mar 12, 2022

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed all items in this Pull Request below

(Issue #512) Random start in Collector samples actions from the action space, while policies output actions in a range (typically [-1, 1]) that is then mapped to the action space. The buffer only stores unmapped actions, so the randomly initialized actions are stored incorrectly when the action range is not [-1, 1]. This may influence policy learning, and particularly model learning in model-based methods.

This PR fixes it by adding an inverse operation before adding random initial actions to the buffer.
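For context, here is a minimal sketch of the idea. The names, signatures, and exact formulas below are illustrative rather than the code merged in this PR: it assumes a Box action space with `action_scaling`, where `map_action` rescales a policy output from [-1, 1] to `[low, high]`, so the inverse maps a randomly sampled environment action back to [-1, 1] before it is written to the buffer.

```python
import numpy as np

# Illustrative sketch only -- not the exact implementation merged in this PR.
# Assumes a Box action space with bounds `low` and `high`, and a policy that
# outputs actions in [-1, 1] which map_action rescales to [low, high].

def map_action(act: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Rescale a policy action from [-1, 1] to the environment range [low, high]."""
    return low + (high - low) * (act + 1.0) / 2.0

def map_action_inverse(act: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Inverse of map_action: rescale an environment action back to [-1, 1]."""
    return 2.0 * (act - low) / (high - low) - 1.0

# In the Collector's random-start branch, a sampled environment action is
# inverse-mapped before being stored, so the buffer holds actions in the
# same (unmapped) space as regular policy outputs.
low, high = np.array([0.0]), np.array([10.0])
random_env_act = np.array([7.5])                             # sampled from the action space
stored_act = map_action_inverse(random_env_act, low, high)   # -> [0.5]
assert np.allclose(map_action(stored_act, low, high), random_env_act)
```

Storing the unmapped form keeps buffer contents consistent regardless of how the action was produced, so learning algorithms that replay stored actions see them in the same range as the policy's own outputs.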

@Jimenius Jimenius changed the title Fix #512 Fix (#512) Mar 12, 2022
@Jimenius Jimenius changed the title Fix (#512) Fix #512 Mar 12, 2022
@Trinkle23897 Trinkle23897 changed the title Fix #512 Add map_action_inverse for fixing error of storing random action Mar 12, 2022
@codecov-commenter

codecov-commenter commented Mar 12, 2022

Codecov Report

Merging #568 (d7db6fa) into master (9cb74e6) will decrease coverage by 0.16%.
The diff coverage is 31.25%.

@@            Coverage Diff             @@
##           master     #568      +/-   ##
==========================================
- Coverage   93.78%   93.62%   -0.17%     
==========================================
  Files          64       64              
  Lines        4376     4391      +15     
==========================================
+ Hits         4104     4111       +7     
- Misses        272      280       +8     
Flag        Coverage Δ
unittests   93.62% <31.25%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                      Coverage Δ
tianshou/policy/base.py             75.16% <28.57%> (-4.84%) ⬇️
tianshou/data/collector.py          93.82% <50.00%> (-0.35%) ⬇️
tianshou/policy/modelfree/trpo.py   93.44% <0.00%> (+4.91%) ⬆️


@Trinkle23897 Trinkle23897 merged commit 39f8391 into thu-ml:master Mar 12, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Add map_action_inverse for fixing error of storing random action (thu-ml#568)

Development

Successfully merging this pull request may close these issues.

Why is Collector mapping randomly sampled actions using map_action?
3 participants