Support Actor preprocessing network reuse for continuous case, fixes in DQN network #1128
… the preprocessing network to be applied to the observations only (without the actions concatenated), which is essential for the case where we want to reuse the actor's preprocessing network
preprocessing network for the critic (must be applied before concatenating the actions)
…s, replay_buffer_stack_num
Codecov Report
Attention: Patch coverage is …
@@ Coverage Diff @@
## master #1128 +/- ##
==========================================
- Coverage 86.36% 86.07% -0.29%
==========================================
Files 102 103 +1
Lines 8579 8632 +53
==========================================
+ Hits 7409 7430 +21
- Misses 1170 1202 +32
poe format
poe lint and poe type-check
poe test (or a subset of them with poe test-reduced), and they pass
poe doc-build
This PR fixes a bug in DQN and lifts a limitation in reusing the actor's preprocessing network for continuous environments.
atari_network.DQN:
continuous.Critic: apply_preprocess_net_to_obs_only to allow the preprocessing network to be applied to the observations only (without the actions concatenated), which is essential for the case where we want to reuse the actor's preprocessing network
… preprocessing network for the critic (must be applied before concatenating the actions)
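
To illustrate why the order matters, here is a minimal toy sketch in plain Python. It is not the library's code: `make_linear` and `ToyCritic` are hypothetical names, and only the flag name `apply_preprocess_net_to_obs_only` is taken from this PR. The sketch assumes the actor's preprocessing network accepts inputs of observation size only, so it can be reused by the critic only if the actions are concatenated after it has been applied.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def make_linear(weights: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Return a toy 'network': a fixed linear map that validates its input size."""
    in_dim = len(weights[0])

    def forward(x: Vector) -> Vector:
        if len(x) != in_dim:
            raise ValueError(f"expected input of size {in_dim}, got {len(x)}")
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return forward


class ToyCritic:
    """Hypothetical stand-in for a continuous critic Q(obs, act)."""

    def __init__(
        self,
        preprocess_net: Callable[[Vector], Vector],
        last: Callable[[Vector], Vector],
        apply_preprocess_net_to_obs_only: bool = False,
    ) -> None:
        self.preprocess_net = preprocess_net
        self.last = last
        self.apply_preprocess_net_to_obs_only = apply_preprocess_net_to_obs_only

    def __call__(self, obs: Vector, act: Vector) -> float:
        if self.apply_preprocess_net_to_obs_only:
            # Preprocess the observation alone, then append the action.
            features = self.preprocess_net(obs) + act
        else:
            # Append the action first, then preprocess the combined vector.
            features = self.preprocess_net(obs + act)
        return self.last(features)[0]


# Assumed shapes: 3-dim observation, 1-dim action, 2 preprocessed features.
actor_preprocess = make_linear([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])  # obs -> features
q_head = make_linear([[1.0, 1.0, 1.0]])  # (features + action) -> scalar

reusing_critic = ToyCritic(
    actor_preprocess, q_head, apply_preprocess_net_to_obs_only=True
)
q_value = reusing_critic([1.0, 2.0, 3.0], [0.5])

# Without the flag, the actor's network receives obs + act (size 4) and rejects it.
naive_critic = ToyCritic(actor_preprocess, q_head)
try:
    naive_critic([1.0, 2.0, 3.0], [0.5])
    naive_failed = False
except ValueError:
    naive_failed = True
```

In this toy setup the critic with the flag set evaluates normally, while the one without it fails on the input-size check, which mirrors the constraint stated above that the shared network must be applied before the actions are concatenated.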