Migrate obs/reward normalization from env.wrappers into Agent itself #205

Open
zuoxingdong opened this issue Jun 5, 2020 · 0 comments

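  • Store the online normalization statistics inside the module itself, as nn.Parameter or as registered buffers, so that the module tracks them and they are saved in its state_dict.

    • This mirrors how BatchNorm is implemented in PyTorch, which keeps running_mean and running_var as registered buffers.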
  • Behave differently in train and eval modes:

    • Train mode: update the statistics, then normalize.
    • Eval mode: normalize with the current statistics, without updating them further.
  • Unit test: before replacing the wrappers, benchmark the new implementation against the old behavior.

    • MuJoCo environments: HalfCheetah, Hopper, Walker
    • Seeds: 10 seeds
    • Confirm that the change has no significant effect on performance
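
A minimal sketch of what such a module could look like, to make the train/eval behavior above concrete. The class name `RunningNorm`, its arguments (`shape`, `eps`, `clip`) and the choice of running mean/variance as the statistics are assumptions for illustration, not existing code in this repository. It uses `register_buffer` rather than `nn.Parameter`, since that is how BatchNorm stores its running statistics: buffers are saved in `state_dict` and follow `.to(device)`, but are never touched by the optimizer.

```python
import torch
import torch.nn as nn


class RunningNorm(nn.Module):
    """Hypothetical online normalizer (name and signature are assumptions)."""

    def __init__(self, shape, eps=1e-8, clip=10.0):
        super().__init__()
        self.eps = eps
        self.clip = clip
        # Buffers: part of state_dict, moved with the module, not optimized.
        self.register_buffer("mean", torch.zeros(shape))
        self.register_buffer("var", torch.ones(shape))
        self.register_buffer("count", torch.zeros(()))

    @torch.no_grad()
    def update(self, x):
        # Merge the batch moments into the running moments
        # (parallel mean/variance combination).
        x = x.reshape(-1, *self.mean.shape)
        batch_count = x.shape[0]
        batch_mean = x.mean(dim=0)
        batch_var = x.var(dim=0, unbiased=False)

        total = self.count + batch_count
        delta = batch_mean - self.mean
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count
              + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)

        self.mean.copy_(new_mean)
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, x):
        # Train mode: update statistics first. Eval mode: leave them frozen.
        if self.training:
            self.update(x)
        out = (x - self.mean) / torch.sqrt(self.var + self.eps)
        return out.clamp(-self.clip, self.clip)
```

Used as a submodule of the agent, `agent.train()` and `agent.eval()` would then switch the normalizer's behavior automatically, and the statistics would be checkpointed together with the network weights.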