Migrate obs/reward normalization from env.wrappers into Agent itself #205

zuoxingdong · 2020-06-05T13:33:23Z

Make the online statistics an nn.Parameter registered inside the module, so they are tracked together with the rest of the module state.
- Similar in style to how BatchNorm is implemented in PyTorch
Behavior differs between train and eval modes.
- Train mode: update statistics
- Eval mode: use current statistics without further updating
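A minimal sketch of what this could look like, not the actual implementation. The class name `RunningNorm`, its attributes, and the `eps` default are hypothetical; the statistics are stored as non-trainable `nn.Parameter`s as described above, and batches are merged with the standard parallel mean/variance update.

```python
import torch
import torch.nn as nn


class RunningNorm(nn.Module):
    """Hypothetical module: normalizes inputs with online mean/variance."""

    def __init__(self, shape, eps=1e-8):
        super().__init__()
        self.eps = eps  # assumed small constant to avoid division by zero
        # Statistics are registered in the module as non-trainable parameters,
        # so they appear in state_dict() and move with .to(device).
        self.mean = nn.Parameter(torch.zeros(shape), requires_grad=False)
        self.var = nn.Parameter(torch.ones(shape), requires_grad=False)
        self.count = nn.Parameter(torch.zeros(()), requires_grad=False)

    @torch.no_grad()
    def _update(self, x):
        # Merge the batch statistics into the running statistics.
        batch_count = x.shape[0]
        batch_mean = x.mean(dim=0)
        batch_var = x.var(dim=0, unbiased=False)
        total = self.count + batch_count
        delta = batch_mean - self.mean
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta ** 2 * self.count * batch_count / total
        )
        self.mean.copy_(new_mean)
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, x):
        # Train mode: update statistics. Eval mode: use them as-is.
        if self.training:
            self._update(x)
        return (x - self.mean) / torch.sqrt(self.var + self.eps)
```

Calling `.train()` or `.eval()` on the agent then switches the update on or off for every such submodule. For comparison, PyTorch's BatchNorm keeps `running_mean` and `running_var` as buffers rather than parameters; either choice puts them in `state_dict()`.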
Unit test: before replacing the wrappers, benchmark against the old behavior
- MuJoCo environments: HalfCheetah, Hopper, Walker
- Seeds: 10
- Confirm no significant effect on performance


zuoxingdong added design refactor labels Jun 5, 2020

zuoxingdong self-assigned this Jun 5, 2020
