Why Normalization of vf #6

im-Kitsch · 2022-06-15T10:48:41Z

Hello,

Thanks for the code. While re-implementing the program, I found a step that normalizes the value function vf here. It is implemented as v_predict = v(s; \theta) * (1 - \gamma), and the critic update is implemented as
min_\theta [v(s; \theta) * (1 - \gamma) - v_estimate]^2.

Is there any reason to normalize the value function's output? I tested removing the normalization term and rescaling the learning rate (by 1 - \gamma), and there seems to be no problem on HalfCheetah-v2.

Performance is similar to the original version.

Best,


xbpeng · 2022-06-15T23:50:17Z

The value scaling is mainly a convention; I generally like to keep things normalized between 0 and 1. Training should work just as well without the normalization, but it might need some tuning of the other hyperparameters, such as the step size.
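A minimal sketch of why the (1 - \gamma) factor keeps values between 0 and 1. It is an illustration, not the repository's code. It assumes per-step rewards lie in [0, 1], which the thread does not state; the helper names `discounted_return` and `normalize_value` are hypothetical. Under that assumption the discounted return is at most 1 / (1 - \gamma), so multiplying by (1 - \gamma) maps it into [0, 1].

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t, accumulated backwards over the reward sequence."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def normalize_value(value, gamma):
    """Scale a discounted return by (1 - gamma), the factor discussed in the thread."""
    return value * (1.0 - gamma)


gamma = 0.99
# Assumed worst case: the maximum reward of 1.0 at every step.
rewards = [1.0] * 1000

raw = discounted_return(rewards, gamma)   # approaches 1 / (1 - gamma) = 100
norm = normalize_value(raw, gamma)        # stays within [0, 1]

print(f"raw return: {raw:.4f}, normalized: {norm:.6f}")
```

Without the scaling, the critic's regression targets in this example are on the order of 100 rather than 1. That may be why removing the normalization can call for a different step size.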

im-Kitsch changed the title ~~Normalization of vf~~ Why Normalization of vf Jun 15, 2022
