a bug about critic update #3

EvergrowHook · 2023-04-04T08:31:33Z

Hello,

Your work is really helpful and I appreciate it. However, I found a bug in the critic update stage that affects the final performance.

It is in TD3HUG.py at L80-L81:

noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(0, 1)
a_ = (self.actor_target(bs_).detach() + noise1).clamp(0, 1)

I think the first argument to clamp should be the actual lower limit rather than 0 (with a lower bound of 0, every negative noise sample is clipped to 0), and I think noise1 should be clipped with the NOISE_CLIP hyperparameter.

I changed these two lines to:

noise1 = (torch.randn_like(ba) * self.policy_noise).clamp(-self.noise_clip, self.noise_clip)  # self.noise_clip refers to the NOISE_CLIP hyperparameter
a_ = (self.actor_target(bs_).detach() + noise1).clamp(-1, 1)
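To illustrate the difference, here is a minimal standalone sketch of target policy smoothing with symmetric noise clipping. It is not the repository's code: the function name `smoothed_target_action`, its parameters, and the default values 0.2 and 0.5 are assumptions for illustration only. The action bounds are left configurable, so an action range of [0, 1] can still be used while the noise stays zero-mean.

```python
import torch


def smoothed_target_action(target_action, policy_noise=0.2, noise_clip=0.5,
                           action_low=-1.0, action_high=1.0):
    """Add clipped zero-mean noise to a target action, then clamp to the action range.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Clip the noise symmetrically so positive and negative samples both survive.
    noise = (torch.randn_like(target_action) * policy_noise).clamp(-noise_clip, noise_clip)
    # Clamp the perturbed action to the valid action range.
    return (target_action + noise).clamp(action_low, action_high)


torch.manual_seed(0)
mu = torch.full((10000, 1), 0.5)

# Noise clamped to [0, 1]: negative samples are all replaced by 0.
old_noise = (torch.randn_like(mu) * 0.2).clamp(0, 1)
# Noise clamped to [-0.5, 0.5]: the distribution stays centered on 0.
new_noise = (torch.randn_like(mu) * 0.2).clamp(-0.5, 0.5)

print("min of old noise:", old_noise.min().item())
print("mean of old noise:", old_noise.mean().item())
print("mean of new noise:", new_noise.mean().item())
```

With the one-sided clamp the noise is never negative, so its mean is shifted above zero and the target action is pushed upward on average. If the environment's actions lie in [0, 1], passing `action_low=0.0` keeps the final clamp at 0 without changing the symmetric noise clip.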

There might also be clamp bugs elsewhere, but I didn't check.

I would appreciate it if you could look into this.

Monnalo · 2023-04-21T09:21:50Z

Hello, the author's paper mentions that the one-dimensional action ranges from 0 to 1, where 0 to 0.5 represents a left turn and 0.5 to 1 represents a right turn. Therefore, I think it is okay to set the lower bound in clamp to 0.
