Loss of A2C algorithm #21

yingchengyang · 2020-04-06T06:17:28Z

I'd like to ask about the loss of the A2C algorithm. There are three parts to the loss in tianshou's code:
loss = a_loss + self._w_vf * vf_loss - self._w_ent * ent_loss
What does the third part of the loss mean? I ran this on CartPole-v0 and it converged after 1355 steps. Then I ran CartPole-v0 with only the first and second parts of the loss, and it converged after 770 steps. It seems that using only the first and second parts is enough?

Trinkle23897 · 2020-04-06T06:20:03Z

The third term is the policy entropy loss. Other repos, such as OpenAI Baselines, include this term in their A2C/A3C implementations, so I added the entropy loss here.
Of course, you can set the weight of the entropy loss to zero :)
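
For illustration, here is a minimal single-sample sketch of how the three terms combine. It is not tianshou's actual implementation: it assumes `ent_loss` holds the policy entropy (so subtracting it rewards a more spread-out, exploratory policy), and the function names and the default weights `w_vf=0.5` and `w_ent=0.01` are hypothetical values chosen only for the example.

```python
import math


def softmax(logits):
    """Turn raw action scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a discrete action distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def a2c_loss(logits, action, advantage, value, ret, w_vf=0.5, w_ent=0.01):
    """Single-sample version of: a_loss + w_vf * vf_loss - w_ent * ent_loss.

    w_vf and w_ent mirror self._w_vf and self._w_ent from the issue;
    their default values here are assumptions, not tianshou's defaults.
    """
    probs = softmax(logits)
    a_loss = -math.log(probs[action]) * advantage  # policy-gradient term
    vf_loss = (ret - value) ** 2                   # value-function error
    ent_loss = entropy(probs)                      # policy entropy (assumed meaning)
    return a_loss + w_vf * vf_loss - w_ent * ent_loss


# A uniform policy has higher entropy than a nearly deterministic one.
uniform_entropy = entropy(softmax([0.0, 0.0]))
peaked_entropy = entropy(softmax([5.0, 0.0]))

# Setting w_ent=0 drops the third term, as suggested above.
loss_without_entropy = a2c_loss([0.0, 0.0], 0, 1.0, 0.0, 1.0, w_ent=0.0)
loss_with_entropy = a2c_loss([0.0, 0.0], 0, 1.0, 0.0, 1.0, w_ent=0.01)
```

Because entropy is non-negative, the third term can only lower the loss for a less deterministic policy. On a simple task such as CartPole-v0 that extra push toward exploration may slow convergence, which would be consistent with the step counts reported above, though a single run is not conclusive.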

yingchengyang · 2020-04-06T07:41:11Z

Thanks!

Trinkle23897 closed this as completed Apr 6, 2020

Trinkle23897 added a commit that referenced this issue Apr 6, 2020

add policy docs (#21)

e0809ff

Trinkle23897 added this to Usage in Issue/PR Categories Apr 10, 2020

Trinkle23897 added the question label May 5, 2020

BFAnas pushed a commit to BFAnas/tianshou that referenced this issue May 5, 2024

add policy docs (thu-ml#21)

357642a
