
[PPO - early-stop / green penalty] Value function coeff c1 = 2.0, entropy coeff c2 = 0.08 #63

Open · wants to merge 1 commit into base branch ppo-nm-early-stop-green-penalty

Conversation

xeviknal (Owner)

Playing around with the PPO hyperparameters, we've realized that increasing the value-function coefficient from 1.0 to 2.0 boosts training.

The first part of the training (orange) uses c1 = 1.0, whereas the second part uses c1 = 2.0.

[Plot: c1-10-half-c1-20-half]

However, doing a whole training run with c1 = 2.0 and c2 = 0.08 shows that the model ends up overfitting, with the entropy decreasing radically.

[Plot: c1-2-c2-0.08]
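For context, in the standard PPO objective these two coefficients weight the value-function loss and the entropy bonus against the clipped surrogate objective: L = L_CLIP - c1 * L_VF + c2 * S. A minimal sketch of how they combine into the loss that is actually minimized (a hypothetical helper for illustration, not this repo's code; the argument names are assumptions):

```python
def ppo_total_loss(policy_objective, value_loss, entropy, c1=2.0, c2=0.08):
    """Combined PPO loss to minimize.

    policy_objective: clipped surrogate objective L_CLIP (to be maximized)
    value_loss:       value-function error L_VF (to be minimized), weighted by c1
    entropy:          policy entropy S (to be maximized, encourages exploration),
                      weighted by c2

    Negating the objective turns it into a loss for gradient descent.
    """
    return -policy_objective + c1 * value_loss - c2 * entropy
```

With c1 = 2.0 the value-function error contributes twice as strongly to the gradient as with c1 = 1.0, which is consistent with the faster training observed above; a c2 that is too small (or an entropy that shrinks anyway) removes the pressure to keep exploring, matching the entropy collapse in the second plot.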

The early-stop wrapper makes it easy for the agent to learn to drive fast and straight ahead. However, after learning this, it is not able to learn how to drive through curves.
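The early-stop idea mentioned above can be sketched as a simple counter that terminates the episode once the car has spent too many consecutive steps off the track (a hypothetical sketch under the assumption that a per-step on-grass signal is available; the class and parameter names are not from this repo):

```python
class EarlyStopCounter:
    """Terminate an episode after `patience` consecutive off-track steps."""

    def __init__(self, patience=50):
        self.patience = patience
        self.count = 0  # consecutive steps spent on grass

    def step(self, on_grass):
        # Reset the streak whenever the car is back on the track.
        self.count = self.count + 1 if on_grass else 0
        # True means the wrapper should end the episode early.
        return self.count >= self.patience
```

The trade-off discussed in the comment follows from this design: cutting episodes short when the car leaves the track strongly rewards staying on the straight, but it also truncates exactly the experience (recovering through curves) the agent would need to learn cornering.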
