
[RL-baseline] Model v3, experiment #3 #36

Open · wants to merge 3 commits into base: RL-baseline-v3
Conversation

ziritrion (Collaborator) commented:

The policy network is almost identical to v2, but we removed the final fully connected layer (FCL) from the main network and added a new one to each of the actor and the critic, so that both heads have more capacity to learn.
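For reference, here is a minimal sketch of the layout described above: a shared trunk with its final FCL removed, and a new FCL added to each of the actor and critic heads. The layer sizes, input shape, and conv stack are illustrative assumptions, not the actual v3 hyperparameters:

```python
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared trunk; actor and critic each get their own extra FCL.
    All sizes below are assumptions, not the real v3 config."""

    def __init__(self, n_actions=5):
        super().__init__()
        # Shared trunk: conv stack only, no final FCL (removed in v3)
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * 9 * 9  # flattened size for assumed 84x84 inputs
        # Each head gets its own new FCL before its output layer
        self.actor = nn.Sequential(
            nn.Linear(feat, 256), nn.ReLU(),
            nn.Linear(256, n_actions),  # action logits
        )
        self.critic = nn.Sequential(
            nn.Linear(feat, 256), nn.ReLU(),
            nn.Linear(256, 1),  # state value
        )

    def forward(self, x):
        h = self.trunk(x)
        return self.actor(h), self.critic(h)
```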

We also tweaked the actions compared to the v2 experiments. The main difference is a new no-move action, which lets the network exploit the car's momentum. Here is the action list for this experiment (a sketch of how these map onto the environment follows the list):

```python
[0.0, 0.0, 0.0],   # no action
[0.0, 0.8, 0.0],   # throttle
[0.0, 0.0, 0.6],   # brake
[-0.9, 0.0, 0.0],  # left
[0.9, 0.0, 0.0],   # right
```
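As a sketch of how a discrete action index could be translated into CarRacing's continuous `[steer, gas, brake]` action vector: the `ACTIONS` table below copies the list above, but the wrapper itself is a hypothetical illustration, not necessarily how this repo applies the actions:

```python
import numpy as np
import gym

# [steer, gas, brake] triplets from the list above
ACTIONS = np.array([
    [0.0, 0.0, 0.0],   # no action (coast on momentum)
    [0.0, 0.8, 0.0],   # throttle
    [0.0, 0.0, 0.6],   # brake
    [-0.9, 0.0, 0.0],  # left
    [0.9, 0.0, 0.0],   # right
], dtype=np.float32)

class DiscreteCarActions(gym.ActionWrapper):
    """Hypothetical wrapper: exposes the 5 discrete actions to the agent
    and maps each index to the continuous CarRacing action vector."""

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(ACTIONS))

    def action(self, idx):
        return ACTIONS[idx]

# Usage: env = DiscreteCarActions(gym.make("CarRacing-v0"))
```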

Around the 10k-episode mark, the entropy collapsed and never recovered. The running reward managed to stay positive until just past the 15k mark, where it finally collapsed to very low values. Results are below:

[Screenshots: training curves for entropy and running reward]

Sample video below:
https://user-images.githubusercontent.com/1465235/112736341-4b660700-8f52-11eb-9d19-a1b73e30ec89.mp4
