
[RL-baseline] Model v3, experiment #2 #35

Open · wants to merge 3 commits into base: RL-baseline-v3

Conversation

ziritrion (Collaborator)

The policy network is almost identical to v2, but we removed the final FCL from the main (shared) network and added a new fully connected layer to each of the actor and critic heads to give them more capacity to learn.
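For reference, the head change can be sketched roughly like this. This is a minimal PyTorch sketch under assumptions not stated in the PR (conv trunk sizes, 84x84 4-frame inputs, 256-unit heads are all hypothetical); the point is only that the last FC layer now lives in each head rather than in the shared trunk.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared conv trunk; actor and critic each get their own FC layer."""
    def __init__(self, n_actions: int = 9):
        super().__init__()
        # Hypothetical trunk; the real v3 layer sizes are not shown in this PR.
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * 9 * 9  # conv output size for 84x84 inputs with the strides above
        # The final FCL was moved out of the trunk and into each head,
        # so actor and critic no longer share their last hidden layer.
        self.actor = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(),
                                   nn.Linear(256, n_actions))
        self.critic = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(),
                                    nn.Linear(256, 1))

    def forward(self, x):
        h = self.trunk(x)
        return torch.softmax(self.actor(h), dim=-1), self.critic(h)
```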

We also tweaked the actions compared to the v2 experiments. The main difference is a new no-move action that lets the network exploit the car's momentum. Here is the action list for this experiment:

```
[ 0.0, 0.0, 0.0], # no action
[ 0.0, 0.8, 0.0], # throttle
[ 0.0, 0.0, 0.6], # brake
[-0.9, 0.0, 0.0], # hard left
[-0.5, 0.0, 0.0], # medium left
[-0.2, 0.0, 0.0], # soft left
[ 0.9, 0.0, 0.0], # hard right
[ 0.5, 0.0, 0.0], # medium right
[ 0.2, 0.0, 0.0], # soft right
```
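The table above is a discretization of CarRacing's continuous [steering, gas, brake] action space; a small sketch of the lookup (the helper name `to_continuous` is hypothetical, not from this repo):

```python
import numpy as np

# The discrete action table from this PR, one row per action:
# [steering in [-1, 1], gas in [0, 1], brake in [0, 1]].
ACTIONS = np.array([
    [ 0.0, 0.0, 0.0],  # no action (coast on momentum)
    [ 0.0, 0.8, 0.0],  # throttle
    [ 0.0, 0.0, 0.6],  # brake
    [-0.9, 0.0, 0.0],  # hard left
    [-0.5, 0.0, 0.0],  # medium left
    [-0.2, 0.0, 0.0],  # soft left
    [ 0.9, 0.0, 0.0],  # hard right
    [ 0.5, 0.0, 0.0],  # medium right
    [ 0.2, 0.0, 0.0],  # soft right
], dtype=np.float32)

def to_continuous(action_idx: int) -> np.ndarray:
    """Map the policy's discrete output index to the env's continuous action."""
    return ACTIONS[action_idx]
```

The policy then only has to pick one of 9 indices per step, and the chosen row is passed to `env.step` as the continuous control vector.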

Around the 15k-episode mark, the entropy collapsed and never recovered; the running reward for all subsequent episodes was negative. The final running reward is -34, with a max reward of 512 around step 9k.

(Training plots attached: Notification_Center)

Sample video below. Similar results to experiment #1: the car drives forward but never turns.
https://user-images.githubusercontent.com/1465235/112730896-6de92780-8f34-11eb-9438-c7fe699a179c.mp4
