Alter the Reward function used in learn.py #6
Hello @amijeet, these 2 scripts:
And these 2 classes: This is a much-simplified take-off and hover scenario with a 2-D observation space (z and the velocity in z) and a 1-D action space (the same RPM for all motors). The reward is 1 for z between 0.75 and 0.99 and 0 otherwise. In this example, running
Output:
Of course, more complicated tasks, using higher-dimensional observation and action vectors, can require:
as well as much longer training times. E.g., simply making the input 4-D complicates the problem enough that PPO only collects 1/5 of the reward in 15x the number of iterations:
I don't have all the answers; the purpose of this gym is exactly to try (and let others try) these things.
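For reference, the binary hover reward described above (1 when z is between 0.75 and 0.99, 0 otherwise) can be sketched as follows; the function name and signature here are illustrative, not the actual code in this repository:

```python
def hover_reward(z: float) -> float:
    """Binary reward for the simplified take-off and hover task.

    Returns 1 when the drone's altitude z is inside the target band
    [0.75, 0.99], and 0 everywhere else.
    """
    return 1.0 if 0.75 <= z <= 0.99 else 0.0
```

A sparse reward like this is easy to specify but gives the agent no gradient outside the target band, which is one reason harder variants of the task need shaping or longer training.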
Hey @JacopoPan, forgive me if this is a naive question, as I am still relatively new to reinforcement learning and your library. I just ran singleagent.py (from the most recent commit) on takeoff, and I noticed that my model seems to train far slower than the one you showed. It seems you were able to break the mean-reward threshold after 40000 timesteps, while I am stuck at around -30 after 120000. Do you know why this may be happening, and do you have any suggestions on how to speed up the training? Thanks! Here is the output for reference:
@ArminBaz the reward function in the latest commit is not the same as when I wrote the message above.
(-30 over the episode should be ok, as there are negative rewards for any point except the desired hover one)
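A reward of that shape (negative everywhere except at the desired hover point) might look like the sketch below; the distance-based form and the target altitude are assumptions for illustration, not the exact function in the commit:

```python
def shaped_hover_reward(z: float, target_z: float = 1.0) -> float:
    """Shaped reward that penalizes distance from the hover target.

    The reward is 0 only at the target altitude and grows more
    negative the further the drone is from it, so a per-episode
    return around -30 is plausible while the policy is still learning.
    """
    return -abs(target_z - z)
```

Unlike the binary reward, this dense signal tells the agent at every step which direction improves things, at the cost of hand-picking the shaping.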
@JacopoPan That makes a lot of sense, thank you for getting back so quickly!
Hi Jacopo, could you please share with us the reward function you used in learn.py? Also, could you suggest how I can alter the reward function already used in learn.py?
And how much time does it take to train and reach satisfactory results in learn.py?
Best wishes.