Investigate occasional NaN in TRPO #26
@michaelschaarschmidt w.r.t. your comment in Gitter:
I'm not seeing it fail gracefully when this is hit. It seems as though a NaN or other unstable update may be making its way through the graph update, as I see my agent behavior change significantly whenever this is encountered.
Here are some results from my custom environment (doing nothing yields zero reward), which caps episodes at 100 steps. The agent essentially stops acting after encountering this:

Finished episode 2060 after 26 timesteps (reward: 2.13)
So the easiest hack that works for now is to just check in the code whether shs < 0 and skip the update in that case.

Another thing I noticed in continuous state spaces is that the standard deviation of the Gaussian (exploration) noise is not parameterized. That seems like a bad default for this kind of on-policy method. It's an easy fix, since the required code is already there; see the sketch below.
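A minimal sketch of what a parameterized exploration noise could look like, assuming a TF1-style graph; the names `hidden`, `action_mean`, and `action_log_std` are illustrative, not TensorForce's actual identifiers:

```python
import tensorflow as tf

# Assumed setup: `hidden` stands in for the last hidden layer of the
# policy network; `action_size` is the action dimensionality.
action_size = 2
hidden = tf.placeholder(tf.float32, [None, 64])
mean = tf.layers.dense(hidden, action_size, name="action_mean")

# Trainable, state-independent log standard deviation: the policy can
# now shrink or grow its exploration noise during training, instead of
# using a fixed constant.
log_std = tf.get_variable("action_log_std", shape=[action_size],
                          initializer=tf.zeros_initializer())
std = tf.exp(log_std)
```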
So I have a hard time reliably reproducing this (saw it once in 20 runs on Python 3.6, never on 2.7), which makes it difficult to debug. Skipping the update when shs < 0 now in any case.
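For reference, a sketch of where the guard fits, under the standard TRPO step-size computation (not TensorForce's exact code): the step size is derived from shs = ½ sᵀFs, and sqrt(shs / max_kl) yields NaN whenever conjugate-gradient numerical error drives shs negative.

```python
import numpy as np

def conjugate_gradient(fvp, b, iters=10, tol=1e-10):
    # Approximately solve F x = b, where fvp(v) computes the
    # Fisher-vector product F v.
    x = np.zeros_like(b)
    r = b.copy()
    p = b.copy()
    rs_old = r.dot(r)
    for _ in range(iters):
        Ap = fvp(p)
        alpha = rs_old / (p.dot(Ap) + 1e-8)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r.dot(r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def trpo_step(grad, fvp, max_kl=1e-2):
    step_dir = conjugate_gradient(fvp, grad)
    # shs should be positive if F is positive definite, but numerical
    # error in conjugate gradient can make it negative or non-finite.
    shs = 0.5 * step_dir.dot(fvp(step_dir))
    if not np.isfinite(shs) or shs <= 0:
        return None  # skip this update rather than propagate NaN
    lagrange_multiplier = np.sqrt(shs / max_kl)
    return step_dir / lagrange_multiplier
```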
TRPO occasionally fails to produce a robust update, with the Lagrange multiplier being None; need to check whether the gradient computation can produce None.
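One way to surface this, as a sketch: tf.gradients returns None for any variable that is not connected to the loss, which then silently breaks downstream arithmetic. A defensive check (the helper name is hypothetical) could look like:

```python
import tensorflow as tf

def check_gradients(loss, variables):
    # tf.gradients yields None entries for variables with no path to
    # the loss; fail loudly instead of letting None reach the update.
    grads = tf.gradients(loss, variables)
    for var, grad in zip(variables, grads):
        if grad is None:
            raise ValueError("no gradient for variable %s" % var.name)
    return grads
```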