Several ways to crash LBFGS++ #23
Comments
Hi @mpayrits, would you be so kind as to share your modified version of this library in some accessible repo/fork? I'm having trouble getting this library to produce results that are (more or less) on par with Python's SciPy implementation. I would like to see whether your implementation improves things. Thanks!
I sincerely thank everyone for the comments and suggestions to make LBFGS++ a better library. One thing I have to apologize for is that I'm quite busy these days, and it may take some time before I have a continuous block of time to read the details. I will keep this in mind. Thanks!
@izsahara, you can find a version with the first two issues from the list in the original issue resolved here. There are literally four lines of difference from upstream, which doesn't quite feel like a PR, but I could still open one if it saves you any effort, @yixuan. The other issues are more involved and would need some time. Speaking of time, no problem, fully understood: take your time, @yixuan, and I hope the content of this issue is helpful to you when you get round to it. By the way, I've found that fixing the first issue also resolves #9. Unfortunately, it does not resolve #15, and I don't think the third item will either. Perhaps the fourth will, or there's something else going on there. It would also be interesting to see whether #14 is resolved.
Replace LineSearchBacktracking (which is also described as "Mainly for internal use" in its source file) with LineSearchNocedalWright (see yixuan/LBFGSpp#23). Fix small bug, reduce required precision.
I've dug into the paper a little deeper, and now I feel more confident answering some of the questions raised by @mpayrits. For the cubic interpolation/extrapolation, yes, I did it wrong in the previous implementation. I have massively rewritten the line search scheme in 9985fd0, and it seems to work well on the CUTEst examples I tested. Of course, more feedback is greatly welcome. Handling the exceptions definitely requires more design work; it can be left for the future, I suppose.
Hi @yixuan,
Here is the first step toward testing LBFGS++ on a (large) collection of CUTEst problems.
This commit has removed the major runtime exceptions raised when the maximum number of line searches is reached, which should resolve most of the common issues. Some exceptions remain in the code, mostly for "serious" issues from which it is hard to proceed. I think we can tentatively close this issue.
Hi,
First off, I want to say that this is just a great library and a joy to use. Clean header-only modern C++ with Eigen, what's not to like? :)
So, I've been using LBFGS++ in a private project and got to stress-test it a bit. I've come across quite a few situations that I believe should be considered completely valid but that currently cause a `std::logic_error` to be thrown. I'd like to report on these and suggest a few improvements.

Firstly, it's very easy to run into an ascending search direction by minimally modifying the examples. For instance, I've encountered one by modifying `example-rosenbrock.cpp` to have a non-zero starting-value vector `startVals` with `startVals[n] = 1.0F` and `startVals[n + 1] = 6.0F`. The culprit is in the update procedure. BFGS and friends assume that the Hessian approximation is positive definite. This can only be maintained if $y_k \cdot s_k$ is positive at each step (for notation I follow two Wikipedia articles and these lecture notes; the notation appears to be standard). This is automatically true for strictly convex functions with arbitrary line searches, or for arbitrary functions with line searches that enforce the (strong) Wolfe conditions. `example-rosenbrock.cpp` features the non-convex Rosenbrock function and the default Armijo backtracking line search, which can easily produce $y_k \cdot s_k < 0$. That in turn can, and frequently does, lead to a non-positive-definite (inverse) Hessian estimate, which in turn can, and frequently does, yield an ascending search direction.

For starters, I think the default line search should be changed from
`LineSearchBacktracking` (which is also described as "Mainly for internal use" in its source file) to `LineSearchNocedalWright` from #6, which respects the Wolfe conditions. That should make the examples less prone to crashing after simple modifications. By the way, using `LineSearchNocedalWright` also appears to resolve #21.

One can, however, keep meaningfully optimizing after encountering
$y_k \cdot s_k < 0$ even without enforcing the Wolfe conditions. The canonical trick, according to the very end of Chapter 8 in the previously referenced notes, is to simply restart the optimization from the current point, i.e., throw away all of your Hessian information (the `y` and `s` vectors) and start over. The first BFGS step is always just gradient descent, so this should avoid the ascending search direction.

I believe $y_k \cdot s_k < 0$ can occur with non-convex functions even when trying to enforce the Wolfe conditions, if the problem has bounds. As a simple example, imagine a 1D concave function defined over a bounded interval.

For consistency, non-convex functions should be prohibited either when using line searches that don't enforce the Wolfe conditions or when there are bounds; alternatively, the restarting logic should be added to `BFGSMat`. The latter is much more attractive from a user's point of view, but it's probably not a one-liner :)

Even with all this in place, exceptional situations, which currently cause a
`std::logic_error` to be thrown, can probably still occur due to rounding errors, functions that are not bounded from below, bugs in the functor implementations, etc. These situations should therefore continue to be handled, though not necessarily by throwing exceptions. I agree with #17 that other mechanisms for reporting the manner of termination would be more user-friendly. But even if exceptions stay, I strongly feel that they should be of a custom exception type derived from `std::runtime_error` rather than `std::logic_error`. The latter exception type is intended to mark preventable programmer errors, which is not always what goes wrong in LBFGS++. Since exceptions propagate through the entire program, which might not even care that LBFGS++ is called somewhere down the stack, I'd argue it's best to stick to conventions.

It's also possible to get the box-constrained optimizer to crash on a lightly modified example. I managed to do so by modifying
`example-rosenbrock.cpp` to use `LBFGSB*` rather than `LBFGS*` types, giving it a non-zero starting-value vector `startVals` with `startVals[n] = 1.0F` and `startVals[n + 1] = 95.0F`, and bounding all of the variables to $[0, \infty]$ (but you never know with floating-point arithmetic; playing around with the starting values might be necessary to get a crash on your system).

I've traced the crash to a big old typo: I'm pretty sure the `fI_lo` on this line should be `I_lo`.

I haven't yet stared at the Moré-Thuente paper long enough to be confident about these, but I also find two other details potentially troublesome. First, the second paragraph of section 4 mentions that the values and derivatives of
$\psi$ should be evaluated when generating trial step values until $\psi(\alpha) \le 0$ and $\phi'(\alpha) \ge 0$ are satisfied, after which $\phi$ should be evaluated instead of $\psi$, and I don't see that switch in the code. Second, at the top of page 300 it is stated that the formula used for case 3 in `step_selection` before the `delta`-scaling should only be used "If the cubic tends to infinity in the direction of the step and the minimum of the cubic is beyond $\alpha_k$", and that one should set $\alpha_k = \alpha_s$ otherwise. I do not see any special handling of the cubic in case 3 in the code, but I might be misunderstanding some part of it. What do you think?

To sum up, here are my suggestions, ordered roughly by decreasing simplicity and urgency.
1. Fix the `fI_lo` typo.
2. Make `LineSearchNocedalWright` the default BFGS line search.
3. Restart `BFGSMat` on encountering $y_k \cdot s_k < 0$.
4. Look into the $\phi$/$\psi$ and case-3 cubic issues of the Moré-Thuente search.

Thanks again for putting this great library out there. I'm looking forward to seeing it become even greater with some of the corner cases above covered.
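On the curvature argument in the post: the claim that $y_k \cdot s_k < 0$ destroys positive definiteness follows directly from the secant equation. Below is a short derivation with the standard BFGS inverse update in the same $s_k$, $y_k$ notation ($H_k$ for the inverse Hessian estimate and $g_k$ for the gradient are symbols I'm adding). The first line is the update, the second shows why $H_{k+1}$ cannot be positive definite, and the third specializes to one dimension, where the resulting direction is guaranteed to be ascending. In more than one dimension an indefinite $H_{k+1}$ does not force every direction uphill; it only removes the descent guarantee.

```math
\begin{aligned}
H_{k+1} &= \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top,
  \qquad \rho_k = \frac{1}{y_k^\top s_k}, \\
\left(I - \rho_k y_k s_k^\top\right) y_k = 0
  &\;\Longrightarrow\; H_{k+1} y_k = s_k
  \;\Longrightarrow\; y_k^\top H_{k+1} y_k = y_k^\top s_k < 0, \\
n = 1:\quad H_{k+1} = \frac{s_k}{y_k} < 0
  &\;\Longrightarrow\; g_{k+1}\, d_{k+1} = -H_{k+1}\, g_{k+1}^2 > 0
  \quad \text{for } d_{k+1} = -H_{k+1} g_{k+1},\ g_{k+1} \neq 0.
\end{aligned}
```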
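The restart trick from the post can be sketched in a few dozen lines. This is a minimal illustration written against `std::vector` rather than Eigen so that it stands alone; `LimitedMemory` is a hypothetical class, not LBFGS++'s `BFGSMat`. It computes directions with the standard L-BFGS two-loop recursion (using the common $H_0 = \gamma I$, $\gamma = s^\top y / y^\top y$ scaling) and discards its whole $(s, y)$ history when a new pair fails the curvature check, so the next direction falls back to steepest descent. The plain `y.s <= 0` test is an assumption; a production version would likely use a small relative tolerance instead.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

using Vec = std::vector<double>;

// Dot product of two equally sized vectors.
double dot(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Hypothetical limited-memory (s, y) store that restarts on negative curvature.
class LimitedMemory {
public:
    explicit LimitedMemory(std::size_t m) : m_(m > 0 ? m : 1) {}

    // Adds a correction pair. If y.s <= 0 the pair cannot keep the inverse
    // Hessian estimate positive definite, so all history is discarded instead.
    bool update(const Vec& s, const Vec& y) {
        if (dot(y, s) <= 0.0) {
            s_.clear();
            y_.clear();
            return false;
        }
        if (s_.size() == m_) {  // drop the oldest pair when memory is full
            s_.pop_front();
            y_.pop_front();
        }
        s_.push_back(s);
        y_.push_back(y);
        return true;
    }

    // Two-loop recursion: returns d = -H g. With no history, H = I and d = -g.
    Vec direction(const Vec& g) const {
        const std::size_t k = s_.size();
        Vec q = g;
        std::vector<double> alpha(k), rho(k);
        for (std::size_t i = k; i-- > 0;) {  // newest to oldest
            rho[i] = 1.0 / dot(y_[i], s_[i]);
            alpha[i] = rho[i] * dot(s_[i], q);
            for (std::size_t j = 0; j < q.size(); ++j) q[j] -= alpha[i] * y_[i][j];
        }
        if (k > 0) {  // initial matrix H0 = gamma * I from the newest pair
            const double gamma = dot(s_[k - 1], y_[k - 1]) / dot(y_[k - 1], y_[k - 1]);
            for (double& v : q) v *= gamma;
        }
        for (std::size_t i = 0; i < k; ++i) {  // oldest to newest
            const double beta = rho[i] * dot(y_[i], q);
            for (std::size_t j = 0; j < q.size(); ++j) q[j] += (alpha[i] - beta) * s_[i][j];
        }
        for (double& v : q) v = -v;
        return q;
    }

    std::size_t size() const { return s_.size(); }

private:
    std::size_t m_;
    std::deque<Vec> s_, y_;
};
```

Because `update` reports whether a restart happened, a solver loop could log or count restarts instead of throwing.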
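To put numbers on the bounded-interval example from the post, here is a tiny check with $f(x) = -x^2$ on $[0, 1]$ (function, interval and starting point are my choices for illustration). Starting at $x_k = 0.25$, the descent direction points right, and any trial step past the bound is projected back to $x_{k+1} = 1$, so no line search can lengthen the step until the curvature condition holds. For this concave quadratic $y_k = -2 s_k$, hence $y_k s_k = -2 s_k^2 < 0$ for every nonzero step; with $s_k = 0.75$ it is $-1.125$. `grad` and `projected_curvature` are hypothetical helper names.

```cpp
#include <algorithm>

// Derivative of the concave test function f(x) = -x^2.
double grad(double x) { return -2.0 * x; }

// 1D curvature y_k * s_k for a step from x towards `trial`, projected onto [lo, hi].
double projected_curvature(double x, double trial, double lo, double hi) {
    const double x_next = std::min(std::max(trial, lo), hi);  // projection onto the box
    const double s = x_next - x;
    const double y = grad(x_next) - grad(x);
    return y * s;
}
```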
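A minimal sketch of the exception convention suggested in the post. `LineSearchError`, `backtrack` and `try_backtrack` are hypothetical names, not LBFGS++ API, and the Armijo constant `1e-4` is just a typical value. Because `LineSearchError` derives from `std::runtime_error`, a caller can handle a failed search as a runtime condition (or catch it generically as `std::exception`) without it being mistaken for a `std::logic_error`-style programming bug.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type for numerical failures inside the solver.
class LineSearchError : public std::runtime_error {
public:
    explicit LineSearchError(const std::string& what) : std::runtime_error(what) {}
};

// Hypothetical backtracking loop on phi(step); throws if no step satisfies
// the Armijo (sufficient decrease) condition within max_iter halvings.
double backtrack(double (*phi)(double), double phi0, double dphi0, int max_iter) {
    double step = 1.0;
    for (int i = 0; i < max_iter; ++i) {
        if (phi(step) <= phi0 + 1e-4 * step * dphi0) return step;
        step *= 0.5;
    }
    throw LineSearchError("line search reached the maximum number of iterations");
}

// Caller-side handling: the failure is reported as a runtime condition.
bool try_backtrack(double (*phi)(double), double phi0, double dphi0,
                   double& step, std::string& message) {
    try {
        step = backtrack(phi, phi0, dphi0, 20);
        return true;
    } catch (const std::runtime_error& e) {
        message = e.what();
        return false;
    }
}
```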
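A sketch of the two-stage switch from the second paragraph of section 4, as I read it, with $\psi(\alpha) = \phi(\alpha) - \phi(0) - \mu\,\alpha\,\phi'(0)$ as the auxiliary function ($\mu$ being the sufficient-decrease constant). `StageSelector` and `Eval` are hypothetical names. The sketch switches permanently to $\phi$ at the first trial satisfying $\psi(\alpha) \le 0$ and $\phi'(\alpha) \ge 0$, including that trial itself. A full implementation would also have to keep the stored interval endpoints consistent with whichever function is active (for example by storing $\phi$ values and converting them to $\psi$ during the first stage); that bookkeeping is left out here.

```cpp
// Value and derivative handed to the interval update / step selection.
struct Eval {
    double value;
    double deriv;
};

// Hypothetical two-stage selector: use psi until psi(alpha) <= 0 and
// phi'(alpha) >= 0 hold at some trial, then use phi from that trial on.
class StageSelector {
public:
    StageSelector(double phi0, double dphi0, double mu)
        : phi0_(phi0), dphi0_(dphi0), mu_(mu) {}

    // phi_a and dphi_a are phi(alpha) and phi'(alpha) at the current trial step.
    Eval select(double alpha, double phi_a, double dphi_a) {
        // Auxiliary function psi(alpha) = phi(alpha) - phi(0) - mu * alpha * phi'(0).
        const double psi_a = phi_a - phi0_ - mu_ * alpha * dphi0_;
        const double dpsi_a = dphi_a - mu_ * dphi0_;
        if (!use_phi_ && psi_a <= 0.0 && dphi_a >= 0.0) {
            use_phi_ = true;  // one-way switch to the second stage
        }
        return use_phi_ ? Eval{phi_a, dphi_a} : Eval{psi_a, dpsi_a};
    }

    bool using_phi() const { return use_phi_; }

private:
    double phi0_;
    double dphi0_;
    double mu_;
    bool use_phi_ = false;
};
```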
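A sketch of the case-3 rule quoted in the post (case 3 meaning $f_t \le f_l$, $g_t g_l \ge 0$, $|g_t| \le |g_l|$). The function names and parameters are hypothetical: `(a_l, f_l, g_l)` is the stored endpoint and `(a_t, f_t, g_t)` the new trial, and the result is the candidate before any later comparison with the secant step and before the `delta`-scaling, both of which are left out. The cubic minimizer is used only if the cubic tends to $+\infty$ in the direction of the step and its minimizer lies beyond $\alpha_t$; otherwise it falls back to the secant step $\alpha_s$, as the quoted rule says. The monomial form of the cubic is easy to read but prone to cancellation; a production version would want a more careful formulation. It also assumes `a_t != a_l` and `g_t != g_l`.

```cpp
#include <cmath>

// Secant step: zero of the linear interpolant of phi' through (a_l, g_l) and (a_t, g_t).
double secant_step(double a_l, double g_l, double a_t, double g_t) {
    return a_t - g_t * (a_t - a_l) / (g_t - g_l);
}

// Hypothetical case-3 candidate step (before delta-scaling).
double case3_candidate(double a_l, double f_l, double g_l,
                       double a_t, double f_t, double g_t) {
    const double h = a_t - a_l;  // signed step from the stored endpoint to the trial
    // Interpolating cubic in t = alpha - a_l: c(t) = f_l + g_l t + c2 t^2 + c3 t^3,
    // fitted so that c(h) = f_t and c'(h) = g_t.
    const double d = (f_t - f_l - g_l * h) / (h * h);
    const double e = (g_t - g_l) / h;
    const double c3 = (e - 2.0 * d) / h;
    const double c2 = 3.0 * d - e;
    const double disc = c2 * c2 - 3.0 * c3 * g_l;  // discriminant of c'(t) = 0
    const bool tends_to_inf = c3 * h > 0.0;        // c(t) -> +inf as t -> sign(h) * inf
    if (tends_to_inf && disc > 0.0) {
        // Root of c'(t) = 0 with c''(t) = 2 * sqrt(disc) > 0, i.e. the local minimizer.
        const double t_min = (-c2 + std::sqrt(disc)) / (3.0 * c3);
        if ((t_min - h) * h > 0.0) return a_l + t_min;  // minimizer lies beyond a_t
    }
    return secant_step(a_l, g_l, a_t, g_t);
}
```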