Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several ways to crash LBFGS++ #23

Closed
mpayrits opened this issue Nov 15, 2021 · 8 comments
Closed

Several ways to crash LBFGS++ #23

mpayrits opened this issue Nov 15, 2021 · 8 comments

Comments

@mpayrits
Copy link

mpayrits commented Nov 15, 2021

Hi,

first off, I want to say that this is just a great library and a joy to use. Clean header-only modern C++ with Eigen, what's not to like? :)
So, I've been using LBFGS++ in a private project and got to stress test it a bit. I've come across quite a few situations which I believe should be considered completely valid but which currently trigger a std::logic_error to be thrown. I'd like to report on these and suggest a few improvements.
Firstly, it's very easy to run into an increasing search direction by minimally modifying the examples. For instance, I've encountered one by modifying example-rosenbrock.cpp to have a non-zero starting-value vector startVals with startVals[n] = 1.0F and startVals[n + 1] = 6.0F. The culprit is in the update procedure. BFGS and friends assume that the Hessian matrix approximation is positive definite. This can only be maintained if y_k \cdot s_k is positive at each step (I follow Wikipedia, Wikipedia, and these lectures notes in notation, which appears to be standard). This is automatically true for convex functions with arbitrary line searches or for arbitrary functions with line searches that enforce the (strong) Wolfe conditions.
example-rosenbrock.cpp features the non-convex Rosenbrock function and the default Armijo backtracking line search, which can easily cause y_k \cdot s_k < 0, which in turn can and frequently does cause a non-positive-definite (inverse) Hessian estimate, which in turn can and frequently does produce an ascending search direction.
For starters, I think the default line search should be changed from LineSearchBacktracking (which is also described as "Mainly for internal use" in its source file) to LineSearchNocedalWright from #6 which respects the Wolfe conditions. That should make the examples less prone to crashing after simple modifications. Btw, using LineSearchNocedalWright also appears to resolve #21.
One can, however, keep meaningfully optimizing after encountering y_k \cdot s_k < 0 even without enforcing the Wolfe conditions. The canonical trick, according to the very end of Chapter 8 in the previously referenced notes, is to simply restart the optimization from the current point, i.e., throw away all of your Hessian information (the y and s vectors) and start over. The first BFGS step is always just gradient descent, so this should avoid the ascending search direction.
I believe y_k \cdot s_k < 0 can occur with non-convex functions even when trying to enforce the Wolfe conditions when the problem has bounds - as a simple example, imagine a 1D concave function defined over a bounded interval.
For consistency, non-convex functions should either be prohibited when using line searches that don't enforce the Wolfe conditions or when there are bounds, or the restarting logic should be added to BFGSMat. The latter is much more attractive from a user's point of view, but it's probably not a one-liner :)
Even with all this in place, exceptional situations, which currently cause a std::logic_error to be thrown, can probably still occur due to rounding errors, functions that are not bounded from below, bugs in the functor implementations, etc. These situations should therefore continue being handled, though not necessarily by means of exception throwing. I agree with #17 that other mechanisms of reporting the manner of termination would be more user-friendly. But even if exceptions stay, I strongly feel that they should be a custom exception type derived from std::runtime_error rather than a std::logic_error. The latter exception type is intended to mark preventable programmer errors, which is not always the cause in LBFGS++. Since exceptions propagate through the entire program, which might not even care that LBFGS++ is called somewhere down the stack, I'd argue it's best to stick to conventions.
It's also possible to get the box-constrained optimizer to crash on a lightly modified example. I managed to do so by modifying example-rosenbrock.cpp to use LBFGSB* rather than LBFGS* types, having a non-zero starting-value vector startVals with startVals[n] = 1.0F and startVals[n + 1] = 95.0F, and by bounding all of the variables to [0, \infty] (but you never know with floating point arithmetic - playing around with starting values might be necessary to get a crash on your system).
I've traced the crash to a big old typo - I'm pretty sure the fI_lo on this line should be I_lo.
I haven't yet stared at the Moré-Thuente paper long enough to be confident about these, but I also find two other details potentially troublesome. First, the second paragraph of section 4 mentions that the values and derivatives of \psi should be evaluated when generating trial step values until \psi(\alpha) \le 0 and \phi'(\alpha) \ge 0 is satisfied, after which \phi should be evaluated instead of \psi, and I don't see that switch in the code. And second, at the top of page 300 it is stated that the formula that is used for case 3 in step_selection before delta-scaling should only be used "If the cubic tends to infinity in the direction of the step and the minimum of the cubic is beyond \alpha_k" and that one should set \alpha_k = \alpha_s otherwise. I do not see any special handling of the cubic in case 3 in the code. But I might be misunderstanding some part of the code. What do you think?
To sum up, here are my suggestions ordered roughly by decreasing simplicity and urgency.

  • Fix the fI_lo typo.
  • Make LineSearchNocedalWright the default BFGS line search.
  • Add restarting logic to BFGSMat on encountering y_k \cdot s_k < 0.
  • Check the potential \phi/\psi and case-3 cubic issues of the Moré-Thuente search.
  • Replace exceptions with something more lightweight or at least change exception type.

Thanks again for putting this great library out there. I'm looking forward to seeing it become even greater with some of the corner cases above getting covered.

@izsahara
Copy link

Hi @mpayrits would you be so kind as to share your modified version of this library in some accessible repo/fork? Im having troubles with getting this library to work in a way that its results are (more-or-less) on par with (pythons) scipy implementation. I would like to see if your implementation does the job in improving. Thanks

@yixuan
Copy link
Owner

yixuan commented Nov 16, 2021

I sincerely thank all the comments and suggestions to make LBFGS++ a better library. One thing I have to say sorry is that I'm indeed quite busy these days and it may take some time before I have some continuous time to read the details. I will keep this in mind. Thanks!

@mpayrits
Copy link
Author

mpayrits commented Nov 16, 2021

@izsahara, you can find a version with the first two issues from the above list resolved here. There's literally four lines of differences from the upstream, which doesn't feel quite like a PR, but I could still open it if it saves you any effort @yixuan. Other issues are more involved and would need some time. Speaking of time, no problem, fully understood, take your time @yixuan and I hope the content of this issue is helpful to you when you can get round to it.

By the way, I've found that fixing the first issue also resolves #9. Unfortunately it does not resolve #15 and I don't think the third bullet point will either. Perhaps the fourth, or there's something else going on there. It would also be interesting to see if #14 is resolved.

benfulton pushed a commit to hahnlab/CAGEE that referenced this issue Feb 13, 2022
Replace LineSearchBacktracking (which is also described as "Mainly
for internal use" in its source file) to LineSearchNocedalWright
(see yixuan/LBFGSpp#23 )

Fix small bug, reduce required precision.
@yixuan
Copy link
Owner

yixuan commented Apr 28, 2022

Most of the issues have been fixed by @mpayrits in #25, and I'm looking deeper at the Moré-Thuente paper to see if there is any additional bug in the code.

@yixuan
Copy link
Owner

yixuan commented Apr 30, 2022

I've dug into the paper a little deeper, and now I feel more confident to answer some of the questions raised by @mpayrits.

For the phi/psi issue, I have actually explained it in the LineSearchMoreThuente documentation before (and I forgot this as it was written long ago 😂). The point is that if we assume eta > mu, then the "psi stage" suffices to finish the line search, and so there is no need to switch to the "phi stage", which is designed for the general case including eta < mu. The strong Wolfe condition indeed assumes that 0 < mu < eta < 1.

For the cubic interpolation/extrapolation, yes, I did it wrong in the previous implementation. I have massively rewritten the line search scheme in 9985fd0, and it seems to work well after I tested some CUTEst examples. Of course, more feedback is greatly welcome.

Handling the exceptions definitely requires more design work. It can be left for future, I suppose.

@mpayrits
Copy link
Author

mpayrits commented May 2, 2022

Hi @yixuan,
thanks for the helpful explanation of the influence of eta and mu values on the convergence properties. I agree that for mu < eta we just need to plug psi into the line search rather than phi, and that supporting the somewhat obscure case of mu > eta wouldn't really benefit anyone.
Great to hear about the search scheme rewrite too! And that the testing looks promising.
By the way, have you considered sticking some of those CUTEst tests into a test suite? (testing is obviously anything but a by-the-way matter but this issue is probably not the place for a lengthy discussion of it)
Exceptions are also a much lesser issue now that increasing search directions are less common (let's hope even fully eradicated *knock on wood*). So as far as I'm concerned, all the points of this issue have been addressed. Thanks!

@yixuan
Copy link
Owner

yixuan commented May 3, 2023

Here is the first step to test LBFGS++ on a (large) collection of CUTEst problems.

@yixuan
Copy link
Owner

yixuan commented May 25, 2023

This commit has removed major runtime exceptions when maximum number of line searches is reached, which should resolve most of the common issues.

There remain some exceptions in the code, which are mostly "serious" issues that are hard to proceed.

I think we can tentatively close this issue.

@yixuan yixuan closed this as completed May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants