Failure to converge (again...) #9

Closed
Svalorzen opened this issue Apr 3, 2020 · 2 comments · Fixed by #25

Comments

@Svalorzen

Svalorzen commented Apr 3, 2020

Hi, sorry again. I'm having trouble with convergence, again using the code you helped me to fix. I have tested the code with larger problems successfully, but I have just found a small case where it fails.

It looks relatively simple, but the minimization fails because the line search step becomes too small after 2 iterations. Small variations in the setup make the problem solvable again (for example, changing the initial guess from [1.0, 1.0] to [1.1, 1.1], adding more points to the training set, or modifying some of them).

Is this type of failure a common occurrence with this kind of optimizer (as you may have guessed, I'm not really an expert in this field)? scipy, using the same algorithm, does not usually seem to have problems. Should I just put some try/catch blocks around the optimizer and, if it fails, try again with different parameters (which I should do anyway to ensure a global minimum is found, but I'm asking specifically about errors)? Or maybe it's just a bug and I can keep reporting them :)
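
Concretely, the try/catch-and-retry logic I have in mind looks something like this (just a sketch around the program below; `like`, `lb`, `ub` and `solver` are the objects defined there, and the list of restart values is arbitrary):

// Sketch only: restart the minimization from a few starting points if the
// line search throws.
Vector best;
double bestFx = std::numeric_limits<double>::infinity();
for (double start : {1.0, 1.1, 0.9, 2.0}) {
    Vector x0 = Vector::Constant(2, start);
    double fx;
    try {
        solver.minimize(like, x0, fx, lb, ub);
    } catch (const std::runtime_error & e) {
        continue;  // line search failed for this start, try the next one
    }
    if (fx < bestFx) { bestFx = fx; best = x0; }  // keep the best successful run
}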

Thanks again in advance for your help.

#include <cmath>
#include <iostream>
#include <limits>

#include <Eigen/Eigenvalues>

#include <LBFGSB.h>

constexpr double PI = 3.141592653589793238462643383279502884197169399375105820974944;

using Matrix2D = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor | Eigen::AutoAlign>;
using Vector = Eigen::Matrix<double, Eigen::Dynamic, 1>;

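// Matrix of pairwise squared Euclidean distances between the rows of x1 and x2.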
Matrix2D makeDist(const Matrix2D & x1, const Matrix2D & x2) {
    return ((-2.0 * (x1 * x2.transpose())).colwise() + x1.rowwise().squaredNorm()).rowwise() + x2.rowwise().squaredNorm().transpose();
}

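// Squared-exponential (RBF) kernel evaluated on a matrix of squared distances,
// with length-scale l and signal standard deviation sigma.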
Matrix2D kernel(const Matrix2D & dist, double l = 1.0, double sigma = 1.0) {
    return std::pow(sigma, 2) * ((-0.5 / std::pow(l, 2)) * dist).array().exp();
}

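// Negative log marginal likelihood (and its gradient) as a function of x = (l, sigma).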
struct Likelihood {
    Likelihood(const Matrix2D & xData, const Vector & muData, double noise) :
            xData_(xData), muData_(muData), noise_(noise) {}

    const Matrix2D & xData_;
    const Vector & muData_;
    double noise_;

    double operator()(const Eigen::VectorXd& x, Eigen::VectorXd& grad) {
        const auto l = x[0];
        const auto sigma = x[1];

        auto dist = makeDist(xData_, xData_);
        auto k = kernel(dist, l, sigma);

        Matrix2D dkl = k.cwiseProduct(dist / std::pow(l, 3));
        Matrix2D dks = (2.0 / sigma) * k;

        k.diagonal().array() += std::pow(noise_, 2);

        auto L = k.llt();
        Matrix2D kinv = L.solve(Matrix2D::Identity(muData_.size(), muData_.size()));
        Vector alpha = L.solve(muData_);

        // Negative gradient to minimize negative log-likelihood
        grad[0] = -(0.5 * alpha.transpose() * dkl * alpha - 0.5 * kinv.cwiseProduct(dkl).sum());
        grad[1] = -(0.5 * alpha.transpose() * dks * alpha - 0.5 * kinv.cwiseProduct(dks).sum());

        // Log-likelihood
        const double ll = -0.5 * muData_.transpose() * alpha
                          - L.matrixL().selfadjointView().diagonal().array().log().sum()
                          - 0.5 * muData_.size() * std::log(2.0 * PI);
        std::cout << l << ", " << sigma << " ==> " << -ll << "  -  " << grad.transpose() << std::endl;
        return -ll;
    }
};

int main() {
    const double noise = 1e-8;

    // Training points to test on. Small modifications seem to make the problem
    // solvable again (for example replacing the 0 with a 1).
    Matrix2D X_train(7, 1);
    X_train << -4, -3, -2, -1, 0, 2, 4;

    Vector Y_train = X_train.array().sin();

    // Log-likelihood function with set data/noise
    Likelihood like(X_train, Y_train, noise);

    // Optimizer params
    LBFGSpp::LBFGSBParam<double> param;
    param.epsilon = 1e-5;
    param.max_iterations = 100;

    // Bounds
    Vector lb = Vector::Constant(2, 1e-8);
    Vector ub = Vector::Constant(2, std::numeric_limits<double>::infinity());

    // Initial guess (replacing 1.0 with other numbers seems to make the problem
    // solvable as well).
    Vector x = Vector::Constant(2, 1.0);

    LBFGSpp::LBFGSBSolver<double> solver(param);

    double log;
    solver.minimize(like, x, log, lb, ub);

    return 0;
}

The output is:

1, 1 ==> 6.96291  -  -3.03909  3.49729
1.9499, 0.68744 ==> 5.99037  -   4.90625 -14.3289
1.28814, 0.905188 ==> 5.76892  -  -3.16873  2.10955
terminate called after throwing an instance of 'std::runtime_error'
  what():  the line search step became smaller than the minimum value allowed
Aborted (core dumped)

If the starting point [1.1, 1.1] is used, it works:

1.1, 1.1 ==> 6.99387  -  -3.4496 3.58888
2.05273, 0.796194 ==> 5.19971  -   4.32208 -9.35253
1.46782, 0.982711 ==> 5.34336  -  -3.34543  2.06644
6.15734, 0.336145 ==> 1.88086e+06  -   3.82257e+06 -1.11909e+07
2.89361, 0.78613 ==> 39.5965  -   131.639 -109.667
1.86985, 0.927282 ==> 4.3294  -  -0.92675 -1.49038
2.08611, 0.992819 ==> 4.19746  -   1.20536 -2.77274
2.10575, 1.08705 ==> 4.02656  -  0.314624 -1.26704
2.18967, 1.21189 ==> 3.94199  -   0.334645 -0.571357
2.20651, 1.27356 ==> 3.92379  -  0.00729654  -0.127709
2.22573, 1.30522 ==> 3.92164  -   0.0325654 -0.0340186
2.2261, 1.30993 ==> 3.92157  -  -0.00439909  0.00135147
2.22656, 1.31028 ==> 3.92156  -   0.000116421 -1.42964e-05
2.22655, 1.31027 ==> 3.92156  -  -3.84116e-07  4.18181e-07

The same happens if the starting point [0.9, 0.9] is used:

0.9, 0.9 ==> 6.90955  -  -2.61983  3.17243
1.84575, 0.575104 ==> 7.52613  -   6.48836 -24.6506
1.37833, 0.735678 ==> 5.43282  -  -2.02235 -1.69158
2.20017, 0.694899 ==> 8.04254  -    13.791 -24.1841
1.6203, 0.723671 ==> 5.12384  -  -0.641197  -4.86297
1.95251, 0.837837 ==> 4.62563  -   1.38858 -5.23043
2.05031, 1.00839 ==> 4.13246  -  0.396067 -1.95868
2.14692, 1.12796 ==> 3.9945  -  0.522762 -1.14357
2.16689, 1.23065 ==> 3.9341  -  -0.221894 -0.153314
2.23342, 1.30105 ==> 3.9229  -   0.202664 -0.141067
2.22493, 1.30545 ==> 3.92161  -   0.0169154 -0.0246754
2.22623, 1.30971 ==> 3.92157  -  -0.000161013  -0.00168692
2.22654, 1.31025 ==> 3.92156  -  -9.30085e-05  9.63869e-06
2.22655, 1.31026 ==> 3.92156  -  -1.78201e-06  9.67718e-07
@yixuan
Owner

yixuan commented Apr 6, 2020

It might be a numerical issue. The Gaussian kernel exponentiates the (scaled, negated) squared distance. For example, in your case the distance between -4 and 4 is 8, so when l=1 the kernel computes exp(-8^2/2)=1.266417e-14. When you then invert the kernel matrix you get a pretty large gradient, and the line search may fail. I agree that some more mature software may handle this better, but given the nature of the kernel function there can be a lot of instability. A good initial value always helps.
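
To see the scale concretely, this standalone snippet (not part of the library, just the arithmetic) reproduces that number:

#include <cmath>
#include <cstdio>

int main() {
    const double l = 1.0, sigma = 1.0;
    const double d2 = 8.0 * 8.0;  // squared distance between the points -4 and 4
    // Same formula as a single entry of kernel() in your program
    const double kij = sigma * sigma * std::exp(-0.5 * d2 / (l * l));
    std::printf("k(-4, 4) = %g\n", kij);  // roughly 1.26642e-14
    return 0;
}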

Specific to your problem, let me think of a better way to compute the gradient.

@Svalorzen
Author

I think you might be right. I now realize that the Python code actually uses numerical gradients (since I don't pass it the derivative), so it might be avoiding the numerical instabilities at the price of performance?

I am also looking at GPy's code to see if I can find a different way to compute the gradient.
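
Something like the following central-difference wrapper is what I am thinking of trying (just a sketch; `NumericalGradient` and the step size are my own choices, not anything from LBFGS++ or scipy):

#include <Eigen/Core>

// Wraps any objective with the same operator() signature as Likelihood above
// and replaces its analytic gradient with a central-difference approximation.
template <typename Func>
struct NumericalGradient {
    Func & f_;
    double h_;

    NumericalGradient(Func & f, double h = 1e-6) : f_(f), h_(h) {}

    double operator()(const Eigen::VectorXd & x, Eigen::VectorXd & grad) {
        Eigen::VectorXd dummy = Eigen::VectorXd::Zero(x.size());
        const double fx = f_(x, dummy);  // analytic gradient written to dummy is ignored
        for (Eigen::Index i = 0; i < x.size(); ++i) {
            Eigen::VectorXd xp = x, xm = x;
            xp[i] += h_;
            xm[i] -= h_;
            grad[i] = (f_(xp, dummy) - f_(xm, dummy)) / (2.0 * h_);
        }
        return fx;
    }
};

// Usage with the code above: NumericalGradient<Likelihood> nlike(like);
//                            solver.minimize(nlike, x, log, lb, ub);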
