[LAPACK] Unexpected exception handling behavior in getrf #642

Open
vlad-perevezentsev opened this issue Mar 4, 2025 · 0 comments

Summary

After the rename from oneMKL Interfaces to oneMath, exception handling for oneapi::mkl::lapack::getrf no longer behaves as it did before.

In oneMKL (v0.6), exception handling worked correctly: when processing a singular matrix, the function threw oneapi::mkl::lapack::computation_error, from which the info code could be extracted.
In oneMath (develop branch), the function now throws oneapi::mkl::computation_error.

Similarly, when an invalid argument is passed, the expected behavior was to throw oneapi::mkl::lapack::exception, but the function now throws oneapi::mkl::invalid_argument.

According to the documentation, getrf should throw oneapi::math::lapack::computation_error.
The current implementation does not follow this, and oneapi::mkl::computation_error lacks an info() method, which breaks existing exception-handling logic.

By contrast, getrf_batch still correctly throws oneapi::mkl::lapack::batch_error when given a singular matrix.
Is this behavior of getrf expected, or is it an unintended change?

Version

oneMath version: develop branch
Last known good version: v0.6

Environment

Running on PVC (Intel Data Center GPU Max 1100) with the oneAPI Base Toolkit 2025.0.
OS is Ubuntu 22.04.

sycl-ls

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.6.31294+9]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8480+ OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO  [24.39.31294]

Steps to reproduce

Building:

cmake .. -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DMKL_ROOT=${MKLROOT} -DBUILD_FUNCTIONAL_TESTS=OFF

# build
cmake --build .

# install
export MATH_INTERFACE_ROOT=$(pwd)/../install
cmake --install . --prefix=${MATH_INTERFACE_ROOT}

I used two environment variables:
MATH_INTERFACE_ROOT (build of the develop branch)
MKL_INTERFACE_ROOT (build of v0.6)

Compiling:

icpx -fsycl getrf_repro.cpp -o getrf_repro_onemath -I${MATH_INTERFACE_ROOT}/include -L${MATH_INTERFACE_ROOT}/lib -lonemath -Wl,-rpath,${MATH_INTERFACE_ROOT}/lib

icpx -fsycl getrf_repro.cpp -o getrf_repro_onemkl -I${MKL_INTERFACE_ROOT}/include -L${MKL_INTERFACE_ROOT}/lib -lonemkl -Wl,-rpath,${MKL_INTERFACE_ROOT}/lib

icpx -fsycl -I${MKLROOT}/include -o getrf_repro_oneapi getrf_repro.cpp -L${MKLROOT}/lib -lmkl_sycl -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lsycl

Running:

./getrf_repro_onemath
./getrf_repro_onemkl
./getrf_repro_oneapi 

Reproducer here
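#include <cstdint>    // std::int64_t
#include <iostream>
#include <sstream>    // std::ostringstream
#include <stdexcept>  // std::runtime_error
#include <vector>
#include "oneapi/mkl.hpp"
#include <sycl/sycl.hpp>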

int main() {

    sycl::queue q(sycl::default_selector_v);

    std::cout << "Device: " << q.get_device().get_info<sycl::info::device::name>() << std::endl;
    std::cout << "Driver: " << q.get_device().get_info<sycl::info::device::driver_version>() << std::endl;

    using T = float;
    std::int64_t n = 2;
    const std::int64_t lda = 2;

    std::vector<T> h_A = {1.0f, 2.0f, 2.0f, 4.0f};
    std::vector<std::int64_t> ipiv(n);
    std::vector<std::int64_t> dev_info(1);

    T* d_A = sycl::malloc_device<T>(n * n, q);
    std::int64_t* d_ipiv = sycl::malloc_device<std::int64_t>(n, q);

    q.memcpy(d_A, h_A.data(), h_A.size() * sizeof(T)).wait();

    const std::int64_t scratchpad_size =
        oneapi::mkl::lapack::getrf_scratchpad_size<T>(q, n, n, lda);
    std::cout << scratchpad_size << std::endl;
    T* scratchpad = sycl::malloc_device<T>(100, q);

    bool is_exception_caught = false;
    std::ostringstream error_msg;
    std::int64_t info = 0;

    try {
        oneapi::mkl::lapack::getrf(q, n, n, d_A, lda, d_ipiv, scratchpad, scratchpad_size).wait();
    } catch (oneapi::mkl::lapack::computation_error const &e) {
        is_exception_caught = false;
        info = e.info();
        dev_info[0] = info;
        std::cout << "Handled oneapi::mkl::lapack::computation_error exception" << std::endl;
    } catch (oneapi::mkl::lapack::exception const &e) {
        is_exception_caught = true;
        info = e.info();
        if (info < 0) {
            error_msg << "Parameter number " << -info << " had an illegal value.";
        } else if (info == scratchpad_size && e.detail() != 0) {
            error_msg << "Insufficient scratchpad size. Required size is at least " << e.detail();
        } else if (info > 0) { // the same logic as for computation_error
            is_exception_caught = false;
            dev_info[0] = info;
            std::cout << "Handled oneapi::mkl::lapack::exception exception" << std::endl;
        } else {
            error_msg << "Unexpected MKL exception caught during getrf() call:\nreason: "
                      << e.what() << "\ninfo: " << e.info();
        }
    } catch (sycl::exception const &e) {
        is_exception_caught = true;
        error_msg << "Unexpected SYCL exception caught during getrf() call:\n" << e.what();
    }

    if (is_exception_caught) {
        sycl::free(scratchpad, q);
        sycl::free(d_A, q);
        sycl::free(d_ipiv, q);
        throw std::runtime_error(error_msg.str());
    }

    std::vector<T> result_A(n * n);
    q.memcpy(result_A.data(), d_A, n * n * sizeof(T)).wait();
    std::cout << "Matrix A after LU decomposition:" << std::endl;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            std::cout << result_A[i * n + j] << " ";
        }
        std::cout << std::endl;
    }

    std::vector<std::int64_t> result_ipiv(n);
    q.memcpy(result_ipiv.data(), d_ipiv, n * sizeof(std::int64_t)).wait();
    std::cout << "Pivot indices:" << std::endl;
    for (int i = 0; i < n; ++i) {
        std::cout << result_ipiv[i] << " ";
    }
    std::cout << std::endl;

    std::cout << "dev_info: " << dev_info[0] << std::endl;

    sycl::free(d_A, q);
    sycl::free(d_ipiv, q);
    sycl::free(scratchpad, q);
    return 0;
}

Observed behavior

$ ./getrf_repro_onemath

Device: Intel(R) Data Center GPU Max 1100
Driver: 1.6.31294+9
2
terminate called after throwing an instance of 'oneapi::math::computation_error'
  what():  oneapi::mkl::lapack::getrf: computation error: info = 2
Aborted (core dumped)

Expected behavior

$ ./getrf_repro_onemkl

Device: Intel(R) Data Center GPU Max 1100
Driver: 1.6.31294+9
2
Handled oneapi::mkl::lapack::computation_error exception
Matrix A after LU decomposition:
2 0.5 
4 0 
Pivot indices:
2 2 
dev_info: 2

$ ./getrf_repro_oneapi

Device: Intel(R) Data Center GPU Max 1100
Driver: 1.6.31294+9
2
Handled oneapi::mkl::lapack::computation_error exception
Matrix A after LU decomposition:
2 0.5 
4 0 
Pivot indices:
2 2 
dev_info: 2

vlad-perevezentsev added a commit to IntelPython/dpnp that referenced this issue Mar 7, 2025
This PR suggests adding a temporary workaround for a problem in oneMath
[#642](uxlfoundation/oneMath#642) where
exceptions are no longer thrown in the `lapack` namespace for the `getrf`
function as expected.

In the oneMath develop branch, `oneapi::mkl::lapack::computation_error` is
not thrown.
Instead, `oneapi::mkl::computation_error` from the `mkl` namespace is used,
so the existing `mkl_lapack::exception` catch block does not handle
singular-matrix errors.

A workaround has been added to explicitly catch
`oneapi::mkl::computation_error` and update `dev_info`, ensuring that
singular matrices are handled correctly.
github-actions bot added a commit to IntelPython/dpnp that referenced this issue Mar 7, 2025
1b0ce60