[LAPACK] Unexpected exception handling behavior in getrf #642

Closed
@vlad-perevezentsev

Description

Summary

After the rename from oneMKL Interfaces to oneMath, exception handling has changed when calling oneapi::mkl::lapack::getrf.

In oneMKL Interfaces (v0.6), exception handling worked correctly: for a singular matrix, the function threw oneapi::mkl::lapack::computation_error, whose info() method made it possible to extract the info code.
In oneMath (develop branch), the same call now throws oneapi::mkl::computation_error instead (reported at runtime as oneapi::math::computation_error).

Similarly, when an invalid argument is passed, the expected behavior was to throw oneapi::mkl::lapack::exception, but the function now throws oneapi::mkl::invalid_argument instead.

According to the documentation, getrf should throw oneapi::math::lapack::computation_error.
However, the current implementation does not do this, and oneapi::mkl::computation_error has no info() method, which breaks existing exception-handling logic that reads the info code.
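The failure mode can be modeled on the host without oneMath. The sketch below uses a hypothetical, simplified exception hierarchy in a namespace called `demo` (these are stand-ins, not the real oneMath or oneMKL classes, whose exact inheritance may differ) and the same catch order as the reproducer. A handler for `lapack::computation_error` does not match a sibling generic `computation_error`, so that exception skips both `info()` handlers; in the reproducer, which has no handler for the generic type, it escapes `main()` and the program aborts.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-ins for the library exception types.
// They are NOT the real oneMath/oneMKL classes; they only model the shape.
namespace demo {
struct exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Generic error with no info(), like the type the develop branch throws.
struct computation_error : exception {
    using exception::exception;
};

namespace lapack {
// LAPACK-specific base that carries the info code.
struct exception : demo::exception {
    exception(const std::string& msg, std::int64_t code)
        : demo::exception(msg), info_(code) {}
    std::int64_t info() const { return info_; }

private:
    std::int64_t info_;
};

// Models what v0.6 threw for a singular matrix.
struct computation_error : lapack::exception {
    using lapack::exception::exception;
};
}  // namespace lapack
}  // namespace demo

// Runs `f` and reports which handler caught its exception, using the same
// order as the reproducer's catch clauses; stores info() when available.
template <typename F>
std::string which_handler(F f, std::int64_t& info) {
    try {
        f();
    } catch (const demo::lapack::computation_error& e) {
        info = e.info();
        return "lapack::computation_error";
    } catch (const demo::lapack::exception& e) {
        info = e.info();
        return "lapack::exception";
    } catch (const demo::exception&) {
        // The reproducer has no handler at this level, so there the
        // exception escapes main() and std::terminate is called.
        return "generic (no info())";
    }
    return "none";
}
```

With the v0.6-style exception the first handler runs and `info` becomes 2; with the generic type only the last handler matches and `info` is left untouched.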

By contrast, getrf_batch correctly throws oneapi::mkl::lapack::batch_error when given a singular matrix.
Is this behavior of getrf expected, or is it an unintended change?

Version

oneMath version: develop branch
Last known good version: v0.6

Environment

Running on PVC (Intel Data Center GPU Max 1100) with the Intel oneAPI Base Toolkit 2025.0.
OS: Ubuntu 22.04.

sycl-ls

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.6.31294+9]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8480+ OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO  [24.39.31294]

Steps to reproduce

Building:

# configure
cmake .. -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DMKL_ROOT=${MKLROOT} -DBUILD_FUNCTIONAL_TESTS=OFF

# build
cmake --build .

# install
export MATH_INTERFACE_ROOT=$(pwd)/../install
cmake --install . --prefix=${MATH_INTERFACE_ROOT}

I used two environment variables:
MATH_INTERFACE_ROOT: install prefix of the develop-branch (oneMath) build
MKL_INTERFACE_ROOT: install prefix of the v0.6 (oneMKL Interfaces) build

Compiling:

# against oneMath (develop branch)
icpx -fsycl getrf_repro.cpp -o getrf_repro_onemath -I${MATH_INTERFACE_ROOT}/include -L${MATH_INTERFACE_ROOT}/lib -lonemath -Wl,-rpath,${MATH_INTERFACE_ROOT}/lib

# against oneMKL Interfaces (v0.6)
icpx -fsycl getrf_repro.cpp -o getrf_repro_onemkl -I${MKL_INTERFACE_ROOT}/include -L${MKL_INTERFACE_ROOT}/lib -lonemkl -Wl,-rpath,${MKL_INTERFACE_ROOT}/lib

# directly against the Intel oneMKL product library
icpx -fsycl -I${MKLROOT}/include -o getrf_repro_oneapi getrf_repro.cpp -L${MKLROOT}/lib -lmkl_sycl -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lsycl

Running:

./getrf_repro_onemath
./getrf_repro_onemkl
./getrf_repro_oneapi 

Reproducer (getrf_repro.cpp):
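#include <cstdint>
#include <iostream>
#include <sstream>    // std::ostringstream
#include <stdexcept>  // std::runtime_error
#include <vector>
#include "oneapi/mkl.hpp"
#include <sycl/sycl.hpp>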

int main() {

    sycl::queue q(sycl::default_selector_v);

    std::cout << "Device: " << q.get_device().get_info<sycl::info::device::name>() << std::endl;
    std::cout << "Driver: " << q.get_device().get_info<sycl::info::device::driver_version>() << std::endl;

    using T = float;
    std::int64_t n = 2;
    const std::int64_t lda = 2;

    std::vector<T> h_A = {1.0f, 2.0f, 2.0f, 4.0f};
    std::vector<std::int64_t> ipiv(n);
    std::vector<std::int64_t> dev_info(1);

    T* d_A = sycl::malloc_device<T>(n * n, q);
    std::int64_t* d_ipiv = sycl::malloc_device<std::int64_t>(n, q);

    q.memcpy(d_A, h_A.data(), h_A.size() * sizeof(T)).wait();

    const std::int64_t scratchpad_size =
        oneapi::mkl::lapack::getrf_scratchpad_size<T>(q, n, n, lda);
    std::cout << scratchpad_size << std::endl;
    // Allocate exactly the size reported by getrf_scratchpad_size.
    T* scratchpad = sycl::malloc_device<T>(scratchpad_size, q);

    bool is_exception_caught = false;
    std::ostringstream error_msg;
    std::int64_t info = 0;

    try {
        oneapi::mkl::lapack::getrf(q, n, n, d_A, lda, d_ipiv, scratchpad, scratchpad_size).wait();
    } catch (oneapi::mkl::lapack::computation_error const &e) {
        is_exception_caught = false;
        info = e.info();
        dev_info[0] = info;
        std::cout << "Handled oneapi::mkl::lapack::computation_error exception" << std::endl;
    } catch (oneapi::mkl::lapack::exception const &e) {
        is_exception_caught = true;
        info = e.info();
        if (info < 0) {
            error_msg << "Parameter number " << -info << " had an illegal value.";
        } else if (info == scratchpad_size && e.detail() != 0) {
            error_msg << "Insufficient scratchpad size. Required size is at least " << e.detail();
        } else if (info > 0) { // the same logic as for computation_error
            is_exception_caught = false;
            dev_info[0] = info;
            std::cout << "Handled oneapi::mkl::lapack::exception exception" << std::endl;
        } else {
            error_msg << "Unexpected MKL exception caught during getrf() call:\nreason: "
                      << e.what() << "\ninfo: " << e.info();
        }
    } catch (sycl::exception const &e) {
        is_exception_caught = true;
        error_msg << "Unexpected SYCL exception caught during getrf() call:\n" << e.what();
    }

    if (is_exception_caught) {
        sycl::free(scratchpad, q);
        sycl::free(d_A, q);
        sycl::free(d_ipiv, q);
        throw std::runtime_error(error_msg.str());
    }

    std::vector<T> result_A(n * n);
    q.memcpy(result_A.data(), d_A, n * n * sizeof(T)).wait();
    // A is column-major, so each line printed below is one column of the packed LU factors.
    std::cout << "Matrix A after LU decomposition:" << std::endl;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            std::cout << result_A[i * n + j] << " ";
        }
        std::cout << std::endl;
    }

    std::vector<std::int64_t> result_ipiv(n);
    q.memcpy(result_ipiv.data(), d_ipiv, n * sizeof(std::int64_t)).wait();
    std::cout << "Pivot indices:" << std::endl;
    for (int i = 0; i < n; ++i) {
        std::cout << result_ipiv[i] << " ";
    }
    std::cout << std::endl;

    std::cout << "dev_info: " << dev_info[0] << std::endl;

    sycl::free(d_A, q);
    sycl::free(d_ipiv, q);
    sycl::free(scratchpad, q);
    return 0;
}
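The branches in the oneapi::mkl::lapack::exception handler encode how the reproducer interprets info() and detail(). The sketch below factors those branches into a hypothetical helper, classify_lapack_info (not part of oneMath or oneMKL), so the logic can be exercised on the host. The conventions are the ones the reproducer assumes: negative info for an illegal parameter, info equal to the passed scratchpad size with a non-zero detail() for an undersized scratchpad, and positive info for an exactly-zero pivot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a LAPACK-style (info, detail) pair; it
// mirrors the reproducer's catch block and is NOT a oneMath/oneMKL API.
enum class getrf_status {
    illegal_argument,      // -info is the 1-based index of the bad parameter
    scratchpad_too_small,  // detail is the required scratchpad size
    singular,              // U(info, info) is exactly zero
    unexpected
};

getrf_status classify_lapack_info(std::int64_t info, std::int64_t detail,
                                  std::int64_t scratchpad_size) {
    assert(scratchpad_size >= 0);
    if (info < 0) return getrf_status::illegal_argument;
    // Must be tested before info > 0: an undersized scratchpad also yields a
    // positive info (equal to the scratchpad size that was passed in).
    if (info == scratchpad_size && detail != 0) return getrf_status::scratchpad_too_small;
    if (info > 0) return getrf_status::singular;
    return getrf_status::unexpected;
}
```

With the reproducer's values (scratchpad size 2, info = 2), only the detail() check distinguishes a singular matrix from an undersized scratchpad, which is why the order of the branches matters.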

Observed behavior

$ ./getrf_repro_onemath

Device: Intel(R) Data Center GPU Max 1100
Driver: 1.6.31294+9
2
terminate called after throwing an instance of 'oneapi::math::computation_error'
  what():  oneapi::mkl::lapack::getrf: computation error: info = 2
Aborted (core dumped)

Expected behavior

$ ./getrf_repro_onemkl

Device: Intel(R) Data Center GPU Max 1100
Driver: 1.6.31294+9
2
Handled oneapi::mkl::lapack::computation_error exception
Matrix A after LU decomposition:
2 0.5 
4 0 
Pivot indices:
2 2 
dev_info: 2

$ ./getrf_repro_oneapi

Device: Intel(R) Data Center GPU Max 1100
Driver: 1.6.31294+9
2
Handled oneapi::mkl::lapack::computation_error exception
Matrix A after LU decomposition:
2 0.5 
4 0 
Pivot indices:
2 2 
dev_info: 2
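The expected output can be checked by hand. Stored column-major, {1, 2, 2, 4} is the matrix whose second column is twice the first, so it is singular. The sketch below is an unblocked LU with partial pivoting that follows the classic LAPACK getf2 conventions (column-major storage, 1-based pivot indices, info = j for the first exactly-zero pivot U(j, j)); it is an illustrative host-side implementation, not the oneMKL or oneMath code path, and the function name lu_partial_pivot is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Unblocked LU with partial pivoting (getf2-style). Overwrites `a` with the
// packed L (unit diagonal, not stored) and U factors, fills 1-based `ipiv`,
// and returns info: 0 on success, j > 0 if U(j, j) is exactly zero.
std::int64_t lu_partial_pivot(std::int64_t m, std::int64_t n, std::vector<float>& a,
                              std::int64_t lda, std::vector<std::int64_t>& ipiv) {
    assert(lda >= m && static_cast<std::int64_t>(a.size()) >= lda * n);
    auto at = [&](std::int64_t i, std::int64_t j) -> float& { return a[j * lda + i]; };
    const std::int64_t k = std::min(m, n);
    ipiv.assign(static_cast<std::size_t>(k), 0);
    std::int64_t info = 0;
    for (std::int64_t j = 0; j < k; ++j) {
        // Pick the row with the largest |A(i, j)| for i >= j (first one on ties).
        std::int64_t p = j;
        for (std::int64_t i = j + 1; i < m; ++i)
            if (std::fabs(at(i, j)) > std::fabs(at(p, j))) p = i;
        ipiv[j] = p + 1;  // stored 1-based, like LAPACK
        if (at(p, j) != 0.0f) {
            if (p != j)
                for (std::int64_t c = 0; c < n; ++c) std::swap(at(j, c), at(p, c));
            for (std::int64_t i = j + 1; i < m; ++i) at(i, j) /= at(j, j);
        } else if (info == 0) {
            info = j + 1;  // first zero pivot; the factorization still completes
        }
        // Rank-1 update of the trailing submatrix.
        for (std::int64_t c = j + 1; c < n; ++c)
            for (std::int64_t i = j + 1; i < m; ++i)
                at(i, c) -= at(i, j) * at(j, c);
    }
    return info;
}
```

Calling it with a = {1, 2, 2, 4} and m = n = lda = 2 returns info = 2 and leaves ipiv = {2, 2} and a = {2, 0.5, 4, 0}. This matches the expected dev_info, pivot indices, and column-by-column printout from the v0.6 and oneMKL product builds.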
