Labels: BLAS domain, bug
Description
Summary
The MKLGPU backend tests can fail with a GPU page fault when running the Trsv tests on PVC.
Version
Using the tip of develop as of today (6923d40).
Environment
Running on PVC (Intel Data Center GPU Max 1100 1.3) with the oneAPI Base Toolkit 2024.2.0. The OS is Ubuntu 22.04.
apt level-zero package versions:
- level-zero: 1.16.15-881~22.04
- level-zero-dev: 1.16.15-881~22.04
- intel-level-zero-gpu: 1.3.30049.10-950~22.04
Steps to reproduce
cmake -Bbuild-pvc -GNinja -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-pvc
ninja
ctest -R ".*Trsv.*" --output-on-failure
Observed behavior
Full log: log_pvc.txt
The tests are failing with:
FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
Abort was called at 287 line in file:
./shared/source/os_interface/linux/drm_neo.cpp
Note that the DFT failures are reported in a separate issue: #601
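For triage, it can help to check whether every fault in the full log hits the same address and access type. The following is a minimal sketch that assumes the fault message format is exactly as shown in the lines above. The regular expression and the helper names `parse_page_faults` and `unique_fault_addresses` are hypothetical and are not part of oneMKL or the GPU driver.

```python
import re

# Pattern for the fault message format seen in the observed output.
# Assumption: the driver prints the fields in this order and spelling.
FAULT_RE = re.compile(
    r"Unexpected page fault from GPU at (?P<addr>0x[0-9a-fA-F]+), "
    r"ctx_id: (?P<ctx_id>\d+) \((?P<engine>\w+)\) "
    r"type: \d+ \((?P<type>\w+)\), "
    r"level: \d+ \((?P<level>\w+)\), "
    r"access: \d+ \((?P<access>\w+)\)"
)


def parse_page_faults(log_text):
    """Return one dict of named fields per page-fault message in log_text."""
    return [m.groupdict() for m in FAULT_RE.finditer(log_text)]


def unique_fault_addresses(log_text):
    """Return the distinct faulting addresses, in order of first appearance."""
    seen = []
    for fault in parse_page_faults(log_text):
        if fault["addr"] not in seen:
            seen.append(fault["addr"])
    return seen
```

Running `unique_fault_addresses(open("log_pvc.txt").read())` on the attached log would show whether the failing Trsv tests all fault at one address or at several.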
Expected behavior
The tests should pass.