[BLAS][MKLGPU] Trsv tests can fail on PVC #600

@Rbiessy

Description

Summary

The MKLGPU backend Trsv tests can fail on PVC with a GPU page fault.

Version

Tip of the develop branch at the time of reporting (commit 6923d40).

Environment

Running on PVC (GPU Max 1100 1.3) with the oneAPI Base Toolkit 2024.2.0 on Ubuntu 22.04.
Level Zero package versions installed via apt:

  • level-zero: 1.16.15-881~22.04
  • level-zero-dev: 1.16.15-881~22.04
  • intel-level-zero-gpu: 1.3.30049.10-950~22.04

Steps to reproduce

cmake -Bbuild-pvc -GNinja -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-pvc
ninja
ctest -R ".*Trsv.*" --output-on-failure
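
If the failure is intermittent, it can help to measure how often it reproduces. The following is a minimal POSIX shell sketch, assuming the driver prints the page-fault message format shown under Observed behavior; the function name count_fault_runs is hypothetical and not part of the project or its build.

```shell
# Hypothetical helper (not part of the project): run a command N times and
# report how many runs printed the GPU page-fault message.
count_fault_runs() {
  _runs=$1
  shift
  _faults=0
  _i=0
  while [ "$_i" -lt "$_runs" ]; do
    # Capture all output first so grep cannot cut the command short,
    # and ignore the command's exit status (a failing test is expected).
    _out=$("$@" 2>&1) || true
    if printf '%s\n' "$_out" | grep -q 'Unexpected page fault from GPU'; then
      _faults=$((_faults + 1))
    fi
    _i=$((_i + 1))
  done
  # Print "<runs with a fault>/<total runs>".
  echo "$_faults/$_runs"
}

# Example, from inside build-pvc:
# count_fault_runs 20 ctest -R ".*Trsv.*" --output-on-failure
```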

Observed behavior

Full log: log_pvc.txt
The tests are failing with:

FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
FATAL: Unexpected page fault from GPU at 0x7fa3dc0df000, ctx_id: 1 (CCS) type: 0 (NotPresent), level: 0 (PTE), access: 0 (Read), banned: 1, aborting.
Abort was called at 287 line in file:
./shared/source/os_interface/linux/drm_neo.cpp
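
To compare faults across runs or machines, a saved log can be reduced to the distinct fault addresses and access types. This is a sketch that assumes the exact message format of the FATAL lines above; summarize_faults is a hypothetical name.

```shell
# Hypothetical helper: tally distinct (fault address, access type) pairs in a
# log file, assuming the "Unexpected page fault from GPU at ..." format above.
summarize_faults() {
  # Keep only matching lines, rewritten as "<address> <access type>".
  sed -n -E 's/.*Unexpected page fault from GPU at (0x[0-9a-fA-F]+),.*access: [0-9]+ \(([A-Za-z]+)\).*/\1 \2/p' "$1" \
    | sort | uniq -c
}

# Example: summarize_faults log_pvc.txt
```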

Note: the DFT failures are reported in a separate issue, #601.

Expected behavior

The tests should pass.

Labels

BLAS domain, bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions