MueLu: Test failures on CUDA #1699
jhux2 added the labels "pkg: MueLu" and "impacting: tests" (the defect is primarily a test failure rather than a build failure) on Sep 6, 2017.
@tjfulle These have been failing for some time now. If they are blocking your check-ins or testing, it is safe to disable or ignore them.
Thanks @jhux2! I figured the failures were not due to my work, but thought I'd post them to be sure.
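For reference, one way to skip these tests in a local CTest run (assuming the usual CMake/CTest build layout) is to exclude them by name with a regular expression, for example `ctest -E "Navier2DBlocked"` run from the build directory; that pattern matches all fourteen of the failing tests listed in the commit message below.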
tjfulle added a commit to tjfulle/Trilinos that referenced this issue on Sep 7, 2017:
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays was broken up into many smaller functions (it was previously one large monolithic function). Each of the small functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos, Claps, TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506, notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509, notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra, MueLu, Stokhos
94% tests passed, 14 tests failed out of 257

Label Time Summary:
MueLu   = 1690.79 sec (69 tests)
Stokhos = 496.32 sec (63 tests)
Tpetra  = 404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)

All of the MueLu tests failed with the following error:
MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed
These tests can be ignored, see trilinos#1699.
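The commit message above describes resolving race conditions with Kokkos::atomic_fetch_add inside kernels that were refactored to be thread parallel. The following is a minimal sketch of that pattern, not the actual Tpetra code; the view names (rowOfEntry, rowCounts) and sizes are illustrative assumptions.

```cpp
// Sketch: counting entries per row in parallel.  Many entries map to the same
// row, so a plain rowCounts(row)++ would be a race; Kokkos::atomic_fetch_add
// makes the read-modify-write atomic on host and device backends alike.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int numEntries = 1000;
    const int numRows    = 10;

    // Hypothetical "unpacked" entry-to-row map and per-row counters.
    Kokkos::View<int*> rowOfEntry("rowOfEntry", numEntries);
    Kokkos::View<int*> rowCounts("rowCounts", numRows);

    // Fill the entry-to-row map (views are zero-initialized by default).
    Kokkos::parallel_for("fill", numEntries, KOKKOS_LAMBDA(const int k) {
      rowOfEntry(k) = k % numRows;
    });

    // Count entries per row; the atomic prevents lost updates when several
    // threads increment the same counter concurrently.
    Kokkos::parallel_for("count", numEntries, KOKKOS_LAMBDA(const int k) {
      const int row = rowOfEntry(k);
      Kokkos::atomic_fetch_add(&rowCounts(row), 1);
    });
  }
  Kokkos::finalize();
  return 0;
}
```

Without the atomic, two threads that hit the same row can both read the old count and write back the same incremented value, losing an update; atomic_fetch_add performs the increment as a single atomic operation.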
I can reproduce the problem on geminga.

Ok, found the problem and fixed it. Will push it tomorrow.
tjfulle added a commit to tjfulle/Trilinos that referenced this issue on Sep 8, 2017:
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays was broken up into many smaller functions (it was previously one large monolithic function). Each of the small functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos, Claps, TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506, notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509, notpassed=0 (13.75 min)

All of the MueLu tests failed with the following error:
MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed
These tests can be ignored, see trilinos#1699.
tjfulle added a commit to tjfulle/Trilinos that referenced this issue on Sep 8, 2017:
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays was broken up into many smaller functions (it was previously one large monolithic function). Each of the small functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Tests were run on two different machines and their results amended to this commit:

Build/Test Cases Summary [RHEL6, standard checkin script]
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos, Claps, TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506, notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509, notpassed=0 (13.75 min)

Build/Test Cases Summary [ride.sandia.gov, CUDA]
Enabled Packages: Tpetra, MueLu, Stokhos
0) MPI_RELEASE_SHARED_CUDA => passed=233, notpassed=14 (8.76 min)

The 14 failing tests are unrelated MueLu tests that can be ignored, see trilinos#1699. The failing Stokhos tests mentioned in trilinos#1655 were fixed with commit e97e37b.
mhoemmen pushed a commit that referenced this issue on Sep 8, 2017, with the same commit message and test summary as above.
tawiesn added a commit that referenced this issue on Sep 8, 2017:
…t,serial)

This fixes issue #1699.

Build/Test Cases Summary
Enabled Packages: MueLu
Disabled Packages: PyTrilinos, Claps, TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=446, notpassed=0 (146.33 min)
Closing, as the code base has changed significantly.
Original issue description:

@trilinos/muelu The following tests fail on CUDA (ride.sandia.gov, gcc 5.4, openmpi 1.10.4, cuda 8.0.44). All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed
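That error is the usual symptom of a failed downcast: the matrix handed to MueLu::EpetraOperator is held through the Xpetra::CrsMatrix base interface, and its concrete type is not Xpetra::EpetraCrsMatrix, so the cast yields null. A purely illustrative sketch of the pattern follows; the class names are hypothetical stand-ins, not the real Xpetra types, and the "Tpetra-backed" concrete type is an assumption for illustration.

```cpp
// Illustrative only: why a downcast on a base-class handle can fail.
#include <iostream>
#include <memory>

struct CrsMatrixBase { virtual ~CrsMatrixBase() = default; };  // stand-in for Xpetra::CrsMatrix
struct EpetraCrsMatrix : CrsMatrixBase {};                      // stand-in for Xpetra::EpetraCrsMatrix
struct TpetraCrsMatrix : CrsMatrixBase {};                      // hypothetical Tpetra-backed matrix

int main() {
  // Suppose the build hands back a matrix that is not Epetra-backed...
  std::shared_ptr<CrsMatrixBase> A = std::make_shared<TpetraCrsMatrix>();

  // ...then the downcast returns null, and the caller reports a failed cast,
  // analogous to "Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed".
  auto epA = std::dynamic_pointer_cast<EpetraCrsMatrix>(A);
  if (!epA) {
    std::cout << "cast failed: matrix is not an EpetraCrsMatrix\n";
  }
  return 0;
}
```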