
Tpetra: Make CrsMatrix::transferAndFillComplete do thread-parallel pack & unpack #802

Closed
mhoemmen opened this issue Nov 9, 2016 · 17 comments
Labels: pkg: Tpetra, story (the issue corresponds to a Kanban Story, vs. an Epic or Task)

@mhoemmen (Contributor) commented Nov 9, 2016

@trilinos/tpetra
"Superstory": #797

Tpetra::CrsMatrix::transferAndFillComplete implements a specialized pack and unpack for CrsMatrix. Tpetra's sparse matrix-matrix multiply uses this.

Try to share as much code with #800 as possible. See e.g., packRow in Trilinos/packages/tpetra/core/src/Tpetra_Import_Util2.hpp. It would make sense to adapt PackTraits methods for use inside Kokkos::parallel_*. That would call for changes to Stokhos and perhaps also Sacado.
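
For orientation, here is a minimal sketch of the kind of change this asks for: a pack helper that is safe to call inside Kokkos::parallel_* because it is marked KOKKOS_INLINE_FUNCTION and throws no exceptions. The packValue helper, the fixed-width rows, and the buffer layout are invented for this sketch; they are not Tpetra's actual PackTraits interface.

```c++
// Hypothetical sketch only -- not Tpetra's actual PackTraits interface.
// The point: a pack helper marked KOKKOS_INLINE_FUNCTION, with no
// TEUCHOS_TEST_FOR_EXCEPTION calls, can be called from inside
// Kokkos::parallel_for so that each thread packs one row.
#include <Kokkos_Core.hpp>

template <class T>
KOKKOS_INLINE_FUNCTION
size_t packValue (char outBuf[], const T& inVal)
{
  // Byte-wise copy, so the helper works in host and device code alike.
  const char* src = reinterpret_cast<const char*> (&inVal);
  for (size_t k = 0; k < sizeof (T); ++k) {
    outBuf[k] = src[k];
  }
  return sizeof (T);
}

int main (int argc, char* argv[])
{
  Kokkos::initialize (argc, argv);
  {
    const int numRows = 100;
    const int entPerRow = 5; // fixed row length keeps the sketch short
    Kokkos::View<double**> vals ("vals", numRows, entPerRow);
    Kokkos::View<char*> exports ("exports",
                                 size_t (numRows) * entPerRow * sizeof (double));

    // One row per thread; rows pack into disjoint slices of the buffer,
    // so no synchronization is needed.
    Kokkos::parallel_for ("packRows", numRows, KOKKOS_LAMBDA (const int row) {
      size_t offset = size_t (row) * entPerRow * sizeof (double);
      for (int k = 0; k < entPerRow; ++k) {
        offset += packValue (&exports (offset), vals (row, k));
      }
    });
  }
  Kokkos::finalize ();
  return 0;
}
```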

@mhoemmen (Contributor, Author) commented Jul 13, 2017

Task breakdown, discussed today (13 Jul 2017) with @tjfulle, with questions for @csiefer2 and perhaps @jhux2 if he wishes to help:

  1. Does MueLu only need #802 (transferAndFillComplete) for sparse matrix-matrix multiply, or will it also need ordinary Import / Export?
  2. Does MueLu need the target graph / matrix to be DynamicProfile, StaticProfile, or fixed structure? That is, does the target graph / matrix need to be able to change its structure, based on incoming data from the source graph / matrix? And if so, is the target matrix StaticProfile or DynamicProfile?
  3. Thread-parallelize pack and unpack with PIDs, not counting graph structure changes
  4. If structure changes are needed (see (2) above), get rid of the state in CrsGraph that blocks thread safety of insert{Local,Global}*, namely nodeNumEntries_ and nodeNumAllocated_. Instead, compute them on demand.
  5. If needed (see (2) above), make StaticProfile structure changes thread safe and scalable.
  6. If needed (see (2) above), make DynamicProfile structure changes thread safe and scalable.

NOTE: If we only care about transferAndFillComplete, then the target graph / matrix needs to be fill complete on return anyway. This means we don't actually need to solve (5) and (6). Instead, we can count the number of entries needed in each row, allocate the final fill-complete data structure (KokkosSparse::CrsMatrix), and copy and unpack into that.
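
A rough sketch of that count / allocate / copy pattern, using plain Kokkos Views (the per-row counts and the filled values below are placeholders; in TAFC they would come from the packed source data, and the resulting rowPtr / colInd / values arrays are what would back the fill-complete local matrix):

```c++
// Sketch of "count, allocate once, then fill" using plain Kokkos Views.
// The per-row counts and the filled values are placeholders; in TAFC they
// would come from the packed source data.
#include <Kokkos_Core.hpp>

int main (int argc, char* argv[])
{
  Kokkos::initialize (argc, argv);
  {
    using offset_type = size_t;
    const int numRows = 1000;

    // 1. Count the number of entries needed in each target row.
    Kokkos::View<offset_type*> counts ("counts", numRows);
    Kokkos::parallel_for ("countRowEntries", numRows,
                          KOKKOS_LAMBDA (const int row) {
      counts (row) = (row % 3) + 1; // stand-in for the real count
    });

    // 2. An exclusive scan of the counts gives the CRS row offsets.
    Kokkos::View<offset_type*> rowPtr ("rowPtr", numRows + 1);
    Kokkos::parallel_scan ("computeRowOffsets", numRows,
                           KOKKOS_LAMBDA (const int row, offset_type& update,
                                          const bool finalPass) {
      if (finalPass) {
        rowPtr (row) = update;
      }
      update += counts (row);
      if (finalPass && row + 1 == numRows) {
        rowPtr (numRows) = update; // total number of entries
      }
    });

    // 3. Allocate the final arrays once, at exactly the right size ...
    offset_type totalNumEnt = 0;
    Kokkos::deep_copy (totalNumEnt, Kokkos::subview (rowPtr, numRows));
    Kokkos::View<int*>    colInd ("colInd", totalNumEnt);
    Kokkos::View<double*> values ("values", totalNumEnt);

    // 4. ... then copy / unpack straight into them, one row per thread.
    Kokkos::parallel_for ("fillRows", numRows, KOKKOS_LAMBDA (const int row) {
      for (offset_type k = rowPtr (row); k < rowPtr (row + 1); ++k) {
        colInd (k) = static_cast<int> (k - rowPtr (row)); // placeholder column
        values (k) = 1.0;                                 // placeholder value
      }
    });
    // rowPtr, colInd, and values are exactly the arrays that back a
    // fill-complete local sparse matrix.
  }
  Kokkos::finalize ();
  return 0;
}
```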

@csiefer2 (Member)

  1. TAFC only.
  2. I presume you mean the target of the Import? TAFC assumes that the output matrix will be generated from scratch and fillCompleted on output (aka fixed structure).
  3. This task will collide w/ the reverse comm work by @DrBooom at some point. Coordination will be needed.

Does that answer your questions?

@mhoemmen (Contributor, Author)

Thanks @csiefer2 for the quick reply! :-D

I presume you mean the target of the Import?

Yes, the target graph / matrix of the Import / Export.

TAFC assumes that the output matrix will be generated from scratch and fillCompleted on output (aka fixed structure).

Thanks for reminding us of that distinction -- it means that the target Crs{Graph,Matrix} doesn't even need to exist until TAFC returns it, so there is no such thing as "DynamicProfile" or "StaticProfile" for it. This implies that we can use whatever data structure we want for the target, since users only see the fill complete version of it.

This task will collide w/ the reverse comm work by @DrBooom at some point. Coordination will be needed.

Do you expect interface or pack format changes?

tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 2, 2017
@trilinos/tpetra, @mhoemmen

Comments
--------
This commit is a combination of several commits that address several
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

A summary of the changes is as follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the aforementioned
  packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to TEUCHOS_TEST_FOR_EXCEPTION
  and decorating device code with KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1483,notpassed=0 (79.62 min)
@tjfulle (Contributor) commented Aug 3, 2017

The packing portion of TAFC now uses Tpetra::Details::packCrsMatrix and friends, which are thread parallel. It remains to thread parallelize unpackAndCombineCrsArrays, repurposing as much of Tpetra::Details::unpackCrsMatrixAndCombine as possible.

@tjfulle (Contributor) commented Aug 3, 2017

Adding: #1569 addresses the packing portion.

tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 16, 2017
…aits

@trilinos/tpetra, @mhoemmen

Comments
--------
This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

A summary of the changes is as follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the aforementioned
  packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to TEUCHOS_TEST_FOR_EXCEPTION
  and decorating device code with KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix row to *not* unpack directly into the local CrsMatrix
  but to return the unpacked data.  This required allocating enough scratch
  space into which data could be unpacked.  We used Kokkos::UniqueToken to
  allocate the scratch space and to grab a unique (to each thread) subview of
  the scratch space (see the sketch after this commit message).
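
For readers unfamiliar with the last bullet's technique, here is a rough sketch of per-thread scratch space via Kokkos::Experimental::UniqueToken. The scratch width and the per-row "unpack" work are invented for illustration; this is not the actual unpack code.

```c++
// Rough sketch of per-thread scratch space via Kokkos::Experimental::UniqueToken.
// The scratch width and the per-row "unpack" work are placeholders.
#include <Kokkos_Core.hpp>

int main (int argc, char* argv[])
{
  Kokkos::initialize (argc, argv);
  {
    using exec_space = Kokkos::DefaultExecutionSpace;
    const int numRows = 10000;
    const int scratchPerThread = 64; // bytes of scratch each thread may use

    // One scratch slot per concurrently executing thread, not per row.
    Kokkos::Experimental::UniqueToken<exec_space> token;
    Kokkos::View<char**> scratch ("scratch", token.size (), scratchPerThread);

    Kokkos::parallel_for ("unpackRows",
                          Kokkos::RangePolicy<exec_space> (0, numRows),
                          KOKKOS_LAMBDA (const int row) {
      // Acquire a slot that no other currently running thread holds ...
      const auto id = token.acquire ();
      auto myScratch = Kokkos::subview (scratch, id, Kokkos::ALL ());

      // ... unpack this row's data into the private scratch view ...
      for (int k = 0; k < scratchPerThread; ++k) {
        myScratch (k) = static_cast<char> (row % 127);
      }

      // ... and give the slot back so a later row's thread can reuse it.
      token.release (id);
    });
  }
  Kokkos::finalize ();
  return 0;
}
```
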
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 16, 2017
…aits

@trilinos/tpetra, @mhoemmen

This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

A summary of the changes is as follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the aforementioned
  packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to TEUCHOS_TEST_FOR_EXCEPTION
  and decorating device code with KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix row to *not* unpack directly into the local CrsMatrix
  but to return the unpacked data.  This required allocating enough scratch
  space into which data could be unpacked.  We used Kokkos::UniqueToken to
  allocate the scratch space and to grab a unique (to each thread) subview of
  the scratch space.

Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages

0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1487,notpassed=0 (102.26 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1490,notpassed=0 (104.52 min)

Enabled Packages: TpetraCore
Disabled all Forward Packages

0) MPI_RELEASE_CUDA => passed: passed=124,notpassed=0
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 16, 2017
…aits

@trilinos/tpetra, @mhoemmen

This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

Summary
-------

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the aforementioned
  packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to TEUCHOS_TEST_FOR_EXCEPTION
  and decorating device code with KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix row to *not* unpack directly into the local CrsMatrix
  but to return the unpacked data.  This required allocating enough scratch
  space into which data could be unpacked.  We used Kokkos::UniqueToken to
  allocate the scratch space and to grab a unique (to each thread) subview of
  the scratch space.

Build/Test Case Summaries
-------------------------

Linux/SEMS, gcc 4.8.3, openmpi 1.8.7
------------------------------------

Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages

0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1487,notpassed=0 (102.26 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1490,notpassed=0 (104.52 min)

CUDA, gcc 5.4, openmpi 1.10.2, cuda 8.0.44
------------------------------------------

Enabled Packages: TpetraCore
Disabled all Forward Packages

0) MPI_RELEASE_CUDA => passed: passed=124,notpassed=0
mhoemmen pushed a commit to mhoemmen/Trilinos that referenced this issue Aug 21, 2017
@mhoemmen (Contributor, Author)

Fixed in develop. See PR #1569.

@tjfulle (Contributor) commented Aug 21, 2017

@mhoemmen, it's not totally fixed. The packing portion is, but unpackAndCombineIntoCrsArrays still needs to be thread parallelized. This part, however, is a much smaller effort because of the work on #798 and #800 (as part of #1569) to make a common pack/unpack interface. I plan to finish it this week.

@mhoemmen (Contributor, Author)

@tjfulle thanks for the clarification! If you like, you may either open up a new issue for the work left to do, or reopen this issue.

tjfulle reopened this Aug 21, 2017
@tjfulle (Contributor) commented Aug 30, 2017

I've "completed" thread parallelizing unpackAndCombineIntoCrsArrays. Tests pass on my blade with/without OpenMP node and OMP_NUM_THREADS>1. I am building and CUDA now (ride had a system upgrade that wiped my home directory, so I've got to start from scratch). When the build is done and tested I'll open a new PR.

I say "completed" because there are two parts of the algorithm that are not easily thread parallelizable and are still serial. These parts deal with local matrix rows that have contributions from multiple other processors. Thread parallelizing the current algorithm/data structures results in local row quantities being touched and updated by concurrent threads, leading to clashes and failures. I've got to think about whether a parallel_scan might be possible instead of a parallel_for. The other alternatives are to leave as is (the offending parts not thread parallel) or redo some of the data structures to avoid collisions.

@mhoemmen (Contributor, Author)

@tjfulle Awesome!!! :-D btw watch out for potential merge conflicts with my #1088 fix, coming in soon (possibly today).

btw ride saved home directories in /home_old, if I remember right, so you might not have to start over completely from scratch :-) .

@tjfulle wrote:

These parts deal with local matrix rows that have contributions from multiple other processors.

Could you please clarify? As long as the target matrix's structure does not change, then it sounds like you could just use atomic updates to resolve these thread conflicts.

@tjfulle (Contributor) commented Aug 30, 2017

@mhoemmen wrote:

Could you please clarify? As long as the target matrix's structure does not change, then it sounds like you could just use atomic updates to resolve these thread conflicts.

Perhaps I can clarify, but it would probably be easier to show you. Let me try. The difficulty is that data for a row is packed sequentially as

[data for row N from proc P0][data for row N from proc P1]...[data for row N from proc PN]

After unpacking [data for row N from proc P0] an offset counter is incremented so that on the next pass [data for row N from proc P1] is unpacked. The current offsets for row N are stored in an array and can probably be updated atomically, but I can't say for certain yet.

Did that make any sense?
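
For context, the atomic-update idea discussed here boils down to the pattern below: each packed chunk (data for row N from one source process) atomically claims the next free positions in its target row via Kokkos::atomic_fetch_add, so chunks for the same row can be unpacked concurrently without clobbering each other. All names and sizes in this toy sketch are invented; it is not the actual unpack code.

```c++
// Toy sketch of resolving the per-row offset conflict with
// Kokkos::atomic_fetch_add.  All names and sizes are invented.
#include <Kokkos_Core.hpp>

int main (int argc, char* argv[])
{
  Kokkos::initialize (argc, argv);
  {
    const int numRows      = 100;
    const int numChunks    = 400;  // several chunks may target the same row
    const int entPerChunk  = 2;
    const int chunksPerRow = numChunks / numRows;

    Kokkos::View<size_t*> rowPtr ("rowPtr", numRows + 1);  // CRS row offsets
    Kokkos::View<size_t*> rowFill ("rowFill", numRows);    // entries placed so far
    Kokkos::View<int*>    targetRow ("targetRow", numChunks);
    Kokkos::View<double*> values ("values", size_t (numChunks) * entPerChunk);

    // Synthetic setup: fixed-size rows and a round-robin chunk-to-row map.
    Kokkos::parallel_for ("buildRowPtr", numRows + 1, KOKKOS_LAMBDA (const int i) {
      rowPtr (i) = size_t (i) * chunksPerRow * entPerChunk;
    });
    Kokkos::parallel_for ("mapChunks", numChunks, KOKKOS_LAMBDA (const int c) {
      targetRow (c) = c % numRows;
    });

    // Unpack all chunks in parallel.  Chunks sharing a row contend only on
    // rowFill(row); atomic_fetch_add hands each chunk a disjoint range.
    Kokkos::parallel_for ("unpackChunks", numChunks, KOKKOS_LAMBDA (const int c) {
      const int row = targetRow (c);
      const size_t pos =
        Kokkos::atomic_fetch_add (&rowFill (row), size_t (entPerChunk));
      for (int k = 0; k < entPerChunk; ++k) {
        values (rowPtr (row) + pos + k) = 1.0; // stand-in for unpacked data
      }
    });
  }
  Kokkos::finalize ();
  return 0;
}
```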

@tjfulle (Contributor) commented Aug 30, 2017

@mhoemmen wrote:

btw ride saved home directories in /home_old, if I remember right, so you might not have to start over completely from scratch :-) .

Now you tell me! I'm running into other CUDA issues now anyway...

@mhoemmen (Contributor, Author)

Perhaps I can clarify, but it would probably be easier to show you.

Could you send me a meeting invite, say for tomorrow afternoon? That might be easier. Thanks!

@tjfulle (Contributor) commented Aug 30, 2017

@mhoemmen, what, that didn't make sense?

Meeting invite sent

@tjfulle (Contributor) commented Aug 31, 2017

Just got my CUDA issues resolved. The scorecard: all standard CI tests pass on my blade with and without OpenMP, and all Tpetra tests pass on CUDA. Two more small sections remain to be thread parallelized (referenced earlier). After discussing them with @mhoemmen, I'll get them fixed and tested and open a PR.

tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 6, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large monolithic function).  Each of the small functions
  was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
94% tests passed, 16 tests failed out of 257

Label Time Summary:
MueLu      = 1690.79 sec (69 tests)
Stokhos    = 496.32 sec (63 tests)
Tpetra     = 404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)
240 - Stokhos_KokkosCrsMatrixUQPCEUnitTest_Cuda_MPI_1 (Failed)
242 - Stokhos_TpetraCrsMatrixUQPCEUnitTest_Cuda_MPI_4 (Failed)

According to @mhoemmen, the Stokhos failures are known failures.

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 7, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large monolithic function).  Each of the small functions
  was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
94% tests passed, 14 tests failed out of 257

Label Time Summary:
MueLu      = 1690.79 sec (69 tests)
Stokhos    = 496.32 sec (63 tests)
Tpetra     = 404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed

These tests can be ignored, see trilinos#1699
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 8, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large monolithic function).  Each of the small functions
  was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed

These tests can be ignored, see trilinos#1699
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 8, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large monolithic function).  Each of the small functions
  was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Tests were run on two different machines and their results amended to this
commit:

Build/Test Cases Summary [RHEL6, standard checkin script]
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [ride.sandia.gov, CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
0) MPI_RELEASE_SHARED_CUDA => passed=233,notpassed=14 (8.76 min)

The 14 failing tests are unrelated MueLu tests that can be ignored, see trilinos#1699

The failing Stokhos tests mentioned in trilinos#1655 were fixed with
commit e97e37b
mhoemmen pushed a commit that referenced this issue Sep 8, 2017
@tjfulle (Contributor) commented Sep 22, 2017

@mhoemmen, can this issue be closed? Or, is there more to be done?

@mhoemmen (Contributor, Author)

@tjfulle It's done -- thanks! :-D Great work btw!
