Tpetra: Make CrsMatrix::transferAndFillComplete do thread-parallel pack & unpack #802
Task breakdown, discussed today (13 Jul 2017) with @tjfulle, with questions for @csiefer2 and perhaps @jhux2 if he wishes to help:
NOTE: If we only care about transferAndFillComplete, then the target graph / matrix needs to be fill complete on return anyway. This means we don't actually need to solve (5) and (6). Instead, we can count the number of entries needed in each row, allocate the final fill-complete data structure (…
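A minimal sketch of that count-then-allocate idea in Kokkos (names like countsToRowOffsets are hypothetical, not Tpetra's actual code): count the entries each target row needs, prefix-sum the counts into row offsets, then allocate and fill the final arrays.

```cpp
#include <Kokkos_Core.hpp>

using device_view = Kokkos::View<size_t*>;

// Turn per-row entry counts into CRS row offsets via a prefix sum.
device_view countsToRowOffsets (const Kokkos::View<const size_t*>& counts)
{
  const int numRows = static_cast<int> (counts.extent (0));
  device_view offsets ("rowOffsets", numRows + 1);
  // Exclusive prefix sum: offsets(i) holds the sum of counts(0..i-1),
  // and offsets(numRows) holds the total number of entries to allocate.
  Kokkos::parallel_scan ("counts to offsets", numRows,
    KOKKOS_LAMBDA (const int i, size_t& update, const bool final) {
      if (final) {
        offsets(i) = update;
      }
      update += counts(i);
      if (final && i + 1 == numRows) {
        offsets(numRows) = update;
      }
    });
  return offsets;
}
```

With the offsets in hand, each row's entries can then be filled into the half-open range [offsets(row), offsets(row+1)) with no further coordination between rows.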
Does that answer your questions?
Thanks @csiefer2 for the quick reply! :-D
Yes, the target graph / matrix of the Import / Export.
Thanks for reminding us of that distinction -- it means that the target Crs{Graph,Matrix} doesn't even need to exist until TAFC returns it, so there is no such thing as "DynamicProfile" or "StaticProfile" for it. This implies that we can use whatever data structure we want for the target, since users only see the fill complete version of it.
Do you expect interface or pack format changes?
@trilinos/tpetra, @mhoemmen

Comments
--------
This commit is a combination of several commits that address several issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802. A summary of the changes follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the aforementioned packCrsMatrix procedure
- Modify PackTraits to run on threads by removing calls to TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1483,notpassed=0 (79.62 min)
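As a hypothetical illustration of the PackTraits change listed above (assumed names, not the actual Tpetra code): device code cannot call TEUCHOS_TEST_FOR_EXCEPTION, so a pack routine is marked KOKKOS_INLINE_FUNCTION and reports failure through a return code that the caller can reduce over, instead of throwing.

```cpp
#include <Kokkos_Core.hpp>
#include <cstring>

template <class T>
KOKKOS_INLINE_FUNCTION
int packValue (char outBuf[], const size_t bufSize, const T& inVal)
{
  if (bufSize < sizeof (T)) {
    return 1; // error code instead of an exception; the caller checks it
  }
  // Bitwise copy, as is typical for trivially copyable packed types.
  memcpy (outBuf, &inVal, sizeof (T));
  return 0;
}
```

A pack loop can then sum these return codes in a Kokkos::parallel_reduce and test the total on the host, preserving error reporting without throwing on the device.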
The packing portion of TAFC now uses …
Adding: #1569 addresses the packing portion.
…aits

@trilinos/tpetra, @mhoemmen

This single commit is a rebase of several commits that address the following issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

Summary
-------
- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the aforementioned packCrsMatrix procedure
- Modify PackTraits to run on threads by removing calls to TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix row to *not* unpack directly into the local CrsMatrix but to return the unpacked data. This required allocating enough scratch space into which data could be unpacked. We used Kokkos::UniqueToken to allocate the scratch space and to grab a unique (to each thread) subview of the scratch space.

Build/Test Case Summaries
-------------------------

Linux/SEMS, gcc 4.8.3, openmpi 1.8.7
------------------------------------
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1487,notpassed=0 (102.26 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1490,notpassed=0 (104.52 min)

CUDA, gcc 5.4, openmpi 1.10.2, cuda 8.0.44
------------------------------------------
Enabled Packages: TpetraCore
Disabled all Forward Packages
0) MPI_RELEASE_CUDA => passed: passed=124,notpassed=0
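A minimal sketch of the Kokkos::UniqueToken scratch-space pattern the commit message describes (assumed names and sizes; not the actual Tpetra code): preallocate one scratch row per concurrently executing thread, and have each thread claim a unique slice while it unpacks.

```cpp
#include <Kokkos_Core.hpp>

void unpackAllRows (const size_t numRows, const size_t maxRowNumEnt)
{
  using exec_space = Kokkos::DefaultExecutionSpace;
  Kokkos::Experimental::UniqueToken<exec_space> token;
  // token.size() bounds the number of concurrently active threads.
  Kokkos::View<double**> scratch ("unpack scratch", token.size (), maxRowNumEnt);

  Kokkos::parallel_for ("unpack rows",
    Kokkos::RangePolicy<exec_space> (0, numRows),
    KOKKOS_LAMBDA (const size_t row) {
      const auto id = token.acquire (); // id is unique among live threads
      auto myScratch = Kokkos::subview (scratch, id, Kokkos::ALL ());
      // ... unpack this row's bytes into myScratch, then combine the
      // unpacked data into the target matrix ...
      token.release (id);
    });
}
```

Because acquire() returns an id that no other live thread holds, each thread has exclusive use of its scratch subview until it calls release().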
Fixed in develop. See PR #1569.
@mhoemmen, it's not totally fixed. The packing portion is, but …
@tjfulle thanks for the clarification! If you like, you may either open up a new issue for the work left to do, or reopen this issue.
I've "completed" thread parallelizing I say "completed" because there are two parts of the algorithm that are not easily thread parallelizable and are still serial. These parts deal with local matrix rows that have contributions from multiple other processors. Thread parallelizing the current algorithm/data structures results in local row quantities being touched and updated by concurrent threads, leading to clashes and failures. I've got to think about whether a |
@tjfulle Awesome!!! :-D btw watch out for potential merge conflicts with my #1088 fix, coming in soon (possibly today). btw @tjfulle wrote:
Could you please clarify? As long as the target matrix's structure does not change, it sounds like you could just use atomic updates to resolve these thread conflicts.
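For instance, a minimal sketch of such an atomic update with Kokkos (hypothetical names; not Tpetra's actual code): if the target structure is fixed, concurrent contributions to the same entry can be combined with an atomic add rather than serialized.

```cpp
#include <Kokkos_Core.hpp>

KOKKOS_INLINE_FUNCTION
void combineIntoEntry (const Kokkos::View<double*>& values,
                       const size_t offset,
                       const double contribution)
{
  // Safe even when several threads update the same matrix entry.
  Kokkos::atomic_add (&values(offset), contribution);
}
```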
@mhoemmen wrote:
Perhaps I can clarify, but it would probably be easier to show you. Let me try. The difficulty is that data for a row is packed sequentially as …
After unpacking …

Did that make any sense?
@mhoemmen wrote:
Now you tell me! I'm running into other …
Could you send me a meeting invite, say for tomorrow afternoon? That might be easier. Thanks!
@mhoemmen, what, that didn't make sense? Meeting invite sent.
Just got my …
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it was previously one large monolithic function). Each of the small functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
94% tests passed, 16 tests failed out of 257

Label Time Summary:
MueLu   = 1690.79 sec (69 tests)
Stokhos =  496.32 sec (63 tests)
Tpetra  =  404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)
240 - Stokhos_KokkosCrsMatrixUQPCEUnitTest_Cuda_MPI_1 (Failed)
242 - Stokhos_TpetraCrsMatrixUQPCEUnitTest_Cuda_MPI_4 (Failed)

According to @mhoemmen, the Stokhos failures are known failures. All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it was previously one large monolithic function). Each of the small functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
94% tests passed, 14 tests failed out of 257

Label Time Summary:
MueLu   = 1690.79 sec (69 tests)
Stokhos =  496.32 sec (63 tests)
Tpetra  =  404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed

These tests can be ignored, see trilinos#1699
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it was previously one large monolithic function). Each of the small functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Tests were run on two different machines and the results amended to this commit:

Build/Test Cases Summary [RHEL6, standard checkin script]
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [ride.sandia.gov, CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
0) MPI_RELEASE_SHARED_CUDA => passed=233,notpassed=14 (8.76 min)

The 14 failing tests are unrelated MueLu tests that can be ignored, see trilinos#1699. The failing Stokhos tests mentioned in trilinos#1655 were fixed with commit e97e37b.
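A hedged sketch of the Kokkos::atomic_fetch_add pattern the commit messages mention (hypothetical names, not the actual Tpetra code): when several threads insert entries into the same local row, each atomically bumps that row's fill counter and thereby claims a private slot, so no two threads write the same position.

```cpp
#include <Kokkos_Core.hpp>

KOKKOS_INLINE_FUNCTION
size_t claimSlotInRow (const Kokkos::View<size_t*>& rowFillCount,
                       const Kokkos::View<const size_t*>& rowOffsets,
                       const size_t lclRow)
{
  // atomic_fetch_add returns the old counter value, which becomes this
  // thread's private offset within the row.
  const size_t k = Kokkos::atomic_fetch_add (&rowFillCount(lclRow), size_t (1));
  return rowOffsets(lclRow) + k;
}
```

The returned index can then be written without further synchronization, which is exactly how per-row clashes between concurrent contributors get resolved.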
@mhoemmen, can this issue be closed? Or is there more to be done?
@tjfulle It's done -- thanks! :-D Great work btw!
@trilinos/tpetra
"Superstory": #797
Tpetra::CrsMatrix::transferAndFillComplete implements a specialized pack and unpack for CrsMatrix. Tpetra's sparse matrix-matrix multiply uses this.
Try to share as much code with #800 as possible. See, e.g., packRow in Trilinos/packages/tpetra/core/src/Tpetra_Import_Util2.hpp. It would make sense to adapt PackTraits methods for use inside Kokkos::parallel_*. That would call for changes to Stokhos and perhaps also Sacado.
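A rough sketch, under assumed names (packRowDevice, rowOffsets, and exports are illustrative, not the actual Tpetra API), of what device-callable PackTraits methods would enable: packing every row of the source matrix in a single Kokkos::parallel_for.

```cpp
#include <Kokkos_Core.hpp>

template <class LocalMatrix, class PackRowFunc>
void packAllRows (const LocalMatrix& lclMatrix,
                  const Kokkos::View<char*>& exports,
                  const Kokkos::View<const size_t*>& rowOffsets,
                  PackRowFunc packRowDevice)
{
  const size_t numRows = rowOffsets.extent (0) - 1;
  Kokkos::parallel_for ("pack all rows (sketch)",
    Kokkos::RangePolicy<> (0, numRows),
    KOKKOS_LAMBDA (const size_t row) {
      // Each row writes into its own disjoint byte range of 'exports',
      // so all rows can be packed concurrently without synchronization.
      packRowDevice (lclMatrix, exports, rowOffsets(row), row);
    });
}
```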