Tpetra: Make CrsMatrix do thread-parallel pack & unpack #800

Closed · 3 tasks
mhoemmen opened this issue Nov 9, 2016 · 5 comments

mhoemmen commented Nov 9, 2016

@trilinos/tpetra
Story: #797

See notes on #802. Try to share as much code with that solution as possible. For example, it would make sense to have a single pack function, and let callers decide whether they want to pack PIDs.
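As a purely hypothetical sketch of that shared-pack idea (none of these
names or signatures are the actual Tpetra interface), one entry point
could treat an empty PIDs view as "don't pack PIDs":

    #include <Kokkos_Core.hpp>

    // Hypothetical sketch: a single pack routine; callers opt in to
    // packing owning-process ranks (PIDs) via a nonempty sourcePids view.
    template <class LocalMatrix>
    void packCrsMatrix(const LocalMatrix& lclMatrix,
                       const Kokkos::View<char*>& exports,
                       const Kokkos::View<const int*>& exportLIDs,
                       const Kokkos::View<const int*>& sourcePids)
    {
      if (sourcePids.extent(0) == 0) {
        // ... pack each exported row's column indices and values only ...
      }
      else {
        // ... additionally pack the owning PID alongside each entry ...
      }
      (void) lclMatrix; (void) exports; (void) exportLIDs; // bodies elided
    }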

Steps:

It's much harder to do this for a dynamic graph, so we can skip that for now.

Thread parallelization of unpack should be over rows, so we should not need atomic updates when updating values in the matrix.

For the host-only thread parallelization of unpack, that's a single parallel_scan over local (row) indices to get offsets from byte counts of the unpack buffer. In the 'final' pass of the scan, actually unpack the data.
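A minimal Kokkos sketch of that pattern, assuming hypothetical flat views
for the receive buffer and the per-row byte counts (the real Tpetra
routine differs in detail):

    #include <Kokkos_Core.hpp>

    // One parallel_scan turns per-row byte counts into offsets; the
    // 'final' pass unpacks.  Threads own disjoint rows, so no atomics.
    void unpackAllRows(const Kokkos::View<const char*>& imports,
                       const Kokkos::View<const size_t*>& numBytesPerRow)
    {
      using host_exec = Kokkos::DefaultHostExecutionSpace;
      const size_t numRows = numBytesPerRow.extent(0);
      Kokkos::parallel_scan("unpack-rows",
        Kokkos::RangePolicy<host_exec>(0, numRows),
        KOKKOS_LAMBDA(const size_t row, size_t& offset, const bool finalPass) {
          if (finalPass) {
            // offset is the exclusive prefix sum: where this row's
            // bytes begin.  Unpack numBytesPerRow(row) bytes starting
            // at imports(offset) into local row 'row' of the matrix.
          }
          offset += numBytesPerRow(row);
        });
      (void) imports; // sketch: the byte-level unpacking itself is elided
    }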

For pack, we first have to change packRow so that it goes directly to the KokkosSparse::CrsMatrix if that exists, rather than going through the "generic" getLocalRowView / getGlobalRowView interfaces that return Teuchos::ArrayView instances.
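For illustration, packRowDirect below is a hypothetical stand-in for that
direct route; rowConst, length, colidx, and value are the KokkosSparse
row-view accessors, while the Packer callback is invented for the sketch:

    #include <KokkosSparse_CrsMatrix.hpp>

    // Pack one row by reading the KokkosSparse::CrsMatrix directly,
    // instead of via getLocalRowView / Teuchos::ArrayView (which is not
    // thread safe in debug mode and unusable on a GPU).
    template <class LocalMatrix, class Packer>
    KOKKOS_INLINE_FUNCTION
    void packRowDirect(const LocalMatrix& lclMatrix,
                       const typename LocalMatrix::ordinal_type lclRow,
                       const Packer& pack)
    {
      auto rowView = lclMatrix.rowConst(lclRow);
      for (typename LocalMatrix::ordinal_type k = 0; k < rowView.length; ++k) {
        pack(rowView.colidx(k), rowView.value(k));
      }
    }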

@mhoemmen mhoemmen self-assigned this Nov 9, 2016
@mhoemmen mhoemmen added this to the Tpetra-FY17-Q4 milestone Nov 9, 2016
mhoemmen pushed a commit that referenced this issue Mar 23, 2017
@trilinos/tpetra The versions of replaceGlobalValues and
sumIntoGlobalValues that take "raw" input arrays used to convert them
to Teuchos::ArrayView and then call the versions of those methods that
take Teuchos::ArrayView.  The latter in turn would convert _back_ to
raw arrays and then to Kokkos::View.  This commit removes the
intermediate step of conversion to Teuchos::ArrayView.

This is a small step towards #800.  Thread-parallelizing CrsMatrix
pack and unpack must begin with basic thread safety.  Since
Teuchos::ArrayView is not thread-safe in debug mode (see #229), and
since it will never work on a GPU, the first step is to stop using
Teuchos::ArrayView in pack and unpack.  Instead, we must use pointers
/ Kokkos::View objects all the way through.
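To illustrate the end state this commit works toward, here is a sketch of
wrapping caller-provided raw arrays in unmanaged Kokkos::Views (the
free-function signature is illustrative, not the actual method):

    #include <Kokkos_Core.hpp>

    // Raw input arrays wrapped directly in unmanaged host Views: no
    // copies, no Teuchos::ArrayView, safe to use from threaded code.
    void sumIntoValuesRaw(const long long gblColInds[],
                          const double vals[],
                          const int numEnt)
    {
      Kokkos::View<const long long*, Kokkos::HostSpace,
        Kokkos::MemoryTraits<Kokkos::Unmanaged>> inds(gblColInds, numEnt);
      Kokkos::View<const double*, Kokkos::HostSpace,
        Kokkos::MemoryTraits<Kokkos::Unmanaged>> values(vals, numEnt);
      // ... hand inds and values straight to a Kokkos-based combine kernel ...
      (void) inds; (void) values; // sketch: kernel call elided
    }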
mhoemmen pushed a commit that referenced this issue Mar 23, 2017
@trilinos/tpetra Add a new nonpublic method, combineGlobalValuesRaw,
to Tpetra::CrsMatrix.  This method bypasses Teuchos::ArrayView, and is
thus thread safe (see also #229) under the following conditions:

  1. The matrix has a static graph
  2. The CombineMode argument is ADD or REPLACE

We now use this method in unpackRow.  This is the first step of
thread-parallel unpack (see #800).
mhoemmen pushed a commit that referenced this issue Mar 23, 2017
@trilinos/tpetra Tpetra::CrsMatrix's unpackRow method no longer uses
Teuchos::Array* (which is not thread safe; see #229), under the
following conditions:

  1. The matrix has a static graph
  2. The CombineMode is ADD or REPLACE

Thus, under these conditions, CrsMatrix's unpackAndCombine method no
longer uses Teuchos::Array* either.  This brings us one step closer to
thread-parallel CrsMatrix unpack (#800).

Build/Test Cases Summary
Enabled Packages: TpetraCore, Belos, Zoltan2, Ifpack2, Amesos2, Xpetra, MueLu, Stokhos
Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL,Panzer
0) MPI_RELEASE_DEBUG_SHARED_PT => Test case MPI_RELEASE_DEBUG_SHARED_PT was not run! => Does not affect push readiness! (-1.00 min)
1) MPI_DEBUG => passed: passed=485,notpassed=0 (63.01 min)
2) SERIAL_RELEASE => passed: passed=432,notpassed=0 (38.21 min)
Other local commits for this build/test group: acd76d8, 71725af
mhoemmen pushed a commit that referenced this issue Mar 26, 2017
@trilinos/tpetra This is related to #800.  See my comments there, in
particular "we first have to change packRow so that it goes directly
to the KokkosSparse::CrsMatrix if that exists, rather than going
through the 'generic' getLocalRowView / getGlobalRowView interfaces
that return Teuchos::ArrayView instances."  This commit is the first
step to accomplish that subgoal.

Tpetra::CrsMatrix::pack now uses packRowStatic when the graph is
static.  Otherwise, it falls back to packRow.  This ensures that the
new method gets tested.
mhoemmen commented:

See PR #1321. @tjfulle wrote it and I'm done improving it; just need to test downstream and push.

tjfulle added a commit to tjfulle/Trilinos that referenced this issue Jul 15, 2017
Initial implementation of CrsMatrix threaded unpack.

Addresses: trilinos#800
Review: @mhoemmen

Test Summary:

Fri Jul 14 16:13:28 MDT 2017

Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages

Build: Passed (58.86 min)
Test: Passed (11.80 min)

100% tests passed, 0 tests failed out of 1475

Label Time Summary:
Amesos               =  20.83 sec (14 tests)
Amesos2              =  10.79 sec (8 tests)
Anasazi              = 121.33 sec (71 tests)
Belos                = 110.76 sec (70 tests)
Domi                 = 174.66 sec (125 tests)
FEI                  =  46.87 sec (43 tests)
Galeri               =   4.77 sec (9 tests)
Ifpack               =  65.05 sec (53 tests)
Ifpack2              =  44.34 sec (33 tests)
ML                   =  49.09 sec (34 tests)
MueLu                = 311.30 sec (56 tests)
NOX                  = 175.08 sec (106 tests)
OptiPack             =   6.90 sec (5 tests)
Panzer               = 316.53 sec (129 tests)
Pike                 =   4.30 sec (7 tests)
Piro                 =  30.97 sec (12 tests)
ROL                  = 1038.91 sec (133 tests)
Rythmos              = 222.97 sec (83 tests)
ShyLU                =   8.68 sec (5 tests)
Stokhos              = 131.40 sec (75 tests)
Stratimikos          =  42.14 sec (39 tests)
Teko                 = 107.08 sec (19 tests)
Tempus               = 741.42 sec (9 tests)
Thyra                =  67.89 sec (80 tests)
Tpetra               = 159.26 sec (132 tests)
TrilinosCouplings    =  53.78 sec (19 tests)
Xpetra               =  51.11 sec (17 tests)
Zoltan2              = 141.10 sec (97 tests)

Total Test time (real) = 708.18 sec

Total time for MPI_RELEASE_DEBUG_SHARED_PT = 70.66 min
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Jul 15, 2017
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Jul 15, 2017
Addresses: trilinos#800
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1476,notpassed=0 (36.23 min)
mhoemmen commented:

See #1503. I will squash those three commits into an atomic unit and run tests.

@mhoemmen mhoemmen added the "stage: in progress" label Jul 18, 2017
tjfulle commented Jul 18, 2017

@mhoemmen wrote:

See #1503. I will squash those three commits into an atomic unit and run tests.

Good idea - two of the intermediate commits don't build/run on CUDA :)

mhoemmen commented:

Argh, I can't build CUDA RELEASE without nvlink crashing....

mhoemmen pushed a commit that referenced this issue Jul 19, 2017
@trilinos/tpetra
Addresses: #800
Written by: @tjfulle
Review: @mhoemmen

@mhoemmen formed this commit by squashing the three commits in PR

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1476,notpassed=0 (36.23 min)

Tpetra CUDA tests also pass.
mhoemmen commented Jul 19, 2017

OK, I pushed this to develop. Thanks!

@mhoemmen mhoemmen removed the "stage: in progress" label Jul 19, 2017
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 2, 2017
@trilinos/tpetra, @mhoemmen

Comments
--------
This commit is a combination of several commits that address several
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

A summary of changes is as follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the
  aforementioned packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to
  TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with
  KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1483,notpassed=0 (79.62 min)
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 16, 2017
…aits

@trilinos/tpetra, @mhoemmen

Comments
--------
This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

A summary of changes is as follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the
  aforementioned packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to
  TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with
  KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix's row unpack to *not* unpack directly into the
  local CrsMatrix, but to return the unpacked data instead.  This required
  allocating enough scratch space into which data could be unpacked.  We
  used Kokkos::UniqueToken to allocate the scratch space and to grab a
  subview of the scratch space unique to each thread.
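A minimal sketch of that Kokkos::UniqueToken pattern, with illustrative
element type and extents (the real code sizes the scratch by the largest
packed row):

    #include <Kokkos_Core.hpp>

    // Each concurrently running thread acquires a unique ID, claims its
    // own slice of a shared scratch view, and releases the ID when done.
    template <class ExecSpace>
    void unpackWithScratch(const size_t numRows, const size_t maxRowLength)
    {
      Kokkos::Experimental::UniqueToken<ExecSpace> token;
      // One scratch row per concurrently active thread, not per matrix row.
      Kokkos::View<double**, ExecSpace> scratch("scratch", token.size(),
                                                maxRowLength);
      Kokkos::parallel_for("unpack-with-scratch",
        Kokkos::RangePolicy<ExecSpace>(0, numRows),
        KOKKOS_LAMBDA(const size_t row) {
          const auto id = token.acquire(); // unique among running threads
          auto myScratch = Kokkos::subview(scratch, id, Kokkos::ALL());
          // ... unpack row 'row' into myScratch, then combine the
          //     results into the local CrsMatrix ...
          (void) myScratch;
          token.release(id);
        });
    }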
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 16, 2017
…aits

@trilinos/tpetra, @mhoemmen

This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

A summary of changes is as follows:

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the
  aforementioned packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to
  TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with
  KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix's row unpack to *not* unpack directly into the
  local CrsMatrix, but to return the unpacked data instead.  This required
  allocating enough scratch space into which data could be unpacked.  We
  used Kokkos::UniqueToken to allocate the scratch space and to grab a
  subview of the scratch space unique to each thread.

Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages

0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1487,notpassed=0 (102.26 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1490,notpassed=0 (104.52 min)

Enabled Packages: TpetraCore
Disabled all Forward Packages

0) MPI_RELEASE_CUDA => passed: passed=124,notpassed=0
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Aug 16, 2017
…aits

@trilinos/tpetra, @mhoemmen

This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

Summary
-------

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the
  aforementioned packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to
  TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with
  KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix's row unpack to *not* unpack directly into the
  local CrsMatrix, but to return the unpacked data instead.  This required
  allocating enough scratch space into which data could be unpacked.  We
  used Kokkos::UniqueToken to allocate the scratch space and to grab a
  subview of the scratch space unique to each thread.

Build/Test Case Summaries
-------------------------

Linux/SEMS, gcc 4.8.3, openmpi 1.8.7
------------------------------------

Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages

0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1487,notpassed=0 (102.26 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1490,notpassed=0 (104.52 min)

CUDA, gcc 5.4, openmpi 1.10.2, cuda 8.0.44
------------------------------------------

Enabled Packages: TpetraCore
Disabled all Forward Packages

0) MPI_RELEASE_CUDA => passed: passed=124,notpassed=0
mhoemmen pushed a commit to mhoemmen/Trilinos that referenced this issue Aug 21, 2017
…aits

@trilinos/tpetra, @mhoemmen

This single commit is a rebase of several commits that address the following
issues: trilinos#797, trilinos#798, trilinos#800, trilinos#802

Summary
-------

- Refactor CrsMatrix pack/unpack procedures to use PackTraits (for both static
  and dynamic profile matrices)
- Refactor packCrsMatrix to pack (optional) PIDs
- Remove the existing packAndPrepareWithOwningPIDs and instead use the
  aforementioned packCrsMatrix procedure.
- Modify PackTraits to run on threads by removing calls to
  TEUCHOS_TEST_FOR_EXCEPTION and decorating device code with
  KOKKOS_INLINE_FUNCTION
- Ditto for Stokhos' specialization of PackTraits
- Modify unpackCrsMatrix's row unpack to *not* unpack directly into the
  local CrsMatrix, but to return the unpacked data instead.  This required
  allocating enough scratch space into which data could be unpacked.  We
  used Kokkos::UniqueToken to allocate the scratch space and to grab a
  subview of the scratch space unique to each thread.

Build/Test Case Summaries
-------------------------

Linux/SEMS, gcc 4.8.3, openmpi 1.8.7
------------------------------------

Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages

0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1487,notpassed=0 (102.26 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1490,notpassed=0 (104.52 min)

CUDA, gcc 5.4, openmpi 1.10.2, cuda 8.0.44
------------------------------------------

Enabled Packages: TpetraCore
Disabled all Forward Packages

0) MPI_RELEASE_CUDA => passed: passed=124,notpassed=0
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 6, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large, monolithic function).  Each of the smaller
  functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
94% tests passed, 16 tests failed out of 257

Label Time Summary:
MueLu      = 1690.79 sec (69 tests)
Stokhos    = 496.32 sec (63 tests)
Tpetra     = 404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)
240 - Stokhos_KokkosCrsMatrixUQPCEUnitTest_Cuda_MPI_1 (Failed)
242 - Stokhos_TpetraCrsMatrixUQPCEUnitTest_Cuda_MPI_4 (Failed)

According to @mhoemmen, the Stokhos failures are known failures.

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed
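A minimal sketch of the Kokkos::atomic_fetch_add usage this commit
describes; all names here are illustrative.  Several import entries may
target the same local row, so per-row counters must be updated atomically:

    #include <Kokkos_Core.hpp>

    // Count how many incoming entries land in each local row.  Distinct
    // iterations can hit the same row, hence the atomic increment.
    void countEntriesPerRow(const Kokkos::View<const int*>& targetRows,
                            const Kokkos::View<size_t*>& numEntPerRow)
    {
      Kokkos::parallel_for("count-entries",
        Kokkos::RangePolicy<>(0, targetRows.extent(0)),
        KOKKOS_LAMBDA(const size_t k) {
          Kokkos::atomic_fetch_add(&numEntPerRow(targetRows(k)), size_t(1));
        });
    }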
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 7, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large, monolithic function).  Each of the smaller
  functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
94% tests passed, 14 tests failed out of 257

Label Time Summary:
MueLu      = 1690.79 sec (69 tests)
Stokhos    = 496.32 sec (63 tests)
Tpetra     = 404.65 sec (126 tests)

The following tests FAILED:
158 - MueLu_Navier2DBlocked_Epetra_MPI_4 (Failed)
159 - MueLu_Navier2DBlocked_xml_format_MPI_4 (Failed)
160 - MueLu_Navier2DBlocked_xml_format2_MPI_4 (Failed)
161 - MueLu_Navier2DBlocked_xml_blockdirect_MPI_4 (Failed)
162 - MueLu_Navier2DBlocked_xml_bgs1_MPI_4 (Failed)
163 - MueLu_Navier2DBlocked_xml_bs1_MPI_4 (Failed)
164 - MueLu_Navier2DBlocked_xml_bs2_MPI_4 (Failed)
165 - MueLu_Navier2DBlocked_xml_sim1_MPI_4 (Failed)
166 - MueLu_Navier2DBlocked_xml_sim2_MPI_4 (Failed)
167 - MueLu_Navier2DBlocked_xml_uzawa1_MPI_4 (Failed)
168 - MueLu_Navier2DBlocked_xml_indef1_MPI_4 (Failed)
171 - MueLu_Navier2DBlocked_BraessSarazin_MPI_4 (Failed)
172 - MueLu_Navier2DBlockedReuseAggs_MPI_4 (Failed)
173 - MueLu_Navier2DBlocked_Simple_MPI_4 (Failed)

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed

These tests can be ignored, see trilinos#1699
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 8, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large, monolithic function).  Each of the smaller
  functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

All of the MueLu tests failed with the following error:

MueLu::EpetraOperator::Comm(): Cast from Xpetra::CrsMatrix to Xpetra::EpetraCrsMatrix failed

These tests can be ignored, see trilinos#1699
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Sep 8, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large, monolithic function).  Each of the smaller
  functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: trilinos#797, trilinos#800, trilinos#802
Review: @mhoemmen

Tests were run on two different machines and their results amended to this
commit:

Build/Test Cases Summary [RHEL6, standard checkin script]
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [ride.sandia.gov, CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
0) MPI_RELEASE_SHARED_CUDA => passed=233,notpassed=14 (8.76 min)

The 14 failing tests are unrelated MueLu tests that can be ignored, see trilinos#1699

The failing Stokhos tests mentioned in trilinos#1655 were fixed with
commit e97e37b
mhoemmen pushed a commit that referenced this issue Sep 8, 2017
- Moved unpackAndCombineIntoCrsArrays (and friends) from Tpetra_Import_Util2.hpp
  to Tpetra_Details_unpackCrsMatrixAndCombine_de*.hpp.
- unpackAndCombineIntoCrsArrays broken up into many smaller functions (it
  was previously one large, monolithic function).  Each of the smaller
  functions was refactored to be thread parallel.
- Race conditions were identified and resolved, mostly by using
  Kokkos::atomic_fetch_add where appropriate.

Addresses: #797, #800, #802
Review: @mhoemmen

Tests were run on two different machines and their results amended to this
commit:

Build/Test Cases Summary [RHEL6, standard checkin script]
Enabled Packages: TpetraCore
Disabled Packages: PyTrilinos,Claps,TriKota
Enabled all Forward Packages
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=1506,notpassed=0 (19.13 min)
1) MPI_RELEASE_DEBUG_SHARED_OPENMP_PT => passed: passed=1509,notpassed=0 (13.75 min)

Build/Test Cases Summary [ride.sandia.gov, CUDA]
Enabled Packages: Tpetra,MueLu,Stokhos
0) MPI_RELEASE_SHARED_CUDA => passed=233,notpassed=14 (8.76 min)

The 14 failing tests are unrelated MueLu tests that can be ignored, see #1699

The failing Stokhos tests mentioned in #1655 were fixed with
commit e97e37b