
Add MPI support for conforming P4estMeshes in 2D #977

Merged: 21 commits merged into trixi-framework:main on Feb 4, 2022

Conversation

@lchristm (Member) commented Nov 12, 2021

This PR adds containers and solver functions for running simulations on a conforming 2D P4estMesh in parallel. The P4estMesh has been slightly adapted so that parallel and serial P4estMeshes can be distinguished, which allows elixirs using the P4estMesh to "just work" when executed in parallel, similar to the TreeMesh implementation.

There is a new container for interfaces shared by two MPI domains (MPI interfaces). Furthermore, there are new functions for initializing the parallel and serial containers for parallel runs (see containers_parallel.jl and containers_parallel_2d.jl). Similar to the TreeMesh implementation, there is also a new cache for holding data structures related to MPI communication (see dg_parallel.jl).
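To make this concrete, here is a minimal, hypothetical sketch of what such an MPI interface container and MPI cache might hold; all names and fields below are simplified stand-ins, not the actual definitions from containers_parallel.jl or dg_parallel.jl.

```julia
using MPI

# Hypothetical sketch of a container for interfaces shared by two MPI ranks.
struct MPIInterfaceContainerSketch{uEltype <: Real}
    u::Array{uEltype, 4}             # interface data: [local/remote side, variable, node, interface]
    local_neighbor_ids::Vector{Int}  # id of the adjacent element on this rank
    node_indices::Vector{Symbol}     # orientation of the local element side, e.g. :x_neg
    remote_sides::Vector{Int}        # which side of the interface the remote rank owns
end

# Hypothetical sketch of the cache holding MPI communication data structures.
struct MPICacheSketch{uEltype <: Real}
    mpi_neighbor_ranks::Vector{Int}               # ranks this rank exchanges data with
    mpi_neighbor_interfaces::Vector{Vector{Int}}  # interface ids shared with each neighbor rank
    mpi_send_buffers::Vector{Vector{uEltype}}
    mpi_recv_buffers::Vector{Vector{uEltype}}
    mpi_send_requests::Vector{MPI.Request}
    mpi_recv_requests::Vector{MPI.Request}
end
```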

Todo:

  • Fix MPI neighbor connectivity initialization
  • Implement parallel restart file loading properly
  • Test if solution files from parallel runs can be processed properly with Trixi2Vtk
  • Parallel tests in CI (might require P4est.jl#4, "Compile p4est in P4est_jll.jl with MPI support", to be resolved first)

If the p4est library does not support MPI, it is not allowed to
pass an MPI communicator when creating a new p4est object. If it does
support MPI, an MPI communicator has to be passed even if only one
process is used.
This enables adding methods specialized on the parallel type for MPI runs.
Adds a new container for the interfaces at the MPI domain boundaries and methods for initializing it, and adds specialized versions of some initialization methods for the existing containers for parallel runs (e.g., to distinguish between regular and MPI interfaces).
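As a rough illustration of the dispatch idea described in these commit messages, a parallel flag type can be used to add methods specialized for MPI runs; every name in this sketch is made up and only mirrors the general pattern.

```julia
# Illustrative trait-style dispatch, not the actual Trixi.jl implementation.
struct SerialSketch end
struct ParallelSketch end

# Default: treat a mesh as serial unless it advertises otherwise.
is_parallel_sketch(mesh) = SerialSketch()

# Entry point dispatches on the mesh's parallel type.
init_containers_sketch(mesh) = init_containers_sketch(mesh, is_parallel_sketch(mesh))

# Serial runs only need the regular interface container.
init_containers_sketch(mesh, ::SerialSketch) = (:interfaces,)

# Parallel runs additionally build MPI interfaces at the rank boundaries.
init_containers_sketch(mesh, ::ParallelSketch) = (:interfaces, :mpi_interfaces)
```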
@codecov (bot) commented Dec 9, 2021

Codecov Report

Merging #977 (1981be8) into main (90c4bc8) will decrease coverage by 1.50%.
The diff coverage is 7.53%.

❗ Current head 1981be8 differs from pull request most recent head 096b0a3. Consider uploading reports for the commit 096b0a3 to get more accurate results


@@            Coverage Diff             @@
##             main     #977      +/-   ##
==========================================
- Coverage   93.68%   92.18%   -1.50%     
==========================================
  Files         294      295       +1     
  Lines       22606    22995     +389     
==========================================
+ Hits        21177    21196      +19     
- Misses       1429     1799     +370     
Flag Coverage Δ
unittests 92.18% <7.53%> (-1.50%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/callbacks_step/analysis_dg2d_parallel.jl 48.94% <0.00%> (-51.06%) ⬇️
src/callbacks_step/save_solution_dg.jl 95.89% <ø> (ø)
src/callbacks_step/stepsize_dg2d.jl 87.69% <0.00%> (-12.31%) ⬇️
src/solvers/dgsem_p4est/containers.jl 93.38% <ø> (ø)
src/solvers/dgsem_p4est/containers_parallel.jl 0.00% <0.00%> (ø)
src/solvers/dgsem_p4est/containers_parallel_2d.jl 0.00% <0.00%> (ø)
src/solvers/dgsem_p4est/dg.jl 93.75% <ø> (ø)
src/solvers/dgsem_p4est/dg_2d_parallel.jl 0.00% <0.00%> (ø)
src/solvers/dgsem_p4est/dg_parallel.jl 0.00% <0.00%> (ø)
src/meshes/mesh_io.jl 82.40% <30.61%> (-13.43%) ⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 90c4bc8...096b0a3.

@lchristm marked this pull request as ready for review on December 10, 2021, 11:34
@sloede (Member) left a comment

Great work! I have left a few comments and questions, but nothing struck me as problematic!

Resolved review threads on: src/auxiliary/p4est.jl, src/solvers/dgsem_p4est/containers.jl, src/solvers/dgsem_p4est/dg_parallel.jl, src/solvers/dgsem_p4est/dg_2d_parallel.jl
This avoids a few unnecessary "where {IsParallel}" and makes the code
more readable overall.
- New P4est.jl version supports MPI, hence enable `ParallelP4estMesh` tests
- `p4est_has_mpi()` function is no longer required now that P4est.jl is built with MPI support
@sloede (Member) left a comment

LGTM! Are there any other changes required or can we remove the WIP and make this ready to be merged?

@lchristm changed the title from "WIP: Add MPI support for conforming P4estMeshes in 2D" to "Add MPI support for conforming P4estMeshes in 2D" on Feb 4, 2022
@lchristm (Member, Author) commented Feb 4, 2022

From my side, this is ready once all checks have passed.

@ranocha enabled auto-merge (squash) on February 4, 2022, 13:02
@ranocha (Member) commented Feb 4, 2022

Nice work, @lchristm 👍

Out of curiosity: Did you run some benchmarks of MPI vs. multithreading?

@efaulhaber (Member) left a comment

I haven't looked into everything in detail, but what I've seen looks very good. Well done!
I'm also curious about performance, although it's only 2D so far.

@ranocha disabled auto-merge on February 4, 2022, 14:42
@ranocha merged commit a174184 into trixi-framework:main on Feb 4, 2022
@lchristm deleted the parallel-p4est branch on February 4, 2022, 15:26
@lchristm (Member, Author) commented Feb 7, 2022

@ranocha, @efaulhaber I didn't do any extensive benchmarking, but I did save some numbers when I checked the correctness of my implementation. The following numbers were obtained by running a modified, conforming version of `elixir_euler_source_terms_nonconforming_unstructured_flag.jl` with `initial_refinement_level=4`.
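For reference, such a comparison can be launched with the same script in both modes, since elixirs are meant to "just work" in parallel; the elixir file name below is a placeholder for the locally modified, conforming version mentioned above.

```julia
# Hypothetical launch sketch:
#   multithreading:  julia --project=. --threads=6 run_benchmark.jl
#   MPI:             mpiexec -n 6 julia --project=. run_benchmark.jl
using Trixi

# Placeholder name for the locally modified, conforming version of
# elixir_euler_source_terms_nonconforming_unstructured_flag.jl
trixi_include("elixir_euler_source_terms_conforming_flag.jl")
```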

Multithreading with 6 threads:

────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              18.3s /  69.2%            192MiB /  98.3%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       5.98k    11.2s   88.2%  1.87ms   18.6MiB    9.9%  3.19KiB
   interface flux           5.98k    3.05s   24.1%   510μs   2.83MiB    1.5%     496B
   volume integral          5.98k    2.88s   22.8%   482μs   1.73MiB    0.9%     304B
   source terms             5.98k    2.81s   22.2%   470μs   3.83MiB    2.0%     672B
   prolong2interfaces       5.98k    782ms    6.2%   131μs   1.37MiB    0.7%     240B
   surface integral         5.98k    574ms    4.5%  96.0μs   1.73MiB    0.9%     304B
   reset ∂u/∂t              5.98k    567ms    4.5%  94.9μs   5.94KiB    0.0%    1.02B
   Jacobian                 5.98k    222ms    1.8%  37.2μs   1.55MiB    0.8%     272B
   boundary flux            5.98k    125ms    1.0%  20.9μs   2.10MiB    1.1%     368B
   ~rhs!~                   5.98k    113ms    0.9%  18.9μs   2.12MiB    1.1%     371B
   prolong2boundaries       5.98k   40.2ms    0.3%  6.73μs   1.37MiB    0.7%     240B
   mortar flux              5.98k   2.00ms    0.0%   334ns     0.00B    0.0%    0.00B
   prolong2mortars          5.98k   1.71ms    0.0%   286ns     0.00B    0.0%    0.00B
 calculate dt               1.20k    1.13s    8.9%   941μs    169KiB    0.1%     144B
 analyze solution              13    191ms    1.5%  14.7ms   16.1MiB    8.5%  1.24MiB
 I/O                           27    184ms    1.5%  6.83ms    153MiB   81.5%  5.68MiB
   save solution               13    128ms    1.0%  9.83ms    117MiB   62.2%  9.01MiB
   ~I/O~                       27   55.9ms    0.4%  2.07ms   36.1MiB   19.2%  1.34MiB
   get element variables       13    719μs    0.0%  55.3μs   35.8KiB    0.0%  2.75KiB
   save mesh                   13   5.24μs    0.0%   403ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

MPI with 6 processes:

────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              14.2s /  91.2%            123MiB /  99.4%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       5.98k    11.5s   88.5%  1.92ms   12.9MiB   10.5%  2.21KiB
   interface flux           5.98k    2.85s   21.9%   476μs     0.00B    0.0%    0.00B
   volume integral          5.98k    2.31s   17.8%   385μs     0.00B    0.0%    0.00B
   source terms             5.98k    2.24s   17.3%   375μs   2.19MiB    1.8%     384B
   finish MPI send          5.98k    1.08s    8.3%   181μs   1.64MiB    1.3%     288B
   surface integral         5.98k    751ms    5.8%   126μs     0.00B    0.0%    0.00B
   prolong2interfaces       5.98k    750ms    5.8%   125μs     0.00B    0.0%    0.00B
   finish MPI receive       5.98k    618ms    4.8%   103μs   3.19MiB    2.6%     560B
   Jacobian                 5.98k    212ms    1.6%  35.4μs     0.00B    0.0%    0.00B
   ~rhs!~                   5.98k    164ms    1.3%  27.4μs   2.21MiB    1.8%     387B
   MPI interface flux       5.98k    155ms    1.2%  25.9μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              5.98k    137ms    1.1%  22.9μs     0.00B    0.0%    0.00B
   start MPI send           5.98k    110ms    0.8%  18.4μs   2.19MiB    1.8%     384B
   start MPI receive        5.98k   56.2ms    0.4%  9.40μs   1.46MiB    1.2%     256B
   prolong2mpiinterfaces    5.98k   41.5ms    0.3%  6.94μs     0.00B    0.0%    0.00B
   prolong2mortars          5.98k   5.07ms    0.0%   848ns     0.00B    0.0%    0.00B
   boundary flux            5.98k   4.59ms    0.0%   767ns     0.00B    0.0%    0.00B
   prolong2boundaries       5.98k   3.28ms    0.0%   548ns     0.00B    0.0%    0.00B
   mortar flux              5.98k   1.92ms    0.0%   320ns     0.00B    0.0%    0.00B
 calculate dt               1.20k    1.03s    7.9%   857μs   37.4KiB    0.0%    32.0B
 I/O                           27    392ms    3.0%  14.5ms    107MiB   87.1%  3.94MiB
   save solution               13    229ms    1.8%  17.6ms   64.3MiB   52.6%  4.95MiB
   ~I/O~                       27    162ms    1.2%  6.00ms   42.1MiB   34.5%  1.56MiB
   get element variables       13    411μs    0.0%  31.6μs   37.0KiB    0.0%  2.84KiB
   save mesh                   13   4.10μs    0.0%   315ns     0.00B    0.0%    0.00B
 analyze solution              13   77.1ms    0.6%  5.93ms   2.88MiB    2.4%   227KiB
 ────────────────────────────────────────────────────────────────────────────────────

@ranocha (Member) commented Feb 7, 2022

Thanks! Did you use multi-threaded or serial RK methods?

@sloede (Member) commented Feb 7, 2022

finish MPI send          5.98k    1.08s    8.3%   181μs   1.64MiB    1.3%     288B
[...]
finish MPI receive       5.98k    618ms    4.8%   103μs   3.19MiB    2.6%     560B

I forgot - does this only time the MPI operation, or also the unpacking of the buffers? The finish MPI send number in particular is so large that there is clearly room for improved load balancing (which was not part of this PR, so everything is fine here!)

@lchristm (Member, Author) commented Feb 7, 2022

Thanks! Did you use multi-threaded or serial RK methods?

Serial. I think that's where the difference in runtime is coming from.

I forgot - does this only time the MPI operation or also the unpacking of buffers?

It includes unpacking the buffers.
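For readers skimming the timings above, here is a hedged sketch (made-up names, simplified data layout) of what the "finish MPI receive" section covers according to this answer: waiting on the receive requests and then unpacking the buffers into the MPI interface data.

```julia
using MPI

# Illustrative only; not the actual Trixi.jl implementation.
function finish_mpi_receive_sketch!(mpi_interface_data::Vector{<:AbstractVector},
                                    recv_buffers::Vector{<:AbstractVector},
                                    recv_requests::Vector{MPI.Request},
                                    interfaces_per_rank::Vector{Vector{Int}})
    # Wait until all pending receives have completed
    # (spelled MPI.Waitall! in older MPI.jl versions).
    MPI.Waitall(recv_requests)

    # Copy each neighbor's buffer into the remote side of the shared interfaces;
    # this unpacking step is included in the "finish MPI receive" timing.
    for (rank_idx, interface_ids) in enumerate(interfaces_per_rank)
        buffer = recv_buffers[rank_idx]
        chunk = length(buffer) ÷ length(interface_ids)
        for (i, interface) in enumerate(interface_ids)
            offset = (i - 1) * chunk
            mpi_interface_data[interface] .= @view buffer[(offset + 1):(offset + chunk)]
        end
    end
    return nothing
end
```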
