
Add MPI support for conforming P4estMeshes in 2D #977

Merged: 21 commits merged into trixi-framework:main on Feb 4, 2022

Conversation

@lchristm (Member) commented Nov 12, 2021

This PR adds containers and solver functions for running simulations on a conforming 2D P4estMesh in parallel. The P4estMesh has been slightly adapted so that parallel and serial P4estMeshes can be distinguished, which allows elixirs using the P4estMesh to "just work" when executed in parallel, similar to the TreeMesh implementation.

There is a new container for interfaces shared by two MPI domains (MPI interfaces). Furthermore, there are new functions for initializing the parallel and serial containers for parallel runs (see containers_parallel.jl and containers_parallel_2d.jl). Similar to the TreeMesh implementation, there is also a new cache for holding data structures related to MPI communication (see dg_parallel.jl).
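To make this concrete, here is a minimal, hypothetical sketch of what such an MPI interface container and MPI cache might hold; all names and fields below are simplified stand-ins, not the actual definitions from containers_parallel.jl or dg_parallel.jl.

```julia
using MPI

# Hypothetical sketch of a container for interfaces shared by two MPI ranks.
struct MPIInterfaceContainerSketch{uEltype <: Real}
    u::Array{uEltype, 4}             # interface data: [local/remote side, variable, node, interface]
    local_neighbor_ids::Vector{Int}  # id of the adjacent element on this rank
    node_indices::Vector{Symbol}     # orientation of the local element side, e.g. :x_neg
    remote_sides::Vector{Int}        # which side of the interface the remote rank owns
end

# Hypothetical sketch of the cache holding MPI communication data structures.
struct MPICacheSketch{uEltype <: Real}
    mpi_neighbor_ranks::Vector{Int}               # ranks this rank exchanges data with
    mpi_neighbor_interfaces::Vector{Vector{Int}}  # interface ids shared with each neighbor rank
    mpi_send_buffers::Vector{Vector{uEltype}}
    mpi_recv_buffers::Vector{Vector{uEltype}}
    mpi_send_requests::Vector{MPI.Request}
    mpi_recv_requests::Vector{MPI.Request}
end
```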

Todo:

  • Fix MPI neighbor connectivity initialization
  • Implement parallel restart file loading properly
  • Test if solution files from parallel runs can be processed properly with Trixi2Vtk
  • Parallel tests in CI (might require P4est.jl#4, "Compile p4est in P4est_jll.jl with MPI support", to be resolved first)

If the p4est library does not support MPI, it is not allowed to
pass an MPI communicator when creating a new p4est object. If it does
support MPI, an MPI communicator has to be passed even if only one
process is used.
This enables adding methods specialized on the parallel type for MPI runs.
Adds a new container for the interfaces at the MPI domain boundaries and methods for initializing it, and adds specialized versions of some initialization methods for the existing containers for parallel runs (e.g., to distinguish between regular and MPI interfaces).
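As a rough illustration of the dispatch idea described in these commit messages, a parallel flag type can be used to add methods specialized for MPI runs; every name in this sketch is made up and only mirrors the general pattern.

```julia
# Illustrative trait-style dispatch, not the actual Trixi.jl implementation.
struct SerialSketch end
struct ParallelSketch end

# Default: treat a mesh as serial unless it advertises otherwise.
is_parallel_sketch(mesh) = SerialSketch()

# Entry point dispatches on the mesh's parallel type.
init_containers_sketch(mesh) = init_containers_sketch(mesh, is_parallel_sketch(mesh))

# Serial runs only need the regular interface container.
init_containers_sketch(mesh, ::SerialSketch) = (:interfaces,)

# Parallel runs additionally build MPI interfaces at the rank boundaries.
init_containers_sketch(mesh, ::ParallelSketch) = (:interfaces, :mpi_interfaces)
```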
@codecov (bot) commented Dec 9, 2021

Codecov Report

Merging #977 (1981be8) into main (90c4bc8) will decrease coverage by 1.50%.
The diff coverage is 7.53%.

❗ Current head 1981be8 differs from pull request most recent head 096b0a3. Consider uploading reports for the commit 096b0a3 to get more accurate results


@@            Coverage Diff             @@
##             main     #977      +/-   ##
==========================================
- Coverage   93.68%   92.18%   -1.50%     
==========================================
  Files         294      295       +1     
  Lines       22606    22995     +389     
==========================================
+ Hits        21177    21196      +19     
- Misses       1429     1799     +370     
Flag Coverage Δ
unittests 92.18% <7.53%> (-1.50%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/callbacks_step/analysis_dg2d_parallel.jl 48.94% <0.00%> (-51.06%) ⬇️
src/callbacks_step/save_solution_dg.jl 95.89% <ø> (ø)
src/callbacks_step/stepsize_dg2d.jl 87.69% <0.00%> (-12.31%) ⬇️
src/solvers/dgsem_p4est/containers.jl 93.38% <ø> (ø)
src/solvers/dgsem_p4est/containers_parallel.jl 0.00% <0.00%> (ø)
src/solvers/dgsem_p4est/containers_parallel_2d.jl 0.00% <0.00%> (ø)
src/solvers/dgsem_p4est/dg.jl 93.75% <ø> (ø)
src/solvers/dgsem_p4est/dg_2d_parallel.jl 0.00% <0.00%> (ø)
src/solvers/dgsem_p4est/dg_parallel.jl 0.00% <0.00%> (ø)
src/meshes/mesh_io.jl 82.40% <30.61%> (-13.43%) ⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 90c4bc8...096b0a3.

@lchristm marked this pull request as ready for review on December 10, 2021, 11:34
@sloede (Member) left a comment

Great work! I have left a few comments and questions, but nothing struck me as problematic!

Resolved review threads on: src/auxiliary/p4est.jl, src/solvers/dgsem_p4est/containers.jl, src/solvers/dgsem_p4est/dg_parallel.jl, src/solvers/dgsem_p4est/dg_2d_parallel.jl
This avoids a few unnecessary "where {IsParallel}" and makes the code
more readable overall.
- New P4est.jl version supports MPI, hence enable `ParallelP4estMesh` tests
- `p4est_has_mpi()` function is no longer required now that P4est.jl is built with MPI support
@sloede (Member) left a comment

LGTM! Are there any other changes required or can we remove the WIP and make this ready to be merged?

@lchristm changed the title from "WIP: Add MPI support for conforming P4estMeshes in 2D" to "Add MPI support for conforming P4estMeshes in 2D" on Feb 4, 2022
@lchristm (Member, Author) commented Feb 4, 2022

From my side, this is ready once all checks have passed.

@ranocha enabled auto-merge (squash) on February 4, 2022, 13:02
@ranocha (Member) commented Feb 4, 2022

Nice work, @lchristm 👍

Out of curiosity: Did you run some benchmarks of MPI vs. multithreading?

@efaulhaber (Member) left a comment

I haven't looked into everything in detail, but what I've seen looks very good. Well done!
I'm also curious about performance, although it's only 2D so far.

@ranocha disabled auto-merge on February 4, 2022, 14:42
@ranocha merged commit a174184 into trixi-framework:main on Feb 4, 2022
@lchristm deleted the parallel-p4est branch on February 4, 2022, 15:26
@lchristm (Member, Author) commented Feb 7, 2022

@ranocha, @efaulhaber I didn't do any extensive benchmarking, but I did save some numbers when I checked the correctness of my implementation. The following numbers were obtained by running a modified, conforming version of `elixir_euler_source_terms_nonconforming_unstructured_flag.jl` with `initial_refinement_level=4`.
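For reference, such a comparison can be launched with the same script in both modes, since elixirs are meant to "just work" in parallel; the elixir file name below is a placeholder for the locally modified, conforming version mentioned above.

```julia
# Hypothetical launch sketch:
#   multithreading:  julia --project=. --threads=6 run_benchmark.jl
#   MPI:             mpiexec -n 6 julia --project=. run_benchmark.jl
using Trixi

# Placeholder name for the locally modified, conforming version of
# elixir_euler_source_terms_nonconforming_unstructured_flag.jl
trixi_include("elixir_euler_source_terms_conforming_flag.jl")
```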

Multithreading with 6 threads:

────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              18.3s /  69.2%            192MiB /  98.3%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       5.98k    11.2s   88.2%  1.87ms   18.6MiB    9.9%  3.19KiB
   interface flux           5.98k    3.05s   24.1%   510μs   2.83MiB    1.5%     496B
   volume integral          5.98k    2.88s   22.8%   482μs   1.73MiB    0.9%     304B
   source terms             5.98k    2.81s   22.2%   470μs   3.83MiB    2.0%     672B
   prolong2interfaces       5.98k    782ms    6.2%   131μs   1.37MiB    0.7%     240B
   surface integral         5.98k    574ms    4.5%  96.0μs   1.73MiB    0.9%     304B
   reset ∂u/∂t              5.98k    567ms    4.5%  94.9μs   5.94KiB    0.0%    1.02B
   Jacobian                 5.98k    222ms    1.8%  37.2μs   1.55MiB    0.8%     272B
   boundary flux            5.98k    125ms    1.0%  20.9μs   2.10MiB    1.1%     368B
   ~rhs!~                   5.98k    113ms    0.9%  18.9μs   2.12MiB    1.1%     371B
   prolong2boundaries       5.98k   40.2ms    0.3%  6.73μs   1.37MiB    0.7%     240B
   mortar flux              5.98k   2.00ms    0.0%   334ns     0.00B    0.0%    0.00B
   prolong2mortars          5.98k   1.71ms    0.0%   286ns     0.00B    0.0%    0.00B
 calculate dt               1.20k    1.13s    8.9%   941μs    169KiB    0.1%     144B
 analyze solution              13    191ms    1.5%  14.7ms   16.1MiB    8.5%  1.24MiB
 I/O                           27    184ms    1.5%  6.83ms    153MiB   81.5%  5.68MiB
   save solution               13    128ms    1.0%  9.83ms    117MiB   62.2%  9.01MiB
   ~I/O~                       27   55.9ms    0.4%  2.07ms   36.1MiB   19.2%  1.34MiB
   get element variables       13    719μs    0.0%  55.3μs   35.8KiB    0.0%  2.75KiB
   save mesh                   13   5.24μs    0.0%   403ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

MPI with 6 processes:

────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              14.2s /  91.2%            123MiB /  99.4%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       5.98k    11.5s   88.5%  1.92ms   12.9MiB   10.5%  2.21KiB
   interface flux           5.98k    2.85s   21.9%   476μs     0.00B    0.0%    0.00B
   volume integral          5.98k    2.31s   17.8%   385μs     0.00B    0.0%    0.00B
   source terms             5.98k    2.24s   17.3%   375μs   2.19MiB    1.8%     384B
   finish MPI send          5.98k    1.08s    8.3%   181μs   1.64MiB    1.3%     288B
   surface integral         5.98k    751ms    5.8%   126μs     0.00B    0.0%    0.00B
   prolong2interfaces       5.98k    750ms    5.8%   125μs     0.00B    0.0%    0.00B
   finish MPI receive       5.98k    618ms    4.8%   103μs   3.19MiB    2.6%     560B
   Jacobian                 5.98k    212ms    1.6%  35.4μs     0.00B    0.0%    0.00B
   ~rhs!~                   5.98k    164ms    1.3%  27.4μs   2.21MiB    1.8%     387B
   MPI interface flux       5.98k    155ms    1.2%  25.9μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              5.98k    137ms    1.1%  22.9μs     0.00B    0.0%    0.00B
   start MPI send           5.98k    110ms    0.8%  18.4μs   2.19MiB    1.8%     384B
   start MPI receive        5.98k   56.2ms    0.4%  9.40μs   1.46MiB    1.2%     256B
   prolong2mpiinterfaces    5.98k   41.5ms    0.3%  6.94μs     0.00B    0.0%    0.00B
   prolong2mortars          5.98k   5.07ms    0.0%   848ns     0.00B    0.0%    0.00B
   boundary flux            5.98k   4.59ms    0.0%   767ns     0.00B    0.0%    0.00B
   prolong2boundaries       5.98k   3.28ms    0.0%   548ns     0.00B    0.0%    0.00B
   mortar flux              5.98k   1.92ms    0.0%   320ns     0.00B    0.0%    0.00B
 calculate dt               1.20k    1.03s    7.9%   857μs   37.4KiB    0.0%    32.0B
 I/O                           27    392ms    3.0%  14.5ms    107MiB   87.1%  3.94MiB
   save solution               13    229ms    1.8%  17.6ms   64.3MiB   52.6%  4.95MiB
   ~I/O~                       27    162ms    1.2%  6.00ms   42.1MiB   34.5%  1.56MiB
   get element variables       13    411μs    0.0%  31.6μs   37.0KiB    0.0%  2.84KiB
   save mesh                   13   4.10μs    0.0%   315ns     0.00B    0.0%    0.00B
 analyze solution              13   77.1ms    0.6%  5.93ms   2.88MiB    2.4%   227KiB
 ────────────────────────────────────────────────────────────────────────────────────

@ranocha (Member) commented Feb 7, 2022

Thanks! Did you use multi-threaded or serial RK methods?

@sloede (Member) commented Feb 7, 2022

finish MPI send          5.98k    1.08s    8.3%   181μs   1.64MiB    1.3%     288B
[...]
finish MPI receive       5.98k    618ms    4.8%   103μs   3.19MiB    2.6%     560B

I forgot - does this only time the MPI operation, or also the unpacking of the buffers? The finish MPI send number in particular is so large that there is clearly room for improved load balancing (which was not part of this PR, so everything is fine here!)

@lchristm (Member, Author) commented Feb 7, 2022

Thanks! Did you use multi-threaded or serial RK methods?

Serial. I think that's where the difference in runtime is coming from.

I forgot - does this only time the MPI operation or also the unpacking of buffers?

It includes unpacking the buffers.
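For readers skimming the timings above, here is a hedged sketch (made-up names, simplified data layout) of what the "finish MPI receive" section covers according to this answer: waiting on the receive requests and then unpacking the buffers into the MPI interface data.

```julia
using MPI

# Illustrative only; not the actual Trixi.jl implementation.
function finish_mpi_receive_sketch!(mpi_interface_data::Vector{<:AbstractVector},
                                    recv_buffers::Vector{<:AbstractVector},
                                    recv_requests::Vector{MPI.Request},
                                    interfaces_per_rank::Vector{Vector{Int}})
    # Wait until all pending receives have completed
    # (spelled MPI.Waitall! in older MPI.jl versions).
    MPI.Waitall(recv_requests)

    # Copy each neighbor's buffer into the remote side of the shared interfaces;
    # this unpacking step is included in the "finish MPI receive" timing.
    for (rank_idx, interface_ids) in enumerate(interfaces_per_rank)
        buffer = recv_buffers[rank_idx]
        chunk = length(buffer) ÷ length(interface_ids)
        for (i, interface) in enumerate(interface_ids)
            offset = (i - 1) * chunk
            mpi_interface_data[interface] .= @view buffer[(offset + 1):(offset + chunk)]
        end
    end
    return nothing
end
```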
