Make copy_to_coupled_boundary! threaded #1981

Merged: 5 commits, Jun 18, 2024

efaulhaber
Member

Timings for a simulation with 100k DOFs on my laptop with 6 threads (the difference will be much bigger with more threads):
Before (with #1978 and #1979):

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0368e6  Time steps: 2736 (accepted), 2736 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               35.9s /  63.7%           2.19GiB /  99.8%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 copy to coupled boundaries    13.7k    13.9s   60.6%  1.01ms   1.91GiB   87.7%   147KiB
 rhs!                          82.1k    7.87s   34.4%  95.8μs   89.3MiB    4.0%  1.11KiB
   volume integral             82.1k    3.67s   16.0%  44.7μs   22.6MiB    1.0%     288B
   interface flux              82.1k    1.51s    6.6%  18.4μs   26.5MiB    1.2%     339B
   surface integral            82.1k    826ms    3.6%  10.1μs   21.3MiB    1.0%     272B
   reset ∂u/∂t                 82.1k    728ms    3.2%  8.87μs   6.36KiB    0.0%    0.08B
   boundary flux               82.1k    643ms    2.8%  7.83μs     0.00B    0.0%    0.00B
   Jacobian                    82.1k    423ms    1.8%  5.15μs   18.9MiB    0.8%     241B
   ~rhs!~                      82.1k   65.8ms    0.3%   801ns   5.14KiB    0.0%    0.06B
   source terms                82.1k   1.04ms    0.0%  12.6ns     0.00B    0.0%    0.00B
 calculate dt                  2.74k    658ms    2.9%   241μs   34.5MiB    1.5%  12.9KiB
 I/O                              50    482ms    2.1%  9.65ms    151MiB    6.8%  3.03MiB
   save solution                 294    480ms    2.1%  1.63ms    150MiB    6.7%   522KiB
   ~I/O~                          50   1.92ms    0.0%  38.3μs    928KiB    0.0%  18.6KiB
   save mesh                      49    139μs    0.0%  2.84μs    602KiB    0.0%  12.3KiB
   get element variables         294   92.8μs    0.0%   316ns     0.00B    0.0%    0.00B
   get node variables            294   5.25μs    0.0%  17.8ns     0.00B    0.0%    0.00B
 ───────────────────────────────────────────────────────────────────────────────────────

This PR (with #1978 and #1979):

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0368e6  Time steps: 2736 (accepted), 2736 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               26.4s /  42.6%           2.31GiB /  99.9%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 rhs!                          82.1k    5.83s   51.8%  71.0μs   88.9MiB    3.8%  1.11KiB
   volume integral             82.1k    2.83s   25.2%  34.5μs   22.5MiB    1.0%     288B
   interface flux              82.1k    1.04s    9.2%  12.6μs   26.3MiB    1.1%     336B
   boundary flux               82.1k    595ms    5.3%  7.25μs     0.00B    0.0%    0.00B
   surface integral            82.1k    551ms    4.9%  6.71μs   21.3MiB    0.9%     272B
   reset ∂u/∂t                 82.1k    455ms    4.0%  5.54μs     0.00B    0.0%    0.00B
   Jacobian                    82.1k    293ms    2.6%  3.57μs   18.8MiB    0.8%     240B
   ~rhs!~                      82.1k   59.7ms    0.5%   727ns   5.14KiB    0.0%    0.06B
   source terms                82.1k   1.03ms    0.0%  12.5ns     0.00B    0.0%    0.00B
 copy to coupled boundaries    13.7k    4.32s   38.4%   316μs   2.04GiB   88.4%   156KiB
 calculate dt                  2.74k    713ms    6.3%   261μs   34.5MiB    1.5%  12.9KiB
 I/O                              50    395ms    3.5%  7.90ms    151MiB    6.4%  3.03MiB
   save solution                 294    393ms    3.5%  1.34ms    150MiB    6.3%   522KiB
   ~I/O~                          50   1.86ms    0.0%  37.2μs    928KiB    0.0%  18.6KiB
   save mesh                      49    137μs    0.0%  2.79μs    602KiB    0.0%  12.3KiB
   get element variables         294   96.0μs    0.0%   327ns     0.00B    0.0%    0.00B
   get node variables            294   8.92μs    0.0%  30.3ns     0.00B    0.0%    0.00B
 ───────────────────────────────────────────────────────────────────────────────────────
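
As a rough reading of the two tables above, the speedup can be worked out directly from the posted numbers. The following is a minimal Python sketch, not part of Trixi.jl; the dictionary keys are illustrative names, and the values are copied from the "copy to coupled boundaries" rows and the "Tot" lines.

```python
# Timings copied from the two timer tables above (100k DOFs, 6 threads).
# The key names are illustrative and do not exist in Trixi.jl.
before = {"copy_total_s": 13.9, "copy_avg_s": 1.01e-3, "wall_s": 35.9}
after = {"copy_total_s": 4.32, "copy_avg_s": 316e-6, "wall_s": 26.4}

# Speedup = time before / time after, per measured quantity.
speedups = {key: before[key] / after[key] for key in before}

for key, value in speedups.items():
    print(f"{key}: {value:.2f}x")
```

On these numbers, the "copy to coupled boundaries" section is roughly 3.2 times faster and the total run time roughly 1.36 times faster. The allocations of that section are about the same in both runs (1.91GiB before, 2.04GiB after), so the gain is in time, not memory.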

@efaulhaber added the labels performance (We are greedy) and parallelization (Related to MPI, threading, tasks etc.) on Jun 13, 2024
@efaulhaber requested a review from SimonCan on June 13, 2024 16:50
Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

codecov bot commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.16%. Comparing base (5398b22) to head (58707b1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1981      +/-   ##
==========================================
- Coverage   96.16%   96.16%   -0.00%     
==========================================
  Files         460      460              
  Lines       36958    36958              
==========================================
- Hits        35539    35538       -1     
- Misses       1419     1420       +1     
Flag        Coverage Δ
unittests   96.16% <100.00%> (-<0.01%) ⬇️


SimonCan
SimonCan previously approved these changes Jun 14, 2024
Contributor

@SimonCan left a comment

Looks good to me.

Member

@sloede left a comment

LGTM once the final comment is resolved. Thanks for your efforts, @efaulhaber!

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
@efaulhaber requested a review from sloede on June 17, 2024 09:21
Member

@ranocha left a comment

Thanks

@ranocha merged commit 75d8c67 into trixi-framework:main on Jun 18, 2024
34 of 36 checks passed
@efaulhaber deleted the threaded-copy-coupled branch on June 18, 2024 13:48