Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid type instability in SemidiscretizationCoupled #1979

Merged
merged 8 commits into from
Jun 21, 2024

Conversation

efaulhaber
Copy link
Member

Before:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0368e6  Time steps: 720 (accepted), 720 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               4.75s /  98.3%           3.31GiB / 100.0%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 copy to coupled boundaries    3.60k    4.04s   86.5%  1.12ms   3.28GiB   99.3%   956KiB
 rhs!                          21.6k    406ms    8.7%  18.8μs   5.14KiB    0.0%    0.24B
   volume integral             21.6k    215ms    4.6%  10.0μs     0.00B    0.0%    0.00B
   interface flux              21.6k   64.5ms    1.4%  2.99μs     0.00B    0.0%    0.00B
   surface integral            21.6k   48.7ms    1.0%  2.25μs     0.00B    0.0%    0.00B
   boundary flux               21.6k   33.2ms    0.7%  1.54μs     0.00B    0.0%    0.00B
   Jacobian                    21.6k   22.1ms    0.5%  1.02μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                 21.6k   12.2ms    0.3%   566ns     0.00B    0.0%    0.00B
   ~rhs!~                      21.6k   9.90ms    0.2%   458ns   5.14KiB    0.0%    0.24B
   source terms                21.6k    245μs    0.0%  11.3ns     0.00B    0.0%    0.00B
 I/O                              50    210ms    4.5%  4.21ms   13.7MiB    0.4%   281KiB
   save solution                 294    208ms    4.5%   709μs   12.2MiB    0.4%  42.7KiB
   ~I/O~                          50   1.81ms    0.0%  36.1μs    928KiB    0.0%  18.6KiB
   save mesh                      49    112μs    0.0%  2.28μs    602KiB    0.0%  12.3KiB
   get element variables         294    104μs    0.0%   352ns     0.00B    0.0%    0.00B
   get node variables            294   5.04μs    0.0%  17.1ns     0.00B    0.0%    0.00B
 calculate dt                    721   13.6ms    0.3%  18.8μs   9.10MiB    0.3%  12.9KiB
 ───────────────────────────────────────────────────────────────────────────────────────

Now:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0368e6  Time steps: 720 (accepted), 720 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               1.53s /  95.2%            539MiB / 100.0%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 copy to coupled boundaries    3.60k    907ms   62.2%   252μs    516MiB   95.8%   147KiB
 rhs!                          21.6k    382ms   26.2%  17.7μs   5.14KiB    0.0%    0.24B
   volume integral             21.6k    206ms   14.1%  9.52μs     0.00B    0.0%    0.00B
   interface flux              21.6k   59.4ms    4.1%  2.75μs     0.00B    0.0%    0.00B
   surface integral            21.6k   46.1ms    3.2%  2.13μs     0.00B    0.0%    0.00B
   boundary flux               21.6k   30.1ms    2.1%  1.40μs     0.00B    0.0%    0.00B
   Jacobian                    21.6k   20.6ms    1.4%   954ns     0.00B    0.0%    0.00B
   reset ∂u/∂t                 21.6k   11.1ms    0.8%   515ns     0.00B    0.0%    0.00B
   ~rhs!~                      21.6k   8.50ms    0.6%   393ns   5.14KiB    0.0%    0.24B
   source terms                21.6k    250μs    0.0%  11.6ns     0.00B    0.0%    0.00B
 I/O                              50    159ms   10.9%  3.17ms   13.7MiB    2.6%   281KiB
   save solution                 294    157ms   10.8%   534μs   12.2MiB    2.3%  42.7KiB
   ~I/O~                          50   1.47ms    0.1%  29.4μs    928KiB    0.2%  18.6KiB
   save mesh                      49   79.1μs    0.0%  1.61μs    602KiB    0.1%  12.3KiB
   get element variables         294   59.9μs    0.0%   204ns     0.00B    0.0%    0.00B
   get node variables            294   5.12μs    0.0%  17.4ns     0.00B    0.0%    0.00B
 calculate dt                    721   11.7ms    0.8%  16.2μs   9.10MiB    1.7%  12.9KiB
 ───────────────────────────────────────────────────────────────────────────────────────

@efaulhaber efaulhaber added the performance We are greedy label Jun 13, 2024
@efaulhaber efaulhaber requested a review from SimonCan June 13, 2024 16:33
Copy link
Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less then 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

Copy link

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 95.55556% with 4 lines in your changes missing coverage. Please review.

Project coverage is 96.16%. Comparing base (5398b22) to head (6c92dd1).
Report is 3 commits behind head on main.

Files Patch % Lines
src/time_integration/methods_2N.jl 94.74% 2 Missing ⚠️
src/time_integration/methods_3Sstar.jl 95.24% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1979      +/-   ##
==========================================
- Coverage   96.16%   96.16%   -0.00%     
==========================================
  Files         460      460              
  Lines       36958    36968      +10     
==========================================
+ Hits        35539    35548       +9     
- Misses       1419     1420       +1     
Flag Coverage Δ
unittests 96.16% <95.56%> (-<0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ranocha ranocha enabled auto-merge (squash) June 14, 2024 15:18
@ranocha ranocha disabled auto-merge June 14, 2024 15:19
@SimonCan
Copy link
Contributor

Looks fine to me now.
Thanks for the fix.

@ranocha
Copy link
Member

ranocha commented Jun 18, 2024

@efaulhaber
Copy link
Member Author

Okay, so the version on main is apparently allocation-free for simulations where all semidiscretizations have the same mesh, equations, solver and cache type. Then

mesh_other, equations_other, solver_other, cache_other = mesh_equations_solver_cache(other_semi_index,
                                                                                         1,
                                                                                         semi_coupled.semis...)

is type-stable, as the types of the return values will be the same over all values of other_semi_index. In cases where we have different equations or mesh types for the semidiscretizations, this allocates and is very slow, as it allocates inside a performance-relevant function.

The version in this PR always allocates, but one level higher, where the type-instability is not as performance-relevant.

@efaulhaber
Copy link
Member Author

Okay, new version.
Now I store boundary_condition.other_semi_index statically in the type of the BoundaryConditionCoupled, removing all type instabilities and allocations.

Copy link
Member

@sloede sloede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unbelievably ugly but I believe that @efaulhaber needs this quickly for a teaching project. IMHO OK to merge if @ranocha signs off as well.

@sloede
Copy link
Member

sloede commented Jun 20, 2024

This is unbelievably ugly but I believe that @efaulhaber needs this quickly for a teaching project. IMHO OK to merge if @ranocha signs off as well.

Just to be clear: Thanks a lot to @efaulhaber for taking the time and figuring out how to make this allocation free. Your help is really appreciated!

Copy link
Contributor

@SimonCan SimonCan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge.

Copy link
Member

@ranocha ranocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @sloede says it's urgent for a teaching project, I will merge this now. Please create a follow-up PR adding some comments why this structure is required.

@ranocha ranocha merged commit a70522f into trixi-framework:main Jun 21, 2024
34 of 36 checks passed
@efaulhaber efaulhaber deleted the coupled-type-instability branch June 24, 2024 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance We are greedy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants