Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upwind FDSBP performance improvements #1282

Merged
merged 14 commits into from
Nov 30, 2022
Merged

Upwind FDSBP performance improvements #1282

merged 14 commits into from
Nov 30, 2022

Conversation

ranocha
Copy link
Member

@ranocha ranocha commented Nov 29, 2022

  • Combining flux evaluations leads to at least ca.
    • 20% performance improvements in 3D (Taylor-Green vortex)
    • 10% performance improvements in 2D (Kelvin-Helmholtz instability)
  • Rewriting the combined evaluation by copying the uni-directional versions yields ca. 2 % better performance, but is not worth the code duplication from my point of view. Note that we still need only one of the two fluxes in several parts, e.g., for the upwind SATs.
  • Introduce positive_part and negative_part instead of computing abs etc.
  • Some plain common subexpression elimination (CSE) that LLVM does not seem to be able to handle right now 😞
  • use at-muladd for boundary coefficients in fast and threaded mode ranocha/SummationByPartsOperators.jl#159 gives another speed-up of ca. 5 to 10 % for 3D TGV with default setup

All of the changes above (including updating SummationByPartsOperators.jl to v0.5.27) increase the performance of the Taylor-Green vortex

julia> trixi_include("examples/tree_3d_fdsbp/elixir_euler_taylor_green_vortex.jl", tspan=(0.0, 1.0), analysis_interval=1000)

on my system at work as follows:

dev this PR improvement
PID (s) 1.476e-07 1.133e-07 23.2%
runtime total (s) 40.3 31.3 22.3%
runtime rhs! (s) 34.1 25.2 26.1%
runtime volume terms (s) 29.2 20.8 28.7%
rel. time volume terms 80.6% 75.9%

@andrewwinters5000 Could you please check whether this looks good to you?

ranocha and others added 7 commits November 28, 2022 13:13
* strengthen the isentropic vortex initial conditions in TreeMesh elixirs

* update test values for new isentropic vortex initial conditions

* update vortex initial condition in special elixir and docs

* fix typos in the IC

* update tree_mpi test values

* remove comment lines because them seem to break literate

Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
@ranocha ranocha marked this pull request as draft November 29, 2022 15:06
@ranocha ranocha mentioned this pull request Nov 29, 2022
4 tasks
@ranocha ranocha marked this pull request as ready for review November 29, 2022 16:53
@andrewwinters5000
Copy link
Member

Could you please check whether this looks good to you?

Yes, I will have a look tomorrow morning.

Copy link
Member

@andrewwinters5000 andrewwinters5000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall nice work! Returning the pair of fp and fm fluxes was something that Gregor and I had in mind from the beginning but couldn't work out the best way to do it.

I left comments / questions mainly on the 1D versions of the new implementation because the 2D/3D variants are very similar.

src/auxiliary/math.jl Show resolved Hide resolved
src/equations/compressible_euler_1d.jl Show resolved Hide resolved
src/equations/compressible_euler_1d.jl Outdated Show resolved Hide resolved
src/solvers/fdsbp_tree/fdsbp_1d.jl Show resolved Hide resolved
src/solvers/fdsbp_tree/fdsbp_1d.jl Show resolved Hide resolved
@ranocha ranocha changed the title WIP: Upwind FDSBP performance improvements Upwind FDSBP performance improvements Nov 30, 2022
@ranocha ranocha added the performance We are greedy label Nov 30, 2022
Copy link
Member

@andrewwinters5000 andrewwinters5000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ranocha ranocha merged commit e721cac into dev Nov 30, 2022
@ranocha ranocha deleted the hr/fdsbp_performance branch November 30, 2022 12:39
ranocha added a commit that referenced this pull request Nov 30, 2022
* improve some docstrings (#1274)

* fix convergence_test for elixirs without trailing new line (#1280)

* Fix `initial_condition_isentropic_vortex` (#1279)

* strengthen the isentropic vortex initial conditions in TreeMesh elixirs

* update test values for new isentropic vortex initial conditions

* update vortex initial condition in special elixir and docs

* fix typos in the IC

* update tree_mpi test values

* remove comment lines because them seem to break literate

Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>

* improve performance of 3D volume integral with steger_warming_splitting by ca. 1/4 for Taylor-Green (serial)

* lose 2 percent of threaded volume terms performance but simplify code and reduce duplications

* remove flux_upwind stub

* 2D and 1D as well; improvement ca. 20% in 3D, 10% in 2D, not much in 1D

* positive_part, negative_part for 3D Steger-Warming; improves serial PID of TGV by ca. 10%

* positive_part, negative_part for 2D, 1D

* simplify CSE for LLVM (ca. 2 % for TGV)

* comments on f_minus_plus etc.

* improve comment

* comments on positive/negative part

Co-authored-by: Andrew Winters <andrew.ross.winters@liu.se>
Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
ranocha added a commit that referenced this pull request Dec 2, 2022
* improve some docstrings (#1274)

* fix convergence_test for elixirs without trailing new line (#1280)

* Fix `initial_condition_isentropic_vortex` (#1279)

* strengthen the isentropic vortex initial conditions in TreeMesh elixirs

* update test values for new isentropic vortex initial conditions

* update vortex initial condition in special elixir and docs

* fix typos in the IC

* update tree_mpi test values

* remove comment lines because them seem to break literate

Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>

* improve performance of 3D volume integral with steger_warming_splitting by ca. 1/4 for Taylor-Green (serial)

* lose 2 percent of threaded volume terms performance but simplify code and reduce duplications

* remove flux_upwind stub

* 2D and 1D as well; improvement ca. 20% in 3D, 10% in 2D, not much in 1D

* positive_part, negative_part for 3D Steger-Warming; improves serial PID of TGV by ca. 10%

* positive_part, negative_part for 2D, 1D

* simplify CSE for LLVM (ca. 2 % for TGV)

* comments on f_minus_plus etc.

* improve comment

* comments on positive/negative part

Co-authored-by: Andrew Winters <andrew.ross.winters@liu.se>
Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance We are greedy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants