Proof of concept: TrixiMPIArray #1104
base: main
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1104 +/- ##
=======================================
Coverage 96.75% 96.75%
=======================================
Files 303 305 +2
Lines 23876 23931 +55
=======================================
+ Hits 23099 23153 +54
- Misses 777 778 +1
Great news that you have started thinking about an MPI array implementation for Trixi! I looked through the code and left some comments where I thought it might be helpful. Looking forward to getting something like this to work with the adaptive time integration schemes 😎
Some results from 987407e
julia --check-bounds=no --threads=2

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0))
julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 2.73s / 90.4% 23.5MiB / 97.3%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 4.24k 2.36s 95.9% 558μs 7.57MiB 33.1% 1.83KiB
volume integral 4.24k 1.94s 78.6% 458μs 1.16MiB 5.1% 288B
interface flux 4.24k 251ms 10.2% 59.2μs 1.62MiB 7.1% 400B
prolong2interfaces 4.24k 58.2ms 2.4% 13.7μs 0.97MiB 4.2% 240B
surface integral 4.24k 56.3ms 2.3% 13.3μs 1.23MiB 5.4% 304B
reset ∂u/∂t 4.24k 28.3ms 1.1% 6.68μs 0.00B 0.0% 0.00B
Jacobian 4.24k 22.3ms 0.9% 5.27μs 1.10MiB 4.8% 272B
~rhs!~ 4.24k 8.06ms 0.3% 1.90μs 1.50MiB 6.5% 370B
prolong2boundaries 4.24k 251μs 0.0% 59.2ns 0.00B 0.0% 0.00B
prolong2mortars 4.24k 177μs 0.0% 41.7ns 0.00B 0.0% 0.00B
mortar flux 4.24k 145μs 0.0% 34.3ns 0.00B 0.0% 0.00B
source terms 4.24k 91.7μs 0.0% 21.6ns 0.00B 0.0% 0.00B
boundary flux 4.24k 87.0μs 0.0% 20.5ns 0.00B 0.0% 0.00B
calculate dt 848 50.1ms 2.0% 59.0μs 0.00B 0.0% 0.00B
analyze solution 10 30.6ms 1.2% 3.06ms 174KiB 0.7% 17.4KiB
I/O 11 20.9ms 0.8% 1.90ms 15.1MiB 66.1% 1.38MiB
save solution 10 20.7ms 0.8% 2.07ms 15.1MiB 66.0% 1.51MiB
get element variables 10 97.2μs 0.0% 9.72μs 20.6KiB 0.1% 2.06KiB
~I/O~ 11 26.0μs 0.0% 2.37μs 7.20KiB 0.0% 671B
save mesh 10 785ns 0.0% 78.5ns 0.00B 0.0% 0.00B
────────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 1.43s / 81.7% 15.5MiB / 86.5%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 2.35k 1.14s 97.7% 487μs 4.20MiB 31.4% 1.83KiB
volume integral 2.35k 924ms 79.0% 394μs 660KiB 4.8% 288B
interface flux 2.35k 121ms 10.3% 51.3μs 917KiB 6.7% 400B
prolong2interfaces 2.35k 32.3ms 2.8% 13.7μs 550KiB 4.0% 240B
surface integral 2.35k 30.9ms 2.6% 13.1μs 697KiB 5.1% 304B
reset ∂u/∂t 2.35k 17.4ms 1.5% 7.42μs 0.00B 0.0% 0.00B
Jacobian 2.35k 12.9ms 1.1% 5.51μs 624KiB 4.6% 272B
~rhs!~ 2.35k 4.41ms 0.4% 1.88μs 853KiB 6.2% 372B
prolong2boundaries 2.35k 158μs 0.0% 67.3ns 0.00B 0.0% 0.00B
prolong2mortars 2.35k 104μs 0.0% 44.2ns 0.00B 0.0% 0.00B
mortar flux 2.35k 79.6μs 0.0% 33.9ns 0.00B 0.0% 0.00B
source terms 2.35k 54.2μs 0.0% 23.1ns 0.00B 0.0% 0.00B
boundary flux 2.35k 50.1μs 0.0% 21.3ns 0.00B 0.0% 0.00B
analyze solution 6 18.3ms 1.6% 3.05ms 105KiB 0.8% 17.5KiB
I/O 7 9.09ms 0.8% 1.30ms 9.08MiB 67.8% 1.30MiB
save solution 6 9.00ms 0.8% 1.50ms 9.06MiB 67.7% 1.51MiB
get element variables 6 73.3μs 0.0% 12.2μs 12.4KiB 0.1% 2.06KiB
~I/O~ 7 16.2μs 0.0% 2.31μs 5.20KiB 0.0% 761B
save mesh 6 448ns 0.0% 74.7ns 0.00B 0.0% 0.00B
────────────────────────────────────────────────────────────────────────────────────
tmpi 2 julia --check-bounds=no --threads=1

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0))
julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 2.72s / 95.4% 19.2MiB / 98.0%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 4.24k 2.49s 95.9% 588μs 3.44MiB 18.3% 852B
volume integral 4.24k 2.01s 77.4% 475μs 0.00B 0.0% 0.00B
interface flux 4.24k 277ms 10.6% 65.3μs 0.00B 0.0% 0.00B
surface integral 4.24k 55.9ms 2.1% 13.2μs 0.00B 0.0% 0.00B
prolong2interfaces 4.24k 52.2ms 2.0% 12.3μs 0.00B 0.0% 0.00B
reset ∂u/∂t 4.24k 23.0ms 0.9% 5.44μs 0.00B 0.0% 0.00B
Jacobian 4.24k 19.6ms 0.8% 4.64μs 0.00B 0.0% 0.00B
MPI interface flux 4.24k 13.6ms 0.5% 3.22μs 0.00B 0.0% 0.00B
~rhs!~ 4.24k 11.8ms 0.5% 2.79μs 1.70MiB 9.0% 420B
finish MPI receive 4.24k 11.4ms 0.4% 2.68μs 530KiB 2.8% 128B
start MPI send 4.24k 9.67ms 0.4% 2.28μs 397KiB 2.1% 96.0B
prolong2mpiinterfaces 4.24k 3.17ms 0.1% 749ns 0.00B 0.0% 0.00B
finish MPI send 4.24k 1.03ms 0.0% 243ns 596KiB 3.1% 144B
start MPI receive 4.24k 912μs 0.0% 215ns 265KiB 1.4% 64.0B
prolong2mortars 4.24k 286μs 0.0% 67.5ns 0.00B 0.0% 0.00B
prolong2boundaries 4.24k 256μs 0.0% 60.3ns 0.00B 0.0% 0.00B
MPI mortar flux 4.24k 224μs 0.0% 52.8ns 0.00B 0.0% 0.00B
prolong2mpimortars 4.24k 210μs 0.0% 49.6ns 0.00B 0.0% 0.00B
mortar flux 4.24k 148μs 0.0% 35.0ns 0.00B 0.0% 0.00B
boundary flux 4.24k 91.0μs 0.0% 21.5ns 0.00B 0.0% 0.00B
source terms 4.24k 75.2μs 0.0% 17.8ns 0.00B 0.0% 0.00B
calculate dt 848 70.5ms 2.7% 83.2μs 79.5KiB 0.4% 96.0B
analyze solution 10 22.1ms 0.9% 2.21ms 2.61MiB 13.9% 267KiB
I/O 11 14.6ms 0.6% 1.33ms 12.6MiB 67.4% 1.15MiB
save solution 10 14.4ms 0.6% 1.44ms 12.6MiB 67.2% 1.26MiB
get element variables 10 178μs 0.0% 17.8μs 23.0KiB 0.1% 2.30KiB
~I/O~ 11 21.5μs 0.0% 1.95μs 7.20KiB 0.0% 671B
save mesh 10 991ns 0.0% 99.1ns 0.00B 0.0% 0.00B
────────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 1.44s / 87.5% 12.3MiB / 90.0%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 2.35k 1.23s 98.1% 525μs 1.91MiB 17.3% 855B
volume integral 2.35k 978ms 77.7% 416μs 0.00B 0.0% 0.00B
interface flux 2.35k 135ms 10.7% 57.4μs 0.00B 0.0% 0.00B
surface integral 2.35k 31.0ms 2.5% 13.2μs 0.00B 0.0% 0.00B
prolong2interfaces 2.35k 30.3ms 2.4% 12.9μs 0.00B 0.0% 0.00B
reset ∂u/∂t 2.35k 12.6ms 1.0% 5.37μs 0.00B 0.0% 0.00B
finish MPI receive 2.35k 11.5ms 0.9% 4.90μs 294KiB 2.6% 128B
Jacobian 2.35k 11.2ms 0.9% 4.77μs 0.00B 0.0% 0.00B
MPI interface flux 2.35k 7.86ms 0.6% 3.35μs 0.00B 0.0% 0.00B
~rhs!~ 2.35k 7.16ms 0.6% 3.05μs 969KiB 8.6% 423B
start MPI send 2.35k 5.48ms 0.4% 2.33μs 220KiB 1.9% 96.0B
prolong2mpiinterfaces 2.35k 1.91ms 0.2% 813ns 0.00B 0.0% 0.00B
finish MPI send 2.35k 712μs 0.1% 303ns 330KiB 2.9% 144B
start MPI receive 2.35k 547μs 0.0% 233ns 147KiB 1.3% 64.0B
prolong2mortars 2.35k 184μs 0.0% 78.6ns 0.00B 0.0% 0.00B
prolong2mpimortars 2.35k 161μs 0.0% 68.7ns 0.00B 0.0% 0.00B
prolong2boundaries 2.35k 154μs 0.0% 65.5ns 0.00B 0.0% 0.00B
MPI mortar flux 2.35k 120μs 0.0% 51.3ns 0.00B 0.0% 0.00B
mortar flux 2.35k 109μs 0.0% 46.4ns 0.00B 0.0% 0.00B
source terms 2.35k 58.0μs 0.0% 24.7ns 0.00B 0.0% 0.00B
boundary flux 2.35k 47.8μs 0.0% 20.4ns 0.00B 0.0% 0.00B
analyze solution 6 13.3ms 1.1% 2.21ms 1.56MiB 14.1% 267KiB
I/O 7 10.8ms 0.9% 1.54ms 7.58MiB 68.6% 1.08MiB
save solution 6 10.6ms 0.8% 1.76ms 7.57MiB 68.4% 1.26MiB
get element variables 6 169μs 0.0% 28.1μs 13.8KiB 0.1% 2.30KiB
~I/O~ 7 12.8μs 0.0% 1.83μs 5.20KiB 0.0% 761B
save mesh 6 647ns 0.0% 108ns 0.00B 0.0% 0.00B
────────────────────────────────────────────────────────────────────────────────────
TL;DR: Looks reasonable.
# like regular `Array`s in most code, e.g., when looping over an array (which
# should use `eachindex`). At the same time, we want to be able to use adaptive
# time stepping using error estimates in OrdinaryDiffEq.jl. There, the default
# norm `ODE_DEFAULT_NORM` is the one described in the book of Hairer & Wanner,
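For reference, the norm in question has, schematically, the following form (my summary of OrdinaryDiffEq.jl's default; the helper name below is illustrative):

```julia
# Schematic form of the default error norm: an RMS norm whose denominator is
# the *total* number of entries - exactly the place where a global vs. local
# `length` matters for a distributed array type.
default_norm_sketch(u, t) = sqrt(sum(abs2, u) / length(u))
```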
Just thinking - can we avoid this `local_length` issue if we define a `norm` function that works in parallel? That might be an alternative to having to remember to use `local_length`.
A potential downside of `local_length` - that I just noticed - is that it allows users to create code that works in serial but may fail in spectacularly surprising ways if run in parallel. That is, if someone uses `length` where `local_length` is required, it works fine in serial but may cause weird issues in parallel (especially if running with --check-bounds=no).
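For the parallel-norm idea, a minimal sketch could look like the following (assuming MPI.jl is initialized and `u` holds only the rank-local entries; `parallel_ode_norm` is an illustrative name, not Trixi's API):

```julia
using MPI

# MPI-aware variant of the Hairer & Wanner style error norm: reduce both the
# sum of squares and the number of entries across all ranks, so every rank
# computes the same global norm.
function parallel_ode_norm(u, t)
    local_sumabs2  = sum(abs2, u)                                     # rank-local contribution
    global_sumabs2 = MPI.Allreduce(local_sumabs2, +, MPI.COMM_WORLD)  # global sum of squares
    global_length  = MPI.Allreduce(length(u), +, MPI.COMM_WORLD)      # global number of entries
    return sqrt(global_sumabs2 / global_length)
end
```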
Yeah, that's the issue of the minimally invasive approach using a global `length`. However, I would argue that users should preferably use `eachindex` in most cases, which is fine.
No, I agree - `eachindex` should be used where possible. However, it makes for difficult-to-understand errors, and the "wrong" use of `length` might be hard to spot in reviews. I suggest we continue making it work, but then we should revisit this (or at least capture it in an issue).
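To illustrate the pitfall (a sketch; it assumes an array type such as the proposed TrixiMPIArray, whose `length` is global while indexing is rank-local):

```julia
# With a distributed array whose `length` is the *global* length,
# `1:length(u)` overruns the rank-local storage; `eachindex(u)` only visits
# valid local indices and works for plain Arrays as well.
function reset_to_zero!(u)
    for i in eachindex(u)   # safe; `for i in 1:length(u)` would be a bug here
        u[i] = zero(eltype(u))
    end
    return u
end
```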
Another alternative would be to write our own norm function and pass that as `solve(ode, alg; kwargs..., internalnorm=our_new_norm_function)`. However, that requires yet another keyword argument we need to remember.
Right, especially since it would fail very late during initialization (or even worse, just hang) if forgotten. Maybe we need our own `trixi_solve` that passes some default options to OrdinaryDiffEq.jl's `solve`?
Either this, or set up some `trixi_default_kwargs()`?
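A sketch of that idea (a hypothetical helper; `ode_norm` and `ode_unstable_check` are the MPI-aware functions discussed later in this thread):

```julia
# Hypothetical convenience helper bundling the keyword arguments we would
# otherwise have to remember to pass to OrdinaryDiffEq.jl's `solve`.
trixi_default_kwargs() = (; internalnorm = ode_norm,
                            unstable_check = ode_unstable_check)

# Possible usage (splat the defaults into the solve call):
# sol = solve(ode, RDPK3SpFSAL35(); trixi_default_kwargs()...,
#             abstol=1.0e-4, reltol=1.0e-4, callback=callbacks)
```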
That might be better. We don't need to solve this right now, though, do we? Maybe we just copy the current discussion to an issue and deal with it later, once we have some more experience with the new type.
Yeah, sounds good to me. I'll leave this thread open and we can continue the discussion later (#1108).
New results from Rocinante:
julia --project=. --check-bounds=no --threads=24

julia> using Trixi, OrdinaryDiffEq

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0),
           initial_refinement_level=6, save_solution=TrivialCallback())
julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
─────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 8.31s / 44.9% 18.3MiB / 87.6%
Section ncalls time %tot avg alloc %tot avg
─────────────────────────────────────────────────────────────────────────────────
rhs! 8.77k 3.00s 80.5% 343μs 15.7MiB 98.0% 1.83KiB
volume integral 8.77k 1.59s 42.5% 181μs 2.41MiB 15.1% 288B
reset ∂u/∂t 8.77k 883ms 23.6% 101μs 0.00B 0.0% 0.00B
interface flux 8.77k 289ms 7.7% 33.0μs 3.35MiB 20.9% 400B
prolong2interfaces 8.77k 92.6ms 2.5% 10.6μs 2.01MiB 12.6% 240B
surface integral 8.77k 89.8ms 2.4% 10.2μs 2.54MiB 15.9% 304B
~rhs!~ 8.77k 32.1ms 0.9% 3.66μs 3.09MiB 19.3% 369B
Jacobian 8.77k 29.6ms 0.8% 3.38μs 2.28MiB 14.2% 272B
prolong2mortars 8.77k 473μs 0.0% 54.0ns 0.00B 0.0% 0.00B
prolong2boundaries 8.77k 469μs 0.0% 53.5ns 0.00B 0.0% 0.00B
mortar flux 8.77k 291μs 0.0% 33.2ns 0.00B 0.0% 0.00B
boundary flux 8.77k 207μs 0.0% 23.5ns 0.00B 0.0% 0.00B
source terms 8.77k 205μs 0.0% 23.4ns 0.00B 0.0% 0.00B
calculate dt 1.75k 554ms 14.8% 316μs 0.00B 0.0% 0.00B
analyze solution 19 175ms 4.7% 9.22ms 328KiB 2.0% 17.3KiB
─────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false, thread=OrdinaryDiffEq.True()), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
─────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 3.56s / 80.3% 19.6MiB / 81.5%
Section ncalls time %tot avg alloc %tot avg
─────────────────────────────────────────────────────────────────────────────────
rhs! 8.77k 2.13s 74.5% 243μs 15.7MiB 98.0% 1.83KiB
volume integral 8.77k 1.54s 53.8% 175μs 2.41MiB 15.1% 288B
interface flux 8.77k 286ms 10.0% 32.6μs 3.35MiB 20.9% 400B
prolong2interfaces 8.77k 120ms 4.2% 13.7μs 2.01MiB 12.6% 240B
surface integral 8.77k 87.9ms 3.1% 10.0μs 2.54MiB 15.9% 304B
reset ∂u/∂t 8.77k 33.9ms 1.2% 3.87μs 0.00B 0.0% 0.00B
~rhs!~ 8.77k 31.2ms 1.1% 3.55μs 3.09MiB 19.3% 369B
Jacobian 8.77k 30.8ms 1.1% 3.52μs 2.28MiB 14.2% 272B
prolong2boundaries 8.77k 486μs 0.0% 55.4ns 0.00B 0.0% 0.00B
prolong2mortars 8.77k 378μs 0.0% 43.1ns 0.00B 0.0% 0.00B
mortar flux 8.77k 288μs 0.0% 32.8ns 0.00B 0.0% 0.00B
boundary flux 8.77k 204μs 0.0% 23.2ns 0.00B 0.0% 0.00B
source terms 8.77k 199μs 0.0% 22.7ns 0.00B 0.0% 0.00B
calculate dt 1.75k 555ms 19.4% 316μs 0.00B 0.0% 0.00B
analyze solution 19 172ms 6.0% 9.07ms 328KiB 2.0% 17.3KiB
─────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
─────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 4.52s / 35.2% 16.6MiB / 51.0%
Section ncalls time %tot avg alloc %tot avg
─────────────────────────────────────────────────────────────────────────────────
rhs! 4.64k 1.49s 93.7% 322μs 8.29MiB 97.8% 1.83KiB
volume integral 4.64k 687ms 43.2% 148μs 1.27MiB 15.0% 288B
reset ∂u/∂t 4.64k 474ms 29.8% 102μs 0.00B 0.0% 0.00B
interface flux 4.64k 142ms 8.9% 30.7μs 1.77MiB 20.9% 400B
~rhs!~ 4.64k 62.2ms 3.9% 13.4μs 1.64MiB 19.3% 370B
prolong2interfaces 4.64k 54.7ms 3.4% 11.8μs 1.06MiB 12.5% 240B
surface integral 4.64k 50.0ms 3.1% 10.8μs 1.34MiB 15.9% 304B
Jacobian 4.64k 18.5ms 1.2% 4.00μs 1.20MiB 14.2% 272B
prolong2mortars 4.64k 672μs 0.0% 145ns 0.00B 0.0% 0.00B
prolong2boundaries 4.64k 520μs 0.0% 112ns 0.00B 0.0% 0.00B
mortar flux 4.64k 345μs 0.0% 74.3ns 0.00B 0.0% 0.00B
source terms 4.64k 127μs 0.0% 27.4ns 0.00B 0.0% 0.00B
boundary flux 4.64k 108μs 0.0% 23.2ns 0.00B 0.0% 0.00B
analyze solution 11 101ms 6.3% 9.19ms 189KiB 2.2% 17.2KiB
─────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, RDPK3SpFSAL35(thread=OrdinaryDiffEq.True()), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
─────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 2.57s / 44.0% 17.8MiB / 47.7%
Section ncalls time %tot avg alloc %tot avg
─────────────────────────────────────────────────────────────────────────────────
rhs! 4.64k 1.03s 91.2% 223μs 8.29MiB 97.8% 1.83KiB
volume integral 4.64k 660ms 58.2% 142μs 1.27MiB 15.0% 288B
interface flux 4.64k 142ms 12.5% 30.6μs 1.77MiB 20.9% 400B
reset ∂u/∂t 4.64k 92.8ms 8.2% 20.0μs 0.00B 0.0% 0.00B
prolong2interfaces 4.64k 62.0ms 5.5% 13.4μs 1.06MiB 12.5% 240B
surface integral 4.64k 45.4ms 4.0% 9.79μs 1.34MiB 15.9% 304B
~rhs!~ 4.64k 17.1ms 1.5% 3.69μs 1.64MiB 19.3% 370B
Jacobian 4.64k 14.4ms 1.3% 3.11μs 1.20MiB 14.2% 272B
prolong2boundaries 4.64k 238μs 0.0% 51.3ns 0.00B 0.0% 0.00B
mortar flux 4.64k 189μs 0.0% 40.8ns 0.00B 0.0% 0.00B
prolong2mortars 4.64k 183μs 0.0% 39.5ns 0.00B 0.0% 0.00B
boundary flux 4.64k 108μs 0.0% 23.2ns 0.00B 0.0% 0.00B
source terms 4.64k 105μs 0.0% 22.7ns 0.00B 0.0% 0.00B
analyze solution 11 99.4ms 8.8% 9.04ms 190KiB 2.2% 17.2KiB
─────────────────────────────────────────────────────────────────────────────────
tmpi 2 julia --project=. --check-bounds=no --threads=12

julia> using Trixi, OrdinaryDiffEq

julia> trixi_include("examples/tree_2d_dgsem/elixir_euler_ec.jl", tspan=(0.0, 10.0),
           initial_refinement_level=6, save_solution=TrivialCallback())
julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 5.61s / 58.3% 46.1MiB / 97.2%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 8.77k 2.84s 86.7% 323μs 25.4MiB 56.7% 2.97KiB
volume integral 8.77k 1.54s 47.2% 176μs 2.81MiB 6.3% 336B
reset ∂u/∂t 8.77k 415ms 12.7% 47.4μs 0.00B 0.0% 0.00B
interface flux 8.77k 282ms 8.6% 32.2μs 3.35MiB 7.5% 400B
finish MPI receive 8.77k 194ms 5.9% 22.1μs 1.07MiB 2.4% 128B
surface integral 8.77k 94.0ms 2.9% 10.7μs 2.54MiB 5.7% 304B
start MPI send 8.77k 93.2ms 2.8% 10.6μs 822KiB 1.8% 96.0B
prolong2interfaces 8.77k 85.0ms 2.6% 9.69μs 2.01MiB 4.5% 240B
~rhs!~ 8.77k 33.6ms 1.0% 3.83μs 3.49MiB 7.8% 418B
MPI interface flux 8.77k 31.2ms 1.0% 3.55μs 3.35MiB 7.5% 400B
Jacobian 8.77k 29.2ms 0.9% 3.33μs 2.41MiB 5.4% 288B
prolong2mpiinterfaces 8.77k 27.1ms 0.8% 3.09μs 1.87MiB 4.2% 224B
finish MPI send 8.77k 2.14ms 0.1% 244ns 1.20MiB 2.7% 144B
start MPI receive 8.77k 1.89ms 0.1% 216ns 548KiB 1.2% 64.0B
prolong2boundaries 8.77k 547μs 0.0% 62.4ns 0.00B 0.0% 0.00B
prolong2mpimortars 8.77k 401μs 0.0% 45.7ns 0.00B 0.0% 0.00B
prolong2mortars 8.77k 388μs 0.0% 44.3ns 0.00B 0.0% 0.00B
MPI mortar flux 8.77k 368μs 0.0% 42.0ns 0.00B 0.0% 0.00B
mortar flux 8.77k 287μs 0.0% 32.7ns 0.00B 0.0% 0.00B
source terms 8.77k 203μs 0.0% 23.2ns 0.00B 0.0% 0.00B
boundary flux 8.77k 201μs 0.0% 22.9ns 0.00B 0.0% 0.00B
calculate dt 1.75k 335ms 10.2% 191μs 165KiB 0.4% 96.0B
analyze solution 19 101ms 3.1% 5.29ms 19.2MiB 42.9% 1.01MiB
────────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, CarpenterKennedy2N54(williamson_condition=false, thread=OrdinaryDiffEq.True()), dt=1.0, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 3.30s / 82.8% 48.5MiB / 92.5%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 8.77k 2.33s 85.5% 266μs 25.4MiB 56.7% 2.97KiB
volume integral 8.77k 1.53s 56.2% 175μs 2.81MiB 6.3% 336B
interface flux 8.77k 297ms 10.9% 33.9μs 3.35MiB 7.5% 400B
prolong2interfaces 8.77k 105ms 3.9% 12.0μs 2.01MiB 4.5% 240B
finish MPI receive 8.77k 98.0ms 3.6% 11.2μs 1.07MiB 2.4% 128B
surface integral 8.77k 86.5ms 3.2% 9.87μs 2.54MiB 5.7% 304B
start MPI send 8.77k 62.5ms 2.3% 7.12μs 822KiB 1.8% 96.0B
~rhs!~ 8.77k 33.9ms 1.2% 3.86μs 3.49MiB 7.8% 418B
MPI interface flux 8.77k 33.0ms 1.2% 3.77μs 3.35MiB 7.5% 400B
Jacobian 8.77k 28.2ms 1.0% 3.22μs 2.41MiB 5.4% 288B
reset ∂u/∂t 8.77k 27.1ms 1.0% 3.09μs 0.00B 0.0% 0.00B
prolong2mpiinterfaces 8.77k 20.6ms 0.8% 2.35μs 1.87MiB 4.2% 224B
finish MPI send 8.77k 2.44ms 0.1% 279ns 1.20MiB 2.7% 144B
start MPI receive 8.77k 1.81ms 0.1% 207ns 548KiB 1.2% 64.0B
prolong2boundaries 8.77k 404μs 0.0% 46.0ns 0.00B 0.0% 0.00B
prolong2mortars 8.77k 380μs 0.0% 43.3ns 0.00B 0.0% 0.00B
MPI mortar flux 8.77k 341μs 0.0% 38.9ns 0.00B 0.0% 0.00B
prolong2mpimortars 8.77k 341μs 0.0% 38.8ns 0.00B 0.0% 0.00B
mortar flux 8.77k 250μs 0.0% 28.5ns 0.00B 0.0% 0.00B
source terms 8.77k 203μs 0.0% 23.1ns 0.00B 0.0% 0.00B
boundary flux 8.77k 201μs 0.0% 22.9ns 0.00B 0.0% 0.00B
calculate dt 1.75k 295ms 10.8% 168μs 165KiB 0.4% 96.0B
analyze solution 19 101ms 3.7% 5.32ms 19.2MiB 42.9% 1.01MiB
────────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, RDPK3SpFSAL35(), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 3.01s / 45.0% 29.0MiB / 84.7%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 4.64k 1.30s 95.7% 280μs 13.5MiB 54.7% 2.97KiB
volume integral 4.64k 677ms 50.0% 146μs 1.49MiB 6.0% 336B
reset ∂u/∂t 4.64k 233ms 17.2% 50.3μs 0.00B 0.0% 0.00B
interface flux 4.64k 137ms 10.1% 29.6μs 1.77MiB 7.2% 400B
surface integral 4.64k 47.6ms 3.5% 10.3μs 1.34MiB 5.5% 304B
finish MPI receive 4.64k 47.0ms 3.5% 10.1μs 580KiB 2.3% 128B
prolong2interfaces 4.64k 45.1ms 3.3% 9.72μs 1.06MiB 4.3% 240B
start MPI send 4.64k 44.5ms 3.3% 9.60μs 435KiB 1.7% 96.0B
~rhs!~ 4.64k 18.2ms 1.3% 3.92μs 1.85MiB 7.5% 419B
MPI interface flux 4.64k 15.9ms 1.2% 3.43μs 1.77MiB 7.2% 400B
Jacobian 4.64k 15.1ms 1.1% 3.26μs 1.27MiB 5.2% 288B
prolong2mpiinterfaces 4.64k 13.1ms 1.0% 2.82μs 0.99MiB 4.0% 224B
start MPI receive 4.64k 1.09ms 0.1% 235ns 290KiB 1.2% 64.0B
finish MPI send 4.64k 982μs 0.1% 212ns 652KiB 2.6% 144B
prolong2boundaries 4.64k 284μs 0.0% 61.3ns 0.00B 0.0% 0.00B
prolong2mpimortars 4.64k 238μs 0.0% 51.2ns 0.00B 0.0% 0.00B
prolong2mortars 4.64k 217μs 0.0% 46.8ns 0.00B 0.0% 0.00B
MPI mortar flux 4.64k 196μs 0.0% 42.4ns 0.00B 0.0% 0.00B
mortar flux 4.64k 140μs 0.0% 30.1ns 0.00B 0.0% 0.00B
boundary flux 4.64k 118μs 0.0% 25.4ns 0.00B 0.0% 0.00B
source terms 4.64k 112μs 0.0% 24.0ns 0.00B 0.0% 0.00B
analyze solution 11 57.7ms 4.3% 5.25ms 11.1MiB 45.3% 1.01MiB
────────────────────────────────────────────────────────────────────────────────────
julia> sol = solve(ode, RDPK3SpFSAL35(thread=OrdinaryDiffEq.True()), abstol=1.0e-4, reltol=1.0e-4, save_everystep=false, callback=callbacks); summary_callback()
────────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 2.17s / 55.6% 31.0MiB / 79.2%
Section ncalls time %tot avg alloc %tot avg
────────────────────────────────────────────────────────────────────────────────────
rhs! 4.64k 1.11s 92.2% 240μs 13.5MiB 54.7% 2.97KiB
volume integral 4.64k 662ms 54.9% 143μs 1.49MiB 6.0% 336B
interface flux 4.64k 135ms 11.2% 29.1μs 1.77MiB 7.2% 400B
finish MPI receive 4.64k 57.1ms 4.7% 12.3μs 580KiB 2.3% 128B
reset ∂u/∂t 4.64k 56.8ms 4.7% 12.2μs 0.00B 0.0% 0.00B
prolong2interfaces 4.64k 56.7ms 4.7% 12.2μs 1.06MiB 4.3% 240B
surface integral 4.64k 48.3ms 4.0% 10.4μs 1.34MiB 5.5% 304B
start MPI send 4.64k 32.3ms 2.7% 6.97μs 435KiB 1.7% 96.0B
~rhs!~ 4.64k 17.5ms 1.5% 3.78μs 1.85MiB 7.5% 419B
MPI interface flux 4.64k 15.7ms 1.3% 3.38μs 1.77MiB 7.2% 400B
Jacobian 4.64k 15.2ms 1.3% 3.28μs 1.27MiB 5.2% 288B
prolong2mpiinterfaces 4.64k 11.2ms 0.9% 2.42μs 0.99MiB 4.0% 224B
finish MPI send 4.64k 1.35ms 0.1% 292ns 652KiB 2.6% 144B
start MPI receive 4.64k 919μs 0.1% 198ns 290KiB 1.2% 64.0B
prolong2boundaries 4.64k 226μs 0.0% 48.7ns 0.00B 0.0% 0.00B
prolong2mpimortars 4.64k 215μs 0.0% 46.4ns 0.00B 0.0% 0.00B
prolong2mortars 4.64k 203μs 0.0% 43.7ns 0.00B 0.0% 0.00B
MPI mortar flux 4.64k 199μs 0.0% 42.9ns 0.00B 0.0% 0.00B
mortar flux 4.64k 137μs 0.0% 29.6ns 0.00B 0.0% 0.00B
source terms 4.64k 111μs 0.0% 24.0ns 0.00B 0.0% 0.00B
boundary flux 4.64k 106μs 0.0% 22.8ns 0.00B 0.0% 0.00B
analyze solution 11 93.9ms 7.8% 8.54ms 11.1MiB 45.3% 1.01MiB
────────────────────────────────────────────────────────────────────────────────────
Looks okay, doesn't it? In particular, using multi-threading for the RK solver itself seems to have a noticeable effect.
As said before, great work, and thanks for pushing this! I have left a few remarks and suggestions; please ping me if anything is unclear.
Yes, it looks OK, although it's not yet clear what the performance impact really is (hard to tell with such a small problem size) or whether it makes more sense to use more threads or more ranks. Then again, this is often hardware-dependent...
Positive: now everything that was "weirdly" broken passes. Negative: macOS tests are still hanging...
Yeah... but I can't really debug the macOS part (since I don't have a Mac).
Could you see which test is the issue? If yes, we can try disabling it to check whether it's an isolated issue or a general problem. Although we should try to find the root cause either way.
Looks like it's ...
@andrewwinters5000 It would be great if you could try to reproduce this issue.
I got rid of the global ...
MPI tests pass 🥳

sol = solve(ode, alg; kwargs..., internalnorm=ode_norm, unstable_check=ode_unstable_check)

We should probably make it easier to use all this, but it seems to be working.
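For context, a sketch of what the `unstable_check` part could look like (an assumption based on this discussion; the reduced version actually landed in #1113):

```julia
# Avoid the default `any(isnan, u)` instability check, which would require a
# global reduction for a distributed `u`; `dt` is already the same on all
# ranks, so checking it is cheap and rank-local.
ode_unstable_check(dt, u, semi, t) = isnan(dt)
```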
Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
LGTM from my side - great work!
This is a rough draft of a possible MPI array type. A lot of TODO notes are left in the draft at the moment.
Partially implemented in a reduced version (only `ode_norm` and `ode_unstable_check`) in #1113. We will use this reduced version for now and see how it works in the wild.

TODO:
- Reductions (e.g., `sum`) - add to the docstring, or test whether we could also just use local `mapreduce` and parallel `ode_norm`?
- MPI utilities (`mpi_parallel` and `mpi_isparallel`)

Closes #329; closes #339