Optimized `stolarsky_mean` by MarcoArtiano · Pull Request #2274 · trixi-framework/Trixi.jl

MarcoArtiano · 2025-02-10T14:35:48Z

The Stolarsky mean will come in handy in Trixi Atmo. Here is a faster version:

julia> @inline function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               return (gamma - 1) / gamma * (y^gamma - x^gamma) /
                      (y^(gamma - 1) - x^(gamma - 1))
           end
       end
stolarsky_mean (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expy = exp(gamma*log(y))
               expx = exp(gamma*log(x))
               return (gamma - 1) / gamma * (expy - expx) /
                      (expy/y - expx/x)
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  45.982 ns … 61.638 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.163 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.301 ns ±  0.665 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▅██▅             ▁▁                                        ▂
  █████▇▆▅▄▁▃▃▄▄▄▁▄▇█████▇▇▆▆▄▆▅▅▄▅▄▅▅▇▇▇▆▆▄▅▅▅▄▃▅▆▄▅▃▃▄▆▆█▇▇ █
  46 ns        Histogram: log(frequency) by time      49.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.342 ns … 630.474 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.566 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.007 ns ±   6.343 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█    ▂▂▄                                                    ▁
  ██▇▇▇█████▇▇▇█▇▄▅▄▁▃▁▄▃▁▄▄▄▁▄▁▃▃▁▁▃▄▃▃▄▄▄▃▄▅▅▅▅▅▅▆▆▆▆▅▅▆▆▅▅▅ █
  19.3 ns       Histogram: log(frequency) by time      30.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
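
As a quick sanity check of the rewrite benchmarked above, here is a minimal sketch (not part of the PR) that compares only the `else` branch of the two versions. The helper names `stolarsky_direct` and `stolarsky_explog` are hypothetical and chosen for illustration. It relies on the identities y^gamma = exp(gamma * log(y)) and y^(gamma - 1) = y^gamma / y for positive y.

```julia
# Direct form, as in the `else` branch of the original `stolarsky_mean`
# (four power evaluations).
stolarsky_direct(x, y, gamma) =
    (gamma - 1) / gamma * (y^gamma - x^gamma) / (y^(gamma - 1) - x^(gamma - 1))

# Rewritten form, as in `stolarsky_mean_2`: evaluate each gamma-th power once
# via exp/log and obtain the (gamma - 1)-th power with a division.
# Hypothetical helper for illustration; assumes x > 0 and y > 0.
function stolarsky_explog(x, y, gamma)
    expy = exp(gamma * log(y))
    expx = exp(gamma * log(x))
    return (gamma - 1) / gamma * (expy - expx) / (expy / y - expx / x)
end

x, y, gamma = 300.1, 410.7, 1.4
a = stolarsky_direct(x, y, gamma)
b = stolarsky_explog(x, y, gamma)

# Both forms should agree up to floating-point rounding.
@assert isapprox(a, b; rtol = 1e-10)
println(a, "  ", b)
```

The two results are only expected to agree up to rounding, not bitwise, since the operations are carried out in a different order.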

Simulation of EC Polytropic Euler with the previous stolarsky mean:

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              1.98s /  99.3%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    1.93s   98.1%  1.36ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    1.74s   88.4%  1.22ms     0.00B    0.0%    0.00B
  interface flux           1.42k    163ms    8.3%   115μs     0.00B    0.0%    0.00B
  surface integral         1.42k   15.3ms    0.8%  10.8μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   8.10ms    0.4%  5.70μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   3.72ms    0.2%  2.62μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k   1.00ms    0.1%   705ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   46.3μs    0.0%  32.6ns     0.00B    0.0%    0.00B
  source terms             1.42k   45.5μs    0.0%  32.0ns     0.00B    0.0%    0.00B
calculate dt                 285   26.2ms    1.3%  92.0μs     0.00B    0.0%    0.00B
analyze solution               4   7.33ms    0.4%  1.83ms    314KiB   14.5%  78.5KiB
I/O                            5   4.19ms    0.2%   838μs   1.81MiB   85.3%   370KiB
  save solution                4   3.82ms    0.2%   956μs   1.80MiB   84.9%   460KiB
  ~I/O~                        5    365μs    0.0%  73.1μs   8.83KiB    0.4%  1.77KiB
  get element variables        4    730ns    0.0%   182ns     0.00B    0.0%    0.00B
  save mesh                    4    608ns    0.0%   152ns     0.00B    0.0%    0.00B
  get node variables           4   95.0ns    0.0%  23.8ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────

Results for the optimized version:

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              1.62s /  99.1%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    1.57s   97.7%  1.11ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    1.41s   87.8%   994μs     0.00B    0.0%    0.00B
  interface flux           1.42k    131ms    8.2%  92.4μs     0.00B    0.0%    0.00B
  surface integral         1.42k   14.9ms    0.9%  10.5μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   7.29ms    0.5%  5.13μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   3.83ms    0.2%  2.70μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k    956μs    0.1%   673ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   61.6μs    0.0%  43.3ns     0.00B    0.0%    0.00B
  source terms             1.42k   24.5μs    0.0%  17.3ns     0.00B    0.0%    0.00B
calculate dt                 285   26.0ms    1.6%  91.2μs     0.00B    0.0%    0.00B
analyze solution               4   6.36ms    0.4%  1.59ms    314KiB   14.5%  78.6KiB
I/O                            5   5.35ms    0.3%  1.07ms   1.81MiB   85.3%   370KiB
  save solution                4   5.00ms    0.3%  1.25ms   1.80MiB   84.9%   461KiB
  ~I/O~                        5    352μs    0.0%  70.5μs   8.83KiB    0.4%  1.77KiB
  save mesh                    4    755ns    0.0%   189ns     0.00B    0.0%    0.00B
  get element variables        4    477ns    0.0%   119ns     0.00B    0.0%    0.00B
  get node variables           4   86.0ns    0.0%  21.5ns     0.00B    0.0%    0.00B
────────────────────────────────────────────────────────────────────────────────────

Thus on my machine, there's an 18% improvement.

github-actions · 2025-02-10T14:36:02Z

codecov · 2025-02-10T14:59:25Z

Codecov Report

❌ Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.80%. Comparing base (f4bbcd9) to head (24b441c).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/auxiliary/math.jl	75.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2274      +/-   ##
==========================================
- Coverage   96.80%   96.80%   -0.00%     
==========================================
  Files         528      528              
  Lines       42655    42660       +5     
==========================================
+ Hits        41292    41295       +3     
- Misses       1363     1365       +2

Flag	Coverage Δ
unittests	`96.80% <75.00%> (-<0.01%)`	⬇️

andrewwinters5000

This is an interesting (and ingenious) way to rewrite the expression and possibly improve performance. Would it be worthwhile to also benchmark it on Roci (or some other machine) just to see its influence?

src/auxiliary/math.jl

Co-authored-by: Andrew Winters <andrew.ross.winters@liu.se>

ranocha

Thanks!

src/auxiliary/math.jl

MarcoArtiano · 2025-02-11T10:11:56Z

Hendrik made me realize that for integer exponents the exp(log(...)) form is slower. I also tried the trick to avoid the division, but that doesn't actually change anything. I added a specialization for integers, and the results are the following.

Functions:

julia> @inline function stolarsky_mean(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               return (gamma - 1) / gamma * (y^gamma - x^gamma) /
                      (y^(gamma - 1) - x^(gamma - 1))
           end
       end
stolarsky_mean (generic function with 1 method)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expx = x^(gamma-1)
               expy = y^(gamma-1)
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_2 (generic function with 1 method)

julia> @inline function stolarsky_mean_3(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               expy = exp((gamma-1)*log(y))
               expx = exp((gamma-1)*log(x))
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_3 (generic function with 1 method)

julia> @inline function stolarsky_mean_4(x::RealT, y::RealT, gamma::RealT) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if isinteger(gamma)
                   expy = y^(gamma-1)
                   expx = x^(gamma-1)
               else
                   expy = exp((gamma-1)*log(y))
                   expx = exp((gamma-1)*log(x))
               end
               return (gamma - 1) / gamma * (expy*y - expx*x) /
                      (expy - expx)
           end
       end
stolarsky_mean_4 (generic function with 1 method)

For real numbers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
 Range (min … max):  45.981 ns … 56.275 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.167 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.241 ns ±  0.440 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▄▆██▇▄                      ▁▁▂▁                          ▂
  ▆███████▇▅▅▄▁▁▃▁▃▁▁▃▁▃▃▁▁▁▁▄▅▇████▇▇▆▆▆▄▅▅▅▅▄▅▄▃▄▄▁▄▃▄▃▁▃▃▄ █
  46 ns        Histogram: log(frequency) by time      48.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  23.530 ns … 42.782 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.656 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.853 ns ±  0.654 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅█▇▂                                 ▁▃▁                   ▁
  ▆████▆▅▅▅▄▄▄▅▄▄▅▄▅▄▅▄▅▄▃▅▇███▇▇▆▆▆▅▅▆▆████▇▆▅▅▄▃▅▅▅▄▄▄▄▅▃▅▅ █
  23.5 ns      Histogram: log(frequency) by time      26.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.433 ns … 25.629 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.512 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.601 ns ±  0.378 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▇█▅                                     ▁▁                 ▂
  █████▄▅▅▅▅▅▅▅▅▃▄▅▄▅▄▃▄▆▄▅▅▄▄▅▅▇▇███▆▇▆▆▆▇███▇▆▅▅▄▄▃▁▄▅▄▃▆▅▄ █
  19.4 ns      Histogram: log(frequency) by time      21.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.665 ns … 34.485 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.760 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.803 ns ±  0.360 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂█▇                                                        
  ▂▄████▃▂▂▂▁▂▁▁▁▁▁▁▂▂▁▂▂▁▁▁▁▁▁▂▁▁▁▂▁▁▁▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂ ▂
  19.7 ns         Histogram: frequency by time        21.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

For integers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  16.133 ns … 30.958 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.224 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.084 ns ±  0.718 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▁  ██       ▃▄   ▄▁  ██ ▁ ▂   ▂▂   ▂   ▅▆   ▃   ▂▃         ▃
  ██▁▁███▅▄▆▃▃▁██▄▄▁██▅█████▇██▇▆██▆▄▅█▇▅▆██▇▇███▆▆██▆▅▅▇▅▅▅█ █
  16.1 ns      Histogram: log(frequency) by time        19 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  7.857 ns … 22.603 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.162 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.107 ns ±  0.444 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▁        ▁█▄▄  ▂   ▁█▄  ▄▃▂  ▁▆   ▂▃         ▁▁  ▂
  ▇▆▁▁▃▆▃▁▁▁█▄▄▆▅▇▄▃▅████▆▇█▆▄▅███▆▆███▆▇██▅▅▄███▇▆▆▇▆▆▅▇██▇ █
  7.86 ns      Histogram: log(frequency) by time     10.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 997 evaluations per sample.
 Range (min … max):  19.421 ns … 25.844 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.508 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.540 ns ±  0.250 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▅█▁                                                       
  ▂▃▇███▄▂▂▂▂▁▂▁▁▁▂▁▁▂▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▂▂▁▁▂▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂ ▂
  19.4 ns         Histogram: frequency by time        20.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  7.203 ns … 30.064 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.531 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.757 ns ±  0.461 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▂▁     ▁▁       █▇   ▂   █▆ ▁▂▁   ▅   ▂          ▂
  ▄▁▁▃▁▁▄▁▇████▄▃▁▄██▇▅▆█▅▄▅██▇▆▆█▅▆▅██▆███▆▅▅██▆▆█▇▆▅▆▆▇▆▇█ █
  7.2 ns       Histogram: log(frequency) by time       10 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So, basically, just by computing each power once and reusing it (y^gamma = y^(gamma-1) * y), a roughly 50% speed-up is gained. The exp(log(...)) form is a small additional improvement compared to that. Hendrik pointed out that for real numbers Julia is actually calling exactly that exp(log(...)). I also noticed that in newer versions of Julia the difference between x^a and exp(a*log(x)) is less noticeable.
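
The reuse of the power evaluations described here can be sketched as follows. This is an illustration only, using the same inputs as the benchmarks. The variable names `num_direct`, `den_direct`, `num_reused`, and `den_reused` are hypothetical.

```julia
x, y, gamma = 300.1, 410.7, 1.4

# Four power evaluations, as in the original closed form.
num_direct = y^gamma - x^gamma
den_direct = y^(gamma - 1) - x^(gamma - 1)

# Two power evaluations: compute the (gamma - 1)-th powers once and recover
# the gamma-th powers with one multiplication each.
yg = y^(gamma - 1)
xg = x^(gamma - 1)
num_reused = yg * y - xg * x
den_reused = yg - xg

# Numerator and denominator agree up to floating-point rounding.
@assert isapprox(num_direct, num_reused; rtol = 1e-10)
@assert isapprox(den_direct, den_reused; rtol = 1e-10)
```

Halving the number of power evaluations is consistent with the roughly 50% reduction seen between `stolarsky_mean` and `stolarsky_mean_2` in the benchmarks above.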

MarcoArtiano · 2025-02-11T10:42:01Z

Results on university machine (Goldstein):

new version:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 2.0  Time steps: 284 (accepted), 284 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              3.07s /  99.5%           2.35MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    2.93s   95.8%  2.06ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    2.65s   86.7%  1.86ms     0.00B    0.0%    0.00B
  interface flux           1.42k    236ms    7.7%   166μs     0.00B    0.0%    0.00B
  surface integral         1.42k   24.7ms    0.8%  17.4μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   12.2ms    0.4%  8.56μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   5.15ms    0.2%  3.63μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k    863μs    0.0%   607ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   33.6μs    0.0%  23.7ns     0.00B    0.0%    0.00B
  source terms             1.42k   30.3μs    0.0%  21.3ns     0.00B    0.0%    0.00B
calculate dt                 285   76.0ms    2.5%   267μs     0.00B    0.0%    0.00B
I/O                            5   40.0ms    1.3%  8.01ms   1.81MiB   85.2%   370KiB
  save solution                4   33.5ms    1.1%  8.37ms   1.80MiB   84.8%   461KiB
  ~I/O~                        5   6.57ms    0.2%  1.31ms   8.83KiB    0.4%  1.77KiB
  get element variables        4    744ns    0.0%   186ns     0.00B    0.0%    0.00B
  save mesh                    4    605ns    0.0%   151ns     0.00B    0.0%    0.00B
  get node variables           4    371ns    0.0%  92.8ns     0.00B    0.0%    0.00B
analyze solution               4   11.3ms    0.4%  2.83ms    316KiB   14.5%  79.0KiB
────────────────────────────────────────────────────────────────────────────────────

old version:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 2.0  Time steps: 284 (accepted), 284 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────
             Trixi.jl                      Time                    Allocations      
                                  ───────────────────────   ────────────────────────
        Tot / % measured:              3.69s /  99.5%           2.34MiB /  90.4%    

Section                   ncalls     time    %tot     avg     alloc    %tot      avg
────────────────────────────────────────────────────────────────────────────────────
rhs!                       1.42k    3.54s   96.4%  2.49ms   5.14KiB    0.2%    3.70B
  volume integral          1.42k    3.20s   87.2%  2.25ms     0.00B    0.0%    0.00B
  interface flux           1.42k    298ms    8.1%   209μs     0.00B    0.0%    0.00B
  surface integral         1.42k   24.6ms    0.7%  17.3μs     0.00B    0.0%    0.00B
  Jacobian                 1.42k   12.3ms    0.3%  8.63μs     0.00B    0.0%    0.00B
  reset ∂u/∂t              1.42k   5.23ms    0.1%  3.68μs     0.00B    0.0%    0.00B
  ~rhs!~                   1.42k   1.01ms    0.0%   714ns   5.14KiB    0.2%    3.70B
  boundary flux            1.42k   35.8μs    0.0%  25.2ns     0.00B    0.0%    0.00B
  source terms             1.42k   30.8μs    0.0%  21.7ns     0.00B    0.0%    0.00B
calculate dt                 285   75.9ms    2.1%   266μs     0.00B    0.0%    0.00B
I/O                            5   42.0ms    1.1%  8.40ms   1.81MiB   85.3%   370KiB
  save solution                4   35.3ms    1.0%  8.82ms   1.80MiB   84.9%   461KiB
  ~I/O~                        5   6.69ms    0.2%  1.34ms   8.83KiB    0.4%  1.77KiB
  get element variables        4    559ns    0.0%   140ns     0.00B    0.0%    0.00B
  save mesh                    4    426ns    0.0%   106ns     0.00B    0.0%    0.00B
  get node variables           4    222ns    0.0%  55.5ns     0.00B    0.0%    0.00B
analyze solution               4   13.2ms    0.4%  3.30ms    314KiB   14.5%  78.6KiB
────────────────────────────────────────────────────────────────────────────────────

There's still a roughly 17% improvement.
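
For reference, the quoted percentages follow from the total wall-clock times in the timer outputs above (1.98 s vs. 1.62 s on the first machine, 3.69 s vs. 3.07 s on Goldstein). A minimal sketch of that arithmetic, with a hypothetical helper `relative_improvement`:

```julia
# Relative reduction in run time when going from t_old to t_new.
relative_improvement(t_old, t_new) = (t_old - t_new) / t_old

first_machine = relative_improvement(1.98, 1.62)  # totals from the first pair of timer outputs
goldstein = relative_improvement(3.69, 3.07)      # totals from the Goldstein timer outputs

println(round(100 * first_machine; digits = 1), " %")
println(round(100 * goldstein; digits = 1), " %")
```

The whole-simulation gain is smaller than the gain in the isolated micro-benchmarks because `stolarsky_mean` is only one part of the work done in `rhs!`.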

Benchmarks for Goldstein:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 960 evaluations per sample.
 Range (min … max):  87.431 ns …  1.192 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     88.289 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   88.624 ns ± 13.750 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁▅▅▃▇█▇▅▆▅▅▆▅▄▁                                      
  ▁▂▃▅▆▇██████████████████▅▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  87.4 ns         Histogram: frequency by time        90.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 976 evaluations per sample.
 Range (min … max):  44.864 ns …  1.349 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     45.267 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.460 ns ± 13.072 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▁▅▄▅▅▆▇█▆▄▄▄▃▁                                            
  ▂▃▅██████████████▇▅▄▃▂▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▂▁▁▂▂▂▂ ▄
  44.9 ns         Histogram: frequency by time        47.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  37.400 ns …  1.062 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.678 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.034 ns ± 10.298 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂█▅                                                          
  ███▇▆▆▄▄▃▃▃▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁▂▂▂ ▃
  37.4 ns         Histogram: frequency by time        44.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.4))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  38.200 ns … 78.343 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     38.464 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   38.558 ns ±  0.867 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▂▅██▆▄▃▂▁▂▁                                             
  ▁▂▄▇████████████▇█▇▆▅▅▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  38.2 ns         Histogram: frequency by time        39.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Integers:

julia> @benchmark value = stolarsky_mean($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  24.793 ns … 69.045 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.805 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.874 ns ±  0.858 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▇▆▂                       ▂▃▁▁                            ▂
  ██████▇▇▅▄▃▃▁▄▄▃▁▃▁▁▁▁▃▁▁▁▁▁████▇▁▁▁▁▁▁▁▁▃▁▃▁▁▅▅▁▃▁▄▃▁▄▁▇█▇ █
  24.8 ns      Histogram: log(frequency) by time      25.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  11.524 ns … 45.145 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.833 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.931 ns ±  0.569 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █             ▂                            ▁                 
  █▇▄▂▂▂▂▂▂▁▁▁▂▁█▅▃▂▂▂▁▂▂▂▂▂▂▂██▅▂▂▂▂▂▂▁▂▂▂▂▂█▄▂▂▂▂▂▂▂▂▂▂▂▁▃▂ ▃
  11.5 ns         Histogram: frequency by time        12.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_3($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 992 evaluations per sample.
 Range (min … max):  37.260 ns …  1.088 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.533 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.786 ns ± 10.568 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂██▇████▆▅▃▂                                               
  ▂▅████████████▇▇▅▆▅▅▅▆▆▅▄▅▄▄▄▄▃▃▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁ ▄
  37.3 ns         Histogram: frequency by time        38.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_4($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
 Range (min … max):  11.528 ns … 109.493 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.570 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.817 ns ±   1.132 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▃▂▂▁▁ ▁  ▁          ▆▆▄          ▁          ▁           ▃ ▂
  ███████████████▆▅▆▆▇▆▆▆████▆▅▃▆▅▅▅▁▇█▆▃▄▄▁▃▁▁▄▁██▅▃▃▁▃▁▁▄▁▄█ █
  11.5 ns       Histogram: log(frequency) by time        13 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

src/auxiliary/math.jl

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

MarcoArtiano · 2025-10-08T08:21:52Z

julia> @inline stolarsky_mean_1(x::Real, y::Real, gamma::Real) = stolarsky_mean_1(promote(x, y)..., gamma)
stolarsky_mean_1 (generic function with 2 methods)

julia> @inline function stolarsky_mean_1(x::RealT, y::RealT, gamma::Real) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y)) # equivalent to y^(gamma - 1) but faster for non-integers
                   xg = exp((gamma - 1) * log(x)) # equivalent to x^(gamma - 1) but faster for non-integers
               end
               return (gamma - 1) * (yg * y - xg * x) / (gamma * (yg - xg))
           end
       end
stolarsky_mean_1 (generic function with 2 methods)

julia> @inline stolarsky_mean_2(x::Real, y::Real, gamma::Real) = stolarsky_mean_2(promote(x, y)..., gamma)
stolarsky_mean_2 (generic function with 2 methods)

julia> @inline function stolarsky_mean_2(x::RealT, y::RealT, gamma::Real) where {RealT <: Real}
           epsilon_f2 = convert(RealT, 1.0e-4)
           f2 = (x * (x - 2 * y) + y * y) / (x * (x + 2 * y) + y * y) # f2 = f^2
           if f2 < epsilon_f2
               # convenience coefficients
               c1 = convert(RealT, 1 / 3) * (gamma - 2)
               c2 = convert(RealT, -1 / 15) * (gamma + 1) * (gamma - 3) * c1
               c3 = convert(RealT, -1 / 21) * (2 * gamma * (gamma - 2) - 9) * c2
               return 0.5f0 * (x + y) * @evalpoly(f2, 1, c1, c2, c3)
           else
               if gamma isa Integer
                   yg = y^(gamma - 1)
                   xg = x^(gamma - 1)
               else
                   yg = exp((gamma - 1) * log(y)) # equivalent to y^(gamma - 1) but faster for non-integers
                   xg = exp((gamma - 1) * log(x)) # equivalent to x^(gamma - 1) but faster for non-integers
               end
               return (gamma - 1) / gamma * (yg * y - xg * x) / (yg - xg)
           end
       end
stolarsky_mean_2 (generic function with 2 methods)

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])

BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  22.289 ns … 34.858 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.791 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.045 ns ±  1.610 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▄▆██▃▂▂▂ ▁▂▁▂▁ ▁▁▂ ▂▁▂▂▂ ▁▂▁▂ ▂▂▂ ▂▅▅█▃                    ▂
  ▃██████████████▇████████████████████████▅▃▁▆▅▆▇▆▄▃▅▅▅▃▁▅▃▅▅ █
  22.3 ns      Histogram: log(frequency) by time        28 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.0))[])

BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  22.368 ns … 32.981 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.550 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.639 ns ±  0.466 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃██▆▆▃▁                                                    ▂
  ▅███████▃▁▁▁▁▄▆█▆▆▄▃▁▁▁▁▃▄▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▆█▇▇▇▆▄▄▃▃▄▅ █
  22.4 ns      Histogram: log(frequency) by time      25.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_1($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])

BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  22.409 ns … 320.979 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.590 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.693 ns ±   3.015 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ██                                                         
  ▆▄▇██▇█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂ ▃
  22.4 ns         Histogram: frequency by time         25.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark value = stolarsky_mean_2($(Ref(300.1))[], $(Ref(410.7))[], $(Ref(1.7))[])

BenchmarkTools.Trial: 10000 samples with 996 evaluations per sample.
 Range (min … max):  22.430 ns … 318.380 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.560 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.678 ns ±   3.001 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▆█▆▅▂                                                       ▂
  ███████▅▃▁▁▁▄▅█▇▅▅▄▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▃▃▃▃▃▁▄▁▁▁▁▁▁▆▇▇▇▇▅▄▄▅▄▁▅▇ █
  22.4 ns       Histogram: log(frequency) by time      25.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

There are no major differences between these two versions. Judging by the median, the second one is slightly faster, so I chose it.
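As a sanity check on the rewrite, here is a minimal sketch comparing the plain-power closed form with the `exp`/`log` variant away from the `x ≈ y` series branch. The helper names `stolarsky_pow` and `stolarsky_explog` are hypothetical and not part of Trixi.jl; the sketch assumes `x, y > 0` and `gamma` different from 0 and 1.

```julia
# Hypothetical helper names for illustration; not part of Trixi.jl.

# Closed-form Stolarsky mean written with plain powers.
stolarsky_pow(x, y, gamma) =
    (gamma - 1) / gamma * (y^gamma - x^gamma) / (y^(gamma - 1) - x^(gamma - 1))

# Same expression, but each (gamma - 1) power is computed once via exp/log
# and reused: y^gamma == y^(gamma - 1) * y for y > 0.
function stolarsky_explog(x, y, gamma)
    yg = exp((gamma - 1) * log(y))
    xg = exp((gamma - 1) * log(x))
    return (gamma - 1) / gamma * (yg * y - xg * x) / (yg - xg)
end

x, y = 300.1, 410.7
for gamma in (1.4, 1.7, 2.5)
    a = stolarsky_pow(x, y, gamma)
    b = stolarsky_explog(x, y, gamma)
    @assert isapprox(a, b; rtol = 1e-10)   # both forms agree up to rounding
    @assert min(x, y) < a < max(x, y)      # a mean lies between its arguments
end
```

Reusing `yg` and `xg` replaces four power evaluations with two `exp`/`log` pairs, which may explain why the second version is not slower despite the extra multiplications.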

ranocha

Thanks!

* first commit
* format
* Update src/auxiliary/math.jl
  Co-authored-by: Andrew Winters <andrew.ross.winters@liu.se>
* fix for integers
* format
* Update src/auxiliary/math.jl
  Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
* update stolarsky mean
* fix typo

---------

Co-authored-by: Andrew Winters <andrew.ross.winters@liu.se>
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

MarcoArtiano added 2 commits February 10, 2025 15:06

first commit

3a2037f

format

5c2b302

MarcoArtiano added the performance ("We are greedy") label Feb 10, 2025

MarcoArtiano requested a review from ranocha February 10, 2025 14:52

andrewwinters5000 requested changes Feb 11, 2025

src/auxiliary/math.jl (outdated, resolved)

Update src/auxiliary/math.jl

d8c1901

Co-authored-by: Andrew Winters <andrew.ross.winters@liu.se>

ranocha requested changes Feb 11, 2025

src/auxiliary/math.jl (outdated, resolved)

MarcoArtiano and others added 3 commits February 11, 2025 11:24

fix for integers

8040dbd

format

744a9dd

Merge branch 'main' into opt_stolarsky

008bc8e

ranocha reviewed Feb 11, 2025

src/auxiliary/math.jl (outdated, resolved)

ranocha reviewed Feb 11, 2025

src/auxiliary/math.jl (resolved)

ranocha reviewed Feb 11, 2025

src/auxiliary/math.jl (outdated, resolved)

MarcoArtiano and others added 3 commits February 11, 2025 14:13

Update src/auxiliary/math.jl

e317708

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

Merge branch 'main' into opt_stolarsky

1688931

update stolarsky mean

8f9e16c

fix typo

24b441c

ranocha approved these changes Oct 9, 2025

ranocha enabled auto-merge (squash) October 9, 2025 09:16

ranocha disabled auto-merge October 9, 2025 12:17

ranocha merged commit 3fc9db1 into trixi-framework:main Oct 9, 2025
89 of 93 checks passed

MarcoArtiano mentioned this pull request Oct 9, 2025

Add unit and type tests trixi-framework/TrixiAtmo.jl#117

Merged

DanielDoehring mentioned this pull request Oct 28, 2025

SparseConnectivityTracer ready mean functions #2628

Merged

ranocha merged 10 commits into trixi-framework:main from MarcoArtiano:opt_stolarsky

MarcoArtiano commented Feb 10, 2025 (edited)

github-actions bot commented Feb 10, 2025

Review checklist: Purpose and scope, Code quality, Documentation, Testing, Performance, Verification

codecov bot commented Feb 10, 2025 (edited)

Codecov Report

andrewwinters5000 left a comment

ranocha left a comment

MarcoArtiano commented Feb 11, 2025 (edited)

MarcoArtiano commented Feb 11, 2025 (edited)

MarcoArtiano commented Oct 8, 2025

ranocha left a comment

3 participants
