In Julia (as of 1.4), immutable objects that contain references to
heap-allocated objects are sometimes not stack-allocated⁽¹⁾. This is why
using something like view can degrade performance substantially.
Restacker.jl provides an API
immutable_object = restack(immutable_object)
to put immutable_object on the stack and avoid this performance
pitfall.
⁽¹⁾ This seems to happen when such an object crosses a non-inlined function call boundary. See also this in-depth StackOverflow answer by Tim Holy, this and this discussion on Discourse, and the old PR JuliaLang/julia#18632.
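To make "an immutable object containing a heap-allocated object" concrete, the following sketch inspects a view using only Base functions. It assumes Julia 1.5 or later, where `ismutable` is available. The variable names are illustrative only.

```julia
xs = randn(4)
v = view(xs, :)          # a SubArray wrapping xs

# The wrapper itself is an immutable struct...
@show ismutable(v)       # false

# ...but it is not a plain-bits value, because it holds a reference
# to the heap-allocated parent array.
@show isbits(v)          # false
@show parent(v) === xs   # true: the view refers to the very same array
```

Objects of this kind are the ones that restack targets.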
Consider a simple computation kernel:
@noinline function f!(ys, xs)
    @inbounds for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2x
        end
    end
end

This works great with a raw Array, but the performance with a viewed
array is much worse:
julia> using BenchmarkTools
julia> xs = randn(10_000);
julia> @benchmark f!($(zero(xs)), $xs)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.989 μs (0.00% GC)
median time: 2.033 μs (0.00% GC)
mean time: 2.189 μs (0.00% GC)
maximum time: 6.785 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 10
julia> @benchmark f!($(view(zero(xs), :)), $(view(xs, :)))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 47.223 μs (0.00% GC)
median time: 49.227 μs (0.00% GC)
mean time: 51.072 μs (0.00% GC)
maximum time: 133.803 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1

It turned out that restacking the destination array ys is enough
to fix the problem in f! above:
using Restacker
@noinline function g!(ys, xs)
    ys = restack(ys)
    @inbounds for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2x
        end
    end
end

Calling this function on views is now as fast as the raw-Vector
version:
julia> @benchmark g!($(view(zero(xs), :)), $(view(xs, :)))
BenchmarkTools.Trial:
memory estimate: 48 bytes
allocs estimate: 1
--------------
minimum time: 2.021 μs (0.00% GC)
median time: 2.097 μs (0.00% GC)
mean time: 2.265 μs (0.00% GC)
maximum time: 6.663 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 10

Notice the slight increase in memory consumption (48 bytes in one
allocation). This is because restack re-creates the object on the stack.
See more examples in the benchmark/ directory.
Consider an immutable type:
struct ABC{A,B,C}
    a::A
    b::B
    c::C
end

Then
abc = restack(abc)
is equivalent to
abc = ABC(
    restack(abc.a),
    restack(abc.b),
    restack(abc.c),
)

For a mutable object like x :: Array, restack returns the input as-is.
In general, restack is an identity function such that
restack(x) === x

Notice the triple equality ===. It means that restack does not
change the behavior of the program, while it may improve run-time
performance at the cost of (slightly) higher memory consumption and
longer compile time.
(Side note: There is an even more experimental function
Restacker.unsafe_restack that re-constructs mutable structs as well.
This is unsafe because it breaks object identity (===) and breaks the
assumptions of code relying on finalize.)
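The role of === can be illustrated without Restacker. The sketch below rebuilds an immutable struct and a mutable struct field by field. The types `Holder` and `Counter` are hypothetical and exist only for this illustration.

```julia
# Hypothetical types for illustration only.
struct Holder{A}
    data::A          # may reference a heap-allocated object such as a Vector
    scale::Float64
end

mutable struct Counter
    n::Int
end

buf = [1, 2, 3]
h = Holder(buf, 2.0)
h2 = Holder(h.data, h.scale)   # rebuilt field by field
@show h2 === h                 # true: immutables compare by content

c = Counter(0)
c2 = Counter(c.n)              # rebuilding a mutable creates a distinct object
@show c2 === c                 # false: mutables compare by identity
c2.n += 1
@show c.n                      # 0: the original does not see the update
```

This is why rebuilding an immutable object is unobservable to the rest of the program, whereas rebuilding a mutable one is not.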
Under the hood, restack on struct types works by directly invoking
the new expression.
This skips evaluating user-defined constructors and minimizes the
run-time overhead.
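As a rough illustration of this technique, here is a minimal sketch of a recursive rebuild based on a generated function that emits a new expression. It is not Restacker's actual source. The names `sketch_restack`, `Tagged`, and `CONSTRUCTOR_CALLS` are hypothetical, and the sketch assumes Julia 1.7 or later for `ismutabletype`. For simplicity it also returns tuples unchanged.

```julia
# Hypothetical stand-in for illustration; not Restacker's implementation.
@generated function sketch_restack(x::T) where {T}
    # Mutable objects, primitive values, tuples, and field-less structs
    # are returned as-is.
    if !isstructtype(T) || ismutabletype(T) || T <: Tuple || fieldcount(T) == 0
        return :(x)
    end
    # Rebuild every field recursively and pass the results straight to `new`.
    args = [:(sketch_restack(getfield(x, $i))) for i in 1:fieldcount(T)]
    return Expr(:new, T, args...)
end

const CONSTRUCTOR_CALLS = Ref(0)

struct Tagged{A}
    data::A
    label::Int
    function Tagged(data::A, label::Int) where {A}
        CONSTRUCTOR_CALLS[] += 1   # side effect used to detect constructor calls
        return new{A}(data, label)
    end
end

buf = [1.0, 2.0, 3.0]
t = Tagged(buf, 7)         # runs the inner constructor once
r = sketch_restack(t)      # rebuilt via `new`; the constructor is not called

@show r === t
@show r.data === buf
@show CONSTRUCTOR_CALLS[]
```

Because the rebuilt object never goes through the inner constructor, validation or side effects placed there do not run again, which is what keeps the overhead low.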