I was trying various example usages of https://github.com/Sahnvour/zig-benchmark and found something that looks odd.
Using this code https://gist.github.com/Sahnvour/829e3d7930964f976ce97de60a644a27 and running it with zig run test.zig --release-fast, we can see dramatically different results for insert sequential, depending on whether the other 2 benchmarks (initially commented out) are executed or not.
(These benchmarked functions are not really a good way to measure the performance of HashMap but that's not the point here).
Example output from my machine, first with only insert sequential, then with the other 2 benchmarks enabled:
As you can see, the average time for insert sequential greatly increased, even though we didn't modify it in any way. The only difference is that more code is executed in main.
I looked quickly at the assembly, and the main difference I spotted is that in the first case, HashMap.put() is inlined into mapInsertSequential while in the second it is not.
Note: my library forces the benchmarked functions to not be inlined when called. So as I understand it, the outer scope (here main) shouldn't have any influence on the inlining decisions taken inside them?
I don't have much inside knowledge, but here's an explanation I can come up with:
A function with only a single call site can be inlined and its "detached" implementation discarded, always resulting in a net size reduction. Following that logic, inlining a function only called in one place makes perfect sense, and I would assume all/most other functions (map.init, map.clear, etc.) to be inlined as well.
With multiple call sites, it's the optimizer's job to weigh the performance of each call site against overall code bloat. I don't think Zig does any function analysis in this regard "itself" (though I really have no info on this); I assume this decision is solely up to LLVM optimization passes (disregarding specific requests via @inlineCall/@noInlineCall).
Under this premise, while the performance difference is noticeable, to me it doesn't appear to be a "bug" on Zig's part, but simply suboptimal behaviour on the part of LLVM's optimizations.
That leaves several options:
- Realizing "hot code" like this is a bottleneck in an application, it's a good idea to try manually forcing inlining via @inlineCall.
- The generated LLVM IR could be analyzed and altered by hand, to see if we (by which I mean someone more informed/competent in this than present-me) can
  - generate more optimizer-friendly IR; this might be a very involved process, where improving one scenario could degrade performance in another.
  - trace this behaviour back to a past change in LLVM, to file it as an upstream bug and make LLVM smarter about these scenarios.
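To make the first option concrete, here is a minimal sketch of forcing inlining at one call site. It assumes a recent Zig release, where @call with a call modifier takes the place of the @inlineCall/@noInlineCall builtins mentioned in this thread. put and insertSequential are hypothetical stand-ins for HashMap.put() and mapInsertSequential, not the code from the gist.

```zig
const std = @import("std");

// Hypothetical stand-in for HashMap.put(): a small function called from a hot loop.
fn put(slot: *u32, value: u32) void {
    slot.* +%= value;
}

// Hypothetical stand-in for mapInsertSequential.
fn insertSequential(n: u32) u32 {
    var acc: u32 = 0;
    var i: u32 = 0;
    while (i < n) : (i += 1) {
        // Request inlining at this call site rather than leaving the
        // decision to the optimizer's heuristics.
        @call(.always_inline, put, .{ &acc, i });
    }
    return acc;
}

pub fn main() void {
    // Mirrors what the benchmark library is described as doing: the
    // benchmarked function itself is called without being inlined.
    const total = @call(.never_inline, insertSequential, .{10});
    std.debug.print("{d}\n", .{total});
}
```

Whether this actually recovers the faster timing would have to be measured on the original benchmark. The sketch only shows where the request to the optimizer goes.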
All that being said, there is probably room to allow more fine-grained tuning from within Zig. --release-fast (in contrast to --release-small) should already prioritize speed over code size (although past a certain application size, smaller code is faster), but maybe there could be a builtin for scope-level flagging, similar to @setFloatMode, to specify "in this loop, inline N levels of function calls".