Strange behaviour of inlining #1969

Open · Sahnvour opened this issue Feb 16, 2019 · 1 comment
Labels: use case (Describes a real use case that is difficult or impossible, but does not propose a solution.)

Sahnvour (Contributor) commented Feb 16, 2019

I was trying various example usages of https://github.com/Sahnvour/zig-benchmark and found something that looks odd.

Using this code https://gist.github.com/Sahnvour/829e3d7930964f976ce97de60a644a27
and running it with zig run test.zig --release-fast, we see dramatically different results for insert sequential depending on whether the other 2 benchmarks (initially commented out) are executed or not.

(These benchmarked functions are not really a good way to measure the performance of HashMap, but that's not the point here.)

Example output from my machine, first only with insert sequential:

benchmarkArgs("insert sequential", mapInsertSequential, maps);
// benchmarkArgs("insert sequential evens", mapInsertSequentialEvens, maps);
// benchmarkArgs("insert sequential odds", mapInsertSequentialOdds, maps);
$ zig run test.zig --release-fast
insert sequential <HashMap(u32,u32,getAutoHashFn(u32)_hash,getAutoEqlFn(u32)_eql)>: avg 1.429ms (350 iterations)

Then with the other 2 benchmarks:

benchmarkArgs("insert sequential", mapInsertSequential, maps);
benchmarkArgs("insert sequential evens", mapInsertSequentialEvens, maps);
benchmarkArgs("insert sequential odds", mapInsertSequentialOdds, maps);
$ zig run test.zig --release-fast
insert sequential <HashMap(u32,u32,getAutoHashFn(u32)_hash,getAutoEqlFn(u32)_eql)>: avg 2.456ms (204 iterations)
insert sequential evens <HashMap(u32,u32,getAutoHashFn(u32)_hash,getAutoEqlFn(u32)_eql)>: avg 3.493ms (144 iterations)
insert sequential odds <HashMap(u32,u32,getAutoHashFn(u32)_hash,getAutoEqlFn(u32)_eql)>: avg 3.191ms (157 iterations)

As you can see, the average time for insert sequential greatly increased, even though we didn't modify it in any way. The only difference is that more code is executed in main.

I looked quickly at the assembly, and the main difference I spotted is that in the first case HashMap.put() is inlined into mapInsertSequential, while in the second it is not.
Note: my library forces the benchmarked functions not to be inlined when called. So, as I understand it, the outer scope (here main) shouldn't have any influence on the inlining decisions taken inside them?
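
The gist's harness isn't quoted here, but for context, here is a minimal sketch of how a call can be kept out of line, assuming the @inlineCall/@noInlineCall builtins Zig had at the time; the work function is a made-up stand-in, not the gist's code:

const std = @import("std");

// Hypothetical stand-in for a benchmarked function, not the gist's code.
fn work(n: u32) u32 {
    var sum: u32 = 0;
    var i: u32 = 0;
    while (i < n) : (i += 1) sum +%= i;
    return sum;
}

pub fn main() void {
    // @noInlineCall forces a real call: the callee stays a separate
    // function, so code added or removed around the call site should
    // not change the optimizer's view of the callee's own body.
    const r = @noInlineCall(work, 1000);
    std.debug.warn("{}\n", r);
}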

rohlem (Contributor) commented Feb 16, 2019

I don't have much inside knowledge, but here's an explanation I can come up with:
A function with only a single call site can be inlined and its "detached" implementation discarded, always resulting in a net size reduction. Following that logic, inlining a function that is only called in one place makes perfect sense, and I would assume all/most other functions (map.init, map.clear, etc.) are inlined as well.
With multiple call sites, it's the optimizer's decision to balance per-call-site performance against global code bloat. I don't think Zig does any function analysis in this regard itself (though I really have no information on this); I assume this decision is left entirely to LLVM's optimization passes (barring explicit requests via @inlineCall/@noInlineCall).
Under this premise, while the performance difference is noticeable, it doesn't look to me like a bug on Zig's part, but simply suboptimal behaviour on the part of LLVM's optimizations.

That leaves several options:

  • Realizing "hot code" like this is a bottleneck in an application, it's a good idea to try manually forcing inlining via @inlineCall.
  • The generated LLVM IR could be analyzed and altered by hand, to see if we (by which I mean someone more informed/competent in this than present-me) can
    • generate more optimizer-friendly IR; this might be a very involved process, where improving one scenario could degrade performance in another.
    • trace this behaviour back to a past change in LLVM, to file it as an upstream bug and make LLVM smarter about these scenarios.
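
As a minimal sketch of the first option, assuming the call-site builtin of that era (@inlineCall takes the function first, then its arguments; all names below are made up for illustration):

const std = @import("std");

fn accumulate(total: *u64, x: u64) void {
    total.* +%= x *% x;
}

pub fn main() void {
    var total: u64 = 0;
    var i: u64 = 0;
    while (i < 1000) : (i += 1) {
        // Request inlining at this call site regardless of what LLVM's
        // heuristics would otherwise decide for this hot loop.
        @inlineCall(accumulate, &total, i);
    }
    std.debug.warn("{}\n", total);
}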

All that being said, there is probably room for more fine-grained tuning from within Zig. --release-fast (in contrast to --release-small) should already prioritize speed over code size (although past a certain application size, smaller code is faster), but maybe there could be a builtin for scope-level flagging, similar to @setFloatMode, to specify "in this loop, inline N levels of function calls".
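
To illustrate what such a builtin would automate, here is how "inline two levels" has to be spelled out by hand today, under the same assumptions as above (all names hypothetical):

const std = @import("std");

fn inner(x: u64) u64 {
    return x *% 2654435761;
}

fn outer(x: u64) u64 {
    // Second level: the nested call must also be marked by hand.
    return @inlineCall(inner, x) +% 1;
}

pub fn main() void {
    var acc: u64 = 0;
    var i: u64 = 0;
    while (i < 1000) : (i += 1) {
        // First level: a scope-level builtin on this loop would replace
        // both of these per-call-site annotations.
        acc +%= @inlineCall(outer, i);
    }
    std.debug.warn("{}\n", acc);
}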

@andrewrk andrewrk modified the milestones: 0.4.0, 0.5.0 Feb 18, 2019
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Dec 31, 2019
@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 10, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 Nov 6, 2020
@SpexGuy SpexGuy added the use case Describes a real use case that is difficult or impossible, but does not propose a solution. label Mar 20, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 May 19, 2021