Zig Version
0.14.0-dev.2540+f857bf72e
Steps to Reproduce and Observed Behavior
Compile memset from compiler-rt:
zig build-lib -OReleaseSmall -fno-builtin lib/compiler_rt/memset.zig -target x86_64-linux -mcpu znver1
For this target, -OReleaseSmall generates a 1894-byte memset function, whereas -OReleaseFast generates a 209-byte function. The same issue is present for other CPUs as well. Here is a sample of the x86_64 CPUs that I checked:
| CPU | ReleaseSmall (bytes) | ReleaseFast (bytes) |
|---|---|---|
| znver1 | 1894 | 209 |
| znver2 | 1894 | 209 |
| znver3 | 1894 | 209 |
| znver4 | 280 | 209 |
| znver5 | 280 | 209 |
| raptorlake | 1838 | 209 |
| alderlake | 1838 | 209 |
| rocketlake | 167 | 193 |
| tigerlake | 167 | 193 |
| skylake | 1894 | 233 |
In the table above, only rocketlake and tigerlake produce smaller code in ReleaseSmall than in ReleaseFast.
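For anyone who wants to repeat the sweep, here is a minimal dry-run sketch that prints one build command per CPU and optimization mode from the table. The helper name `build_cmd` is hypothetical, the target triple is assumed to be spelled `x86_64-linux`, and the script only prints commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zig): print the build command for one
# CPU and one optimization mode.
#   $1 = CPU name, e.g. znver1
#   $2 = optimization mode, ReleaseSmall or ReleaseFast
build_cmd() {
    printf 'zig build-lib -O%s -fno-builtin lib/compiler_rt/memset.zig -target x86_64-linux -mcpu %s\n' "$2" "$1"
}

# CPUs taken from the table; the target triple spelling is an assumption.
for cpu in znver1 znver2 znver3 znver4 znver5 \
           raptorlake alderlake rocketlake tigerlake skylake; do
    for mode in ReleaseSmall ReleaseFast; do
        build_cmd "$cpu" "$mode"
    done
done
```

Running the printed commands from the root of a Zig source checkout should perform the builds. How the size of the resulting memset function is measured is not stated here, so that step is left out of the sketch.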
Expected Behavior
ReleaseSmall should generate less code than ReleaseFast (or at the very least be close).
I'm not sure if this is strictly an LLVM problem (and hence needs to be kicked upstream) or if Zig should be telling LLVM to do something a bit differently, but the code sizes for znver{1,2,3} and {raptor,alder,sky}lake look worrying.