Skip to content

Conversation

@ltratt
Copy link
Contributor

@ltratt ltratt commented May 11, 2025

These functions contain loops and (1) tend to appear in tight loops (2) are largely or wholly consistent from one iteration to the next. On our benchmark suite, we see the following speed-ups (as percentages, where > 0 is better):

BigLoop 0.22
Bounce 6.61
CD 4.84
DeltaBlue 2.75
HashIds 9.81
Havlak 2.72
Heightmap 0.48
Json 3.85
List -2.08
LuLPeg
Mandelbrot -0.74
NBody -0.73
Permute -6.45
Queens 8.74
Richards 8.01
Sieve 1.35
Storage 1.02
Towers 2.87

Pretty much only bigloop isn't impacted (quelle surprise). 4 benchmarks get slower, but only Permute noticeably so (List a little bit, but less so). 5 benchmarks get much faster (5-10%), 3-4 get fairly faster (2.5-5%). Overall, this seems meaningfully better overall to me.

However! LuLPeg borks with:

LuLPeg/lulpeg.lua:979: bad argument #2 to 's_byte' (number expected, got string)

This commit shouldn't be introducing a new bug (per se), but it probably is highlighting an existing bug that's been previously hidden.

@ptersilie Do you think you can have a look to see what the cause of this might be? Nothing obvious has jumped out at me TBH. The "critical" thing seems to be the annotation to luaD_precall -- remove that and the bug goes away. However, that might not tell us very much because the problem is probably somewhere in (or beneath) precallC (but I might be wrong).

These functions contain loops and (1) tend to appear in tight loops (2)
are largely or wholly consistent from one iteration to the next. On
our benchmark suite, we see the following speed-ups (as percentages,
where > 0 is better):

```
BigLoop     0.22
Bounce      6.61
CD          4.84
DeltaBlue   2.75
HashIds     9.81
Havlak      2.72
Heightmap   0.48
Json        3.85
List        -2.08
LuLPeg
Mandelbrot  -0.74
NBody       -0.73
Permute     -6.45
Queens      8.74
Richards    8.01
Sieve       1.35
Storage     1.02
Towers      2.87
```

Pretty much only bigloop isn't impacted (quelle surprise). 4 benchmarks
get slower, but only Permute noticeably so (List a little bit, but less
so). 5 benchmarks get much faster (5-10%), 3-4 get fairly faster
(2.5-5%). Overall, this seems meaningfully better overall to me.

However! LuLPeg borks with:

```
LuLPeg/lulpeg.lua:979: bad argument ykjit#2 to 's_byte' (number expected, got string)
```

This commit shouldn't be introducing a new bug (per se), but it probably
is highlighting an existing bug that's been previously hidden.
@ltratt ltratt marked this pull request as ready for review May 14, 2025 14:29
@ltratt
Copy link
Contributor Author

ltratt commented May 14, 2025

Assuming we agree (and we should at least wait until it merges!) that ykjit/yk#1747 is the fix holding up this PR, I think this PR is now ready for review.

@vext01
Copy link
Contributor

vext01 commented May 14, 2025

What's the nature of (e.g.) Permute's slowdown? Does this change cause a "trace too long" (or similar) for that benchmark?

@ltratt
Copy link
Contributor Author

ltratt commented May 14, 2025

I no longer remember (if I ever knew). But I think I would put this down as a "we don't expect every optimisation to benefit every program" thing.

@ptersilie ptersilie added this pull request to the merge queue May 14, 2025
Merged via the queue into ykjit:main with commit e271af9 May 14, 2025
2 checks passed
@ltratt ltratt deleted the unrolls branch May 30, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants