Insert locations at the end of a FOR loop. by ltratt · Pull Request #140 · ykjit/yklua

ltratt · 2026-03-13T09:13:22Z

The intuition here is that if we've gone around one iteration of a for loop, we're more likely to close a "full, proper" iteration, whereas if we have the location on entry, we're likely to hit the "nothing to do" case. This is -- from memory! -- the same thing that PyPy does.

There is a trade-off here: it means every time we execute a loop we do one iteration in the interpreter. Probably because of that, benchmarks are mixed, but IMHO show a small improvement. b15:

storage/lua/1000        4.95% faster
richards/lua/100        4.27% faster
sieve/lua/3000          2.12% faster
cd/lua/250              1.19% faster
bounce/lua/1500         1.84% slower
knucleotide/lua/        3.97% slower
permute/lua/1000        4.30% slower

b16:

binarytrees/lua/15      4.93% faster
storage/lua/1000        3.98% faster
queens/lua/1000         3.44% faster
cd/lua/250              2.24% faster
spectralnorm/lua/1000   1.54% faster
json/lua/100            3.04% slower
knucleotide/lua/        5.87% slower
HashIds/lua/6000        6.13% slower
nbody/lua/250000        13.60% slower

nbody is very nondeterministic so it can be hard to draw conclusions; that said, it does seem on b16 to have meaningfully slowed down. On b15, the slowdown is within the margin of noise, though on the edge of it: I think it may well have slowed down, but by perhaps 5-8%. So whether this holds on other machines is a bit unclear.
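
The trade-off described above (one interpreted iteration per loop execution) can be sketched with a toy cost model. This is not yk or yklua code: the function names, and the assumption that compiled code is only ever entered at the location, are hypothetical and chosen purely for illustration. Under those assumptions, the interpreter overhead depends on how many iterations each loop execution runs, which is one possible reason that different benchmarks move in different directions.

```python
def interpreted_iterations(trip_counts, location_at_end):
    """Count iterations run in the interpreter once the loop is compiled.

    trip_counts holds the number of iterations of each execution of the loop.
    Toy assumption (not taken from yk): compiled code is entered only at the
    location. With the location at the end of the loop body, the first
    iteration of every execution is interpreted before the location is reached.
    """
    if not location_at_end:
        # Location on entry: in this model, compiled code starts immediately.
        return 0
    # Location at end: one interpreted iteration per non-empty execution.
    return sum(1 for n in trip_counts if n > 0)


def interpreted_fraction(trip_counts, location_at_end):
    """Fraction of all iterations that run in the interpreter."""
    total = sum(trip_counts)
    if total == 0:
        return 0.0
    return interpreted_iterations(trip_counts, location_at_end) / total


# A short loop executed many times pays proportionally more...
print(interpreted_fraction([4] * 1000, location_at_end=True))    # 0.25
# ...than one long-running loop executed once.
print(interpreted_fraction([1_000_000], location_at_end=True))   # 1e-06
```

In this model, a loop with zero iterations never reaches an end-of-loop location at all, whereas an on-entry location would be reached every time.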

vext01 · 2026-03-13T09:18:19Z

Happy to follow your lead on this one.

ltratt assigned vext01 Mar 13, 2026

vext01 added this pull request to the merge queue Mar 13, 2026

Merged via the queue into ykjit:main with commit 8d5b8f8 Mar 13, 2026
2 checks passed

ltratt deleted the location_at_end_of_for branch March 13, 2026 17:43

vext01 merged 1 commit into ykjit:main from ltratt:location_at_end_of_for
