Insert locations at the end of a FOR loop.#140

Merged: vext01 merged 1 commit into ykjit:main from ltratt:location_at_end_of_for on Mar 13, 2026

ltratt commented Mar 13, 2026

The intuition here is that if we've gone around one iteration of a for
loop, we're more likely to close a "full, proper" iteration, whereas if
we have the location on entry, we're likely to hit the "nothing to do"
case. This is -- from memory! -- the same thing that PyPy does.
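
To make the intuition a bit more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: `simulate`, `LocationStats` and the trip counts are made up for illustration, and none of this is yk's or yklua's actual API or behaviour. It just counts how often a hypothetical location is reached when it sits on loop entry versus after the loop body, and how many iterations must run before any location is reached at all.

```python
from dataclasses import dataclass


@dataclass
class LocationStats:
    hits: int = 0            # times the (hypothetical) location is reached
    empty_hits: int = 0      # hits on a loop execution whose body never runs
    pre_hit_iters: int = 0   # iterations run before any location is reached


def simulate(trip_counts, location_at_end):
    """Toy model: one entry in trip_counts per execution of the same for loop."""
    stats = LocationStats()
    for trips in trip_counts:
        if location_at_end:
            # The location sits after the body, so it is only reached once an
            # iteration has completed; that first iteration cannot have been
            # started from the location.
            stats.hits += trips
            if trips > 0:
                stats.pre_hit_iters += 1
        else:
            # The location sits on loop entry: reached once per execution,
            # even when the loop has nothing to do.
            stats.hits += 1
            if trips == 0:
                stats.empty_hits += 1
    return stats


executions = [0, 0, 5, 1000]  # made-up trip counts, purely illustrative
on_entry = simulate(executions, location_at_end=False)
at_end = simulate(executions, location_at_end=True)
print(on_entry)
print(at_end)
```

In this model the entry placement is hit on the two zero-trip executions (the "nothing to do" case), while the end placement never is; the price is one iteration per non-empty execution that runs before any location is seen, which matters most for loops with small trip counts.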

There is a trade-off here: it means every time we execute a loop we do
one iteration in the interpreter. Probably because of that, benchmarks
are mixed, but IMHO show a small improvement. b15:

```
storage/lua/1000        4.95% faster
richards/lua/100        4.27% faster
sieve/lua/3000          2.12% faster
cd/lua/250              1.19% faster
bounce/lua/1500         1.84% slower
knucleotide/lua/        3.97% slower
permute/lua/1000        4.30% slower
```

b16:

```
binarytrees/lua/15      4.93% faster
storage/lua/1000        3.98% faster
queens/lua/1000         3.44% faster
cd/lua/250              2.24% faster
spectralnorm/lua/1000   1.54% faster
json/lua/100            3.04% slower
knucleotide/lua/        5.87% slower
HashIds/lua/6000        6.13% slower
nbody/lua/250000        13.60% slower
```

nbody is very nondeterministic, so it is hard to draw firm conclusions;
that said, it does seem to have meaningfully slowed down on b16. On b15,
the nbody slowdown is within the margin of noise, though on the edge of
it: I think it may well have slowed down there too, but by perhaps 5-8%.
So whether this result holds on other machines is a bit unclear.

vext01 commented Mar 13, 2026

Happy to follow your lead on this one.

vext01 added this pull request to the merge queue Mar 13, 2026
Merged via the queue into ykjit:main with commit 8d5b8f8 Mar 13, 2026
2 checks passed
ltratt deleted the location_at_end_of_for branch March 13, 2026 17:43