Fix stack corruption caused by Fn call primitives #807

ruby0x1 · 2020-09-18T19:31:57Z

Excerpt from @munificent on the nature of the bug:

In runInterpreter(), for performance, the VM caches an instruction pointer (IP) pointing into some bytecode.

No primitive other than .call touches Wren's own call stack. They run a little C code and return, so the array of CallFrames, their IPs, and the IP cached inside runInterpreter() are not affected at all.

While runInterpreter() is running, the IP in the top CallFrame is not updated, so it gets out of sync. This is deliberate, since storing to a field is slow, but it means the value of that field is stale and doesn't represent where execution actually is at that point in time.

To get that field in sync, we use STORE_FRAME(), which stores the local IP value back into the IP field for the top CallFrame. The interpreter is careful to always call STORE_FRAME() before executing any code that pushes a new CallFrame onto the stack.
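As a rough illustration of the caching described above, here is a toy interpreter loop. It is a sketch, not Wren's source: `ToyFrame`, `toyRun`, `TOY_STORE_FRAME` and the three opcodes are made-up names, and the only thing it models is that the frame's IP field stays stale until the local copy is written back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a call frame (hypothetical, not Wren's CallFrame). */
typedef struct {
  const uint8_t* ip; /* saved instruction pointer; may be stale while running */
} ToyFrame;

/* Hypothetical opcodes for this sketch only. */
enum { TOY_OP_NOP = 0, TOY_OP_STORE = 1, TOY_OP_END = 2 };

/* Writes the cached local ip back into the frame, mirroring what the
 * description says STORE_FRAME() does. Relies on locals `frame` and `ip`. */
#define TOY_STORE_FRAME() (frame->ip = ip)

/* Runs bytecode starting at frame->ip and returns how many instructions
 * were executed, or -1 on an unknown opcode. Only the local `ip` advances. */
static int toyRun(ToyFrame* frame) {
  const uint8_t* ip = frame->ip; /* cache the field in a local for speed */
  int executed = 0;

  for (;;) {
    uint8_t op = *ip++;
    executed++;
    switch (op) {
      case TOY_OP_NOP:
        break; /* frame->ip is NOT touched here */
      case TOY_OP_STORE:
        TOY_STORE_FRAME(); /* frame->ip now matches the local ip */
        break;
      case TOY_OP_END:
        return executed;
      default:
        return -1;
    }
  }
}
```

Running `toyRun` over `{TOY_OP_NOP, TOY_OP_NOP, TOY_OP_END}` leaves `frame.ip` pointing at the first byte, even though three instructions ran; inserting a `TOY_OP_STORE` is the only thing that moves it.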

In particular, if you look around, you'll see that every place the interpreter calls wrenCallFunction() is preceded by a STORE_FRAME(). That is, except for the call to wrenCallFunction() in the call_fn() primitive. That's the bug.

The .call() method on Fn is special because it does modify the Wren call stack, and the C code for that primitive directly calls wrenCallFunction(). When that happens, the correct IP for the current function, which lives only in runInterpreter()'s local variable, gets discarded and you're left with a stale IP in the CallFrame.
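The failure mode can be reproduced in a few lines with a toy model. This is a sketch under assumptions: `ToyFiber`, `toyCallFunction` and `toyResumeIpAfterCall` are hypothetical names, IPs are plain offsets, and the callee does nothing before returning. It only shows what the caller resumes from when a frame is pushed with or without the write-back.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FRAMES 4

typedef struct { size_t ip; } ToyFrame; /* ip as a bytecode offset */

typedef struct {
  ToyFrame frames[TOY_MAX_FRAMES];
  int numFrames;
} ToyFiber;

/* Hypothetical stand-in for wrenCallFunction(): pushes the callee's frame. */
static void toyCallFunction(ToyFiber* fiber) {
  assert(fiber->numFrames < TOY_MAX_FRAMES);
  fiber->frames[fiber->numFrames++].ip = 0;
}

/* Simulates a caller that has executed `executed` instructions with its ip
 * cached in a local, then invokes a primitive that pushes a new frame.
 * Returns the ip the caller resumes at once the callee has returned. */
static size_t toyResumeIpAfterCall(size_t executed, int storeFrameFirst) {
  ToyFiber fiber;
  fiber.numFrames = 1;
  fiber.frames[0].ip = 0;

  size_t ip = fiber.frames[0].ip; /* cached local copy */
  ip += executed;                 /* only the local advances */

  if (storeFrameFirst) {
    fiber.frames[0].ip = ip;      /* the write-back STORE_FRAME() performs */
  }

  toyCallFunction(&fiber);                   /* primitive pushes a frame */
  ip = fiber.frames[fiber.numFrames - 1].ip; /* local now tracks the callee */

  fiber.numFrames--;                         /* callee returns */
  ip = fiber.frames[fiber.numFrames - 1].ip; /* caller reloads its saved ip */
  return ip;
}
```

Without the write-back, `toyResumeIpAfterCall(5, 0)` resumes the caller at offset 0, so it would re-execute code it has already run; with it, `toyResumeIpAfterCall(5, 1)` resumes at offset 5.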

Giving the function call primitives a different method type and having the case for that method type call STORE_FRAME() before invoking the primitive fixes the bug.
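The shape of that fix can be sketched as a dispatch on method type. The code below is a toy model, not the actual patch: the enum values, `ToyMethod`, `toyInvoke` and the two sample primitives are illustrative names, and runtime-error handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FRAMES 4

typedef struct { size_t ip; } ToyFrame;
typedef struct { ToyFrame frames[TOY_MAX_FRAMES]; int numFrames; } ToyFiber;

/* Illustrative method types: ordinary primitives versus primitives that
 * push a call frame (the Fn call family). */
typedef enum {
  TOY_METHOD_PRIMITIVE,
  TOY_METHOD_FUNCTION_CALL
} ToyMethodType;

typedef void (*ToyPrimitive)(ToyFiber* fiber);

typedef struct {
  ToyMethodType type;
  ToyPrimitive fn;
} ToyMethod;

/* An ordinary primitive: runs some C code and leaves the call stack alone. */
static void toyAdd(ToyFiber* fiber) { (void)fiber; }

/* A call primitive: pushes a new frame for the callee. */
static void toyFnCall(ToyFiber* fiber) {
  assert(fiber->numFrames < TOY_MAX_FRAMES);
  fiber->frames[fiber->numFrames++].ip = 0;
}

/* Invokes `method` while the interpreter holds `cachedIp` in a local.
 * Returns the ip saved in the caller's frame after the invocation. */
static size_t toyInvoke(ToyFiber* fiber, const ToyMethod* method,
                        size_t cachedIp) {
  ToyFrame* frame = &fiber->frames[fiber->numFrames - 1];

  switch (method->type) {
    case TOY_METHOD_PRIMITIVE:
      method->fn(fiber);    /* no write-back needed, so none is paid for */
      break;
    case TOY_METHOD_FUNCTION_CALL:
      frame->ip = cachedIp; /* the STORE_FRAME() step, done before the push */
      method->fn(fiber);
      break;
  }
  return frame->ip;
}
```

The point of the separate type is that ordinary primitives keep skipping the store, while only the primitives that push a frame pay for it.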

Benchmarks from this fix:

@munificent


ruby0x1 · 2020-09-18T19:32:49Z

@mhermier feel free to take a look

mhermier · 2020-09-18T22:19:33Z

This is basically a revert of 5f29a72, but it fails tests because it forgets to check for runtime errors after calling method->as.primitive(vm, args);.
This change set should include some of the previously failing tests that were documented, to avoid future regressions.
With additional tests, all of them passing, and the performance numbers re-advertised, it is very likely to be good enough to merge for me.

ruby0x1 merged commit 86463ac into main Sep 18, 2020

This was referenced Sep 18, 2020

System.print causes different output #801

Closed

Different behaviour of two semantically identical functions. #770

Closed

Stack corruption #761

Closed

ruby0x1 mentioned this pull request Sep 27, 2020

Bizarre array-of-fn execution results #813

Closed

ruby0x1 mentioned this pull request Jun 20, 2021

bug in 'each' iterator #1040

Closed
