Trapping sporadically segfaults #70

Closed
dhil opened this issue Dec 30, 2023 · 4 comments
Labels: bug (Something isn't working)

Comments

dhil commented Dec 30, 2023

The test unhandled.wast occasionally triggers a segfault in the testsuite, e.g.

Caused by:
  process didn't exit successfully: `/home/dhil/projects/wasmfx/wasmtime/target/debug/deps/all-2197ac4314cc5afd` (signal: 11, SIGSEGV: invalid memory reference)

We should investigate the cause of this segfault. It appears to be related to trap generation and, potentially, to how traps propagate across multiple linked stacks.

dhil added the bug label Dec 30, 2023

dhil commented Dec 30, 2023

Potentially relevant for PR #58.

frank-emrich commented

I've investigated this. The problem is that trace_through_wasm in backtrace.rs breaks due to how the trampolines we use to run fibers overwrite some data in VMRuntimeLimits.

The idea of trace_through_wasm is to follow the chain of frame pointers until it hits the stack frame where execution of wasm code originally began, identified by trampoline_sp. Concretely, the function assumes that trampoline_sp is the stack pointer of the trampoline in which wasm execution started, and that the currently running wasm frames sit immediately below it.
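
The walk itself is simple in principle. Below is a minimal sketch of it, assuming the conventional frame layout where the caller's saved frame pointer lives at offset 0 of the current frame and the return address one word above it; the function and parameter names are placeholders, not Wasmtime's actual API.

```rust
// Hedged sketch of the frame-pointer walk; not Wasmtime's real code.
unsafe fn walk_wasm_frames(
    mut fp: usize,
    trampoline_sp: usize,
    mut visit_frame: impl FnMut(usize, usize), // (pc, fp)
) {
    // Wasm frames are assumed to sit immediately below the stack pointer of
    // the trampoline that entered wasm, so stop once we reach it.
    while fp < trampoline_sp {
        unsafe {
            // The return address is saved one word above the frame pointer.
            let pc = *((fp + std::mem::size_of::<usize>()) as *const usize);
            visit_frame(pc, fp);
            // Follow the saved frame pointer up to the caller's frame.
            fp = *(fp as *const usize);
        }
    }
}
```

If trampoline_sp does not actually bound the frames being walked, this loop keeps chasing pointers into memory it has no business reading, which is consistent with the sporadic SIGSEGV reported above.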

The value of trampoline_sp is obtained from the last_wasm_entry_sp field of VMRuntimeLimits. Unfortunately, this means that whenever we start executing a continuation with resume, the array-call trampoline we use overwrites this value with the current stack pointer. In other words, last_wasm_entry_sp may point into the last fiber on which we started executing a function as a continuation, which can be entirely unrelated to whatever fiber has been switched to in the meantime.
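
To make the overwrite concrete, here is a small, self-contained illustration; RuntimeLimits and enter_wasm are hypothetical stand-ins for VMRuntimeLimits and the trampoline's bookkeeping, and the addresses are made up.

```rust
// Hypothetical stand-in for the relevant piece of VMRuntimeLimits.
struct RuntimeLimits {
    // Overwritten on every entry into wasm, including entries performed when a
    // continuation is resumed on a fresh fiber stack.
    last_wasm_entry_sp: usize,
}

// Stand-in for what the array-call trampoline does on each host-to-wasm entry.
fn enter_wasm(limits: &mut RuntimeLimits, current_sp: usize) {
    limits.last_wasm_entry_sp = current_sp;
}

fn main() {
    let mut limits = RuntimeLimits { last_wasm_entry_sp: 0 };

    // Initial host-to-wasm transition on the main stack.
    enter_wasm(&mut limits, 0x7fff_0000);

    // Later, `resume` runs a continuation on a fiber stack; the same trampoline
    // fires again and clobbers the main-stack entry point.
    enter_wasm(&mut limits, 0x2000_0000);

    // The backtrace walk now uses a fiber-stack address as trampoline_sp, which
    // has nothing to do with the stack that is active when a trap occurs.
    assert_eq!(limits.last_wasm_entry_sp, 0x2000_0000);
}
```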

Unfortunately, I don't see a simple way to prevent the last_wasm_entry_sp field from being overwritten: the overwrite happens in existing trampolines that we re-use and cannot easily change. With last_wasm_entry_sp clobbered, we don't know where the wasm stack frames end once the backtrace reaches the main stack.

We could probably use the chain of ContinuationObject pointers in the VMContext to detect such situations and construct a backtrace for all frames in the chain of ContinuationObjects. That backtrace would be incomplete, though, because we cannot include frames from the main stack: to detect when we reach the outermost wasm frame on the main stack, we would need access to the "original" value of last_wasm_entry_sp (i.e., its value when we first switched from the host into wasm code to start execution).
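
As a sketch of that partial-backtrace idea, and only under the assumption that each continuation records its parent plus the frame-pointer range of its own stack, it could look roughly like this; parent, last_fp, and stack_base are hypothetical fields, not the in-tree ContinuationObject layout.

```rust
// Hypothetical continuation layout for illustration; not the real ContinuationObject.
struct ContinuationObject {
    parent: *mut ContinuationObject,
    // Most recent wasm frame pointer on this continuation's stack.
    last_fp: usize,
    // Upper end of this continuation's stack; frame walking stops here.
    stack_base: usize,
}

unsafe fn trace_continuation_chain(
    mut cont: *mut ContinuationObject,
    mut visit_frame: impl FnMut(usize),
) {
    // Walk the frames of each continuation's stack, then hop to its parent.
    // Frames on the main stack cannot be included, because without the original
    // last_wasm_entry_sp we do not know where they end.
    while !cont.is_null() {
        unsafe {
            let mut fp = (*cont).last_fp;
            while fp != 0 && fp < (*cont).stack_base {
                visit_frame(fp);
                fp = *(fp as *const usize);
            }
            cont = (*cont).parent;
        }
    }
}
```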

There is some logic in traphandlers.rs to model (what I think is) a chain of stacks used by the existing async implementation, but it's probably a bad idea to try to use that for our purposes (in particular, we would have to keep the parent pointers in yet another chain of stacks up to date when switching stacks).

dhil commented Jan 5, 2024

Thanks for diagnosing this issue! I do recall thinking about trace_through_wasm for linked stacks in the past.

It sounds like you are suggesting we implement a bespoke backtracing mechanism rather than trying to shoehorn our case into, or piggyback on, the existing infrastructure. I agree with that sentiment at this stage.

I don't think it will be terribly difficult to implement. The key is to carefully record the last entry pointers, as you suggest; we could attach them to the continuation headers. I'm inclined to believe the backtracing metadata should only be available in debug mode.
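
A rough sketch of what that might look like, assuming each continuation header can carry its own entry stack pointer; the names and the cfg(debug_assertions) gating are guesses at the shape, not what was actually implemented.

```rust
// Hypothetical continuation header carrying per-stack backtrace metadata; the
// field names and the debug-only gating are assumptions for illustration.
struct ContinuationHeader {
    parent: *mut ContinuationHeader,
    // Stack pointer recorded when wasm execution enters this continuation's
    // stack; plays the role of trampoline_sp, but per stack, so a backtrace
    // knows where this stack's wasm frames end.
    #[cfg(debug_assertions)]
    last_wasm_entry_sp: usize,
}

impl ContinuationHeader {
    // Would be called when `resume` switches to this continuation, instead of
    // relying solely on the global VMRuntimeLimits field.
    #[cfg(debug_assertions)]
    fn record_entry(&mut self, sp: usize) {
        self.last_wasm_entry_sp = sp;
    }
}
```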

dhil commented May 14, 2024

Fixed in a previous PR by @frank-emrich.

dhil closed this as completed May 14, 2024