tracee/event: batch waitpid events and sort CLONE before SIGSTOP to fix multithread deadlock #337

Open
daniel-thisnow wants to merge 1 commit into termux:master from daniel-thisnow:fix/multithread-sigstop-race

Problem

When a multi-threaded process such as Node.js starts its libuv thread pool, it calls clone(CLONE_VM|CLONE_THREAD|...) four times in rapid succession. The kernel delivers two events per thread creation:

  1. PTRACE_EVENT_CLONE to the parent (processed by new_child(), sets tracee->exe)
  2. An initial SIGSTOP to the new child

These two events can be returned by waitpid(2) in either order. PRoot's SIGSTOP_PENDING mechanism correctly handles a single misordering: if the child's SIGSTOP arrives first (tracee->exe == NULL), handle_tracee_event() sets sigstop = SIGSTOP_PENDING and defers the restart to new_child().
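The single-misordering case described above can be modelled as a two-state machine. The sketch below is a simplified illustration, not PRoot's actual code: `struct tracee_model`, `on_initial_sigstop()` and `on_parent_clone()` are hypothetical names, and only the `exe` and `sigstop` fields mentioned in this description are represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the two fields discussed above.
 * PRoot's real tracee structure carries far more state. */
enum sigstop_state { SIGSTOP_IGNORED, SIGSTOP_PENDING };

struct tracee_model {
    const char *exe;             /* NULL until the parent's clone event is seen */
    enum sigstop_state sigstop;
};

/* The child's initial SIGSTOP was reported.
 * Returns true if the child can be restarted immediately,
 * false if the restart has to wait for the parent's clone event. */
static bool on_initial_sigstop(struct tracee_model *t)
{
    if (t->exe == NULL) {
        t->sigstop = SIGSTOP_PENDING;   /* parent not processed yet: defer */
        return false;
    }
    t->sigstop = SIGSTOP_IGNORED;       /* direct path: swallow the SIGSTOP */
    return true;
}

/* The parent's PTRACE_EVENT_CLONE was reported (the role of new_child()).
 * Returns true if a previously deferred restart must be issued now. */
static bool on_parent_clone(struct tracee_model *t, const char *parent_exe)
{
    assert(parent_exe != NULL);
    bool was_pending = (t->sigstop == SIGSTOP_PENDING);

    t->exe = parent_exe;
    if (was_pending)
        t->sigstop = SIGSTOP_IGNORED;
    return was_pending;
}
```

In this model, either arrival order ends with exactly one restart of the child: immediately from `on_initial_sigstop()` when the clone event came first, or from `on_parent_clone()` when the SIGSTOP came first.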

However, with four threads created at once, the interleaving of multiple SIGSTOP and PTRACE_EVENT_CLONE events can leave a child permanently in ptrace-stop with no future waitpid(2) report — a deadlock where proot blocks in waitpid() and all tracees sit in tracing-stop.

This manifests as npm install (or any Node.js workload that initialises libuv) hanging non-deterministically inside proot-distro. The hang was reported in #326.

Fix

After waking from the blocking waitpid(2), drain all additional already-pending events using WNOHANG into a small batch (capped at 64 events). Sort the batch with qsort using a simple priority function:

| Priority | Event |
|---|---|
| 0 (first) | PTRACE_EVENT_CLONE / FORK / VFORK |
| 2 | All other ptrace-stop events |
| 3 (last stop) | SIGSTOP |
| 4 | Process exit / signal |

Then process the sorted batch sequentially. This guarantees new_child() is always called before the child's initial SIGSTOP is handled, so tracee->exe != NULL and the direct SIGSTOP_IGNORED path is taken.
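For illustration, the priority function and sort might look roughly like the sketch below. It is not the code from this patch: `struct wait_event`, `event_priority()`, `compare_events()` and `sort_batch()` are assumed names, and the status decoding relies on the Linux convention that a `PTRACE_EVENT_*` stop reports `status >> 8 == (SIGTRAP | (event << 8))`.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical batch entry: one waitpid(2) result. */
struct wait_event {
    pid_t pid;
    int status;
};

/* Map a wait status to the priorities listed in the description. */
static int event_priority(int status)
{
    if (!WIFSTOPPED(status))
        return 4;                        /* process exit / killed by signal */

    /* For PTRACE_EVENT_* stops, the event code sits above the stop signal. */
    int event = (status >> 16) & 0xff;
    if (event == PTRACE_EVENT_CLONE ||
        event == PTRACE_EVENT_FORK ||
        event == PTRACE_EVENT_VFORK)
        return 0;                        /* must run before the child's SIGSTOP */

    if (event == 0 && WSTOPSIG(status) == SIGSTOP)
        return 3;                        /* plain SIGSTOP: last of the stops */

    return 2;                            /* every other ptrace-stop */
}

/* qsort comparator: ascending priority, no ordering within a level. */
static int compare_events(const void *a, const void *b)
{
    int pa = event_priority(((const struct wait_event *)a)->status);
    int pb = event_priority(((const struct wait_event *)b)->status);
    return (pa > pb) - (pa < pb);
}

static void sort_batch(struct wait_event *batch, size_t n)
{
    assert(batch != NULL || n == 0);
    if (n > 1)                           /* a single event needs no sorting */
        qsort(batch, n, sizeof(batch[0]), compare_events);
}
```

The comparator returns the sign of the difference rather than `pa - pb`; with priorities this small either would work, but the sign form stays correct for any integer range.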

The correctness argument:

  • Each stopped tracee contributes at most one event per batch (no tracee is restarted until after draining).
  • qsort is not guaranteed to be stable, but the order within a priority level does not matter: events at the same level belong to different pids and have no intra-level ordering dependency.
  • No existing behaviour changes for single-threaded tracees (batch size 1, no sort).

Testing

Tested on Android 13 / aarch64 (Linux 5.15.178, Termux + proot-distro Ubuntu):

```
# Before patch: npm install hangs within seconds, non-deterministically
# After patch:
for i in 1 2 3 4 5; do
  proot --rootfs=<ubuntu> ... npm install lodash
done
# added 1 package in 37s  (exit 0)  ×5
```

Compiled with clang 21 targeting aarch64, no new warnings.

Relates to #326.


Built on Android, from Claude claude android funtimes 🤖

Fix a deadlock that occurs when a multi-threaded process (e.g. Node.js
with its libuv thread pool) rapidly creates several threads via
clone(CLONE_VM|CLONE_THREAD|...).

The kernel delivers two events per thread creation: PTRACE_EVENT_CLONE
to the parent and an initial SIGSTOP to the new child.  PRoot's
SIGSTOP_PENDING mechanism correctly handles the case where a single
child's SIGSTOP arrives before the parent's PTRACE_EVENT_CLONE.
However, when four threads are created in rapid succession (as libuv
does at startup), the interleaving of multiple SIGSTOP and
PTRACE_EVENT_CLONE events can leave a child permanently stopped in
ptrace-stop with no future waitpid(2) report to wake it — a deadlock
where proot blocks in waitpid() and all tracees sit in tracing-stop.

Root cause: when waitpid(2) returns a child's SIGSTOP before the
parent's PTRACE_EVENT_CLONE, handle_tracee_event() sets
tracee->sigstop = SIGSTOP_PENDING and returns signal = -1, deferring
the restart to new_child().  With multiple concurrent clones the
timing window for this misordering is wide enough to be hit reliably.

Fix: after waking from the blocking waitpid(2), drain all additional
already-pending events using WNOHANG into a small batch, sort the
batch so PTRACE_EVENT_CLONE/FORK/VFORK events come before SIGSTOP
events (using qsort with a simple priority function), then process the
sorted batch.  This guarantees new_child() is always called before the
child's initial SIGSTOP is handled, so tracee->exe is set and the
direct SIGSTOP_IGNORED path is taken.

Tested on Android 13 / aarch64 (Linux 5.15): npm install inside
proot-distro Ubuntu now completes reliably (5/5 runs, previously hung
non-deterministically within seconds of startup).

Fixes: termux#326