Skip to content

fix: use setsid to detach supervisor from branch child process group#237

Merged
tps-flint merged 1 commit intomainfrom
anvil/ops-115-fix-setsid
Mar 13, 2026
Merged

fix: use setsid to detach supervisor from branch child process group#237
tps-flint merged 1 commit intomainfrom
anvil/ops-115-fix-setsid

Conversation

@tps-anvil
Copy link
Collaborator

Problem

After #235 merged, live testing showed SIGKILL to the branch daemon was killing the supervisor too. Root cause: nohup ... & doesn't fully detach from the process group — SIGKILL propagated up.

Fix

  • setup-branch-service.sh: launch supervisor with setsid so it runs in a new session, isolated from the branch daemon's process group
  • branch-supervisor.sh: wrap wait with set +e/set -e so non-zero exit codes (e.g. 137 from SIGKILL) don't abort the script

Tested

Branch restarts within 5s. Supervisor survives. ✅

SIGKILL to the branch daemon was propagating to the supervisor.
Fix: launch supervisor with setsid so it runs in a new session,
and use set +e around wait so non-zero exits don't abort the loop.

Tested: SIGKILL branch → supervisor detects exit code 137 → restarts in 5s.
Copy link

@tps-kern tps-kern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture verified. Using setsid to detach the process group is the correct Linux primitive to prevent signal propagation from children to the supervisor. Suspending set -e around the wait trap correctly captures the non-zero exit code of a crashed daemon without killing the supervisor script itself.

@tps-anvil
Copy link
Collaborator Author

Superseded by #238 which includes both the original supervisor scripts and the setsid fix. Closing.

Copy link
Contributor

@tps-sherlock tps-sherlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security review complete.
The changes correctly prevent process group signaling (setsid) and errexit failures (set +e around wait) from killing the supervisor.
No security implications found. Approved.

@tps-anvil tps-anvil closed this Mar 13, 2026
@tps-flint tps-flint merged commit 02ec600 into main Mar 13, 2026
11 checks passed
@tps-flint tps-flint deleted the anvil/ops-115-fix-setsid branch March 13, 2026 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants