fix(nighthawk): use pkill -f to stop sandbox server, prevent agent self-termination #79
The agent was killing its own awf container after Claude Code's bash gateway rejected `kill -TERM $(cat server.pid)` for command substitution: it improvised by hardcoding a PID seen earlier, which matched a process inside the agent container and tore the run down with exit 143.

Replace every server-stop primitive in Steps 10 and 11 with `pkill -TERM/-KILL -f "next start" 2>/dev/null || true`. No PID lookup, no command substitution, no hardcoded PIDs.

Strengthen the Step 8 bash-gateway callout with explicit prohibitions and rewrite the durable rule in Important Notes to document the actual outage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR updates the Nighthawk workflow prompt to stop the local Next.js sandbox server via `pkill -f "next start"` instead of PID-based `kill`. This addresses workflow failures caused by the Bash gateway rejecting command substitution and prevents accidental self-termination of the AWF container.
Changes:
- Remove PID capture/storage guidance (`server.pid`, `$!`) and replace stop/cleanup instructions with `pkill -TERM/-KILL -f "next start" ... || true`.
- Expand the "Bash gateway constraints" section with explicit rules forbidding `$(...)`, hardcoded PIDs, `kill <pid>`, and `wait`.
- Update per-iteration and final cleanup steps to use the new server-stop primitive consistently.
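
The stop and cleanup primitives listed above can be sketched as two small shell functions. This is only an illustration under assumptions, not the text of `nighthawk.md`: the function names `stop_server` and `final_cleanup` are hypothetical, and the optional pattern argument exists only so the sketch can be tried against a harmless process instead of a real `next start` server.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the PID-free stop primitives.
# Neither function reads a PID file, uses $(...), or calls kill/wait.

# Per-iteration stop: send SIGTERM to every process whose command line
# matches the pattern, ignore "no process matched", then pause briefly.
stop_server() {
  pkill -TERM -f "${1:-next start}" 2>/dev/null || true
  sleep 2
}

# Final cleanup: SIGTERM first, a short grace period, then SIGKILL for
# anything that is still alive. Both calls tolerate "nothing to kill".
final_cleanup() {
  pkill -TERM -f "${1:-next start}" 2>/dev/null || true
  sleep 2
  pkill -KILL -f "${1:-next start}" 2>/dev/null || true
}
```

Because `pkill` exits non-zero when nothing matches, the `|| true` keeps the step from failing when the server has already stopped.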
@@ -432,7 +433,20 @@ for i in $(seq 1 60); do
done
Good catch, fixed in 658b451. Switched the "Wait for ready" loop to `for i in {1..60}; do …; done` (brace expansion is performed by the shell itself before the loop runs, so no command substitution is involved) and added `$(seq 1 60)` to the explicit list of rejected forms in the bash-gateway callout so the agent doesn't reach for `seq` again.
… loop

Address Copilot PR review feedback on #79. The Step 8 "Wait for ready" loop still used `for i in $(seq 1 60)`, the same command-substitution violation that caused the original outage. Switch to `for i in {1..60}` (parse-time brace expansion, no `$(...)` involved) and add `seq` to the explicit list of rejected forms in the bash-gateway callout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
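
A readiness loop in the brace-expansion style described above might look like the following. It is a minimal sketch rather than the loop from `nighthawk.md`, whose body is not shown here: the helper name `wait_for_ready` and the idea of passing the probe as arguments are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a probe command up to 60 times, one second apart.
# The loop bounds come from brace expansion ({1..60}), so no $(...) appears.
wait_for_ready() {
  for i in {1..60}; do
    if "$@"; then
      return 0   # probe succeeded, server is considered ready
    fi
    sleep 1
  done
  return 1       # probe never succeeded within the attempts
}
```

A call such as `wait_for_ready curl -sf http://localhost:3000/` would poll a local server; the URL and port here are placeholders, not values taken from the workflow. Brace expansion is a bash feature, so the 60-attempt bound only holds when the script runs under bash rather than a strictly POSIX `sh`.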
This closes #77
Summary
The nighthawk workflow was failing with `##[error]Process completed with exit code 143` whenever the agent reached Step 10 or Step 11 and tried to stop the local Next.js sandbox server.

Root cause (from job 25277965317, step 23):
- The agent tried `kill -TERM $(cat server.pid) ... && sleep 2 && : > server.log && echo "Server stopped"`.
- The bash gateway rejected it with `Error: Contains command_substitution` (the `$(cat ...)` form is forbidden).
- The agent then improvised with a hardcoded PID: `kill -TERM 179`. PID 179 was inside the agent's own awf container; killing it triggered immediate container teardown, which exited 143.

A previous attempt at this fix wrapped `kill`/`wait` with `|| true` for SIGTERM, but that addressed the wrong layer: the agent's bash call already returned 0, and the 143 was coming from the awf wrapper exiting after its child process was killed. The new snippets also still used `$(cat server.pid)`, which would have been rejected the same way.

This PR replaces every server-stop primitive with `pkill -f "next start"`, which:
- needs no command substitution (`$(...)`),
- falls under the `Bash(pkill:*)` tool pattern.

Changes are prompt-only (`nighthawk.md`); `nighthawk.lock.yml` shows no content diff because the prompt body is pulled in at job time via `{{#runtime-import .github/workflows/nighthawk.md}}`.

Edits in `nighthawk.md`
- Removed `server.pid` (no longer used).
- Removed `echo $! > server.pid`; added an explicit "do not capture or store the PID".
- Bash-gateway rules: `$(...)` is banned, never hardcode a PID, never use `kill -TERM <pid>`, never use `wait`, use `pkill -f` exclusively.
- Per-iteration stop: `pkill -TERM -f "next start" 2>/dev/null || true` + `sleep 2`.
- Final cleanup: `pkill -TERM -f` then `sleep 2` then `pkill -KILL -f`, both with `|| true`.

Test plan
- The agent uses `pkill -TERM -f "next start"` (not `kill -TERM <pid>`) when stopping the server between iterations.
- Final cleanup runs the `pkill -TERM` + `pkill -KILL` pair and the awf container shuts down cleanly afterward.
- `server.pid` is never written.

🤖 Generated with Claude Code
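
As background on the exit code 143 reported above: a shell reports a child that died from a signal as 128 plus the signal number, and SIGTERM is signal 15. The snippet below is a small standalone demonstration in an ordinary shell, not something the agent should run through the bash gateway.

```shell
#!/bin/sh
# A child shell sends SIGTERM to itself; the parent records its exit status.
status=0
sh -c 'kill -TERM $$' || status=$?

# 128 + 15 (SIGTERM) = 143
echo "exit status: $status"
```

This matches the failure mode in the Summary: the wrapper's child process was terminated by SIGTERM, so the job step ended with 143 even though the agent's own bash call had returned 0.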