[codex] Prevent EMFILE startup failures and stale workspace WARNs#50
[codex] Prevent EMFILE startup failures and stale workspace WARNs#50brsbl wants to merge 1 commit into
Conversation
brsbl
left a comment
There was a problem hiding this comment.
Code review result: no findings.
I reviewed the watcher scope change for missed STATUS-data broadcasts, accidental over-filtering, and startup behavior with late-created STATUS-data directories. The bounded depth still permits root thread directories, STATUS-data directories, and JSON key files, while excluding unrelated thread-storage artifacts. Existing priming remains shallow and targeted, and the new test covers the important late-directory creation path.
Validation already performed:
pnpm exec turbo run typecheck --filter=@bb/serverpnpm exec turbo run test --filter=@bb/server -- test/services/threads/status-state-watcher.test.tspnpm exec turbo run build --filter=@bb/server- Startup smoke with
BB_THREAD_STORAGE=/Users/brsbl/.bb/thread-storage:/healthreturned OK.
Residual risk: chokidar depth semantics are dependency behavior, but the focused test exercises the required add path through the real watcher.
93eb783 to
d4e58e0
Compare
brsbl
left a comment
There was a problem hiding this comment.
Review result after the latest amend: no blocking findings. I re-reviewed the watcher narrowing and the host-daemon WARN suppression. The watcher still preserves late-created STATUS-data directories and JSON file events, and the host-daemon change only suppresses WARN logging for workspace.status path_not_found while preserving the failed command result for server/UI handling. Validation run: server typecheck, focused server watcher test, server build, production-style startup smoke, host-daemon command-router test, host-daemon typecheck, and host-daemon build.
d4e58e0 to
97b96a6
Compare
brsbl
left a comment
There was a problem hiding this comment.
Follow-up review after addressing the host.read_file_relative warning: no blocking findings. The additional change keeps the daemon command result as ok:false/ENOENT for missing relative reads, but marks the miss as expected so the router does not emit WARN during the intentional STATUS/index.html -> STATUS.html fallback. I rechecked that invalid_path and other non-ENOENT errors are not broadened by this change. Additional validation passed: focused host-daemon command-router + host-files tests, host-daemon typecheck/build, server public-thread-data route suite, and bb-app package rebuild.
Summary
Fixes two startup-facing failure modes that showed up as terminal errors while starting
bb:EMFILE: too many open files, watchbecause the STATUS-data watcher recursively watched the full thread-storage tree.command execution failedWARN forworkspace.statuswhen probing a stale managed worktree path that no longer exists.Bugs And Failure Modes
EMFILE health-check timeout
The server started a recursive watcher at the full thread storage root, usually
~/.bb/thread-storage, even though the watcher only needs durable STATUS state files under each thread'sSTATUS-data/*.jsondirectory.On machines with many retained thread-storage artifacts, generated reports, screenshots, browser user-data directories, or archived worktrees, that recursive watch could consume too many file descriptors or watch handles. The observable failure mode was:
Because the watcher starts before the HTTP server reports healthy, the launcher timed out while waiting for
/health.Stale managed-worktree status warning
After the server starts, the daemon may receive
workspace.statusfor an environment whose managed worktree was already removed, for example an archived/destroyed environment still referenced by retained history. The command correctly returnspath_not_found, and server cleanup/status callers can handle that result, but the host daemon also logged it as:That made an expected stale-workspace probe look like another startup failure even though the server and daemon were already healthy.
Fix
<thread-id>/STATUS-data/<key>.json.STATUS-datadirectories watchable so late-created STATUS-data directories still emit broadcasts.workspace.statusfailures whose structured error code ispath_not_found.workspace.statuscommand result asok: falsewitherrorCode: "path_not_found", so server/UI logic still receives the real failure.Testing And Validation
pnpm exec turbo run typecheck --filter=@bb/serverpnpm exec turbo run test --filter=@bb/server -- test/services/threads/status-state-watcher.test.tspnpm exec turbo run build --filter=@bb/serverBB_THREAD_STORAGE=/Users/brsbl/.bb/thread-storage: server/healthreturned OK.pnpm exec turbo run test --filter=@bb/host-daemon -- test/command/command-router.test.tspnpm exec turbo run typecheck --filter=@bb/host-daemonpnpm exec turbo run build --filter=@bb/host-daemonbb-apprebuild plus packaged-server smoke usingpackages/bb-app/server/dist/index.jswithBB_THREAD_STORAGE=/Users/brsbl/.bb/thread-storage:/healthreturned OK.\n### STATUS fallback read warning\n\nThe unified status route intentionally probesSTATUS/index.htmlbefore falling back toSTATUS.html. When theSTATUS/directory is absent,host.read_file_relativereturnedENOENTcorrectly but classified the miss as a normal command failure, causing the daemon to logWARN: command execution failedeven though the server expected and handled the fallback.\n\nThe fix now classifies missing relative-read roots as expectedENOENTfailures. The command result is unchanged (ok: false,errorCode: "ENOENT"), but the daemon no longer emits a WARN for that expected read miss.\n\nAdditional validation:\n\n-pnpm exec turbo run test --filter=@bb/host-daemon -- test/command/command-router.test.ts src/command-handlers/host-files.test.ts\n-pnpm exec turbo run typecheck --filter=@bb/host-daemon\n-pnpm exec turbo run build --filter=@bb/host-daemon\n-pnpm exec turbo run test --filter=@bb/server -- test/public/public-thread-data.test.ts\n-pnpm exec turbo run build --filter=bb-app