fix: check for existing buildkitd before mounting sticky disk #65

Closed
chadxz wants to merge 2 commits into useblacksmith:main from chadxz:fix-double-setup-error

chadxz commented Mar 10, 2026

When setup-docker-builder is invoked twice in the same job (e.g. via a composite action called twice), the second invocation called setupStickyDisk() before detecting the already-running buildkitd. This mounted a new sticky disk on top of /var/lib/buildkit while buildkitd was still running with in-memory metadata that referenced snapshot directories on the original disk. The subsequent build then failed with:

ERROR: failed to solve: failed to read dockerfile: failed to walk:
resolve: lstat /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/N:
no such file or directory

Fix: move the buildkitd process check to the very beginning of startBlacksmithBuilder(), before any sticky disk setup. If buildkitd is already running, log an informational message and return immediately so the fallback path reuses the existing configured builder (from the first invocation) without corrupting its overlayfs snapshot state.
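
As a rough illustration of the reordering described above, the sketch below checks for a running buildkitd before any sticky disk work happens. Only startBlacksmithBuilder(), setupStickyDisk(), and the `{ addr: null, exposeId: "" }` return shape come from this PR. The `BuilderDeps` interface, `findBuildkitdPid`, `startBuildkitd`, the `exposeId` return from the disk setup, and the injected-dependency style are assumptions made so the ordering can be shown in isolation; this is not the action's actual code.

```typescript
// Hypothetical dependency bundle; the real action calls its own helpers directly.
interface BuilderDeps {
  // Assumed helper: resolves to the PID of a running buildkitd, or null if none.
  findBuildkitdPid: () => Promise<string | null>;
  // Named after setupStickyDisk() in the PR; the return shape is an assumption.
  setupStickyDisk: () => Promise<{ exposeId: string }>;
  // Assumed helper: starts buildkitd and resolves to its address.
  startBuildkitd: () => Promise<string>;
  log: (message: string) => void;
}

interface BuilderResult {
  addr: string | null;
  exposeId: string;
}

async function startBlacksmithBuilderSketch(
  deps: BuilderDeps,
): Promise<BuilderResult> {
  // 1. Look for an existing buildkitd before touching /var/lib/buildkit.
  const pid = await deps.findBuildkitdPid();
  if (pid !== null) {
    deps.log(
      `Detected existing buildkitd process (PID: ${pid}). ` +
        `Skipping builder setup - builder is already initialized.`,
    );
    // Early return: the caller's fallback path reuses the configured builder.
    return { addr: null, exposeId: "" };
  }

  // 2. Only a first invocation reaches the sticky disk mount.
  const { exposeId } = await deps.setupStickyDisk();
  const addr = await deps.startBuildkitd();
  return { addr, exposeId };
}
```

With this ordering, a second invocation in the same job never reaches the mount step, so the snapshot directories that the running buildkitd refers to stay in place.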


Note

Medium Risk
Changes builder initialization flow to short-circuit on an existing buildkitd, which could affect jobs that rely on re-initialization behavior but should reduce state corruption when the action is invoked multiple times.

Overview
Prevents repeated invocations of the action from remounting the sticky disk over an in-use BuildKit state by checking for an existing buildkitd process at the start of startBlacksmithBuilder() and returning early.

When the early-exit path is taken (or Blacksmith setup fails), the action now proceeds via the existing fallback logic to reuse whatever builder is already configured instead of attempting to reinitialize the Blacksmith builder.

Written by Cursor Bugbot for commit debddcb. This will update automatically on new commits.


chadxz commented Mar 10, 2026

I don't have a buf token, so I wasn't able to run pnpm install to rebuild dist/index.ts. If someone could help with that, it would be appreciated 👍

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

`Detected existing buildkitd process (PID: ${stdout.trim()}). ` +
`Skipping builder setup - builder is already initialized.`,
);
return { addr: null, exposeId: "" };

nofallback check bypassed when buildkitd already running

Low Severity

When buildkitd is already running, the early return { addr: null, exposeId: "" } bypasses the nofallback check in the catch block. The fallback path at line 643+ has no nofallback guard of its own, so if toolkit.builder.inspect() returns null, a local docker-container builder is silently created even when nofallback is true. Previously, detecting a running buildkitd threw an error that respected the nofallback flag.

chadxz commented Mar 10, 2026

In this scenario, buildkitd is already running from a prior invocation in the same job - the Blacksmith builder setup already succeeded. There's nothing to "fall back" from, so I don't think nofallback applies here. We're reusing the existing (already working) builder, not necessarily falling back to a local one.

If we later enrich the toolkit.builder.inspect() check to validate that the existing builder is specifically a Blacksmith builder, it may make sense to wire in specific nofallback handling at that point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
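
If the maintainers did want the early-exit path to honor nofallback, one possible shape for that guard is sketched below. The function name, the `ReuseDecision` type, and the `existingBuilderFound` parameter (standing in for a non-null result from toolkit.builder.inspect()) are hypothetical and not taken from the action's source.

```typescript
// Hypothetical outcome type for the path taken after the early return.
type ReuseDecision =
  | { action: "reuse-existing" }
  | { action: "create-local" }
  | { action: "fail"; reason: string };

// existingBuilderFound: assumed to mean the builder inspection returned a builder.
// nofallback: the flag discussed in the review comment above.
function decideAfterEarlyExit(
  existingBuilderFound: boolean,
  nofallback: boolean,
): ReuseDecision {
  if (existingBuilderFound) {
    // The builder from the first invocation is still configured; reuse it.
    return { action: "reuse-existing" };
  }
  if (nofallback) {
    // Creating a local builder here would be a silent fallback, so refuse.
    return {
      action: "fail",
      reason: "no existing builder found and nofallback is set",
    };
  }
  return { action: "create-local" };
}
```

In this sketch, nofallback only matters when no existing builder is found, which matches the point that reusing an already working builder is not a fallback.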

chadxz commented Mar 16, 2026

Closing in favor of #71

@chadxz chadxz closed this Mar 16, 2026