
fix: sync sticky disk implementation with stickydisk action #25

Merged
adityamaru merged 2 commits into main from fix/sync-with-stickydisk on Nov 15, 2025


@adityamaru adityamaru commented Nov 15, 2025

Problem

Users are experiencing connection errors when using setup-bazel:

Warning: Failed to mount sticky disk for /home/runner/.cache/bazel-repo: ConnectError: [unavailable] connect ECONNREFUSED 192.168.127.1:5557

Root Cause

The setup-bazel action has drifted from the stickydisk action, and is using the wrong environment variable name (VM_ID instead of BLACKSMITH_VM_ID), which prevents proper VM identification.

Changes

Critical Fix

  • Fix environment variable name: Changed VM_ID to BLACKSMITH_VM_ID in 4 locations
    • This is the primary cause of the connection refused errors
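
As an illustration of the variable lookup described above, here is a minimal sketch. The helper name `resolveVmId` and the error message are hypothetical and not taken from the setup-bazel source; only the variable name `BLACKSMITH_VM_ID` comes from this PR.

```typescript
// Hypothetical helper for illustration; not the action's actual code.
type Env = Record<string, string | undefined>;

function resolveVmId(env: Env): string {
  // The PR replaces reads of VM_ID with BLACKSMITH_VM_ID.
  const vmId = env["BLACKSMITH_VM_ID"];
  if (!vmId) {
    // Failing loudly here is an assumption of this sketch; the real action
    // may handle a missing value differently.
    throw new Error("BLACKSMITH_VM_ID is not set; cannot identify the VM");
  }
  return vmId;
}
```

In a Node-based action this would presumably be called as `resolveVmId(process.env)`; taking the environment as a parameter keeps the sketch easy to test.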

Reliability Improvements

  • Increase timeout: 15s → 45s for sticky disk operations
  • Add sync: Flush pending writes before collecting filesystem usage
  • Add drop_caches: Drop kernel caches before unmount to prevent 'device is busy' errors
  • Add lost+found cleanup: Remove lost+found directory to prevent EACCES errors with build tools (pnpm, yarn, npm, docker buildx)
  • Add debug logging: Log the port being used in createStickyDiskClient
  • Improve mount point creation: Use sudo for system directories like /nix, /mnt
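
The timeout change in the list above could look roughly like the following sketch. It assumes the request is cancelled through an `AbortController`; the helper `createTimeoutSignal` and the constant name are hypothetical, and only the 45s value comes from this PR.

```typescript
// 45s is the new value from this PR (previously 15s).
const STICKY_DISK_TIMEOUT_MS = 45000;

interface TimeoutHandle {
  signal: AbortSignal;
  cancel: () => void;
}

// Hypothetical helper: returns a signal that aborts after `ms` milliseconds,
// plus a cancel function to clear the timer once the call has finished.
function createTimeoutSignal(ms: number = STICKY_DISK_TIMEOUT_MS): TimeoutHandle {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return {
    signal: controller.signal,
    cancel: () => clearTimeout(timer),
  };
}
```

The caller would pass `signal` to the sticky disk request and call `cancel()` in a `finally` block so the timer does not keep the process alive.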

Testing

These changes align setup-bazel with the tested stickydisk action implementation, which has been working successfully in production.

Related

Syncs with: https://github.com/useblacksmith/stickydisk


Note

Switch to BLACKSMITH_VM_ID and add reliability improvements (longer timeouts, safe formatting, mount ownership, sync/drop_caches, and logging) for sticky disk operations.

  • Sticky Disk API:
    • Replace VM_ID with BLACKSMITH_VM_ID in all service calls (getStickyDisk, commitStickyDisk).
  • Mount/Format:
    • Extend abort timeout from 15s to 45s when fetching disk.
    • On first format, remove lost+found after mkfs.ext4 via temporary mount to avoid permission issues.
    • Create mount point with sudo and ensure ownership via chown before and after mount.
  • Unmount/Commit Reliability:
    • Flush writes with sync before measuring usage.
    • Drop caches (echo 3 > /proc/sys/vm/drop_caches) prior to unmount to reduce "device is busy" errors.
    • Report filesystem usage if available; otherwise proceed without it.
  • Observability:
    • Log gRPC client port in createStickyDiskClient.
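
The mount point handling summarized above (create with sudo, chown before and after mount) can be sketched as a plan of commands. This is an illustrative sketch only: the function `mountCommands` and its parameters are hypothetical, and it builds argument lists rather than running anything.

```typescript
type Command = string[];

// Hypothetical helper: returns the commands in the order described in the
// summary above. It does not execute them.
function mountCommands(device: string, mountPoint: string, owner: string): Command[] {
  return [
    // sudo is needed for system directories such as /nix or /mnt.
    ["sudo", "mkdir", "-p", mountPoint],
    ["sudo", "chown", owner, mountPoint],
    ["sudo", "mount", device, mountPoint],
    // After mounting, the directory shows the ownership of the mounted
    // filesystem's root, so ownership is set again.
    ["sudo", "chown", owner, mountPoint],
  ];
}
```

The second `chown` matters because the ownership set on the empty directory is hidden once a filesystem is mounted over it.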

Written by Cursor Bugbot for commit 2e5fa36.

@adityamaru adityamaru force-pushed the fix/sync-with-stickydisk branch from 18fd18c to 9c8ee57 Compare November 15, 2025 02:24
This commit fixes critical drift between setup-bazel and the stickydisk action:

Critical fixes:
- fix: use BLACKSMITH_VM_ID instead of VM_ID environment variable (4 places)
  This was causing connection refused errors as the VM couldn't be properly identified

Performance and reliability improvements:
- increase sticky disk timeout from 15s to 45s for slower operations
- add sync before unmount to flush pending writes
- add drop_caches before unmount to prevent 'device is busy' errors
- add lost+found directory cleanup to prevent EACCES errors with build tools
- add debug logging to createStickyDiskClient for better troubleshooting
- improve mount point creation to support system directories like /nix

These changes bring setup-bazel in line with the tested stickydisk action
and should resolve the 'connect ECONNREFUSED 192.168.127.1:5557' errors.
@adityamaru adityamaru force-pushed the fix/sync-with-stickydisk branch from 9c8ee57 to 29d81df Compare November 15, 2025 02:26
…unmount

The sync and drop_caches commands are best-effort operations that
shouldn't block the critical unmount operation. If they fail, the
exception would propagate and prevent filesystem usage collection
and unmount from executing, leaving the disk mounted. This wraps
both operations in try-catch blocks to log warnings but continue
with the unmount process.
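
A minimal sketch of the best-effort pattern this commit describes follows. The names `bestEffort` and `unmountStickyDisk`, the injected `exec` and `warn` callbacks, and the synchronous style are assumptions made for brevity; the actual action is likely asynchronous and also collects filesystem usage between these steps.

```typescript
type Exec = (command: string) => void;
type Logger = (message: string) => void;

// Runs a command and turns any failure into a warning instead of an exception.
function bestEffort(label: string, command: string, exec: Exec, warn: Logger): void {
  try {
    exec(command);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    warn(`${label} failed, continuing with unmount: ${reason}`);
  }
}

// Hypothetical unmount sequence: sync and drop_caches may fail without
// blocking the unmount; a failing unmount still throws.
function unmountStickyDisk(mountPoint: string, exec: Exec, warn: Logger): void {
  bestEffort("sync", "sync", exec, warn);
  bestEffort("drop_caches", "sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'", exec, warn);
  // No shell quoting is applied to mountPoint in this sketch.
  exec(`sudo umount ${mountPoint}`);
}
```

Only the unmount itself is left outside a try-catch, since a failure there is one the caller needs to know about.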
@adityamaru adityamaru merged commit 17da723 into main Nov 15, 2025
7 checks passed