Skip to content

fix: normalize Unicode paths to NFC for macOS compatibility #82

Merged
tobi merged 1 commit into tobi:main from c-stoeckl:fix/unicode-nfd-normalization
Feb 1, 2026

@c-stoeckl (Contributor) commented Jan 31, 2026

Summary

Workaround for a Bun bug in Bun.file().stat() that corrupts UTF-8 paths and causes ENOENT errors when indexing files whose paths contain non-ASCII characters.

Bug Discovery

Investigation revealed that only Bun.file().stat() is affected:

  • Bun.file(path).stat() - BROKEN - corrupts UTF-8 paths internally
  • Bun.file(path).text() - WORKS - uses different code path, not affected

The bug produces mojibake when Bun converts the JavaScript string into the path passed to the system call, so files whose paths contain non-ASCII characters (e.g., German umlauts such as ö, ü, ä) fail with ENOENT.
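To make "mojibake" concrete, here is a small illustration (it shows the general phenomenon only; the PR does not describe Bun's internals, and `toMojibake` is a hypothetical helper). Re-decoding the UTF-8 bytes of "ö" as Latin-1 yields the two unrelated characters "Ã¶", so the resulting path names a file that does not exist.

```typescript
import { Buffer } from "node:buffer";
import { existsSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustration only: re-decode a string's UTF-8 bytes as Latin-1.
// This shows what mojibake looks like; it is not a model of Bun's code.
function toMojibake(name: string): string {
  const utf8Bytes = Buffer.from(name, "utf8"); // "ö" -> bytes 0xC3 0xB6
  return utf8Bytes.toString("latin1");         // 0xC3 -> "Ã", 0xB6 -> "¶"
}

const original = "schön-müde.md";
const corrupted = toMojibake(original);
console.log(corrupted); // schÃ¶n-mÃ¼de.md

// A file created under the real name cannot be found under the corrupted one,
// which matches the ENOENT errors described in this PR.
const dir = mkdtempSync(join(tmpdir(), "mojibake-"));
writeFileSync(join(dir, original), "# demo\n");
console.log(existsSync(join(dir, original)));  // true
console.log(existsSync(join(dir, corrupted))); // false
```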

Changes

Although only .stat() was buggy, we standardized on the Node.js fs module for consistency:

  • src/qmd.ts: Use statSync() instead of Bun.file().stat() (fixes the bug)
  • src/qmd.ts: Use readFileSync() instead of Bun.file().text() (for API consistency)
  • src/store.ts: Use statSync() for SQLite custom path detection
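A minimal sketch of the pattern in src/qmd.ts, assuming a hypothetical helper `readIndexedFile` and a hypothetical `IndexedFile` shape (the actual qmd code is not shown on this page): metadata comes from `statSync()` and content from `readFileSync()`, both called directly with the JavaScript string path.

```typescript
import { mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical shape of an indexed file; not the actual qmd types.
interface IndexedFile {
  path: string;
  content: string;
  size: number;    // size on disk in bytes
  mtimeMs: number; // last modification time, ms since the Unix epoch
}

// Hypothetical helper showing the Node.js fs calls the PR switches to.
function readIndexedFile(path: string): IndexedFile {
  const stats = statSync(path);               // replaces await Bun.file(path).stat()
  const content = readFileSync(path, "utf8"); // replaces await Bun.file(path).text()
  return { path, content, size: stats.size, mtimeMs: stats.mtimeMs };
}

// Usage with a non-ASCII file name in a temporary directory.
const dir = mkdtempSync(join(tmpdir(), "qmd-sketch-"));
const notePath = join(dir, "Übersicht-für-Köln.md");
writeFileSync(notePath, "# Grüße\n", "utf8");

const file = readIndexedFile(notePath);
console.log(file.size);    // 10 (ü and ß take two bytes each in UTF-8)
console.log(file.content); // # Grüße
```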

Impact

  • Fixes indexing for files with international characters (German, French, etc.)
  • Consistent sync API usage throughout the indexing pipeline
  • No breaking changes: only the standard Node.js fs API is used
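The PR title and branch name refer to Unicode normalization (NFC vs. NFD). As background only (the description does not say how, or whether, the merged change normalizes paths): macOS file systems can hand back names in decomposed form (NFD; HFS+ stores names in a variant of NFD), where "ö" is "o" followed by a combining diaeresis, while strings written in source code are usually precomposed (NFC). The two forms look identical but compare unequal until normalized, as this sketch with a hypothetical `samePath` helper shows.

```typescript
// Hypothetical helper: treat two paths as equal if their NFC forms match.
function samePath(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

const nfc = "K\u00F6ln.md";  // "Köln.md" with precomposed ö (U+00F6)
const nfd = "Ko\u0308ln.md"; // "Köln.md" as "o" + combining diaeresis (U+0308)

console.log(nfc === nfd);            // false: different code point sequences
console.log(nfc.length, nfd.length); // 7 8
console.log(samePath(nfc, nfd));     // true
```

This only compares strings. Whether to pass the normalized string to filesystem calls is a separate decision: on file systems that treat names as raw bytes (typical Linux file systems, for example), an NFC string will not open a file stored under its NFD name.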

Future Work

These workarounds can be reverted once Bun fixes the UTF-8 path handling in Bun.file().stat().

Test Plan

  • Verified with paths containing non-ASCII characters
  • No regressions in ASCII-only filename handling
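The test plan could be automated roughly as follows. This is a hedged sketch, not the project's actual test suite: it uses `node:assert` with temporary files and hypothetical fixture names, and checks that `statSync()` and `readFileSync()` resolve both ASCII-only and non-ASCII names.

```typescript
import { strict as assert } from "node:assert";
import { mkdtempSync, readFileSync, rmSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical fixture names: one ASCII-only, several with non-ASCII characters.
const names = ["plain-ascii.md", "schön.md", "Müller-Lüdenscheidt.md", "café-crème.md"];

// Creates each file, then checks that stat and read both find it.
// Returns the number of files checked.
function checkNames(fileNames: string[]): number {
  const dir = mkdtempSync(join(tmpdir(), "qmd-paths-"));
  try {
    for (const name of fileNames) {
      const path = join(dir, name);
      const body = `# ${name}\n`;
      writeFileSync(path, body, "utf8");
      // Both calls must resolve the same path the file was created under.
      assert.ok(statSync(path).isFile(), `statSync failed for ${name}`);
      assert.equal(readFileSync(path, "utf8"), body, `content mismatch for ${name}`);
    }
    return fileNames.length;
  } finally {
    rmSync(dir, { recursive: true, force: true }); // clean up the temp directory
  }
}

console.log(`checked ${checkNames(names)} files`);
```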

Fixes #81

@c-stoeckl c-stoeckl marked this pull request as draft January 31, 2026 19:42
Replace Bun.file() async calls with Node.js fs sync methods to work
around a Bun bug that corrupts UTF-8 file paths containing non-ASCII
characters.

Bug: Bun.file(filepath).stat() and Bun.file(filepath).text() internally
mangle UTF-8 encoding, causing ENOENT errors with mojibake paths when
accessing files in iCloud Drive and other locations.

Changes:
- src/qmd.ts: Use readFileSync instead of Bun.file().text()
- src/qmd.ts: Use statSync instead of Bun.file().stat() for file metadata
- src/store.ts: Use statSync for SQLite custom path detection
@c-stoeckl c-stoeckl force-pushed the fix/unicode-nfd-normalization branch from 86f455a to da9d1c3 January 31, 2026 23:23
@c-stoeckl c-stoeckl marked this pull request as ready for review January 31, 2026 23:47
@tobi tobi merged commit 0f87e24 into tobi:main Feb 1, 2026
Anrahya pushed a commit to Anrahya/qmd that referenced this pull request Feb 3, 2026
jaylfc added a commit to jaylfc/qmd that referenced this pull request Apr 5, 2026
jaylfc added a commit to jaylfc/qmd that referenced this pull request Apr 5, 2026

Development

Successfully merging this pull request may close these issues.

ENOENT error on macOS when indexing files with umlaut characters in paths

2 participants