Summary
embed_file currently reads files during sema/comptime/codegen, but the compiler does not appear to register those reads as build graph dependencies before/as they are read. Under the updated spec, embed_file is tracked-input comptime, not ordinary file I/O.
Spec refs:
docs/with-specification.md §17.1a Tracked-Input Comptime
docs/with-specification.md §17.6a embed_file(path)
docs/requirements.md 17.1.2.1 through 17.1.2.12
docs/requirements.md 17.6.2.2 through 17.6.2.10
Current implementation paths
src/SemaCheck.w:90-96 resolves absolute paths directly and source-relative paths by string concatenation.
src/SemaCheck.w:11151-11171 validates embed_file, force-evaluates the path, and checks existence with with_fs_file_exists.
src/ComptimeEval.w:314-320 resolves absolute paths directly and source-relative paths by string concatenation.
src/ComptimeEval.w:4167-4179 evaluates embed_file by reading the file with with_fs_read_file.
src/CodegenTraits.w:860-867 has a parallel resolver.
src/CodegenTraits.w:911-927 can read an embedded file while evaluating constant strings in codegen.
src/CodegenDispatch.w:12586-12603 generates an embed_file string literal by reading the file in codegen.
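To illustrate why concatenating resolvers matter here, the following is a hypothetical Python model, not the code in the `.w` files listed above. The names `resolve_by_concat` and `is_within_root` and the example paths are assumptions made for illustration. It shows that a concatenated source-relative path can leave the source root unless it is normalized and checked:

```python
import posixpath


def resolve_by_concat(source_dir: str, path: str) -> str:
    # Hypothetical model of a concatenating resolver: absolute paths pass
    # through unchanged, relative paths are appended to the source directory.
    if posixpath.isabs(path):
        return path
    return source_dir + "/" + path


def is_within_root(root: str, candidate: str) -> bool:
    # Lexical containment check. normpath collapses ".." segments but does
    # not follow symlinks, so a real check would need more than this.
    root_n = posixpath.normpath(root)
    cand_n = posixpath.normpath(candidate)
    return posixpath.commonpath([root_n, cand_n]) == root_n


escaped = resolve_by_concat("/pkg/src", "../../etc/passwd")
inside = resolve_by_concat("/pkg/src", "assets/logo.txt")
print(is_within_root("/pkg", escaped))  # False
print(is_within_root("/pkg", inside))   # True
```

With three or more resolvers doing this independently, any containment check would have to be duplicated at each site, which is one more reason to centralize resolution.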
Five whys / root cause
- Why can embed_file violate the new tracked-input rule?
  It reads files directly from sema/comptime/codegen helpers.
- Why is a direct read not enough?
  The build graph does not learn that the file is an input before/as it is read, so incremental/reproducible builds can miss the dependency.
- Why are there multiple read sites?
  embed_file grew as a special intrinsic in sema, comptime evaluation, and codegen rather than through a single compiler-owned tracked-input API.
- Why does that matter for self-hosting?
  Two builds with the same With source but different untracked embedded-file contents could produce different binaries while the build graph thinks the inputs are unchanged.
- Root cause:
  The compiler lacks a central tracked-input registration path for compile-time file reads, so embed_file is implemented as direct filesystem access instead of declared, authorized, tracked input access.
Required behavior
- embed_file path expressions resolve by pure comptime before reading.
- Source-relative paths are allowed only within an authorized package/source root unless an explicit capability grants broader access.
- Absolute/out-of-root paths are rejected by default with a diagnostic that names the missing authority.
- The resolved file path is recorded as a build dependency before or as it is read.
- Rebuilds are triggered when the embedded file changes.
- Missing files remain compile errors.
- embed_file does not glob, list directories, consult the environment, or discover files from ambient filesystem state.
- All sema/comptime/codegen paths go through one tracked-input API or share one recorded dependency mechanism.
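A minimal sketch of how these requirements could fit into one API, written in Python for illustration rather than in With. The class `TrackedInputs`, the `allow_outside_root` capability flag, and the choice to record a dependency entry even when the file turns out to be missing are all assumptions, not part of the spec or the current compiler:

```python
import hashlib
from pathlib import Path


class TrackedInputs:
    """Hypothetical registry: every compile-time file read goes through read()."""

    def __init__(self, root, allow_outside_root=False):
        self.root = Path(root).resolve()
        self.allow_outside_root = allow_outside_root  # assumed capability flag
        self.deps = {}  # resolved path -> sha256 hex digest (None until read)

    def read(self, source_file, path):
        candidate = Path(path)
        if not candidate.is_absolute():
            # Source-relative: resolve against the directory of the source file.
            candidate = Path(source_file).parent / candidate
        resolved = candidate.resolve()

        if not self.allow_outside_root:
            try:
                resolved.relative_to(self.root)
            except ValueError:
                # Reject before recording or reading, and name the authority.
                raise PermissionError(
                    f"embed_file: {resolved} is outside {self.root}; "
                    "missing authority: allow_outside_root"
                ) from None

        # Record the dependency before the read, so a missing file is still
        # a known input (assumption: creating it later should trigger a rebuild).
        self.deps.setdefault(resolved, None)
        data = resolved.read_bytes()  # raises FileNotFoundError if missing
        self.deps[resolved] = hashlib.sha256(data).hexdigest()
        return data
```

In this sketch sema, comptime evaluation, and codegen would all call `read()` on a shared instance, so the dependency set is the same regardless of which phase touched the file first.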
Acceptance criteria
- Add a compiler-owned tracked-input registry/API for compile-time file inputs.
- Route all embed_file reads through that API.
- Add tests showing that an embed_file read is recorded as a build dependency and invalidates/rebuilds when the file changes.
- Add tests rejecting absolute/out-of-root paths without explicit authority.
- Add tests preserving the existing positive source-relative behavior and missing-file diagnostic.
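The invalidation test could be shaped roughly like the sketch below. It is Python for illustration only, and it assumes the build graph compares content fingerprints; the real build system may use a different invalidation signal. `fingerprint` and `needs_rebuild` are hypothetical names:

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # Content hash used as the recorded state of an embedded file.
    return hashlib.sha256(data).hexdigest()


def needs_rebuild(recorded: dict, current: dict) -> bool:
    """recorded: path -> fingerprint from the last build.
    current: path -> file bytes now; a path absent from current is missing."""
    for path, old in recorded.items():
        data = current.get(path)
        if data is None or fingerprint(data) != old:
            return True
    return False


recorded = {"src/data.txt": fingerprint(b"v1")}
print(needs_rebuild(recorded, {"src/data.txt": b"v1"}))  # False
print(needs_rebuild(recorded, {"src/data.txt": b"v2"}))  # True
print(needs_rebuild(recorded, {}))                       # True
```

The first case is the unchanged build, the second is the changed-file case the acceptance criteria ask for, and the third covers a dependency that was deleted after the last build.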