Skip to content

Conversation

@andrewrk
Copy link
Member

@andrewrk andrewrk commented Dec 30, 2021

The two CacheMode values are whole and incremental.
incremental is what we had before; whole is new.
Whole cache mode uses everything as inputs to the cache hash;
and when a hit occurs it skips everything including linking.
This is ideal for when source files change rarely and for backends that
do not have good incremental compilation support, for example
compiler-rt or libc compiled with LLVM with optimizations on.
This is the main motivation for the additional mode, so that we can have
LLVM-optimized compiler-rt/libc builds, without waiting for the LLVM
backend every single time Zig is invoked.

Incremental cache mode hashes only the input file path and a few target
options, intentionally relying on collisions to locate already-existing
build artifacts which can then be incrementally updated.

The bespoke logic for caching stage1 backend build artifacts
is removed since we now have a global caching mechanism for
when we want to cache the entire compilation, including linking.
Previously we had to get "creative" with libs.txt and a special
byte in the hash id to communicate flags, so that when the cached
artifacts were re-linked, we had this information from stage1
even though we didn't actually run it. Now that CacheMode.whole
includes linking, this extra information does not need to be
preserved for cache hits. So although this changeset introduces
complexity, it also removes complexity.

The main trickiness here comes from the inherent differences between the
two modes: incremental wants a directory immediately to operate on,
while whole doesn't know the output directory until the compilation is
complete. This commit deals with this problem mostly inside update(),
where, on a cache miss, it replaces zig_cache_artifact_directory with a
temporary directory, and then renames it into place once the compilation is
complete.

Items remaining before this branch can be merged:

  • make sure @import files make it into the cache manifest
  • make sure @embedFile files make it into the cache manifest
  • make sure C files make it into the cache manifest
  • double check that the emit paths of other things besides the binary are working correctly.
  • test -fno-emit-bin + -fstage1
  • test -femit-bin=foo + -fstage1
  • implib emit directory copies bin_file_emit directory in create() and needs to be adjusted to be overridden as well.
  • make sure emit-h is handled correctly in the cache hash
  • Cache: detect duplicate files added to the manifest
  • macOS: object file paths in the binary need to point to the final renamed directory, not the tmp directory
  • windows: object file paths in the binary need to point to the final renamed directory, not the tmp directory
  • builtin.zig is getting the wrong value for zig_is_stage if you run stage2 behavior tests followed by stage1 behavior tests.

Some preliminary performance measurements of wall clock time and
peak RSS used:

stage1 behavior (1077 tests), llvm backend, release build:

  • cold global cache: 4.6s, 1.1 GiB
  • warm global cache: 3.4s, 980 MiB

stage2 master branch behavior (575 tests), llvm backend, release build:

  • cold global cache: 0.62s, 191 MiB
  • warm global cache: 0.40s, 128 MiB

stage2 this branch behavior (575 tests), llvm backend, release build:

  • cold global cache: 0.62s, 179 MiB
  • warm global cache: 0.27s, 90 MiB

@andrewrk andrewrk force-pushed the cache-mode branch 4 times, most recently from 10eb5f4 to 556906d Compare December 31, 2021 06:19
@andrewrk
Copy link
Member Author

The macOS failure looks like this:

Test 1/13 stack-trace return (Debug)...
========= Expected this output: =========
error: TheSkyIsFalling
source.zig:2:5: [address] in main (test)
    return error.TheSkyIsFalling;
    ^

================================================
error: TheSkyIsFalling
???:?:?: 0x107d80082 in _main.0 (???)

error: TestFailed

It can be reproduced with this test.zig file:

pub fn main() !void {
    return error.TheSkyIsFalling;
}

and running this command:

./zig build-exe test.zig --enable-cache
# then run the test binary from the printed directory

The problem is this code here:

const o_file_path = mem.sliceTo(self.strings[symbol.ofile..], 0);

The o_file_path that it gets from the binary contains something like zig-cache/tmp/[random]/test.o but with these new CacheMode changes, in whole mode, that directory gets renamed to zig-cache/o/[digest] after linking. So the o_file_path has a wrong absolute path.

I'm not quite sure how the object file paths get set in the binary. Some different ideas to address this problem:

  • After renaming the tmp dir to the final cache dir, tell the linker to update the object file paths in the binary.
  • We can know the final cache dir before calling flush(); perhaps the file paths can be delayed to be written until flush().
  • Adjust the object file paths to be relative to the binary rather than absolute.
  • Leave the compilation/linking alone, and introduce logic in our stack trace dumping code to notice when a binary has been moved, and use the new file system destination as a "patch" to adjust object file paths. The downside here is that this only works for zig stack traces, but we want gdb/lldb and other system tools to work as well. So arguably we should explore the other ideas first.

One thing that would help unblock me here would be to have a chat with @kubkon about where exactly is the logic that decides how those object file paths get computed and populated. But that will have to wait until holiday festivities are over 🙂 Happy New Year! 🥳

The two CacheMode values are `whole` and `incremental`.
`incremental` is what we had before; `whole` is new.
Whole cache mode uses everything as inputs to the cache hash;
and when a hit occurs it skips everything including linking.
This is ideal for when source files change rarely and for backends that
do not have good incremental compilation support, for example
compiler-rt or libc compiled with LLVM with optimizations on.
This is the main motivation for the additional mode, so that we can have
LLVM-optimized compiler-rt/libc builds, without waiting for the LLVM
backend every single time Zig is invoked.

Incremental cache mode hashes only the input file path and a few target
options, intentionally relying on collisions to locate already-existing
build artifacts which can then be incrementally updated.

The bespoke logic for caching stage1 backend build artifacts
is removed since we now have a global caching mechanism for
when we want to cache the entire compilation, *including* linking.
Previously we had to get "creative" with libs.txt and a special
byte in the hash id to communicate flags, so that when the cached
artifacts were re-linked, we had this information from stage1
even though we didn't actually run it. Now that `CacheMode.whole`
includes linking, this extra information does not need to be
preserved for cache hits. So although this changeset introduces
complexity, it also removes complexity.

The main trickiness here comes from the inherent differences between the
two modes: `incremental` wants a directory immediately to operate on,
while `whole` doesn't know the output directory until the compilation is
complete. This commit deals with this problem mostly inside `update()`,
where, on a cache miss, it replaces `zig_cache_artifact_directory` with a
temporary directory, and then renames it into place once the compilation is
complete.

Items remaining before this branch can be merged:

* [ ] make sure these things make it into the cache manifest:
  - @import files
  - @embedfile files
  - we already add dep files from c but make sure the main .c files make
    it in there too, not just the included files

* [ ] double check that the emit paths of other things besides the binary
  are working correctly.

* [ ] test `-fno-emit-bin` + `-fstage1`
* [ ] test `-femit-bin=foo` + `-fstage1`

* [ ] implib emit directory copies bin_file_emit directory in create() and needs
  to be adjusted to be overridden as well.

* [ ] make sure emit-h is handled correctly in the cache hash
* [ ] Cache: detect duplicate files added to the manifest

Some preliminary performance measurements of wall clock time and
peak RSS used:

stage1 behavior (1077 tests), llvm backend, release build:
 * cold global cache: 4.6s, 1.1 GiB
 * warm global cache: 3.4s, 980 MiB

stage2 master branch behavior (575 tests), llvm backend, release build:
 * cold global cache: 0.62s, 191 MiB
 * warm global cache: 0.40s, 128 MiB

stage2 this branch behavior (575 tests), llvm backend, release build:
 * cold global cache: 0.62s, 179 MiB
 * warm global cache: 0.27s, 90 MiB
 * Logic to check whether a bin file is not emitted is more complicated
   in between `Compilation.create` and `Compilation.update`. Fixed the
   logic that decides whether to build compiler-rt and other support
   artifacts.
 * Basically, one cannot inspect the value of `comp.bin_file.emit` until
   after update() is called - fixed another instance of this happening
   in the CLI.
 * In the CLI, `runOrTest` is updated to properly use the result value
   of `comp.bin_file.options.emit` rather than guessing whether the
   output binary is.
 * Don't assume that the emit output has no directory components in
   sub_path. In other words, don't assume that the emit directory is the
   final directory; there may be sub-directories.
when using `CacheMode.whole`. Also, I verified that `addDepFilePost` is
in fact including the original C source file in addition to the files it
depends on.
This fixes a regression in this branch that can be reproduced with the
following steps:

1. `zig build-exe hello.zig`
2. delete the "hello" binary
3. `zig build-exe hello.zig`
4. observe that the "hello" binary is missing

This happened because it was a cache hit, but nothing got copied to the
output directory.

This commit sets CacheMode to incremental - even for stage1 - when the
CLI requests `disable_lld_caching` (this option should be renamed),
resulting in the main Compilation to be repeated (uncached) for stage1,
populating the binary into the cwd as expected.

For stage2 the result is even better: the incremental compilation system
will look for build artifacts to incrementally compile, and start fresh
if not found.
libtsan, libunwind, and libcxxabi.
Previously the code asserted source files were already loaded, but this
is not the case when cached ZIR is loaded. Now it will trigger .zig
source code to be loaded for the purposes of hashing the source for
`CacheMode.whole`.

This additionally refactors stat_size, stat_inode, and stat_mtime fields
into using the `Cache.File.Stat` struct.
Need to mark it as "needing cleanup" a bit earlier in the function.
Instead of juggling GPA-allocated sub_path (and ultimately dropping the
ball, in this analogy), `Compilation.create` allocates an
already-exactly-correct size `sub_path` that has the digest unpopulated.

This is then overwritten in place as necessary and used as the
`emit_bin.sub_path` value, and no allocations/frees are performed for
this file path.
I accidentally used errdefer instead of defer for the function-local
arena.
Broken by the introduction of `CacheMode.whole`, they are now working
again by using the same mechanism as emitted binary files.
This improves readability as well as compatibility with stage2. Most of
compiler-rt is now enabled for stage2 with just a few functions disabled
(until stage2 passes more behavior tests).
This allows stage2 to build more of compiler-rt.

I also changed `-%` to `-` for comptime ints in the div and mul
implementations of compiler-rt. This is clearer code and also happens to
work around a bug in stage2.
The semantics of this function are that it moves both files and
directories. Previously we had this `is_dir` boolean field of
`std.os.windows.OpenFile` which required the API user to choose: are we
opening a file or directory? And the other kind would either cause
error.IsDir or error.NotDir. But that is not a limitation of the Windows
file system API; it was self-imposed.

On Windows, rename is implemented internally with `NtCreateFile` so we
need to allow it to open either files or directories. This is now done
by `std.os.windows.OpenFile` accepting enum{file_only,dir_only,any}
instead of a boolean.
Without this, Zig would re-use a compiler-rt built with stage2 when one
built by stage1 was needed.
Doc comments reproduced here:

This function is called by the frontend before flush(). It communicates that
`options.bin_file.emit` directory needs to be renamed from
`[zig-cache]/tmp/[random]` to `[zig-cache]/o/[digest]`.
The frontend would like to simply perform a file system rename, however,
some linker backends care about the file paths of the objects they are linking.
So this function call tells linker backends to rename the paths of object files
to observe the new directory path.
Linker backends which do not have this requirement can fall back to the simple
implementation at the bottom of this function.
This function is only called when CacheMode is `whole`.

This solves stack trace regressions on Windows and macOS because the
linker backends do not observe object file paths until flush().
@andrewrk andrewrk merged commit 81fa31c into master Jan 3, 2022
@andrewrk andrewrk deleted the cache-mode branch January 3, 2022 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant