refactors ad infinitum #20494

mlugg · 2024-07-04T04:32:22Z

Okay, a bunch of boring changes here, although the last commit ended up being pretty bulky. Perf measurements coming soon (tm). Here's a summary of the changes:

* AnalSubject -> AnalUnit. I meant to call it this in the first place, I just confused myself. (Any comments bikeshedding this naming will be hidden; I do not care.)
* The frontend now has a better representation of exports. Rather than multiple hashmaps containing pointers, I observed that we only actually need the map formerly known as export_owners. By also making the hashmap value an index+length into a larger list rather than an ArrayListUnmanaged, we get a much better memory layout which is more friendly to the CPU and better for serialization. Exports are not quite serializable yet, because Export itself isn't serializable, but the hard bit is done.
* The above task also led to a surprising amount of backend changes. This includes minor fixes in some of the self-hosted linkers, a complete rework of how exports are handled in the C backend to deal with @export correctly (thanks @jacobly0!), and a slight refactor of how exports and externs are handled in the LLVM backend to make it a little less broken.
* Move a few data structures towards AnalUnit rather than DeclIndex. When we're trying to key something on a "unit of semantic analysis", AnalUnit is the correct abstraction, because even though a function has an owner Decl, the function has two analyses associated with it: analysis of the function declaration (essentially its signature), and analysis of the function body as a runtime function. Also note that after The Great Decl Split, a generic function instantiation will not have a Cau associated with it, since analysis of the instance's signature occurs (a) at callsites and (b) before the instance even exists! Anyways, the main subject of this change is compile errors -- failed_decls becomes failed_analysis keyed on AnalUnit.
* The reference trace is now tracked very differently. Previously, we had a reference_table in Zcu which mapped from a Decl to the Decl that referenced it. Firstly, there's a bit of conflation of Decl with AnalUnit here, but more importantly, this has a problem when we consider incremental compilation: what if the "referencer" in the table is not actually referenced within the compilation after an update? To give a valid reference trace, we need to be aware of all references, so we know all potential "paths" to something being referenced. In fact, storing these gives us a way to determine what is referenced at the end of an incremental update.

  So, here's how the system works now. reference_table maps from the AnalUnit performing a reference to the AnalUnit it references (more precisely, that it triggers semantic analysis of -- immediately or by queuing, it doesn't matter). In fact, it maps to a u32 index into all_references, which contains Reference objects, which form a linked list using the next field (which is another index into all_references). During analysis, when something is analyzed, its entry in reference_table is dropped, since these references will be rediscovered (the corresponding indices in all_references are put into a freelist). After the main work queue is flushed during an update, Zcu.resolveReferences essentially reverses the mapping, returning a hashmap from the AnalUnit that is referenced to the AnalUnit that performed the reference. (We don't persist this generally because it's relatively cheap to just construct at the end of compilation and there's a heavy cost associated with maintaining it across incremental updates.) Since we don't have access to this map in the middle of the update (during Sema.failWithOwnedErrorMsg), resolution of the reference trace is deferred until Compilation gathers errors into the final ErrorBundle.

  In future, Zcu.resolveReferences will do a full traversal, starting from the roots of analysis (the main one being the root file of std). That way, we can learn after an incremental update whether any Decl is now unreferenced, and if so, skip its stored error messages (and potentially delete it from the binary).
* Zcu.ErrorMsg no longer stores a SrcLoc, but instead a LazySrcLoc. The source location is resolved when the error message is emitted. This is to make it correct across incremental updates.
* type.zig becomes Type.zig; @import("type.zig").Type is now @import("Type.zig"). Self-explanatory.
* All types are now queued for full resolution at the moment their definition is analyzed. This simplifies the language specification, particularly with regard to incremental compilation.
* Code generation of functions is now queued into a separate job after the function body is analyzed. This allows us to delete types_to_resolve and eliminate all "AIR instruction X triggers type resolution level Y" rules, simplifying the language spec and implementation.
* Type resolution no longer requires a Sema, because we construct the Sema using the type's owner Decl. This is a necessary change with far-reaching consequences in terms of changing function signatures. It fixes bugs exposed by the previous couple of bullet-points, and is necessary for incremental compilation.
* The test case creation logic -- and thus the compiler's build script -- no longer depends on src/Compilation.zig. This became a problem because this new logic caused Compilation to be fully resolved in the build script, so @import("build_options") was analyzed due to its fields, so the build script could not compile. This is fixed by removing this legacy (and quite broken) dependency of the compiler's build script on a random part of its codebase.

mlugg · 2024-07-04T04:42:26Z

The breaking change here is full type resolution. I previously got the go-ahead from Andrew to make this change without a proposal if it passes the existing tests. It means that, for instance, code like this is now disallowed:

const S = struct {
    x: 123, // nonsensical field type
};
comptime {
    _ = S; // previously didn't resolve S at all; now does!
}

mlugg · 2024-07-04T07:02:10Z

Performance Data Points

Analyze Behavior

Benchmark 1 (51 runs): ../master/stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-emit-bin --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           395ms ± 12.4ms     384ms …  435ms          4 ( 8%)        0%
  peak_rss            106MB ±  195KB     105MB …  106MB          1 ( 2%)        0%
  cpu_cycles         1.65G  ± 48.9M     1.61G  … 1.81G           5 (10%)        0%
  instructions       2.99G  ± 8.15K     2.99G  … 2.99G           2 ( 4%)        0%
  cache_references    167M  ± 1.60M      165M  …  173M           4 ( 8%)        0%
  cache_misses       7.63M  ±  315K     7.25M  … 8.83M           3 ( 6%)        0%
  branch_misses      7.06M  ± 70.8K     6.93M  … 7.34M           2 ( 4%)        0%
Benchmark 2 (54 runs): ./stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-emit-bin --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           377ms ± 4.00ms     372ms …  394ms          2 ( 4%)        ⚡-  4.6% ±  0.9%
  peak_rss            109MB ±  225KB     108MB …  109MB          0 ( 0%)        💩+  2.4% ±  0.1%
  cpu_cycles         1.55G  ± 14.9M     1.54G  … 1.62G           3 ( 6%)        ⚡-  5.7% ±  0.8%
  instructions       2.82G  ± 8.33K     2.82G  … 2.82G           0 ( 0%)        ⚡-  5.6% ±  0.0%
  cache_references    162M  ±  885K      160M  …  165M           1 ( 2%)        ⚡-  3.0% ±  0.3%
  cache_misses       7.85M  ±  169K     7.53M  … 8.29M           0 ( 0%)        💩+  2.8% ±  1.3%
  branch_misses      6.77M  ± 42.6K     6.67M  … 6.85M           0 ( 0%)        ⚡-  4.1% ±  0.3%

Compile Behavior (x86_64 selfhosted)

Benchmark 1 (23 runs): ../master/stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           907ms ± 18.6ms     892ms …  976ms          2 ( 9%)        0%
  peak_rss            122MB ±  182KB     121MB …  122MB          0 ( 0%)        0%
  cpu_cycles         3.59G  ± 71.9M     3.55G  … 3.85G           3 (13%)        0%
  instructions       6.56G  ± 49.4K     6.56G  … 6.56G           0 ( 0%)        0%
  cache_references    267M  ± 1.15M      265M  …  269M           0 ( 0%)        0%
  cache_misses       36.5M  ±  569K     35.6M  … 37.9M           0 ( 0%)        0%
  branch_misses      26.7M  ± 97.8K     26.6M  … 26.9M           0 ( 0%)        0%
Benchmark 2 (23 runs): ./stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           870ms ± 6.99ms     857ms …  888ms          0 ( 0%)        ⚡-  4.1% ±  0.9%
  peak_rss            125MB ±  176KB     125MB …  125MB          0 ( 0%)        💩+  2.8% ±  0.1%
  cpu_cycles         3.43G  ± 25.0M     3.40G  … 3.49G           0 ( 0%)        ⚡-  4.7% ±  0.9%
  instructions       6.62G  ± 49.3K     6.62G  … 6.62G           1 ( 4%)          +  0.9% ±  0.0%
  cache_references    250M  ± 2.05M      248M  …  256M           3 (13%)        ⚡-  6.5% ±  0.4%
  cache_misses       23.7M  ±  280K     23.3M  … 24.4M           1 ( 4%)        ⚡- 35.0% ±  0.7%
  branch_misses      22.6M  ± 96.7K     22.5M  … 22.9M           1 ( 4%)        ⚡- 15.4% ±  0.2%

Analyze Compiler

Benchmark 1 (6 runs): /home/mlugg/zig/master/stage4/bin/zig build-exe <snip>
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          5.37s  ± 69.1ms    5.29s  … 5.48s           0 ( 0%)        0%
  peak_rss            237MB ±  244KB     237MB …  237MB          0 ( 0%)        0%
  cpu_cycles         23.7G  ±  491M     23.3G  … 24.6G           0 ( 0%)        0%
  instructions       42.8G  ± 9.71K     42.8G  … 42.8G           0 ( 0%)        0%
  cache_references   2.13G  ± 10.1M     2.11G  … 2.15G           0 ( 0%)        0%
  cache_misses       92.3M  ± 2.94M     88.3M  … 97.1M           0 ( 0%)        0%
  branch_misses      76.8M  ±  319K     76.2M  … 77.1M           0 ( 0%)        0%
Benchmark 2 (6 runs): /home/mlugg/zig/the-great-decl-split/stage4/bin/zig build-exe <snip>
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          5.35s  ± 9.06ms    5.33s  … 5.36s           1 (17%)          -  0.4% ±  1.2%
  peak_rss            274MB ±  242KB     274MB …  274MB          0 ( 0%)        💩+ 15.7% ±  0.1%
  cpu_cycles         23.4G  ± 36.6M     23.3G  … 23.4G           0 ( 0%)          -  1.5% ±  1.9%
  instructions       42.4G  ± 13.1K     42.4G  … 42.4G           0 ( 0%)          -  1.0% ±  0.0%
  cache_references   2.12G  ± 4.85M     2.11G  … 2.12G           0 ( 0%)          -  0.7% ±  0.5%
  cache_misses       98.0M  ± 2.26M     95.2M  …  101M           0 ( 0%)        💩+  6.2% ±  3.7%
  branch_misses      75.1M  ±  445K     74.8M  … 76.0M           0 ( 0%)        ⚡-  2.2% ±  0.6%

Overall, it's no worse. Somehow, even though we're surely doing strictly more work, analysis is sometimes faster. (The builds differ only by commits in this branch, and are both identically-created native builds!)

mlugg · 2024-07-04T07:26:15Z

Oh, I did forget to look at the peak RSS in those results. It's a little worse across the board, and noticeably worse when rebuilding the compiler.

That's to be expected: we're tracking a lot more state. However, this additional state is necessary for correct incremental compilation, so I don't think this is worth worrying about. Also note that I wasn't using LLVM in those builds, so the absolute peak RSS numbers are relatively low -- that +15% when building the compiler is only actually about 40M more!

andrewrk · 2024-07-04T18:52:17Z

CI failure looks like some linker code got accidentally referenced by an "only_c" build when ensuring that our wasm bootstrap process succeeds. Likely just a missing if to turn some branch into a no-op when build_options.only_c is true.

This commit reworks our representation of exported Decls and values in Zcu to be memory-optimized and trivially serialized.

All exports are now stored in the `all_exports` array on `Zcu`. An `AnalUnit` which performs an export (either through an `export` annotation or by containing an analyzed `@export`) gains an entry into `single_exports` if it performs only one export, or `multi_exports` if it performs multiple.

We no longer store a persistent mapping from a `Decl`/value to all exports of that entity; this state is not necessary for the majority of the pipeline. Instead, we construct it in `Zcu.processExports`, just before flush. This does not affect the algorithmic complexity of `processExports`, since this function already iterates all exports in the `Zcu`.

The elimination of `decl_exports` and `value_exports` led to a few non-trivial backend changes. The LLVM backend has been wrangled into a more reasonable state in general regarding exports and externs. The C backend is currently disabled in this commit, because its support for `export` was quite broken, and that was exposed by this work -- I'm hoping @jacobly0 will be able to pick this up!
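The index-plus-length layout for exports can be illustrated with a minimal sketch. Everything here is a simplified assumption for illustration: `Export` is reduced to a name, and `ExportRange` and its `slice` helper are hypothetical names, not the compiler's actual declarations. Only the idea of one flat `all_exports` array addressed by an index and a length comes from the description above.

```zig
const std = @import("std");

/// Hypothetical stand-in for the compiler's `Export` type.
const Export = struct { name: []const u8 };

/// Hypothetical map value: a run of consecutive entries in `all_exports`,
/// stored as two integers instead of a separately allocated list.
const ExportRange = struct {
    index: u32,
    len: u32,

    /// Returns the exports covered by this range.
    fn slice(range: ExportRange, all_exports: []const Export) []const Export {
        return all_exports[range.index..][0..range.len];
    }
};

test "exports are addressed by index and length" {
    // A single flat array holds every export.
    const all_exports = [_]Export{
        .{ .name = "foo" },
        .{ .name = "bar" },
        .{ .name = "baz" },
    };
    // One unit performed a single export; another performed two.
    const first: ExportRange = .{ .index = 0, .len = 1 };
    const second: ExportRange = .{ .index = 1, .len = 2 };

    try std.testing.expectEqual(@as(usize, 1), first.slice(&all_exports).len);
    try std.testing.expectEqualStrings("baz", second.slice(&all_exports)[1].name);
}
```

Because the range holds only integers, it contains no pointers that would need fixing up when the data is written out and read back, which is presumably part of what makes this layout friendlier to serialization.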

This change seeks to more appropriately model the way semantic analysis works by drawing a more clear line between errors emitted by analyzing a `Decl` (in future a `Cau`) and errors emitted by analyzing a runtime function. This does change a few compile errors surrounding compile logs by adding more "also here" notes. The new notes are more technically correct, but perhaps not so helpful. They're not doing enough harm for me to put extensive thought into this for now.

Previously, `reference_table` mapped from a `Decl` being referenced to the `Decl` that performed the reference. This is convenient for constructing error messages, but problematic for incremental compilation. This is because on an incremental update, we want to efficiently remove all references triggered by an `AnalUnit` which is being re-analyzed.

For this reason, `reference_table` now maps the other way: from the `AnalUnit` *performing* the reference, to the `AnalUnit` whose analysis was triggered. As a general rule, any call to any of the following functions should be preceded by a call to `Sema.addReferenceEntry`:

* `Zcu.ensureDeclAnalyzed`
* `Sema.ensureDeclAnalyzed`
* `Zcu.ensureFuncBodyAnalyzed`
* `Zcu.ensureFuncBodyAnalysisQueued`

This is not just important for error messages, but also more fundamentally for incremental compilation. When an incremental update occurs, we must determine whether any `AnalUnit` has become unreferenced: in this case, we should ignore its associated error messages, and perhaps even remove it from the binary. For this reason, we no longer store only one reference to every `AnalUnit`, but every reference. At the end of an update, `Zcu.resolveReferences` will construct the reverse mapping, and as such identify which `AnalUnit`s are still referenced. The current implementation doesn't quite do what we need for incremental compilation here, but the framework is in place.

Note that `Zcu.resolveReferences` does constitute a non-trivial amount of work on every incremental update. However, for incremental compilation, this work -- which will effectively be a graph traversal over all `AnalUnit` references -- seems strictly necessary. At the moment, this work is only done if the `Zcu` has any errors, when collecting them into the final `ErrorBundle`.

An unsolved problem here is how to represent inline function calls in the reference trace. If `foo` performs an inline call to `bar` which references `qux`, then ideally, `bar` would be shown on the reference trace between `foo` and `qux`, but this is not currently the case. The solution here is probably for `Zcu.Reference` to store information about the source locations of active inline calls between the referencer and its reference.
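The reference-tracking scheme described in this PR (a table from referencer to a linked list of `Reference` entries, a free list for dropped entries, and a reversal step at the end of an update) can be sketched as a toy model. This is an illustration under loud assumptions, not the compiler's code: `Unit` is a plain integer standing in for `AnalUnit`, fixed-size arrays replace the real hash maps and lists, and `Refs`, `add`, `drop` and `resolve` are hypothetical names. Only `reference_table`, `all_references`, `Reference` and `next` are names taken from the text.

```zig
const std = @import("std");

/// Hypothetical stand-in for `AnalUnit`: a small integer ID.
const Unit = u32;
/// Sentinel meaning "no entry".
const none = std.math.maxInt(u32);

const Reference = struct {
    /// The unit whose analysis was triggered.
    referenced: Unit,
    /// Index of the next `Reference` made by the same referencer, or `none`.
    next: u32,
};

const max_units = 8;
const max_refs = 16;

const Refs = struct {
    /// Maps a referencing unit to the head of its list in `all_references`.
    reference_table: [max_units]u32 = [_]u32{none} ** max_units,
    all_references: [max_refs]Reference = undefined,
    len: u32 = 0,
    /// Head of the free list, threaded through the `next` fields.
    free_head: u32 = none,

    /// Records that `referencer` triggered analysis of `referenced`.
    fn add(r: *Refs, referencer: Unit, referenced: Unit) void {
        var idx: u32 = undefined;
        if (r.free_head != none) {
            // Reuse a slot released by an earlier `drop`.
            idx = r.free_head;
            r.free_head = r.all_references[idx].next;
        } else {
            idx = r.len;
            r.len += 1;
        }
        r.all_references[idx] = .{
            .referenced = referenced,
            .next = r.reference_table[referencer],
        };
        r.reference_table[referencer] = idx;
    }

    /// Drops every reference made by `referencer`, as before re-analysis.
    fn drop(r: *Refs, referencer: Unit) void {
        var idx = r.reference_table[referencer];
        while (idx != none) {
            const next = r.all_references[idx].next;
            r.all_references[idx].next = r.free_head;
            r.free_head = idx;
            idx = next;
        }
        r.reference_table[referencer] = none;
    }

    /// Builds the reverse mapping: referenced unit -> one unit referencing it.
    fn resolve(r: *const Refs) [max_units]Unit {
        var result = [_]Unit{none} ** max_units;
        for (r.reference_table, 0..) |head, referencer| {
            var idx = head;
            while (idx != none) : (idx = r.all_references[idx].next) {
                const target = r.all_references[idx].referenced;
                if (result[target] == none) result[target] = @intCast(referencer);
            }
        }
        return result;
    }
};

test "dropped references are rediscovered and slots are reused" {
    var refs: Refs = .{};
    refs.add(0, 1); // unit 0 references unit 1
    refs.add(0, 2);
    refs.add(1, 2);
    // Unit 0 is re-analyzed and this time only references unit 1.
    refs.drop(0);
    refs.add(0, 1);

    const referencer_of = refs.resolve();
    try std.testing.expectEqual(@as(Unit, 0), referencer_of[1]);
    try std.testing.expectEqual(@as(Unit, 1), referencer_of[2]);
    // No new slot was needed for the re-added reference.
    try std.testing.expectEqual(@as(u32, 3), refs.len);
}
```

Keying the table on the referencer is what makes `drop` cheap in this sketch: all references made by one unit are reachable from a single table entry, so nothing else has to be scanned when that unit is re-analyzed.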

This change modifies `Zcu.ErrorMsg` to store a `Zcu.LazySrcLoc` rather than a `Zcu.SrcLoc`. Everything else is dominoes. The reason for this change is incremental compilation. If a failed `AnalUnit` is up-to-date on an update, we want to re-use the old error messages. However, the file containing the error location may have been modified, and `SrcLoc` cannot survive such a modification. `LazySrcLoc` is designed to be correct across incremental updates. Therefore, we defer source location resolution until `Compilation` gathers the compile errors into the `ErrorBundle`.
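The idea of deferring source location resolution can be shown with a deliberately simplified model. The types below are toys that only borrow the names `SrcLoc`, `LazySrcLoc` and `ErrorMsg`; the `tracked` field and the `current_lines` table are assumptions made up for this sketch, and the compiler's real `LazySrcLoc` is considerably richer. The point is only that the stored error names a tracked entity rather than a position, so the position is looked up at emit time.

```zig
const std = @import("std");

/// Toy resolved location (hypothetical; not the compiler's `SrcLoc`).
const SrcLoc = struct { line: u32 };

/// Toy lazy location: names a tracked entity rather than a position.
const LazySrcLoc = struct {
    tracked: u32,

    /// Looks up where the tracked entity currently is.
    /// `current_lines[i]` is the current line of tracked entity `i`.
    fn resolve(lazy: LazySrcLoc, current_lines: []const u32) SrcLoc {
        return .{ .line = current_lines[lazy.tracked] };
    }
};

/// Toy error message holding a lazy location.
const ErrorMsg = struct {
    src_loc: LazySrcLoc,
    msg: []const u8,
};

test "a stored error follows its source across an edit" {
    const err: ErrorMsg = .{ .src_loc = .{ .tracked = 1 }, .msg = "expected type" };

    // Before the update, tracked entity 1 sits on line 10.
    try std.testing.expectEqual(@as(u32, 10), err.src_loc.resolve(&[_]u32{ 3, 10 }).line);
    // After lines are inserted above it, the same stored error resolves to line 14.
    try std.testing.expectEqual(@as(u32, 14), err.src_loc.resolve(&[_]u32{ 3, 14 }).line);
}
```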

I'm so sorry.

This commit was just meant to be making all types fully resolve by queueing resolution at the moment of their creation. Unfortunately, a lot of dominoes ended up falling. Here's what happened:

* I added a work queue job to fully resolve a type.
* I realised that from here we could eliminate `Sema.types_to_resolve` if we made function codegen a separate job. This is desirable for simplicity of both spec and implementation.
* This led to a new AIR traversal to detect whether any required type is unresolved. If a type in the AIR failed to resolve, then we can't run codegen.
* Because full type resolution now occurs by the work queue job, a bug was exposed whereby error messages for type resolution were associated with the wrong `Decl`, resulting in duplicate error messages when the type was also resolved "by" its owner `Decl` (which really *all* resolution should be done on).
* A correct fix for this requires using a different `Sema` when performing type resolution: we need a `Sema` owned by the type. Also note that this fix is necessary for incremental compilation.
* This means a whole bunch of functions no longer need to take `Sema`s.
  * First-order effects: `resolveTypeFields`, `resolveTypeLayout`, etc
  * Second-order effects: `Type.abiAlignmentAdvanced`, `Value.orderAgainstZeroAdvanced`, etc

The end result of this is, in short, a more correct compiler and a simpler language specification. This regressed a few error notes in the test cases, but nothing that seems worth blocking this change.

Oh, also, I ripped out the old code in `test/src/Cases.zig` which introduced a dependency on `Compilation`. This dependency was problematic at best, and this code has been unused for a while. When we re-enable incremental test cases, we must rewrite their executor to use the compiler server protocol.

andrewrk

Great work. I have some review feedback, but in the interest of forward progress, I'll proceed with the merge regardless, and you can decide how you want to incorporate the feedback or not in your upcoming changes.

andrewrk · 2024-07-04T23:18:23Z

src/Zcu.zig

+    defer {
+        for (decl_exports.values()) |*exports| {
+            exports.deinit(gpa);
+        }
+        decl_exports.deinit(gpa);
+        for (value_exports.values()) |*exports| {
+            exports.deinit(gpa);
+        }
+        value_exports.deinit(gpa);
+    }


since defer expressions are duplicated at every try and return, it can be beneficial for icache and binary size to put more complex cleanup logic in a function.
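A minimal sketch of that suggestion follows. The types are hypothetical: `Table` stands in for the real export maps and its `deinit` only flips a flag so the effect is observable, while `Scratch` and `process` are invented names. The deferred expression shrinks to a single call, and the cleanup body itself exists once, inside `Scratch.deinit`.

```zig
const std = @import("std");

/// Hypothetical resource; `deinit` only records that it ran.
const Table = struct {
    live: bool = true,

    fn deinit(t: *Table) void {
        t.live = false;
    }
};

/// Hypothetical bundle of the temporary state that needs cleanup.
const Scratch = struct {
    decl_exports: Table = .{},
    value_exports: Table = .{},

    /// All cleanup logic lives in one function.
    fn deinit(s: *Scratch) void {
        s.decl_exports.deinit();
        s.value_exports.deinit();
    }
};

fn process(s: *Scratch, fail_early: bool) error{Failed}!void {
    // Each exit path now carries one call rather than the whole cleanup body.
    defer s.deinit();
    if (fail_early) return error.Failed;
    // Further fallible steps would go here.
}

test "cleanup runs on both exit paths" {
    var ok: Scratch = .{};
    try process(&ok, false);
    try std.testing.expect(!ok.decl_exports.live);

    var failed: Scratch = .{};
    try std.testing.expectError(error.Failed, process(&failed, true));
    try std.testing.expect(!failed.value_exports.live);
}
```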

Afaiu, having an explicit function scope makes it easier for compiler passes to decide whether to inline or deduplicate the code at every callsite/jump to it.
Then is the current behavior of always inlining/duplicating the right default?
Would there be a downside to changing it to automatically pack defers into functions/blocks? (If so, it could also decide with a simple heuristic like statement count, or maybe whether block syntax is used.)

andrewrk · 2024-07-04T23:32:32Z

src/Zcu.zig

+/// TODO: in future, this must be adapted to traverse from roots of analysis. That way, we can
+/// use the returned map to determine which units have become unreferenced in an incremental update.


why bother determining which units have become unreferenced? we could rely on garbage collection for that. When garbage collection does not occur, there will be extra functions inside the output binary - dead code, but not hurting anything. But that could be useful if those functions become referenced again, saving work.

In the case where all analysis has succeeded, that's fair -- we could probably avoid this work in that case. However, this is necessary at least in the case where there are entries in failed_analysis, because we need to know whether the compile error in question is correct to emit, or whether it's just sitting around from a previous update.

andrewrk · 2024-07-04T23:52:18Z

src/Compilation.zig

+        /// This `Air` is owned by the `Job` and allocated with `gpa`.
+        /// It must be deinited when the job is processed.
+        air: Air,


Nice. One step closer to doing linking/codegen on a separate thread. I think this significantly increases the size of the job queue but probably no big deal.

It is worth noting that it's important for codegen jobs to go onto the main work queue to make sure type resolution is done before codegen begins. However, once we want to thread codegen, the processing of this job can just grab the Air and put it onto the queue for the codegen thread, so the main thread can continue with analysis straight away.

andrewrk · 2024-07-04T23:56:34Z

src/Sema.zig

-pub fn ptrType(sema: *Sema, info: InternPool.Key.PtrType) CompileError!Type {
-    if (info.flags.alignment != .none) {
-        _ = try sema.typeAbiAlignment(Type.fromInterned(info.child));
-    }
-    return sema.mod.ptrType(info);
-}
-


Are you sure there is no dependency on sema here? What about reference traces? It seems like without this we lose track of which AnalUnit caused the type resolution to occur.

Well, since all types resolve fully anyway, we don't really care which AnalUnit caused the early resolution, no? We get a valid reference trace regardless.

This change does, I suppose, make error reporting in subtle dependency-loop-esque cases more awkward. That's something we can workshop a solution to when we determine in practice which errors (if any) are not sufficiently detailed.

andrewrk · 2024-07-05T00:02:10Z

src/Zcu.zig

    defer codegen_prog_node.end();

-    if (comp.bin_file) |lf| {
+    if (!air.typesFullyResolved(zcu)) {


Since you have untangled types from Sema instances, maybe it's not so bad to detect this failure in the backends, avoiding this pass entirely. It seems a shame to have a whole pass that accomplishes nothing in the success case.

Yep, that might be fair. We can investigate that in the future -- indeed, if this branch had a negative performance impact, I was going to investigate that before merge.

andrewrk · 2025-03-04T03:33:39Z

skipping inclusion in release notes; unclear what should be copy pasted

mlugg requested a review from Snektron as a code owner July 4, 2024 04:32

mlugg force-pushed the the-great-decl-split branch from ddbc4f2 to 7555561 Compare July 4, 2024 04:34

mlugg added the "breaking" and "release notes" labels Jul 4, 2024

mlugg force-pushed the the-great-decl-split branch from 7555561 to 5761544 Compare July 4, 2024 05:07

mlugg force-pushed the the-great-decl-split branch from 1f54304 to 016d42e Compare July 4, 2024 07:20

mlugg force-pushed the the-great-decl-split branch 2 times, most recently from 1174b3a to eae6c85 Compare July 4, 2024 10:37

mlugg force-pushed the the-great-decl-split branch from eae6c85 to cda6f55 Compare July 4, 2024 20:01

mlugg and others added 11 commits July 4, 2024 21:01

compiler: rename AnalSubject to AnalUnit

bc8cd13

I meant to call it this originally, I just got mixed up -- sorry!

compiler: type.zig -> Type.zig

2f0f1ef

cbe: fix for export changes

00da182

std: avoid references that trigger compile errors

eae9aa8

Note that the `_ = Address` statements in tests previously were a nop, and now actually check that the type is valid. However, on WASI, the type is *not* valid.

Sema: add missing references

a5d5c09

cbe: don't mark exported values/Decls as extern

cda6f55

andrewrk approved these changes Jul 5, 2024

andrewrk merged commit 790b842 into ziglang:master Jul 5, 2024

mlugg mentioned this pull request Mar 13, 2025

0.14 regression: Accessing decl forces Struct fields to be analyzed #23219

Closed

mlugg deleted the the-great-decl-split branch May 18, 2025 20:00
