@mlugg mlugg commented Jul 4, 2024

Okay, a bunch of boring changes here, although the last commit ended up being pretty bulky. Perf measurements coming soon (tm). Here's a summary of the changes:

  • AnalSubject -> AnalUnit. I meant to call it this in the first place, I just confused myself. (Any comments bikeshedding this naming will be hidden; I do not care.)

  • The frontend now has a better representation of exports. Rather than multiple hashmaps containing pointers, I observed that we only actually need the map formerly known as export_owners. By also making the hashmap value an index+length into a larger list rather than an ArrayListUnmanaged, we get a much better memory layout which is more friendly to the CPU and better for serialization. Exports are not quite serializable yet, because Export itself isn't serializable, but the hard bit is done.

  • The above task also led to a surprising amount of backend changes. This includes minor fixes in some of the self-hosted linkers, a complete rework of how exports are handled in the C backend to deal with @export correctly (thanks @jacobly0!), and a slight refactor of how exports and externs are handled in the LLVM backend to make it a little less broken.

  • Move a few datastructures towards AnalUnit rather than DeclIndex. When we're trying to key something on a "unit of semantic analysis", AnalUnit is the correct abstraction, because even though a function has an owner Decl, the function has two analyses associated with it: analysis of the function declaration (essentially its signature), and analysis of the function body as a runtime function. Also note that after The Great Decl Split, a generic function instantiation will not have a Cau associated with it, since analysis of the instance's signature occurs (a) at callsites and (b) before the instance even exists! Anyways, the main subject of this change is compile errors -- failed_decls becomes failed_analysis keyed on AnalUnit.

  • The reference trace is now tracked very differently. Previously, we had a reference_table in Zcu which mapped from a Decl to the Decl that referenced it. Firstly, there's a bit of conflation of Decl with AnalUnit here, but more importantly, this has a problem when we consider incremental compilation: what if the "referencer" in the table is not actually referenced within the compilation after an update? To give a valid reference trace, we need to be aware of all references, so we know all potential "paths" to something being referenced. In fact, storing these gives us a way to determine what is referenced at the end of an incremental update.

    So, here's how the system works now. reference_table maps from the AnalUnit performing a reference to the AnalUnit it references (more precisely, that it triggers semantic analysis of -- immediately or by queuing, it doesn't matter). In fact, it maps to a u32 index into all_references, which contains Reference objects, which form a linked list using the next field (which is another index into all_references). During analysis, when something is analyzed, its entry in reference_table is dropped, since these references will be rediscovered (the corresponding indices in all_references are put into a freelist). After the main work queue is flushed during an update, Zcu.resolveReferences essentially reverses the mapping, returning a hashmap from the AnalUnit that is referenced to the AnalUnit that performed the reference. (We don't persist this generally because it's relatively cheap to just construct at the end of compilation and there's a heavy cost associated with maintaining it across incremental updates.) Since we don't have access to this map in the middle of the update (during Sema.failWithOwnedErrorMsg), resolution of the reference trace is deferred until Compilation gathers errors into the final ErrorBundle.

    In future, Zcu.resolveReferences will do a full traversal, starting from the roots of analysis (the main one being the root file of std). That way, we can learn after an incremental update whether any Decl is now unreferenced, and if so, skip its stored error messages (and potentially delete it from the binary).

  • Zcu.ErrorMsg no longer stores a SrcLoc, but instead a LazySrcLoc. The source location is resolved when the error message is emitted. This is to make it correct across incremental updates.

  • type.zig becomes Type.zig; @import("type.zig").Type is now @import("Type.zig"). Self-explanatory.

  • All types are now queued for full resolution at the moment their definition is analyzed. This simplifies the language specification, particularly with regard to incremental compilation.

  • Code generation of functions is now queued into a separate job after the function body is analyzed. This allows us to delete types_to_resolve and eliminate all "AIR instruction X triggers type resolution level Y" rules, simplifying the language spec and implementation.

  • Type resolution no longer requires a Sema, because we construct the Sema using the type's owner Decl. This is a necessary change with far-reaching consequences in terms of changing function signatures. It fixes bugs exposed by the previous couple of bullet-points, and is necessary for incremental compilation.

  • The test case creation logic -- and thus the compiler's build script -- no longer depends on src/Compilation.zig. This became a problem because this new logic caused Compilation to be fully resolved in the build script, so @import("build_options") was analyzed due to its fields, so the build script could not compile. This is fixed by removing this legacy (and quite broken) dependency of the compiler's build script on a random part of its codebase.

@mlugg mlugg requested a review from Snektron as a code owner July 4, 2024 04:32
@mlugg mlugg force-pushed the the-great-decl-split branch from ddbc4f2 to 7555561 Compare July 4, 2024 04:34
@mlugg mlugg added the "breaking" (Implementing this issue could cause existing code to no longer compile or have different behavior.) and "release notes" (This PR should be mentioned in the release notes.) labels Jul 4, 2024

mlugg commented Jul 4, 2024

The breaking change here is full type resolution. I previously got the go-ahead from Andrew to make this change without a proposal if it passes the existing tests. It means that, for instance, code like this is now disallowed:

const S = struct {
    x: 123, // nonsensical field type
};
comptime {
    _ = S; // previously didn't resolve S at all; now does!
}

@mlugg mlugg force-pushed the the-great-decl-split branch from 7555561 to 5761544 Compare July 4, 2024 05:07

mlugg commented Jul 4, 2024

Performance Data Points

Analyze Behavior

Benchmark 1 (51 runs): ../master/stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-emit-bin --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           395ms ± 12.4ms     384ms …  435ms          4 ( 8%)        0%
  peak_rss            106MB ±  195KB     105MB …  106MB          1 ( 2%)        0%
  cpu_cycles         1.65G  ± 48.9M     1.61G  … 1.81G           5 (10%)        0%
  instructions       2.99G  ± 8.15K     2.99G  … 2.99G           2 ( 4%)        0%
  cache_references    167M  ± 1.60M      165M  …  173M           4 ( 8%)        0%
  cache_misses       7.63M  ±  315K     7.25M  … 8.83M           3 ( 6%)        0%
  branch_misses      7.06M  ± 70.8K     6.93M  … 7.34M           2 ( 4%)        0%
Benchmark 2 (54 runs): ./stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-emit-bin --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           377ms ± 4.00ms     372ms …  394ms          2 ( 4%)        ⚡-  4.6% ±  0.9%
  peak_rss            109MB ±  225KB     108MB …  109MB          0 ( 0%)        💩+  2.4% ±  0.1%
  cpu_cycles         1.55G  ± 14.9M     1.54G  … 1.62G           3 ( 6%)        ⚡-  5.7% ±  0.8%
  instructions       2.82G  ± 8.33K     2.82G  … 2.82G           0 ( 0%)        ⚡-  5.6% ±  0.0%
  cache_references    162M  ±  885K      160M  …  165M           1 ( 2%)        ⚡-  3.0% ±  0.3%
  cache_misses       7.85M  ±  169K     7.53M  … 8.29M           0 ( 0%)        💩+  2.8% ±  1.3%
  branch_misses      6.77M  ± 42.6K     6.67M  … 6.85M           0 ( 0%)        ⚡-  4.1% ±  0.3%

Compile Behavior (x86_64 selfhosted)

Benchmark 1 (23 runs): ../master/stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           907ms ± 18.6ms     892ms …  976ms          2 ( 9%)        0%
  peak_rss            122MB ±  182KB     121MB …  122MB          0 ( 0%)        0%
  cpu_cycles         3.59G  ± 71.9M     3.55G  … 3.85G           3 (13%)        0%
  instructions       6.56G  ± 49.4K     6.56G  … 6.56G           0 ( 0%)        0%
  cache_references    267M  ± 1.15M      265M  …  269M           0 ( 0%)        0%
  cache_misses       36.5M  ±  569K     35.6M  … 37.9M           0 ( 0%)        0%
  branch_misses      26.7M  ± 97.8K     26.6M  … 26.9M           0 ( 0%)        0%
Benchmark 2 (23 runs): ./stage4/bin/zig test test/behavior.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           870ms ± 6.99ms     857ms …  888ms          0 ( 0%)        ⚡-  4.1% ±  0.9%
  peak_rss            125MB ±  176KB     125MB …  125MB          0 ( 0%)        💩+  2.8% ±  0.1%
  cpu_cycles         3.43G  ± 25.0M     3.40G  … 3.49G           0 ( 0%)        ⚡-  4.7% ±  0.9%
  instructions       6.62G  ± 49.3K     6.62G  … 6.62G           1 ( 4%)          +  0.9% ±  0.0%
  cache_references    250M  ± 2.05M      248M  …  256M           3 (13%)        ⚡-  6.5% ±  0.4%
  cache_misses       23.7M  ±  280K     23.3M  … 24.4M           1 ( 4%)        ⚡- 35.0% ±  0.7%
  branch_misses      22.6M  ± 96.7K     22.5M  … 22.9M           1 ( 4%)        ⚡- 15.4% ±  0.2%

Analyze Compiler

Benchmark 1 (6 runs): /home/mlugg/zig/master/stage4/bin/zig build-exe <snip>
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          5.37s  ± 69.1ms    5.29s  … 5.48s           0 ( 0%)        0%
  peak_rss            237MB ±  244KB     237MB …  237MB          0 ( 0%)        0%
  cpu_cycles         23.7G  ±  491M     23.3G  … 24.6G           0 ( 0%)        0%
  instructions       42.8G  ± 9.71K     42.8G  … 42.8G           0 ( 0%)        0%
  cache_references   2.13G  ± 10.1M     2.11G  … 2.15G           0 ( 0%)        0%
  cache_misses       92.3M  ± 2.94M     88.3M  … 97.1M           0 ( 0%)        0%
  branch_misses      76.8M  ±  319K     76.2M  … 77.1M           0 ( 0%)        0%
Benchmark 2 (6 runs): /home/mlugg/zig/the-great-decl-split/stage4/bin/zig build-exe <snip>
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          5.35s  ± 9.06ms    5.33s  … 5.36s           1 (17%)          -  0.4% ±  1.2%
  peak_rss            274MB ±  242KB     274MB …  274MB          0 ( 0%)        💩+ 15.7% ±  0.1%
  cpu_cycles         23.4G  ± 36.6M     23.3G  … 23.4G           0 ( 0%)          -  1.5% ±  1.9%
  instructions       42.4G  ± 13.1K     42.4G  … 42.4G           0 ( 0%)          -  1.0% ±  0.0%
  cache_references   2.12G  ± 4.85M     2.11G  … 2.12G           0 ( 0%)          -  0.7% ±  0.5%
  cache_misses       98.0M  ± 2.26M     95.2M  …  101M           0 ( 0%)        💩+  6.2% ±  3.7%
  branch_misses      75.1M  ±  445K     74.8M  … 76.0M           0 ( 0%)        ⚡-  2.2% ±  0.6%

Overall, it's no worse. Somehow, even though we're surely doing strictly more work, analysis is sometimes faster. (The builds differ only by commits in this branch, and are both identically-created native builds!)

@mlugg mlugg force-pushed the the-great-decl-split branch from 1f54304 to 016d42e Compare July 4, 2024 07:20

mlugg commented Jul 4, 2024

Oh, I did forget to look at the peak RSS in those results. It's a little worse across the board, and noticeably worse when rebuilding the compiler.

That's to be expected: we're tracking a lot more state. However, this additional state is necessary for correct incremental compilation, so I don't think this is worth worrying about. Also note that I wasn't using LLVM in those builds, so the absolute peak RSS numbers are relatively low -- that +15% when building the compiler is only actually about 40M more!

@mlugg mlugg force-pushed the the-great-decl-split branch 2 times, most recently from 1174b3a to eae6c85 Compare July 4, 2024 10:37

andrewrk commented Jul 4, 2024

CI failure looks like some linker code got accidentally referenced by an "only_c" build when ensuring that our wasm bootstrap process succeeds. Likely just a missing if to turn some branch into a no-op when build_options.only_c is true.
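
As a rough illustration of the kind of guard described above (a sketch only, not the code that actually failed CI): `only_c` below is a hypothetical local constant standing in for `build_options.only_c`, and `linkerOnlyWork` and `flushExports` are invented names. Because the condition is comptime-known, the branch that is not taken is never analyzed, so the linker code is not referenced at all.

```zig
const std = @import("std");

/// Hypothetical stand-in for `build_options.only_c`.
const only_c = true;

/// Invented placeholder for linker code that must not be referenced in an only_c build.
fn linkerOnlyWork() u32 {
    @compileError("linker code referenced in an only_c build");
}

/// With a comptime-known condition, only the taken branch is semantically analyzed,
/// so `linkerOnlyWork` is never analyzed while `only_c` is true.
fn flushExports() u32 {
    return if (only_c) 0 else linkerOnlyWork();
}

test "linker branch is a no-op when only_c is set" {
    try std.testing.expect(flushExports() == 0);
}
```

If `only_c` were false, the `@compileError` would fire as soon as `flushExports` is analyzed, which is roughly the failure mode the CI run hit.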

@mlugg mlugg force-pushed the the-great-decl-split branch from eae6c85 to cda6f55 Compare July 4, 2024 20:01
mlugg and others added 11 commits July 4, 2024 21:01
I meant to call it this originally, I just got mixed up -- sorry!

This commit reworks our representation of exported Decls and values in
Zcu to be memory-optimized and trivially serialized.

All exports are now stored in the `all_exports` array on `Zcu`. An
`AnalUnit` which performs an export (either through an `export`
annotation or by containing an analyzed `@export`) gains an entry into
`single_exports` if it performs only one export, or `multi_exports` if
it performs multiple.

We no longer store a persistent mapping from a `Decl`/value to all
exports of that entity; this state is not necessary for the majority of
the pipeline. Instead, we construct it in `Zcu.processExports`, just
before flush. This does not affect the algorithmic complexity of
`processExports`, since this function already iterates all exports in
the `Zcu`.

The elimination of `decl_exports` and `value_exports` led to a few
non-trivial backend changes. The LLVM backend has been wrangled into a
more reasonable state in general regarding exports and externs. The C
backend is currently disabled in this commit, because its support for
`export` was quite broken, and that was exposed by this work -- I'm
hoping @jacobly0 will be able to pick this up!
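
A minimal sketch of the index-plus-length layout this commit describes, under simplifying assumptions: a single `ranges` map replaces the `single_exports`/`multi_exports` split, and `Export`, `AnalUnit`, `ExportRange`, `ExportTable`, and `exportsOf` are hypothetical stand-ins rather than the real `Zcu` definitions.

```zig
const std = @import("std");

/// Simplified, hypothetical stand-ins; the real `Zcu.Export` and `AnalUnit` hold more state.
const Export = struct { name: []const u8 };
const AnalUnit = u32;

/// Index and length into one flat list, rather than a separately allocated list per unit.
const ExportRange = struct { index: u32, len: u32 };

const ExportTable = struct {
    all_exports: []const Export,
    ranges: std.AutoHashMap(AnalUnit, ExportRange),

    /// Returns the exports performed by `unit`, or an empty slice if it performs none.
    fn exportsOf(self: *const ExportTable, unit: AnalUnit) []const Export {
        const range = self.ranges.get(unit) orelse return &.{};
        return self.all_exports[range.index..][0..range.len];
    }
};

test "each unit's exports are a slice of the flat list" {
    const all = [_]Export{ .{ .name = "foo" }, .{ .name = "bar" }, .{ .name = "baz" } };
    var ranges = std.AutoHashMap(AnalUnit, ExportRange).init(std.testing.allocator);
    defer ranges.deinit();
    try ranges.put(7, .{ .index = 0, .len = 1 });
    try ranges.put(9, .{ .index = 1, .len = 2 });

    const table: ExportTable = .{ .all_exports = &all, .ranges = ranges };
    try std.testing.expect(table.exportsOf(9).len == 2);
    try std.testing.expect(std.mem.eql(u8, table.exportsOf(9)[0].name, "bar"));
}
```

Since the map values are plain integers and the exports live in one contiguous array, the whole structure contains no per-unit pointers, which is what makes it friendlier to both the cache and serialization.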

This change seeks to more appropriately model the way semantic analysis
works by drawing a clearer line between errors emitted by analyzing a
`Decl` (in future a `Cau`) and errors emitted by analyzing a runtime
function.

This does change a few compile errors surrounding compile logs by adding
more "also here" notes. The new notes are more technically correct, but
perhaps not so helpful. They're not doing enough harm for me to put
extensive thought into this for now.

Previously, `reference_table` mapped from a `Decl` being referenced to
the `Decl` that performed the reference. This is convenient for
constructing error messages, but problematic for incremental
compilation. This is because on an incremental update, we want to
efficiently remove all references triggered by an `AnalUnit` which is
being re-analyzed.

For this reason, `reference_table` now maps the other way: from the
`AnalUnit` *performing* the reference, to the `AnalUnit` whose analysis
was triggered. As a general rule, any call to any of the following
functions should be preceded by a call to `Sema.addReferenceEntry`:

* `Zcu.ensureDeclAnalyzed`
* `Sema.ensureDeclAnalyzed`
* `Zcu.ensureFuncBodyAnalyzed`
* `Zcu.ensureFuncBodyAnalysisQueued`

This is not just important for error messages, but also more
fundamentally for incremental compilation. When an incremental update
occurs, we must determine whether any `AnalUnit` has become
unreferenced: in this case, we should ignore its associated error
messages, and perhaps even remove it from the binary. For this reason,
we no longer store only one reference to every `AnalUnit`, but every
reference. At the end of an update, `Zcu.resolveReferences` will
construct the reverse mapping, and as such identify which `AnalUnit`s
are still referenced. The current implementation doesn't quite do what
we need for incremental compilation here, but the framework is in place.

Note that `Zcu.resolveReferences` does constitute a non-trivial amount
of work on every incremental update. However, for incremental
compilation, this work -- which will effectively be a graph traversal
over all `AnalUnit` references -- seems strictly necessary. At the
moment, this work is only done if the `Zcu` has any errors, when
collecting them into the final `ErrorBundle`.

An unsolved problem here is how to represent inline function calls in
the reference trace. If `foo` performs an inline call to `bar` which
references `qux`, then ideally, `bar` would be shown on the reference
trace between `foo` and `qux`, but this is not currently the case. The
solution here is probably for `Zcu.Reference` to store information about
the source locations of active inline calls between the referencer and
its reference.
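
The forward table and its reversal can be sketched roughly as follows. This is an illustration under assumptions, not the actual `Zcu` code: `AnalUnit` is reduced to a `u32`, `Reference` keeps only the fields needed here, the freelist is omitted, and this `resolveReferences` keeps just the first referencer it finds for each unit.

```zig
const std = @import("std");

/// Hypothetical stand-in for the real `AnalUnit`.
const AnalUnit = u32;

/// One edge of the reference graph. `next` chains all references made by the same
/// referencer; it is an index into `all_references`, or null at the end of the chain.
const Reference = struct {
    referenced: AnalUnit,
    next: ?u32,
};

/// Reverses "referencer -> chain of referenced units" into "referenced -> one referencer".
fn resolveReferences(
    gpa: std.mem.Allocator,
    reference_table: *const std.AutoHashMap(AnalUnit, u32),
    all_references: []const Reference,
) !std.AutoHashMap(AnalUnit, AnalUnit) {
    var result = std.AutoHashMap(AnalUnit, AnalUnit).init(gpa);
    errdefer result.deinit();

    var it = reference_table.iterator();
    while (it.next()) |entry| {
        var idx: ?u32 = entry.value_ptr.*;
        while (idx) |i| : (idx = all_references[i].next) {
            // Keep the first referencer found; any one of them is enough for a trace.
            const gop = try result.getOrPut(all_references[i].referenced);
            if (!gop.found_existing) gop.value_ptr.* = entry.key_ptr.*;
        }
    }
    return result;
}
```

Dropping every reference made by a re-analyzed unit is then a single removal from `reference_table` (plus recycling its chain), which is the property the forward direction was chosen for.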

This change modifies `Zcu.ErrorMsg` to store a `Zcu.LazySrcLoc` rather
than a `Zcu.SrcLoc`. Everything else is dominoes.

The reason for this change is incremental compilation. If a failed
`AnalUnit` is up-to-date on an update, we want to re-use the old error
messages. However, the file containing the error location may have been
modified, and `SrcLoc` cannot survive such a modification. `LazySrcLoc`
is designed to be correct across incremental updates. Therefore, we
defer source location resolution until `Compilation` gathers the compile
errors into the `ErrorBundle`.
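
To illustrate why deferring resolution helps, here is a deliberately simplified sketch. The real `Zcu.LazySrcLoc` is more involved; `TrackedNode`, `LazyLoc`, `ResolvedLoc`, and the `node_lines` table are all hypothetical names invented for this example. The idea shown is only that a location stored relative to a stable identifier can be resolved against whatever the file looks like at the time the error is emitted.

```zig
const std = @import("std");

/// Hypothetical: a stable identifier for a syntax node that survives edits to the file.
const TrackedNode = u32;

/// A fully resolved location; only valid for one particular version of the file.
const ResolvedLoc = struct { line: u32 };

/// A location stored relative to a tracked node, resolved only when needed.
const LazyLoc = struct {
    base: TrackedNode,
    line_offset: u32,

    /// `node_lines[n]` is the line of tracked node `n` in the current version of the file.
    fn resolve(self: LazyLoc, node_lines: []const u32) ResolvedLoc {
        return .{ .line = node_lines[self.base] + self.line_offset };
    }
};

test "a stored error location follows the node across an edit" {
    const err_loc: LazyLoc = .{ .base = 1, .line_offset = 2 };

    const before_edit = [_]u32{ 3, 10 };
    try std.testing.expect(err_loc.resolve(&before_edit).line == 12);

    // Five lines were inserted above node 1; the stored error is reused unchanged.
    const after_edit = [_]u32{ 3, 15 };
    try std.testing.expect(err_loc.resolve(&after_edit).line == 17);
}
```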

I'm so sorry.

This commit was just meant to be making all types fully resolve by
queueing resolution at the moment of their creation. Unfortunately, a
lot of dominoes ended up falling. Here's what happened:

* I added a work queue job to fully resolve a type.
* I realised that from here we could eliminate `Sema.types_to_resolve`
  if we made function codegen a separate job. This is desirable for
  simplicity of both spec and implementation.
* This led to a new AIR traversal to detect whether any required type is
  unresolved. If a type in the AIR failed to resolve, then we can't run
  codegen.
* Because full type resolution now occurs by the work queue job, a bug
  was exposed whereby error messages for type resolution were associated
  with the wrong `Decl`, resulting in duplicate error messages when the
  type was also resolved "by" its owner `Decl` (which really *all*
  resolution should be done on).
* A correct fix for this requires using a different `Sema` when
  performing type resolution: we need a `Sema` owned by the type. Also
  note that this fix is necessary for incremental compilation.
* This means a whole bunch of functions no longer need to take `Sema`s.
  * First-order effects: `resolveTypeFields`, `resolveTypeLayout`, etc
  * Second-order effects: `Type.abiAlignmentAdvanced`, `Value.orderAgainstZeroAdvanced`, etc

The end result of this is, in short, a more correct compiler and a
simpler language specification. This regressed a few error notes in the
test cases, but nothing that seems worth blocking this change.

Oh, also, I ripped out the old code in `test/src/Cases.zig` which
introduced a dependency on `Compilation`. This dependency was
problematic at best, and this code has been unused for a while. When we
re-enable incremental test cases, we must rewrite their executor to use
the compiler server protocol.

Note that the `_ = Address` statements in tests previously were a nop,
and now actually check that the type is valid. However, on WASI, the
type is *not* valid.

@andrewrk andrewrk left a comment

Great work. I have some review feedback, but in the interest of forward progress, I'll proceed with the merge regardless, and you can decide how you want to incorporate the feedback or not in your upcoming changes.

Comment on lines +5372 to +5381
defer {
for (decl_exports.values()) |*exports| {
exports.deinit(gpa);
}
decl_exports.deinit(gpa);
for (value_exports.values()) |*exports| {
exports.deinit(gpa);
}
value_exports.deinit(gpa);
}
Member

since defer expressions are duplicated at every try and return, it can be beneficial for icache and binary size to put more complex cleanup logic in a function.

Contributor

Afaiu, having an explicit function scope makes it easier for compiler passes to decide whether to inline or deduplicate the code at every callsite/jump to it.
Then is the current behavior of always inlining/duplicating the right default?
Would there be a downside to changing it to automatically pack defers into functions/blocks? (If so, it could also decide with a simple heuristic like statement count, or maybe whether block syntax is used.)
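
The suggestion at the top of this thread, moving the cleanup loop into a helper so that the `defer` is a single call, might look roughly like this. It is a sketch with assumed types: `ExportMap`, `deinitExportMap`, `addList`, and `countExports` are hypothetical, and the map values are plain slices rather than the lists used in the reviewed code.

```zig
const std = @import("std");

/// Hypothetical simplification: each key owns one heap-allocated slice.
const ExportMap = std.AutoArrayHashMap(u32, []u32);

/// Frees every value slice, then the map itself.
fn deinitExportMap(gpa: std.mem.Allocator, map: *ExportMap) void {
    for (map.values()) |list| gpa.free(list);
    map.deinit();
}

/// Allocates a slice of `len` items and stores it under `key`.
fn addList(gpa: std.mem.Allocator, map: *ExportMap, key: u32, len: usize) !void {
    const list = try gpa.alloc(u32, len);
    errdefer gpa.free(list);
    try map.put(key, list);
}

fn countExports(gpa: std.mem.Allocator) !usize {
    var decl_exports = ExportMap.init(gpa);
    // The deferred expression is a single call; the loop lives in one place.
    defer deinitExportMap(gpa, &decl_exports);

    try addList(gpa, &decl_exports, 1, 2);
    try addList(gpa, &decl_exports, 2, 3);

    var total: usize = 0;
    for (decl_exports.values()) |list| total += list.len;
    return total;
}
```

Each `try` in `countExports` is an exit path that has to run the deferred expression, so keeping that expression down to one call keeps every exit path small.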

Comment on lines +6475 to +6476
/// TODO: in future, this must be adapted to traverse from roots of analysis. That way, we can
/// use the returned map to determine which units have become unreferenced in an incremental update.
Member

why bother determining which units have become unreferenced? we could rely on garbage collection for that. When garbage collection does not occur, there will be extra functions inside the output binary - dead code, but not hurting anything. But that could be useful if those functions become referenced again, saving work.

Member Author

In the case where all analysis succeeded, that's fair -- we could probably avoid this work in that case. However, this is necessary at least in the case where there are entries in failed_analysis, because we need to know whether the compile error in question is correct to emit, or whether it's just sitting around from a previous update.

Comment on lines +322 to +324
/// This `Air` is owned by the `Job` and allocated with `gpa`.
/// It must be deinited when the job is processed.
air: Air,
Member

Nice. One step closer to doing linking/codegen on a separate thread. I think this significantly increases the size of the job queue but probably no big deal.

Member Author

It is worth noting that it's important for codegen jobs to go onto the main work queue to make sure type resolution is done before codegen begins. However, once we want to thread codegen, the processing of this job can just grab the Air and put it onto the queue for the codegen thread, so the main thread can continue with analysis straight away.

Comment on lines 38302 to 38308
pub fn ptrType(sema: *Sema, info: InternPool.Key.PtrType) CompileError!Type {
if (info.flags.alignment != .none) {
_ = try sema.typeAbiAlignment(Type.fromInterned(info.child));
}
return sema.mod.ptrType(info);
}

Member

Are you sure there is no dependency on sema here? What about reference traces? It seems like without this we lose track of which AnalUnit caused the type resolution to occur.

Member Author

Well, since all types resolve fully anyway, we don't really care which AnalUnit caused the early resolution, no? We get a valid reference trace regardless.

This change does, I suppose, make error reporting in subtle dependency-loop-esque cases more awkward. That's something we can workshop a solution to when we determine in practice which errors (if any) are not sufficiently detailed.

defer codegen_prog_node.end();

if (comp.bin_file) |lf| {
if (!air.typesFullyResolved(zcu)) {
Member

Since you have untangled types from Sema instances, maybe it's not so bad to detect this failure in the backends, avoiding this pass entirely. It seems a shame to have a whole pass that accomplishes nothing in the success case.

Member Author

Yep, that might be fair. We can investigate that in the future -- indeed, if this branch had a negative performance impact, I was going to investigate that before merge.

@andrewrk andrewrk merged commit 790b842 into ziglang:master Jul 5, 2024

andrewrk commented Mar 4, 2025

skipping inclusion in release notes; unclear what should be copy pasted

4 participants