refactors ad infinitum #20494
Conversation
Force-pushed from ddbc4f2 to 7555561.
The breaking change here is full type resolution. I previously got the go-ahead from Andrew to make this change without a proposal if it passes the existing tests. It means that, for instance, code like this is now disallowed:

```zig
const S = struct {
    x: 123, // nonsensical field type
};
comptime {
    _ = S; // previously didn't resolve S at all; now does!
}
```
Force-pushed from 7555561 to 5761544.
Performance data points (Analyze Behavior, Compile Behavior on x86_64 selfhosted, Analyze Compiler -- charts omitted):

Overall, it's no worse. Somehow, even though we're surely doing strictly more work, analysis is sometimes faster. (The builds differ only by commits in this branch, and are both identically-created native builds!)
Force-pushed from 1f54304 to 016d42e.
Oh, I did forget to look at the peak RSS in those results. It's a little worse across the board, and noticeably worse when rebuilding the compiler. That's to be expected: we're tracking a lot more state. However, this additional state is necessary for correct incremental compilation, so I don't think this is worth worrying about. Also note that I wasn't using LLVM in those builds, so the absolute peak RSS numbers are relatively low -- that +15% when building the compiler is only actually about 40M more!
Force-pushed from 1174b3a to eae6c85.
CI failure looks like some linker code got accidentally referenced by an "only_c" build when ensuring that our wasm bootstrap process succeeds. Likely just a missing
Force-pushed from eae6c85 to cda6f55.
I meant to call it this originally, I just got mixed up -- sorry!
This commit reworks our representation of exported Decls and values in Zcu to be memory-optimized and trivially serialized. All exports are now stored in the `all_exports` array on `Zcu`. An `AnalUnit` which performs an export (either through an `export` annotation or by containing an analyzed `@export`) gains an entry into `single_exports` if it performs only one export, or `multi_exports` if it performs multiple.

We no longer store a persistent mapping from a `Decl`/value to all exports of that entity; this state is not necessary for the majority of the pipeline. Instead, we construct it in `Zcu.processExports`, just before flush. This does not affect the algorithmic complexity of `processExports`, since this function already iterates all exports in the `Zcu`.

The elimination of `decl_exports` and `value_exports` led to a few non-trivial backend changes. The LLVM backend has been wrangled into a more reasonable state in general regarding exports and externs. The C backend is currently disabled in this commit, because its support for `export` was quite broken, and that was exposed by this work -- I'm hoping @jacobly0 will be able to pick this up!
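For reference, here is a minimal sketch of the storage layout the commit message describes -- the names mirror the message, but the field types and shapes are assumptions for illustration, not the compiler's actual definitions:

```zig
const std = @import("std");

// Stand-ins for illustration; the real types live in the compiler.
const AnalUnit = enum(u32) { _ };
const Export = struct { name: []const u8 }; // the real `Export` carries more state

const ExportStorage = struct {
    /// Every export in the compilation, in one flat, trivially-serialized array.
    all_exports: std.ArrayListUnmanaged(Export) = .{},
    /// A unit which performs exactly one export maps to its index in `all_exports`.
    single_exports: std.AutoHashMapUnmanaged(AnalUnit, u32) = .{},
    /// A unit which performs several exports maps to a contiguous run in `all_exports`.
    multi_exports: std.AutoHashMapUnmanaged(AnalUnit, Extent) = .{},

    const Extent = struct { index: u32, len: u32 };
};
```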
This change seeks to more appropriately model the way semantic analysis works by drawing a more clear line between errors emitted by analyzing a `Decl` (in future a `Cau`) and errors emitted by analyzing a runtime function. This does change a few compile errors surrounding compile logs by adding more "also here" notes. The new notes are more technically correct, but perhaps not so helpful. They're not doing enough harm for me to put extensive thought into this for now.
Previously, `reference_table` mapped from a `Decl` being referenced to the `Decl` that performed the reference. This is convenient for constructing error messages, but problematic for incremental compilation, because on an incremental update, we want to efficiently remove all references triggered by an `AnalUnit` which is being re-analyzed. For this reason, `reference_table` now maps the other way: from the `AnalUnit` *performing* the reference, to the `AnalUnit` whose analysis was triggered.

As a general rule, any call to any of the following functions should be preceded by a call to `Sema.addReferenceEntry`:

* `Zcu.ensureDeclAnalyzed`
* `Sema.ensureDeclAnalyzed`
* `Zcu.ensureFuncBodyAnalyzed`
* `Zcu.ensureFuncBodyAnalysisQueued`

This is not just important for error messages, but also more fundamentally for incremental compilation. When an incremental update occurs, we must determine whether any `AnalUnit` has become unreferenced: in this case, we should ignore its associated error messages, and perhaps even remove it from the binary. For this reason, we no longer store only one reference to every `AnalUnit`, but every reference. At the end of an update, `Zcu.resolveReferences` will construct the reverse mapping, and as such identify which `AnalUnit`s are still referenced. The current implementation doesn't quite do what we need for incremental compilation here, but the framework is in place.

Note that `Zcu.resolveReferences` does constitute a non-trivial amount of work on every incremental update. However, for incremental compilation, this work -- which will effectively be a graph traversal over all `AnalUnit` references -- seems strictly necessary. At the moment, this work is only done if the `Zcu` has any errors, when collecting them into the final `ErrorBundle`.

An unsolved problem here is how to represent inline function calls in the reference trace. If `foo` performs an inline call to `bar` which references `qux`, then ideally, `bar` would be shown on the reference trace between `foo` and `qux`, but this is not currently the case. The solution here is probably for `Zcu.Reference` to store information about the source locations of active inline calls between the referencer and its reference.
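To make the new direction concrete, here is a sketch of the reversal that `Zcu.resolveReferences` is described as performing. All shapes here are simplified assumptions (the real `AnalUnit` and `Reference` are richer), not the actual implementation:

```zig
const std = @import("std");

const AnalUnit = u32; // stand-in for the real packed representation
const Reference = struct {
    referenced: AnalUnit,
    /// Index of the next reference made by the same unit, or `none`.
    next: u32,
    const none = std.math.maxInt(u32);
};

/// Reverse the "who references whom" mapping: for each referenced unit,
/// record one unit which references it. Anything absent from the result
/// was not referenced on this update.
fn resolveReferences(
    gpa: std.mem.Allocator,
    reference_table: *const std.AutoHashMapUnmanaged(AnalUnit, u32),
    all_references: []const Reference,
) !std.AutoHashMapUnmanaged(AnalUnit, AnalUnit) {
    var result: std.AutoHashMapUnmanaged(AnalUnit, AnalUnit) = .{};
    errdefer result.deinit(gpa);
    var it = reference_table.iterator();
    while (it.next()) |entry| {
        var i = entry.value_ptr.*; // head of this unit's reference list
        while (i != Reference.none) : (i = all_references[i].next) {
            try result.put(gpa, all_references[i].referenced, entry.key_ptr.*);
        }
    }
    return result;
}
```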
This change modifies `Zcu.ErrorMsg` to store a `Zcu.LazySrcLoc` rather than a `Zcu.SrcLoc`. Everything else is dominoes. The reason for this change is incremental compilation. If a failed `AnalUnit` is up-to-date on an update, we want to re-use the old error messages. However, the file containing the error location may have been modified, and `SrcLoc` cannot survive such a modification. `LazySrcLoc` is designed to be correct across incremental updates. Therefore, we defer source location resolution until `Compilation` gathers the compile errors into the `ErrorBundle`.
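As a rough illustration of the deferral (shapes assumed; the compiler's real `LazySrcLoc` resolves via stable references into the AST, not a bare integer):

```zig
const SrcLoc = struct { line: u32, column: u32 };

const LazySrcLoc = struct {
    /// A stable handle that survives file edits between incremental updates.
    node: u32,

    /// Called only once `Compilation` builds the `ErrorBundle`, so it sees
    /// the file's *current* contents rather than those at analysis time.
    fn resolve(lazy: LazySrcLoc) SrcLoc {
        return .{ .line = lazy.node, .column = 0 }; // placeholder logic
    }
};

const ErrorMsg = struct {
    src_loc: LazySrcLoc, // previously a `SrcLoc`, resolved eagerly
    msg: []const u8,
};
```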
I'm so sorry. This commit was just meant to be making all types fully resolve by queueing resolution at the moment of their creation. Unfortunately, a lot of dominoes ended up falling. Here's what happened:

* I added a work queue job to fully resolve a type.
* I realised that from here we could eliminate `Sema.types_to_resolve` if we made function codegen a separate job. This is desirable for simplicity of both spec and implementation.
* This led to a new AIR traversal to detect whether any required type is unresolved. If a type in the AIR failed to resolve, then we can't run codegen.
* Because full type resolution now occurs by the work queue job, a bug was exposed whereby error messages for type resolution were associated with the wrong `Decl`, resulting in duplicate error messages when the type was also resolved "by" its owner `Decl` (which really *all* resolution should be done on).
* A correct fix for this requires using a different `Sema` when performing type resolution: we need a `Sema` owned by the type. Also note that this fix is necessary for incremental compilation.
* This means a whole bunch of functions no longer need to take `Sema`s.
  * First-order effects: `resolveTypeFields`, `resolveTypeLayout`, etc.
  * Second-order effects: `Type.abiAlignmentAdvanced`, `Value.orderAgainstZeroAdvanced`, etc.

The end result of this is, in short, a more correct compiler and a simpler language specification. This regressed a few error notes in the test cases, but nothing that seems worth blocking this change.

Oh, also, I ripped out the old code in `test/src/Cases.zig` which introduced a dependency on `Compilation`. This dependency was problematic at best, and this code has been unused for a while. When we re-enable incremental test cases, we must rewrite their executor to use the compiler server protocol.
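A hypothetical sketch of the two new work-queue jobs this describes; the compiler's real `Job` union has many more variants and different payloads:

```zig
const InternPoolIndex = u32; // stand-in for `InternPool.Index`
const Air = struct {}; // stand-in for the real AIR

const Job = union(enum) {
    /// Queued at the moment a type is created: fully resolve it up front,
    /// replacing the old `Sema.types_to_resolve` mechanism.
    resolve_type_fully: InternPoolIndex,
    /// Queued once a function body is analyzed; codegen runs later, after
    /// checking that every type the AIR needs actually resolved.
    codegen_func: struct {
        func: InternPoolIndex,
        air: Air,
    },
};
```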
Note that the `_ = Address` statements in tests were previously no-ops, and now actually check that the type is valid. However, on WASI, the type is *not* valid.
andrewrk left a comment:
Great work. I have some review feedback, but in the interest of forward progress, I'll proceed with the merge regardless, and you can decide how you want to incorporate the feedback or not in your upcoming changes.
```zig
defer {
    for (decl_exports.values()) |*exports| {
        exports.deinit(gpa);
    }
    decl_exports.deinit(gpa);
    for (value_exports.values()) |*exports| {
        exports.deinit(gpa);
    }
    value_exports.deinit(gpa);
}
```
since defer expressions are duplicated at every try and return, it can be beneficial for icache and binary size to put more complex cleanup logic in a function.
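For instance, a sketch of that suggestion applied to the diff above (map types simplified for illustration; the real key and value types differ):

```zig
const std = @import("std");

/// Shared cleanup body: each duplicated `defer` expansion becomes one call.
fn deinitExportMap(
    gpa: std.mem.Allocator,
    map: *std.AutoArrayHashMapUnmanaged(u32, std.ArrayListUnmanaged(u32)),
) void {
    for (map.values()) |*exports| exports.deinit(gpa);
    map.deinit(gpa);
}

// At the original site:
//     defer deinitExportMap(gpa, &decl_exports);
//     defer deinitExportMap(gpa, &value_exports);
```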
Afaiu, having an explicit function scope makes it easier for compiler passes to decide whether to inline or deduplicate the code at every callsite/jump to it.
Then is the current behavior of always inlining/duplicating the right default?
Would there be a downside to changing it to automatically pack defers into functions/blocks? (If so, it could also decide with a simple heuristic like statement count, or maybe whether block syntax is used.)
```zig
/// TODO: in future, this must be adapted to traverse from roots of analysis. That way, we can
/// use the returned map to determine which units have become unreferenced in an incremental update.
```
why bother determining which units have become unreferenced? we could rely on garbage collection for that. When garbage collection does not occur, there will be extra functions inside the output binary - dead code, but not hurting anything. But that could be useful if those functions become referenced again, saving work.
In the case where all analysis succeeded, that's fair -- we could probably avoid this work in that case. However, this is necessary at least in the case where there are entries in `failed_analysis`, because we need to know whether the compile error in question is correct to emit, or whether it's just sitting around from a previous update.
```zig
/// This `Air` is owned by the `Job` and allocated with `gpa`.
/// It must be deinited when the job is processed.
air: Air,
```
Nice. One step closer to doing linking/codegen on a separate thread. I think this significantly increases the size of the job queue but probably no big deal.
It is worth noting that it's important for codegen jobs to go onto the main work queue to make sure type resolution is done before codegen begins. However, once we want to thread codegen, the processing of this job can just grab the Air and put it onto the queue for the codegen thread, so the main thread can continue with analysis straight away.
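To illustrate the handoff being described -- everything here is a hypothetical stand-in, since threaded codegen doesn't exist yet:

```zig
const std = @import("std");

const Air = struct {}; // stand-in for the real AIR

/// A stand-in for whatever thread-safe queue the codegen thread would drain.
const CodegenQueue = struct {
    mutex: std.Thread.Mutex = .{},
    pending: std.ArrayListUnmanaged(Air) = .{},

    /// Called on the main thread while processing a codegen job: type
    /// resolution has already run on the main work queue, so the AIR can
    /// be handed off and analysis can continue immediately.
    fn push(q: *CodegenQueue, gpa: std.mem.Allocator, air: Air) !void {
        q.mutex.lock();
        defer q.mutex.unlock();
        try q.pending.append(gpa, air);
        // A real implementation would also signal the codegen thread here.
    }
};
```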
```zig
pub fn ptrType(sema: *Sema, info: InternPool.Key.PtrType) CompileError!Type {
    if (info.flags.alignment != .none) {
        _ = try sema.typeAbiAlignment(Type.fromInterned(info.child));
    }
    return sema.mod.ptrType(info);
}
```
Are you sure there is no dependency on sema here? What about reference traces? It seems like without this we lose track of which AnalUnit caused the type resolution to occur.
Well, since all types resolve fully anyway, we don't really care which AnalUnit caused the early resolution, no? We get a valid reference trace regardless.
This change does, I suppose, make error reporting in subtle dependency-loop-esque cases more awkward. That's something we can workshop a solution to when we determine in practice which errors (if any) are not sufficiently detailed.
```zig
defer codegen_prog_node.end();

if (comp.bin_file) |lf| {
    if (!air.typesFullyResolved(zcu)) {
```
Since you have untangled types from Sema instances, maybe it's not so bad to detect this failure in the backends, avoiding this pass entirely. It seems a shame to have a whole pass that accomplishes nothing in the success case.
Yep, that might be fair. We can investigate that in the future -- indeed, if this branch had a negative performance impact, I was going to investigate that before merge.
skipping inclusion in release notes; unclear what should be copy pasted
Okay, a bunch of boring changes here, although the last commit ended up being pretty bulky. Perf measurements coming soon (tm). Here's a summary of the changes:

- `AnalSubject` -> `AnalUnit`. I meant to call it this in the first place, I just confused myself. (Any comments bikeshedding this naming will be hidden; I do not care.)
- The frontend now has a better representation of exports. Rather than multiple hashmaps containing pointers, I observed that we only actually need the map formerly known as `export_owners`. By also making the hashmap value an index+length into a larger list rather than an `ArrayListUnmanaged`, we get a much better memory layout which is more friendly to the CPU and better for serialization. Exports are not quite serializable yet, because `Export` itself isn't serializable, but the hard bit is done.
- The above task also led to a surprising amount of backend changes. This includes minor fixes in some of the self-hosted linkers, a complete rework of how exports are handled in the C backend to deal with `@export` correctly (thanks @jacobly0!), and a slight refactor of how exports and externs are handled in the LLVM backend to make it a little less broken.
- Move a few datastructures towards `AnalUnit` rather than `DeclIndex`. When we're trying to key something on a "unit of semantic analysis", `AnalUnit` is the correct abstraction, because even though a function has an owner `Decl`, the function has two analyses associated with it: analysis of the function declaration (essentially its signature), and analysis of the function body as a runtime function. Also note that after The Great Decl Split, a generic function instantiation will not have a `Cau` associated with it, since analysis of the instance's signature occurs a) at callsites and b) before the instance even exists! Anyways, the main subject of this change is compile errors -- `failed_decls` becomes `failed_analysis`, keyed on `AnalUnit`.
- The reference trace is now tracked very differently. Previously, we had a `reference_table` in `Zcu` which mapped from a `Decl` to the `Decl` that referenced it. Firstly, there's a bit of conflation of `Decl` with `AnalUnit` here, but more importantly, this has a problem when we consider incremental compilation: what if the "referencer" in the table is not actually referenced within the compilation after an update? To give a valid reference trace, we need to be aware of all references, so we know all potential "paths" to something being referenced. In fact, storing these gives us a way to determine what is referenced at the end of an incremental update.
- So, here's how the system works now. `reference_table` maps from the `AnalUnit` performing a reference to the `AnalUnit` it references (more precisely, that it triggers semantic analysis of -- immediately or by queuing, it doesn't matter). In fact, it maps to a `u32` index into `all_references`, which contains `Reference` objects, which form a linked list using the `next` field (which is another index into `all_references`). During analysis, when something is analyzed, its entry in `reference_table` is dropped, since these references will be rediscovered (the corresponding indices in `all_references` are put into a freelist). After the main work queue is flushed during an update, `Zcu.resolveReferences` essentially reverses the mapping, returning a hashmap from the `AnalUnit` that is referenced to the `AnalUnit` that performed the reference. (We don't persist this generally because it's relatively cheap to just construct at the end of compilation, and there's a heavy cost associated with maintaining it across incremental updates.) Since we don't have access to this map in the middle of the update (during `Sema.failWithOwnedErrorMsg`), resolution of the reference trace is deferred until `Compilation` gathers errors into the final `ErrorBundle`.
- In future, `Zcu.resolveReferences` will do a full traversal, starting from the roots of analysis (the main one being the root file of `std`). That way, we can learn after an incremental update whether any `Decl` is now unreferenced, and if so, skip its stored error messages (and potentially delete it from the binary).
- `Zcu.ErrorMsg` no longer stores a `SrcLoc`, but instead a `LazySrcLoc`. The source location is resolved when the error message is emitted. This is to make it correct across incremental updates.
- `type.zig` becomes `Type.zig`; `@import("type.zig").Type` is now `@import("Type.zig")`. Self-explanatory.
- All types are now queued for full resolution at the moment their definition is analyzed. This simplifies the language specification, particularly in regards to incremental compilation.
- Code generation of functions is now queued into a separate job after the function body is analyzed. This allows us to delete `types_to_resolve` and eliminate all "AIR instruction X triggers type resolution level Y" rules, simplifying the language spec and implementation.
- Type resolution no longer requires a `Sema`, because we construct the `Sema` using the type's owner `Decl`. This is a necessary change with far-reaching consequences in terms of changing function signatures. It fixes bugs exposed by the previous couple of bullet points, and is necessary for incremental compilation.
- The test case creation logic -- and thus the compiler's build script -- no longer depends on `src/Compilation.zig`. This became a problem because this new logic caused `Compilation` to be fully resolved in the build script, so `@import("build_options")` was analyzed due to its fields, so the build script could not compile. This is fixed by removing this legacy (and quite broken) dependency of the compiler's build script on a random part of its codebase.