v2: finish cleanc flat AST self-host migration #27390
Merged
The flat-codegen migration drops b.files for the cleanc backend (the
post-transform FlatAst is the source of truth), but the cleanc
cached-module build path still enumerated modules/files from b.files.
With b.files empty, has_module('builtin') returned false, so the entire
cached-core split-compilation path was silently skipped: no per-module
.vh headers were written and the slower non-cached fallback ran instead.
This regressed module_storage_cache_test (passed at b6bd49b, failed
after the flat commits).
Make the cleanc cache enumeration flat-aware. When b.files is empty and
the post-transform FlatAst is present (uses_flat_module_enumeration),
source module names, file paths and imports from the flat cursors:
- has_module, collect_modules_excluding, user_entry_module_names,
expand_type_modules_with_imports (raw imports via read_file_imports),
has_external_cache_module_name_collision
- import_modules_for_cached_modules (comptime-aware
active_file_imports_from_flat_with_options)
- collect_virtual_main_modules (+ flat_file_declares_executable_main)
- source_files_for_module_name — required for external (non-vlib)
modules whose location the vlib disk-scan fallback cannot resolve;
this is what left ext.vh unwritten.
Each helper keeps its existing b.files loop for the legacy (.v/.eval)
path and normalizes module names through flat_file_module_name (dots to
underscores, 'main' default) to stay bit-identical to ast_file_module_name.
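
The normalization rule described above (dots to underscores, 'main' as the default) can be sketched as follows. This is an illustrative Python model, not the V implementation: `flat_file_module_name_sketch`, `has_module_sketch`, and the list-of-names input are hypothetical stand-ins for the real helpers and the flat cursors.

```python
from typing import Iterable, Optional


def flat_file_module_name_sketch(declared: Optional[str]) -> str:
    """Normalize a declared module name (hypothetical model of the rule).

    An absent or empty declaration falls back to 'main'; dots in a
    qualified name become underscores.
    """
    if not declared:
        return "main"
    return declared.replace(".", "_")


def has_module_sketch(declared_names: Iterable[Optional[str]], wanted: str) -> bool:
    """Return True if any file's normalized module name equals `wanted`.

    `declared_names` stands in for walking the per-file flat cursors.
    """
    return any(flat_file_module_name_sketch(n) == wanted for n in declared_names)
```

Routing every enumeration helper through one normalizer is what keeps the flat path and the legacy path comparable name for name.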
Cache bundles additionally need correct emission: the flat cleanc gen
does not yet scope _option_/_result_ wrapper-typedef emission to a
type-module subset, so it emits a type-module struct/prototype without
the wrapper typedef it references (undefined-type errors in builtin.c).
For restricted bundle generation in flat mode, rehydrate just the
bundle's type-module files from flat and drive the proven legacy gen,
which filters by physical file set. Bounded to cached-module files; the
.o is cached so this only runs on a cache miss. The unrestricted main
translation unit still uses the flat gen. Fully flat-native restricted
bundle codegen remains the follow-up.
Verified: module_storage_cache_test passes; cleanc self-build of
cmd/v2/v2.v produces a working compiler; cache-hit (.vh-parse) rebuild
works; parsed-AST stats report legacy=0; builder/transformer/cleanc/
markused/ssa/ast suites green; the three pre-existing cleanc codegen
failures are unchanged.
Compiling cmd/v2/v2.v via cleanc (flat codegen) spent ~28s in Transform,
~6.5x slower than the legacy path. Two causes, both fixed:
1) lookup_imported_var_type rebuilt its work on every call: it took the
current module scope, allocated objects.keys(), SORTED them, then scanned
the whole symbol table for Module (import) objects and resolved each
import's scope, all to find one var across the module's imports. Called
per-identifier during type propagation, this was one of the hottest
functions (post_pass alone was 18s). The import-scope set is stable for a
transform run, so precompute it once in cache_env_maps
(build_cached_imported_module_scopes -> cached_imported_module_scopes) and
just iterate the short import list. The dropped per-call sort does not
change the result (the "found in exactly one import" / ambiguity semantics
are order-independent). post_pass: 18s -> 2.9s, and the per-identifier
lookup is now O(imports) instead of O(objects log objects). This speeds up
both the flat and legacy propagation paths.
2) The parallel build branch ran the SEQUENTIAL
transform_flat_to_flat_direct for flat-codegen backends, so the per-file
transform never used worker threads (the legacy driver fans it out). The
flat AST has no thread-safe merge primitive for workers to append into one
builder, so route the parallel branch through
transform_files_parallel_to_flat_via_driver and drop b.files immediately
for flat-codegen backends. Codegen stays flat-only; the legacy files are
live only transiently during the parallel transform. The memory-critical
arm64 self-host runs --no-parallel, so it takes the sequential branch and
keeps the allocation-minimal flat-direct path untouched.
Result: Transform on cmd/v2/v2.v 28.1s -> ~4.2s, on par with the legacy
parallel path (~4.1s).
Verified: cleanc self-build of cmd/v2/v2.v produces a working compiler;
module_storage_cache_test, all 19 transformer tests (incl. the propagation
parity guard), cleanc_test, and the builder suite pass; parsed-AST stats
stay legacy=0; arm64 sequential and parallel smokes work; the three
pre-existing cleanc codegen failures are unchanged.
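
The first fix above, precomputing the import scopes once and then iterating only that short list, can be sketched like this. It is a minimal Python model under stated assumptions: the dictionary-shaped symbol table, the class name `ImportLookupSketch`, and returning `None` on an ambiguous hit are all illustrative choices, not the compiler's actual data structures.

```python
from typing import Dict, List, Optional

Scope = Dict[str, str]  # variable name -> type name (hypothetical shape)


class ImportLookupSketch:
    def __init__(self, symbol_table: Dict[str, dict]) -> None:
        # Done once per transform run: keep only the scopes of module
        # (import) objects, so lookups never rescan the symbol table.
        self.import_scopes: List[Scope] = [
            obj["scope"]
            for obj in symbol_table.values()
            if obj.get("kind") == "module"
        ]

    def lookup(self, name: str) -> Optional[str]:
        # O(imports) per call. The answer depends only on how many
        # imports define `name`, so iteration order does not matter.
        hits = 0
        found: Optional[str] = None
        for scope in self.import_scopes:
            if name in scope:
                hits += 1
                found = scope[name]
        return found if hits == 1 else None
```

Because the result is a function of the hit count alone, dropping the per-call sort cannot change it, which is the order-independence argument the commit relies on.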
Restricted cleanc cache-bundle generation was the last codegen path that
still materialized legacy ast.File: it rehydrated the bundle's type-module
files via b.flat.to_files_range(i, i+1) and drove the LEGACY cleanc gen,
because the flat gen, given the whole b.flat, emitted foreign-module
structs/prototypes (e.g. term__ColorConfig, os__open_file) whose
_option_/_result_/Array_ wrapper typedefs are registered only while
emitting that module's bodies (which a bundle does not do), producing
undefined-type C errors.
Fix it the same way the legacy gen did: by scoping the INPUT to the
bundle's type modules. The legacy gen filtered its gen_files; the flat gen
drives every one of its ~65 emission passes off flat.files, so hand it a
FlatAst whose file list is restricted to the bundle's type_module_names
(flat_scoped_to_modules). The node/edge/string arena is shared, not copied
(V shares the array buffer on struct-literal field assignment), so this is
a cheap file-list filter, not an AST rehydrate. Every pass then sees
exactly the files the legacy gen saw.
This removes the last to_files_range / legacy-ast.File materialization on
the cleanc codegen path AND moves bundle generation onto the flat gen
(previously legacy).
Memory: warm cache (bundles reused) is unchanged (5.7GB); a cold
self-build is +1.6GB (12.5 vs 10.9GB) because bundles now run through the
flat gen. This is acceptable on the cleanc C-backend path (the
memory-critical arm64 self-host is --no-parallel/flat-direct and never
touches this), and only on a cache miss.
Verified: cleanc self-build of cmd/v2/v2.v produces a working compiler
with zero undefined-type errors in builtin/vlib/v2compiler/imports
bundles; module_storage_cache_test and cleanc_test pass; the cleanc
codegen suite is 6 passed / 3 failed (the 3 are pre-existing, unchanged);
arm64 smoke unaffected.
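
The "file-list filter, not an AST rehydrate" idea can be modeled in a few lines. This is a hedged Python sketch: `FlatAstSketch`, `FlatFileSketch`, and `flat_scoped_to_modules_sketch` are hypothetical names, and a single `nodes` list stands in for the shared node/edge/string arena.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class FlatFileSketch:
    module: str  # normalized module name of this file
    root: int    # index of the file's root node in the shared arena


@dataclass
class FlatAstSketch:
    nodes: List[str]             # stand-in for the node/edge/string arena
    files: List[FlatFileSketch]  # what the emission passes iterate over


def flat_scoped_to_modules_sketch(
    flat: FlatAstSketch, module_names: Set[str]
) -> FlatAstSketch:
    # Reuse the same arena object (no copy) and narrow only the file
    # list, so every pass that walks `files` sees just the bundle's
    # modules.
    scoped_files = [f for f in flat.files if f.module in module_names]
    return FlatAstSketch(nodes=flat.nodes, files=scoped_files)
```

The scoped view costs one small list, and the original FlatAst is left untouched for the unrestricted main translation unit.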
…lat backends)
For flat-codegen backends (cleanc/c/x64/arm64), the parallel transform
(transform_files_parallel_to_flat_via_driver) ran the full legacy
file-mutating post-pass on the transformed []ast.File and threaded the
result back to the caller, which then immediately dropped it. That work is
redundant: post_pass_to_flat already applies the same edits to the flat,
and the type-propagation tail can run against the flat
(apply_post_pass_tail_from_flat) instead of the legacy files. This is
exactly what the sequential transform_flat_to_flat_direct already does.
Add a `keep_files` parameter:
- flat backends pass keep_files=false -> skip post_pass_files_with_
generated_parts, run apply_post_pass_tail_from_flat, return [] (the
transient `result` is freed in the function instead of being returned
and dropped).
- .v/eval still consume the files, so they pass keep_files=true (the
previous behavior, unchanged).
Removes the last []ast.File consumer in the post-pass tail on the flat
codegen path (advances toward deleting legacy ast) and skips a redundant
whole-program legacy post-pass + propagate_types on every default
(parallel) cleanc/c/x64 build.
Verified: cleanc self-build of cmd/v2/v2.v -> working compiler; eval
backend (keep_files=true path) prints correct output; module_storage_cache_test,
cleanc_test, all 19 transformer tests (incl. propagate_types_from_flat
parity + transformer_flat_diff), cleanc_target_e2e, native, target_os,
type_check_parallel, flat_streaming pass; -stats legacy=0; arm64 smoke ok.
…kers
The parallel transform always streams per-file from the post-parse FlatAst:
transform_files_parallel_no_post_pass_impl was only ever called with
stream_from_flat=true (both wrappers passed true), so the entire
`stream_from_flat == false` machinery — which transformed a legacy
[]ast.File input — was unreachable. Remove it:
- the `stream==false` branches in the fan-out (Windows + non-Windows,
single-thread + worker spawn), folding the impl into a flat-only
transform_files_parallel_no_post_pass (drop the bool param);
- the top-level-stmt splitting subsystem
(transform_files_parallel_top_level_stmts, TransformStmtJob/Result/
ChunkArgs, transform_stmt_chunk_thread, file/stmt_can_split_top_level,
file_transform_cost, lpt_stmt_buckets);
- the LPT cost-walkers that recursed over legacy ast.Stmt/Expr/Type
(transform_stmt_cost, transform_stmts_cost, transform_expr_cost,
transform_type_cost, top_level_transform_stmt_cost, lpt_buckets);
- the uncalled wrappers transform_files_parallel,
transform_files_parallel_from_flat,
transform_files_parallel_no_post_pass_from_flat;
- the non-flat branch of transform_chunk_thread and the now-unused
TransformChunkArgs.files field.
Net -653 lines, and removes a large cluster of legacy-ast.Stmt/Expr-walking
code from the hot transform path (advances toward deleting legacy ast).
Verified: cleanc self-build of cmd/v2/v2.v -> working compiler;
cleanc_target_e2e_test (compiles AND runs real programs end-to-end through
the parallel transform), all 19 transformer tests, module_storage_cache_test,
type_check_parallel_test, cleanc_test pass; -stats legacy=0; arm64 + eval
(keep_files=true) smokes ok. A pre-existing cleanc bug with struct methods
in small standalone programs reproduces identically on HEAD (not a regression).
First step of making the transform read from cursors instead of decoding
each whole top-level statement subtree to legacy ast at the
transform_cursor_stmts_to_flat_direct loop (flat_write.v:1835), the last
big legacy-AST consumer and the blocker for deleting ast.Stmt/Expr.
Add transform_stmt_list_item_cursor_to_flat(c ast.Cursor, ...): it
dispatches on c.kind() and falls back, in one line, to the proven legacy
list-item path for unconverted kinds (reusing the full guard chain +
pending-stmts drain). The loop seam now passes the cursor (stmts.at(i))
instead of stmts.at(i).stmt().
First converted set is the nine TRUE-passthrough top-level kinds that
carry no try_expand_* guard and that transform_stmt_to_flat emits verbatim
(flat_write.v:3890): stmt_import, stmt_module, stmt_directive, stmt_empty,
stmt_enum_decl, stmt_interface_decl, stmt_type_decl, stmt_asm,
stmt_flow_control. They route through append_transformed_stmt_to_flat
exactly as the fallback would after its guards fail: bit-equal, just
skipping the always-false-for-these-kinds guard checks.
This is the dispatcher foundation, not yet a decode win: the converted
kinds still decode via c.stmt() (dropping that needs flat-to-flat subtree
copy). The point is the seam that later stages (const/global, then the
FnDecl body, the real per-function whole-body decode that dominates)
extend arm by arm, each gated by the existing parity harness.
Verified: transformer_flat_diff_test parity (cursor output bit-equal to
the decode path) + all 19 transformer tests; cleanc self-build -> working
compiler; cleanc_target_e2e_test (compiles+runs real programs); -stats
legacy=0; arm64 + cleanc smokes exercising import/enum/type-decl top-level
kinds both correct.
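
The dispatcher seam can be pictured as a kind check with a single-line fallback. The sketch below is an assumption-laden Python model: the nine kind names come from the commit message, while `transform_list_item_sketch` and its two callbacks are hypothetical stand-ins for the verbatim emit and the legacy list-item path.

```python
from typing import Callable

# The nine passthrough kinds named in the commit message.
PASSTHROUGH_KINDS = frozenset({
    "stmt_import", "stmt_module", "stmt_directive", "stmt_empty",
    "stmt_enum_decl", "stmt_interface_decl", "stmt_type_decl",
    "stmt_asm", "stmt_flow_control",
})


def transform_list_item_sketch(
    kind: str,
    emit_verbatim: Callable[[str], str],
    legacy_fallback: Callable[[str], str],
) -> str:
    # Converted kinds skip the guard chain and are emitted as-is.
    if kind in PASSTHROUGH_KINDS:
        return emit_verbatim(kind)
    # Everything else takes the existing path unchanged, so new arms
    # can be added one kind at a time behind a parity check.
    return legacy_fallback(kind)
```

Keeping the fallback as the default branch means an unconverted kind can never silently take a half-finished path.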
…ge 2)
Second step of the cursor-native transform (slice 4). Convert the
stmt_const_decl and stmt_global_decl arms of
transform_stmt_list_item_cursor_to_flat to read straight from the cursor
instead of decoding the whole statement to legacy ast first.
- transform_const_decl_cursor_to_flat /
transform_global_decl_cursor_to_flat mirror the ast.ConstDecl /
ast.GlobalDecl arms of transform_stmt_to_flat exactly (same emit order,
same field-init/field-decl encoding) but read is_public, the field list,
and per-field name/flags/typ/value/attrs from the cursor. The
ConstDecl/GlobalDecl wrapper + FieldInit/FieldDecl structure are no longer
rehydrated as legacy structs; only the field value/typ exprs still decode
(via the existing transform_expr_to_flat), which the later expr-arm stages
eliminate.
- Factor append_transformed_stmt_id_to_flat out of
append_transformed_stmt_to_flat so the cursor arms drain t.pending_stmts
ahead of the emitted stmt identically (bit-equal ordering).
Verified: transformer_flat_diff_test parity (cursor output bit-equal to
the decode path) + all 19 transformer tests; cleanc self-build of
cmd/v2/v2.v (hundreds of top-level const decls) -> working compiler; a
const+__global smoke compiles+runs correctly on cleanc, arm64, and the
self-built compiler (hi 49); cleanc_target_e2e_test and
module_storage_cache_test pass; -stats legacy=0.
Behavior-preserving refactor that splits the 264-line
transform_fn_decl_parts into three reusable pieces, preparing the
cursor-native FnDecl body streaming (slice 4, stage 4):
- enter_fn_body_transform(decl) ?FnBodyTransformCtx: the early-exit checks
(uninstantiated-generic skip, comptime-attr elision, @[live] detection)
and the full prologue (fn scope from cached_fn_scopes or fallback seeding,
param/receiver seeding, return-type-name resolution, per-fn state set,
smartcast reset). Returns none on early-exit, else a FnBodyTransformCtx
holding the ~17 saved-state locals + live_fn_detected + scope keys +
has_return_type/fn_return_type.
- restore_fn_body_transform_state(mut ctx): restores the per-fn state
saved around the body transform.
- finish_fn_body_transform(decl, mut ctx): cached_fn_scopes writeback,
scope restore, and the @[live] noinline attr; returns the final attribute
list.
transform_fn_decl_parts now reads: enter -> transform_stmts ->
restore_state -> lower_defer_stmts -> finish, identical to before. The
split lets a future streaming variant run enter -> stream body via the
cursor body driver -> restore_state -> finish (skipping defer, which is a
no-op for no-defer fns).
Verified behavior-identical: transformer_flat_diff_test parity + all 19
transformer tests; cleanc self-build of cmd/v2/v2.v (every function routes
through the helpers) -> working compiler that compiles+runs a defer'd fn
correctly; -stats legacy=0.
…te fix
Stage 4 of the cursor-native transform, the real decode-reduction step:
top-level functions whose body has no defers now transform their body
directly from the cursor instead of decoding the whole FnDecl (header +
entire body + every nested expr) to legacy ast at once.
flat_write.v:
- transform_fn_decl_streaming_to_flat: read the signature via
c.fn_decl_signature() (body-less, no whole-decl decode), run the shared
enter_fn_body_transform prologue, stream the body from c.list_at(3) via
transform_cursor_stmts_to_flat_direct (one statement decoded at a time),
then restore_fn_body_transform_state + finish_fn_body_transform and emit.
Peak transform memory per function drops from whole-body to one statement.
- flat_subtree_has_defer / flat_body_has_defer: cheap recursive cursor
scan (no decode); functions with a defer anywhere fall back to the
whole-decl decode path, because lower_defer_stmts needs the complete body.
Over-detection (e.g. a defer in a nested closure) only costs a fallback.
- emit_fn_decl_flat: shared FnDecl encoder used by both the legacy arm and
the streaming path, so they emit identically.
- wired stmt_fn_decl into transform_stmt_list_item_cursor_to_flat.
cursor.v: fix a latent bug the streaming exposed. Cursor.attribute()
decoded comptime_cond (`@[if cond ?]`) with the limited attribute_expr()
(ident/string subset only), so any non-trivial condition silently became
empty_expr. Decode it with the full expr() (mirrors
FlatReader.read_attribute). Without this, streamed @[if X ?] functions
(track_heap/trace_error/debug_strconv) were never elided.
Verified BIT-EQUAL to the decode path: diffing the --no-parallel cleanc C
of the whole compiler with streaming on vs off shows zero divergence in
any compiled function (only the new streaming helpers themselves differ).
transformer_flat_diff_test parity + all 19 transformer tests + ast cursor
test; cleanc self-build of cmd/v2/v2.v (every fn streams or falls back) ->
working compiler that compiles+runs recursion/loops/maps/defer correctly;
module_storage_cache_test (attributes), cleanc_test,
cleanc_target_e2e_test pass; -stats legacy=0. (The arm64 native self-host
is pre-existing-broken on this branch, with unresolved
libc/captured-fn-literal symbols on the baseline too, so the bit-equal C
diff is the stronger gate.)
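
The defer pre-scan that decides between streaming and the whole-decl fallback can be sketched as a plain recursive walk. This is a Python model under assumptions: `NodeSketch`, the kind string `"stmt_defer"`, and `choose_fn_path_sketch` are hypothetical; the real scan runs over flat cursors without decoding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeSketch:
    kind: str
    children: List["NodeSketch"] = field(default_factory=list)


def subtree_has_defer_sketch(node: NodeSketch) -> bool:
    # Conservative scan: any defer anywhere below this node counts,
    # including one inside a nested closure.
    if node.kind == "stmt_defer":
        return True
    return any(subtree_has_defer_sketch(c) for c in node.children)


def choose_fn_path_sketch(body: List[NodeSketch]) -> str:
    # Streaming is only taken when no defer was found; over-detection
    # merely routes the function to the slower fallback.
    if any(subtree_has_defer_sketch(stmt) for stmt in body):
        return "fallback"
    return "streaming"
```

The check is deliberately one-sided: a false positive costs a fallback, while a false negative would hand the defer lowering an incomplete body, so erring toward detection is the safe direction.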
Stage 5 of the flat-AST transform migration: stream the body of plain
`for` loops (cond / classic / bare) cursor-native instead of decoding the
whole loop to legacy AST. Loops nested inside stream recursively, so a
function's entire control-flow body materialises one statement at a time.
The dispatcher gains a `.stmt_for` arm.
for-in loops (init is a ForInStmt) keep the whole-decl decode fallback:
their range/array/string/map/untyped lowering is the monolithic
`transform_for_stmt` for-in branch plus the `try_expand_for_in_map` guard.
The init-kind check mirrors `transform_for_stmt`'s own
`stmt.init is ast.ForInStmt` branch exactly, so nothing for-in changes.
`transform_for_stmt_streaming_to_flat` mirrors `transform_for_stmt`'s
non-for-in tail (open scope, push cond `is`-check smartcasts, transform
body, pop, transform init/cond/post, close scope). The body is transformed
first to keep the legacy transform order (identical synth positions); the
for-node edges are still [init, cond, post, body...], so output is
structurally identical to the decode path.
ast: CursorList gains an `offset` field (default 0, fully backward
compatible) so a node's trailing edges can be viewed as a list;
`Cursor.for_body_list()` views a stmt_for node's body (edges 3+).
Verified: bit-identical generated C across all 214,948 lines of
`cmd/v2/v2.v` compiled by a streaming vs a forced-fallback compiler
(complete cleanc correctness proof); parity test, ast 10/10, transformer
19/19, cleanc_target_e2e, module_storage_cache, legacy=0; nested-loop and
while-style programs run correctly.
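
The `offset` idea, viewing a node's trailing edges as a list, can be modeled as a thin index-shifting wrapper. This Python sketch is illustrative only: `CursorListSketch` and `for_body_list_sketch` are hypothetical names, and a list of strings stands in for the edge array of a stmt_for node laid out as [init, cond, post, body...].

```python
from typing import List


class CursorListSketch:
    """A read-only view over an edge list, skipping the first `offset` edges."""

    def __init__(self, edges: List[str], offset: int = 0) -> None:
        self._edges = edges    # shared with the node, never copied
        self._offset = offset  # default 0 keeps the old whole-list behavior

    def __len__(self) -> int:
        return max(0, len(self._edges) - self._offset)

    def at(self, i: int) -> str:
        if i < 0 or i >= len(self):
            raise IndexError(i)
        return self._edges[self._offset + i]


def for_body_list_sketch(for_edges: List[str]) -> CursorListSketch:
    # Edges 0..2 are init/cond/post; the body statements start at 3.
    return CursorListSketch(for_edges, offset=3)
```

With the default offset of 0 the view behaves exactly like the pre-existing list, which is why the change is backward compatible for every other caller.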
…lice 6)
Slice 6 groundwork for the flat-native parallel transform: a primitive
that concatenates a whole src FlatAst (nodes, edges, strings, file roots)
into a builder, relocating every node id / edge target and re-interning
every string so the merged result decodes identically to src standalone.
Returns the node-id offset applied to src's nodes.
This will replace the per-worker legacy `ast.File` rehydrate-then-flatten:
each worker emits into its own FlatBuilder, the main thread concatenates
the outputs with append_flat.
The hard part is FlatNode.extra (variant-specific). Audited against the
canonical decoder (flat_reader.v) + every emit() call site: extra holds an
interned string id for EXACTLY three kinds, .file (mod), .stmt_directive
(value), .stmt_import (alias), and those are re-interned on merge. Every
other kind packs ints/counts/flags/list-boundaries in extra (assign
lhs_len, map_init keys_len, fn_literal captured.len, string_inter
width/precision, expr_lock lock/rlock, aux_int value, stmt_empty enum) and
is copied verbatim. aux is always a token/sub-kind enum; pos.id is
globally unique across merged inputs (and selector_names is keyed by
pos.id), so both copy verbatim.
Not yet wired into any path (behavior unchanged); it's the standalone,
unit-tested foundation.
Test covers: node/edge relocation + offset accounting, name_id string
remap across a deliberately-shifted destination intern table, all three
string-extra kinds re-interned (verified by decoding the merged
import/directive and reading the merged .file node's extra), two
independent sources staying structurally intact, and the empty-source
no-op.
Verified: ast 11/11, transformer parity, v2 self-build + runs.
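
The merge rule described above (relocate node ids, re-intern strings, treat `extra` as a string id for exactly three kinds) can be sketched in Python. Everything here is a simplified, hypothetical model: `FlatBuilderSketch` stores edges inline per node rather than in a separate edge array, nodes are dictionaries, and the kind names are copied from the commit message.

```python
from typing import Dict, List, Sequence, Union

# Kinds whose `extra` field is an interned string id (per the audit above).
STRING_EXTRA_KINDS = frozenset({"file", "stmt_directive", "stmt_import"})


class FlatBuilderSketch:
    def __init__(self) -> None:
        self.nodes: List[dict] = []
        self.strings: List[str] = []
        self._ids: Dict[str, int] = {}
        self.file_roots: List[int] = []

    def intern(self, s: str) -> int:
        if s not in self._ids:
            self._ids[s] = len(self.strings)
            self.strings.append(s)
        return self._ids[s]

    def emit(self, kind: str, name: str = "",
             extra: Union[int, str] = 0, edges: Sequence[int] = ()) -> int:
        # For the three string-extra kinds, `extra` is passed as a string
        # and stored as its interned id; otherwise it is a plain int.
        if kind in STRING_EXTRA_KINDS:
            extra = self.intern(str(extra))
        self.nodes.append({"kind": kind, "name_id": self.intern(name),
                           "extra": extra, "edges": list(edges)})
        return len(self.nodes) - 1

    def append_flat(self, src: "FlatBuilderSketch") -> int:
        offset = len(self.nodes)  # every src node id shifts by this much
        for n in src.nodes:
            extra = n["extra"]
            if n["kind"] in STRING_EXTRA_KINDS:
                # String ids are table-relative, so re-intern by value.
                extra = self.intern(src.strings[extra])
            self.nodes.append({
                "kind": n["kind"],
                "name_id": self.intern(src.strings[n["name_id"]]),
                "extra": extra,  # ints/counts/flags copy verbatim
                "edges": [e + offset for e in n["edges"]],
            })
        self.file_roots.extend(r + offset for r in src.file_roots)
        return offset
```

Copying an integer `extra` through the string path, or a string id through the integer path, would both decode to garbage, which is why the audit of which kinds hold string ids is the part that has to be exact.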
…ile rehydrate (slice 6)
Wires the append_flat merge primitive into the parallel transform. Flat-codegen
backends (cleanc/c/x64/arm64, i.e. !keep_files) now take a flat-native parallel
path: each worker transforms its contiguous file range cursor-native straight
into its OWN FlatBuilder (transform_file_index_with_extra_to_flat), then the main
thread concatenates the per-worker flats in file order via FlatBuilder.append_flat
and runs the same post_pass tail. No legacy ast.File is materialised on the
default build path — this closes the gap the old comment named ("the flat AST has
no thread-safe merge primitive to let workers append to one builder").
The `.v`/eval backends (keep_files) keep the legacy rehydrate-then-flatten path,
which still needs the transformed []ast.File.
Reuses the proven worker machinery unchanged: new_worker_clone (per-worker
synth_pos_counter offset -worker_idx*100_000 keeps synth ids disjoint),
merge_worker (output-format-independent), contiguous file ranges merged in spawn
order to preserve file order. The per-file transform is the same cursor-native
code the sequential --no-parallel path already uses (transform_flat_to_flat_direct).
Verified: a forced-old-path vs new-path C diff of the whole cmd/v2/v2.v differs
only in synthetic temp-variable names (_tuple_tmp_<id> / _st<id> — the cursor
transform advances the synth counter in a different order); after normalizing
those, new-parallel matches the PROVEN sequential flat-direct's symbol content
(and even fixes a pre-existing old-parallel discrepancy where it dropped a few
Array_f64/builtin symbols the sequential path keeps). Output is deterministic
(run1==run2). Functional: new-path v3 self-compiles to a working v4, both compile
and run programs correctly; cleanc_target_e2e + module_storage_cache + transformer
(30) + ast (incl. append_flat) all pass.
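
The shape of the flat-native parallel path (contiguous file ranges, one private output per worker, merge in spawn order) can be sketched as follows. This is a structural illustration in Python, not a performance model: `contiguous_ranges` and `transform_parallel_sketch` are hypothetical names, a list per worker stands in for a per-worker FlatBuilder, and list concatenation stands in for append_flat. Python threads are used only to show the ordering guarantee.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def contiguous_ranges(n_items: int, n_workers: int) -> List[Tuple[int, int]]:
    # Split [0, n_items) into at most n_workers contiguous half-open ranges.
    n_workers = max(1, min(n_workers, n_items))
    base, rem = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


def transform_parallel_sketch(
    files: List[str], transform_one: Callable[[str], str], n_workers: int
) -> List[str]:
    def run_range(bounds: Tuple[int, int]) -> List[str]:
        lo, hi = bounds
        # Each worker writes only to its own output list.
        return [transform_one(f) for f in files[lo:hi]]

    ranges = contiguous_ranges(len(files), n_workers)
    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        parts = list(pool.map(run_range, ranges))  # results in spawn order
    merged: List[str] = []
    for part in parts:  # spawn order == file order for contiguous ranges
        merged.extend(part)
    return merged
```

Because the ranges are contiguous and the merge follows spawn order, the merged output has the same file order as a sequential run regardless of which worker finishes first.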
collect_decl_type_aliases reads only signature-level type references
(receiver / param / return types for FnDecl, field types for structs), yet
the call site decoded the whole stmt, rehydrating every fn body just to
read its signature. Decode the body-less signature (fn_decl_signature) for
the fn_decl case instead.
This was the only fn-body decode left on the DEFAULT cleanc build path:
pass5_file_cost only runs under -stats, live-reload scan only under -live,
and consts/globals have no bodies. So the default
parse->transform->markused->ssa->cleanc pipeline now decodes no fn body in
cleanc gen.
Verified: bit-identical generated C across all of cmd/v2/v2.v (the same
type aliases are collected, the body is simply never decoded);
cleanc_target_e2e and module_storage_cache (cache-bundle type-alias path)
pass.
Comptime-guarded (`$if dbg_sel ?`) eprintln tracing for diagnosing `missing X.Y` selector errors / `write_string` method resolution: traces selector_expr's missing-symbol path, find_field_or_method, the lookup_method_direct intrinsic vs loop branches, lookup_method_for_type_name (env.methods presence/count), and error_with_pos. Inert in normal builds; enable with `-d dbg_sel`.
This was referenced Jun 8, 2026
Summary
Tests