v2: finish cleanc flat AST self-host migration #27390

Merged
medvednikov merged 22 commits into master from codex/v2-flat-ast-next on Jun 8, 2026
Conversation

@medvednikov

Member

Summary

  • remove cleanc weak generic specialization emission and weak generic bookkeeping
  • rely on normal nested generic discovery for strong concrete specializations
  • fix flat-AST cleanc cases for interface defaults, slice bytestr lowering, and fixed-array map keys

Tests

  • ./v -g -keepc -o ./vnew cmd/v
  • ./vnew -nocache -silent test vlib/v2/gen/cleanc/flag_enum_codegen_test.v
  • ./vnew -nocache -silent test vlib/v2/gen/cleanc/result_option_codegen_test.v
  • ./vnew -nocache -silent test vlib/v2/gen/cleanc/
  • ./vnew -nocache -silent test vlib/v2/transformer/transformer_test.v
  • git diff --check
  • 5-level cleanc self-host chain with -nocache --no-parallel

The flat-codegen migration drops b.files for the cleanc backend (the
post-transform FlatAst is the source of truth), but the cleanc
cached-module build path still enumerated modules/files from b.files.
With b.files empty, has_module('builtin') returned false, so the entire
cached-core split-compilation path was silently skipped: no per-module
.vh headers were written and the slower non-cached fallback ran instead.
This regressed module_storage_cache_test (passed at b6bd49b, failed
after the flat commits).

Make the cleanc cache enumeration flat-aware. When b.files is empty and
the post-transform FlatAst is present (uses_flat_module_enumeration),
read module names, file paths, and imports from the flat cursors:
  - has_module, collect_modules_excluding, user_entry_module_names,
    expand_type_modules_with_imports (raw imports via read_file_imports),
    has_external_cache_module_name_collision
  - import_modules_for_cached_modules (comptime-aware
    active_file_imports_from_flat_with_options)
  - collect_virtual_main_modules (+ flat_file_declares_executable_main)
  - source_files_for_module_name — required for external (non-vlib)
    modules whose location the vlib disk-scan fallback cannot resolve;
    this is what left ext.vh unwritten.

Each helper keeps its existing b.files loop for the legacy (.v/.eval)
path and normalizes module names through flat_file_module_name (dots to
underscores, 'main' default) to stay bit-identical to ast_file_module_name.
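
The normalization rule described above (dots to underscores, 'main' as the default) can be sketched as follows. This is a minimal Python illustration, not the V implementation; `normalize_module_name` and `has_module` here are hypothetical stand-ins for the helpers named in the message.

```python
def normalize_module_name(declared: str) -> str:
    """Hypothetical stand-in for the normalization rule:
    an empty module name defaults to 'main', dots become underscores."""
    name = declared.strip()
    if not name:
        return "main"
    return name.replace(".", "_")


def has_module(declared_names, wanted: str) -> bool:
    """Check membership on normalized names, so both enumeration
    paths (legacy files or flat cursors) would agree on the answer."""
    return any(normalize_module_name(n) == wanted for n in declared_names)
```

Because both paths compare normalized names, a lookup such as `has_module(["v2.gen.cleanc", ""], "main")` gives the same answer regardless of which enumeration produced the list.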

Cache bundles additionally need correct emission: the flat cleanc gen
does not yet scope _option_/_result_ wrapper-typedef emission to a
type-module subset, so it emits a type-module struct/prototype without
the wrapper typedef it references (undefined-type errors in builtin.c).
For restricted bundle generation in flat mode, rehydrate just the
bundle's type-module files from flat and drive the proven legacy gen,
which filters by physical file set. Bounded to cached-module files; the
.o is cached so this only runs on a cache miss. The unrestricted main
translation unit still uses the flat gen. Fully flat-native restricted
bundle codegen remains the follow-up.

Verified: module_storage_cache_test passes; cleanc self-build of
cmd/v2/v2.v produces a working compiler; cache-hit (.vh-parse) rebuild
works; parsed-AST stats report legacy=0; builder/transformer/cleanc/
markused/ssa/ast suites green; the three pre-existing cleanc codegen
failures are unchanged.

Compiling cmd/v2/v2.v via cleanc (flat codegen) spent ~28s in Transform,
~6.5x slower than the legacy path. Two causes, both fixed:

1) lookup_imported_var_type rebuilt its work on every call: it took the
   current module scope, allocated objects.keys(), SORTED them, then
   scanned the whole symbol table for Module (import) objects and resolved
   each import's scope — all to find one var across the module's imports.
   Called per-identifier during type propagation, this was one of the
   hottest functions (post_pass alone was 18s). The import-scope set is
   stable for a transform run, so precompute it once in cache_env_maps
   (build_cached_imported_module_scopes -> cached_imported_module_scopes)
   and just iterate the short import list. The dropped per-call sort does
   not change the result (the "found in exactly one import" / ambiguity
   semantics are order-independent). post_pass: 18s -> 2.9s, and the
   per-identifier lookup is now O(imports) instead of O(objects log objects).
   This speeds up both the flat and legacy propagation paths.

2) The parallel build branch ran the SEQUENTIAL transform_flat_to_flat_direct
   for flat-codegen backends, so the per-file transform never used worker
   threads (the legacy driver fans it out). The flat AST has no thread-safe
   merge primitive for workers to append into one builder, so route the
   parallel branch through transform_files_parallel_to_flat_via_driver and
   drop b.files immediately for flat-codegen backends — codegen stays
   flat-only; the legacy files are live only transiently during the parallel
   transform. The memory-critical arm64 self-host runs --no-parallel, so it
   takes the sequential branch and keeps the allocation-minimal flat-direct
   path untouched.
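
The change in item 1 can be illustrated with a small Python sketch. It is not the V code: the `Scope` shape, the `("module", name)` / `("var", type)` tagging, and the function names are assumptions made for the example. The point is that the precomputed import list gives the same answer as the per-call sorted scan, including the "found in exactly one import" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    # name -> ("var", type_name) or ("module", imported_module_name)
    objects: dict = field(default_factory=dict)


def lookup_per_call(module_scope, scopes_by_module, name):
    """Old shape: sort all keys and scan the whole scope on every call."""
    found = []
    for key in sorted(module_scope.objects):  # per-call allocation + sort
        kind, payload = module_scope.objects[key]
        if kind != "module":
            continue
        scope = scopes_by_module.get(payload)
        if scope and scope.objects.get(name, ("", ""))[0] == "var":
            found.append(scope.objects[name][1])
    return found[0] if len(found) == 1 else None  # ambiguous -> None


def build_import_scopes(module_scope, scopes_by_module):
    """New shape, step 1: resolve the import scopes once per run."""
    return [
        scopes_by_module[payload]
        for kind, payload in module_scope.objects.values()
        if kind == "module" and payload in scopes_by_module
    ]


def lookup_cached(import_scopes, name):
    """New shape, step 2: iterate only the short import list."""
    found = [
        s.objects[name][1]
        for s in import_scopes
        if s.objects.get(name, ("", ""))[0] == "var"
    ]
    return found[0] if len(found) == 1 else None
```

Since the result depends only on how many imports define the name, iteration order does not matter, which is why dropping the sort is safe in this model.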

Result: Transform on cmd/v2/v2.v 28.1s -> ~4.2s, on par with the legacy
parallel path (~4.1s).

Verified: cleanc self-build of cmd/v2/v2.v produces a working compiler;
module_storage_cache_test, all 19 transformer tests (incl. the propagation
parity guard), cleanc_test, and the builder suite pass; parsed-AST stats
stay legacy=0; arm64 sequential and parallel smokes work; the three
pre-existing cleanc codegen failures are unchanged.

Restricted cleanc cache-bundle generation was the last codegen path that
still materialized legacy ast.File: it rehydrated the bundle's type-module
files via b.flat.to_files_range(i, i+1) and drove the LEGACY cleanc gen,
because the flat gen, given the whole b.flat, emitted foreign-module
structs/prototypes (e.g. term__ColorConfig, os__open_file) whose
_option_/_result_/Array_ wrapper typedefs are registered only while
emitting that module's bodies — which a bundle does not do — producing
undefined-type C errors.

Fix it the same way the legacy gen did: by scoping the INPUT to the
bundle's type modules. The legacy gen filtered its gen_files; the flat gen
drives every one of its ~65 emission passes off flat.files, so hand it a
FlatAst whose file list is restricted to the bundle's type_module_names
(flat_scoped_to_modules). The node/edge/string arena is shared, not copied
(V shares the array buffer on struct-literal field assignment), so this is
a cheap file-list filter, not an AST rehydrate. Every pass then sees
exactly the files the legacy gen saw.
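
The "filter the file list, share the arena" idea can be sketched in Python as below. The `FlatAst` and `FlatFile` shapes and the name `scoped_to_modules` are assumptions for illustration only; the real flat_scoped_to_modules operates on the V FlatAst.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlatFile:
    module: str  # normalized module name of this file
    root: int    # index of the file's root node in the arena


@dataclass(frozen=True)
class FlatAst:
    nodes: list   # shared node arena (never copied by the filter)
    files: tuple  # per-file entries that every emission pass iterates


def scoped_to_modules(flat: FlatAst, module_names) -> FlatAst:
    """Return a view whose file list is restricted to module_names.
    Only the small file tuple is rebuilt; the arena object is reused."""
    wanted = set(module_names)
    kept = tuple(f for f in flat.files if f.module in wanted)
    return replace(flat, files=kept)
```

Any pass that loops over `flat.files` then sees only the bundle's files, while node lookups still resolve against the same arena.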

This removes the last to_files_range / legacy-ast.File materialization on
the cleanc codegen path AND moves bundle generation onto the flat gen
(previously legacy). Memory: warm cache (bundles reused) is unchanged
(5.7GB); a cold self-build is +1.6GB (12.5 vs 10.9GB) because bundles now
run through the flat gen — acceptable on the cleanc C-backend path (the
memory-critical arm64 self-host is --no-parallel/flat-direct and never
touches this), and only on a cache miss.

Verified: cleanc self-build of cmd/v2/v2.v produces a working compiler
with zero undefined-type errors in builtin/vlib/v2compiler/imports
bundles; module_storage_cache_test and cleanc_test pass; the cleanc
codegen suite is 6 passed / 3 failed (the 3 are pre-existing, unchanged);
arm64 smoke unaffected.
…lat backends)

For flat-codegen backends (cleanc/c/x64/arm64), the parallel transform
(transform_files_parallel_to_flat_via_driver) ran the full legacy
file-mutating post-pass on the transformed []ast.File and threaded the
result back to the caller, which then immediately dropped it. That work is
redundant: post_pass_to_flat already applies the same edits to the flat,
and the type-propagation tail can run against the flat
(apply_post_pass_tail_from_flat) instead of the legacy files. This is
exactly what the sequential transform_flat_to_flat_direct already does.

Add a `keep_files` parameter:
  - flat backends pass keep_files=false -> skip post_pass_files_with_
    generated_parts, run apply_post_pass_tail_from_flat, return [] (the
    transient `result` is freed in the function instead of being returned
    and dropped).
  - .v/eval still consume the files, so they pass keep_files=true (the
    previous behavior, unchanged).

Removes the last []ast.File consumer in the post-pass tail on the flat
codegen path (advances toward deleting legacy ast) and skips a redundant
whole-program legacy post-pass + propagate_types on every default
(parallel) cleanc/c/x64 build.

Verified: cleanc self-build of cmd/v2/v2.v -> working compiler; eval
backend (keep_files=true path) prints correct output; module_storage_cache_test,
cleanc_test, all 19 transformer tests (incl. propagate_types_from_flat
parity + transformer_flat_diff), cleanc_target_e2e, native, target_os,
type_check_parallel, flat_streaming pass; -stats legacy=0; arm64 smoke ok.
…kers

The parallel transform always streams per-file from the post-parse FlatAst:
transform_files_parallel_no_post_pass_impl was only ever called with
stream_from_flat=true (both wrappers passed true), so the entire
`stream_from_flat == false` machinery — which transformed a legacy
[]ast.File input — was unreachable. Remove it:

  - the `stream==false` branches in the fan-out (Windows + non-Windows,
    single-thread + worker spawn), folding the impl into a flat-only
    transform_files_parallel_no_post_pass (drop the bool param);
  - the top-level-stmt splitting subsystem
    (transform_files_parallel_top_level_stmts, TransformStmtJob/Result/
    ChunkArgs, transform_stmt_chunk_thread, file/stmt_can_split_top_level,
    file_transform_cost, lpt_stmt_buckets);
  - the LPT cost-walkers that recursed over legacy ast.Stmt/Expr/Type
    (transform_stmt_cost, transform_stmts_cost, transform_expr_cost,
    transform_type_cost, top_level_transform_stmt_cost, lpt_buckets);
  - the uncalled wrappers transform_files_parallel,
    transform_files_parallel_from_flat,
    transform_files_parallel_no_post_pass_from_flat;
  - the non-flat branch of transform_chunk_thread and the now-unused
    TransformChunkArgs.files field.

Net -653 lines, and removes a large cluster of legacy-ast.Stmt/Expr-walking
code from the hot transform path (advances toward deleting legacy ast).

Verified: cleanc self-build of cmd/v2/v2.v -> working compiler;
cleanc_target_e2e_test (compiles AND runs real programs end-to-end through
the parallel transform), all 19 transformer tests, module_storage_cache_test,
type_check_parallel_test, cleanc_test pass; -stats legacy=0; arm64 + eval
(keep_files=true) smokes ok. A pre-existing cleanc bug with struct methods
in small standalone programs reproduces identically on HEAD (not a regression).

First step of making the transform read from cursors instead of decoding
each whole top-level statement subtree to legacy ast at the
transform_cursor_stmts_to_flat_direct loop (flat_write.v:1835) — the last
big legacy-AST consumer and the blocker for deleting ast.Stmt/Expr.

Add transform_stmt_list_item_cursor_to_flat(c ast.Cursor, ...): it
dispatches on c.kind() and falls back, in one line, to the proven legacy
list-item path for unconverted kinds (reusing the full guard chain +
pending-stmts drain). The loop seam now passes the cursor (stmts.at(i))
instead of stmts.at(i).stmt().

First converted set is the nine TRUE-passthrough top-level kinds that carry
no try_expand_* guard and that transform_stmt_to_flat emits verbatim
(flat_write.v:3890): stmt_import, stmt_module, stmt_directive, stmt_empty,
stmt_enum_decl, stmt_interface_decl, stmt_type_decl, stmt_asm,
stmt_flow_control. They route through append_transformed_stmt_to_flat
exactly as the fallback would after its guards fail — bit-equal, just
skipping the always-false-for-these-kinds guard checks.
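
The dispatch-with-fallback seam can be sketched as follows. This is a hedged Python illustration: the kind names are taken from the list above, but `Cursor`, `transform_list_item`, and the two callbacks are hypothetical and only model the control flow.

```python
from dataclasses import dataclass

# The nine passthrough kinds listed in the message.
PASSTHROUGH_KINDS = frozenset({
    "stmt_import", "stmt_module", "stmt_directive", "stmt_empty",
    "stmt_enum_decl", "stmt_interface_decl", "stmt_type_decl",
    "stmt_asm", "stmt_flow_control",
})


@dataclass(frozen=True)
class Cursor:
    kind: str
    payload: str = ""


def transform_list_item(cursor, emit_verbatim, legacy_fallback):
    """Dispatch on the cursor kind. Converted kinds skip the guard chain
    and emit directly; every other kind takes the legacy list-item path."""
    if cursor.kind in PASSTHROUGH_KINDS:
        return emit_verbatim(cursor)
    return legacy_fallback(cursor)
```

Later stages extend this by adding arms ahead of the fallback, so an unconverted kind never changes behavior.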

This is the dispatcher foundation, not yet a decode win: the converted
kinds still decode via c.stmt() (dropping that needs flat-to-flat subtree
copy). The point is the seam that later stages — const/global, then the
FnDecl body (the real per-function whole-body decode that dominates) —
extend arm by arm, each gated by the existing parity harness.

Verified: transformer_flat_diff_test parity (cursor output bit-equal to the
decode path) + all 19 transformer tests; cleanc self-build -> working
compiler; cleanc_target_e2e_test (compiles+runs real programs); -stats
legacy=0; arm64 + cleanc smokes exercising import/enum/type-decl top-level
kinds both correct.
…ge 2)

Second step of the cursor-native transform (slice 4). Convert the
stmt_const_decl and stmt_global_decl arms of
transform_stmt_list_item_cursor_to_flat to read straight from the cursor
instead of decoding the whole statement to legacy ast first.

- transform_const_decl_cursor_to_flat / transform_global_decl_cursor_to_flat
  mirror the ast.ConstDecl / ast.GlobalDecl arms of transform_stmt_to_flat
  exactly (same emit order, same field-init/field-decl encoding) but read
  is_public, the field list, and per-field name/flags/typ/value/attrs from
  the cursor. The ConstDecl/GlobalDecl wrapper + FieldInit/FieldDecl
  structure are no longer rehydrated as legacy structs; only the field
  value/typ exprs still decode (via the existing transform_expr_to_flat),
  which the later expr-arm stages eliminate.
- Factor append_transformed_stmt_id_to_flat out of
  append_transformed_stmt_to_flat so the cursor arms drain t.pending_stmts
  ahead of the emitted stmt identically (bit-equal ordering).

Verified: transformer_flat_diff_test parity (cursor output bit-equal to the
decode path) + all 19 transformer tests; cleanc self-build of cmd/v2/v2.v
(hundreds of top-level const decls) -> working compiler; a const+__global
smoke compiles+runs correctly on cleanc, arm64, and the self-built
compiler (hi 49); cleanc_target_e2e_test and module_storage_cache_test
pass; -stats legacy=0.

Behavior-preserving refactor that splits the 264-line transform_fn_decl_parts
into three reusable pieces, preparing the cursor-native FnDecl body streaming
(slice 4, stage 4):

- enter_fn_body_transform(decl) ?FnBodyTransformCtx — the early-exit checks
  (uninstantiated-generic skip, comptime-attr elision, @[live] detection) and
  the full prologue (fn scope from cached_fn_scopes or fallback seeding,
  param/receiver seeding, return-type-name resolution, per-fn state set,
  smartcast reset). Returns none on early-exit, else a FnBodyTransformCtx
  holding the ~17 saved-state locals + live_fn_detected + scope keys +
  has_return_type/fn_return_type.
- restore_fn_body_transform_state(mut ctx) — restores the per-fn state saved
  around the body transform.
- finish_fn_body_transform(decl, mut ctx) — cached_fn_scopes writeback, scope
  restore, and the @[live] noinline attr; returns the final attribute list.

transform_fn_decl_parts now reads: enter -> transform_stmts -> restore_state
-> lower_defer_stmts -> finish, identical to before. The split lets a future
streaming variant run enter -> stream body via the cursor body driver ->
restore_state -> finish (skipping defer, which is a no-op for no-defer fns).

Verified behavior-identical: transformer_flat_diff_test parity + all 19
transformer tests; cleanc self-build of cmd/v2/v2.v (every function routes
through the helpers) -> working compiler that compiles+runs a defer'd fn
correctly; -stats legacy=0.
…te fix

Stage 4 of the cursor-native transform, the real decode-reduction step:
top-level functions whose body has no defers now transform their body
directly from the cursor instead of decoding the whole FnDecl (header +
entire body + every nested expr) to legacy ast at once.

flat_write.v:
- transform_fn_decl_streaming_to_flat: read the signature via
  c.fn_decl_signature() (body-less, no whole-decl decode), run the shared
  enter_fn_body_transform prologue, stream the body from c.list_at(3) via
  transform_cursor_stmts_to_flat_direct (one statement decoded at a time),
  then restore_fn_body_transform_state + finish_fn_body_transform and emit.
  Peak transform memory per function drops from whole-body to one statement.
- flat_subtree_has_defer / flat_body_has_defer: cheap recursive cursor scan
  (no decode); functions with a defer anywhere fall back to the whole-decl
  decode path, because lower_defer_stmts needs the complete body. Over-
  detection (e.g. a defer in a nested closure) only costs a fallback.
- emit_fn_decl_flat: shared FnDecl encoder used by both the legacy arm and
  the streaming path, so they emit identically.
- wired stmt_fn_decl into transform_stmt_list_item_cursor_to_flat.
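
The defer scan that gates the streaming path can be sketched like this. It is a minimal Python model under stated assumptions: nodes are `(kind, child_ids)` pairs in one list, and the kind name `"stmt_defer"` is hypothetical.

```python
def subtree_has_defer(nodes, node_id):
    """Recursive scan over a flat node list; no statement is decoded.
    nodes[node_id] is assumed to be a (kind, child_ids) pair."""
    kind, children = nodes[node_id]
    if kind == "stmt_defer":  # hypothetical kind name
        return True
    return any(subtree_has_defer(nodes, child) for child in children)


def body_has_defer(nodes, body_ids):
    """True if any top-level body statement contains a defer anywhere."""
    return any(subtree_has_defer(nodes, stmt_id) for stmt_id in body_ids)


def choose_fn_path(nodes, body_ids):
    """Stream only defer-free bodies; otherwise use the whole-decl decode.
    A defer inside a nested closure also triggers the fallback, which is
    conservative but still correct."""
    return "fallback" if body_has_defer(nodes, body_ids) else "streaming"
```

Over-detection only costs performance in this model, because the fallback path handles every function the streaming path would.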

cursor.v: fix a latent bug the streaming exposed — Cursor.attribute()
decoded comptime_cond (`@[if cond ?]`) with the limited attribute_expr()
(ident/string subset only), so any non-trivial condition silently became
empty_expr. Decode it with the full expr() (mirrors FlatReader.read_attribute).
Without this, streamed @[if X ?] functions (track_heap/trace_error/
debug_strconv) were never elided.

Verified BIT-EQUAL to the decode path: diffing the --no-parallel cleanc C
of the whole compiler with streaming on vs off shows zero divergence in any
compiled function (only the new streaming helpers themselves differ).
transformer_flat_diff_test parity + all 19 transformer tests + ast cursor
test; cleanc self-build of cmd/v2/v2.v (every fn streams or falls back) ->
working compiler that compiles+runs recursion/loops/maps/defer correctly;
module_storage_cache_test (attributes), cleanc_test, cleanc_target_e2e_test
pass; -stats legacy=0. (The arm64 native self-host is already broken on
this branch, with unresolved libc/captured-fn-literal symbols on the
baseline too, so the bit-equal C diff is the stronger gate.)

Stage 5 of the flat-AST transform migration: stream the body of plain
`for` loops (cond / classic / bare) cursor-native instead of decoding the
whole loop to legacy AST. Loops nested inside stream recursively, so a
function's entire control-flow body materialises one statement at a time.

The dispatcher gains a `.stmt_for` arm. for-in loops (init is a
ForInStmt) keep the whole-decl decode fallback — their range/array/
string/map/untyped lowering is the monolithic `transform_for_stmt`
for-in branch plus the `try_expand_for_in_map` guard. The init-kind
check mirrors `transform_for_stmt`'s own `stmt.init is ast.ForInStmt`
branch exactly, so nothing for-in changes.

`transform_for_stmt_streaming_to_flat` mirrors `transform_for_stmt`'s
non-for-in tail (open scope, push cond `is`-check smartcasts, transform
body, pop, transform init/cond/post, close scope). The body is
transformed first to keep the legacy transform order (identical synth
positions); the for-node edges are still [init, cond, post, body...], so
output is structurally identical to the decode path.

ast: CursorList gains an `offset` field (default 0, fully backward
compatible) so a node's trailing edges can be viewed as a list;
`Cursor.for_body_list()` views a stmt_for node's body (edges 3+).
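
A list view with a default-zero offset can be sketched as below. This Python model is an assumption-laden illustration: `EdgeList` stands in for CursorList, and the edge layout `[init, cond, post, body...]` is the one described above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EdgeList:
    edges: Sequence[int]
    offset: int = 0  # default 0 keeps existing callers unchanged

    def __len__(self):
        return max(0, len(self.edges) - self.offset)

    def at(self, i):
        """Return the i-th edge of the view, skipping the first `offset`."""
        if i < 0 or i >= len(self):
            raise IndexError(i)
        return self.edges[self.offset + i]


def for_body_list(for_node_edges):
    """View a for-node's body: edges 0..2 are init, cond, post,
    so the body statements start at edge 3."""
    return EdgeList(for_node_edges, offset=3)
```

The view copies nothing, so iterating a loop body costs no allocation beyond the small wrapper.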

Verified: bit-identical generated C across all 214,948 lines of
`cmd/v2/v2.v` compiled by a streaming vs a forced-fallback compiler
(complete cleanc correctness proof); parity test, ast 10/10, transformer
19/19, cleanc_target_e2e, module_storage_cache, legacy=0; nested-loop and
while-style programs run correctly.
…lice 6)

Slice 6 groundwork for the flat-native parallel transform: a primitive that
concatenates a whole src FlatAst (nodes, edges, strings, file roots) into a
builder, relocating every node id / edge target and re-interning every string
so the merged result decodes identically to src standalone. Returns the
node-id offset applied to src's nodes. This will replace the per-worker legacy
`ast.File` rehydrate-then-flatten: each worker emits into its own FlatBuilder,
the main thread concatenates the outputs with append_flat.

The hard part is FlatNode.extra (variant-specific). Audited against the
canonical decoder (flat_reader.v) + every emit() call site: extra holds an
interned string id for EXACTLY three kinds — .file (mod), .stmt_directive
(value), .stmt_import (alias) — and those are re-interned on merge. Every other
kind packs ints/counts/flags/list-boundaries in extra (assign lhs_len,
map_init keys_len, fn_literal captured.len, string_inter width/precision,
expr_lock lock/rlock, aux_int value, stmt_empty enum) and is copied verbatim.
aux is always a token/sub-kind enum; pos.id is globally unique across merged
inputs (and selector_names is keyed by pos.id), so both copy verbatim.
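
The relocation and re-interning rules can be sketched in Python as follows. This is a simplified model, not the V primitive: edges are stored per node instead of in a separate arena, `name_id == -1` is an assumed "no name" marker, and the `Node` / `Flat` shapes are hypothetical. The three string-extra kinds are the ones listed above.

```python
from dataclasses import dataclass, field

# Kinds whose `extra` is an interned string id, per the audit above.
STRING_EXTRA_KINDS = frozenset({"file", "stmt_directive", "stmt_import"})


@dataclass
class Node:
    kind: str
    name_id: int  # interned string id, or -1 for "no name" (assumption)
    extra: int    # string id for STRING_EXTRA_KINDS, otherwise packed ints
    edges: list   # child node ids (simplified: stored on the node)


@dataclass
class Flat:
    strings: list = field(default_factory=list)
    nodes: list = field(default_factory=list)
    file_roots: list = field(default_factory=list)

    def intern(self, s):
        """Return the id of s, appending it if it is not interned yet."""
        if s in self.strings:
            return self.strings.index(s)
        self.strings.append(s)
        return len(self.strings) - 1


def append_flat(dst, src):
    """Concatenate src into dst and return the node-id offset applied."""
    offset = len(dst.nodes)
    remap = [dst.intern(s) for s in src.strings]  # src string id -> dst id
    for n in src.nodes:
        name_id = remap[n.name_id] if n.name_id >= 0 else -1
        extra = remap[n.extra] if n.kind in STRING_EXTRA_KINDS else n.extra
        dst.nodes.append(Node(n.kind, name_id, extra, [e + offset for e in n.edges]))
    dst.file_roots.extend(r + offset for r in src.file_roots)
    return offset
```

Remapping `extra` for any other kind would corrupt packed counts such as an assign's lhs_len, which is why the set of string-extra kinds has to be exact.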

Not yet wired into any path (behavior unchanged) — it's the standalone,
unit-tested foundation. Test covers: node/edge relocation + offset accounting,
name_id string remap across a deliberately-shifted destination intern table,
all three string-extra kinds re-interned (verified by decoding the merged
import/directive and reading the merged .file node's extra), two independent
sources staying structurally intact, and the empty-source no-op.

Verified: ast 11/11, transformer parity, v2 self-build + runs.
…ile rehydrate (slice 6)

Wires the append_flat merge primitive into the parallel transform. Flat-codegen
backends (cleanc/c/x64/arm64, i.e. !keep_files) now take a flat-native parallel
path: each worker transforms its contiguous file range cursor-native straight
into its OWN FlatBuilder (transform_file_index_with_extra_to_flat), then the main
thread concatenates the per-worker flats in file order via FlatBuilder.append_flat
and runs the same post_pass tail. No legacy ast.File is materialised on the
default build path — this closes the gap the old comment named ("the flat AST has
no thread-safe merge primitive to let workers append to one builder").

The `.v`/eval backends (keep_files) keep the legacy rehydrate-then-flatten path,
which still needs the transformed []ast.File.

Reuses the proven worker machinery unchanged: new_worker_clone (per-worker
synth_pos_counter offset -worker_idx*100_000 keeps synth ids disjoint),
merge_worker (output-format-independent), contiguous file ranges merged in spawn
order to preserve file order. The per-file transform is the same cursor-native
code the sequential --no-parallel path already uses (transform_flat_to_flat_direct).
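
The "contiguous ranges, private outputs, merge in spawn order" scheme can be sketched in Python as below. It is an illustration only: `contiguous_ranges` and `transform_parallel` are hypothetical names, the per-worker output is a plain list rather than a FlatBuilder, and a thread pool stands in for the real worker spawn.

```python
from concurrent.futures import ThreadPoolExecutor


def contiguous_ranges(n_files, n_workers):
    """Split [0, n_files) into contiguous half-open ranges, one per worker."""
    n_workers = max(1, min(n_workers, n_files or 1))
    base, rem = divmod(n_files, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


def transform_parallel(files, n_workers, transform_file):
    """Each worker fills its own output; the main thread concatenates the
    outputs in spawn order, which preserves the original file order."""
    def work(rng):
        lo, hi = rng
        return [transform_file(files[i]) for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=max(1, n_workers)) as pool:
        # pool.map yields results in submission order, not completion order.
        per_worker = list(pool.map(work, contiguous_ranges(len(files), n_workers)))

    merged = []
    for out in per_worker:
        merged.extend(out)
    return merged
```

No worker writes to shared output in this model, so the only synchronization point is the final ordered merge.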

Verified: a forced-old-path vs new-path C diff of the whole cmd/v2/v2.v differs
only in synthetic temp-variable names (_tuple_tmp_<id> / _st<id> — the cursor
transform advances the synth counter in a different order); after normalizing
those, new-parallel matches the PROVEN sequential flat-direct's symbol content
(and even fixes a pre-existing old-parallel discrepancy where it dropped a few
Array_f64/builtin symbols the sequential path keeps). Output is deterministic
(run1==run2). Functional: new-path v3 self-compiles to a working v4, both compile
and run programs correctly; cleanc_target_e2e + module_storage_cache + transformer
(30) + ast (incl. append_flat) all pass.

collect_decl_type_aliases reads only signature-level type references
(receiver / param / return types for FnDecl, field types for structs),
yet the call site decoded the whole stmt — rehydrating every fn body just
to read its signature. Decode the body-less signature (fn_decl_signature)
for the fn_decl case instead.

This was the only fn-body decode left on the DEFAULT cleanc build path:
pass5_file_cost only runs under -stats, live-reload scan only under -live,
and consts/globals have no bodies. So the default parse->transform->
markused->ssa->cleanc pipeline now decodes no fn body in cleanc gen.

Verified: bit-identical generated C across all of cmd/v2/v2.v (the same
type aliases are collected, the body is simply never decoded); cleanc_target_e2e
and module_storage_cache (cache-bundle type-alias path) pass.

Comptime-guarded (`$if dbg_sel ?`) eprintln tracing for diagnosing
`missing X.Y` selector errors / `write_string` method resolution: traces
selector_expr's missing-symbol path, find_field_or_method, the
lookup_method_direct intrinsic vs loop branches, lookup_method_for_type_name
(env.methods presence/count), and error_with_pos. Inert in normal builds;
enable with `-d dbg_sel`.

@medvednikov medvednikov merged commit d358576 into master Jun 8, 2026
65 of 83 checks passed