v2: self-host & codegen performance + fix v -profile #27387

Merged
medvednikov merged 13 commits into master from v2-codegen-perf on Jun 8, 2026

@medvednikov

Performance work on the v2 self-host pipeline, plus a fix to mainline v -profile.

v -profile fix

The profiler emitted three globals per function: vpc_<fn> (time), vpc_<fn>_only_current, and vpc_<fn>_calls. When two functions' mangled C names differed only by a _calls or _only_current suffix, one function's u64 call counter ended up with the same name as the other's double time accumulator. The C compiler then reported redefinition of 'vpc_...' with a different type and the build failed. Counter names are now prefixed with a unique per-function index, so v -profile builds and runs on large codebases (e.g. cmd/v2/v2.v).

v2 codegen: eliminate O(n²) resolution scans

Guided by the now-working profiler's call counts, four per-call linear scans over the signature tables were replaced with one-time indexes. Generated C is byte-identical; C Gen ~26% faster on a full self-compile:

  • resolve_specialized_receiver_method — was 42k calls / ~1.2s self
  • imported_symbol_c_type — was 897k calls / ~1.6s self (scanned all ~280 files per call)
  • unique_v_method_return_type — per-call v_fn_return_types scan
  • qualify_ierror_concrete_base — per-call keys().sort()

v2 self-host caching

Durable object cache + a pre-parse fast-relink gate bring a cold cleanc self-host with unchanged sources from ~2.4s down to ~60ms, plus other rebuild/codegen speedups.
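The fast-relink gate can be pictured with a minimal Python sketch. This is an illustration under assumptions, not the code in this PR: the stamp layout (a path to mtime map), the helper names, and the use of mtimes as the freshness signal are all hypothetical. The point it shows is the conservative shape: any mismatch falls through to the normal build.

```python
import os


def file_mtime_ns(path):
    """Return the file's mtime in nanoseconds, or None if it cannot be read."""
    try:
        return os.stat(path).st_mtime_ns
    except OSError:
        return None


def sources_are_fresh(stamp, sources, mtime_of=file_mtime_ns):
    """True only if the source set and every recorded mtime match the stamp.

    `stamp` is a hypothetical {path: mtime_ns} map written by the last build.
    """
    if set(stamp) != set(sources):
        return False  # a source was added or removed since the stamp
    for path in sources:
        current = mtime_of(path)
        if current is None or current != stamp[path]:
            return False  # missing or modified source
    return True


def build(stamp, sources, full_build, fast_relink, mtime_of=file_mtime_ns):
    """Gate at the very top: relink only when nothing looks stale."""
    if stamp is not None and sources_are_fresh(stamp, sources, mtime_of):
        return fast_relink()
    return full_build()  # any doubt falls through to the normal pipeline
```

Passing `mtime_of` as a parameter is only there so the gate can be exercised without touching the filesystem.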

Validation

Generated C verified byte-identical to the prior compiler (bundled v2compiler.c + main.c md5-identical); transformer test suite passes. The pre-existing arm64 v3 self-host SIGSEGV is unaffected — the pre-change v3 crashes identically.

Three cold self-host speedups for `v2 -backend cleanc cmd/v2/v2.v`:

- Durable object cache + pre-parse fast-relink gate. The bundle objects +
  main.o are mirrored to ~/.cache/v2cleanc_persist so a cold build (the /tmp
  obj cache wiped) restores them. A new gate hoisted to the top of build()
  relinks directly when all sources are fresh, skipping the ~825ms front-end
  (parse/typecheck/transform) that otherwise always runs. Cold self-host with
  unchanged sources drops from ~2.4s to ~60ms; warm from ~0.9s to ~50ms.
  Conservative: any staleness fails a freshness check and falls through to a
  normal build, so it can never emit a stale binary (verified byte-equivalent
  to a fresh full build).

- Sub-file Pass 5 splitting. Parallel cleanc Pass 5 split work per file, so the
  single biggest file (ssa/builder.v, ~590ms) pinned the whole phase. Large
  files are now split into contiguous FnDecl-index slices across workers;
  v2compiler.o global symbol set is byte-identical.

- Skip dead v2compiler .vh header generation. Those headers are only read by
  the disabled v2compiler header parse-reuse path, so generating them on every
  cold self-build was ~230ms of pure overhead (gated behind V2_V2COMPILER_VH).
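For the sub-file Pass 5 splitting above, here is a rough Python sketch of cutting one file's FnDecl indexes into contiguous slices. The function name and the even-split policy are assumptions for illustration; the PR does not say how slice boundaries are chosen.

```python
def split_into_slices(fn_count, parts):
    """Split indexes [0, fn_count) into up to `parts` contiguous half-open ranges.

    Sizes differ by at most one, and concatenating the ranges in order
    reproduces 0..fn_count-1, so declaration order is preserved.
    """
    if fn_count <= 0:
        return []
    parts = max(1, min(parts, fn_count))  # never create empty slices
    base, extra = divmod(fn_count, parts)
    slices = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices
```

Contiguous ranges (rather than round-robin assignment) keep each worker's output in source order, which is one plausible reason the emitted symbol set can stay unchanged.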

Validated: builder/markused/transformer suites pass; generated C is unchanged
for the .vh and persist changes.

The profiler emitted three globals per function: `vpc_<cfn>` (double, time),
`vpc_<cfn>_only_current` (double), and `vpc_<cfn>_calls` (u64, calls). When two
functions' mangled C names differ only by a `_calls`/`_only_current` suffix
(e.g. `..._lower` and `..._lower_calls`), one's call counter aliases the
other's time accumulator, producing `redefinition of 'vpc_...' with a
different type: double vs u64` and failing the C compile.

Prefix the counter base name with a unique per-function index so no derived
name can ever alias another function's base. `v -profile` now builds and runs
on large codebases (e.g. cmd/v2/v2.v).
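The collision and the fix can be reproduced in a small Python model. The function names `demo__lower` and `demo__lower_calls` are made up, and the exact spelling of the new prefix (`vpc_<index>_<cfn>`) is an assumption; the PR only states that a unique per-function index is prefixed to the base name.

```python
def counter_names_old(cfn):
    """Old scheme: names derived from the mangled C name alone."""
    base = f"vpc_{cfn}"
    return {
        base: "double",                     # accumulated time
        f"{base}_only_current": "double",
        f"{base}_calls": "u64",             # call counter
    }


def counter_names_new(index, cfn):
    """New scheme (assumed spelling): a unique index goes in front of the name."""
    base = f"vpc_{index}_{cfn}"
    return {
        base: "double",
        f"{base}_only_current": "double",
        f"{base}_calls": "u64",
    }


def find_type_conflicts(per_fn_names):
    """Return global names that are declared twice with different C types."""
    seen = {}
    conflicts = []
    for names in per_fn_names:
        for name, ctype in names.items():
            if name in seen and seen[name] != ctype:
                conflicts.append(name)
            seen.setdefault(name, ctype)
    return conflicts


FNS = ["demo__lower", "demo__lower_calls"]  # hypothetical mangled names
```

With the old scheme, `find_type_conflicts([counter_names_old(f) for f in FNS])` reports `vpc_demo__lower_calls`; with the indexed scheme the two functions no longer share a prefix, so no derived name can coincide.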

resolve_specialized_receiver_method scanned the entire fn_return_types +
fn_param_is_ptr tables for each (receiver_type, method) key — O(unique_keys ×
total_fns). The `v -profile` call counts showed it at 42k calls / ~1.2s self
on a full self-compile. Replace the scan with a one-time (base|method) index;
the per-key result is unchanged (byte-identical generated C). C Gen ~8% faster.
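As a sketch of the scan-to-index change, the Python below compares a per-lookup scan with a one-time index keyed by (base, method). The mangling format `<base>_T_<args>__<method>` is invented for the example and is not taken from the v2 codegen.

```python
def split_key(name):
    """Split a name of the hypothetical form '<base>_T_<args>__<method>'.

    Returns (base, method), or None for names that are not specializations.
    """
    recv, sep, method = name.rpartition("__")
    if not sep:
        return None
    base, sep, _args = recv.partition("_T_")
    if not sep:
        return None
    return base, method


def scan_lookup(fn_names, base, method):
    """Old shape: walk every function name for each lookup."""
    for name in fn_names:
        if split_key(name) == (base, method):
            return name
    return None


def build_index(fn_names):
    """New shape: one pass, then O(1) lookups by (base, method)."""
    index = {}
    for name in fn_names:
        key = split_key(name)
        if key is not None:
            index.setdefault(key, name)  # keep the first match, like the scan
    return index
```

Using `setdefault` keeps the first name seen per key, which is what makes the indexed answer agree with the scan for every key.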

imported_symbol_c_type was the hottest codegen function (~897k calls, ~1.6s
self per `v -profile`): each call scanned all ~280 g.files (string-comparing
file.name) to find the current file, then its imports' symbols. Build a flat
"file\x01symbol -> module" index once in collect_source_module_names (g.files
is stable) and make it an O(1) lookup. Generated C is byte-identical; C Gen
~8% faster (on top of the specialized-method index).
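A flat "file\x01symbol -> module" index of this kind might look like the following Python sketch. The shape of the file records (a `name` plus an `imports` map from module to imported symbols) is an assumption made for the example.

```python
SEP = "\x01"  # separator that should not appear in file or symbol names


def build_import_index(files):
    """Build the flat index once.

    `files` is a hypothetical list of dicts:
    {"name": str, "imports": {module_name: [symbol, ...]}}.
    """
    index = {}
    for f in files:
        for module, symbols in f["imports"].items():
            for sym in symbols:
                index[f["name"] + SEP + sym] = module
    return index


def imported_symbol_module(index, file_name, symbol):
    """O(1) lookup replacing the per-call scan over every file."""
    return index.get(file_name + SEP + symbol)
```

A single string key avoids nested maps; the control-character separator keeps `("a", "b_c")` and `("a_b", "c")` from colliding.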

unique_v_method_return_type scanned all of v_fn_return_types per call (715
calls / ~680ms self in `v -profile`). Index it by method short-name once after
collect_fn_signatures_to_fixed_point (v_fn_return_types is final there).
Byte-identical C; ~3% faster C Gen.
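One way to picture the short-name index, in Python. Both the `__`-separated naming and the meaning of "unique" used here (exactly one distinct return type across all methods with that short name) are assumptions; the commit message does not spell out the rule.

```python
from collections import defaultdict


def short_name(fn_name):
    """Last '__'-separated component, e.g. 'a__Foo__str' -> 'str' (assumed format)."""
    return fn_name.rpartition("__")[2]


def build_short_name_index(v_fn_return_types):
    """Group return types by method short name, once, after the table is final."""
    index = defaultdict(set)
    for fn_name, ret in v_fn_return_types.items():
        index[short_name(fn_name)].add(ret)
    return index


def unique_return_type(index, method):
    """Return the return type if all methods named `method` agree, else None."""
    types = index.get(method)  # .get does not create an entry on a miss
    if types is not None and len(types) == 1:
        return next(iter(types))
    return None
```

Building the index only after the signature table stops growing matters: an index built earlier would miss later entries.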

qualify_ierror_concrete_base did `fn_return_types.keys()` + `.sort()` on every
call (543 calls / ~540ms self in `v -profile`), allocating and sorting all fn
names each time. Index the `*__base__msg` functions by base once, keeping the
smallest name per base (the same function the original sorted-first match
returned); keep the emitted_types/pending scans as fallbacks for those growing
maps. Byte-identical C.
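The "smallest per base equals sorted-first match" equivalence can be checked with a short Python sketch. The name pattern `<module>__<base>__msg` is a guess at what `*__base__msg` means, and the example only holds for base names without `__` in them.

```python
SUFFIX = "__msg"


def first_sorted_match(fn_names, base):
    """Old shape: sort all names on every call and return the first match."""
    tail = "__" + base + SUFFIX
    for name in sorted(fn_names):
        if name.endswith(tail):
            return name
    return None


def build_msg_index(fn_names):
    """New shape: one pass that keeps the smallest matching name per base."""
    index = {}
    for name in fn_names:
        if not name.endswith(SUFFIX):
            continue
        stem = name[: -len(SUFFIX)]
        _module, sep, base = stem.rpartition("__")
        if not sep:
            continue  # no '<module>__' part, so the old scan would not match it
        if base not in index or name < index[base]:
            index[base] = name
    return index
```

The first element of a sorted list that satisfies a predicate is the minimum of the elements satisfying it, so taking the per-base minimum in one pass returns the same function without the repeated sort.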

…r/flag changes

The pre-parse self-host fast relink keyed only on the input file
(is_cmd_v2_self_build), so a warm-cache `-o foo.c cmd/v2/v2.v` linked an
executable into foo.c instead of writing C source. Mirror gen_cleanc()'s
generation-only decision (.c output / cannot-compile-locally / shared lib) and
fall through to normal generation in those cases.
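The three generation-only cases listed above reduce to a small predicate. This Python version is a sketch: the function name and its parameters are hypothetical and do not mirror gen_cleanc()'s real inputs.

```python
def is_generation_only(output_path, is_shared_lib, can_compile_locally):
    """True when the build must emit C source rather than link an executable.

    Mirrors the three cases named in the commit message: a `.c` output,
    a target that cannot be compiled locally, or a shared library.
    """
    return (
        output_path.endswith(".c")
        or not can_compile_locally
        or is_shared_lib
    )


def may_fast_relink(output_path, is_shared_lib, can_compile_locally, sources_fresh):
    """The fast path is allowed only for fresh, linkable executable targets."""
    if is_generation_only(output_path, is_shared_lib, can_compile_locally):
        return False
    return sources_fresh
```

Checking the output kind before the freshness result is what stops a warm cache from linking an executable into `foo.c`.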

It also trusted the cc/cc_flags/cc_link_flags recorded in main.stamp without
checking them against the current invocation, so changing the compiler or env
CFLAGS while sources were unchanged relinked with stale flags. Record a
pre-parse flag fingerprint (compiler choice, prod/shared mode, V2CFLAGS) in the
stamp and re-check it on the fast path; a mismatch falls through to a normal
build. Source-derived #flag directives are excluded — they change only when a
source changes, which the existing freshness checks already catch.
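A pre-parse flag fingerprint could be computed roughly as below. The hash choice (SHA-256), the field encoding, and the stamp key `flag_fingerprint` are assumptions; only the list of inputs (compiler choice, prod/shared mode, V2CFLAGS) comes from the commit message.

```python
import hashlib


def sketch_flag_fingerprint(cc, is_prod, is_shared, v2cflags):
    """Hash the inputs that are known before parsing and affect compile/link."""
    parts = [
        cc,
        "prod" if is_prod else "dev",
        "shared" if is_shared else "exe",
        v2cflags,
    ]
    # NUL-separated so that adjacent fields cannot run into each other.
    payload = "\x00".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def stamp_flags_match(stamp, current_fingerprint):
    """Fast path only if the stamp recorded the same fingerprint."""
    return stamp.get("flag_fingerprint") == current_fingerprint
```

A stamp written before the fingerprint existed has no such key, so `stamp_flags_match` returns False and the build falls through to the normal path.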
…mbiguity

Pass 5 splits large files into per-slice work items across workers, but the
cross-worker dedup guard (blocked_fn_keys) was keyed at file level via
fn_owner_file. When a file was split across workers, every slice's worker
considered the whole file owned, so none of them blocked the file's lazily or
transitively emitted fns, a latent duplicate/reorder hole. Now only the worker
that emits a file's globals takes file-level ownership; a split file's other
slices block the file's fns and emit their explicit slice through an
owner-scoped bypass (explicit_slice_emit_allows) in gen_file_range. Generated
self-host C is byte-identical.

Also fix resolve_specialized_receiver_method: the one-time lazy
specialized_index snapshotted the signature tables at first lookup and went
stale, so a second specialization registered afterwards (which makes a
(base, method) pair ambiguous) was missed. Consult the always-fresh incremental
specialized_receiver_methods / _ambiguous index that remember_specialized_fn_base
maintains, and drop the redundant snapshot index. Restores pass5_worker_test;
self-host C unchanged.
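The stale-snapshot problem is easy to reproduce in a toy model. The class below is a Python sketch of an incrementally maintained index with an ambiguity set; its method names and the choice to return `None` for ambiguous keys are assumptions, not the behaviour of remember_specialized_fn_base.

```python
class SpecializedMethods:
    """Toy incremental index: (base, method) -> the single known specialization."""

    def __init__(self):
        self.unique = {}        # keys with exactly one specialization so far
        self.ambiguous = set()  # keys with two or more specializations

    def remember(self, base, method, fn_name):
        """Update the index at registration time, so lookups are never stale."""
        key = (base, method)
        if key in self.ambiguous:
            return
        existing = self.unique.get(key)
        if existing is None:
            self.unique[key] = fn_name
        elif existing != fn_name:
            del self.unique[key]      # a second specialization appeared
            self.ambiguous.add(key)

    def resolve(self, base, method):
        """Return the unique specialization, or None if unknown or ambiguous."""
        return self.unique.get((base, method))
```

A snapshot such as `dict(m.unique)` taken after the first `remember` call keeps answering with the first specialization even after a second one makes the pair ambiguous, which is the failure mode described above.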
…ice ownership

Add regression coverage for the fast-relink and pass5-split review fixes:
- fast_relink_test.v: a `.c` output and a shared lib are generation-only and must
  never be relinked, and preparse_flag_fingerprint changes when the compiler /
  prod / shared mode / V2CFLAGS change (and is stable otherwise).
- pass5_worker_test.v: the owner-scoped blocked_fn_keys bypass is scoped to the
  slice's own file and never unblocks fns owned by another file.

Extract fast_relink_output_is_generation_only() so the decision is unit-testable
without a warm object cache. Output-preserving: the inline and extracted forms
emit byte-identical self-host C.
@medvednikov medvednikov merged commit 7134f48 into master Jun 8, 2026
81 of 93 checks passed
@medvednikov medvednikov deleted the v2-codegen-perf branch June 8, 2026 01:29