diff --git a/WIP.md b/WIP.md index e96b2986..0df1030c 100644 --- a/WIP.md +++ b/WIP.md @@ -432,6 +432,8 @@ Python scripts are reserved for non-render concerns: one-off content conversion The site builds via [builder/](builder/), a custom Node.js static site generator (`tbdocs`). See [builder/PLAN.md](builder/PLAN.md) for the architecture overview, [builder/README.md](builder/README.md) for the quickstart, and the [tbdocs Internals](docs/Documentation/Builder.md) site page for the high-level tour. +A task-graph scheduler / parallelisation pass is designed in [builder/PLAN-scheduler.md](builder/PLAN-scheduler.md) and has been implemented (Phases 0--4). + Historical engineering notes from the Jekyll era --- the original build pipeline, the HTML-compress plugin, the per-phase optimisation passes that preceded the JS port, the migration notes, and the Phase 11 parity-update retrospective --- live in [WIP.OldJekyll.md](WIP.OldJekyll.md). ## Build / preview diff --git a/builder/PLAN-5.md b/builder/PLAN-5.md index 68fe6284..51ed1f59 100644 --- a/builder/PLAN-5.md +++ b/builder/PLAN-5.md @@ -307,21 +307,22 @@ size. `fs.writeFile(path, buffer)` is the right primitive. ▼ [1] assertNoDestinationCollisions(pages, staticFiles) ← §6.4 (throws on any page.destPath == staticFile.destRel; - runs BEFORE prepareDestination so a collision aborts - without wiping the previous destination) + a collision aborts the build without any destructive I/O) │ ▼ - [2] prepareDestination(destRoot, dryRun) ← §5.1 - (delete + recreate, or skip if dry-run) + NOTE: prepareDestination (§5.1) is now a separate scheduler + task (`prepDest`) that runs as a seed -- it overlaps with + the entire main-thread spine and joins only at `write`. + writePhase assumes the destination is already prepared. 
│ ▼ - [3] In parallel (Promise.all): + [2] In parallel (Promise.all): writePages(pages, destRoot, limit) ← §5.2 copyTheme(builderAssetsRoot, destRoot, limit) ← §5.3 copyStaticFiles(staticFiles, destRoot, limit) ← §5.4 │ ▼ - [4] summarise(totals) ← §5.5 + [3] summarise(totals) ← §5.5 (file counts, byte counts, timing; one log line) ``` @@ -356,23 +357,22 @@ constrained systems; on the dev machine, no cap at all also works If profiling shows the cap is too low (write throughput < expected), bump it. The arg lives at the top of `write.mjs` as a constant. -### Why prepare-destination is sequential before the parallel writes +### Why prepare-destination is a separate seed task -Two reasons: +`prepareDestination` (§5.1) deletes and recreates the destination +directory. It now runs as the `prepDest` scheduler seed --- no +dependencies, so it overlaps with the entire main-thread spine and +worker seeds. The `write` task joins on `prepDest` alongside +`renderJoin`, `scss`, and `mermaid`, guaranteeing the destination is +clean before any file is written. -1. **Correctness.** The clean step deletes the existing tree. The - parallel writers would race the delete if it ran concurrently -- - a page write could land before the matching directory is removed, - then the delete would either fail (`ENOTEMPTY`) or destroy the - freshly-written file. -2. **Predictability.** A user-facing error from `prepareDestination` - (e.g. "destination is locked by another process") has a clean - single-source point. If it raced with writes, the error message - would be one of dozens of `EBUSY`s with no obvious culprit. +The same two invariants from the original design hold: -The prepare step is ~50 ms (recursive delete of a tree with ~1,080 -files + recreate). Sequencing it costs that 50 ms; parallelising -would save it but risk the failure modes above. +1. **Correctness.** The DAG edge `prepDest → write` ensures the + clean step finishes before the parallel writers start. +2. 
**Predictability.** A `prepareDestination` failure (e.g. locked + directory) surfaces as a single clear error before any write + begins. ### Phase 5 init order (one-time) @@ -886,12 +886,12 @@ function assertNoDestinationCollisions(pages, staticFiles) { } ``` -Called from `writePhase` **before** `prepareDestination`. The order -matters: a collision detected after the clean step would have -already wiped the previous `_site-new/` contents, leaving the user -no way to investigate the previous state. Running the assertion -first means a collision aborts the build without any destructive -I/O. Fast (set membership check over ~1,080 entries; <1 ms). +Called from `writePhase` before any file writes. Because +`prepareDestination` now runs as a separate seed task (`prepDest`), +it may have already cleaned the destination by the time the +collision check runs --- but the check still fires before `write` +begins writing any pages, so a collision still aborts cleanly. +Fast (set membership check over ~1,080 entries; <1 ms). --- @@ -1614,10 +1614,13 @@ async function main() { `writePhase(pages, staticFiles, { destRoot, dryRun })`: 1. Calls `assertNoDestinationCollisions(pages, staticFiles)` (§6.4). -2. Calls `prepareDestination(destRoot, dryRun)` (§5.1). -3. `Promise.all`-fans `writePages`, `copyTheme`, `copyStaticFiles` +2. `Promise.all`-fans `writePages`, `copyTheme`, `copyStaticFiles` (§5.2 / §5.3 / §5.4). -4. Returns `{ pages: {written, skipped}, theme: {copied}, staticFiles: {copied} }`. +3. Returns `{ pages: {written, skipped}, theme: {copied}, staticFiles: {copied} }`. + +Note: `prepareDestination` (§5.1) is no longer called inside +`writePhase`. The scheduler's `prepDest` seed task handles it +before `write` runs. The orchestrator's return value gains `destRoot` so Phase 6 / Phase 7 / Phase 8 know where to write or read. 
diff --git a/builder/PLAN-sab-pull-scheduler.md b/builder/PLAN-sab-pull-scheduler.md new file mode 100644 index 00000000..96838306 --- /dev/null +++ b/builder/PLAN-sab-pull-scheduler.md @@ -0,0 +1,4038 @@ +# SAB-based worker-pull scheduler + +Replaces the current push-based scheduler (main thread decides what's +ready, dispatches to workers via `pool.run()`) with a pull-based +model where workers resolve dependencies themselves via +SharedArrayBuffer atomics and claim the next task immediately after +completing one --- no main-thread round-trip for worker-to-worker +transitions. + +## Problem + +The scheduler runs on the main thread. When a `runOnMain` task's +`execute()` is running (e.g. `discover` at ~135 ms), the event loop is +blocked. Worker completion messages queue; no new tasks are dispatched +until the main-thread work finishes. On a 16-core machine, the idle +time across all threads sums to ~1 s --- significant against a <2 s +build. Worse on CI (4 threads) because fewer workers share the same +blocking windows. + +## Solution + +Put the mutable scheduling state (dependency counts, task status) in a +SharedArrayBuffer visible to all threads. Workers update dep counts and +claim ready tasks via `Atomics` operations. The main thread only +participates for `runOnMain` tasks and output merges into `SharedState`. + +Four new scheduling primitives handle the warmup case and +future worker-affinity needs: + +1. **`on_demand`** --- a seed task (no prerequisites) that is NOT + started at build start. It is triggered only when a dependent task + would otherwise be ready to run. Applies to both worker and + main-thread tasks. + +2. **`unique_per_worker`** --- instead of one global "done" flag, the + task has a done flag per worker lane. From worker W's perspective, + the task is done iff lane W's instance ran. Only applies to worker + tasks. + +3. **`pin_to_predecessor`** --- the task must run on the same worker + lane that ran a named predecessor. 
Applies to worker tasks. + +4. **`run_when_idle`** --- when a worker has no claimable tasks and + would otherwise sleep, it speculatively executes this task. This + is distinct from `on_demand` (triggered by a dependent) --- it is + triggered by worker idleness. The primary use case is overlapping + `warmInit` with the main-thread spine: workers finish their seed + tasks (scss, buildInfo) well before render chunks appear, and + would otherwise sit idle. With `run_when_idle`, they warm up + during that dead time. Applies to worker tasks. + +### warmInit under the new model + +`warmInit` is declared as an explicit task with both `unique_per_worker` +and `on_demand`. All `render` chunks list it as a per-worker dependency. +This replaces the current ad-hoc mechanism: the two-tier idle queue +(`_idleWarm` / `_idleCold`), the `warmup()` call in `scheduler.start()`, +the `deferHighlighter` flag on task defs, the `warmedUp` message +protocol, and the conditional `ensureHighlighterInit()` calls in +`cpu-worker.mjs`. + +## SAB memory layout + +A single SharedArrayBuffer allocated by the main thread before the +build starts. All arrays are Int32 for `Atomics` compatibility. 
+ +``` +Constants: + MAX_TASKS = 256 // static tasks + max dynamic tasks (render chunks + renderJoin) + MAX_LANES = 64 // max worker threads + MAX_EDGES = 512 // total successor edges across all tasks + +Status values: + NOT_READY = 0 + READY = 1 + CLAIMED = 2 + DONE = 3 + +Flag bits: + F_ON_DEMAND = 1 + F_UNIQUE_PER_WORKER = 2 + F_RUN_ON_MAIN = 4 + F_PIN_TO_PRED = 8 + F_RUN_WHEN_IDLE = 16 + +Arrays (all Int32Array views into the SAB): + taskCount // [1] -- current number of registered tasks (atomic) + depCount [MAX_TASKS] // remaining normal predecessor count per task + status [MAX_TASKS] // NOT_READY | READY | CLAIMED | DONE + flags [MAX_TASKS] // bitmask of F_* constants + succOffset [MAX_TASKS] // index into succList where this task's successors start + succCount [MAX_TASKS] // number of successors for this task + succList [MAX_EDGES] // flat array of successor task indices + affinityLane[MAX_TASKS] // -1 = any worker, 0..N-1 = pinned to this lane + pinnedTo [MAX_TASKS] // -1 = no pin, else = predecessor task index whose lane to inherit + completedOnLane[MAX_TASKS] // which lane completed this task (-1 = not done / ran on main) + perWorkerDone[MAX_TASKS * MAX_LANES] // 0 = not done, 1 = done (for unique_per_worker tasks) + edgeCount // [1] -- current successor edge count (for dynamic append) + notify // [1] -- generation counter for worker wakeup (see §Notify protocol) + firstReady // [1] -- low-water mark: all tasks below this index are DONE (optimization) + buildDone // [1] -- 0 = running, 1 = done, 2 = error (workers check and exit) + chunkOffset [MAX_RENDER_CHUNKS] // byte offset into chunkDataSAB for render chunk i + chunkLength [MAX_RENDER_CHUNKS] // byte length of render chunk i's JSON in chunkDataSAB +``` + + MAX_RENDER_CHUNKS = MAX_LANES * 10 // SLICES_PER_WORKER = 10 + +Total size: `(MAX_TASKS * 9 + MAX_EDGES + MAX_TASKS * MAX_LANES ++ MAX_RENDER_CHUNKS * 2 + 6) * 4` bytes. +With the defaults above: ~80 KB (81,944 bytes). Negligible. 
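The layout above could be carved into views roughly as follows. This is a minimal sketch, not the shipped code: `createViews` is the helper name Phase 5 of this plan proposes, but the `LAYOUT` table, the field order inside the buffer, and the `layoutBytes` helper are assumptions made for illustration.

```javascript
// Hypothetical sketch: carve one SharedArrayBuffer into named Int32Array views.
const MAX_TASKS = 256;
const MAX_LANES = 64;
const MAX_EDGES = 512;
const MAX_RENDER_CHUNKS = MAX_LANES * 10; // SLICES_PER_WORKER = 10

// [view name, length in Int32 slots]. The ordering is an assumption.
const LAYOUT = [
  ['taskCount', 1],
  ['depCount', MAX_TASKS],
  ['status', MAX_TASKS],
  ['flags', MAX_TASKS],
  ['succOffset', MAX_TASKS],
  ['succCount', MAX_TASKS],
  ['succList', MAX_EDGES],
  ['affinityLane', MAX_TASKS],
  ['pinnedTo', MAX_TASKS],
  ['completedOnLane', MAX_TASKS],
  ['perWorkerDone', MAX_TASKS * MAX_LANES],
  ['edgeCount', 1],
  ['notify', 1],
  ['firstReady', 1],
  ['buildDone', 1],
  ['chunkOffset', MAX_RENDER_CHUNKS],
  ['chunkLength', MAX_RENDER_CHUNKS],
];

// Bytes needed for exactly the views listed in LAYOUT.
function layoutBytes() {
  let slots = 0;
  for (const [, len] of LAYOUT) slots += len;
  return slots * Int32Array.BYTES_PER_ELEMENT;
}

// Returns { taskCount, depCount, ... }, each an Int32Array over the same SAB.
function createViews(sab) {
  const views = {};
  let byteOffset = 0;
  for (const [name, len] of LAYOUT) {
    views[name] = new Int32Array(sab, byteOffset, len);
    byteOffset += len * Int32Array.BYTES_PER_ELEMENT;
  }
  return views;
}

const sab = new SharedArrayBuffer(layoutBytes());
const views = createViews(sab);

// A fresh SAB is zero-filled; the -1 sentinels must be written explicitly.
views.affinityLane.fill(-1);
views.pinnedTo.fill(-1);
views.completedOnLane.fill(-1);
```

Driving both the size calculation and the view construction from one table means a newly added array cannot be counted in one place and forgotten in the other.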
+ +### Task ID mapping + +The SAB uses integer indices. A bidirectional mapping (name -> index, +index -> name) is built at startup for static tasks and extended by +`dispatch` for dynamic tasks. + +Static tasks get indices 0..N-1 in definition order. Dynamic task +slots (render chunks + renderJoin) are pre-reserved starting at index +`DYNAMIC_BASE`. The maximum number of render chunks is known: +`workerCount * SLICES_PER_WORKER` (currently 10 slices/worker). +`renderJoin` gets the slot after the last possible render chunk. + +### Graph metadata (init message to workers) + +The SAB encodes mutable scheduling state (dep counts, status, flags). +Immutable graph structure that workers also need is sent once at build +start via a `postMessage` init message. This includes: + +- **`taskMeta[]`** --- per-task-index metadata array: + - `handler`: string name of the worker handler function to call + (e.g. `"render"`, `"scssLight"`, `"warmInit"`). Workers use this + to look up the right function after claiming a task by index. + - `perWorkerDeps`: array of task indices that are + `unique_per_worker` dependencies. Empty for most tasks; render + chunks have `[renderEnvInitIdx]`. Workers check these after + claiming. + - `expectedIdxs`: array of predecessor task indices used as + preconditions for `unique_per_worker` tasks (Phase 10). Workers + verify each predecessor is DONE before executing the per-worker + instance. Empty for tasks without predecessors (e.g. `warmInit`); + `renderEnvInit` has `[dispatchIdx]`. + - `name`: string task name (for timing / error messages). + +- **`ctx`** --- the build context (`{ srcRoot, destRoot, opts, + workerCount }`). Small, immutable within a build. Workers cache it + and use it for seed handlers (`buildInfo`, `scssLight`, `scssDark`, + `mermaid`). + +- **`sabRef`** --- the scheduling SharedArrayBuffer itself. + +- **`idMapping`** --- task name to index and index to name maps + (for debug logging; not needed for the hot path). 
+ +The init message is sent once per build. In serve mode, each rebuild +sends a fresh init message with a new SAB and ctx. Workers detect +the new init message, switch to the new SAB, reset their cached +`chunkDataSAB` / `renderEnv`, and re-enter the pull loop. + +Dynamic task registration (`dispatch` creating `render:i` tasks) +extends the metadata: `dispatch.submit()` broadcasts an update +message with the new task entries' metadata (handler names, +perWorkerDeps) for the dynamically-registered index range. Workers +merge this into their local `taskMeta` array. This arrives before +the render tasks become READY in the SAB (dispatch sets their status +to READY after broadcasting). + +## Task definition format + +Existing fields (`expected`, `handler`, `runOnMain`, `execute`, +`submit`) are retained. New fields: + +```js +warmInit: { + expected: [], + on_demand: true, // not started until a dependent needs it + unique_per_worker: true, // one instance per worker lane + run_when_idle: true, // speculatively run when worker has no other work + handler: "warmInit", + // No submit --- unique_per_worker tasks don't participate in the + // normal dependency graph. +}, + +'render:0': { + expected: ["dispatch"], // normal dep (dispatch must complete) + perWorkerDeps: ["warmInit"], // checked at claim time, per-lane + handler: "render", + // ... +}, + +someTask: { + expected: ["priorTask"], + pin_to_predecessor: "priorTask", // must run on priorTask's worker lane + handler: "someHandler", + // ... +}, +``` + +### `submit()` split + +`submit()` currently does two things: (a) signal dependency completion +via `emit()`, and (b) mutate `SharedState`. Under the new model: + +- **(a) Dependency signaling** moves to the SAB. The completing thread + (worker or main) atomically decrements successor dep counts. This + is encoded in the SAB's successor adjacency list, not in `submit()`. + +- **(b) State mutation** stays in `submit()`, which runs on the main + thread as before. 
Worker tasks fire-and-forget their output via + `postMessage`; the main thread runs `submit()` to merge the output + into `SharedState` when it processes the message. + +The ordering constraint: a worker posts the output message BEFORE +updating successor dep counts in the SAB. Since worker-to-main +messages are FIFO, the merge message arrives before the main thread +would claim any downstream `runOnMain` task. The main thread drains +all pending messages before scanning the SAB for ready main-thread +tasks, ensuring merges complete first. + +For **worker-to-worker chains** (e.g. render:i -> renderJoin where +renderJoin is a trivial barrier), the successor dep count update +happens directly in the SAB with no main-thread involvement. If +`render:i.submit()` has state mutations (merging page deltas), the +merge message is fire-and-forget --- it doesn't gate the next worker +task because the downstream workers don't read `SharedState`. + +### How submit() is triggered + +Worker tasks: the main thread's message handler calls `submit()` +when it processes the output message. This is asynchronous relative +to the worker's progress (the worker has already moved on to its next +task via SAB). + +Main-thread tasks: `submit()` is called inline after `execute()` +completes, as today. + +## Worker pull loop + +Each worker runs a persistent loop after receiving the SAB and graph +metadata at startup: + +``` +function pullLoop(sab, views, myLane, handlers, graphMeta): + loop: + if Atomics.load(views.buildDone, 0) !== 0: return + + taskIdx = scanAndClaim(views, myLane) + if taskIdx === -1: + gen = Atomics.load(views.notify, 0) + // Double-check after reading gen — a task may have become READY + // between our scan and this load. 
+ taskIdx = scanAndClaim(views, myLane) + if taskIdx === -1: + Atomics.wait(views.notify, 0, gen, 50) // sleep until gen changes, 50ms fallback + continue + + // Check per-worker deps (unique_per_worker) + unsatisfied = null + for each perWorkerDep D of graphMeta[taskIdx]: + if Atomics.load(views.perWorkerDone, D * MAX_LANES + myLane) === 0: + unsatisfied = D + break + + if unsatisfied !== null: + if flags[unsatisfied] & F_ON_DEMAND && !(flags[unsatisfied] & F_RUN_ON_MAIN): + // On-demand worker dep (e.g. warmInit): per-worker, no contention. + // Release the original task BEFORE executing the dep. + Atomics.store(views.status, taskIdx, READY) + Atomics.add(views.notify, 0, 1) + Atomics.notify(views.notify, 0, 1) // wake one worker for the released task + + execute handlers[unsatisfied] + Atomics.store(views.perWorkerDone, unsatisfied * MAX_LANES + myLane, 1) + continue // re-enter pull loop; may reclaim original task or get a different one + + if flags[unsatisfied] & F_ON_DEMAND && flags[unsatisfied] & F_RUN_ON_MAIN: + // On-demand main-thread dep: trigger it, release our task, wait. + Atomics.store(views.status, unsatisfied, READY) + postMessage({ triggerMainTask: unsatisfied }) + Atomics.store(views.status, taskIdx, READY) + Atomics.add(views.notify, 0, 1) + Atomics.notify(views.notify, 0, 1) // wake one worker for the released task + + // Check for other non-dependent work before sleeping + altTask = scanAndClaim(views, myLane) // may find unrelated work + if altTask !== -1: + // ... execute altTask (same path as below) ... + else: + // Wait on the dep's status slot rather than spinning claim-release cycles + Atomics.wait(views.status, unsatisfied, READY) + continue // re-enter pull loop + + // unique_per_worker without on_demand: should already be done + // (was started eagerly). Spin-wait. 
+ Atomics.store(views.status, taskIdx, READY) + Atomics.add(views.notify, 0, 1) + Atomics.notify(views.notify, 0, 1) + waitForPerWorkerDone(views, unsatisfied, myLane) + continue + + // All deps satisfied --- execute + result = await handlers[graphMeta[taskIdx].handler](taskPayload) + postMessage({ done: taskIdx, output: result }) // fire-and-forget + onTaskDone(views, taskIdx, myLane, graphMeta) +``` + +### scanAndClaim + +``` +function scanAndClaim(views, myLane): + start = Atomics.load(views.firstReady, 0) + count = Atomics.load(views.taskCount, 0) + for i = start to count - 1: + if Atomics.load(views.status, i) !== READY: continue + if Atomics.load(views.flags, i) & F_RUN_ON_MAIN: continue + aff = Atomics.load(views.affinityLane, i) + if aff !== -1 && aff !== myLane: continue + if Atomics.compareExchange(views.status, i, READY, CLAIMED) === READY: + return i + return -1 +``` + +### onTaskDone (successor dep count update) + +``` +function onTaskDone(views, taskIdx, lane, graphMeta): + Atomics.store(views.status, taskIdx, DONE) + Atomics.store(views.completedOnLane, taskIdx, lane) + advanceFirstReady(views, taskIdx) + + readyCount = 0 + wakeMain = false + + off = Atomics.load(views.succOffset, taskIdx) + count = Atomics.load(views.succCount, taskIdx) + for i = off to off + count - 1: + succ = Atomics.load(views.succList, i) + + // Skip unique_per_worker successors (not tracked via depCount) + if Atomics.load(views.flags, succ) & F_UNIQUE_PER_WORKER: continue + + remaining = Atomics.sub(views.depCount, succ, 1) - 1 + if remaining === 0: + // Set affinity if successor is pinned + pin = Atomics.load(views.pinnedTo, succ) + if pin !== -1: + srcLane = Atomics.load(views.completedOnLane, pin) + Atomics.store(views.affinityLane, succ, srcLane) + + Atomics.store(views.status, succ, READY) + + if Atomics.load(views.flags, succ) & F_RUN_ON_MAIN: + wakeMain = true + else: + readyCount++ + + // Bump generation counter and wake the right number of workers + if readyCount > 0: 
+ Atomics.add(views.notify, 0, 1) + Atomics.notify(views.notify, 0, readyCount) + if wakeMain: + postMessage({ mainTaskReady: true }) +``` + +### advanceFirstReady + +`firstReady` is a low-water mark: all task indices below it are DONE. +It only advances forward (monotonic). After any task transitions to +DONE, the completing thread tries to advance past consecutive DONE +tasks starting from the current value: + +``` +function advanceFirstReady(views, taskIdx): + count = Atomics.load(views.taskCount, 0) + cur = Atomics.load(views.firstReady, 0) + if taskIdx !== cur: return // not at the frontier; nothing to advance + next = cur + while next < count && Atomics.load(views.status, next) === DONE: + next++ + if next > cur: + Atomics.compareExchange(views.firstReady, 0, cur, next) + // CAS may fail if another thread advanced it further; that's fine. +``` + +This is a best-effort optimization. The CAS failure case is harmless +--- the competing thread advanced the pointer at least as far, so no +scan work is wasted. The scan is already microseconds, so this avoids +re-checking the early spine tasks (config, discover, nav, ...) once +they've completed and the build is in the render fan-out or write +phase. + +### Anti-thundering-herd for on-demand main-thread deps + +When a worker discovers an unsatisfied on-demand main-thread dep, it: + +1. Claims the dep (CAS on its status, or just sets it READY if it's + on_demand + not yet triggered --- first writer wins since + READY is idempotent) +2. Releases its original task back to READY +3. **Waits on the dep's status slot** (`Atomics.wait(status, depIdx, ...)`) + rather than re-entering the scan loop + +This prevents N workers from cycling through claim-release on render +chunks while the main thread runs the dep. When the dep completes +(main thread sets its status to DONE + notifies), all waiting workers +wake and re-scan productively. + +Workers first check if other non-dependent work is available before +waiting. 
The check is cheap (one scan of the status array) and avoids +unnecessary sleeping when there's useful work to do. + +## Main thread protocol + +The main thread is event-loop driven. It does NOT spin or use +`Atomics.wait` (which would block the event loop). Instead: + +### Input accumulation (results map) + +Main-thread tasks receive an `inputs` object keyed by predecessor +name: `{ predecessor1: output1, predecessor2: output2, ... }`. Under +the current push scheduler, `emit()` accumulates these in a +`pending.received` map. Under the SAB model, dependency *counting* +moves to the SAB, but the actual *data* still flows through the main +thread. + +The main thread maintains a `results` map: +a `Map` from task index to task output. Every task's output is stored here --- +both worker tasks (when the `{ done, output }` message is processed) +and main-thread tasks (inline after execute). When a main-thread +task becomes READY and the main thread claims it, it assembles the +inputs: + +``` +function assembleInputs(taskIdx, taskDef, results, idMapping): + inputs = {} + for predName of taskDef.expected: + predIdx = idMapping.nameToIdx[predName] + inputs[predName] = results.get(predIdx) + return inputs +``` + +This is simpler than the current `pending` machinery --- no received +counting, no emit routing. The SAB dep count handles "when is it +ready"; the results map handles "what data does it get." + +Worker tasks do NOT read from the results map. They get their inputs +from the SAB (chunk data, shared payload) or from `ctx`. The results +map is main-thread-only. 
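The `assembleInputs` pseudocode above translates almost directly into JavaScript. The sketch below keeps the same signature; the missing-output check and the sample task names and values are assumptions added for illustration, not part of the plan.

```javascript
// Sketch of main-thread input assembly, mirroring the pseudocode above.
function assembleInputs(taskIdx, taskDef, results, idMapping) {
  const inputs = {};
  for (const predName of taskDef.expected) {
    const predIdx = idMapping.nameToIdx[predName];
    if (!results.has(predIdx)) {
      // Assumption: a missing entry means the task was marked READY before
      // its predecessor's output message was processed, so fail loudly.
      throw new Error(
        `task ${taskIdx}: no stored output for predecessor "${predName}"`
      );
    }
    inputs[predName] = results.get(predIdx);
  }
  return inputs;
}

// Hypothetical data: `config` has finished, `discover` is being claimed.
const idMapping = { nameToIdx: { config: 0, discover: 1 } };
const results = new Map([[0, { baseUrl: '/docs' }]]);
const inputs = assembleInputs(1, { expected: ['config'] }, results, idMapping);
```

Using `results.has()` rather than testing for `undefined` keeps tasks that legitimately return nothing distinguishable from outputs that were never stored.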
+ +### Message handler + +``` +worker.on('message', msg => { + if (msg.done != null) { + // Store output for downstream main-thread tasks' input assembly + results.set(msg.done, msg.output) + // Run submit() to merge into SharedState + taskDef = tasks[msg.done] + taskDef.submit(msg.output, state) + } + if (msg.mainTaskReady != null || msg.triggerMainTask != null) { + scheduleMainScan() + } +}) +``` + +`scheduleMainScan()` uses `queueMicrotask()` (coalesced --- skip if +already scheduled) so all pending messages are processed (output +stored + merges complete) before the scan runs. + +### Main-thread task execution + +``` +function mainScan(): + start = Atomics.load(views.firstReady, 0) + count = Atomics.load(views.taskCount, 0) + for i = start to count - 1: + if Atomics.load(views.status, i) !== READY: continue + if !(Atomics.load(views.flags, i) & F_RUN_ON_MAIN): continue + if Atomics.compareExchange(views.status, i, READY, CLAIMED) !== READY: continue + + // Check on-demand deps (main-thread on_demand deps are global, not per-worker) + unsatisfied = checkOnDemandDeps(i) + if unsatisfied !== null: + // Run the on-demand dep inline (single-threaded, no concurrency concern) + output = await executeMainTask(unsatisfied) + results.set(unsatisfied, output) + Atomics.store(views.status, unsatisfied, DONE) + advanceFirstReady(views, unsatisfied) + Atomics.notify(views.status, unsatisfied) // wake waiting workers + + inputs = assembleInputs(i, taskDef, results, idMapping) + output = await taskDef.execute(inputs, ctx, state) + results.set(i, output) + Atomics.store(views.status, i, DONE) + taskDef.submit(output, state) // mutate SharedState + onTaskDone(views, i, -1, graphMeta) // -1 = main thread lane + + // Re-scan: the task we just completed may have made more main tasks ready + scheduleMainScan() + return +``` + +### Draining messages before scanning + +The main thread processes worker messages in the event loop's message +handler. 
`scheduleMainScan()` posts a microtask. Since microtasks run +after the current handler but before the next event, and multiple +worker messages in the same event-loop tick are processed sequentially, +all pending merges complete before the scan. + +If messages arrive while a `runOnMain` execute() is in progress, they +queue until execute() yields (await) or completes. This is the +irreducible cost of main-thread tasks --- same as today, but worker- +to-worker transitions no longer pay it. + +## Dynamic task registration (dispatch) + +`dispatch` runs on the main thread. After computing chunks: + +1. Write render:i task entries into pre-reserved SAB slots: + - `depCount[slot]` = 0 (render chunks are seeded directly) + - `flags[slot]` = 0 (worker task, not on_demand/unique_per_worker) + - `perWorkerDeps` metadata = `[warmInitIdx]` + - successor entries pointing to `renderJoinIdx` + +2. Write renderJoin entry: + - `depCount[renderJoinIdx]` = N (one per render chunk) + - `flags[renderJoinIdx]` = F_RUN_ON_MAIN + - successor entries to write, writePdf, searchData + +3. Append successor edges for render:i -> renderJoin to `succList`. + +4. Update `Atomics.store(views.taskCount, 0, newCount)`. + +5. Set each render:i's status to READY and notify workers. + +Workers that are waiting (Atomics.wait) wake up and see the new +render tasks. + +### Task inputs for render chunks + +The render chunks need their page data (the chunk array + the shared +SAB broadcast payload). The approach: extend the existing +`sab-broadcast.mjs` pattern. + +After `dispatch` creates chunks on the main thread: + +1. JSON-serialize each chunk, concatenate the byte arrays into one + buffer. +2. Pack into a single `chunkDataSAB` (SharedArrayBuffer). +3. Write offset/length per chunk into the scheduling SAB's + `chunkOffset` / `chunkLength` arrays. +4. Broadcast `{ chunkDataSAB }` to all workers in a single + postMessage (the SAB is a shared reference, not cloned). 
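The four packing steps above can be sketched as follows. `packChunks` and `readChunk` are hypothetical helper names, and the two small `Int32Array` views stand in for the `chunkOffset` / `chunkLength` arrays of the scheduling SAB; the real code may well live in or next to `sab-broadcast.mjs` and look different.

```javascript
// Sketch: pack JSON-serialized render chunks into a single SharedArrayBuffer.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Main thread: serialize every chunk, concatenate, record offset/length.
function packChunks(chunks, chunkOffset, chunkLength) {
  const encoded = chunks.map((chunk) => encoder.encode(JSON.stringify(chunk)));
  const total = encoded.reduce((sum, bytes) => sum + bytes.byteLength, 0);
  const chunkDataSAB = new SharedArrayBuffer(total);
  const target = new Uint8Array(chunkDataSAB);
  let offset = 0;
  encoded.forEach((bytes, i) => {
    target.set(bytes, offset);
    Atomics.store(chunkOffset, i, offset);
    Atomics.store(chunkLength, i, bytes.byteLength);
    offset += bytes.byteLength;
  });
  return chunkDataSAB;
}

// Worker: read back only the slice belonging to render chunk i.
function readChunk(chunkDataSAB, chunkOffset, chunkLength, i) {
  const offset = Atomics.load(chunkOffset, i);
  const length = Atomics.load(chunkLength, i);
  // slice() copies the bytes into a non-shared buffer before decoding.
  const bytes = new Uint8Array(chunkDataSAB, offset, length).slice();
  return JSON.parse(decoder.decode(bytes));
}

// Hypothetical data: two chunks, room for four offset/length slots.
const meta = new SharedArrayBuffer(2 * 4 * Int32Array.BYTES_PER_ELEMENT);
const chunkOffset = new Int32Array(meta, 0, 4);
const chunkLength = new Int32Array(meta, 16, 4);
const chunks = [[{ path: 'a.md' }], [{ path: 'b.md' }, { path: 'ü.md' }]];
const chunkDataSAB = packChunks(chunks, chunkOffset, chunkLength);
const second = readChunk(chunkDataSAB, chunkOffset, chunkLength, 1);
```

Offsets and lengths are byte counts of the UTF-8 encoding, not string lengths, which matters as soon as a page path or title contains non-ASCII characters.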
+ +Workers store the `chunkDataSAB` reference when they receive the +message. When a worker claims `render:i`, it reads +`chunkOffset[i]` / `chunkLength[i]` from the scheduling SAB and +deserializes its slice from `chunkDataSAB`. Same cost as today's +structured clone per chunk, but without the main thread serializing +N copies and without blocking on per-worker postMessage delivery. + +The existing shared payload SAB (site data, initData, link tables) +is packed separately by `dispatch` as today and included in the same +broadcast message. + +Workers that are busy with non-render work (scss, buildInfo) when the +broadcast arrives queue the message and read it when they first claim +a render chunk. + +### Task inputs for non-render worker tasks + +`buildInfo`, `scssLight`, `scssDark`, and `mermaid` need `ctx` +(mainly `ctx.srcRoot`). The `ctx` object is small and immutable +within a build. It is sent once at build start alongside the +scheduling SAB as part of the init message. Workers cache it. + +In serve mode, a fresh `ctx` is sent with each rebuild's new SAB. + +## Build start sequence + +Step-by-step from `runBuild()` entry to the first task executing: + +1. **Allocate the scheduling SAB.** `allocSchedulerSAB(TASKS, + workerCount)` reads the task definitions, assigns integer indices, + computes the successor adjacency list, and writes all static + fields (depCount, flags, succOffset/succCount/succList, pinnedTo, + affinityLane initialized to -1, completedOnLane initialized to -1, + perWorkerDone zeroed, firstReady = 0, notify = 0, buildDone = 0). + Returns `{ sab, views, idMapping, taskMeta }`. + + Seed tasks (expected.length === 0 AND NOT on_demand) have their + status set to READY. All others are NOT_READY. On-demand seeds + stay NOT_READY until triggered. + +2. **Construct or reuse the worker pool.** In `runBuild()`, the pool + is created fresh (and destroyed after the build). In serve mode, + the pool persists and is reused. + +3. 
**Post init message to all workers.** Each worker receives: + ``` + { init: true, sab, taskMeta, ctx, idMapping } + ``` + Workers store these, create Int32Array views over the SAB, and + enter the pull loop. Workers that were sleeping from a previous + build (serve mode) receive this as a regular message, detect the + `init` flag, switch to the new SAB, reset cached state + (`chunkDataSAB`, `renderEnv`), and re-enter the pull loop. + +4. **Main thread enters its scan loop.** `scheduleMainScan()` is + called once to kick off the first scan. Seed `runOnMain` tasks + (e.g. `config`) are already READY in the SAB, so the first + `mainScan()` claims and executes them. + +5. **Workers wake and scan.** Workers see seed worker tasks + (`buildInfo`, `scssLight`, `scssDark`, optionally `mermaid`) as + READY in the SAB and claim them. + +6. **Build proceeds.** Workers and main thread independently claim + and execute tasks via the SAB. Worker outputs flow to the main + thread as fire-and-forget messages. The main thread merges, + accumulates results, and claims main-thread tasks as they become + ready. + +7. **Dispatch phase.** When `dispatch` (runOnMain) executes, it: + - Computes render chunks and builds the chunkDataSAB + sharedSAB. + - Writes dynamic task entries (render:i, renderJoin) into the + pre-reserved SAB slots. + - Broadcasts `{ renderData: true, chunkDataSAB, sharedSAB, + taskMeta: [...new entries...] }` to all workers. + - Sets render:i status to READY and bumps the notify generation + counter: `Atomics.add(views.notify, 0, 1)` + + `Atomics.notify(views.notify, 0, Infinity)`. + - Workers wake, merge the new taskMeta entries, store the + chunkDataSAB, and claim render chunks. + +8. **Completion.** The main thread detects all tasks DONE (or + untriggered on_demand). Sets `buildDone = 1` in the SAB, notifies + all workers. Workers exit the pull loop. `runBuild()` resolves. 
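Step 1 of the sequence above (index assignment, dependency counts, the successor adjacency list, and seed statuses) could look roughly like this. It is a sketch under stated assumptions: `buildStaticGraph` is a hypothetical name, it returns plain typed arrays instead of writing into the SAB, it ignores `unique_per_worker`, pinning, and `perWorkerDeps`, and the dependencies between the sample tasks are invented.

```javascript
// Sketch: derive depCount, successor lists and seed statuses from task defs.
const NOT_READY = 0;
const READY = 1;

function buildStaticGraph(tasks) {
  const names = Object.keys(tasks); // indices follow definition order
  const nameToIdx = Object.fromEntries(names.map((name, i) => [name, i]));
  const n = names.length;
  const depCount = new Int32Array(n);
  const status = new Int32Array(n).fill(NOT_READY);
  const succCount = new Int32Array(n);
  const succOffset = new Int32Array(n);

  // Pass 1: count predecessors and successors; mark non-on_demand seeds READY.
  names.forEach((name, i) => {
    const expected = tasks[name].expected ?? [];
    depCount[i] = expected.length;
    for (const pred of expected) {
      if (!(pred in nameToIdx)) {
        throw new Error(`task "${name}" expects unknown task "${pred}"`);
      }
      succCount[nameToIdx[pred]]++;
    }
    if (expected.length === 0 && !tasks[name].on_demand) status[i] = READY;
  });

  // Pass 2: prefix-sum the counts into offsets, then fill the flat succList.
  let edges = 0;
  for (let i = 0; i < n; i++) {
    succOffset[i] = edges;
    edges += succCount[i];
  }
  const succList = new Int32Array(edges);
  const cursor = Int32Array.from(succOffset);
  names.forEach((name, i) => {
    for (const pred of tasks[name].expected ?? []) {
      succList[cursor[nameToIdx[pred]]++] = i;
    }
  });

  return { nameToIdx, depCount, status, succOffset, succCount, succList };
}

// Hypothetical four-task graph; the edges are made up for the example.
const graph = buildStaticGraph({
  config: { expected: [] },
  warmInit: { expected: [], on_demand: true },
  discover: { expected: ['config'] },
  nav: { expected: ['config', 'discover'] },
});
```

In this example `config` starts READY, while `warmInit` stays NOT_READY even though it has no predecessors, matching the rule that on-demand seeds wait until something triggers them.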
+ +## What gets removed + +| Current code | Replacement | +|---|---| +| `WorkerPool._idleWarm` / `_idleCold` / `_warm` | Workers are equal; warmth is emergent | +| `WorkerPool._onWarmedUp()` | No warm/cold distinction | +| `WorkerPool._drain()` + `_queue` | Workers pull from SAB; no push queue | +| `WorkerPool.warmup()` | `warmInit` task (on_demand + unique_per_worker) | +| `WorkerPool.run()` for worker tasks | Workers self-schedule; main-thread tasks still use a thin dispatch | +| `Scheduler._flush()` / `_run()` push logic | SAB atomics | +| `Scheduler.pending` / `ready` / `emit()` | SAB depCount + status | +| `deferHighlighter` flag on task defs | Gone; warmInit is explicit | +| `ensureHighlighterInit()` in cpu-worker.mjs | `warmInit` handler | +| `warmedUp` / `warmBoot` message protocol | Gone | +| `warmup: true` message handling | Gone | + +`WorkerPool` reduces to a lifecycle manager: spawn workers at +construction, send them the SAB + metadata, terminate on destroy. +The message forwarding (output merges, main-task signals) remains. + +## Serve mode + +The pool persists across rebuilds. Per rebuild: + +1. Main thread allocates a new scheduling SAB (new dep counts, fresh + status array). +2. Main thread posts the new SAB to all workers as a "new build" + message. +3. Workers switch to the new SAB and enter the pull loop. +4. Build completes (main thread detects: all tasks DONE, no pending + work). +5. Workers go idle (`Atomics.wait` on the status array --- nothing + is READY). + +On the next rebuild, step 2 wakes all workers (they're waiting) and +they switch to the fresh SAB. + +The `chunkDataSAB` from the previous build is garbage-collected once +no worker holds a reference. + +## Completion detection + +The build is complete when: + +- All tasks have status DONE (or are unreachable --- e.g. on_demand + tasks that were never triggered). +- No worker is executing (all are in Atomics.wait or have exited + the pull loop). 
+ +The main thread tracks this by counting: every time a task transitions +to DONE, increment a `doneCount` atomic. When `doneCount` equals the +number of non-on_demand tasks plus the number of triggered on_demand +tasks, the build is done. + +Simpler approach: the main thread's `mainScan()` checks after each +task completion whether any tasks remain (status != DONE, excluding +untriggered on_demand). When none remain, resolve the build promise. + +Workers are notified of build completion via a `buildDone` atomic in +the SAB. Workers check this in their pull loop and exit cleanly. + +## Error handling + +### Worker task failure + +Worker catches the error in its handler, posts +`{ error: taskIdx, message, stack }` to the main thread, and sets +the task's SAB status to a new `FAILED` state (value 4). The main +thread's error handler rejects the build promise (same as today's +`_onError`). All other workers see `FAILED` when scanning and skip +the task. + +Workers do NOT abort on a sibling's failure --- they continue +processing ready tasks until the main thread signals build abort +via the `buildDone` atomic (set to an error sentinel). Workers +check `buildDone` in their pull loop and exit. + +### Main-thread task failure + +Same as today: the main thread catches the error in `execute()`, +rejects the build promise, and signals workers to stop via +`buildDone`. + +### Worker crash + +Same as today: the crashed worker is not respawned; the pool +degrades. The worker's in-progress task stays CLAIMED forever +(no successor dep counts are decremented). The main thread +detects a stalled build via a timeout or the worker's `exit` event, +and aborts. + +## Timing / instrumentation + +Workers post timing data alongside their output: +`{ done: taskIdx, output, timing: { start, end } }`. The main +thread collects these into the existing `timings` map. The summary +and Gantt chart code are unchanged. 
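A minimal sketch of how the main thread might fold the two worker message shapes described above (`{ done, output, timing }` and `{ error, message, stack }`) into results, timings, and a single build failure. `createCollector` and its return shape are hypothetical, and keying `timings` by task name is an assumption.

```javascript
// Hypothetical collector; `idxToName` mirrors the idMapping field of that name.
function createCollector(idxToName) {
  const results = new Map(); // task index -> handler output
  const timings = new Map(); // task name  -> { start, end }
  let failure = null;        // first error wins; later ones are ignored

  function onWorkerMessage(msg) {
    // Use `in`, not truthiness: task index 0 is a valid `done` / `error` value.
    if ("error" in msg) {
      if (failure === null) {
        failure = new Error(`task ${idxToName[msg.error]} failed: ${msg.message}`);
        if (msg.stack) failure.stack = msg.stack;
      }
      return;
    }
    if ("done" in msg) {
      results.set(msg.done, msg.output);
      if (msg.timing) timings.set(idxToName[msg.done], msg.timing);
    }
  }

  return { results, timings, onWorkerMessage, getFailure: () => failure };
}
```

Keeping only the first failure matches the behaviour described for worker task failure: siblings keep running until the abort is signalled, so later errors are usually consequences rather than causes.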
+ +For `unique_per_worker` tasks (`warmInit`), each worker posts its own +timing. The summary consolidates these per-lane, same as render chunks. + +## Migration phases + +### Current state (phases 0--4 are done) + +Phases 0--4 from [PLAN-scheduler.md](PLAN-scheduler.md) are fully +implemented. The codebase already has: + +- `builder/scheduler.mjs` --- push-based `Scheduler` class with + `SharedState`, `pending`/`ready` maps, `_flush()`/`_run()` dispatch. +- `builder/worker-pool.mjs` --- `WorkerPool` with two-tier idle queue + (`_idleWarm` / `_idleCold`), `warmup()`, `run()` push dispatch. +- `builder/cpu-worker.mjs` --- worker harness with `parentPort` + message loop, `ensureHighlighterInit()`, `getOrInitRenderEnv()`, + named handlers (`scssLight`, `scssDark`, `mermaid`, `buildInfo`, + `render`). +- `builder/sab-broadcast.mjs` --- `packShared()` / `unpackShared()` + for the render fan-out's shared payload SAB. +- `builder/tbdocs.mjs` --- full task DAG (`TASKS` object) with all + static and dynamic task definitions, `dispatch.submit()` dynamic + registration of `render:i` + `renderJoin`, Gantt chart instrumentation. +- `builder/serve.mjs` --- dev server reusing the pool across rebuilds. + +The build runs end-to-end through the push scheduler with worker +fan-out. `build.bat && check.bat` is clean at baseline. + +The phases below (5--8) replace the push scheduler internals with +the SAB-based pull model while preserving the task definitions, +handler functions, and external behavior. + +### Phase 5: SAB scheduler skeleton + +**Files:** new `builder/sab-scheduler.mjs`, modifications to +`scheduler.mjs`, `worker-pool.mjs`, `cpu-worker.mjs`. + +1. Define SAB constants (`MAX_TASKS`, `MAX_LANES`, `MAX_EDGES`, + `MAX_RENDER_CHUNKS`, status values, flag bits), byte-offset + calculations, and a `createViews(sab)` helper that returns an + object of named Int32Array views over the SAB. + +2. 
Add `allocSchedulerSAB(taskDefs, workerCount)`: + - Assigns integer indices to each static task (definition order). + - Pre-reserves `DYNAMIC_BASE` through + `DYNAMIC_BASE + MAX_RENDER_CHUNKS` for render chunks, plus one + slot for `renderJoin`. + - Builds the successor adjacency list from `taskDef.expected` + (inverting predecessor lists to successor lists). + - Writes depCount, flags (from `runOnMain`, `on_demand`, + `unique_per_worker`, `pin_to_predecessor`), succOffset/succCount/ + succList, pinnedTo, affinityLane (-1), completedOnLane (-1). + - Sets seed tasks' status to READY (except on_demand seeds). + - Returns `{ sab, views, idMapping, taskMeta }`. + +3. `idMapping` contains: + - `nameToIdx`: `Map` (task name -> SAB index). + - `idxToName`: `string[]` (SAB index -> task name). + - `DYNAMIC_BASE`, `RENDER_JOIN_IDX`: constants for dispatch. + +4. `taskMeta` is a plain array indexed by task index: + - `taskMeta[i].handler`: handler function name (string). + - `taskMeta[i].perWorkerDeps`: array of task indices (for + unique_per_worker deps). Empty for most tasks. + - `taskMeta[i].expectedIdxs`: array of predecessor task indices + (for precondition checking on unique_per_worker tasks with + `expected` predecessors, Phase 10). Empty for most tasks. + Populated by mapping `def.expected` names through `nameToIdx`. + - `taskMeta[i].name`: task name (for debug/timing). + +5. Add the `warmInit` task definition to `TASKS` in `tbdocs.mjs`: + ```js + warmInit: { + expected: [], + on_demand: true, + unique_per_worker: true, + handler: "warmInit", + submit() {}, + }, + ``` + No runtime behavior change yet --- the push scheduler ignores the + new flags and the warmInit handler is not wired up. + +6. No runtime behavior change. The existing push scheduler still + runs. This phase adds data structures only. + +**Verification:** assert at build time that the SAB encodes the +expected dep counts and successor edges for the static task graph. 
+`build.bat && check.bat` clean; output unchanged. + +### Phase 6: Worker pull loop + +**This is the critical phase.** Worker-to-worker transitions move +to the SAB. Main-thread tasks still run via the existing push +scheduler, bridged into the SAB. + +1. **Init message handling.** `WorkerPool` sends `{ init: true, sab, + taskMeta, ctx, idMapping }` to each worker after construction (or + after each rebuild in serve mode). Workers store these and create + SAB views. + +2. **Handler table.** `cpu-worker.mjs` keeps its existing named + handlers (`scssLight`, `scssDark`, `mermaid`, `buildInfo`, + `render`) and adds `warmInit`: + ```js + async warmInit() { + const start = Date.now(); + const highlighter = await (await import("./highlight.mjs")).initHighlighter(); + return { warmInit: true, timing: { start, end: Date.now() } }; + } + ``` + The pull loop looks up the handler by name: + `handlers[taskMeta[taskIdx].handler]`. + +3. **Pull loop.** Replace the `parentPort.on('message')` dispatch + with the persistent pull loop (§Worker pull loop pseudocode). + The message handler is retained only for: + - `{ init }` --- switch to new SAB + metadata. + - `{ renderData, chunkDataSAB, sharedSAB, taskMeta }` --- store + chunk data and merge new taskMeta entries from dispatch. + +4. **Output posting.** After executing a task, the worker posts: + ``` + { done: taskIdx, output: result, timing: { start, end } } + ``` + Then calls `onTaskDone()` to update the SAB. The `postMessage` + happens BEFORE the SAB update (ordering constraint from §submit() + split). + +5. **Bridge: main thread updates SAB after its tasks.** The existing + push scheduler's `_onDone()` is extended: after running `submit()` + and `emit()` as today, it also calls `onTaskDone(views, taskIdx, + -1, graphMeta)` to decrement successor dep counts in the SAB and + set newly-ready tasks to READY. 
This lets workers see downstream + tasks become ready immediately after a main-thread task completes, + without waiting for the push scheduler's `_flush()`. + + The push scheduler's `_flush()` / `_run()` still handles + main-thread tasks. Worker tasks are no longer dispatched through + `pool.run()` --- they're pulled from the SAB. + +6. **warmInit replaces ensureHighlighterInit().** The `warmInit` + handler does the same work (dynamic import of highlight.mjs + + initHighlighter). The on-demand + unique_per_worker flags ensure + it runs once per lane, only when needed. The `deferHighlighter` + flag and `ensureHighlighterInit()` calls are removed. + +**Verification:** `build.bat && check.bat` clean. Timing summary +shows render chunks starting without main-thread gaps. `warmInit` +appears in per-lane timing (consolidated like render chunks). + +### Phase 7: Main-thread SAB integration + +Replace the push scheduler's main-thread dispatch with SAB-based +claiming. The `Scheduler` class is rewritten. + +1. **`results` map.** The scheduler maintains + `results: Map`. Populated in two places: + - Worker output messages: `results.set(msg.done, msg.output)`. + - Main-thread task completion: `results.set(idx, output)` inline. + +2. **`assembleInputs()`.** Before executing a `runOnMain` task, the + scheduler reads the task definition's `expected` array, maps each + predecessor name to its index via `idMapping`, looks up the output + in `results`, and builds the `inputs` object: + ```js + function assembleInputs(taskIdx, taskDef, results, idMapping) { + const inputs = {}; + for (const predName of taskDef.expected) { + const predIdx = idMapping.nameToIdx.get(predName); + inputs[predName] = results.get(predIdx); + } + return inputs; + } + ``` + This replaces the current `pending.received` accumulation and + `emit()` routing. + +3. **`mainScan()`.** Replaces `_flush()` / `_run()`. 
Scans the SAB + for READY + F_RUN_ON_MAIN tasks, claims via CAS, assembles inputs, + executes, runs `submit()`, calls `onTaskDone()` to update successor + dep counts. See §Main-thread task execution pseudocode. + +4. **Message handler.** Replaces the pool's completion callback. + Processes `{ done, output }` (store + submit), `{ mainTaskReady }` + and `{ triggerMainTask }` (schedule scan), `{ error }` (abort). + Uses `queueMicrotask` coalescing so all pending messages drain + before scanning. + +5. **Completion detection.** After each `onTaskDone()` call from the + main thread, check: scan the SAB for any task that is not DONE + and not an untriggered on_demand task. If none remain, set + `buildDone = 1`, notify all workers, resolve the build promise. + +6. **Remove push machinery.** Delete `Scheduler.pending`, `ready`, + `emit()`, `_flush()`, `_run()`, `seed()`, `register()`. + `dispatch.submit()` now writes directly to the SAB and broadcasts + to workers (see §Build start sequence step 7) instead of calling + `scheduler.register()` / `scheduler.seed()`. + +**Verification:** `build.bat && check.bat` clean. Full build runs +through the SAB scheduler with no push-based code paths. The timing +summary and Gantt chart are identical to Phase 6 (same tasks, same +concurrency, different dispatch mechanism). + +### Phase 8: Cleanup + +1. Remove `WorkerPool._idleWarm`, `_idleCold`, `_warm`, + `_onWarmedUp`, `_drain`, `_queue`, `warmup()`. +2. Remove `deferHighlighter` from task defs and cpu-worker. +3. Remove `warmedUp` / `warmBoot` message protocol. +4. Remove `warmup: true` handling in cpu-worker. +5. `WorkerPool` becomes a thin lifecycle manager: spawn, forward + messages, terminate. +6. Update `serve.mjs` per-rebuild SAB reallocation. +7. Update Gantt chart to include `warmInit` per-lane entries. + +**Verification:** `build.bat && check.bat` clean. Serve mode works +(rebuild on file change, workers reuse across rebuilds). No +warmup-related code remains. 
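Looking back at Phase 7 step 3, the claim that `mainScan()` performs can be sketched as one compare-and-swap per candidate task. The status values and the `F_RUN_ON_MAIN` bit position below are assumptions, and `claimMainTask` is a hypothetical helper rather than the scheduler's actual function.

```javascript
// Assumed encodings for illustration.
const READY = 1;
const CLAIMED = 2;
const F_RUN_ON_MAIN = 1;

// Returns the index of the first READY runOnMain task this caller managed
// to claim, or -1 when there is nothing to claim.
function claimMainTask(status, flags, taskCount) {
  for (let i = 0; i < taskCount; i++) {
    if ((Atomics.load(flags, i) & F_RUN_ON_MAIN) === 0) continue;
    // compareExchange returns the previous value: READY means we won the claim.
    if (Atomics.compareExchange(status, i, READY, CLAIMED) === READY) return i;
  }
  return -1;
}
```

Worker tasks are skipped by the flag test, so a READY worker task is left untouched for the worker pull loop to claim.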
+ +### Phase 9: Speculative idle execution (`run_when_idle`) + +After Phase 8, `warmInit` is on-demand: it runs only when a worker +claims a render chunk and discovers the per-worker dep is unsatisfied. +This is correct but leaves performance on the table --- workers that +finish seed tasks (scss, buildInfo, mermaid) sit idle during the +main-thread spine (~200 ms) when they could be warming up. + +This phase adds `F_RUN_WHEN_IDLE` and wires `warmInit` to use it. + +1. **Flag bit.** Add `F_RUN_WHEN_IDLE = 16` to the SAB flag + constants and `run_when_idle` to the task definition schema. + `allocSchedulerSAB` sets the bit when the task def has + `run_when_idle: true`. + +2. **Pull loop change.** In the worker pull loop, after + `scanAndClaim` returns -1 (no claimable work) and before the + sleep path, insert a speculative-execution check: + + ``` + if taskIdx === -1: + // Speculative: run idle-eligible tasks before sleeping + idleTask = findIdleTask(views, myLane) + if idleTask !== -1: + execute handlers[idleTask] + Atomics.store(views.perWorkerDone, idleTask * MAX_LANES + myLane, 1) + postMessage({ done: idleTask, output, timing: { start, end } }) + continue // re-enter pull loop (real work may have appeared) + + // Nothing to do — sleep + gen = Atomics.load(views.notify, 0) + taskIdx = scanAndClaim(views, myLane) // double-check + if taskIdx === -1: + Atomics.wait(views.notify, 0, gen, 50) + continue + ``` + +3. 
**`findIdleTask`.** Scans task indices for tasks with + `F_RUN_WHEN_IDLE` set: + + ``` + function findIdleTask(views, myLane): + count = Atomics.load(views.taskCount, 0) + for i = 0 to count - 1: + if !(Atomics.load(views.flags, i) & F_RUN_WHEN_IDLE): continue + if Atomics.load(views.flags, i) & F_UNIQUE_PER_WORKER: + if Atomics.load(views.perWorkerDone, i * MAX_LANES + myLane) === 0: + return i // per-worker: no contention, no CAS needed + else: + if Atomics.load(views.status, i) !== DONE: + if Atomics.compareExchange(views.status, i, NOT_READY, CLAIMED) === NOT_READY: + return i + return -1 + ``` + + In practice, only `warmInit` has this flag, and it's + `unique_per_worker`, so the scan hits one task and checks one + per-worker-done flag. After a worker has run its `warmInit`, the + check short-circuits on every subsequent idle pass. + +4. **`warmInit` task def.** Add `run_when_idle: true` alongside the + existing `on_demand: true` and `unique_per_worker: true`. + +5. **No change to the render claim path.** Render chunks still list + `warmInit` in `perWorkerDeps`. If a render chunk becomes ready + before the idle-speculative path ran (e.g. the worker was busy + with scss the whole time), the existing on-demand claim-release + protocol handles it. The two paths are complementary, not + alternatives. + +**Verification:** `build.bat && check.bat` clean. Timing summary +shows `warmInit` per-lane timings overlapping with the main-thread +spine (starting around t=100--200 ms, while discover/nav/seo are +running), rather than clustering at render-chunk claim time +(t=400+ ms). This is the same overlap the old `pool.warmup()` +achieved, now expressed declaratively. 
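The Phase 9 `findIdleTask` pseudocode translates fairly directly into JavaScript. In the sketch below only `F_RUN_WHEN_IDLE = 16` comes from the plan; the other flag bit, the status values, and `MAX_LANES` are assumed. `createIdleViews` is a hypothetical helper that allocates each view on its own buffer for brevity, whereas the plan places them all in one SAB.

```javascript
// F_RUN_WHEN_IDLE = 16 comes from the plan; the other values are assumed.
const F_UNIQUE_PER_WORKER = 4;
const F_RUN_WHEN_IDLE = 16;
const MAX_LANES = 8;
const NOT_READY = 0;
const CLAIMED = 2;

function findIdleTask(views, myLane) {
  const count = Atomics.load(views.taskCount, 0);
  for (let i = 0; i < count; i++) {
    const flags = Atomics.load(views.flags, i);
    if ((flags & F_RUN_WHEN_IDLE) === 0) continue;
    if (flags & F_UNIQUE_PER_WORKER) {
      // Per-lane slot: only this worker writes it, so no CAS is needed.
      if (Atomics.load(views.perWorkerDone, i * MAX_LANES + myLane) === 0) return i;
    } else if (Atomics.compareExchange(views.status, i, NOT_READY, CLAIMED) === NOT_READY) {
      // Shared task: the CAS admits a single claimant and already
      // excludes tasks that are DONE.
      return i;
    }
  }
  return -1;
}

// Hypothetical helper: allocates just the views findIdleTask reads.
function createIdleViews(taskCount) {
  const mk = (n) => new Int32Array(new SharedArrayBuffer(4 * n));
  const views = {
    taskCount: mk(1),
    flags: mk(taskCount),
    status: mk(taskCount),
    perWorkerDone: mk(taskCount * MAX_LANES),
  };
  Atomics.store(views.taskCount, 0, taskCount);
  return views;
}
```

Once a lane has set its `perWorkerDone` slot, later idle passes on that lane fall through to `-1`, which is the short-circuit the text describes.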
+ +### Phase 10: Explicit render env init (`renderEnvInit`) + +After Phase 9, the first render chunk on each worker pays a hidden +startup cost inside `getOrInitRenderEnv`: unpack ~300 KB shared +payload (JSON.parse), reconstruct three link-table Maps (~857 entries +each), instantiate markdown-it with plugins, build two Sets +(`staticFilesArr`, `sitePathsArr`). This cost is invisible in +timing --- it's buried inside the first render chunk's wall-clock --- +and it front-loads onto one chunk per worker, making that chunk +appear ~10--15 ms slower than the rest. + +This phase extracts the init into an explicit per-worker task, +making the cost visible, moving it off the render hot path, and +eliminating the `while (!_chunkDataSAB)` polling loop from the +render handler. + +#### Design extension: `unique_per_worker` tasks with predecessors + +Phases 5--9 treat `unique_per_worker` tasks as seeds (no `expected` +predecessors). `renderEnvInit` needs `dispatch` to be DONE before it +can run (the sharedSAB doesn't exist until then). This requires +allowing `expected` on `unique_per_worker` tasks. + +The semantics: `expected` predecessors on a `unique_per_worker` task +are **preconditions**, checked as read-only SAB status reads before +the per-worker instance executes. They are NOT tracked via depCount +(the task still doesn't participate in normal dependency counting). +The worker simply verifies each predecessor is DONE: + +``` +// About to run on-demand unique_per_worker dep D: +for each predIdx of taskMeta[D].expectedIdxs: + if Atomics.load(views.status, predIdx) !== DONE: + // Precondition not met; release original task and re-scan. + Atomics.store(views.status, taskIdx, READY) + Atomics.add(views.notify, 0, 1) + Atomics.notify(views.notify, 0, 1) + continue outer loop +``` + +The precondition check runs after the per-worker dep check (nested +deps are checked first). If `renderEnvInit` depends on `warmInit` +via `perWorkerDeps`, the flow for a render chunk is: + +1. 
Worker claims render:i +2. Checks perWorkerDeps: `renderEnvInit[W]` not done +3. About to run renderEnvInit on-demand; checks its perWorkerDeps: + `warmInit[W]` not done +4. Runs warmInit[W] (on-demand, releases render:i first) +5. Re-enters pull loop, claims render:i again +6. Checks perWorkerDeps: `renderEnvInit[W]` not done +7. About to run renderEnvInit; checks its perWorkerDeps: + `warmInit[W]` done +8. Checks renderEnvInit's preconditions: `dispatch` DONE? Yes +9. Runs renderEnvInit[W] (releases render:i first) +10. Re-enters pull loop, claims render:i again +11. Checks perWorkerDeps: `renderEnvInit[W]` done +12. Executes render:i --- env already initialized, no polling + +If Phase 9's `run_when_idle` already ran warmInit during the spine, +steps 3--6 are skipped (warmInit[W] is already done, so the worker +goes from step 2 straight to step 7). + +#### Task definitions + +```js +warmInit: { + expected: [], + on_demand: true, + unique_per_worker: true, + run_when_idle: true, + handler: "warmInit", + submit() {}, +}, + +renderEnvInit: { + expected: ["dispatch"], // precondition: sharedSAB exists + perWorkerDeps: ["warmInit"], // needs Shiki loaded + on_demand: true, + unique_per_worker: true, + handler: "renderEnvInit", + submit() {}, +}, +``` + +Render chunks change from `perWorkerDeps: ["warmInit"]` to +`perWorkerDeps: ["renderEnvInit"]`. This chains the dependency: +render → renderEnvInit → warmInit. + +#### Handler + +The `renderEnvInit` handler does what `getOrInitRenderEnv` does +today, minus the highlighter init (already done by warmInit): + +```js +async renderEnvInit() { + // Wait for renderData message (sharedSAB delivered via postMessage + // after dispatch; may not be processed yet if we were in + // Atomics.wait when it arrived).
+ while (!_sharedSAB) { + await new Promise(resolve => setImmediate(resolve)); + } + + const { siteData, initData, linkTablesData, staticFilesArr, + baseurl, buildInfo, sitePathsArr, + skipOffline } = unpackShared(_sharedSAB); + + const { initHighlighter } = await import("./highlight.mjs"); + const highlighter = await initHighlighter(); // cached; instant after warmInit + const linkTables = reconstructLinkTables(linkTablesData); + const staticFiles = new Set(staticFilesArr); + const markdown = createMarkdownIt({ highlighter, linkTables, baseurl, staticFiles }); + const site = { ...siteData, markdown, buildInfo }; + + let offlineBase = null; + if (!skipOffline) { + offlineBase = { + sitePaths: new Set(sitePathsArr), + baseurl: normalizeBaseurl(baseurl), + }; + } + + _renderEnv = { site, initData, offlineBase }; + return {}; +} +``` + +The render handler simplifies --- no lazy init, no polling: + +```js +async render(taskIdx) { + const workerStart = Date.now(); + + const chunkIndex = taskIdx - idMapping.DYNAMIC_BASE; + const offset = Atomics.load(views.chunkOffset, chunkIndex); + const length = Atomics.load(views.chunkLength, chunkIndex); + const chunk = JSON.parse( + new TextDecoder().decode(new Uint8Array(_chunkDataSAB, offset, length)), + ); + + // _renderEnv guaranteed initialized by renderEnvInit (perWorkerDep). + const env = _renderEnv; + + await renderPhase(chunk, env.site); + await templatePhase(chunk, env.site, env.initData); + // ... offline rewriting ... +} +``` + +`getOrInitRenderEnv` is deleted. + +#### Changes to `findIdleTask` (Phase 9) + +`findIdleTask` must respect preconditions for `run_when_idle` tasks +that have `expected` predecessors. `renderEnvInit` is NOT +`run_when_idle` (no benefit --- there's no idle window between +dispatch and render chunks), so this doesn't apply to it. 
But if a +future `run_when_idle` task has predecessors, `findIdleTask` should +check them: + +``` +function findIdleTask(views, myLane, taskMeta): + count = Atomics.load(views.taskCount, 0) + for i = 0 to count - 1: + if !(Atomics.load(views.flags, i) & F_RUN_WHEN_IDLE): continue + if Atomics.load(views.flags, i) & F_UNIQUE_PER_WORKER: + if Atomics.load(views.perWorkerDone, i * MAX_LANES + myLane) !== 0: + continue // already done on this lane + + // Check preconditions and perWorkerDeps before running + if !preconditionsMet(views, i, taskMeta): continue + if !perWorkerDepsMet(views, i, myLane, taskMeta): continue + + return i + // ... non-unique_per_worker case unchanged ... + return -1 +``` + +For `warmInit` (no predecessors, no perWorkerDeps), these checks +are no-ops. The guard exists for forward-compatibility. + +#### Implementation steps + +1. Extend the on-demand dep execution path in the pull loop: + before executing an on-demand `unique_per_worker` dep, check + its `expected` predecessors against SAB status. If any + predecessor is not DONE, release the original task and + re-scan. + +2. Extend the on-demand dep execution path to recursively check + `perWorkerDeps` on the dep itself. If `renderEnvInit` has + `perWorkerDeps: ["warmInit"]`, and warmInit is not done on + this lane, handle warmInit first (same on-demand protocol). + +3. Add the `renderEnvInit` task definition to `TASKS`. + +4. Add the `renderEnvInit` handler to `cpu-worker.mjs`. + +5. Move the `while (!_sharedSAB)` polling from the render handler + to `renderEnvInit`. + +6. Simplify the render handler: read `_renderEnv` directly. + +7. Delete `getOrInitRenderEnv` and the `_renderSAB` cache-check + variable. + +8. Update render chunk `perWorkerDeps` from `["warmInit"]` to + `["renderEnvInit"]` (in `allocSchedulerSAB`'s taskMeta + construction and in any task def that references it). + +9. 
Update `findIdleTask` to check preconditions and perWorkerDeps + for `run_when_idle` tasks (forward-compatibility). + +**Verification:** `build.bat && check.bat` clean. Timing summary +shows `renderEnvInit:w0`, `renderEnvInit:w1`, ... per-lane entries +(consolidated in the Boot section, ~10--15 ms each) appearing after +dispatch. First render chunk per worker no longer shows an inflated +wall-clock relative to subsequent chunks. Total render wall-clock +is unchanged (the init cost moved, not eliminated). + +### Phase 11: Amortize chunk packing into discover I/O gaps — DECLINED + +**Decision:** precondition is false; pages are mutated between +discover and dispatch. + +The plan assumed page objects are NOT mutated between +`discover.submit()` and `dispatch.submit()`, so JSON serialized +during discover would be identical to what `JSON.stringify` would +produce at pack time. In practice, `nav` mutates every page in +place (adding `navPath`, `breadcrumbs`, `children`, `navLevels`) +and `seo` adds `seoTitle`, `seoFullTitle`, `seoCanonical`, +`seoIsHome`. Both run between discover and dispatch. A debug +assertion (`TBDOCS_DEBUG=1`) confirmed the mismatch immediately +on the first chunk. + +The ~40 ms `packChunkData` cost is real but cannot be amortized +into discover's I/O gaps without either (a) serializing only the +discover-time fields and reconstructing the full page on the worker +(fragile, couples the cache to every future spine mutation), or +(b) re-serializing after the last mutation (which puts the work +back on the critical path and defeats the purpose). Neither is +worth the complexity for a 40 ms saving. + +--- + +*Original plan text retained below for reference.* + +`packChunkData` runs inside `dispatch.submit()`, after the dispatch +timing window closes. It JSON-serializes every chunk array (~858 +pages across ~16 chunks), creating a ~40 ms gap between the dispatch +bar and the first render bar in the Gantt. 
The serialization is +pure synchronous CPU work on the main thread. + +`discover` spends ~200 ms in async disk I/O (`fs.readFile` on every +source file via `Promise.all`). During each I/O wait the main +thread's event loop is idle. This phase pre-serializes page objects +inside discover's `Promise.all` callbacks, overlapping the ~40 ms of +CPU work with the ~200 ms of libuv reads. At dispatch time, +`packChunkData` concatenates pre-serialized strings instead of +re-traversing the page data. + +Independent of Phase 10 --- neither changes code the other touches. + +#### Precondition + +Page objects are NOT mutated between `discover.submit()` and +`dispatch.submit()`. The spine tasks (`nav`, `seo`, `markdownInit`, +`deriveRedirects`, `buildInit`) all read `state.pages` but write +their results to `state.site.*`, not back to individual page +objects. JSON serialized during discover is therefore identical to +what `JSON.stringify` would produce at pack time. + +A debug assertion (step 6) guards this invariant. + +#### Side-table, not page property + +Pre-serialized strings are stored in a `Map` returned +alongside `{ pages, staticFiles }` from `discover()` --- NOT as a +`_json` property on the page object. A property would be included +by `JSON.stringify(page)`, embedding a JSON-escaped copy of the +whole page inside itself and roughly doubling the payload. + +#### Data flow + +1. **`discover()` returns the cache.** After `buildPage()` creates + a page object, `JSON.stringify(page)` produces the pre-serialized + string. Both happen in the same `Promise.all` callback, right + after `await fs.readFile()`: + + ``` + await Promise.all(allFiles.map(async (srcRel) => { + ... + const raw = await fs.readFile(srcPath, "utf8") + const page = buildPage(srcRoot, srcRel, parsed) + jsonCache.set(page, JSON.stringify(page)) + pages.push(page) + })) + return { pages, staticFiles, jsonCache } + ``` + + Each `JSON.stringify` takes ~47 μs (40 ms / 858 pages). 
The + libuv thread pool continues servicing reads while the main thread + does this work. + + Object identity is preserved through `pages.sort()` and + `chunkPages()` --- both reorder or slice the same objects --- so + `jsonCache.get(page)` resolves correctly at pack time. + +2. **`discover.submit()` stores the cache on `state`.** + + ``` + submit(out, state) { + state.pages = out.pages + state.staticFiles = out.staticFiles + state.site.config = out.config + state.jsonCache = out.jsonCache // new + for (const p of out.pages) state.pageByDest.set(p.destPath, p) + } + ``` + +3. **`Scheduler.dispatchRender()` passes the cache to + `packChunkData`.** + + ``` + dispatchRender(chunks, sharedSAB) { + ... + const chunkDataSAB = packChunkData(chunks, this._views, + this.state.jsonCache) + ... + } + ``` + +4. **`packChunkData` concatenates instead of serializing.** + + ``` + export function packChunkData(chunks, views, jsonCache) { + const buffers = jsonCache + ? chunks.map(chunk => + encoder.encode("[" + chunk.map(p => jsonCache.get(p)).join(",") + "]")) + : chunks.map(c => encoder.encode(JSON.stringify(c))) + ... // remainder unchanged: allocate SAB, copy buffers, write offsets + } + ``` + + The fallback path (no cache) is retained so any caller that omits + the argument gets the original behavior. + +#### Why `Promise.all` still works + +The current `discover` fires all reads at once via +`Promise.all(allFiles.map(async ...))`. Adding `JSON.stringify` +inside each callback does not reduce I/O concurrency --- all reads +are already dispatched to libuv before any callback runs. As I/O +completions arrive, the event loop runs each callback synchronously +(parse frontmatter, build page, serialize). Each ~47 μs serialize +is invisible between I/O completions; libuv workers keep reading +disk in the background. + +#### Memory + +The cache holds ~858 JSON strings averaging ~4 KB each --- ~3.4 MB +total. Negligible. 
The cache is dropped when the build state is +GC'd (after each build, or after each rebuild in serve mode). + +#### Implementation steps + +**Files:** `builder/discover.mjs`, `builder/tbdocs.mjs` (the +`discover` task definition and `SharedState`), `builder/scheduler.mjs` +(`Scheduler` class), `builder/sab-scheduler.mjs` (`packChunkData`). + +1. Add a `jsonCache` field to `SharedState` in + `builder/scheduler.mjs` (initialized to `null`). + +2. In `discover()` (`builder/discover.mjs`), create a `new Map()`, + populate it inside the `Promise.all` callback after + `buildPage()`, and include it in the return value. + +3. In the `discover` task definition in `builder/tbdocs.mjs`, update + `execute()` to destructure `jsonCache` from `discover()`'s + return value and pass it through in the output object. Update + `submit()` to store `state.jsonCache = out.jsonCache`. + +4. Add a `jsonCache` parameter to `packChunkData` in + `builder/sab-scheduler.mjs`. When present, use string + concatenation (`"[" + ... + "]"`); otherwise fall back to + `JSON.stringify`. + +5. In `Scheduler.dispatchRender()` (`builder/scheduler.mjs`), pass + `this.state.jsonCache` to `packChunkData`. + +6. Add a debug assertion (gated on `process.env.TBDOCS_DEBUG`) that + compares each concatenated chunk JSON with + `JSON.stringify(chunk)` to catch any unexpected page mutation + between discover and dispatch. + +**Verification:** `build.bat && check.bat` clean. The dispatch-to- +first-render gap in the Gantt shrinks from ~40 ms to <5 ms. +Discover's wall-clock time does not increase meaningfully (~1--3 ms). +Rendered output is byte-identical to the pre-change build. Run once +with `TBDOCS_DEBUG=1` to exercise the assertion. + +### Phase 12: Per-worker page flush + +**Suggested model:** Opus. + +**Motivation.** Today the entire `writePages` pass --- ~1,080 files to +disk --- waits behind `renderJoin`, then runs on the main thread. 
The +per-page I/O is embarrassingly parallel and the data is already in +worker memory after `render`. This phase moves page writes into +workers, overlapping I/O with the render tail and eliminating the +`html` / `offlineHtml` fields from the render delta (the two largest +structured-clone payloads per chunk). + +**Design.** Three changes: + +1. **Page stash.** Each worker keeps a module-scope array + (`_pageStash = []`) initialized empty at startup. The `render` + handler appends `{ destPath, html, offlineHtml }` for each rendered + page to the stash instead of returning those fields in the delta. + The render delta shrinks to `{ destPath, renderedContent, + offlineMisses }`. + +2. **`flushPages` task.** A new `unique_per_worker` + `on_demand` task. + When activated (by `prepPageDirs` completing on main), it becomes + eligible in the idle-task scan. The handler writes every stashed + page to `_site/` and `_site-offline/`, clears the stash, and returns + write stats. A `flushJoin` barrier on main collects all per-worker + flush completions. + +3. **Priority-ordered idle scan.** The current `F_RUN_WHEN_IDLE` + boolean becomes a numeric priority (`idle_priority`). + `findIdleTask` picks the eligible task with the lowest + (= highest-priority) value. Assignment: `warmInit` = 0 (run first + --- Shiki must load before rendering), `flushPages` = 1 (run after + render drains). In practice they never compete (`warmInit` finishes + long before `flushPages` becomes eligible), but the priority makes + the ordering explicit and defensive. + + Implementation: store `idle_priority` in `taskMeta` (the JS-side + per-task metadata already sent to workers at init), not in the SAB + layout. `findIdleTask` keeps the flag-bit scan + (`F_RUN_WHEN_IDLE`) to identify idle-eligible tasks, then among + eligible candidates picks the one with the lowest + `taskMeta[i].idlePriority`. With only 2--3 idle tasks the extra + comparison is negligible. 
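The priority-ordered idle scan in item 3 could look roughly like the sketch below: the flag bit still marks idle-eligible tasks, and `taskMeta[i].idlePriority` breaks ties between eligible candidates. `pickIdleTask` and the `isEligible` callback are hypothetical; the callback stands in for the per-lane done check and any precondition checks. `flags` is read with plain indexing here to keep the example small, where the real scan would use `Atomics.load`.

```javascript
const F_RUN_WHEN_IDLE = 16; // value given in Phase 9

// Returns the eligible idle task with the lowest idlePriority, or -1.
// On equal priorities the lower task index wins (strict `<` comparison).
function pickIdleTask(flags, taskMeta, isEligible) {
  let best = -1;
  for (let i = 0; i < taskMeta.length; i++) {
    if ((flags[i] & F_RUN_WHEN_IDLE) === 0) continue; // not an idle task
    if (!isEligible(i)) continue; // already done on this lane, or blocked
    if (best === -1 || taskMeta[i].idlePriority < taskMeta[best].idlePriority) {
      best = i;
    }
  }
  return best;
}

// Example with the two idle tasks named in the plan.
const exampleMeta = [
  { name: "config" },
  { name: "flushPages", idlePriority: 1 },
  { name: "warmInit", idlePriority: 0 },
];
const exampleFlags = [0, F_RUN_WHEN_IDLE, F_RUN_WHEN_IDLE];
const firstPick = pickIdleTask(exampleFlags, exampleMeta, () => true);
```

With both tasks eligible, `firstPick` is the index of `warmInit`, since its priority value is lower than that of `flushPages`.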
+ +**Edge case: worker with zero render chunks.** Under high worker +counts (or small page sets), one or more workers may never claim a +render task. Their stash stays at the initialized-empty `[]`. +`flushPages` on such a worker writes zero pages and returns +`{ written: 0, offlineWritten: 0 }`. This is safe --- `flushJoin` +counts it as done. Guard: `_pageStash` must be preset to `[]` both +at module scope and in the `msg.init` handler (serve-mode pool reuse +across rebuilds). + +**Graph changes.** `renderJoin` is removed entirely. All downstream +tasks that depended on it switch to `flushJoin`: + +``` +render:i [W] (stashes pages locally; delta carries renderedContent + offlineMisses only) + render:i.submit() merges renderedContent into state.pages on main + +prepPageDirs [M] → activates flushPages ON_DEMAND slots + +flushPages [W, unique_per_worker, on_demand, idle_priority: 1] + writes stashed html → _site/ + writes stashed offlineHtml → _site-offline/ + → flushJoin [M] + +flushJoin + scssJoin + mermaid + highlighterInit → writeAssets [M] + (copyTheme + copyStaticFiles + writeGeneratedAssets --- no page writes) + +flushJoin + prepDest → searchData [M] + +flushJoin + searchData + deriveRedirects + deriveSitemap → writeAux [M] + +writeAux + writeAssets → writeOffline [M] + (offline theme / static / aux only --- page HTML already on disk from flush) + +flushJoin + mermaid → writePdf [M] + (reads renderedContent from state.pages --- not html) +``` + +**Why `flushJoin` subsumes `renderJoin`.** Messages from a single +worker to main are FIFO. Each worker posts all `render:i` completion +messages (triggering `render:i.submit()` delta merges on main) before +posting its `flushPages` completion. By the time main processes the +last worker's flush-done --- which is when `flushJoin` fires --- every +`render:i.submit()` has already executed. 
So `flushJoin` implies all +render deltas are merged, and `searchData` / `writePdf` can safely +read `renderedContent` from `state.pages`. + +**Implementation details.** + +- **Stash initialization.** `let _pageStash = []` at module scope in + `cpu-worker.mjs`. Reset to `[]` in the `msg.init` handler (for + serve-mode pool reuse across rebuilds). + +- **Render handler change.** After `renderPhase` + `templatePhase` + + offline derivation, the handler pushes `{ destPath, html, + offlineHtml }` onto `_pageStash` for each writable page + (`html !== undefined`). The return value drops `html` and + `offlineHtml`: + ```js + return { + workerStart, workerEnd, + pages: chunk.map(p => ({ + destPath: p.destPath, + renderedContent: p.renderedContent, + offlineMisses: p.offlineMisses, + })), + }; + ``` + +- **`flushPages` handler.** Reads `ctx.destRoot` (already available on + the worker via the init message). Writes each stashed page to + `path.join(destRoot, p.destPath)` and, when `offlineHtml` is + defined, to `path.join(destRoot + '-offline', p.destPath)`. Skips + actual writes when `ctx.opts.dryRun` is true. Returns + `{ written, offlineWritten, offlineMisses }` (`offlineMisses` is + the sum of per-page `offlineMisses` counts from the stash). + + **Stats delivery.** The `perWorkerTiming` message format gains an + optional `output` field. The `flushPages` completion path in + `cpu-worker.mjs` sets it to the handler's return value so the stats + reach main alongside the timing. Existing per-worker tasks + (`warmInit`, `renderEnvInit`) omit the field; the main-thread + handler ignores it when absent. + +- **`flushPages` task definition.** + ```js + flushPages: { + expected: ["prepPageDirs"], + on_demand: true, + unique_per_worker: true, + run_when_idle: true, + idle_priority: 1, + handler: "flushPages", + submit() {}, + }, + ``` +- **`flushJoin` task definition and activation.** `flushJoin` is + `on_demand` + `runOnMain`. 
It does not participate in the normal + SAB successor system (because `flushPages` is `unique_per_worker`, + which uses `perWorkerDone` + `perWorkerTiming` --- not the regular + task-completion path that decrements successor dep counts). + + Instead, **counter-based activation in `_onPerWorkerTiming`:** + the scheduler keeps a `_flushCount` counter (initialized to 0) and + a `_flushStats` accumulator. When `_onPerWorkerTiming` receives a + message with `taskName === "flushPages"`, it increments the counter + and folds the message's `output` into the accumulator (summing + `written`, `offlineWritten`, `offlineMisses`). When the counter + reaches `workerCount`, it: + + 1. Stores the aggregated stats on `this.results` under + `"flushPages"` so downstream tasks can read them. + 2. Calls `addDynamicTasks(1)` (so `_remaining` includes + `flushJoin`). + 3. Sets `flushJoin`'s SAB status to `READY`. + 4. Calls `_scheduleMainScan()`. + + `flushJoin` then runs as a no-op barrier; its `submit()` is empty + (downstream tasks declare `"flushJoin"` in their `expected` arrays + and the scheduler's `_assembleInputs` resolves it from + `this.results`). + + ```js + flushJoin: { + expected: [], + on_demand: true, + runOnMain: true, + execute() { return {}; }, + submit() {}, + }, + ``` + + Reset `_flushCount` and `_flushStats` in the constructor (and on + each rebuild in serve mode). + +- **`render:i.submit()` change.** Drops the `html` and `offlineHtml` + assignments from the delta merge. Only merges `renderedContent` and + `offlineMisses`. + +- **`prepPageDirs` extension.** Currently creates directories under + `destRoot` only. Extended to also create the corresponding + directories under `destRoot + '-offline'` so `flushPages` workers + can write without per-file mkdir. + +- **`write` → `writeAssets`.** The current `write` task is renamed. 
+ Its `expected` changes from + `["renderJoin", "scssJoin", "mermaid", "prepPageDirs", + "highlighterInit"]` to + `["flushJoin", "scssJoin", "mermaid", "prepPageDirs", + "highlighterInit"]`. It no longer calls `writePages` --- only + `copyTheme`, `copyStaticFiles`, and `writeGeneratedAssets`. + +- **`writeOffline` change.** The `writeOfflinePages` call is removed + from `writeOffline`'s `Promise.all` orchestration. With + `offlineHtml` no longer merged back into `state.pages` (it stays on + the worker), the `precomputed` branch at `offline.mjs:235` would + filter to zero pages --- the offline page HTML is already on disk + from the flush. `writeOffline` keeps: JS patches + (`just-the-docs.js`), `search-data.js` wrapper, redirect-stub + rewrites, theme-asset copy, static-file copy. + + The `deps.counters.html` and `deps.counters.unresolved` tallies + that `writeOfflinePages` used to maintain move to `flushPages` + return stats, aggregated on main via `flushJoin`. + + `writeOffline` gains a direct dependency on `writeAssets` (in + addition to `writeAux`): it reads `_site/assets/js/just-the-docs.js` + to produce the patched offline copy, and walks `_site/assets/` to + mirror theme files with CSS URL rewrites. Today this is covered + transitively through `writeAux → write`; with `write` split into + `flushPages` + `writeAssets`, the edge must be explicit. + +- **`searchData`, `writePdf` dependency change.** Both switch from + `renderJoin` to `flushJoin` in their `expected` arrays. No other + changes --- `searchData` reads `renderedContent` from + `state.pages`; `writePdf` reads `renderedContent` via + `bookData._chapters` refs. Neither reads `html`. + +- **`GANTT_SECTION` map** (`tbdocs.mjs`). Remove `write`, add + `writeAssets: "Write"`, `flushJoin: "Write"`. 
Per-worker + `flushPages` timings are recorded by `_onPerWorkerTiming` under + `"flushPages:wN"` with `ganttSection: "Write"` (update the + hard-coded `"Boot"` section in `_onPerWorkerTiming` to read from + taskMeta, or special-case `flushPages`). + +- **Summary output** (`tbdocs.mjs`). The build summary currently + reports write stats from the `write` task result. With the split, + page-write stats come from the aggregated `flushPages` result + (stored under `"flushPages"` in `this.results` by the counter + activation), and asset-write stats come from `writeAssets`. + +**Files touched:** + +| File | Changes | +|---|---| +| `cpu-worker.mjs` | `_pageStash` module var + reset in `msg.init`; render handler: push to stash, drop `html`/`offlineHtml` from return; new `flushPages` handler; `perWorkerTiming` message gains `output` field for flush stats | +| `tbdocs.mjs` | New `flushPages` + `flushJoin` task defs; rename `write` → `writeAssets` and strip `writePages` call; update `expected` arrays (`searchData`, `writePdf`, `writeOffline`); `GANTT_SECTION` map; summary output | +| `scheduler.mjs` | `_flushCount` / `_flushStats` counter; `_onPerWorkerTiming` extension for flush activation + stats aggregation; Gantt section for per-worker flush timings | +| `sab-scheduler.mjs` | `idlePriority` in `taskMeta` wire-up (minor --- already passed to workers, just needs to be populated from the task def); `flushJoin` SAB slot allocation | +| `write.mjs` | `preparePageDirs` extended to create dirs under `destRoot + '-offline'` | +| `offline.mjs` | Remove `writeOfflinePages` call from `writeOffline`'s `Promise.all`; adjust counter reporting | + +**Expected savings.** + +Three sources: +1. **Wall-clock overlap.** Pages start hitting disk as soon as a worker + exhausts its render tasks, instead of waiting for `renderJoin` + + main-thread `writePages`. The overlap between the render tail and + the first flush is the direct win. +2. 
**Reduced structured-clone cost.** `html` (~5--15 KB per page) and + `offlineHtml` (similar size) no longer cross the worker boundary. + On ~1,080 pages across ~16 chunks, this removes roughly 11--32 MB + (1,080 pages × 2 fields × 5--15 KB) from the return-path clone volume. +3. **Decoupled `writeAssets`.** Theme and static-file copies no longer + share a task with the page writes. They start as soon as `flushJoin` + their + seed deps are ready. + +Conservative estimate: 50--100 ms wall-clock on a 16-core machine. +The main value is architectural --- the write pipeline is no longer a +main-thread bottleneck gated on the render barrier. + +**Verification.** `build.bat && check.bat` clean. The timing summary +should show: +- Per-worker `flushPages` timings appearing after the last `render:i` + per worker, with earlier workers' flushes overlapping later workers' + render tails. +- `writeAssets` replacing `write` in the Write section, with a shorter + duration (no `writePages`). +- `writeOffline` duration dropping (no per-page HTML writing). +- `renderJoin` absent from the summary. + +### Phase 13: Uniform task timing (t0 / t1 / t3) --- DONE + +**Suggested model:** Opus. + +**Motivation.** The Gantt chart shows a gap between `dispatch` ending +and the first `renderEnvInit` starting. Investigation reveals the +gap is real but uncharted: `dispatch.submit()` runs *after* the +execute timing window closes (t1), and it does substantial work --- +`packChunkData`, `broadcastRenderData`, `activateRenderTasks` --- that +is invisible in the timeline. The same blind spot exists for every +main-thread task: `submit()` is never timed. + +A secondary problem: every worker handler (`scssLight`, `scssDark`, +`mermaid`, `buildInfo`, `render`) redundantly captures its own +`workerStart` / `workerEnd` timestamps, near-identical to the pull +loop's `start` / `end` that already wrap the same call.
The timing +should live in one place --- the runner (pull loop on the worker side, +`_executeMainTask` / `_onWorkerDone` on main) --- with handlers +unaware of timing. + +**Design.** Two boundary timestamps per task, with a third on +main-thread tasks only: + +| Timestamp | Main-thread tasks | Worker tasks | +|-----------|-------------------|--------------| +| t0 | before `execute()` | before `handler()` | +| t1 | after `execute()` | after `handler()` | +| *(t2)* | *(reserved, unused)* | *(reserved, unused)* | +| t3 | after `submit()` | *(not captured --- see below)* | + +t2 is reserved for a future split (e.g. timing `results.set()` +separately) but not captured now. + +**Why t3 is main-thread only.** For main-thread tasks, `submit()` +runs between t1 and `sabOnTaskDone` --- it gates successor activation, +so its cost is on the critical path. For worker tasks, the worker +itself calls `onTaskDone` (SAB update) *before* posting the result +message; `submit()` runs later on the main thread when +`_onWorkerDone` processes the message, off the critical path. +Worker-side `postMessage` cost (structured-clone serialization) +cannot be included in the message it is measuring; the gap between +t1 and the main thread's receipt time serves as a proxy if needed. + +The `workerStart` / `workerEnd` fields on handler return values are +removed. Handlers no longer capture timing; the runner does it +uniformly. + +#### Changes to `scheduler.mjs` + +1. **`_executeMainTask`.** Move the timing-entry construction below + `def.submit()` and capture t3 after it. Currently the timing + object is built between t1 and `results.set`; it must move so that + t3 is available. 
Preserve the existing `consolidate` / + `ganttSection` / `lane` properties on the timing entry: + + ```js + const t0 = Date.now(); + output = await def.execute(inputs, this._ctx, this.state); + const t1 = Date.now(); + + this.results.set(name, output); + def.submit(output, this.state, this); + const t3 = Date.now(); + + const timing = { start: t0, end: t1, t3 }; + // keep consolidate, ganttSection, lane as before + if (def.consolidate) timing.consolidate = true; + if (def.ganttSection) timing.ganttSection = def.ganttSection; + this.timings.set(name, timing); + ``` + + The existing `start` / `end` semantics are preserved (t0 / t1) for + backwards compatibility with the summary and Gantt. `t3` is a new + optional field. + +2. **`_onWorkerDone`.** Drop the `output.workerStart` / + `output.workerEnd` extraction. Populate `workerStart` / + `workerEnd` from the timing message (t0 / t1): + + ```js + const t = { start: timing.start, end: timing.end }; + if (lane != null) { + t.workerStart = timing.start; + t.workerEnd = timing.end; + t.lane = lane; + } + ``` + + No t3 --- worker `submit()` is off the critical path. + +3. **`_onPerWorkerTiming`.** Per-worker tasks (`warmInit`, + `renderEnvInit`, `flush`) arrive via this path. The runner + sends `{ start, end }` (t0 / t1). No change to the timing + fields stored --- `workerStart` / `workerEnd` are already + populated from `timing.start` / `timing.end`: + + ```js + _onPerWorkerTiming({ taskName, timing, lane, output }) { + this.timings.set(`${taskName}:w${lane}`, { + start: timing.start, end: timing.end, + workerStart: timing.start, workerEnd: timing.end, + lane, + consolidate: true, + ganttSection: this._ganttSections[taskName] ?? "Boot", + }); + ``` + + This is unchanged from the current code. + +4. **Drop `workerStart` / `workerEnd` extraction from output.** The + lines in `_executeMainTask` and `_onWorkerDone` that read + `output.workerStart` / `output.workerEnd` are deleted. 
The runner + provides these timestamps; handlers no longer carry them. + +#### Changes to `cpu-worker.mjs` + +1. **Pull loop --- regular task path** (~line 427). Capture t0 / t1 + as explicit variables. The timing object sent to main carries + `{ start, end }` = t0 / t1: + + ```js + const t0 = Date.now(); + result = await handler(taskIdx); + const t1 = Date.now(); + + parentPort.postMessage({ + done: taskIdx, + output: result, + timing: { start: t0, end: t1 }, + lane: myLane, + }); + ``` + + This is the same data the current code sends (it evaluates + `Date.now()` inside the postMessage args); the only change is + naming the variable `t1` before the call instead of inlining it. + +2. **Pull loop --- per-worker dep and idle paths.** Three distinct + `perWorkerTiming` send sites need the same t0 / t1 treatment: + + a. **Idle-task completion** (~line 281). Currently: + `timing: { start: idleStart, end: Date.now() }`. + Change to capture t1 before the postMessage: + ```js + const t0 = Date.now(); + idleResult = await handlers[idleMeta.handler](); + const t1 = Date.now(); + Atomics.store(views.perWorkerDone, idleTask * MAX_LANES + myLane, 1); + parentPort.postMessage({ + perWorkerTiming: true, + taskName: idleMeta.name, + timing: { start: t0, end: t1 }, + lane: myLane, + output: idleResult, + }); + ``` + + b. **Nested per-worker dep completion** (~line 347). Currently: + `timing: { start: nestedStart, end: Date.now() }`. + Same pattern --- capture t1, send `{ start: t0, end: t1 }`. + + c. **Direct per-worker dep completion** (~line 393). Currently: + `timing: { start: depStart, end: Date.now() }`. + Same pattern. + +3. **Remove `workerStart` / `workerEnd` from handlers.** Delete the + `workerStart = Date.now()` / `workerEnd: Date.now()` boilerplate + from: `scssLight`, `scssDark`, `mermaid`, `buildInfo`, `render`. + Each handler returns only its domain data (e.g. `{ scssLightResult }`, + `{ buildInfo }`, `{ pages: [...] }`). + +#### Changes to `gantt.mjs` + +1. 
**Worker lane bars.** Currently uses `t.workerStart` / + `t.workerEnd`. After Phase 13, these are populated from the + runner's t0 / t1 in `_onWorkerDone` and `_onPerWorkerTiming` + (see scheduler changes above), so the Gantt renderer needs no + change for basic rendering. + +2. **Submit / dispatch overlay.** When `t3` is present on a timing + entry, render a half-height rect from `end` to `t3`, aligned to + the bottom of the bar, using the same fill class as the main bar. + This makes the `dispatch.submit()` cost visible in the Gantt --- + the gap that motivated this phase. Only main-thread tasks carry + `t3`, so worker lane bars are unaffected. + +#### Changes to `groupGanttTimings` (`tbdocs.mjs`) + +Pass through the `t3` field when present: + +```js +const entry = { id, start: start - t0, end: end - t0 }; +if (t3 != null) entry.t3 = t3 - t0; +``` + +The destructuring on line ~662 gains `t3`. + +#### Implementation steps + +1. Remove `workerStart` / `workerEnd` from the five worker handlers + (`scssLight`, `scssDark`, `mermaid`, `buildInfo`, `render`). + +2. Update the pull loop's regular-task path (~line 427) to capture + t0 / t1 as named variables. Send `{ start: t0, end: t1 }` in + the timing object. + +3. Update all three per-worker timing send sites in the pull loop: + idle-task completion (~line 281), nested per-worker dep completion + (~line 347), direct per-worker dep completion (~line 393). Each + gets the same t0 / t1 pattern. + +4. In `_executeMainTask`, capture t3 after `submit()`. Store it on + the timing entry. + +5. In `_onWorkerDone`, stop reading `output.workerStart` / + `output.workerEnd`. Populate `workerStart` / `workerEnd` from + the timing message's `start` / `end`. + +6. In `groupGanttTimings`, pass through t3. + +7. In `gantt.mjs`, render the `end`--`t3` overlay rect on + main-thread task bars when `t3` is present. 
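The `t3` pass-through and the half-height overlay can be sketched together. This is an illustration only: `rebaseTiming`, `submitOverlay`, and the `layout` parameters (`pxPerMs`, `barY`, `barHeight`) are hypothetical names, and the real `gantt.mjs` geometry may differ.

```javascript
// Rebase an absolute timing entry onto the build origin, passing t3 through
// only when the runner captured it (main-thread tasks).
function rebaseTiming(id, { start, end, t3 }, origin) {
  const entry = { id, start: start - origin, end: end - origin };
  if (t3 != null) entry.t3 = t3 - origin;
  return entry;
}

// Geometry of the submit overlay: a half-height rect from `end` to `t3`,
// aligned to the bottom of the task bar. Returns null when there is no
// submit tail to draw (worker tasks, or a zero-length submit).
function submitOverlay(entry, { pxPerMs, barY, barHeight }) {
  if (entry.t3 == null || entry.t3 <= entry.end) return null;
  return {
    x: entry.end * pxPerMs,
    y: barY + barHeight / 2,
    width: (entry.t3 - entry.end) * pxPerMs,
    height: barHeight / 2,
  };
}

// Hypothetical numbers: dispatch executes for 40 ms, then submit() runs for 60 ms.
const layout = { pxPerMs: 0.5, barY: 20, barHeight: 12 };
const dispatchEntry = rebaseTiming("dispatch", { start: 1000, end: 1040, t3: 1100 }, 1000);
const renderEntry = rebaseTiming("render:0", { start: 1100, end: 1160 }, 1000);

const dispatchOverlay = submitOverlay(dispatchEntry, layout); // 30 px wide, 6 px tall
const renderOverlay = submitOverlay(renderEntry, layout);     // null: worker tasks carry no t3
```

Keeping the overlay a pure function of the rebased entry means worker lane bars need no special casing: an entry without `t3` simply produces no rect.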
+ +**Files touched:** + +| File | Changes | +|---|---| +| `cpu-worker.mjs` | Remove `workerStart` / `workerEnd` from 5 handlers; name t0 / t1 in pull loop regular-task path; same pattern in all 3 per-worker timing send sites | +| `scheduler.mjs` | `_executeMainTask`: capture t3 after submit; `_onWorkerDone`: drop `output.workerStart` extraction, populate from timing message; `_onPerWorkerTiming`: no change | +| `tbdocs.mjs` | `groupGanttTimings`: pass through t3 | +| `gantt.mjs` | Render end--t3 overlay rect on main-thread task bars | + +**Verification.** `build.bat && check.bat` clean. The timing summary +is unchanged (it reads `start` / `end`, which remain t0 / t1). The +Gantt chart shows a visible submit-phase tail on main-thread task +bars --- most notably on `dispatch`, where the end--t3 overlay accounts +for the previously invisible gap before `renderEnvInit`. + +## Notify protocol + +Workers sleep on a single generation-counter slot (`views.notify`) +rather than per-task status slots. The protocol: + +``` +// Worker (after scanAndClaim returns -1): +gen = Atomics.load(views.notify, 0) +taskIdx = scanAndClaim(views, myLane) // double-check: race window +if taskIdx === -1: + Atomics.wait(views.notify, 0, gen, 50) // sleep until gen changes, 50ms fallback + +// Any thread making N worker tasks READY (in onTaskDone): +Atomics.add(views.notify, 0, 1) // bump generation +Atomics.notify(views.notify, 0, readyCount) // wake up to readyCount sleeping workers + +// dispatch seeding N render chunks: +Atomics.add(views.notify, 0, 1) +Atomics.notify(views.notify, 0, Infinity) // wake all workers +``` + +The double-check in the worker's sleep path prevents a race: a task +could become READY between the failed scan and the `Atomics.load` of +the generation counter. Without the double-check, the worker would +load the already-bumped `gen`, see no further change in `Atomics.wait`, +and sleep through a notification that had already fired. + +The 50 ms timeout is a safety net for edge cases where a notification +is lost (e.g.
the bump and notify happen between the worker's +`Atomics.load` and `Atomics.wait`, and no other notification follows). +50 ms is long enough to avoid busy-spinning but short enough to not +stall a build. + +**Exception: on-demand main-thread deps.** When a worker is waiting +for a specific on-demand main-thread task to complete, it waits on +that task's **status slot** (`Atomics.wait(views.status, depIdx, +READY)`) --- not the generation counter. This is targeted: the worker +knows exactly what it's waiting for, and wakes as soon as the main +thread sets the task to DONE and notifies the slot. Before waiting, +the worker checks if other non-dependent work is available (one scan); +if so, it does that work instead of sleeping. + +### Phase 14: Move `writePdf` to a worker --- DEFERRED + +**Motivation.** `writePdf` depends on `flushJoin` + `mermaid` + +`resolveBookChapters` --- no data dependency on the offline pipeline +(`searchData` → `writeAux` → `writeOffline`). But both `writePdf` and +the offline pipeline are `runOnMain`, so they serialize on the main +thread. On a machine where `flushJoin` lands at ~1.2 s, the ~150 ms +`writePdf` cost is 12 % of the build. + +**Investigation.** Splitting `writePdf` into two main-thread tasks +(`assemblePdf` + `writePdfFiles`) to measure the compute-vs-I/O +breakdown showed: + +``` +assemblePdf=160ms writePdfFiles=30ms +``` + +The compute half (`assembleBook`: chapter walking, body transforms, +href rewriting, html-compress) is ~84 % of the cost. The file-write +half (one `book.html` + 2 CSS files + ~100 images) is only ~30 ms. + +**Consequence.** Moving just the file writes to a worker saves ~30 ms +--- not enough to justify the SAB broadcast plumbing. The real win +requires moving `assembleBook` itself off main. Two blockers prevent +that: + +1. **Live page-object references.** `resolveBookChapters` stores page + objects in `bookData._chapters[]`, `_foreword`, `_landing`. 
These + are identity-linked to `state.pages` entries where `renderedContent` + was merged after render. Structured clone to a worker breaks the + identity link. + +2. **`site.markdown` dependency.** `assembleBook` → + `renderPartDivider` calls `site.markdown.render()` for part + subtitles and intros. The markdown-it instance is not serializable. + +**Paths forward (not yet committed to):** + +- **Index-based chapter references.** `resolveBookChapters` stores + permalink strings instead of page objects; `assembleBook` builds a + `Map` at the start and resolves refs through it. + Removes blocker 1. + +- **Pre-render book text.** Pre-render subtitles/intros during + `resolveBookChapters` (which runs after `markdownInit`), storing the + HTML on `bookData` entries. `renderPartDivider` reads the + pre-rendered strings instead of calling `site.markdown`. Removes + blocker 2. + +- **Full worker migration.** With both blockers removed, the entire + `writePdf` (compute + I/O) can run on a worker via SAB broadcast of + a page projection (~10 MB: all pages' `permalink`, `navPath`, + `renderedContent`, `frontmatter` subset). Packing cost ~30--50 ms; + net main-thread savings ~100--120 ms. + +Deferred: the refactoring cost is significant for a ~120 ms saving on +a 4 s build. Revisit if the build wall-clock shrinks enough that the +PDF task becomes a larger fraction. + +### Phase 15: Generic dynamic tasks and per-chunk flush --- DONE + +**Suggested model:** Opus. + +**Outcome.** Landed as designed. `build.bat` runs all worker lanes in +parallel through both render and flush. `check.bat` reports zero +intra-site issues (only the 8 pre-existing PDF broken links remain). +Two divergences from the design surfaced during implementation; both +are folded into the description below. + +1. **`flush:i` is gated on `prepPageDirs` as well as `render:i`** (so + `setDepCount(views, flushBase + i, 2)`, not 1). 
The design's + `depCount = 1` trusted that `prepPageDirs` would always finish + before any render chunk did. On Windows, `mkdir` over ~100 nested + subdirectories takes longer than the first render chunk and the + first `flush:i` `ENOENT`s on a missing output directory. A new + `appendDynamicSuccessors(views, edges)` primitive in + `sab-scheduler.mjs` extends `prepPageDirs`'s successor list with + `flush:0..N-1` without overwriting its static `writeAssets` edge + (it relocates the existing successors to the end of `succList`, + appends the new ones, and updates `succOffset` / `succCount`; the + old slots become dead space, ~4 bytes each). + +2. **`affinityLane`, `pinnedTo`, and `completedOnLane` are pre-filled + to `-1` for the whole array, not just static slots.** + `SharedArrayBuffer` is zero-initialized, so a dynamic slot's + `affinityLane[idx] === 0` made `scanAndClaim`'s + `aff !== -1 && aff !== myLane` filter treat every dynamic task as + pinned to lane 0. Only `w0` ever claimed work; the other 15 + workers sat idle through the entire render fan-out (build still + "succeeded" because `w0` ran all 160 chunks sequentially --- ~4 s + instead of ~250 ms on the render bar). The design only specified + pre-filling the metadata arrays (`handlerIdx`, `perWorkerDep`, + `expectedDep`); the three identity-coordinates need the same + treatment. + +**Motivation.** Three coupled problems: + +1. **Flush clustering.** All `render:i` tasks become READY + simultaneously after `dispatch`. Workers pull them greedily via + `scanAndClaim`, which claims the first READY task it finds. The + per-worker `flush` task is `run_when_idle` --- it only fires when + `scanAndClaim` returns -1, i.e. every render task is already + CLAIMED or DONE. With `SLICES_PER_WORKER = 10` and 8--16 + workers, that means 80--160 render tasks drain before any worker + goes idle. 
All workers finish within moments of each other, so + all flushes cluster at the tail --- the I/O burst that Phase 12 + was meant to spread. + +2. **Special-cased join barriers.** `renderJoin` and `flushJoin` use + hand-rolled counters in `_onWorkerDone` and `_onPerWorkerTiming` + with manual `Atomics.store(status, joinIdx, READY)`. Each new + fan-out pattern requires a new counter, a new name-matching branch, + and a new stats accumulator. The SAB already has a general + dep-count mechanism (`onTaskDone` decrements successor dep counts + and sets READY at zero); the joins should use it. + +3. **Render-specific infrastructure.** Three layers of the scheduler + know about `render` by name: + + - `allocSchedulerSAB` pre-reserves named `render:${i}` slots and + hardcodes their `taskMeta` (handler `"render"`, `perWorkerDeps` + `[renderEnvInitIdx]`). + - `scheduler.mjs` name-matches `startsWith("render:")` in + `_onWorkerDone` and `taskName === "flush"` in + `_onPerWorkerTiming`. + - `cpu-worker.mjs` computes `taskIdx - idMapping.DYNAMIC_BASE` to + index into render-specific `chunkOffset` / `chunkLength` SAB + arrays. + + The scheduler should know about *tasks* (static or dynamic) and + their metadata (handler, deps, successors, priority) --- not about + what any specific task does. + +This phase solves all three by introducing a generic dynamic task pool, +SAB-based task metadata, a generic payload mechanism, and per-task +priority --- and uses them to replace the current `render:i` / +`flush` / `renderJoin` / `flushJoin` infrastructure with per-chunk +`flush:i` tasks and SAB dep-count-gated barriers. + +#### Design overview + +Four pillars: + +1. **SAB-based task metadata.** The `taskMeta` JS array (sent once + to workers via `workerData`, frozen at worker creation) is replaced + by SAB arrays that the main thread can write at any time and + workers read atomically at claim time. 
A pair of functions --- + `writeTaskMeta` / `readTaskMeta` --- encapsulates the layout so + call sites never touch raw offsets. `taskMeta` is deleted. + +2. **Generic dynamic pool.** `allocSchedulerSAB` no longer + pre-reserves named slots for any specific task type. Static tasks + get indices `0..S-1`. Slots `S..MAX_TASKS-1` are a blank pool. + Any task's `submit()` can allocate slots from it at runtime via + `allocDynamicSlots`. + +3. **Generic payload.** The render-specific `chunkOffset` / + `chunkLength` SAB arrays are replaced by `payloadOffset[MAX_TASKS]` + / `payloadLength[MAX_TASKS]`, indexed by `taskIdx` directly. A + `packPayloads` function packs data into a `SharedArrayBuffer` and + writes the per-task offsets. Any dynamic task handler can read its + payload; handlers with no payload (like `flush`) ignore the + zero-length entry. + +4. **Per-task priority + per-chunk flush.** An Int32 `priority` field + in the SAB makes `scanAndClaim` prefer higher-priority tasks. + The single `unique_per_worker` `flush` task is replaced by N + dynamic `flush:i` tasks, each pinned to its predecessor `render:i` + and assigned `priority: 1` (above render's default 0). Both + `renderJoin` and `flushJoin` become normal dep-count-gated + barriers --- no counters, no name-matching. 
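The fourth pillar, a priority-aware `scanAndClaim`, might look like the sketch below. It is a simplified illustration under stated assumptions: the status codes are hypothetical, the affinity and `F_RUN_ON_MAIN` filters are omitted, and the four-task setup exists only for the example.

```javascript
// Hypothetical status codes; the real values live in sab-scheduler.mjs.
const NOT_READY = 0, READY = 1, CLAIMED = 2;

// Scan all slots, prefer the READY task with the highest priority, and claim
// it with a compare-and-swap so two workers can never take the same task.
// Returns the claimed index, or -1 when nothing is READY.
function scanAndClaim(views, taskCount) {
  for (;;) {
    let best = -1;
    let bestPriority = -Infinity;
    for (let i = 0; i < taskCount; i++) {
      if (Atomics.load(views.status, i) !== READY) continue;
      const p = Atomics.load(views.priority, i);
      if (p > bestPriority) {
        best = i;
        bestPriority = p;
      }
    }
    if (best === -1) return -1;
    // Another worker may have claimed `best` since the scan; rescan if so.
    if (Atomics.compareExchange(views.status, best, READY, CLAIMED) === READY) return best;
  }
}

// Example pool: 0 = render:0, 1 = render:1, 2 = flush:1 (still gated), 3 = flush:0.
const TASKS = 4;
const views = {
  status: new Int32Array(new SharedArrayBuffer(TASKS * 4)),
  priority: new Int32Array(new SharedArrayBuffer(TASKS * 4)),
};
const initialStatus = [READY, READY, NOT_READY, READY];
const initialPriority = [0, 0, 1, 1];
for (let i = 0; i < TASKS; i++) {
  Atomics.store(views.status, i, initialStatus[i]);
  Atomics.store(views.priority, i, initialPriority[i]);
}

const claimOrder = [];
for (let idx = scanAndClaim(views, TASKS); idx !== -1; idx = scanAndClaim(views, TASKS)) {
  claimOrder.push(idx);
}
```

In this setup the ready flush (slot 3) is claimed ahead of the two pending renders, while the gated flush (slot 2) is never touched. That is the behaviour that spreads flush I/O across the render phase instead of clustering it at the tail.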
+ +#### SAB layout changes + +**Bump constants.** + +| Constant | Old | New | Reason | +|---|---|---|---| +| `MAX_TASKS` | 256 | 512 | 2N dynamic tasks (render + flush) at 16 cores = 320; need headroom | +| `MAX_EDGES` | 512 | 2048 | 3N dynamic edges + ~37 static; worst case (64 lanes × 10 slices) = 1957 | +| `MAX_RENDER_CHUNKS` | 640 | *(deleted)* | Replaced by `payloadOffset` / `payloadLength` sized to `MAX_TASKS` | +| `SLICES_PER_WORKER` | 10 | 10 | Unchanged | + +**New arrays** (all Int32, alongside existing arrays): + +``` +handlerIdx [MAX_TASKS] // handler function ID; -1 = unassigned +perWorkerDep [MAX_TASKS * 2] // up to 2 per-worker dep indices; -1 = none +expectedDep [MAX_TASKS * 2] // up to 2 precondition pred indices; -1 = none +idlePriority [MAX_TASKS] // idle-task ordering; 0 = default +priority [MAX_TASKS] // scanAndClaim ordering; 0 = default, higher = first +payloadOffset [MAX_TASKS] // byte offset into payloadSAB for this task's data +payloadLength [MAX_TASKS] // byte length of this task's data; 0 = no payload +``` + +**Removed arrays:** `chunkOffset`, `chunkLength`. + +**Total SAB size.** With MAX_TASKS = 512, MAX_EDGES = 2048, +MAX_LANES = 64: + +``` +existing: taskCount(1) + depCount(512) + status(512) + flags(512) + + succOffset(512) + succCount(512) + succList(2048) + + affinityLane(512) + pinnedTo(512) + completedOnLane(512) + + perWorkerDone(512 × 64 = 32768) + edgeCount(1) + + notify(1) + firstReady(1) + buildDone(1) +new: handlerIdx(512) + perWorkerDep(1024) + expectedDep(1024) + + idlePriority(512) + priority(512) + + payloadOffset(512) + payloadLength(512) +total: ~42,000 Int32 slots × 4 = ~164 KB +``` + +Roughly double the current ~80 KB. Negligible for a build tool. 
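The slot count can be cross-checked by summing the array sizes listed above. The sketch below only restates that list as arithmetic; the grouping into `existing` and `added` objects is for illustration.

```javascript
const MAX_TASKS = 512, MAX_EDGES = 2048, MAX_LANES = 64;

// Int32 slot counts for the arrays that already exist.
const existing = {
  taskCount: 1,
  depCount: MAX_TASKS, status: MAX_TASKS, flags: MAX_TASKS,
  succOffset: MAX_TASKS, succCount: MAX_TASKS, succList: MAX_EDGES,
  affinityLane: MAX_TASKS, pinnedTo: MAX_TASKS, completedOnLane: MAX_TASKS,
  perWorkerDone: MAX_TASKS * MAX_LANES,
  edgeCount: 1, notify: 1, firstReady: 1, buildDone: 1,
};

// Int32 slot counts for the arrays this phase adds.
const added = {
  handlerIdx: MAX_TASKS,
  perWorkerDep: MAX_TASKS * 2, expectedDep: MAX_TASKS * 2,
  idlePriority: MAX_TASKS, priority: MAX_TASKS,
  payloadOffset: MAX_TASKS, payloadLength: MAX_TASKS,
};

const sum = obj => Object.values(obj).reduce((a, b) => a + b, 0);
const existingSlots = sum(existing);          // 38,917
const addedSlots = sum(added);                // 4,608
const totalSlots = existingSlots + addedSlots; // 43,525
const totalBytes = totalSlots * 4;            // 174,100 bytes
const totalKiB = totalBytes / 1024;           // about 170 KiB
```

The exact sum comes out slightly above the rounded figure quoted in the text (43,525 slots, about 170 KiB), which does not change the conclusion: `perWorkerDone` dominates, and the new metadata arrays add only about 18 KiB.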
+ +#### Handler registry + +A shared integer mapping, defined once in `sab-scheduler.mjs`: + +```js +export const HANDLERS = { + warmInit: 0, renderEnvInit: 1, flush: 2, + scssLight: 3, scssDark: 4, mermaid: 5, + buildInfo: 6, render: 7, +}; +``` + +`allocSchedulerSAB` resolves `def.handler ?? name` through this table +when writing static task metadata. `registerDynamicTasks` (below) +receives the integer directly. + +Workers build the reverse table at init: + +```js +const handlerById = [ + handlers.warmInit, handlers.renderEnvInit, handlers.flush, + handlers.scssLight, handlers.scssDark, handlers.mermaid, + handlers.buildInfo, handlers.render, +]; +``` + +#### Task metadata API + +Two functions in `sab-scheduler.mjs`, encapsulating the SAB layout +for per-task metadata. Every caller --- `allocSchedulerSAB` for +static tasks, dynamic task registration for dynamic tasks, pull loop +and idle scan for reads --- goes through these. + +```js +export function writeTaskMeta(views, idx, { + handlerIdx, perWorkerDeps, expectedDeps, idlePriority, priority, +}) { + Atomics.store(views.handlerIdx, idx, handlerIdx); + Atomics.store(views.perWorkerDep, idx * 2, perWorkerDeps?.[0] ?? -1); + Atomics.store(views.perWorkerDep, idx * 2 + 1, perWorkerDeps?.[1] ?? -1); + Atomics.store(views.expectedDep, idx * 2, expectedDeps?.[0] ?? -1); + Atomics.store(views.expectedDep, idx * 2 + 1, expectedDeps?.[1] ?? -1); + Atomics.store(views.idlePriority, idx, idlePriority ?? 0); + Atomics.store(views.priority, idx, priority ?? 0); +} + +export function readTaskMeta(views, idx) { + const d0 = Atomics.load(views.perWorkerDep, idx * 2); + const d1 = Atomics.load(views.perWorkerDep, idx * 2 + 1); + const e0 = Atomics.load(views.expectedDep, idx * 2); + const e1 = Atomics.load(views.expectedDep, idx * 2 + 1); + return { + handlerIdx: Atomics.load(views.handlerIdx, idx), + perWorkerDeps: d1 !== -1 ? [d0, d1] : d0 !== -1 ? [d0] : [], + expectedDeps: e1 !== -1 ? [e0, e1] : e0 !== -1 ? 
[e0] : [], + idlePriority: Atomics.load(views.idlePriority, idx), + priority: Atomics.load(views.priority, idx), + }; +} +``` + +Static tasks: `allocSchedulerSAB` calls `writeTaskMeta` in its +existing per-task loop, replacing the `taskMeta[idx] = { ... }` +assignment. Name resolution (`HANDLERS[name]`, `nameToIdx.get(dep)`) +happens in the same loop as today. Main-thread tasks (`runOnMain`) +are skipped --- workers never read their metadata (`scanAndClaim` +skips `F_RUN_ON_MAIN` tasks), so writing `handlerIdx` is unnecessary. +Their SAB metadata slots stay at the initialized defaults +(`handlerIdx = -1`, deps = -1, priorities = 0). + +Dynamic tasks: `dispatch.submit()` calls `writeTaskMeta` for each +allocated slot before activation. + +Workers: the pull loop calls `readTaskMeta` after `scanAndClaim` +returns a task index. `findIdleTask` calls it per candidate. +On-demand dep execution calls it on the dep. + +#### Dynamic task API + +Six primitives in `sab-scheduler.mjs`. Together with `writeTaskMeta`, +they replace the pre-reservation loop, the `taskMeta` pre-fill loop, +`activateRenderTasks`, and `packChunkData`. + +**`allocDynamicSlots(views, idMapping, count)`** --- reserves `count` +contiguous slots from the dynamic pool. Returns the base index. +Advances `idMapping.nextDynamic` and updates `taskCount` in the SAB +so workers scan the new slots. Does not write metadata or edges. + +```js +export function allocDynamicSlots(views, idMapping, count) { + const base = idMapping.DYNAMIC_BASE + idMapping.nextDynamic; + if (base + count > MAX_TASKS) + throw new Error(`dynamic tasks exceed MAX_TASKS`); + idMapping.nextDynamic += count; + const newCount = base + count; + if (newCount > Atomics.load(views.taskCount, 0)) + Atomics.store(views.taskCount, 0, newCount); + return base; +} +``` + +**`wireDynamicEdges(views, edges)`** --- appends successor edges for +dynamic tasks to the global `succList`. Each entry in `edges` is +`{ from, to: [succIdx, ...] }`.
Called once after all slots are +allocated and metadata written. Each `from` must have no prior +successors (`succCount === 0`); to extend a task that already has +static successors, use `appendDynamicSuccessors` below. + +```js +export function wireDynamicEdges(views, edges) { + let edgePos = Atomics.load(views.edgeCount, 0); + for (const { from, to } of edges) { + if (edgePos + to.length > MAX_EDGES) + throw new Error(`dynamic edges exceed MAX_EDGES`); + Atomics.store(views.succOffset, from, edgePos); + Atomics.store(views.succCount, from, to.length); + for (const s of to) views.succList[edgePos++] = s; + } + Atomics.store(views.edgeCount, 0, edgePos); +} +``` + +**`appendDynamicSuccessors(views, edges)`** --- extends a task's +successor list with new dynamic successors. Relocates the task's +existing successors to the end of `succList` (the old slots become +dead space) so the contiguous-range invariant +`succOffset[t]..succOffset[t]+succCount[t]` holds. Used when a +static task needs to fan out to dynamically-registered successors +--- specifically `prepPageDirs → flush:0..N-1` while preserving +`prepPageDirs → writeAssets`. + +```js +export function appendDynamicSuccessors(views, edges) { + let edgePos = Atomics.load(views.edgeCount, 0); + for (const { from, to } of edges) { + const oldOff = Atomics.load(views.succOffset, from); + const oldCnt = Atomics.load(views.succCount, from); + const total = oldCnt + to.length; + if (edgePos + total > MAX_EDGES) + throw new Error(`dynamic edges exceed MAX_EDGES`); + for (let i = 0; i < oldCnt; i++) + views.succList[edgePos + i] = views.succList[oldOff + i]; + for (let i = 0; i < to.length; i++) + views.succList[edgePos + oldCnt + i] = to[i]; + Atomics.store(views.succOffset, from, edgePos); + Atomics.store(views.succCount, from, total); + edgePos += total; + } + Atomics.store(views.edgeCount, 0, edgePos); +} +``` + +**`setDepCount(views, idx, count)`** --- sets a task's predecessor +count. 
Used for join barriers whose dep count is not known at +allocation time. + +```js +export function setDepCount(views, idx, count) { + Atomics.store(views.depCount, idx, count); +} +``` + +**`activateDynamicTasks(views, base, count)`** --- sets status to +READY for `count` tasks starting at `base`. Replaces +`activateRenderTasks`. Only activates tasks whose current `depCount` +is 0 (tasks with unsatisfied predecessors stay NOT_READY and are +activated later by `onTaskDone`). + +```js +export function activateDynamicTasks(views, base, count) { + let readyCount = 0; + for (let i = 0; i < count; i++) { + const idx = base + i; + if (Atomics.load(views.depCount, idx) === 0) { + Atomics.store(views.status, idx, READY); + readyCount++; + } + } + if (readyCount > 0) { + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, Infinity); + } +} +``` + +This is more general than `activateRenderTasks` (which assumed all +tasks have `depCount = 0`). `flush:i` tasks have `depCount = 1` +(gated on `render:i`), so they stay NOT_READY and are activated by +`onTaskDone` when their `render:i` completes. + +**`packPayloads(views, base, payloads)`** --- JSON-serializes each +payload, concatenates into one `SharedArrayBuffer`, and writes +per-task `payloadOffset` / `payloadLength` into the scheduling SAB. +Replaces `packChunkData`. + +```js +export function packPayloads(views, base, payloads) { + const buffers = payloads.map(p => encoder.encode(JSON.stringify(p))); + const totalBytes = buffers.reduce((sum, b) => sum + b.byteLength, 0); + const sab = new SharedArrayBuffer(totalBytes); + const full = new Uint8Array(sab); + let offset = 0; + for (let i = 0; i < buffers.length; i++) { + full.set(buffers[i], offset); + Atomics.store(views.payloadOffset, base + i, offset); + Atomics.store(views.payloadLength, base + i, buffers[i].byteLength); + offset += buffers[i].byteLength; + } + return sab; +} +``` + +Indexed by `base + i` (= `taskIdx`), not by a separate chunk index. 
+Tasks with no payload (flush, static tasks) have `payloadLength = 0`. + +#### Priority-aware `scanAndClaim` + +Replace the first-match scan with a best-match scan. `priority` is +immutable after task registration, so it can be read without +`Atomics` (plain array access). + +```js +export function scanAndClaim(views, myLane) { + const count = Atomics.load(views.taskCount, 0); + while (true) { + const start = Atomics.load(views.firstReady, 0); + let bestIdx = -1, bestPri = -1; // -1 is below valid range (0+); any real task wins + for (let i = start; i < count; i++) { + if (Atomics.load(views.status, i) !== READY) continue; + if (Atomics.load(views.flags, i) & F_RUN_ON_MAIN) continue; + const aff = Atomics.load(views.affinityLane, i); + if (aff !== -1 && aff !== myLane) continue; + const pri = Atomics.load(views.priority, i); + if (pri > bestPri) { bestPri = pri; bestIdx = i; } + } + if (bestIdx === -1) return -1; + if (Atomics.compareExchange(views.status, bestIdx, READY, CLAIMED) === READY) + return bestIdx; + // CAS lost: retry full scan. + } +} +``` + +**Cost.** The scan becomes O(taskCount - firstReady) instead of +O(first-READY), but with MAX_TASKS = 512 and mostly NOT_READY or +DONE slots, the difference is microseconds. The CAS retry +terminates quickly: the claimed task is no longer READY, so the next +scan finds the second-best. + +**Priority assignment.** `render:i` tasks get priority 0 (default). +`flush:i` tasks get priority 1. When both a `flush:i` and a +`render:j` are READY on the same worker, the flush runs first --- +clearing the stash before the next render fills it. + +#### Per-chunk `flush:i` design + +Replace the single `unique_per_worker` `flush` task with N dynamic +`flush:i` tasks, one per render chunk: + +- `depCount = 1`, gated on `render:i` --- becomes READY when + `render:i` completes. +- `pin_to_predecessor = render:i` --- runs on the worker that + rendered the chunk, so the stashed HTML is local. 
+- `priority = 1` --- picked before any render task (priority 0) + when both are READY on the same worker. +- Successor edge to `flushJoin` --- the join fires when all N + flush tasks complete. + +**FIFO stash invariant.** The worker-local `_pageStash` flat array +is replaced by a FIFO queue `_pendingFlush`. The `render` handler +pushes one batch; the `flush` handler shifts one batch. The priority +mechanism guarantees the queue depth is always exactly 1 when `flush` +runs and exactly 0 afterward. + +Proof: the pull loop is sequential --- a worker awaits one handler at +a time. After `render:i` completes on worker W, `onTaskDone` runs +synchronously, decrementing `flush:i`'s dep count to 0 and setting it +READY with affinity pinned to W. Worker W then calls `scanAndClaim`. +`flush:i` (priority 1, pinned to W) is preferred over any remaining +`render:j` (priority 0). So the next task W executes is `flush:i`, +which shifts the single queued batch. No second render can +interleave. + +#### Graph changes + +``` +render:i [W] (stashes pages locally; delta carries renderedContent + offlineMisses only) + render:i.submit() merges renderedContent into state.pages on main + | + |--- [successor edge] --→ renderJoin [M] (pure barrier, no-op execute) + | + +--- [successor edge] --→ flush:i [W, pin_to_predecessor, priority: 1] + writes stashed html → _site/ + writes stashed offlineHtml → _site-offline/ + | + +--- [successor edge] --→ flushJoin [M] + aggregates write stats from all flush:i results + +renderJoin + prepDest → searchData [M] + +flushJoin + mermaid + prepPageDirs + highlighterInit → writeAssets [M] + +flushJoin + searchData + deriveRedirects + deriveSitemap → writeAux [M] + +writeAux + writeAssets → writeOffline [M] + +flushJoin + mermaid + resolveBookChapters → writePdf [M] +``` + +`searchData` depends on `renderJoin` (not `flushJoin`): it needs +`renderedContent` in memory, which requires all `render:i.submit()` +calls to have run. 
`renderJoin` provides that guarantee --- it +becomes READY only after all `render:i` are DONE, and by that point +the main thread has processed every `render:i` result message (FIFO +property of worker-to-main postMessage: each worker's render-done +messages precede its flush-done messages, and `_onWorkerDone` +processes them in order). + +#### `dispatch.submit()` redesign + +The single orchestration point. All render- and flush-specific +knowledge lives here --- the scheduler sees only generic dynamic tasks. + +```js +submit(out, _state, scheduler) { + const N = out.chunks.length; + const views = scheduler._views; + const idMap = scheduler._idMapping; + const renderJoinIdx = idMap.nameToIdx.get("renderJoin"); + const flushJoinIdx = idMap.nameToIdx.get("flushJoin"); + const renderEnvInitIdx = idMap.nameToIdx.get("renderEnvInit"); + const prepPageDirsIdx = idMap.nameToIdx.get("prepPageDirs"); + + // 1. Allocate 2N slots from the generic pool. + const renderBase = allocDynamicSlots(views, idMap, N); + const flushBase = allocDynamicSlots(views, idMap, N); + + // 2. Write metadata into the SAB. + for (let i = 0; i < N; i++) { + writeTaskMeta(views, renderBase + i, { + handlerIdx: HANDLERS.render, + perWorkerDeps: [renderEnvInitIdx], + }); + writeTaskMeta(views, flushBase + i, { + handlerIdx: HANDLERS.flush, + priority: 1, + }); + } + + // 3a. Wire dynamic-only edges: render:i → [renderJoin, flush:i], + // flush:i → [flushJoin]. + const edges = []; + for (let i = 0; i < N; i++) { + edges.push({ from: renderBase + i, to: [renderJoinIdx, flushBase + i] }); + edges.push({ from: flushBase + i, to: [flushJoinIdx] }); + } + wireDynamicEdges(views, edges); + + // 3b. Append prepPageDirs → flush:0..N-1 (output directories must + // exist before flush writes a page). prepPageDirs already has + // writeAssets as a static successor, so use the append helper that + // preserves the existing edge. 
+ const prepToFlush = []; + for (let i = 0; i < N; i++) prepToFlush.push(flushBase + i); + appendDynamicSuccessors(views, [{ from: prepPageDirsIdx, to: prepToFlush }]); + + // 4. Set dep counts and pinning. + setDepCount(views, renderJoinIdx, N); + setDepCount(views, flushJoinIdx, N); + for (let i = 0; i < N; i++) { + setDepCount(views, flushBase + i, 2); // gated on render:i + prepPageDirs + Atomics.store(views.pinnedTo, flushBase + i, renderBase + i); + views.flags[flushBase + i] |= F_PIN_TO_PRED; + } + + // 5. Register names + submit callbacks on the main-thread task map. + for (let i = 0; i < N; i++) { + const rName = `render:${i}`; + idMap.nameToIdx.set(rName, renderBase + i); + idMap.idxToName[renderBase + i] = rName; + scheduler.tasks.set(rName, { + expected: [], + consolidate: true, + ganttSection: "Render", + submit(renderOut, state) { + for (const r of renderOut.pages) { + const p = state.pageByDest.get(r.destPath); + if (!p) continue; + p.renderedContent = r.renderedContent; + if (r.offlineMisses !== undefined) p.offlineMisses = r.offlineMisses; + } + }, + }); + + const fName = `flush:${i}`; + idMap.nameToIdx.set(fName, flushBase + i); + idMap.idxToName[flushBase + i] = fName; + scheduler.tasks.set(fName, { + expected: [`render:${i}`], + consolidate: true, + ganttSection: "Write", + submit() {}, + }); + } + + // Populate flushJoin's expected array so _assembleInputs delivers + // all flush results to its execute(). Reset first --- in serve mode + // the task def object is reused across rebuilds; without the reset, + // names from the previous build would accumulate. + const flushJoinDef = scheduler.tasks.get("flushJoin"); + flushJoinDef.expected = []; + for (let i = 0; i < N; i++) flushJoinDef.expected.push(`flush:${i}`); + + // 6. Pack payload, broadcast, and activate. 
+ const payloadSAB = packPayloads(views, renderBase, out.chunks); + scheduler.addDynamicTasks(2 * N + 2); // N render + N flush + renderJoin + flushJoin + scheduler.pool.broadcastDynamicData(payloadSAB, out.sharedSAB); + activateDynamicTasks(views, renderBase, 2 * N); // render tasks activate (depCount 0); + // flush tasks stay NOT_READY (depCount 2: render:i and prepPageDirs) +}, +``` + +No `_renderExpected`, no `wireJoins()`, no name-prefix matching. +The successor edges, dep counts, and pinning are all explicit data +written into the SAB before any task activates. + +**Ordering guarantee.** `wireDynamicEdges` and `setDepCount` run +before `activateDynamicTasks`. No render task can *complete* before +its successor edges and the join dep counts are in place. + +#### `allocSchedulerSAB` changes + +1. **Remove the render pre-reservation loop** (current lines 98--103) + and the `taskMeta` pre-fill loop (lines 216--223). + +2. **Replace `taskMeta` construction** with `writeTaskMeta` calls in + the existing per-task loop. For each static task that is NOT + `runOnMain`, resolve the handler name through `HANDLERS`, resolve + dep names through `nameToIdx`, and call `writeTaskMeta`. Skip + main-thread tasks (their metadata slots stay at initialized + defaults; workers never read them). + +3. **Initialize `idMapping.nextDynamic = 0`.** Dynamic slots start + at `DYNAMIC_BASE` (= static task count) and grow upward. + +4. **Remove `taskMeta` from the return value.** Return + `{ sab, views, idMapping }` only. + +5. **Remove `MAX_RENDER_CHUNKS`.** `payloadOffset` and + `payloadLength` are sized to `MAX_TASKS`. + +6. **Pre-fill the `-1`-default arrays for the whole table, not just + the static slots.** `SharedArrayBuffer` is zero-initialized, so + any dynamic slot whose `affinityLane`, `pinnedTo`, or + `completedOnLane` is not explicitly written looks pinned to lane + 0 / task 0. 
Add `views.affinityLane.fill(-1)`, + `views.pinnedTo.fill(-1)`, `views.completedOnLane.fill(-1)` (and + `handlerIdx.fill(-1)`, `perWorkerDep.fill(-1)`, + `expectedDep.fill(-1)` --- already required for the metadata + arrays). The existing per-static-task assignments become no-ops + that overwrite with the same value. + +7. **Update `verifySchedulerSAB`.** The current verification + function checks `taskMeta`-derived properties (dep counts, flags, + successor edges, seed status). Extend it to verify the new SAB + arrays: `handlerIdx` matches `HANDLERS[def.handler]` for worker + tasks, `perWorkerDep` / `expectedDep` match the resolved indices, + and main-thread task slots have `handlerIdx = -1`. + +#### `cpu-worker.mjs` changes + +1. **Remove `taskMeta`** from the init message handler and module + scope. + +2. **Build `handlerById` at init** from the `HANDLERS` constant (or + receive it in the init message and invert). + +3. **Replace all `taskMeta[idx]` reads with `readTaskMeta(views, idx)`.** + Five call sites: pull loop after claim (~line 311), idle-task scan + (~line 232), nested dep check (~line 330), direct dep check + (~line 326), idle-task execution (~line 280). + +4. **Replace `meta.handler` lookup** (`handlers[meta.handler]`) with + `handlerById[meta.handlerIdx]`. + +5. **Replace `perWorkerTiming` name field** in all three send sites. + Send `taskIdx` instead of `taskName`: + + ```js + parentPort.postMessage({ + perWorkerTiming: true, + taskIdx: idleTask, // was: taskName: idleMeta.name + timing: { start: t0, end: t1 }, + lane: myLane, + output: idleResult, + }); + ``` + +6. 
**Render handler: read payload from SAB.** Replace: + ```js + const chunkIndex = taskIdx - idMapping.DYNAMIC_BASE; + const offset = Atomics.load(views.chunkOffset, chunkIndex); + const length = Atomics.load(views.chunkLength, chunkIndex); + ``` + with: + ```js + const offset = Atomics.load(views.payloadOffset, taskIdx); + const length = Atomics.load(views.payloadLength, taskIdx); + ``` + No `DYNAMIC_BASE` arithmetic. + +7. **`_pageStash` → `_pendingFlush` FIFO.** The render handler + pushes one batch per chunk; the flush handler shifts one batch: + + ```js + let _pendingFlush = []; + + // In render handler, after templatePhase + offline derivation: + const batch = []; + for (const p of chunk) { + if (p.html !== undefined) + batch.push({ destPath: p.destPath, html: p.html, + offlineHtml: p.offlineHtml, offlineMisses: p.offlineMisses }); + } + _pendingFlush.push(batch); + + // flush handler: + async flush() { + const items = _pendingFlush.shift() ?? []; + let written = 0, offlineWritten = 0, offlineMisses = 0; + if (!ctx.opts.dryRun) { + let next = 0; + const limit = Math.min(64, items.length || 1); + const workers = Array.from({ length: limit }, async () => { + while (next < items.length) { + const p = items[next++]; + await fsP.writeFile(path.join(ctx.destRoot, p.destPath), p.html, "utf8"); + written++; + if (p.offlineHtml !== undefined) { + await fsP.writeFile( + path.join(ctx.destRoot + "-offline", p.destPath), p.offlineHtml, "utf8"); + offlineWritten++; + } + offlineMisses += p.offlineMisses ?? 0; + } + }); + await Promise.all(workers); + } + return { written, offlineWritten, offlineMisses }; + }, + ``` + + Reset `_pendingFlush = []` in the `msg.init` handler (serve-mode + reuse across rebuilds). + +8. **Receive `payloadSAB` via `dynamicData` message** (renamed from + `renderData`). Store as `_payloadSAB`. + +#### `scheduler.mjs` changes + +1. **Remove `taskMeta` from init message** to workers. Send + `{ init: true, sab, ctx, idMapping }`. + +2. 
**Remove `_renderCount`, `_renderExpected`** fields and their + constructor initialization. + +3. **Remove `_flushCount`, `_flushStats`** fields, constructor + initialization, and serve-mode reset. + +4. **Remove the `startsWith("render:")` branch** in `_onWorkerDone`. + +5. **Remove the `taskName === "flush"` branch** in + `_onPerWorkerTiming`. + +6. **Resolve task names from indices** in `_onPerWorkerTiming`. + Replace `taskName` (received from worker) with: + ```js + const taskName = this._idMapping.idxToName[msg.taskIdx]; + ``` + +7. **Rename `dispatchRender` → `broadcastDynamicData`** (or make it + a pass-through to `pool.broadcastDynamicData`). + +8. **Summary output.** Read flush stats from + `this.results.get("flushJoin")` instead of `_flushStats`. + +#### `renderJoin` and `flushJoin` task definitions + +No `joins` field, no `wireJoins()`. Both are plain `on_demand` +barrier tasks activated by the normal SAB dep-count mechanism: + +```js +renderJoin: { + expected: [], // no static predecessors + on_demand: true, + runOnMain: true, + execute() { return {}; }, + submit() {}, +}, + +flushJoin: { + expected: [], // populated by dispatch.submit + on_demand: true, + runOnMain: true, + execute(inputs) { + let written = 0, offlineWritten = 0, offlineMisses = 0; + for (const r of Object.values(inputs)) { + written += r?.written ?? 0; + offlineWritten += r?.offlineWritten ?? 0; + offlineMisses += r?.offlineMisses ?? 0; + } + return { written, offlineWritten, offlineMisses }; + }, + submit() {}, +}, +``` + +`renderJoin`'s `expected` stays empty --- it has no static +predecessors, and its dep count is set dynamically by +`dispatch.submit`. It receives no inputs. + +`flushJoin`'s `expected` is populated by `dispatch.submit` with +`flush:0`..`flush:N-1`, so `_assembleInputs` delivers all flush +results to `execute(inputs)`. + +Note: `flush:i` is a regular dynamic task, not `unique_per_worker`. 
+Its results flow through `_onWorkerDone` (the normal worker +completion path), not through `_onPerWorkerTiming`. The worker +posts `{ done: taskIdx, output: { written, ... } }`, the main +thread stores the result, and `onTaskDone` decrements `flushJoin`'s +dep count. When the last `flush:i` completes, `flushJoin` becomes +READY and the main thread aggregates the stats. + +#### `flush` static task definition + +Removed. The single `unique_per_worker` / `run_when_idle` `flush` +entry in `TASKS` is deleted. Per-chunk `flush:i` tasks are +registered dynamically in `dispatch.submit()`. + +#### What gets deleted + +| Current code | Status | +|---|---| +| `taskMeta` array in `allocSchedulerSAB` | Deleted; replaced by `writeTaskMeta` calls | +| `taskMeta` in `workerData` / init message | Deleted; workers read from SAB | +| `taskMeta` module var in `cpu-worker.mjs` | Deleted | +| `render:${i}` pre-reservation loop in `allocSchedulerSAB` | Deleted | +| `render:${i}` taskMeta pre-fill loop in `allocSchedulerSAB` | Deleted | +| `chunkOffset` / `chunkLength` SAB arrays | Replaced by `payloadOffset` / `payloadLength` | +| `MAX_RENDER_CHUNKS` constant | Deleted | +| `packChunkData` function | Replaced by `packPayloads` | +| `activateRenderTasks` function | Replaced by `activateDynamicTasks` | +| `_renderCount` / `_renderExpected` in `Scheduler` | Deleted | +| `_flushCount` / `_flushStats` in `Scheduler` | Deleted | +| `startsWith("render:")` branch in `_onWorkerDone` | Deleted | +| `taskName === "flush"` branch in `_onPerWorkerTiming` | Deleted | +| `flush` static task definition | Deleted | +| `taskIdx - idMapping.DYNAMIC_BASE` in render handler | Replaced by direct `payloadOffset[taskIdx]` | + +#### Init message simplification + +Workers receive: + +``` +{ init: true, sab, ctx, idMapping } +``` + +`taskMeta` is gone. `idMapping` is retained for `DYNAMIC_BASE` (used +by `allocDynamicSlots` at build start, though workers do not need it) +and for debug/error messages. 
Workers only strictly need the SAB and +`ctx`; `idMapping` can be trimmed in a future phase. + +#### Edge case: worker with zero render chunks + +Under high worker counts or small page sets, some workers claim no +render tasks. No `flush:i` is pinned to them; their `_pendingFlush` +stays empty. This is safe --- the joins count only the tasks that +exist, not the workers. + +#### Files touched + +| File | Changes | +|---|---| +| `sab-scheduler.mjs` | New SAB arrays (`handlerIdx`, `perWorkerDep`, `expectedDep`, `idlePriority`, `priority`, `payloadOffset`, `payloadLength`); remove `chunkOffset`, `chunkLength`, `MAX_RENDER_CHUNKS`; bump `MAX_TASKS` to 512, `MAX_EDGES` to 2048; `HANDLERS` registry; `writeTaskMeta` / `readTaskMeta`; `allocDynamicSlots` / `wireDynamicEdges` / `appendDynamicSuccessors` / `setDepCount` / `activateDynamicTasks` / `packPayloads`; `scanAndClaim` rewrite (priority + CAS retry); whole-array `-1` pre-fill for `affinityLane` / `pinnedTo` / `completedOnLane` / `handlerIdx` / `perWorkerDep` / `expectedDep`; remove render pre-reservation + taskMeta pre-fill; remove `packChunkData` + `activateRenderTasks` | +| `scheduler.mjs` | Remove `_renderCount`, `_renderExpected`, `_flushCount`, `_flushStats`; remove `startsWith("render:")` in `_onWorkerDone`; remove `taskName === "flush"` in `_onPerWorkerTiming`; resolve task names from indices in `_onPerWorkerTiming`; remove `taskMeta` from init message; rename `dispatchRender`; summary reads flush stats from `flushJoin` result | +| `tbdocs.mjs` | Remove `flush` static task def; `renderJoin` / `flushJoin` lose counter comments; `flushJoin.execute(inputs)` aggregates per-chunk write stats; `dispatch.submit()` rewritten per §dispatch.submit() redesign (including the `prepPageDirs → flush:i` append + `depCount = 2`); `GANTT_SECTION`: remove `flush`, `flushJoin` entry stays; summary output reads `flushJoin` result | +| `cpu-worker.mjs` | Remove `taskMeta` module var and init handling; build `handlerById`; replace 
all `taskMeta[idx]` with `readTaskMeta`; replace `meta.handler` with `handlerById[meta.handlerIdx]`; `perWorkerTiming` sends `taskIdx` not `taskName`; render handler reads `payloadOffset`/`payloadLength` directly; `_pageStash` → `_pendingFlush` FIFO; receive `dynamicData` message | +| `worker-pool.mjs` | `broadcastRenderData` → `broadcastDynamicData`; remove `taskMeta` from init message | + +#### Expected savings + +Four sources: + +1. **Distributed I/O.** Writes interleave with renders instead of + clustering at the tail. Each `render:i` is immediately followed + by its `flush:i` on the same worker; libuv file-write operations + from different workers overlap with CPU-bound renders on other + workers. + +2. **Reduced structured-clone cost.** Unchanged from Phase 12: + `html` and `offlineHtml` stay on the worker, never crossing the + `postMessage` boundary. + +3. **Scheduler simplification.** ~60 lines of special-case counters, + name-matching branches, pre-reservation loops, and `taskMeta` + construction are replaced by generic primitives (`writeTaskMeta` / + `readTaskMeta`, `allocDynamicSlots` / `wireDynamicEdges` / + `appendDynamicSuccessors` / `activateDynamicTasks`, `packPayloads`). + The scheduler has zero knowledge of what any task does. + +4. **Extensibility.** Any future fan-out pattern (`foo:0..N` with a + `fooJoin` barrier) uses the same primitives --- allocate slots, + write metadata, wire edges, set dep counts, activate. No + scheduler changes needed. + +#### Verification + +`build.bat && check.bat` clean (zero intra-site issues; the 8 +pre-existing PDF broken links from `book.html` are unchanged). The +Gantt chart shows: + +- `flush:i` bars interleaved with `render:i` bars on each worker + lane (consolidated via `consolidate: true`), instead of a single + `flush` bar at the tail. +- `renderJoin` and `flushJoin` activated by dep-count (no manual + `Atomics.store(status, joinIdx, READY)` outside of + `sabOnTaskDone`). 
+- `flushJoin` result carrying aggregated write stats (`written`, + `offlineWritten`, `offlineMisses`) matching the previous + single-`flush` totals. +- Render section consolidated wall-clock dropped substantially + versus the bug condition where every worker except `w0` sat idle + (see "Outcome" at the top of this section for the two divergences + that surfaced during implementation). + +### Phase 16: Persistent pool and `survives_reset` --- DONE + +**Suggested model:** Opus. + +**Outcome.** Landed as designed. `build.bat && check.bat` clean +(single-build mode unchanged; only the 8 pre-existing PDF broken +links remain). Pool reuse exercised in-process across three +back-to-back builds: build 1 = 1797 ms, build 2 = 1129 ms, +build 3 = 1011 ms --- a ~670 ms saving on the first rebuild, well +above the predicted 100--200 ms (V8 JIT settling on hot paths and +Sass startup absorption account for the rest). The boot section +correctly drops from 1514 ms wall-clock on the initial build to +~15 ms on rebuilds (no `cold:wN`, no `warmInit:wN` --- only the +fresh `renderEnvInit:wN` per-worker timings). + +Two latent issues surfaced from exercising the rebuild path that +the design did not predict; both are folded into the description +below. + +1. **`pool` had to be stripped from `ctx.opts`.** `runBuild()` + stores `opts` on the `ctx` it ships to workers via the init + message. With pool reuse, `opts.pool` carries a `WorkerPool` + instance whose live `Worker` handles cannot be structured-cloned + --- `postMessage` throws `DataCloneError`. Fix: destructure + `const { pool: externalPool = null, ...ctxOpts } = opts;` and + put `ctxOpts` (not `opts`) on `ctx`. + +2. **`dispatch.submit()` mutated the shared `TASKS.flushJoin.expected`.** + `scheduler.tasks` is `new Map(Object.entries(TASKS))`, so the + Map's `flushJoin` entry is the same object reference as + `TASKS.flushJoin`. Setting `flushJoinDef.expected = [...]` + wrote `["flush:0", "flush:1", ...]` back into `TASKS`. 
On the + next rebuild, `allocSchedulerSAB(TASKS, ...)` walked + `expected`, failed to resolve `"flush:0"`, and threw. Fix: + replace the Map entry with a shallow clone bearing a fresh + `expected` array + (`scheduler.tasks.set("flushJoin", { ...flushJoinDef, expected: dynamicExpected })`); + leave `TASKS.flushJoin` untouched. + +**Motivation.** In serve mode, every `runBuild()` call creates a +fresh `WorkerPool` (spawning N worker threads) and destroys it +afterward. Each rebuild pays: + +1. **Cold boot** (~100--200 ms): thread creation, `cpu-worker.mjs` + module loading, V8 JIT compilation of the worker harness. +2. **`warmInit` scheduling overhead**: the on-demand dep chain + (`render:i` → `renderEnvInit` → `warmInit`) fires on every build. + `initHighlighter()` is a module-scope singleton and returns + instantly after the first call, but the scheduling machinery + (claim `render:i`, discover unsatisfied dep, release, execute + `warmInit`, re-claim) still runs on every worker on every build. +3. **V8 JIT de-optimization**: fresh workers lose the optimized code + from the previous build's hot paths (render, template, + offline-rewrite). + +Pool persistence was part of the original SAB scheduler design +(§Build start sequence step 2, §Serve mode) but was never +implemented --- `runBuild()` unconditionally creates and destroys the +pool. This phase implements the reuse path and adds a generic +`survives_reset` flag so per-worker warm-up tasks are skipped on +rebuilds. + +#### Design + +Two pieces: + +**1. Persistent pool.** `runBuild()` accepts an optional `pool` +parameter. When provided, it reuses the existing pool and skips +`pool.destroy()` at the end. `serve.mjs` creates the pool once at +startup and passes it to every `runBuild()` call. A convenience +factory `createWorkerPool()` is exported from `tbdocs.mjs` so the +pool-creation logic (worker count, worker URL) stays centralized. 
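A minimal sketch of the ownership rule described above, assuming a hypothetical `FakePool` stand-in for `WorkerPool`, a hypothetical `createFakePool()` in the role of `createWorkerPool()`, and a hypothetical `runBuildSketch()` in place of `runBuild()` (none of these names exist in the builder). It only illustrates that a pool supplied by the caller is reused and left alive, while a pool created inside the build is destroyed when the build ends.

```javascript
// Hypothetical stand-in for WorkerPool; it only records lifecycle events.
class FakePool {
  constructor() {
    this.builds = 0;        // how many builds have used this pool
    this.destroyed = false; // set once destroy() has been called
  }
  async destroy() {
    this.destroyed = true;
  }
}

// Hypothetical factory, mirroring the role of createWorkerPool().
function createFakePool() {
  return new FakePool();
}

// Sketch of the ownership rule: reuse a caller-supplied pool and leave it
// alive; otherwise create a private pool and destroy it afterwards.
async function runBuildSketch(opts = {}) {
  const externalPool = opts.pool ?? null;
  const pool = externalPool ?? createFakePool();
  try {
    pool.builds++; // placeholder for the real build work
    return pool;
  } finally {
    if (!externalPool) await pool.destroy();
  }
}
```

Under these assumptions, a serve-style caller would create one pool, pass it to every call, and destroy it itself on shutdown; a one-shot build passes no pool and lets the build clean up after itself.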
+ +The `WorkerPool` gains a `_buildCount` counter (initialized to 0, +incremented in `sendInit()`). `runBuild()` reads +`pool._buildCount > 0` to determine whether this is a rebuild. + +**2. `survives_reset` flag.** A new boolean on task definitions. +Semantics: for `unique_per_worker` tasks with this flag, the +handler's side effects are build-independent (e.g. loading a WASM +module) and persist in the worker's memory across init messages. +On a rebuild (`pool._buildCount > 0`), `allocSchedulerSAB` +pre-fills their `perWorkerDone` slots to 1 for all active lanes. + +Effect: the pull loop's dep check +(`perWorkerDone[task * MAX_LANES + lane] === 1`) passes +immediately. The handler never fires. The idle scan +(`findIdleTask`) skips the task (the `perWorkerDone !== 0` +short-circuit already exists). Downstream `perWorkerDeps` chains +(e.g. `renderEnvInit` depending on `warmInit`) see the dep as +satisfied and proceed without delay. + +`survives_reset` is only meaningful on `unique_per_worker` tasks. +Declaring it on a non-`unique_per_worker` task is a definition +error (caught by `allocSchedulerSAB`). The flag is a declaration +by the task author that the handler's side effects do not depend on +per-build state --- the scheduler trusts it. + +The task definition: + +```js +warmInit: { + expected: [], + on_demand: true, + unique_per_worker: true, + run_when_idle: true, + survives_reset: true, // new + handler: "warmInit", + submit() {}, +}, +``` + +`renderEnvInit` does NOT get the flag --- it depends on per-build +data (link tables, config, site paths) and must re-run on each build. + +#### Why no worker-side changes are needed + +The init message handler (`cpu-worker.mjs` lines 201--209) already +resets only per-build state (`_payloadSAB`, `_sharedSAB`, +`_renderEnv`, `_pendingFlush`). Module-scope singletons (the Shiki +highlighter inside `highlight.mjs`) persist naturally across init +messages because the worker thread and its module state survive. 
+The `perWorkerDone` pre-fill in the SAB is the only mechanism needed +to prevent the handler from re-executing --- the worker does not need +to know about `survives_reset`. + +#### Why the pull loop exits cleanly between builds + +When `_finish()` fires on the main thread, it sets `buildDone = 1` +in the SAB and calls `Atomics.notify(views.notify, 0, Infinity)`. +Workers that are sleeping in `Atomics.wait` wake immediately; +workers mid-iteration reach the `buildDone` check on the next loop +cycle. All workers return from `pullLoop()` and re-enter their +event loop. + +The next build's init message is sent after `runBuild()` returns (in +`serve.mjs`, after logging and Gantt injection). Workers process the +init message on their now-idle event loops, create views over the new +SAB, and call `pullLoop()` again. No overlap with the previous +`pullLoop()` instance is possible --- `pullLoop()` has already +returned before the init message is processed. + +The 300 ms debounce in `serve.mjs` provides ample margin, but the +sequencing is safe even without it: `runBuild()` is `await`ed, so +the next `runBuild()` call (and its init messages) cannot begin until +the previous one has resolved. + +#### Dependency chain correctness + +`renderEnvInit` has `perWorkerDeps: ["warmInit"]`. On a rebuild, +`warmInit`'s `perWorkerDone` is pre-filled to 1. The pull loop's +dep check (cpu-worker.mjs line 305) reads +`perWorkerDone[warmInitIdx * MAX_LANES + myLane] === 1` and +proceeds. `renderEnvInit` runs on-demand as before, using the same +handler --- which correctly rebuilds `_renderEnv` from the fresh +`sharedSAB` payload. No chain short-circuiting beyond `warmInit` +occurs. + +#### `runBuild()` changes (`tbdocs.mjs`) + +Accept an optional `pool` in the `opts` parameter. When present, +skip pool creation and destruction. Detect rebuild mode from the +pool's build count. Skip boot-timing injection on rebuilds (the +cold-boot timings are from the first build and stale). 
+ +```js +export async function runBuild(opts) { + const buildStart = Date.now(); + const { src, dest } = opts; + const srcRoot = path.resolve(process.cwd(), src); + const destRoot = path.resolve(dest ?? path.join(srcRoot, "_site")); + + const ctx = { srcRoot, destRoot, opts, workerCount }; + + const externalPool = opts.pool ?? null; + const rebuild = externalPool?._buildCount > 0; + + const { sab, views, idMapping } = + allocSchedulerSAB(TASKS, workerCount, { rebuild }); + verifySchedulerSAB(TASKS, views, idMapping); + + const pool = externalPool ?? new WorkerPool(workerCount, CPU_WORKER_URL); + const scheduler = new Scheduler({ pool, tasks: TASKS, views, idMapping, + ganttSections: GANTT_SECTION }); + + pool.onWorkerDone = (msg) => scheduler._onWorkerDone(msg); + pool.onWorkerError = (msg) => scheduler._onWorkerError(msg); + pool.onPerWorkerTiming = (msg) => scheduler._onPerWorkerTiming(msg); + pool.onMainTaskReady = () => scheduler._onMainTaskReady(); + + pool.sendInit(sab, ctx, idMapping); + + let results; + try { + results = await scheduler.start(ctx); + } finally { + if (!externalPool) await pool.destroy(); + } + + // ... existing summary logging ... + + // Boot timings: only inject on first build. + if (!rebuild) { + for (const bt of pool.bootTimings) { + scheduler.timings.set(`${bt.type}:w${bt.lane}`, { + start: bt.start, end: bt.end, + workerStart: bt.start, workerEnd: bt.end, + lane: bt.lane, ganttSection: "Boot", + }); + } + } + + // ... Gantt injection, drift guard ... +} +``` + +Export a pool factory so `serve.mjs` does not import `WorkerPool` +or `CPU_WORKER_URL` directly: + +```js +export function createWorkerPool() { + return new WorkerPool(workerCount, CPU_WORKER_URL); +} +``` + +Add `survives_reset: true` to the `warmInit` task definition. + +#### `allocSchedulerSAB` changes (`sab-scheduler.mjs`) + +Third parameter gains `{ rebuild }`: + +```js +export function allocSchedulerSAB(taskDefs, workerCount, opts = {}) { + // ... 
existing allocation logic (indices, successor list, + // depCount, flags, succOffset/succCount/succList, status) ... + + // Validate: survives_reset only on unique_per_worker tasks. + for (const [name, def] of Object.entries(taskDefs)) { + if (def.survives_reset && !def.unique_per_worker) + throw new Error( + `"${name}" has survives_reset without unique_per_worker`); + } + + // Pre-fill perWorkerDone for surviving tasks on rebuilds. + if (opts.rebuild) { + for (const [name, def] of Object.entries(taskDefs)) { + if (!def.survives_reset || !def.unique_per_worker) continue; + const idx = nameToIdx.get(name); + for (let lane = 0; lane < workerCount; lane++) { + views.perWorkerDone[idx * MAX_LANES + lane] = 1; + } + } + } + + // ... existing writeTaskMeta loop and return ... +} +``` + +The pre-fill runs after all per-task arrays are written and before +the return. `verifySchedulerSAB` does not check `perWorkerDone`, +so no changes needed there. + +#### `serve.mjs` changes + +Create the pool once at startup. Pass it to every `runBuild()` +call. Destroy on shutdown. + +```js +import { runBuild, createWorkerPool } from "./tbdocs.mjs"; + +export async function runServe(opts) { + // ... existing setup (srcRoot, destRoot, port) ... + + const pool = createWorkerPool(); + + // Initial build + try { + await runBuild({ ...opts, dest: destRoot, + skipOffline: true, skipPdf: true, pool }); + } catch (err) { + console.error("serve: initial build failed:", err.message); + await pool.destroy(); + process.exit(1); + } + + // ... existing server + SSE setup ... 
+ + async function fire() { + if (running) { pending = true; return; } + running = true; + const files = [...changedFiles].sort(); + changedFiles.clear(); + console.log(`\nChanged: ${files.join(", ")}`); + try { + await runBuild({ ...opts, dest: destRoot, + skipOffline: true, skipPdf: true, pool }); + notifyReload(); + } catch (err) { + console.error("rebuild failed:", err.message); + } finally { + running = false; + if (pending) { pending = false; schedule(); } + } + } + + // ... existing watcher ... + + // Shutdown + process.on("SIGINT", () => { + console.log("serve: shutting down."); + ac.abort(); + for (const res of sseClients) { + try { res.end(); } catch {} + } + sseClients.clear(); + pool.destroy(); + server.close(() => process.exit(0)); + setTimeout(() => process.exit(0), 100).unref(); + }); + + // ... +} +``` + +#### `WorkerPool` changes (`worker-pool.mjs`) + +Add `_buildCount`, incremented in `sendInit()`: + +```js +export class WorkerPool { + constructor(size, workerUrl) { + // ... existing fields ... + this._buildCount = 0; + } + + sendInit(sab, ctx, idMapping) { + for (const w of this._workers) { + w.postMessage({ init: true, sab, ctx, idMapping }); + } + this._buildCount++; + } +} +``` + +#### Edge cases + +1. **First build in serve mode.** `pool._buildCount === 0` → + `rebuild = false`. Workers run `warmInit` normally. Pool is + initialized. Identical to current single-build behavior. + +2. **Subsequent rebuilds.** `pool._buildCount > 0` → + `rebuild = true`. `warmInit`'s `perWorkerDone` pre-filled. + Workers skip it. `renderEnvInit` re-runs with fresh data. + +3. **Single-build mode (`build.bat`).** No `pool` option. + `externalPool = null`. Pool created, used, destroyed. + `rebuild = false`. No change from current behavior. + +4. **Worker crash during previous build.** The pool does not + respawn crashed workers. On the next rebuild, `perWorkerDone` is + pre-filled for the dead worker's lane. 
No task is scheduled to + that lane (the worker thread does not exist), so the pre-fill is + harmless. The render fan-out distributes across surviving + workers. + +5. **Build failure in serve mode.** `runBuild()` rejects; the + `finally` block does NOT destroy an external pool. `serve.mjs` + logs the error and waits for the next file change, which triggers + a fresh `runBuild()` with the same pool. Workers have already + exited their pull loops (either from `buildDone = 2` on abort, or + from the failed task's error propagation), and re-enter when the + next init message arrives. + +6. **Future `survives_reset` tasks.** Any `unique_per_worker` + + `on_demand` task that loads build-independent state (e.g. a WASM + module, a compiled grammar, a vendored dataset) can declare + `survives_reset: true`. The mechanism is generic. + +#### Interaction with Phase 15 + +Phase 15 landed. `taskMeta` is gone --- task metadata lives in SAB +arrays written by `writeTaskMeta` / `readTaskMeta`. The +`perWorkerDone` pre-fill loop added here operates on a separate SAB +array, independent of the task metadata layout. The +`def.survives_reset` read stays in the JS task-definition loop. No +conflict. + +Phase 15 also renamed `_pageStash` to `_pendingFlush` (FIFO queue, +one batch per render chunk) and `_chunkDataSAB` to `_payloadSAB`. +The init handler in `cpu-worker.mjs` (lines 201--209) resets +`_payloadSAB`, `_sharedSAB`, `_renderEnv`, and `_pendingFlush` --- +exactly the per-build state. Module-scope singletons (the Shiki +highlighter) persist naturally, so the `survives_reset` mechanism +works as designed. 
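The rebuild pre-fill described in this phase can be modelled in isolation. The following is a minimal sketch, not the builder's actual code: `allocPerWorkerDone`, `depSatisfied`, the two-entry task table, and the `MAX_LANES = 4` value are all hypothetical names and values chosen for illustration. It assumes only what the text above states, namely that `perWorkerDone` is indexed as `task * MAX_LANES + lane` and that a value of 1 marks the per-worker dependency as satisfied.

```js
// Illustrative model only: MAX_LANES, the task table, and both helper
// names are assumptions, not the real sab-scheduler.mjs API.
const MAX_LANES = 4;

function allocPerWorkerDone(taskDefs, workerCount, { rebuild = false } = {}) {
  const names = Object.keys(taskDefs);

  // survives_reset is only meaningful on unique_per_worker tasks.
  for (const name of names) {
    const def = taskDefs[name];
    if (def.survives_reset && !def.unique_per_worker) {
      throw new Error(`"${name}" has survives_reset without unique_per_worker`);
    }
  }

  // One Int32 slot per (task, lane) pair, zero-initialised.
  const sab = new SharedArrayBuffer(
    names.length * MAX_LANES * Int32Array.BYTES_PER_ELEMENT);
  const perWorkerDone = new Int32Array(sab);

  // On a rebuild, mark surviving tasks as already done on every active lane.
  if (rebuild) {
    names.forEach((name, idx) => {
      if (!taskDefs[name].survives_reset) return;
      for (let lane = 0; lane < workerCount; lane++) {
        Atomics.store(perWorkerDone, idx * MAX_LANES + lane, 1);
      }
    });
  }

  const nameToIdx = new Map(names.map((n, i) => [n, i]));
  return { perWorkerDone, nameToIdx };
}

// The dep check a worker would perform before running a dependent task.
function depSatisfied(perWorkerDone, taskIdx, lane) {
  return Atomics.load(perWorkerDone, taskIdx * MAX_LANES + lane) === 1;
}

const TASKS = {
  warmInit: { unique_per_worker: true, survives_reset: true },
  renderEnvInit: { unique_per_worker: true },
};

const firstBuild = allocPerWorkerDone(TASKS, 2);
const rebuild = allocPerWorkerDone(TASKS, 2, { rebuild: true });
```

In this model, `warmInit` reads as done on every active lane of the rebuild allocation, while `renderEnvInit` stays at 0 and would still have to run, which mirrors the intent that only build-independent state is skipped.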
+ +#### Files touched + +| File | Changes | +|---|---| +| `tbdocs.mjs` | `runBuild()`: accept `pool` option, detect rebuild, skip pool create/destroy, skip boot timings on rebuild; `warmInit` task def: add `survives_reset: true`; export `createWorkerPool()` factory | +| `sab-scheduler.mjs` | `allocSchedulerSAB`: accept `opts` parameter; validate `survives_reset` + `unique_per_worker` constraint; pre-fill `perWorkerDone` for surviving tasks in rebuild mode | +| `serve.mjs` | Create pool once at startup via `createWorkerPool()`; pass to `runBuild()`; destroy on shutdown | +| `worker-pool.mjs` | Add `_buildCount` counter, incremented in `sendInit()` | +| `cpu-worker.mjs` | No changes | +| `scheduler.mjs` | No changes | + +#### Expected savings + +Two sources: + +1. **Cold boot elimination.** ~100--200 ms per rebuild (worker + thread creation, module loading, V8 compilation). This is the + dominant saving. + +2. **`warmInit` scheduling overhead.** Per worker per rebuild: one + claim--release cycle on the first render chunk, one on-demand dep + resolution, one `initHighlighter()` call (instant but not free). + Roughly ~5--10 ms total across all workers. Small, but the + architectural benefit is that the on-demand dep chain is never + entered for `warmInit` --- downstream tasks see the dep as already + satisfied at SAB allocation time. + +A secondary benefit: V8 JIT-optimized code persists across rebuilds. +The hot paths (render, template, offline-rewrite) stay in optimized +tier after the first build, rather than being re-compiled from +scratch on each rebuild. + +#### Verification + +`build.bat && check.bat` clean. Single-build mode unchanged. + +Serve mode (`serve.bat`): + +- First build: timing summary shows normal `warmInit:wN` entries + and `cold:wN` boot entries. +- Subsequent rebuilds: `warmInit:wN` entries absent (handlers never + ran). `cold:wN` entries absent (no boot). `renderEnvInit:wN` + entries present (re-runs with fresh data). 
+- Total rebuild time drops by ~100--200 ms (cold boot) on a 16-core + machine. +- All rebuilds produce byte-identical output to fresh builds. + +### Phase 17: Distribute search-data derivation to render workers --- DONE + +**Suggested model:** Sonnet. + +**Outcome.** Landed as designed. `build.bat && check.bat` clean (zero +intra-site issues; the same 8 pre-existing PDF broken links remain). +`searchData` drops from ~140 ms to ~17 ms on a 16-core machine (a +~125 ms net saving on the critical path). Per-worker render times +absorb the derivation cost (~3--5 ms per chunk). The total search +entry count (2754) and the set of `{doc, title, content, url, relUrl}` +tuples are bit-identical to the pre-Phase-17 output across runs. + +One observation that adjusts the design's byte-identity claim: +**`search-data.json` byte-output already varied run-to-run before +Phase 17**, because `state.pages` ordering depends on the filesystem +traversal order returned by `fast-glob` inside [discover.mjs](discover.mjs) +(only basenames are explicitly sorted; ties hold fast-glob's input +order, which is filesystem-dependent). Five back-to-back pre-Phase-17 +builds produced five distinct SHA-256 hashes. The Phase 17 changes +preserve that pre-existing non-determinism --- they don't introduce +any new ordering variance --- and the entry SET (which is what +client-side lunr actually indexes) remains stable across runs and +matches pre-Phase-17. Making the output byte-stable across builds +would require sorting pages by `srcRel` (or another total order) in +discover, which is orthogonal to Phase 17 and tracked separately if +ever needed. + +**Motivation.** `searchData` runs on the main thread after `renderJoin` +and takes 100--200 ms (dev machine to CI). It sits on the critical +path to `writeAux` -> `writeOffline`. 
The task does two things: +(1) derive search entries from `renderedContent` (CPU-heavy HTML +parsing, splitting on headings, stripping tags, sanitizing --- +~80--90% of the runtime), and (2) render to JSON and write one file +(~10--20% of the runtime). The derivation is per-page with zero +cross-page dependencies, and each render worker already has the +rendered content and site config. Moving the derivation onto workers +eliminates ~80--170 ms from the main-thread critical path. + +#### Design + +Two pieces: derive on workers, consolidate on main. + +**Worker side.** The `render` handler in `cpu-worker.mjs` calls +`deriveSearchEntries(chunk, site)` after render + template + offline, +producing per-chunk search entries. The entries are returned +alongside the page delta, stripped of the `sourcePage` field (worker +pages are clones, not master refs) and the `i` field (chunk-local +indices are meaningless; the main thread assigns global indices during +consolidation). Each entry is five short strings (`doc`, `title`, +`content`, `url`, `relUrl`) --- the structured-clone cost is +negligible (~400 KB total across all workers for ~2000 entries). + +The worker already has everything `deriveSearchEntries` needs: + +- `page.renderedContent` --- set by `renderPhase`. +- `page.frontmatter.title`, `page.frontmatter.search_exclude`, + `page.permalink` --- on the chunk pages. +- `site.config.search.heading_level`, `site.config.baseurl` --- in + the shared SAB payload (via `siteData.config`). + +**Import.** `cpu-worker.mjs` adds +`import { deriveSearchEntries } from "./search.mjs"`. The transitive +import of `stripHtml` from `seo.mjs` and `writeFileMkdirp` from +`write.mjs` is harmless --- workers have full Node.js access and only +the pure-compute `deriveSearchEntries` function is called. + +**Main-thread merge.** `SharedState` gains a `searchChunks` field +(initialized to `[]`). 
`dispatch.submit()` pre-allocates it as +`new Array(N)` so each `render:i.submit()` can assign by chunk index: +`state.searchChunks[chunkIdx] = renderOut.searchEntries`. Indexed +assignment preserves page order across the chunks --- chunk 0's +entries come before chunk 1's, matching the serial iteration order +over `state.pages`. By the time `renderJoin` fires, every slot is +populated. + +**`searchData` task.** Dependencies unchanged: `renderJoin` + +`prepDest`. The `execute()` body changes from "derive from +state.pages + write" to "consolidate from state.searchChunks + write": + +1. Flatten `state.searchChunks` into a single array (`.flat()`). +2. Assign sequential `i` values (0, 1, 2, ...). +3. Map through `renderEntryString` (the existing per-entry JSON + formatter, newly exported from `search.mjs`). +4. Join, wrap, write. + +The CPU-heavy work (steps inside `deriveSearchEntries`: +`extractSections`, `stripHtml`, `sanitiseContent`) is gone from the +main thread. What remains is a linear scan over ~2000 small objects + +`JSON.stringify` per entry + one file write --- estimated ~5--15 ms. + +**`searchData` output shape.** Unchanged: `{ entries: number, +json: string }`. Downstream consumers (`writeAux`, `writeOffline`) +see no difference. + +**`search.mjs` changes.** Export `renderEntryString` (currently +file-local) so the consolidated `searchData.execute` can import it. +Add a `writeSearchDataFromChunks(searchChunks, destRoot)` convenience +that encapsulates the consolidate + renumber + render + write +sequence, keeping the logic in `search.mjs` alongside the existing +`writeSearchData`. + +#### Data flow + +``` +render:i [W] + ├── renderPhase + templatePhase + offline (existing) + ├── deriveSearchEntries(chunk, site) ← NEW + └── return { pages: [...], searchEntries: [...] 
} + │ + ▼ +render:i.submit() [M] + ├── merge renderedContent + offlineMisses into state.pages (existing) + └── state.searchChunks[i] = renderOut.searchEntries ← NEW + │ + ▼ +renderJoin [M] (barrier — all searchChunks slots populated) + │ + ▼ +searchData [M] + ├── flatten state.searchChunks (~5 ms) + ├── assign sequential i + ├── renderEntryString per entry + ├── write search-data.json + └── return { entries, json } +``` + +#### Ordering guarantee + +The serial `deriveSearchEntries` iterates `state.pages` in master- +array order, producing entries with sequential `i` values (0, 1, +2, ...). The distributed version preserves this ordering: + +1. `chunkPages()` slices `state.pages` into consecutive, non- + overlapping chunks: chunk 0 = pages[0..k), chunk 1 = pages[k..2k), + etc. Within each chunk, page order matches the master. +2. `deriveSearchEntries(chunk, site)` iterates the chunk in order, + producing entries in the same relative order as the serial version. +3. `state.searchChunks[i]` uses indexed assignment keyed by chunk + index, not push order. `searchChunks.flat()` concatenates in + index order: chunk 0, chunk 1, ..., chunk N-1. +4. Sequential `i` assignment after flattening produces the same + numbering as the serial loop. + +Result: byte-identical `search-data.json`. + +#### Changes + +**`cpu-worker.mjs`.** Import `deriveSearchEntries` from +`./search.mjs`. In the `render` handler, after the offline-derivation +block and the `_pendingFlush.push(batch)` line (Phase 15 added the +FIFO stash there), derive the per-chunk search entries and include +them in the return value: + +```js +// Per-chunk search entries (consolidated on main during searchData). +// Drop `sourcePage` (workers hold cloned page objects, not master +// refs) and `i` (chunk-local indices are meaningless; main assigns +// global indices during consolidation). 
+const searchEntries = deriveSearchEntries(chunk, env.site) + .map(e => ({ doc: e.doc, title: e.title, content: e.content, + url: e.url, relUrl: e.relUrl })); + +return { + pages: chunk.map(p => ({ + destPath: p.destPath, + renderedContent: p.renderedContent, + offlineMisses: p.offlineMisses, + })), + searchEntries, +}; +``` + +The five-field strip on each entry is what keeps the structured-clone +cost negligible (~400 KB total across all workers for ~2000 entries). + +**`search.mjs`.** Export `renderEntryString`. Add: + +```js +export async function writeSearchDataFromChunks(searchChunks, destRoot) { + const allEntries = searchChunks.flat(); + for (let idx = 0; idx < allEntries.length; idx++) allEntries[idx].i = idx; + const body = allEntries.map(renderEntryString).join(","); + const json = `{` + body + `\n}\n`; + await writeFileMkdirp( + path.join(destRoot, "assets/js/search-data.json"), json); + return { entries: allEntries.length, json }; +} +``` + +**`scheduler.mjs`.** Add `searchChunks = []` to `SharedState`. + +**`tbdocs.mjs`.** Four changes, all in `dispatch.submit()` and the +`searchData` task def: + +1. Import: `writeSearchDataFromChunks` from `./search.mjs` (replaces + the existing `writeSearchData` import --- the main-thread helper + is no longer called). + +2. At the top of `dispatch.submit()`, after `const N = out.chunks.length;` + (the existing first statement in Phase 15's redesigned submit), + pre-allocate the chunk array on `SharedState`: + + ```js + scheduler.state.searchChunks = new Array(N); + ``` + + Pre-allocation is required: each `render:i.submit()` writes by + chunk index, not push order, so `searchChunks.flat()` later + produces entries in `pages` order regardless of completion order. + +3. 
Inside the existing `for (let i = 0; i < N; i++)` loop in + `dispatch.submit()` (the one that registers the per-chunk + `render:${i}` task defs via `scheduler.tasks.set(rName, ...)`), + extend the `submit(renderOut, state)` body with one indexed + assignment. `i` is already in scope via `let`, so no `chunkIdx` + capture is needed: + + ```js + scheduler.tasks.set(rName, { + expected: [], + consolidate: true, + ganttSection: "Render", + submit(renderOut, state) { + for (const r of renderOut.pages) { + const p = state.pageByDest.get(r.destPath); + if (!p) continue; + p.renderedContent = r.renderedContent; + if (r.offlineMisses !== undefined) p.offlineMisses = r.offlineMisses; + } + state.searchChunks[i] = renderOut.searchEntries; // NEW + }, + }); + ``` + + The `flush:${i}` registration in the same loop is unchanged. + +4. `searchData.execute()` (currently calls `writeSearchData(state.pages, + state.site, ctx.destRoot)`): + + ```js + async execute(_, ctx, state) { + if (ctx.opts.dryRun) return { entries: 0, json: "" }; + return writeSearchDataFromChunks(state.searchChunks, ctx.destRoot); + }, + ``` + + The `expected: ["renderJoin", "prepDest"]` dependency list is + unchanged --- `renderJoin` still provides the "all render:i deltas + merged" guarantee that gates the consolidation. + +#### Files touched + +| File | Changes | +|---|---| +| `cpu-worker.mjs` | Import `deriveSearchEntries` from `search.mjs`. In `render` handler, call it after offline pass, add `searchEntries` (sans `sourcePage`, sans `i`) to return value. | +| `search.mjs` | Export `renderEntryString`. Add `writeSearchDataFromChunks()`. | +| `scheduler.mjs` | Add `searchChunks = []` to `SharedState`. | +| `tbdocs.mjs` | Import `writeSearchDataFromChunks`. `dispatch.submit`: pre-allocate `state.searchChunks`. `render:i.submit`: store `searchEntries` by chunk index. `searchData.execute`: call `writeSearchDataFromChunks` instead of `writeSearchData`. 
| + +#### Interaction with other phases + +- **Phase 15 (generic dynamic tasks) --- LANDED.** Phase 15 + rewrote `dispatch.submit()` into its current form: + `allocDynamicSlots` for 2N render + flush slots, `writeTaskMeta` + via the SAB metadata API, per-iteration `scheduler.tasks.set(rName, + ...)` for each `render:${i}` / `flush:${i}` def, and + `activateDynamicTasks` at the end. Phase 17's edits drop straight + into that structure: pre-allocate `state.searchChunks` once after + `N` is known, and add one indexed assignment inside the existing + per-chunk `render:${i}` submit closure. No other Phase 15 surface + (the generic payload SAB, the priority-aware claim, the FIFO + `_pendingFlush`) is affected --- the worker's `searchEntries` + ride out in the same `{ done, output }` message that already + carries the render delta. + +- **Phase 16 (persistent pool) --- DONE.** Pool persistence is + orthogonal. `searchChunks` lives on `SharedState`, which is fresh + per build (new `Scheduler` = new `SharedState`). Confirmed: Phase + 16's two divergences (stripping `pool` from `ctx.opts` to avoid + `DataCloneError`, and cloning `flushJoinDef` to prevent + `dispatch.submit()` from mutating `TASKS`) are in disjoint code + paths. The `ctx.opts` fix is in `runBuild()` parameter handling; + the clone fix is in the `flushJoin` portion of `dispatch.submit()`. + Phase 17's edits (pre-allocating `state.searchChunks` and adding + an indexed assignment in the `render:i` submit closure) touch + neither. No interaction. + +- **Phase 18 (per-page SEO on workers).** Phase 18 adds + `computeChunkSeo` between `renderPhase` and `templatePhase` in the + same render handler that Phase 17 extends. Both are independent + per-page transforms with no data dependency on each other; they + compose without conflict regardless of landing order. + +- **`_triage.mjs` / `_diff.mjs`.** These dev tools call + `deriveSearchEntries` directly on `state.pages` on the main thread. 
+ They are not part of the build pipeline and are unaffected. The + `sourcePage` field they rely on is only produced by main-thread + calls to `deriveSearchEntries`, not by the worker path. + +#### Expected savings + +The derivation distributes across N workers in parallel with render + +template + offline. Per-worker added time is +~(100--200 ms) / N. + +| Machine | Per-worker added | Main-thread searchData | Net critical-path saving | +|---|---|---|---| +| 16-core (CI) | ~6--12 ms | drops to ~5--15 ms | ~85--185 ms | +| 4-core (dev) | ~25--50 ms | drops to ~5--15 ms | ~55--135 ms | + +#### Verification + +`build.bat && check.bat` clean. The set of search entries (keyed by +`{doc, title, content, url, relUrl}`) is identical to the +pre-Phase-17 output. The timing summary shows `searchData` at +~5--15 ms (down from ~100--200 ms). `render:i` timings increase by +a few ms each, absorbed within the render fan-out. Total build +wall-clock drops by the net saving. + +Note: byte-identity across runs is NOT a Phase 17 invariant. +`search-data.json` was already non-deterministic before Phase 17 +because `discover.mjs` keeps fast-glob's filesystem-dependent input +order for pages that tie under the basename sort, which propagates +into entry order in both the serial and chunked paths. Phase 17 +preserves the pre-existing variance --- it does not amplify it. + +### Phase 18: Move per-page SEO to render workers --- DONE + +**Suggested model:** Sonnet. + +**Outcome.** Landed as designed. `build.bat && check.bat` clean (zero +intra-site issues; only the 8 pre-existing PDF broken links in +`book.html` remain). The `seo` task is gone from the timing summary; +`markdownInit` absorbs the site-level `computeSiteSeo` work (~36 ms +total, up from ~5 ms baseline --- the ~30 ms of `renderTitle` for the +config title plus `absoluteUrl` for the logo). Per-page SEO now runs +on workers between `renderPhase` and `templatePhase`, absorbed within +the render fan-out wall-clock. 
Spot-checked rendered head output: +`<title>`, `og:title`, `canonical`, `og:site_name`, and the JSON-LD +`WebSite` / `WebPage` `@type` (driven by `seoIsHome`) all populate +identically to the pre-phase build. + +No divergences from the design surfaced during implementation. + +**Motivation.** The `seo` task runs on the main thread after +`markdownInit`, computing four per-page fields (`seoTitle`, +`seoFullTitle`, `seoCanonical`, `seoIsHome`) and two site-level +constants (`seoSiteTitle`, `seoLogoUrl`). It takes ~35 ms and sits +on the critical path between `markdownInit` and `dispatch`. The +per-page fields are only consumed by `templatePhase` inside the +render workers --- no main-thread task reads them after `dispatch` +serializes them into the chunk payloads. Moving the per-page +computation into the render workers removes the task from the +critical path and shrinks the serialized chunk payload by ~130 KB +(~150 bytes × ~858 pages). + +#### Design + +Split `precomputeSeo` into two functions: + +1. **`computeSiteSeo(config, markdown)`** --- returns + `{ seoSiteTitle, seoLogoUrl }`. Called once on the main thread, + folded into `markdownInit`. + +2. **`computeChunkSeo(pages, seoSiteTitle, config, markdown)`** --- + mutates pages in place with the four per-page fields. Called on + each render worker between `renderPhase` and `templatePhase`. + +The `seo` task is deleted. `dispatch.expected` drops `"seo"` and +gains `"markdownInit"` (the transitive dependency through `seo` is +gone; `dispatch` reads `state.site.seoSiteTitle`, `seoLogoUrl`, and +`linkTablesSerialized`, all written by `markdownInit`). + +#### Data flow + +``` +markdownInit [M] + ├── buildLinkTables + createMarkdownIt (existing) + └── computeSiteSeo(config, markdown) ← NEW + state.site.seoSiteTitle = ... + state.site.seoLogoUrl = ... 
+ │ + ▼ +dispatch [M] (expected: drops "seo", gains "markdownInit") + └── packs seoSiteTitle + seoLogoUrl into sharedSAB (unchanged) + │ + ▼ +render:i [W] + ├── deserialize chunk + ├── renderPhase(chunk, site) (existing) + ├── computeChunkSeo(chunk, site.seoSiteTitle, ← NEW + │ site.config, site.markdown) + ├── templatePhase(chunk, site, initData) (existing) + ├── offline derivation (existing) + ├── _pendingFlush stash (Phase 15) + └── deriveSearchEntries(chunk, site) (Phase 17) +``` + +#### Why it's safe + +- **No main-thread consumer.** The four per-page SEO fields are set + on `state.pages` before dispatch, serialized into chunks, + deserialized on workers, used by `headSeoBlock` inside + `templatePhase`, and never sent back to main. No post-dispatch + main-thread task reads them. `searchData` (after Phase 17) only + consolidates pre-derived entries on main; the heavy + `deriveSearchEntries` runs on workers and reads `renderedContent` + and `frontmatter` --- not any `seo*` field (confirmed: `search.mjs` + has zero references to `seoTitle`, `seoFullTitle`, `seoCanonical`, + or `seoIsHome`). `writePdf` reads `renderedContent` via + `bookData._chapters` refs. Neither reads any `seo*` field. + +- **Identical markdown-it instance.** The render worker's + markdown-it instance is built from the same plugin stack as the + main thread's (`seo.mjs` lines 36--45 document the equivalence). + `renderTitle` produces byte-identical output. + +- **Page data available.** Each chunk page carries + `frontmatter.title` and `permalink` --- the only per-page inputs + to the SEO computation. `site.config` (for `absoluteUrl`) and + `site.seoSiteTitle` (for the full-title composition) are already + in the shared payload. + +#### Changes + +**`seo.mjs`.** Add two exported functions. 
`precomputeSeo` +delegates to them: + +```js +export function computeSiteSeo(config, markdown) { + if (!markdown) { + throw new Error( + "computeSiteSeo requires a markdown-it instance"); + } + const seoSiteTitle = renderTitle(config.title, markdown); + const logo = config.logo; + const seoLogoUrl = logo != null + ? uriEscape(absoluteUrl(String(logo), config)) + : null; + return { seoSiteTitle, seoLogoUrl }; +} + +export function computeChunkSeo(pages, seoSiteTitle, config, + markdown) { + for (const page of pages) { + const rawTitle = page.frontmatter.title; + const seoTitle = isNonEmpty(rawTitle) + ? renderTitle(rawTitle, markdown) : seoSiteTitle; + page.seoTitle = seoTitle; + page.seoFullTitle = seoTitle === seoSiteTitle + ? seoTitle + : `${seoTitle} | ${seoSiteTitle}`; + const url = String(page.permalink); + const canonicalInput = url + .replace(/\/index\.html$/, "/") + .replace(/\.html$/, ""); + page.seoCanonical = absoluteUrl(canonicalInput, config); + page.seoIsHome = HOMEPAGE_URLS.has(url); + } +} + +export function precomputeSeo(pages, config, markdown) { + const { seoSiteTitle, seoLogoUrl } = + computeSiteSeo(config, markdown); + computeChunkSeo(pages, seoSiteTitle, config, markdown); + return { seoSiteTitle, seoLogoUrl }; +} +``` + +`precomputeSeo` becomes a convenience wrapper. The `seo` task that +called it is deleted, so it is effectively dead code --- retained for +dev tooling. + +**`tbdocs.mjs`.** Six changes: + +1. Import: replace `precomputeSeo` with `computeSiteSeo`: + ```js + import { computeSiteSeo } from "./seo.mjs"; + ``` + +2. Delete the `seo` task definition (current lines 414--425). + +3. 
Fold site-level SEO into `markdownInit.execute()`: + ```js + markdownInit: { + expected: ["discover"], + runOnMain: true, + execute(_, ctx, state) { + const linkTables = buildLinkTables(state.pages); + const baseurl = + String(state.site.config.baseurl || ""); + const staticFileSet = + new Set(state.staticFiles.map(s => s.srcRel)); + state.site.markdown = createMarkdownIt({ + highlighter: null, linkTables, baseurl, + staticFiles: staticFileSet, + }); + state.site.linkTablesSerialized = + serializeLinkTables(linkTables); + const { seoSiteTitle, seoLogoUrl } = + computeSiteSeo(state.site.config, state.site.markdown); + state.site.seoSiteTitle = seoSiteTitle; + state.site.seoLogoUrl = seoLogoUrl; + return {}; + }, + submit() {}, + }, + ``` + +4. Replace `"seo"` with `"markdownInit"` in `dispatch.expected`. + Replace the `seo: _seoSignal` destructure + `void _seoSignal` + with `markdownInit: _markdownInitSignal` + + `void _markdownInitSignal` in `dispatch.execute`: + ```js + dispatch: { + expected: ["nav", "buildInit", "buildInfo", "mermaid", + "deriveRedirects", "markdownInit"], + ... + execute({ nav: { sidebar }, + buildInit: { initData }, + buildInfo: { buildInfo }, + mermaid: { mermaidStats }, + markdownInit: _markdownInitSignal, + deriveRedirects: { stubs } }, ctx, state) { + void mermaidStats; + void _markdownInitSignal; + ... + }, + }, + ``` + The explicit edge replaces the transitive `markdownInit → seo → + dispatch` chain. `dispatch` reads `state.site.seoSiteTitle`, + `state.site.seoLogoUrl`, and `state.site.linkTablesSerialized`, + all written by `markdownInit`. + +5. Remove `seo: "Spine"` from `GANTT_SECTION`. + +6. Update the spine comment (~line 131) to remove the `→ seo` + segment. + +**`cpu-worker.mjs`.** Import `computeChunkSeo` from `./seo.mjs`. 
+In the `render` handler, call it between `renderPhase` and +`templatePhase` (before the offline derivation and the Phase 17 +`deriveSearchEntries` call that follows it): + +```js +async render(taskIdx) { + const offset = Atomics.load(views.payloadOffset, taskIdx); + const length = Atomics.load(views.payloadLength, taskIdx); + const chunk = JSON.parse( + new TextDecoder().decode( + new Uint8Array(_payloadSAB, offset, length)), + ); + + const env = _renderEnv; + + await renderPhase(chunk, env.site); + computeChunkSeo(chunk, env.site.seoSiteTitle, + env.site.config, env.site.markdown); + await templatePhase(chunk, env.site, env.initData); + + // ... offline derivation, _pendingFlush stash, deriveSearchEntries unchanged ... +} +``` + +#### Graph changes + +Before: +``` +discover → markdownInit → seo ──→ dispatch +discover → nav ──────────────────→ dispatch +``` + +After: +``` +discover → markdownInit ─────────→ dispatch +discover → nav ──────────────────→ dispatch +``` + +`seo` is removed from the static task DAG. A direct +`markdownInit → dispatch` edge replaces the two-hop +`markdownInit → seo → dispatch` chain, saving ~35 ms (the full +`seo` duration) from the critical path. + +#### Interaction with other phases + +- **Phase 11 (DECLINED).** Phase 11 was declined because `nav` and + `seo` both mutate page objects between discover and dispatch, + breaking the precondition for pre-serializing chunks during + discover. This phase removes `seo` as a mutator --- only `nav` + remains. The precondition is still not met, but the gap narrows. + If a future phase makes `nav` write its outputs to `state.site.*` + rather than mutating pages in place, Phase 11 becomes viable. + +- **Phase 15 (generic dynamic tasks --- DONE).** Phase 15 rewrote + `dispatch.submit()` to use the generic dynamic task API + (`allocDynamicSlots`, `writeTaskMeta`, `wireDynamicEdges`) and + replaced the JS `taskMeta` array with SAB-based metadata. 
+ Confirmed: `dispatch.execute()` and `dispatch.expected` were not + changed by Phase 15, so the `seo` removal (replacing one + `expected` entry with `"markdownInit"` and updating the + destructure) applies cleanly. No conflict. + +- **Phase 17 (search data on workers --- DONE).** Phase 17 added + `deriveSearchEntries` to the render handler, after render + + template + offline + the `_pendingFlush` stash. Phase 18 adds + `computeChunkSeo` earlier (between `renderPhase` and + `templatePhase`). Both are independent per-page transforms; + neither depends on the other's output. Confirmed on landed code: + `computeChunkSeo` inserts at line 132 (between `renderPhase` and + `templatePhase`); the Phase 17 `deriveSearchEntries` call is at + line 185, well after the offline block. No positional conflict. + + Phase 17 also added `state.searchChunks` pre-allocation and an + indexed assignment inside `render:i.submit()` in + `dispatch.submit()`. Phase 18 does not touch `dispatch.submit()` + --- it only edits `dispatch.execute()` and `dispatch.expected`. + No interaction. + +#### Files touched + +| File | Changes | +|---|---| +| `seo.mjs` | Add `computeSiteSeo` and `computeChunkSeo` exports; refactor `precomputeSeo` to delegate (retained as dead code for dev tooling) | +| `tbdocs.mjs` | Import `computeSiteSeo` instead of `precomputeSeo`; delete `seo` task; fold site-level SEO into `markdownInit`; replace `"seo"` with `"markdownInit"` in `dispatch.expected`; update destructure; remove `seo` from `GANTT_SECTION`; update spine comment | +| `cpu-worker.mjs` | Import `computeChunkSeo` from `./seo.mjs`; call between `renderPhase` (line 131) and `templatePhase` (line 132) in the render handler. No interaction with the Phase 17 `deriveSearchEntries` call (line 185), which runs later in the handler. | + +#### Expected savings + +Two sources: + +1. **Critical-path reduction.** The `seo` task (~35 ms) is removed + from the `markdownInit` → `dispatch` path. 
If `seo` was the + last `dispatch` dependency to complete (likely when `nav` finishes + first), `dispatch` starts ~35 ms sooner. If `nav` was the + bottleneck, the saving is the difference between the `seo` path + and the `nav` path --- still non-negative. + +2. **Reduced chunk payload.** Four fields per page (`seoTitle`, + `seoFullTitle`, `seoCanonical`, `seoIsHome`) are no longer + serialized into the chunks. At ~150 bytes/page × ~858 pages, + this saves ~130 KB from the `packPayloads` step --- both + `JSON.stringify` CPU time and `TextEncoder` throughput. + +The per-worker cost of `computeChunkSeo` is negligible: ~54 +pages/worker × ~60 μs/page ≈ 3--4 ms per worker, absorbed within +the render fan-out. + +#### Verification + +`build.bat && check.bat` clean. The `seo` entry disappears from +the timing summary and Gantt chart. `dispatch` starts ~35 ms +sooner (visible as the gap between `markdownInit` and `dispatch` +shrinking). Rendered output byte-identical --- `headSeoBlock` in +every page produces the same `<title>`, `<meta>`, canonical, and +JSON-LD. diff --git a/builder/PLAN-scheduler-offline.md b/builder/PLAN-scheduler-offline.md new file mode 100644 index 00000000..48cd6df2 --- /dev/null +++ b/builder/PLAN-scheduler-offline.md @@ -0,0 +1,320 @@ +# Move offline HTML rewrite into render workers + +Companion to [PLAN-scheduler.md](PLAN-scheduler.md). Covers moving +the CPU-bound per-page offline URL rewrite from `writeOffline` (main +thread, ~700 ms) into the render worker fan-out, so it parallelises +across all CPUs and `writeOffline` becomes I/O-only (~200 ms). + +## Motivation + +`writeOffline` is the longest single task after `write`. Profiling +shows the time is dominated by `deriveOfflinePage` — a pure-compute +function that strips SEO metadata, rewrites every URL from absolute +to page-relative, and injects the offline search setup script. The +function reads only `page.html`, `page.destPath`, a `sitePaths` Set, +resolution caches, and `baseurl`. 
All of these can be made available +to workers before dispatch, without reading from `_site/`. + +The `sitePaths` set's current dependency on `_site/assets/` (theme +file enumeration in `buildSitePaths`) is artificial — the files come +from `builder/vendor/just-the-docs/assets/` (known statically) plus +two generated CSS paths that are deterministic. + +## Model assignments + +| Phase | Model | Rationale | +|-------|--------|-----------| +| I | Sonnet | Mechanical move-and-re-export. | +| II | Sonnet | Straightforward wiring with clear instructions. | +| III | Opus | Most judgement: SAB state reconstruction, nav-cache pre-pass, render handler integration. | +| IV | Sonnet | Small, well-specified signature changes. | +| V | Sonnet | Documentation updates. | + +## Phase I: Extract `offline-rewrite.mjs` + +Create `builder/offline-rewrite.mjs` with all pure-compute rewrite +functions extracted from `offline.mjs`. This isolates worker-safe +code from the I/O + acorn-dependent code that stays on main. + +### Exports from `offline-rewrite.mjs` + +- `deriveOfflinePage`, `deriveOfflinePageCached`, `sliceNavBlock` +- `deriveOfflineCss` (used by `copyOfflineThemeAssets` on main) +- `deriveOfflineRedirect` (used by `writeOfflineRedirects` on main) +- `normalizeBaseurl`, `posixDirname`, `fileDirSegsFromRel` +- `offlineExcluded`, `fnmatchPathname` +- All internal helpers: `stripSeo`, `rewriteHtml`, + `injectSearchSetup`, `rewriteCss`, `computeRelative`, + `resolveRaw`, `buildSegs`, `decode`, `computeRelUrl`, + `getPageCache`, `escapeRegExp`, regex constants + +### New function: `buildSitePathsSync` + +Synchronous version of `buildSitePaths` that takes an explicit +`themeAssetRels` array instead of walking `_site/assets/`. 
+ +```js +function buildSitePathsSync(pages, staticFiles, excludePatterns, stubs, themeAssetRels) { + const paths = new Set(); + for (const p of pages) { + if (p.frontmatter?.layout === "book-combined") continue; + const rel = p.destPath.replaceAll("\\", "/"); + if (offlineExcluded(rel, excludePatterns)) continue; + paths.add("/" + rel); + } + for (const s of staticFiles) { + const rel = s.destRel.replaceAll("\\", "/"); + if (offlineExcluded(rel, excludePatterns)) continue; + paths.add("/" + rel); + } + for (const stub of stubs) { + const rel = stub.destPath.replaceAll("\\", "/"); + if (offlineExcluded(rel, excludePatterns)) continue; + paths.add("/" + rel); + } + for (const rel of themeAssetRels) { + if (offlineExcluded(rel, excludePatterns)) continue; + paths.add("/" + rel); + } + return paths; +} +``` + +### New function: `enumerateVendoredThemeAssets` + +Sync `readdirSync` walk of `builder/vendor/just-the-docs/assets/`. +Returns paths like `["assets/js/just-the-docs.js", +"assets/js/vendor/lunr.min.js"]`. Lives in `offline.mjs` (not +`offline-rewrite.mjs`) to keep the worker-imported module free of +`node:fs` dependencies. `dispatch.execute` imports it from +`offline.mjs`. + +### `offline.mjs` changes + +- Remove moved functions, import and re-export from + `offline-rewrite.mjs`. +- Keep all I/O functions: `writeOffline`, `buildOfflineState`, + `writeOfflinePages`, `writeOfflineRedirects`, + `copyOfflineStatics`, `copyOfflineThemeAssets`, + `setupOfflineDest`, `patchJustTheDocsJs`, `writeSearchDataJs`, + `collectThemeFiles`. +- Keep `buildSitePaths` (async, for diff-tool backward compat). + +### Verification + +`build.bat && check.bat` — byte-identical output, no behaviour +change. + +--- + +## Phase II: Expand `dispatch` and SAB payload + +### `dispatch.expected` + +Add `"mermaid"` and `"deriveRedirects"`. + +- `mermaid` ensures `state.staticFiles` includes freshly-generated + SVGs (appended in `mermaid.submit`). 
+- `deriveRedirects` provides the redirect stubs for `sitePaths`. +- Neither adds latency — both complete well before + `resolveBookChapters` (~600 ms into the build). + +### Emitter updates + +The scheduler requires every expected predecessor to emit to the +waiting task: + +- `mermaid.submit`: add `emit("dispatch", out)` +- `deriveRedirects.submit`: add `emit("dispatch", out)` + +### `dispatch.execute` + +After receiving `deriveRedirects: { stubs }`: + +1. Enumerate vendored theme assets via + `enumerateVendoredThemeAssets()`. +2. Append the two known generated-CSS paths + (`assets/css/tb-highlight.css`, + `assets/css/just-the-docs-combined.css`). +3. Call `buildSitePathsSync(state.pages, state.staticFiles, + excludePatterns, stubs, themeAssetRels)`. +4. Stash `sitePaths` on `state` for later use by `writeOffline`. +5. Compute `skipOffline` from config / CLI opts. + +### SAB payload + +Three new fields in the `shared` object: + +```js +{ + ...existing, + sitePathsArr: [...sitePaths], // ~1080 strings, ~30-50 KB + offlineExcludePatterns: [...], // from config + skipOffline: Boolean, // from --no-offline / config +} +``` + +### Verification + +Build succeeds, workers receive the expanded SAB, offline output +unchanged (workers don't use the new data yet). + +--- + +## Phase III: Worker offline rewrite + +### `cpu-worker.mjs` render handler + +Import from `offline-rewrite.mjs`: `deriveOfflinePage`, +`deriveOfflinePageCached`, `sliceNavBlock`, `normalizeBaseurl`, +`posixDirname`. + +After `templatePhase`, if `!skipOffline`: + +1. Build per-worker offline state: `new Set(sitePathsArr)`, fresh + caches, normalized baseurl. +2. Run the nav-cache pre-pass (group chunk pages by dest dir, derive + the first page per dir, cache nav block slices) — same logic as + current `writeOfflinePages` lines 207-223. +3. Set `offlineState.navCache = navCache` so + `deriveOfflinePageCached` can find it via `deps.navCache`. +4. 
Call `deriveOfflinePageCached` per writable page, storing + `offlineHtml` and `offlineMisses` on the page object. + +Return delta gains two fields: + +```js +{ destPath, renderedContent, html, offlineHtml, offlineMisses } +``` + +When `skipOffline` is true, the entire offline pass is skipped — no +Set construction, no rewriting, `offlineHtml` is `undefined`. + +### `render:i.submit` in `tbdocs.mjs` + +Merge `offlineHtml` and `offlineMisses` onto master pages alongside +the existing fields. + +### Nav-cache and cross-chunk dedup cost + +Works per-chunk. Pages in the same directory within a chunk share the +cache. Cross-chunk directories build their cache independently — +correct but slightly less efficient. The cache is an optimization, +not a correctness dependency. + +Two cache systems are affected by per-worker isolation: + +**Nav-cache** (per-directory sidebar substitution). The sidebar nav +block is ~80 KB, byte-identical across every page before rewrite. +The nav-cache runs `deriveOfflinePage` on the first page per +destination directory, stashes the pre/post-rewrite nav block, and +substitutes it directly for subsequent pages — avoiding re-running +the regex over 80 KB per page. With per-worker caches, a directory +that spans a chunk boundary gets its first-page rewrite done +independently in both chunks. Cost per extra rewrite: ~0.24 ms +(~200 ms / 837 pages, from the comment at `offline.mjs:189`). With +16 workers there are 15 chunk boundaries; worst case 15 directories +are split — 15 extra nav-block rewrites at 0.24 ms = ~4 ms total. +There are ~200 unique destination directories. Current single- +threaded nav-cache: 200 full rewrites. Per-worker: 200 + 15 = 215. + +**URL resolution caches** (`rawResolution`, `seg`, `result`). These +cache the resolved form of each unique URL so it isn't re-resolved +for a later page. With per-worker caches, each worker resolves URLs +independently. 
But the nav-cache already eliminates the dominant +source of shared URLs — the ~800 sidebar links are cached as a +block, not resolved individually. The remaining per-page body URLs +are ~5-20 links per page, many unique to that page. The common ones +(links to frequently-referenced symbols) might total ~2,000 unique +URLs across the site. Each resolution is a Set lookup + string +manipulation — ~1 us. Even if every worker re-resolves all 2,000: +16 workers x 2,000 x 1 us = ~32 ms total, spread across workers +running in parallel. + +**Net impact:** ~35 ms of redundant work total, spread across 16 +parallel workers — ~2 ms added wall-clock. Noise against the +~500 ms saved by parallelisation. + +### Transfer cost + +Roughly doubles the render delta size (adding `offlineHtml` per +page). Estimated +40 ms per worker at structured-clone throughput. +Bounded and acceptable. + +### Verification + +`page.offlineHtml` is populated on all pages. Existing `writeOffline` +still runs its own CPU path (redundant but correct). Output +identical. + +--- + +## Phase IV: Switch `writeOffline` to pre-computed HTML + +### `writeOfflinePages` in `offline.mjs` + +Add a `precomputed` option: + +- When true: skip `deriveOfflinePage` / nav-cache entirely, write + `page.offlineHtml` directly (I/O only). +- When false: existing CPU-bound path (kept for diff tools). + +### `buildOfflineState` + +Add optional `sitePaths` parameter: + +- When provided: skip the async `buildSitePaths` call (avoids the + `_site/assets/` walk). +- When absent: existing async path (for diff tools). + +### `writeOffline` task in `tbdocs.mjs` + +Pass both options: + +```js +return writeOffline(state.pages, state.staticFiles, state.site, ctx.destRoot, { + auxStats, + precomputed: true, + sitePaths: state.sitePaths, +}); +``` + +### Verification + +`build.bat && check.bat` — byte-identical offline output. + +Compare `_site-offline/` output byte-for-byte against a baseline +built before Phase I. 
The offline tree must be identical. + +Timing: `writeOffline` should drop from ~700 ms to ~200-300 ms. The +render worker times will increase modestly (~50-100 ms each) to +absorb the rewrite work. + +--- + +## Phase V: Documentation + +Update: + +- `builder/PLAN-scheduler.md` — dispatch dependencies, dataflow + diagram, render delta shape. +- `docs/Documentation/Builder.md` — offline build timing, structural + win description. +- `docs/Documentation/Pipeline-Stages.md` — `writeOffline` signature, + `offline-rewrite.mjs` exports. +- `docs/assets/images/mmd/scheduler-dag.mmd` — edges from + `mermaid` / `deriveRedirects` to `dispatch`. +- `offline.mjs` header comment — note the extraction to + `offline-rewrite.mjs`. + +--- + +## Files to modify + +| File | Changes | +|------|---------| +| `builder/offline-rewrite.mjs` | **New.** Pure-compute rewrite functions + `buildSitePathsSync`. No `node:fs` imports, so it stays worker-safe. | +| `builder/offline.mjs` | Remove moved functions, re-export from `offline-rewrite.mjs`. Add `enumerateVendoredThemeAssets` (sync `readdirSync` walk, main-thread only). Add `precomputed` path to `writeOfflinePages`. Add `sitePaths` option to `buildOfflineState`. | +| `builder/cpu-worker.mjs` | Import from `offline-rewrite.mjs`. Add offline rewrite pass after `templatePhase`. Expand return delta. | +| `builder/tbdocs.mjs` | `dispatch`: add deps, compute sitePaths, expand SAB. `render:i.submit`: merge offlineHtml. `writeOffline` task: pass `precomputed` + `sitePaths`. Emitter updates for `mermaid.submit` and `deriveRedirects.submit`. | +| `builder/sab-broadcast.mjs` | No changes — existing JSON serialize/deserialize handles the expanded payload. | diff --git a/builder/PLAN-scheduler.md b/builder/PLAN-scheduler.md new file mode 100644 index 00000000..bcad6679 --- /dev/null +++ b/builder/PLAN-scheduler.md @@ -0,0 +1,1564 @@ +# Task-graph scheduler -- design sketch + +## Current state + +The build pipeline lives in `builder/`. 
The orchestrator is +`tbdocs.mjs`'s `runBuild()`, which today is a mostly-linear sequence of +awaited async calls on the main thread, with a sprinkling of +cooperative concurrency (`Promise.all` barriers and one background +`buildInfoPromise`). There are **no worker threads**; every CPU-bound +phase blocks the main thread. + +An earlier round of parallelization work (a `CheckPool` of persistent +link-checker workers and a bespoke `render-worker.mjs` driven by a +hand-coded message protocol) was reverted; this plan replaces both +with one task-graph abstraction. + +### Key files + +| File | Role | +|---|---| +| `tbdocs.mjs` | Orchestrator: `runBuild()`, CLI parsing, summary output | +| `template.mjs` | `templatePhase()`, internal `buildInit()` | +| `render.mjs` | `renderPhase()`, `createMarkdownIt()`, `buildLinkTables()` | +| `highlight.mjs` | `initHighlighter()` -- Shiki WASM init | +| `discover.mjs` | `discover()` -- walk source tree, parse frontmatter | +| `nav.mjs` | `computeNav()` -- build nav tree from pages | +| `seo.mjs` | `precomputeSeo()` -- derive SEO titles/URLs | +| `book.mjs` | `resolveBookChapters()` -- resolve book chapter list | +| `build-info.mjs` | `captureBuildInfo()` -- git rev-parse/log | +| `scss.mjs` | `compileScss()` -- sass compilation | +| `mermaid.mjs` | `regenerateMermaid()` -- stale SVG regen via puppeteer | +| `data.mjs` | `loadData()` -- load `_book.yml` | +| `write.mjs` | `writePhase()` -- write pages + static files to `_site/` | +| `redirects.mjs` | `deriveRedirectStubs()`, `writeRedirects()` | +| `sitemap.mjs` | `deriveSitemapUrls()`, `writeSitemap()` | +| `search.mjs` | `writeSearchData()` | +| `offline.mjs` | `writeOffline()` -- produce `_site-offline/` | +| `pdf.mjs` | `writePdf()` -- produce `_site-pdf/` | +| `serve.mjs` | `runServe()` -- dev server with watcher + rebuild | + +There is no `render-worker.mjs`, no `cpu-worker.mjs`, no `CheckPool`, +no `createRenderPool()`. The build is single-threaded except for I/O. 
+ +### Current dataflow + +`runBuild()` reads top-to-bottom; the only off-main-thread work is the +git shell-outs inside `captureBuildInfo()` (launched as a background +promise and awaited later). Approximate wall-clock numbers from a +recent clean build are noted in parentheses: + +``` +mermaid (~2 ms; ~150 ms when SVGs regenerate) + ↓ +scss (~700 ms — CPU-bound on main thread) + ↓ +load _config.yml + apply CLI overrides + ↓ +buildInfoPromise = captureBuildInfo() (background: git shell-outs) + ↓ +discover (~135 ms — fs traversal + frontmatter parse) + ↓ +nav (~8 ms) + ↓ +initHighlighter (~50–100 ms — Shiki WASM init; overlaps with buildInfo) + ↓ +buildLinkTables + createMarkdownIt + precomputeSeo + loadData ++ resolveBookChapters (~110 ms together — "markdown-init" + "seo" + "book") + ↓ +await buildInfoPromise (~0 ms; usually already settled) + ↓ +renderPhase (~2700 ms — CPU-bound; cooperative Promise.all over pages) + ↓ +templatePhase (~800 ms — CPU-bound; same shape) + ↓ +writePhase + ├─ Promise.all { writePages | copyTheme | copyStaticFiles } + └─ writeGeneratedAssets (~625 ms total) + ↓ +Promise.all { writeRedirects, writeSitemap, writeSearchData } (~200 ms) + ↓ +writeOffline (~1100 ms) + ↓ +writePdf (~240 ms) +``` + +Wall-clock total is roughly **6.7 seconds**. The dominant terms are +`render` (~2.7 s), `writeOffline` (~1.1 s), `template` (~0.8 s), +`scss` (~0.7 s), and `write` (~0.6 s). + +Visible idle time on the main thread: + +- `scss` and `mermaid` run before `discover`, both serially. Neither + depends on `discover`'s output. +- `buildInfo`'s git shell-outs already overlap with the + discover/nav/markdown-init/seo chain, but everything *else* in that + chain runs serially even though `discover` → `nav` → + `markdown-init` → `seo` → `book` is the only true dependency edge. +- `writeOffline` and `writePdf` are independent of each other; both + read `_site/` (already written by `writePhase`) and write into + independent output trees. 
They run sequentially today. +- `renderPhase` and `templatePhase` are CPU-bound and block the main + thread completely. `Promise.all(pages.map(...))` is cooperative + concurrency only -- it interleaves on a single thread. + +### Data shapes and in-place mutation + +The two large structures that flow through the pipeline: + +**`pages[]`** -- array of ~857 page objects. After discover, each has: +``` +{ srcPath, srcRel, ext, frontmatter, rawContent, permalink, destPath, + layoutDefault, imageScope } +``` +Later phases mutate **in place**, adding: `navPath`, `breadcrumbs`, +`children`, `navLevels` (nav); `seoTitle`, `seoFullTitle`, +`seoCanonical`, `seoIsHome` (seo); `renderedContent` (render); +`html` (template). The current pipeline relies on this in-place +enrichment -- every consumer assumes the same page object accumulates +fields as it flows through phases. + +**`staticFiles[]`** -- array of ~214 static file descriptors: +``` +{ srcPath, srcRel, destRel, size } +``` + +**`config`** -- the parsed `_config.yml` object. Small, ~30 keys. +Read-only after initial CLI override merges. + +**`navTree`** -- array of `NavNode` objects (recursive tree). ~857 +nodes total. Only consumed by `buildInit()` → `renderSidebar()`. + +The mutation-in-place pattern matters for the scheduler design: +mutations performed on a worker's structured-clone copy do not reach +the main-thread master unless explicitly merged. See §Page deltas +below. + +## Setup + +No new runtime dependencies. The scheduler, the worker pool, and the +worker dispatcher are all in-tree code -- ~150 LOC for the scheduler, +~50 LOC for `WorkerPool`, ~30 LOC for the worker dispatcher and its +handler table. 
A general-purpose pool library like **piscina** was +considered but the project's use is narrow enough (fixed pool size, +one task per worker at a time, no recycling, no dynamic scaling, no +abort signals) that the dependency cost outweighs the saved code, and +an added dep widens the supply-chain attack surface. + +## Model + +The build is a DAG of **tasks**. Each task has a unique string ID, +takes an input map `{ [predecessorId]: output }`, produces an immutable +output, and declares which downstream tasks receive (slices of) that +output. + +The **scheduler** lives on the main thread. It tracks task +dependencies, decides what's ready, and dispatches. The worker pool +(`WorkerPool` -- a ~50 LOC in-tree class wrapping +`node:worker_threads`) handles everything below the task-graph layer: +spawning workers at construction, named dispatch, idle/busy +bookkeeping, lifecycle. + +Each task carries a `runOnMain: true` flag if it must execute on the +main thread -- for tasks that own the master `pages[]` merge, mutate +state in place, or do I/O that coordinates with main-thread state. +All other tasks run on a worker, dispatched by name. + +``` +┌─────────────────────────────────────────────────┐ +│ Main thread (scheduler) │ +│ │ +│ tasks: Map<taskId, TaskDef> │ +│ pending: Map<taskId, {expected, received}> │ +│ ready: TaskDef[] │ +│ results: Map<taskId, output> │ +│ state: SharedState (§Shared state) │ +│ │ +│ on task complete: │ +│ store result, run task.submit() to route │ +│ output to downstream tasks, check newly │ +│ ready, dispatch each to pool or main │ +└────────────┬────────────────────────────────────┘ + │ + ▼ WorkerPool ── named handlers in cpu-worker.mjs +``` + +## Task placement (main vs worker) + +Mutating a worker's local `pages[]` doesn't reach `state.pages` unless +the mutation is explicitly shipped back as a delta. The current code +mutates pages in place across nav / seo / render / template, so every +mutating step must be modeled deliberately. 
The cheap way out is: +**keep small mutating steps on the main thread** and only ship pages +across the boundary when there's a real CPU win to amortize the copy. + +The split: + +| Task | Placement | Why | +|---|---|---| +| `config` | M | Trivial fs read, no benefit to round-trip | +| `discover` | M | ~135 ms; output mutates pages[] in place | +| `nav` | M | ~8 ms; mutates pages with navPath/navLevels/breadcrumbs/children | +| `markdownInit` | M | ~63 ms; produces an `md` instance (NOT serializable -- can't cross boundary) | +| `seo` | M | ~34 ms; mutates pages with seoXxx fields | +| `loadData` | M | Trivial fs read | +| `resolveBookChapters` | M | Mutates `state.site.bookData._chapters` with refs into `state.pages` -- identity-critical | +| `buildInit` | M | Tiny; consumes navTree, produces ~50 KB of html strings | +| `deriveRedirects` | M | Pure compute, ~ms | +| `deriveSitemap` | M | Pure compute, ~ms | +| `dispatch` | M | Slices `state.pages` into render chunks; no benefit on a worker | +| `renderJoin` | M | Pure barrier | +| `write` | M | Owns `state.pages` + `state.staticFiles` reads; I/O dominated | +| `searchData` | M | ~40 ms; owns pages read | +| `writeAux` | M | Owns pages + bookData reads | +| `writeOffline` | M | I/O dominated (~1.1 s); see §Post-write tasks for the workerize-or-not call | +| `writePdf` | M | I/O dominated (~240 ms); ditto | +| `buildInfo` | **W** | Free overlap with the main spine | +| `scss` | **W** | ~700 ms -- the biggest seed-task parallelism win | +| `mermaid` | **W** | ~2 ms idle, ~150 ms when SVGs regen; runs concurrently with discover | +| `render:i` | **W** | The big win -- ~2.7 s of CPU work fans out across N cores | + +The worker side ships with four handlers: `scss`, `mermaid`, +`buildInfo`, `render`. Plus the parentPort dispatcher (~15 LOC). +Everything else is plain main-thread code wrapped in a task envelope. 
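
The worker-side dispatcher described above can be sketched roughly as follows. This is a minimal illustration, not the landed `cpu-worker.mjs` code: the function name `createDispatcher`, the request shape `{ id, handler, payload }`, and the reply shape `{ id, ok, output | error }` are all assumptions made for the example. The port is passed in as a parameter so the routing logic can be exercised without spawning a real worker.

```js
// Hypothetical sketch of a named-handler dispatcher for a worker thread.
// `createDispatcher` and the message shapes are illustrative assumptions,
// not the project's actual protocol.
function createDispatcher(handlers, port) {
  async function onMessage({ id, handler, payload }) {
    try {
      // Own-property check so inherited names such as "toString"
      // are never treated as handlers.
      if (!Object.hasOwn(handlers, handler)) {
        throw new Error(`unknown handler: ${handler}`);
      }
      const output = await handlers[handler](payload);
      port.postMessage({ id, ok: true, output });
    } catch (err) {
      // Ship only the message text back; the caller re-wraps it.
      port.postMessage({ id, ok: false, error: String(err?.message ?? err) });
    }
  }
  port.on("message", onMessage);
  return onMessage; // returned so a caller can await a single dispatch
}

// Inside a worker module the wiring would look roughly like this
// (handler names taken from the table above; bodies not shown):
//   import { parentPort } from "node:worker_threads";
//   createDispatcher({ scss, mermaid, buildInfo, render }, parentPort);
```

Keeping the handler table as a plain object means adding a fifth handler is a one-line change, and the `id` echo lets the main-thread pool match each reply to the task it dispatched.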
+ +This keeps `pages[]` from crossing the worker boundary except for the +render fan-out (which only ships per-page slices, not the master). +The seo / nav / markdown-init mutations stay on the main-thread +master directly -- no delta merge needed. + +## Target dataflow + +`[W]` = pool worker; `[M]` = main thread (`runOnMain: true`). + +``` +Seeds (concurrent): + buildInfo [W] ──────────────────────────────────────────┐ + scss [W] ──────────────────────────────────────────┤ + mermaid [W] ──────────────────────────────────────────┤ + prepDest [M] ──────────────────────────────────────────┤ + │ +Main spine (sequential, on M): │ + config │ + └─→ discover ──┬─→ deriveRedirects ─────────────────┐ │ + ├─→ deriveSitemap ─────────────────┤ │ + └─→ nav │ │ + └─→ markdownInit │ │ + ├─→ seo │ │ + │ └─→ loadData │ │ + │ └─→ resolveBookChapters + │ ↓ │ │ + └─→ buildInit │ │ │ + ↓ │ │ │ + └──────────┴─→ dispatch ◄── buildInfo, mermaid, deriveRedirects join here + │ +Render fan-out (workers, concurrent): │ + ┌────────────────────────────────────────────────────┘ + │ + render:0 [W] render:1 [W] ... render:N-1 [W] + │ + ▼ + renderJoin [M] ◄── waits for all render:i + │ +Write fence: │ scss [W], mermaid [W], prepDest [M] join here too + ▼ + write [M] ◄── reads state.pages, state.staticFiles + │ + (in parallel with write:) │ + │ + renderJoin + prepDest │ + │ │ + ▼ │ + searchData [M] │ + │ │ + └──────────────────────────────────┤ + │ + writeAux [M] ◄── derived redirects + sitemap join here too + │ + ▼ + writeOffline [M] + + (in parallel with write → ... → writeOffline:) + + renderJoin + mermaid + │ + ▼ + writePdf [M] + │ │ + └─────────────┬─────────────┘ + ▼ + done +``` + +Edges into `dispatch`: `buildInit`, `resolveBookChapters`, +`buildInfo`, `mermaid`, `deriveRedirects`. +Edges into `write`: `renderJoin`, `scss`, `mermaid`, `prepDest`. +Edges into `searchData`: `renderJoin`, `prepDest`. +Edges into `writePdf`: `renderJoin`, `mermaid`. 
+Edges into `writeAux`: `write`, `searchData`, `deriveRedirects`, `deriveSitemap`. + +Three structural wins over the serial baseline: + +1. **`scss`, `mermaid`, `buildInfo` overlap with the main spine.** The + main spine (discover → nav → markdownInit → seo → loadData → + resolveBookChapters + buildInit) takes ~250 ms total. `scss` takes + ~700 ms. The overlap saves ~250 ms of `scss` from the critical path + (not the full ~700 ms -- after the spine finishes, ~450 ms of scss + is still on the critical path until `write` runs). +2. **`render:0..N` fans out across CPUs.** Today's ~2.7 s of cooperative + render + ~0.8 s template = ~3.5 s of CPU work, all on one thread. + Across N workers this compresses to ~`3500 / N + dispatch overhead`. + On a 4-core box, ~875 ms wall-clock (saving ~2.6 s). On an 8-core + box, ~440 ms (saving ~3 s). Dispatch overhead is ~50 ms. +3. **`writeOffline` and `writePdf` overlap on async I/O.** Both stay + `runOnMain` initially -- they share the main thread for their CPU + sections but `await fs.writeFile`-style I/O windows interleave. The + gain is the shorter of their two CPU sections (~240 ms). See + §Post-write tasks for the case to workerize one of them later. + +**Mermaid → staticFiles ordering.** Under the scheduler `mermaid` and +`discover` run in parallel, so freshly-emitted SVGs aren't in +`state.staticFiles` after discover. The mermaid task's `execute()` +(running on a worker) does the full `fs.stat` for each managed SVG +and returns a list of `{ srcPath, srcRel, destRel, size }` descriptors; +the `submit()` does only a **synchronous** push into +`state.staticFiles`. Putting the stat in `submit()` would race with +downstream consumers since `submit` is called synchronously by the +scheduler and cannot await. + +## Page deltas (mutation merge pattern) + +The render fan-out is the only place where pages cross the worker +boundary. The pattern: + +- Each `render:i` task receives a chunk of pages (worker's clone). 
+- The worker mutates its local pages with `renderedContent`, `html`, and (when `!skipOffline`) `offlineHtml` + `offlineMisses`. +- The task returns a **delta**: an array of + `[{ destPath, renderedContent, html, offlineHtml, offlineMisses }]` -- only the changed fields, + keyed by `destPath`. +- `render:i.submit()` walks the delta on the main thread, looks up + each page via `state.pageByDest`, and assigns the fields onto the + master page object. + +The full pages array never crosses back across the boundary; only the +output deltas do. `state.pageByDest` is built once in +`discover.submit()`: + +```js +discover.submit(out, emit, state) { + state.pages = out.pages; + state.staticFiles = out.staticFiles; + state.site.config = out.config; + for (const p of out.pages) state.pageByDest.set(p.destPath, p); + emit("nav", out); + emit("deriveRedirects", out); + emit("deriveSitemap", out); +} +``` + +After this initial assignment, `state.pages` is mutated in place; +no task ever replaces it. `state.pageByDest` stays valid for the +whole build. + +For tasks that run on `[M]` (nav, seo, etc.), the mutation is direct +on `state.pages` -- no delta needed. + +## Task definition + +Each task is a plain object: + +- **`expected`**: array of predecessor task IDs. The scheduler runs + the task only when every expected ID has submitted its output. An + empty array means a seed task (dispatchable immediately). +- **`handler`** *(optional, worker tasks)*: the worker dispatcher's + named handler. Defaults to the task's own ID. Used so multiple task + IDs can share one worker function + (e.g. `render:0`, `render:1`, ... → `"render"`). +- **`runOnMain: true`** *(optional)*: execute on the main thread + instead of dispatching to the pool. The `execute()` function + receives `(inputs, ctx, state)` -- where `state` is the + `SharedState` instance -- and may mutate it. +- **`execute(inputs, ctx [, state])`**: runs the task body. On a + worker, runs as the dispatch table's named handler. 
On main, runs + synchronously through the scheduler. Returns an output value. +- **`submit(output, emit [, state, scheduler])`**: runs **synchronously** + on the main thread after `execute` resolves. Calls + `emit(targetTaskId, dataSlice)` to route (slices of) the output to + downstream tasks. May mutate `state`. May not perform async work -- + see §Scheduler core. The optional `scheduler` arg is used only by + tasks that dynamically register downstream tasks (see `dispatch`). + +Representative task defs: + +```js +const TASKS = { + config: { + expected: [], + runOnMain: true, + async execute(_, ctx) { + const text = await fs.readFile( + path.join(ctx.srcRoot, "_config.yml"), "utf8"); + const config = yaml.load(text); + if (ctx.opts.baseurl != null) config.baseurl = ctx.opts.baseurl; + if (ctx.opts.url != null) config.url = ctx.opts.url; + return { config }; + }, + submit(out, emit) { emit("discover", out); }, + }, + + buildInfo: { + expected: [], + async execute() { return { buildInfo: await captureBuildInfo() }; }, + submit(out, emit) { emit("dispatch", out); }, + }, + + scss: { + expected: [], + async execute(_, ctx) { return { scssResult: await compileScss(ctx.srcRoot) }; }, + submit(out, emit) { emit("write", out); }, + }, + + mermaid: { + expected: [], + async execute(_, ctx) { + // The worker stats every managed SVG and returns full descriptors. + // Stat-in-submit on main would race with downstream readers. + const stats = await regenerateMermaid(ctx.srcRoot); + // stats.svgFiles: [{ srcPath, srcRel, destRel, size }, ...] + return { mermaidStats: stats }; + }, + submit(out, emit, state) { + const known = new Set(state.staticFiles.map((f) => f.srcRel)); + for (const f of out.mermaidStats.svgFiles ?? 
[]) { + if (!known.has(f.srcRel)) state.staticFiles.push(f); + } + emit("write", out); + emit("dispatch", out); + }, + }, + + discover: { + expected: ["config"], + runOnMain: true, + async execute({ config: { config } }, ctx) { + const { pages, staticFiles } = await discover( + ctx.srcRoot, config.exclude ?? []); + return { pages, staticFiles, config }; + }, + submit(out, emit, state) { + state.pages = out.pages; + state.staticFiles = out.staticFiles; + state.site.config = out.config; + for (const p of out.pages) state.pageByDest.set(p.destPath, p); + emit("nav", out); + emit("deriveRedirects", out); + emit("deriveSitemap", out); + }, + }, + + nav: { + expected: ["discover"], + runOnMain: true, + execute(_, ctx, state) { + const { navTree } = computeNav(state.pages, state.site.config); + state.site.navTree = navTree; + return {}; // mutates state in place + }, + submit(_, emit) { + emit("markdownInit", {}); + emit("buildInit", {}); + }, + }, + + buildInit: { + expected: ["nav"], + runOnMain: true, + execute(_, ctx, state) { + // buildInit() takes site.config + site.navTree; returns the + // ~50 KB of pre-rendered sidebar + header + svg-sprite HTML used + // by templatePhase. + return { initData: buildInitFn(state.site) }; + }, + submit(out, emit) { emit("dispatch", out); }, + }, + + markdownInit: { + expected: ["nav"], + runOnMain: true, + async execute(_, ctx, state) { + // Main's own initHighlighter cache -- workers maintain theirs + // independently. Both call paths converge on the same Shiki + // initialisation work, but the singletons are per-thread. 
+ const highlighter = await initHighlighter(); + const linkTables = buildLinkTables(state.pages); + const baseurl = String(state.site.config.baseurl || ""); + const staticFileSet = new Set(state.staticFiles.map(s => s.srcRel)); + state.site.highlighter = highlighter; // write reads .themeCss from here + state.site.markdown = createMarkdownIt({ + highlighter, linkTables, baseurl, staticFiles: staticFileSet, + }); + // linkTables travels to render workers as a serialized payload. + state.site.linkTablesSerialized = serializeLinkTables(linkTables); + return {}; + }, + submit(_, emit) { + emit("seo", {}); + emit("loadData", {}); + }, + }, + + seo: { + expected: ["markdownInit"], + runOnMain: true, + execute(_, ctx, state) { + const { seoSiteTitle, seoLogoUrl } = precomputeSeo( + state.pages, state.site.config, state.site.markdown); + state.site.seoSiteTitle = seoSiteTitle; + state.site.seoLogoUrl = seoLogoUrl; + return {}; + }, + submit(_, emit) { emit("resolveBookChapters", {}); }, + }, + + loadData: { + expected: ["markdownInit"], + runOnMain: true, + async execute(_, ctx, state) { + const data = await loadData(ctx.srcRoot); + state.site.data = data; + state.site.bookData = data.book ?? null; + return {}; + }, + submit(_, emit) { emit("resolveBookChapters", {}); }, + }, + + resolveBookChapters: { + expected: ["seo", "loadData"], + runOnMain: true, + execute(_, ctx, state) { + // Mutates state.site.bookData with _chapters arrays whose + // entries are refs into state.pages. Identity-critical: + // render:i.submit() merges renderedContent into those same + // page objects, so writePdf later sees the rendered bodies + // via bookData._chapters[i].renderedContent. 
+ resolveBookChapters(state.site.bookData, state.pages); + return {}; + }, + submit(_, emit) { emit("dispatch", {}); }, + }, + + deriveRedirects: { + expected: ["discover"], + runOnMain: true, + execute(_, ctx, state) { + // redirects.mjs's deriveRedirectStubs uses a layout-based + // filter (layout !== "book-combined") rather than checking + // page.html, so the derive can run before template. + return { stubs: deriveRedirectStubs(state.pages, state.site) }; + }, + submit(out, emit) { + emit("writeAux", out); + emit("dispatch", out); + }, + }, + + deriveSitemap: { + expected: ["discover"], + runOnMain: true, + execute(_, ctx, state) { + return { urls: deriveSitemapUrls(state.pages, state.site) }; + }, + submit(out, emit) { emit("writeAux", out); }, + }, + + dispatch: { + expected: ["buildInit", "resolveBookChapters", "buildInfo", "mermaid", "deriveRedirects"], + runOnMain: true, + execute({ buildInit: { initData }, buildInfo: { buildInfo }, deriveRedirects: { stubs } }, ctx, state) { + // Read pages directly from state.pages -- main-thread access, + // no need to ship them through the input map. + const chunks = chunkPages(state.pages, ctx.workerCount); + const excludePatterns = state.site.config.offline_exclude ?? 
[]; + const skipOffline = /* from config / CLI opts */ false; + const sitePaths = buildSitePathsSync( + state.pages, state.staticFiles, excludePatterns, stubs, + enumerateVendoredThemeAssets()); + state.sitePaths = sitePaths; + + const shared = { + siteData: { + config: state.site.config, + seoSiteTitle: state.site.seoSiteTitle, + seoLogoUrl: state.site.seoLogoUrl, + }, + initData, buildInfo, + linkTablesData: state.site.linkTablesSerialized, + staticFilesArr: state.staticFiles.map(f => f.srcRel), + baseurl: String(state.site.config.baseurl || ""), + sitePathsArr: [...sitePaths], + offlineExcludePatterns: excludePatterns, + skipOffline: Boolean(skipOffline), + }; + // Pack the shared payload into a SharedArrayBuffer so each + // postMessage sends a SAB reference (shared memory) instead of + // structured-cloning ~310--330 KB per worker. + const sharedSAB = packShared(shared); + return { chunks, sharedSAB }; + }, + submit(out, emit, _state, scheduler) { + const N = out.chunks.length; + + // Register the barrier with the dynamic predecessor count. + // write declares "renderJoin" statically; emit() looks up + // pending entries by id, not by source, so the static + // declaration is satisfied as soon as renderJoin submits. 
+ scheduler.register("renderJoin", { + expected: Array.from({ length: N }, (_, i) => `render:${i}`), + runOnMain: true, + execute() { return {}; }, + submit(_, emit) { emit("write", {}); }, + }); + + for (let i = 0; i < N; i++) { + const id = `render:${i}`; + scheduler.register(id, { + expected: [], + handler: "render", + submit(renderOut, emit, state) { + for (const r of renderOut) { + const p = state.pageByDest.get(r.destPath); + if (!p) continue; + p.renderedContent = r.renderedContent; + if (r.html !== undefined) p.html = r.html; + if (r.offlineHtml !== undefined) p.offlineHtml = r.offlineHtml; + if (r.offlineMisses !== undefined) p.offlineMisses = r.offlineMisses; + } + emit("renderJoin", renderOut); + }, + }); + scheduler.seed(id, { + sharedSAB: out.sharedSAB, + chunk: out.chunks[i], + }); + } + }, + }, +}; +``` + +`chunkPages` rounds up to keep all chunks non-empty when there are +fewer pages than workers (e.g. dry-run paths or future incremental +builds): + +```js +function chunkPages(pages, workers) { + const n = Math.min(workers, pages.length); // never more chunks than pages + if (n === 0) return []; + const size = Math.ceil(pages.length / n); + const chunks = []; + for (let i = 0; i < pages.length; i += size) chunks.push(pages.slice(i, i + size)); + return chunks; +} +``` + +Two non-obvious bits in `dispatch.submit`: + +1. **Dynamic registration.** `dispatch` doesn't know N at definition + time, so it calls `scheduler.register(taskId, def)` per chunk plus + one for `renderJoin`. +2. **Why `renderJoin` exists at all.** Each `render:i.submit()` already + emits into the page-deltas merge, and could emit directly to + `write`. But `write.expected` is declared statically with + `["renderJoin", "scss", "mermaid", "prepDest"]` -- mutating it from + `dispatch.submit` to add the N dynamic render predecessors would be + awkward. The barrier is the cleaner expression: register it once + with the right count, let write keep its static `expected`. 
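The barrier argument above can be illustrated with a stripped-down model of the `pending` bookkeeping. This is a sketch only: `MiniScheduler` and its `fired` list are hypothetical stand-ins, not the real `Scheduler`, and the task bodies are omitted. It assumes, as the plan describes, that `write` keeps a static four-entry `expected` while `renderJoin` is registered with N dynamic predecessors.

```js
// Hypothetical miniature of the emit/pending bookkeeping; not the real Scheduler.
class MiniScheduler {
  constructor() {
    this.pending = new Map(); // id -> { expected: count, received: Map(sourceId -> data) }
    this.fired = [];          // ids, in the order they became ready
  }

  register(id, expected) {
    this.pending.set(id, { expected: expected.length, received: new Map() });
  }

  emit(targetId, data, sourceId) {
    const entry = this.pending.get(targetId);
    if (!entry) throw new Error(`unknown or already-dispatched task: ${targetId}`);
    entry.received.set(sourceId, data);
    if (entry.received.size === entry.expected) {
      this.pending.delete(targetId);
      this.fired.push(targetId);
    }
  }
}

const N = 3; // number of render chunks, known only at dispatch time
const s = new MiniScheduler();

// Static declaration: write never learns about the individual render:i tasks.
s.register("write", ["renderJoin", "scss", "mermaid", "prepDest"]);
// Dynamic declaration: the barrier carries the N-way predecessor count.
s.register("renderJoin", Array.from({ length: N }, (_, i) => `render:${i}`));

for (const src of ["scss", "mermaid", "prepDest"]) s.emit("write", {}, src);
for (let i = 0; i < N; i++) s.emit("renderJoin", [], `render:${i}`);

// renderJoin is ready at this point; its submit() forwards one emit to write.
s.emit("write", {}, "renderJoin");

console.log(s.fired); // [ 'renderJoin', 'write' ]
```

Changing N only changes the `renderJoin` registration; the `write` entry is untouched, which is the property the barrier is meant to buy.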
+ +## Shared state + +```js +class SharedState { + pages = []; // master copy; mutated in place by [M] tasks and by render delta merges + staticFiles = []; // master copy; mermaid.submit appends new SVG descriptors + site = {}; // config, navTree, seoSiteTitle, seoLogoUrl, bookData, data, markdown, ... + pageByDest = new Map(); // destPath → page; built once in discover.submit +} +``` + +**After the initial discover.submit assignment, `state.pages` is +never replaced** -- only mutated in place. Every phase that adds +fields to pages does so on the same object identities, which is what +keeps `bookData._chapters` refs (set by `resolveBookChapters`) +pointing at the rendered pages by the time `writePdf` walks them. + +Worker tasks receive structured-clone snapshots of whatever input they +need -- they cannot see the master and cannot mutate it. Their +`submit()` runs on the main thread, where it merges the worker's +output (a delta keyed by `destPath` for page mutations) into `state`. + +This is the explicit form of what today's `runBuild` does implicitly +through closure mutation. Making it explicit lets `serve.mjs` re-use +the scheduler across rebuilds without leaking state, and gives +post-write tasks a clean read path. + +## Scheduler core + +The scheduler is a thin coordinator. The pool is constructed externally +and passed in. 
+ +```js +class Scheduler { + constructor({ pool, tasks }) { + this.pool = pool; // WorkerPool instance + this.tasks = new Map(Object.entries(tasks)); + this.pending = new Map(); + this.ready = []; + this.results = new Map(); + this.timings = new Map(); + this.state = new SharedState(); + this.inFlight = 0; + [this._doneP, this._doneResolve, this._doneReject] = deferred(); + for (const [id, def] of this.tasks) this._initPending(id, def); + } + + _initPending(id, def) { + this.pending.set(id, { expected: def.expected.length, received: new Map() }); + } + + register(id, def) { this.tasks.set(id, def); this._initPending(id, def); } + + // Seed a freshly-registered task directly (used by dispatch.submit + // to feed each render:i its chunk without going through emit()). + seed(id, inputs) { + const def = this.tasks.get(id); + this.pending.delete(id); + this.ready.push({ id, def, inputs }); + this._flush(); + } + + emit(targetId, data, sourceId) { + const entry = this.pending.get(targetId); + if (!entry) throw new Error(`unknown or already-dispatched task: ${targetId}`); + entry.received.set(sourceId, data); + if (entry.received.size === entry.expected) { + this.pending.delete(targetId); + const def = this.tasks.get(targetId); + this.ready.push({ id: targetId, def, inputs: Object.fromEntries(entry.received) }); + this._flush(); + } + } + + async start(ctx) { + this._ctx = ctx; + for (const [id, def] of this.tasks) { + if (def.expected.length !== 0) continue; + // Seeds leave `pending` as they are dispatched; otherwise the + // `pending.size === 0` completion check in _onDone never passes. + this.pending.delete(id); + this.ready.push({ id, def, inputs: {} }); + } + this._flush(); + return this._doneP; + } + + _flush() { + while (this.ready.length > 0) this._run(this.ready.shift()); + } + + _run(task) { + const start = Date.now(); + this.inFlight++; + const p = task.def.runOnMain + ? Promise.resolve(task.def.execute(task.inputs, this._ctx, this.state)) + : this.pool.run({ inputs: task.inputs, ctx: this._ctx }, + { name: task.def.handler ??
task.id }); + p.then( + (output) => this._onDone(task, output, start), + (err) => this._onError(task, err), + ); + } + + _onDone(task, output, start) { + this.timings.set(task.id, { start, end: Date.now() }); + this.results.set(task.id, output); + this.inFlight--; + // submit() is invoked synchronously. It must not return a Promise + // (or, if it does, must not race with the emits it makes). Async + // work belongs in execute(). + task.def.submit( + output, + (tgt, data) => this.emit(tgt, data, task.id), + this.state, + this, + ); + if (this.inFlight === 0 && this.ready.length === 0 && this.pending.size === 0) { + this._doneResolve(this.results); + } + } + + _onError(task, err) { + this._doneReject(new Error(`task ${task.id} failed`, { cause: err })); + } + + summary() { + return [...this.timings.entries()] + .sort((a, b) => a[1].start - b[1].start) + .map(([id, { start, end }]) => `${id}=${end - start}ms`) + .join(" "); + } +} + +function deferred() { + let res, rej; + const p = new Promise((r1, r2) => { res = r1; rej = r2; }); + return [p, res, rej]; +} +``` + +The `WorkerPool` instance is constructed by `runBuild()` and injected +into the scheduler; the scheduler never sees `worker_threads` +directly. + +## Worker pool + +A minimal pool over `node:worker_threads`. One file, ~50 LOC. Spawns +`size` workers eagerly at construction (so WASM warmup overlaps with +seed-task work; see §Boot sequence), routes named tasks to whichever +worker is idle, queues the rest. No dynamic scaling, no recycling. 
+ +```js +// builder/worker-pool.mjs + +import { Worker } from "node:worker_threads"; + +export class WorkerPool { + constructor(size, workerUrl) { + this._workerUrl = workerUrl; + this._idle = []; // Worker[] + this._busy = new Map(); // Worker → { resolve, reject } + this._queue = []; // pending { message, transferList, resolve, reject } + this._workers = Array.from({ length: size }, () => this._spawn()); + } + + _spawn() { + const w = new Worker(this._workerUrl); + w.on("message", (msg) => { + const entry = this._busy.get(w); + if (!entry) return; // ignore late messages + this._busy.delete(w); + this._idle.push(w); + if (msg.error) entry.reject(Object.assign(new Error(msg.error), { stack: msg.stack })); + else entry.resolve(msg.result); + this._drain(); + }); + w.on("error", (err) => { + // Worker crash: reject the in-flight task. The dead worker + // stays in this._workers (won't respawn -- see §Worker death + // policy) so the pool degrades to size-1 for the rest of the + // run. For a one-shot build, the resulting task rejection + // aborts via the scheduler's _onError path. + const entry = this._busy.get(w); + if (entry) { this._busy.delete(w); entry.reject(err); } + }); + this._idle.push(w); + return w; + } + + run(payload, { name, transferList } = {}) { + return new Promise((resolve, reject) => { + this._queue.push({ + message: { name, ...payload }, + transferList, + resolve, reject, + }); + this._drain(); + }); + } + + _drain() { + while (this._queue.length && this._idle.length) { + const w = this._idle.shift(); + const { message, transferList, resolve, reject } = this._queue.shift(); + this._busy.set(w, { resolve, reject }); + w.postMessage(message, transferList); + } + } + + destroy() { + return Promise.all(this._workers.map(w => w.terminate())); + } +} +``` + +What we explicitly do **not** support, vs. 
a general-purpose pool: +dynamic resizing, per-worker concurrency above 1, worker recycling +after N tasks, abort signals, task-priority queues, utilization +histograms. Each is real complexity we don't need. + +### Worker death policy + +If a worker crashes mid-task, `w.on("error")` rejects the in-flight +task and removes the worker's entry from `_busy`. The crashed worker +is NOT respawned; `_workers[]` still lists it for `destroy()` +(terminate is idempotent on a dead worker), but it never returns to +`_idle`. The pool effectively shrinks by one for the remainder of the +run. + +For a one-shot `runBuild`, the rejected task surfaces through the +scheduler's `_onError` → `_doneReject()` and `runBuild` aborts. Fine +as-is. + +For `serve` mode, a crash permanently degrades the long-lived pool. +The current policy is "tell the user to restart serve"; respawn-on- +error is a follow-up if it ever happens in practice. + +### Worker spawn cost + +`new Worker(url)` costs ~50-100 ms per worker. We spawn +`os.availableParallelism()` of them at construction; they spawn +concurrently, but concurrent startup adds contention -- realistically +~100-200 ms before any task can `postMessage` to a free worker. This +is a one-shot cost in `runBuild`; for `serve` mode it amortizes to +zero across rebuilds. Weigh it against Phase 1's expected savings: +the ~250 ms scss overlap is partly eaten by the ~100-200 ms worker +boot. + +## Worker + +Single file with named handlers in a dispatch table. The pool sends +`{ name, ...payload }`; the worker routes to the right handler and +posts back `{ result }` or `{ error, stack }`. Four handlers total +(`scss`, `mermaid`, `buildInfo`, `render`) plus the ~15 LOC dispatcher.
+ +```js +// builder/cpu-worker.mjs + +import { parentPort } from "node:worker_threads"; + +import { initHighlighter } from "./highlight.mjs"; +import { compileScss } from "./scss.mjs"; +import { regenerateMermaid } from "./mermaid.mjs"; +import { captureBuildInfo } from "./build-info.mjs"; +import { + createMarkdownIt, + buildLinkTables, + renderPhase, +} from "./render.mjs"; +import { templatePhase } from "./template.mjs"; +import { unpackShared } from "./sab-broadcast.mjs"; + +// Start WASM init immediately, do NOT await. The module finishes +// loading synchronously so the parentPort.on('message') dispatcher is +// installed before the pool sends any work. Only the `render` handler +// awaits highlighterP. +const highlighterP = initHighlighter(); + +const handlers = { + async scss({ ctx }) { + return { scssResult: await compileScss(ctx.srcRoot) }; + }, + + async mermaid({ ctx }) { + return { mermaidStats: await regenerateMermaid(ctx.srcRoot) }; + }, + + async buildInfo() { + return { buildInfo: await captureBuildInfo() }; + }, + + async render({ inputs }) { + const { sharedSAB, chunk } = inputs; + const { siteData, initData, linkTablesData, staticFilesArr, + baseurl, buildInfo } = unpackShared(sharedSAB); + + const highlighter = await highlighterP; + const linkTables = reconstructLinkTables(linkTablesData); + const staticFiles = new Set(staticFilesArr); + const markdown = createMarkdownIt({ highlighter, linkTables, baseurl, staticFiles }); + + const site = { ...siteData, markdown, buildInfo }; + await renderPhase(chunk, site); + await templatePhase(chunk, site, initData); + + // book-combined pages have renderedContent but no html (Phase 8 + // handles them from renderedContent); send html: undefined for those. + // offlineHtml and offlineMisses are undefined when skipOffline is true. 
+ return chunk.map(p => ({ + destPath: p.destPath, + renderedContent: p.renderedContent, + html: p.html, + offlineHtml: p.offlineHtml, + offlineMisses: p.offlineMisses, + })); + }, +}; + +parentPort.on("message", async (msg) => { + const { name, ...payload } = msg; + const handler = handlers[name]; + if (!handler) { + parentPort.postMessage({ error: `unknown task: ${name}` }); + return; + } + try { + const result = await handler(payload); + parentPort.postMessage({ result }); + } catch (err) { + parentPort.postMessage({ error: err.message, stack: err.stack }); + } +}); + +// linkTables values are page objects in the main pipeline, but +// resolveLink() in the relative-links plugin only reads .permalink. +// The serialized form ships [key, permalink] pairs; we reconstruct +// minimal { permalink } stubs in the worker. +function reconstructLinkTables({ byPath, byUrl, byRedirect }) { + const make = (pairs) => new Map(pairs.map(([k, pl]) => [k, { permalink: pl }])); + return { byPath: make(byPath), byUrl: make(byUrl), byRedirect: make(byRedirect) }; +} +``` + +The matching `serializeLinkTables` lives in `render.mjs` next to +`buildLinkTables` and is called from `markdownInit.execute()` on main: + +```js +// builder/render.mjs +export function serializeLinkTables(lt) { + const pairs = (m) => [...m.entries()].map(([k, p]) => [k, p.permalink]); + return { byPath: pairs(lt.byPath), byUrl: pairs(lt.byUrl), byRedirect: pairs(lt.byRedirect) }; +} +``` + +**`buildInit` export from template.mjs.** The `markdownInit` / +`buildInit` tasks both run on main; they call `template.mjs`'s +internal `buildInit()` helper, which is currently file-local. Phase 0 +of the migration re-exports it as `buildInitFn` (renamed to avoid +shadowing the task ID inside `tbdocs.mjs`). Today's `templatePhase()` +still calls the local function directly; the export adds no overhead. 
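As a quick check on the `serializeLinkTables` / `reconstructLinkTables` pair shown earlier in this section, the round trip can be sketched as below. The two functions are copied from the plan; the `intro` page object, its keys, and the use of `structuredClone` as a stand-in for the `postMessage` hop are assumptions made for illustration.

```js
// Copied from the plan: main-side serializer and worker-side reconstructor.
function serializeLinkTables(lt) {
  const pairs = (m) => [...m.entries()].map(([k, p]) => [k, p.permalink]);
  return { byPath: pairs(lt.byPath), byUrl: pairs(lt.byUrl), byRedirect: pairs(lt.byRedirect) };
}

function reconstructLinkTables({ byPath, byUrl, byRedirect }) {
  const make = (pairs) => new Map(pairs.map(([k, pl]) => [k, { permalink: pl }]));
  return { byPath: make(byPath), byUrl: make(byUrl), byRedirect: make(byRedirect) };
}

// Hypothetical page object; real pages carry many more fields.
const intro = { permalink: "/intro/", rawContent: "# Intro", frontmatter: { title: "Intro" } };
const linkTables = {
  byPath: new Map([["docs/intro.md", intro]]),
  byUrl: new Map([["/intro/", intro]]),
  byRedirect: new Map([["/old-intro/", intro]]),
};

// structuredClone stands in for the structured-clone hop across the worker boundary.
const wire = structuredClone(serializeLinkTables(linkTables));
const rebuilt = reconstructLinkTables(wire);

console.log(wire.byPath);                          // [ [ 'docs/intro.md', '/intro/' ] ]
console.log(rebuilt.byPath.get("docs/intro.md"));  // { permalink: '/intro/' }
```

Only `permalink` survives the trip, and the stubs in the three maps are distinct objects rather than one shared page. That is acceptable only because, per the plan, `resolveLink()` reads nothing but `.permalink`.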
+ +## Boot sequence (WASM init) + +Two independent `initHighlighter()` invocations: + +- **Main thread.** `markdownInit.execute()` awaits `initHighlighter()` + to build the shared markdown-it instance used by seo and (indirectly) + by `book.mjs`'s subtitle/intro rendering during `assembleBook`. The + cached singleton lives in `highlight.mjs`'s `cached` module-level + variable on the main thread. +- **Each worker.** `cpu-worker.mjs` calls `initHighlighter()` at module + scope without awaiting. Module evaluation finishes synchronously, so + `parentPort.on("message")` is installed before the pool dispatches. + Only the `render` handler awaits the promise. The `scss`, `mermaid`, + `buildInfo` handlers don't need a highlighter -- their workers can + service tasks while Shiki is still loading. + +The two contexts each have their own cached singleton. Total WASM +init cost is paid once per thread (main + N workers), all in parallel, +overlapping with worker spawn and the main spine. + +## Data transfer strategy + +### Small outputs (config, navTree, initData, buildInfo, scssResult) +Structured clone via `postMessage`. Negligible cost (< 1 ms). + +### `linkTables` (medium, ~857 entries × ~3 keys) +Serialized once on main inside `markdownInit.execute()` to +`linkTablesData` ([key, permalink] pairs, ~50 KB). Shipped to each +render worker via `dispatch`'s output; each worker reconstructs +minimal `{ permalink }` stubs. + +### Render chunk (medium-large, ~857/N pages including `rawContent`) +The biggest single transfer. The `dispatch` output's `chunks[i]` +contains roughly `pages.length / N` page objects with `rawContent` +attached. On a 4-worker box: ~215 pages × ~4 KB raw + frontmatter = +~860 KB per chunk × 4 workers = ~3.4 MB total ship-out. The deltas +returned are ~30 KB per worker (just destPath + renderedContent + +html). Two crossings per chunk; one-way ~3.4 MB total at chunk send, +much smaller at delta return. 
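The chunk-size arithmetic above can be reproduced with the plan's own `chunkPages`. This is a back-of-envelope sketch: the 857-page count and the ~4 KB-per-page figure are the plan's estimates, and the placeholder page objects are hypothetical.

```js
// Copied from the plan.
function chunkPages(pages, workers) {
  const n = Math.min(workers, pages.length); // never more chunks than pages
  if (n === 0) return [];
  const size = Math.ceil(pages.length / n);
  const chunks = [];
  for (let i = 0; i < pages.length; i += size) chunks.push(pages.slice(i, i + size));
  return chunks;
}

const PAGE_COUNT = 857;  // baseline page count cited in the plan
const WORKERS = 4;       // the "4-worker box" from the estimate
const KB_PER_PAGE = 4;   // rough raw + frontmatter size per page (plan's estimate)

// Placeholder pages; only the array length matters here.
const pages = Array.from({ length: PAGE_COUNT }, (_, i) => ({ destPath: `page-${i}.html` }));

const sizes = chunkPages(pages, WORKERS).map((c) => c.length);
const perChunkKB = sizes.map((n) => n * KB_PER_PAGE);
const totalKB = perChunkKB.reduce((a, b) => a + b, 0);

console.log(sizes);      // [ 215, 215, 215, 212 ]
console.log(perChunkKB); // [ 860, 860, 860, 848 ]
console.log(totalKB);    // 3428, i.e. roughly 3.4 MB shipped out in total
```

The last chunk is slightly smaller because `Math.ceil` rounds the chunk size up, so the "~860 KB per chunk" figure is an upper bound for each worker.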
+ +### Mutations that stay on the main thread +nav / seo / loadData / resolveBookChapters / buildInit run on main +against `state.pages` directly. No marshalling, no delta merge -- +mutations are immediately visible to downstream main-thread tasks. + +### SharedArrayBuffer broadcast (Phase 4) +The render fan-out's shared payload (`siteData + initData + +linkTablesData + staticFilesArr + baseurl + buildInfo + +sitePathsArr + offlineExcludePatterns + skipOffline`, ~310--330 KB) +is JSON-serialized once on the main thread into a SharedArrayBuffer +via `sab-broadcast.mjs`'s `packShared()`. Each render task receives +the SAB reference (shared memory, not cloned) alongside its per-worker +chunk. Workers call `unpackShared()` to deserialize independently and +in parallel; each builds a `new Set(sitePathsArr)` to drive the +inline offline URL rewrite. Measured saving at Phase 4 baseline +(~286 KB, pre-offline fields): ~8 ms per build (fan-out drops from +~19 ms to ~9 ms). + +## Error handling + +Four severity levels: + +1. **Fatal** (task throws): `_onError` rejects `_doneP`. The + orchestrator catches, prints, exits 1. Matches today's behavior + for nav integrity failures, unsupported layouts, redirect + collisions. The pool's outstanding work is implicitly cancelled + when the orchestrator calls `pool.destroy()` during shutdown. + +2. **Degraded** (task sets a flag): the task returns normally with a + `{ failed: true }` field in its output. Downstream tasks receive + the output (write still needs `scssResult` even if compilation + failed -- it just skips emitting the generated asset). After + `_doneP` resolves, the orchestrator checks results for degraded + flags and sets `process.exitCode`. Matches today's mermaid / scss + behavior. Applies symmetrically to both seed tasks: a sass + compile error sets `scssResult.failed`, a mermaid render error + sets `mermaidStats.failed`. + +3. **Setup skip** (puppeteer / sass missing): task returns + `{ setupSkipped: true }`.
Downstream tasks see existing-on-disk + artifacts (mermaid: prior SVGs; scss: nothing emitted, but the + theme tree's hand-extracted CSS still applies). Not an error. + +4. **Worker death** (worker crashes, OOM, native segfault): the + pool's `w.on("error")` rejects the in-flight task; the rejection + surfaces through `_onError` as Fatal above. The dead worker is + not respawned (see §Worker death policy). + +## Serve / watch mode + +The pool is constructed in `serve.mjs`'s long-lived process and +re-used across rebuilds; only the `Scheduler` instance (and its +`SharedState`) is fresh per rebuild. Workers stay warm -- WASM, JIT, +module cache all survive. The worker spawn cost is paid once, at +`runServe()` startup. + +```js +// serve.mjs (sketch) +const pool = new WorkerPool(os.availableParallelism(), CPU_WORKER_URL); + +watcher.on("change", debounce(async () => { + const scheduler = new Scheduler({ pool, tasks: TASKS }); + await scheduler.start(ctx); +}, 100)); +``` + +Incremental invalidation (rebuild only changed tasks) is a much later +phase; defer. + +## Link checks (out of scope) + +The link-checker passes (`scripts/check_links.mjs`) currently run +**outside** the build, via `check.bat`. The earlier `CheckPool` +worker_threads integration in `tbdocs.mjs` has been removed; this plan +inherits that decision and does **not** re-integrate link-checking +into the scheduler. + +The scheduler design accommodates checks cleanly as `runOnMain: true` +tasks that delegate to a `CheckPool` instance passed in via `ctx`, so +the integration can be re-added as a follow-up phase if desired. 
The +shape would be: + +```js +checkOnline: { + expected: ["writeAux"], // _site/ must be fully written + runOnMain: true, + async execute(_, ctx) { + const r = await ctx.checkPool.run(buildCheckArgv("online", ctx.destRoot, ...)); + return { name: "online", ...r }; + }, + submit() { /* terminal */ }, +}, +``` + +But the initial scheduler migration treats `check.bat` as the +canonical post-build verifier and lands without touching it. + +## Post-write tasks + +The DAG nodes downstream of `render` and `scss`/`mermaid`. All +`write`-family tasks are `runOnMain: true` -- they own the master +`pages[]` and `state.site` reads; their CPU sections are short +relative to their I/O. + +```js +writePdf: { + expected: ["renderJoin", "mermaid"], + runOnMain: true, + // Sources CSS directly: tb-highlight.css from state.site.highlighter, + // print.css from staticFiles. No dependency on write or _site/. +}, + +write: { + expected: ["renderJoin", "scss", "mermaid", "prepDest"], + runOnMain: true, + async execute({ scss: { scssResult }, mermaid: { mermaidStats } }, ctx, state) { + // render delta merges already happened in each render:i.submit(). + // mermaid.submit() already appended new SVG descriptors to + // state.staticFiles synchronously. + const generatedAssets = []; + if (state.site.highlighter?.themeCss) generatedAssets.push(/* tb-highlight.css */); + if (scssResult.compiled) generatedAssets.push(/* just-the-docs-combined.css */); + return writePhase(state.pages, state.staticFiles, { + destRoot: ctx.destRoot, + generatedAssets, + baseurl: String(state.site.config.baseurl || ""), + dryRun: ctx.opts.dryRun, + }); + }, + submit(out, emit) { emit("writeAux", out); }, +}, + +searchData: { + expected: ["renderJoin", "prepDest"], + runOnMain: true, + // Reads only in-memory renderedContent; writes search-data.json + // into _site/ (needs prepDest). Runs in parallel with write. 
+ async execute(_, ctx, state) { + return writeSearchData(state.pages, state.site, ctx.destRoot); + }, + submit(out, emit) { emit("writeAux", out); }, +}, + +writeAux: { + expected: ["write", "searchData", "deriveRedirects", "deriveSitemap"], + runOnMain: true, + async execute({ deriveRedirects, deriveSitemap }, ctx, state) { + await Promise.all([ + writeRedirects(state.pages, state.site, ctx.destRoot, deriveRedirects.stubs), + writeSitemap (state.pages, state.site, ctx.destRoot, deriveSitemap.urls), + ]); + }, + submit(out, emit) { + emit("writeOffline", out); + }, +}, + +writeOffline: { + expected: ["writeAux"], + runOnMain: true, + async execute(_, ctx, state) { + return writeOffline(state.pages, state.staticFiles, state.site, + ctx.destRoot, { + auxStats, + precomputed: true, + sitePaths: state.sitePaths, + }); + }, + submit() { /* terminal */ }, +}, + +// writePdf depends on renderJoin + mermaid (CSS sourced directly). +// It runs in parallel with write → searchData → writeAux → writeOffline. +``` + +**Restoring the derive-time exports.** `redirects.mjs` / +`sitemap.mjs` regain the `precomputedStubs` / `precomputedUrls` +passthrough parameters so the derive tasks' outputs can flow into the +write calls without re-deriving. + +**`deriveRedirectStubs` filter change.** The current filter +(`p.html !== undefined`) is set after template runs. Under the +scheduler, `deriveRedirects` runs concurrently with the main spine +and well before render+template. Change the filter in `redirects.mjs` +to `p.frontmatter.layout !== "book-combined"` (the property that +determines whether `html` will be set; known after discover) so the +derive can run at any point after discover. + +**Workerizing writeOffline or writePdf (measured, declined).** Both +phases have non-trivial CPU sections: writeOffline rewrites URLs +across all 856 HTML files; writePdf assembles `book.html` via +`assembleBook`. 
`writePdf` now depends only on `renderJoin` + `mermaid` +(CSS is sourced directly, not from `_site/`), so it runs in parallel +with the entire `write → searchData → writeAux → writeOffline` chain. +Cooperative async concurrency on the main thread interleaves their I/O +gaps. The structured-clone cost of shipping `pages[]` across the worker +boundary (~37–65 ms) would be pure overhead. See §Phase 3-follow-up +for the full measurement. + +## Timing / profiling + +The scheduler records `{ start, end }` per task. The summary is +formatted to match the current `t.lap()` output style: + +``` +config=1ms discover=98ms scss=1041ms mermaid=2ms buildInfo=8ms +nav=9ms buildInit=1ms markdownInit=63ms seo=34ms loadData=4ms +resolveBookChapters=7ms render:0=312ms render:1=298ms ... +write=542ms searchData=41ms writeAux=12ms +writeOffline=1210ms writePdf=352ms +``` + +## Entry point + +```js +// tbdocs.mjs + +import os from "node:os"; +import { WorkerPool } from "./worker-pool.mjs"; +import { Scheduler } from "./scheduler.mjs"; + +const CPU_WORKER_URL = new URL("./cpu-worker.mjs", import.meta.url); + +export async function runBuild(opts) { + const srcRoot = path.resolve(process.cwd(), opts.src); + const destRoot = path.resolve(opts.dest ?? path.join(srcRoot, "_site")); + + const workerCount = os.availableParallelism(); + const pool = new WorkerPool(workerCount, CPU_WORKER_URL); + + const scheduler = new Scheduler({ pool, tasks: TASKS }); + const ctx = { srcRoot, destRoot, opts, workerCount }; + + try { + const results = await scheduler.start(ctx); + console.log(scheduler.summary()); + // ... existing summary output using results + scheduler.state ... + return { pages: scheduler.state.pages, + staticFiles: scheduler.state.staticFiles, + site: scheduler.state.site, + destRoot }; + } finally { + await pool.destroy(); + } +} +``` + +## Migration path + +The current code is the simple serial baseline -- there is no +scaffolding to delete, only new pieces to add. 
Each phase keeps the +build working end-to-end and produces byte-identical output. + +**Where to start.** Implement the phases in order, beginning with +Phase 0. Commit after each phase. The verification gate at the end +of each phase block below is the done-signal -- if it doesn't pass +cleanly, the phase isn't done. + +**Historical context (cited inline below).** Some refactors restore +small carriers (function parameters, return-value fields) that +existed in an earlier WIP iteration of the parallelisation work -- +commit `5736fee4` ("WIP dataflow parallelization work") -- and were +reverted along with the threading shape they served. Where this plan +refers to "restoring" something, `git show 5736fee4 -- <file>` shows +the prior form for reference; do not re-introduce the whole reverted +shape, only the small pieces noted. + +### Suggested Claude model per phase + +To reduce session cost, the phases are labelled with the model that +fits each phase's nature. Sonnet is preferred when the spec is +precise enough that the implementation is a translation; Opus is +preferred when correctness depends on reasoning about concurrency, +data lifetime across the worker boundary, or low-level serialisation. + +| Phase | Model | Why | +|---|---|---| +| 0. Skeleton + small refactors | Sonnet | Code is given verbatim in §Worker pool / §Scheduler core. The five refactors are precisely-bounded edits to existing modules. No design judgement needed. | +| 1. Seeds + main-thread spine | Sonnet | Each task body is a thin wrapper around an existing phase function. The scheduler core is copy-from-plan. All mutation is on the main thread (no cross-worker identity yet). | +| 2. Render fan-out | Opus | Cross-thread structured-clone semantics, module-scope initialisation order, dynamic task registration, per-page delta-merge identity. Debugging concurrency bugs needs depth. | +| 3. Post-write tasks | Sonnet | All `runOnMain`; thin wrappers around existing write functions. 
No new concurrency surface beyond the already-built scheduler. | +| 3-follow-up. Workerize writeOffline | Opus (decided) | Profiled: zero CPU contention; cooperative async overlap is already optimal. Declined — no implementation needed. | +| 4. SAB broadcast | Opus | JSON + SAB approach; measured ~55% fan-out reduction (~8 ms saving). | + +Escalate to Opus mid-phase if a Sonnet session hits a debugging block +it can't reason through. + +### Verification gate (applies to every phase) + +After each phase, run: + +```sh +build.bat && check.bat +``` + +The phase is done iff: + +- `build.bat` exits 0; no new warnings vs. the prior phase. +- The summary line shows `pages.length` at the current baseline + (857 today; the drift guard in `tbdocs.mjs` warns if it slips + below 836). +- All three `check.bat` passes return `0 issue(s)` for online and + offline, and the PDF pass's pre-existing broken-link count + matches the baseline (8 today; unrelated to the scheduler work). +- The scheduler's timing summary shows the **expected concurrency + pattern** for the phase (each phase block notes what to look for). + +The site is the regression bar; there is no separate unit-test +harness for builder/ to satisfy. + +### Phase 0: skeleton + small refactors + +**Suggested model:** Sonnet. + +Create three new modules: + +- `builder/worker-pool.mjs` -- the `WorkerPool` class (~50 LOC, see + §Worker pool). +- `builder/scheduler.mjs` -- the `Scheduler` class + `SharedState` + (~150 LOC, see §Scheduler core). +- `builder/cpu-worker.mjs` -- the worker harness (`parentPort` message + dispatcher + empty handlers map for now, ~15 LOC). + +Refactors needed (each is small and lands cleanly under today's serial +`runBuild`): + +- Re-export `buildInit` from `template.mjs` (currently file-local) as + `buildInitFn`. Today's `templatePhase()` still calls the local + function; the export is for the upcoming main-thread `buildInit` + task. +- Export `serializeLinkTables` from `render.mjs`. 
Today nothing calls + it; Phase 2 wires it up. +- Change the filter in `redirects.mjs`'s `deriveRedirectStubs` from + `p.html !== undefined` to `p.frontmatter.layout !== "book-combined"`. + Behaviour is identical under the current serial pipeline (template + has already run by the time writeRedirects fires), but the new + filter lets the derive step run before template under the scheduler. +- Restore the `precomputedStubs` / `precomputedUrls` passthrough + parameters on `writeRedirects` / `writeSitemap` (these were in commit + `5736fee4` and removed in the revert). +- Add `svgFiles: [{ srcPath, srcRel, destRel, size }, ...]` to + `regenerateMermaid`'s return value (this field was in `5736fee4` + and removed; the stat happens inside the existing + `regenerateMermaid` call). + +The build still runs from the existing `runBuild()` exactly as today. +No new runtime dependencies. + +**Deliverable:** new modules compile, refactors land under the serial +pipeline, build output unchanged. + +**Verification.** `build.bat && check.bat` clean. Output and timing +summary unchanged vs. before Phase 0 -- no scheduler is wired up +yet, so the build must look identical. + +### Phase 1: Seeds + main-thread spine + +**Suggested model:** Sonnet. + +Wire `runBuild()` to construct the pool, instantiate the scheduler, +and call `scheduler.start(ctx)`. Port: + +- **Seeds:** `config`, `buildInfo`, `scss`, `mermaid`, `prepDest`. +- **Main-thread spine:** `discover`, `nav`, `markdownInit`, `seo`, + `loadData`, `resolveBookChapters`, `buildInit`, `deriveRedirects`, + `deriveSitemap`. + +The existing `renderPhase` → `templatePhase` → `writePhase` → +post-write code stays in `runBuild()` as a trailing block that +consumes `scheduler.state` after `scheduler.start()` resolves. + +`config` runs on main as a `runOnMain` task; `discover` ditto for +identity reasons (it builds `state.pages` and `state.pageByDest`). 
+ +**`serve.mjs` is untouched.** `serve.mjs` imports `runBuild` from +`tbdocs.mjs` and calls it per change. As long as `runBuild`'s +signature stays stable (it does), `serve.mjs` needs no changes +through Phases 0-3. The dev-server flow keeps working end-to-end +without scheduler-aware code in `serve.mjs`. + +**Expected savings:** ~150 ms wall-clock. The main spine takes +~250 ms; `scss` (~700 ms) overlaps roughly half of it. Worker spawn +(~100-200 ms one-shot) eats a chunk of that. Honest end-to-end +estimate: 6.7 s → ~6.55 s. Most of the value here is structural -- +the DAG is now explicit -- not raw wall-clock. + +**Verification.** `build.bat && check.bat` clean. The timing summary +should show `scss=...ms` starting at t=0 alongside the main-thread +spine entries (`discover`, `nav`, ...), not after them. `render` / +`template` / `write` / `writeOffline` / `writePdf` still appear in +the summary at roughly their current durations -- they haven't been +moved to the scheduler yet. + +### Phase 2: Render fan-out + +**Suggested model:** Opus. + +Wire up the `render` named handler in `cpu-worker.mjs` (`renderPhase` ++ `templatePhase` over a chunk, return per-page deltas). Add the +`dispatch` task + dynamic `render:0..N` registration. Drop the serial +`renderPhase` / `templatePhase` calls from `runBuild()`. + +This is the largest single win: ~3.5 s of CPU compresses to +~`3500 / N` ms. + +**Expected savings:** on 4 cores, ~2.6 s saved (6.55 s → ~4.0 s). +On 8 cores, ~3 s saved (~3.6 s). Dispatch overhead is ~50 ms. + +**Verification.** `build.bat && check.bat` clean. The timing summary +should show N `render:i` entries (one per worker) whose individual +durations sum to roughly today's combined `render` + `template` time, +not today's per-page renders. Wall-clock drop should match the +expected savings above within ±20%. + +### Phase 3: Post-write tasks + +**Suggested model:** Sonnet. 
+ +Port `write`, `searchData`, `writeAux`, `writeOffline`, `writePdf` as +`runOnMain` tasks. `writeOffline` and `writePdf` run in parallel on +the main thread; the gain is the shorter of their two CPU sections +(~240 ms) plus interleaved I/O. + +`runBuild()` shrinks to: pool construction + `scheduler.start(ctx)` + +summary output + `pool.destroy()`. + +**Expected savings:** ~240 ms (4.0 s → ~3.75 s on 4 cores). + +**Verification.** `build.bat && check.bat` clean. `writeOffline` and +`writePdf` should overlap in the timing summary -- their `start` +timestamps should be within a few ms of each other. + +### Phase 3-follow-up: workerize writeOffline (optional) — DECLINED + +**Decision:** do not workerize. Profiling shows zero CPU contention. + +Two independent measurement runs confirmed that `writeOffline` and +`writePdf` already achieve perfect overlap via cooperative async +concurrency on the main thread. In both runs the combined wall-clock +equalled `max(writeOffline, writePdf)` — the "wasted (CPU contention)" +metric was 0 ms. `writePdf`'s `assembleBook` synchronous section +(~150 ms) runs entirely inside `writeOffline`'s I/O await gaps. + +Structured-clone cost for shipping `pages[]` across the worker +boundary was measured at ~37–65 ms (depending on per-page HTML size), +which would be pure overhead against a 0 ms contention baseline. +Adding the `offline` handler to `cpu-worker.mjs` would also increase +worker spawn time (acorn import), complicate the worker's module +surface, and add a result-merge path — all for no measurable gain. + +The overlap already saves ~260 ms vs. sequential execution (the full +duration of `writePdf`). No further action needed. + +### Phase 4: SharedArrayBuffer broadcast + +**Suggested model:** Opus. + +For the render fan-out, serialize `siteData + initData + linkTables + +staticFilesArr + baseurl + buildInfo` once into a SharedArrayBuffer +and pass it to all render tasks. 
The SAB is shared memory --- each +worker deserializes its own copy from the same buffer instead of the +main thread structured-cloning the ~286 KB shared payload 16 times. + +**Implementation.** Three files: + +- `builder/sab-broadcast.mjs` (~15 LOC): `packShared(obj)` serializes + an object to JSON, encodes to UTF-8, and copies into a SAB; + `unpackShared(sab)` reverses the process. +- `tbdocs.mjs` `dispatch.execute()`: packs the shared payload into a + SAB and returns `{ chunks, sharedSAB }` instead of the flat fields. +- `cpu-worker.mjs` `render` handler: calls `unpackShared(sharedSAB)` + to reconstruct the shared fields before rendering. + +**Measurements** (16 workers, ~286 KB shared payload, 857 pages): + +| | Run 1 | Run 2 | Run 3 | Median | +|---|---|---|---|---| +| Baseline (structured-clone) | 18.1 ms | 35.0 ms | 19.5 ms | ~19 ms | +| SAB broadcast | 9.2 ms | 7.3 ms | 10.2 ms | ~9 ms | + +SAB packing cost (in `dispatch.execute()`): ~2 ms (visible as +`dispatch=2ms` vs. prior `dispatch=0ms`). Net saving: ~8 ms per build, +a ~55% reduction in fan-out overhead. The saving is modest in absolute +terms (~0.2% of a ~4 s build) but the implementation is small and the +pattern moves redundant serialization work off the main thread --- +each worker independently deserializes from shared memory in parallel +instead of the main thread serializing 16 identical copies +sequentially. + +**Verification.** `build.bat && check.bat` clean. `dispatch` now +shows ~2 ms (SAB packing) vs. 0 ms before. diff --git a/builder/PLAN.md b/builder/PLAN.md index b12ff397..4215f6f9 100644 --- a/builder/PLAN.md +++ b/builder/PLAN.md @@ -62,6 +62,14 @@ live-reloads the browser via SSE. Renames `--serving` to static server; `docs/serve.bat` becomes a one-line `--serve` shim. Closes the PLAN-10 §7.D4 and §7.D11 watch-mode deferrals. +A **task-graph scheduler** for the build pipeline is designed in +[PLAN-scheduler.md](PLAN-scheduler.md) and has been implemented +(Phases 0--4). 
It covers a thin in-tree scheduler + `WorkerPool` over +`node:worker_threads`, moves CPU-bound seed tasks (`scss`, `mermaid`, +`buildInfo`) onto workers, runs `prepDest` (destination clean/recreate) +as a main-thread seed in parallel with the spine, and fans out +`renderPhase` + `templatePhase` across CPUs via SAB broadcast. + Open follow-ups (deferred enhancements, divergence investigations) live in [FUTURE-WORK.md](FUTURE-WORK.md). diff --git a/builder/README.md b/builder/README.md index 4bfbf4ce..25a3b37e 100644 --- a/builder/README.md +++ b/builder/README.md @@ -84,9 +84,9 @@ the architecture overview. | 7 | [offline.mjs](offline.mjs) | Mirror to `_site-offline/` with `file://` URL rewrites | | 8 | [pdf.mjs](pdf.mjs) + [book.mjs](book.mjs) (renderer half) | Sparse `_site-pdf/` tree (book.html + CSS + images) | -A pre-step ([mermaid.mjs](mermaid.mjs)) regenerates stale -`docs/assets/images/mmd/*.svg` from their `.mmd` sources before -discover walks the tree. +A seed task ([dot.mjs](dot.mjs)) regenerates stale +`docs/assets/images/dot/*.svg` from their `.dot` sources via the WASM +build of Graphviz, concurrently with discover. ## Verification diff --git a/builder/cpu-worker.mjs b/builder/cpu-worker.mjs new file mode 100644 index 00000000..f0e5c4a3 --- /dev/null +++ b/builder/cpu-worker.mjs @@ -0,0 +1,461 @@ +// Worker harness for the tbdocs build pipeline. Phase 15: SAB-based task +// metadata (no JS-side taskMeta array), priority-aware claiming, per-chunk +// flush via FIFO _pendingFlush queue, generic payload SAB. +// See PLAN-sab-pull-scheduler.md §Phase 15. 
+ +import { promises as fsP } from "node:fs"; +import path from "node:path"; +import { parentPort, workerData } from "node:worker_threads"; +import { compileLightScss, compileDarkScss } from "./scss.mjs"; +import { regenerateDot } from "./dot.mjs"; +import { captureBuildInfo } from "./build-info.mjs"; + +import { createMarkdownIt, renderPhase } from "./render.mjs"; +import { templatePhase } from "./template.mjs"; +import { unpackShared } from "./sab-broadcast.mjs"; +import { deriveSearchEntries } from "./search.mjs"; +import { computeChunkSeo } from "./seo.mjs"; +import { deriveOfflinePage, deriveOfflinePageCached, + sliceNavBlock, normalizeBaseurl, + posixDirname } from "./offline-rewrite.mjs"; + +import { + createViews, scanAndClaim, onTaskDone, readTaskMeta, + HANDLERS, + READY, CLAIMED, DONE, F_ON_DEMAND, F_RUN_ON_MAIN, + F_RUN_WHEN_IDLE, F_UNIQUE_PER_WORKER, + MAX_LANES, +} from "./sab-scheduler.mjs"; + +if (workerData?.spawnTime) parentPort.postMessage({ coldBoot: { start: workerData.spawnTime, end: Date.now() } }); + +const myLane = workerData?.lane ?? 
0; + +// ── Mutable state set by init / dynamicData messages ──────────────────────── + +let views = null; // Int32Array views into the scheduling SAB +let ctx = null; // { srcRoot, destRoot, opts, workerCount } +let idMapping = null; // { nameToIdx, idxToName, DYNAMIC_BASE, … } + +let _payloadSAB = null; // SharedArrayBuffer with packed per-task payloads +let _sharedSAB = null; // SharedArrayBuffer with packed shared payload + +// ── Handler table ─────────────────────────────────────────────────────────── + +const handlers = { + async warmInit() { + const { initHighlighter } = await import("./highlight.mjs"); + await initHighlighter(); + return {}; + }, + + async renderEnvInit() { + while (!_sharedSAB) { + await new Promise(resolve => setImmediate(resolve)); + } + + const { siteData, initData, linkTablesData, staticFilesArr, + baseurl, buildInfo, sitePathsArr, + skipOffline } = unpackShared(_sharedSAB); + + const { initHighlighter } = await import("./highlight.mjs"); + const highlighter = await initHighlighter(); + const linkTables = reconstructLinkTables(linkTablesData); + const staticFiles = new Set(staticFilesArr); + const markdown = createMarkdownIt({ highlighter, linkTables, baseurl, staticFiles }); + const site = { ...siteData, markdown, buildInfo }; + + let offlineBase = null; + if (!skipOffline) { + offlineBase = { + sitePaths: new Set(sitePathsArr), + baseurl: normalizeBaseurl(baseurl), + }; + } + + _renderEnv = { site, initData, offlineBase }; + return {}; + }, + + async flush() { + const items = _pendingFlush.shift() ?? 
[]; + let written = 0, offlineWritten = 0, offlineMisses = 0; + if (!ctx.opts.dryRun) { + let next = 0; + const limit = Math.min(64, items.length || 1); + const workers = Array.from({ length: limit }, async () => { + while (next < items.length) { + const p = items[next++]; + await fsP.writeFile(path.join(ctx.destRoot, p.destPath), p.html, "utf8"); + written++; + if (p.offlineHtml !== undefined) { + await fsP.writeFile(path.join(ctx.destRoot + "-offline", p.destPath), p.offlineHtml, "utf8"); + offlineWritten++; + } + offlineMisses += p.offlineMisses ?? 0; + } + }); + await Promise.all(workers); + } + return { written, offlineWritten, offlineMisses }; + }, + + async scssLight() { + const scssLightResult = await compileLightScss(ctx.srcRoot); + return { scssLightResult }; + }, + + async scssDark() { + const scssDarkResult = await compileDarkScss(ctx.srcRoot); + return { scssDarkResult }; + }, + + async dot() { + const dotStats = await regenerateDot(ctx.srcRoot); + return { dotStats }; + }, + + async buildInfo() { + const buildInfo = await captureBuildInfo(); + return { buildInfo }; + }, + + async render(taskIdx) { + const offset = Atomics.load(views.payloadOffset, taskIdx); + const length = Atomics.load(views.payloadLength, taskIdx); + const chunk = JSON.parse( + new TextDecoder().decode(new Uint8Array(_payloadSAB, offset, length)), + ); + + const env = _renderEnv; + + await renderPhase(chunk, env.site); + computeChunkSeo(chunk, env.site.seoSiteTitle, env.site.config, env.site.markdown); + await templatePhase(chunk, env.site, env.initData); + + if (env.offlineBase) { + const offlineState = { ...env.offlineBase, + caches: { rawResolution: new Map(), seg: new Map(), result: new Map() }, + }; + + const writable = chunk.filter(p => p.html !== undefined); + const byDir = new Map(); + for (const p of writable) { + const destDir = posixDirname(p.destPath); + let g = byDir.get(destDir); + if (!g) { g = []; byDir.set(destDir, g); } + g.push(p); + } + const navCache = new 
Map(); + for (const [destDir, group] of byDir) { + const first = group[0]; + const input = sliceNavBlock(first.html); + if (input === null) continue; + const { html: rendered } = deriveOfflinePage(first, offlineState); + const output = sliceNavBlock(rendered); + if (output === null) continue; + navCache.set(destDir, { input, output }); + } + offlineState.navCache = navCache; + + for (const p of writable) { + const { html, misses } = deriveOfflinePageCached(p, offlineState); + p.offlineHtml = html; + p.offlineMisses = misses; + } + } + + // Stash writable pages for the matching flush:i (FIFO; one batch per + // render:i, drained by exactly one flush:i on the same worker). + const batch = []; + for (const p of chunk) { + if (p.html !== undefined) { + batch.push({ + destPath: p.destPath, + html: p.html, + offlineHtml: p.offlineHtml, + offlineMisses: p.offlineMisses, + }); + } + } + _pendingFlush.push(batch); + + // Per-chunk search entries; consolidated on main during searchData. + // Drop `sourcePage` (workers hold cloned page objects, not master + // refs) and `i` (chunk-local indices are meaningless; main assigns + // global indices during consolidation). + const searchEntries = deriveSearchEntries(chunk, env.site) + .map(e => ({ doc: e.doc, title: e.title, content: e.content, + url: e.url, relUrl: e.relUrl })); + + return { + pages: chunk.map(p => ({ + destPath: p.destPath, + renderedContent: p.renderedContent, + offlineMisses: p.offlineMisses, + })), + searchEntries, + }; + }, +}; + +// Reverse handler table: HANDLERS maps name → integer, handlerById maps +// integer → function. Built once at module load. 
+const handlerById = []; +for (const [name, id] of Object.entries(HANDLERS)) handlerById[id] = handlers[name]; + +let _renderEnv = null; +let _pendingFlush = []; + +// ── Message handler (init + dynamicData only) ─────────────────────────────── + +parentPort.on("message", (msg) => { + if (msg.init) { + views = createViews(msg.sab); + ctx = msg.ctx; + idMapping = msg.idMapping; + _payloadSAB = null; + _sharedSAB = null; + _renderEnv = null; + _pendingFlush = []; + pullLoop(); + return; + } + if (msg.dynamicData) { + _payloadSAB = msg.payloadSAB; + _sharedSAB = msg.sharedSAB; + return; + } +}); + +// ── Idle-task scan (speculative warmup) ───────────────────────────────────── + +function findIdleTask(views, lane) { + const count = Atomics.load(views.taskCount, 0); + let bestIdx = -1; + let bestPri = Infinity; + for (let i = 0; i < count; i++) { + if (!(Atomics.load(views.flags, i) & F_RUN_WHEN_IDLE)) continue; + const meta = readTaskMeta(views, i); + const pri = meta.idlePriority; + if (pri >= bestPri) continue; + if (Atomics.load(views.flags, i) & F_UNIQUE_PER_WORKER) { + if (Atomics.load(views.perWorkerDone, i * MAX_LANES + lane) !== 0) + continue; + let skip = false; + for (const predIdx of meta.expectedDeps) { + if (Atomics.load(views.status, predIdx) !== DONE) { skip = true; break; } + } + if (skip) continue; + for (const depIdx of meta.perWorkerDeps) { + if (Atomics.load(views.perWorkerDone, depIdx * MAX_LANES + lane) === 0) { skip = true; break; } + } + if (skip) continue; + bestIdx = i; + bestPri = pri; + } else { + if (Atomics.load(views.status, i) !== READY) continue; + bestIdx = i; + bestPri = pri; + } + } + // For non-unique_per_worker tasks, CAS-claim at the end. 
+ if (bestIdx !== -1 && !(Atomics.load(views.flags, bestIdx) & F_UNIQUE_PER_WORKER)) { + if (Atomics.compareExchange(views.status, bestIdx, READY, CLAIMED) !== READY) + return -1; + } + return bestIdx; +} + +// ── Pull loop ─────────────────────────────────────────────────────────────── + +async function pullLoop() { + while (true) { + if (Atomics.load(views.buildDone, 0) !== 0) return; + + let taskIdx = scanAndClaim(views, myLane); + + if (taskIdx === -1) { + // Speculative: run idle-eligible tasks before sleeping. + const idleTask = findIdleTask(views, myLane); + if (idleTask !== -1) { + const idleMeta = readTaskMeta(views, idleTask); + const t0 = Date.now(); + let idleResult; + try { + idleResult = await handlerById[idleMeta.handlerIdx](); + } catch (err) { + parentPort.postMessage({ taskFailed: idleTask, message: err.message, stack: err.stack }); + return; + } + const t1 = Date.now(); + Atomics.store(views.perWorkerDone, idleTask * MAX_LANES + myLane, 1); + parentPort.postMessage({ + perWorkerTiming: true, + taskIdx: idleTask, + timing: { start: t0, end: t1 }, + lane: myLane, + output: idleResult, + }); + continue; + } + + const gen = Atomics.load(views.notify, 0); + // Double-check after reading gen (race: a task may have become + // READY between the failed scan and this load). + taskIdx = scanAndClaim(views, myLane); + if (taskIdx === -1) { + Atomics.wait(views.notify, 0, gen, 50); + continue; + } + } + + // ── Per-worker deps (unique_per_worker) ── + const meta = readTaskMeta(views, taskIdx); + let unsatisfied = null; + for (const depIdx of meta.perWorkerDeps) { + if (Atomics.load(views.perWorkerDone, depIdx * MAX_LANES + myLane) === 0) { + unsatisfied = depIdx; + break; + } + } + + if (unsatisfied !== null) { + const depFlags = Atomics.load(views.flags, unsatisfied); + + if ((depFlags & F_ON_DEMAND) && !(depFlags & F_RUN_ON_MAIN)) { + const depMeta = readTaskMeta(views, unsatisfied); + + // Check the dep's own perWorkerDeps (e.g. renderEnvInit → warmInit). 
+ let nestedUnsatisfied = null; + for (const nestedIdx of depMeta.perWorkerDeps) { + if (Atomics.load(views.perWorkerDone, nestedIdx * MAX_LANES + myLane) === 0) { + nestedUnsatisfied = nestedIdx; + break; + } + } + + if (nestedUnsatisfied !== null) { + const nestedFlags = Atomics.load(views.flags, nestedUnsatisfied); + if ((nestedFlags & F_ON_DEMAND) && !(nestedFlags & F_RUN_ON_MAIN)) { + Atomics.store(views.status, taskIdx, READY); + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, 1); + + const nestedMeta = readTaskMeta(views, nestedUnsatisfied); + const t0 = Date.now(); + let nestedResult; + try { + nestedResult = await handlerById[nestedMeta.handlerIdx](); + } catch (err) { + parentPort.postMessage({ taskFailed: nestedUnsatisfied, message: err.message, stack: err.stack }); + return; + } + const t1 = Date.now(); + Atomics.store(views.perWorkerDone, nestedUnsatisfied * MAX_LANES + myLane, 1); + parentPort.postMessage({ + perWorkerTiming: true, + taskIdx: nestedUnsatisfied, + timing: { start: t0, end: t1 }, + lane: myLane, + output: nestedResult, + }); + continue; + } + Atomics.store(views.status, taskIdx, READY); + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, 1); + continue; + } + + // Check preconditions (expected predecessors on the dep). + let precondFailed = false; + for (const predIdx of depMeta.expectedDeps) { + if (Atomics.load(views.status, predIdx) !== DONE) { + precondFailed = true; + break; + } + } + if (precondFailed) { + Atomics.store(views.status, taskIdx, READY); + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, 1); + continue; + } + + // All dep's deps satisfied. Release original task, execute the dep. 
+ Atomics.store(views.status, taskIdx, READY); + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, 1); + + const t0 = Date.now(); + let depResult; + try { + depResult = await handlerById[depMeta.handlerIdx](); + } catch (err) { + parentPort.postMessage({ taskFailed: unsatisfied, message: err.message, stack: err.stack }); + return; + } + const t1 = Date.now(); + Atomics.store(views.perWorkerDone, unsatisfied * MAX_LANES + myLane, 1); + + parentPort.postMessage({ + perWorkerTiming: true, + taskIdx: unsatisfied, + timing: { start: t0, end: t1 }, + lane: myLane, + output: depResult, + }); + continue; + } + + // Other unsatisfied dep types: release and re-scan. + Atomics.store(views.status, taskIdx, READY); + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, 1); + continue; + } + + // ── Execute task ── + const handler = handlerById[meta.handlerIdx]; + if (!handler) { + parentPort.postMessage({ taskFailed: taskIdx, message: `unknown handlerIdx: ${meta.handlerIdx}`, stack: "" }); + return; + } + + const t0 = Date.now(); + let result; + try { + result = await handler(taskIdx); + } catch (err) { + parentPort.postMessage({ taskFailed: taskIdx, message: err.message, stack: err.stack }); + Atomics.store(views.status, taskIdx, 4); // FAILED + return; + } + const t1 = Date.now(); + + // Post output BEFORE the SAB update (ordering constraint: the merge + // message must arrive on the main thread before any downstream + // main-thread task could be claimed). 
+ parentPort.postMessage({ + done: taskIdx, + output: result, + timing: { start: t0, end: t1 }, + lane: myLane, + }); + + const { readyCount, wakeMain } = onTaskDone(views, taskIdx, myLane); + if (readyCount > 0) { + Atomics.add(views.notify, 0, 1); + Atomics.notify(views.notify, 0, readyCount); + } + if (wakeMain) { + parentPort.postMessage({ mainTaskReady: true }); + } + } +} + +function reconstructLinkTables({ byPath, byUrl, byRedirect }) { + const make = (pairs) => new Map(pairs.map(([k, pl]) => [k, { permalink: pl }])); + return { byPath: make(byPath), byUrl: make(byUrl), byRedirect: make(byRedirect) }; +} diff --git a/builder/dot.mjs b/builder/dot.mjs new file mode 100644 index 00000000..70cc535e --- /dev/null +++ b/builder/dot.mjs @@ -0,0 +1,139 @@ +// Graphviz/DOT preprocessor: regenerates +// `<srcRoot>/assets/images/dot/*.svg` from the matching `*.dot` source +// when the SVG is missing or older than its source. Runs as a seed task +// concurrently with the rest of the build so the freshly-emitted SVGs +// land in dispatch's site-paths set and the static-file copy pass. +// +// Idempotent: a second build with no source changes is a no-op (mtime +// check). The `.dot` is the canonical source; the SVG is a build +// artifact -- editing the .dot by one character regenerates the SVG on +// the next build. +// +// Drives `@hpcc-js/wasm-graphviz` directly -- a WebAssembly build of +// Graphviz. No puppeteer, no headless Chromium, no in-tree patches. +// `Graphviz.load()` initialises the WASM module once per build (~50 ms); +// `gv.dot(src)` is synchronous after that. +// +// Failure modes split into two: +// - SETUP (@hpcc-js/wasm-graphviz not installed): warn + leave on-disk +// SVGs intact + return early with setupSkipped: true. The +// orchestrator does NOT flip the exit code so a fresh checkout +// without `npm install` still builds against the previous SVGs. 
+// - CONTENT (one .dot has a syntax error, gv.dot throws): warn + keep +// that diagram's old SVG + continue the rest of the batch. The +// orchestrator (tbdocs.mjs) flips process.exitCode = 1 on the +// returned `failed` count so a broken diagram surfaces in CI. + +import { promises as fs } from "node:fs"; +import path from "node:path"; + +const DOT_REL_DIR = path.join("assets", "images", "dot"); + +export async function regenerateDot(srcRoot) { + const dotRoot = path.join(srcRoot, DOT_REL_DIR); + const sources = await listDotSources(dotRoot); + if (sources.length === 0) { + return { processed: 0, regenerated: 0, svgFiles: [] }; + } + + const stale = []; + for (const src of sources) { + const svg = svgFor(src); + if (!(await isUpToDate(svg, src))) stale.push({ src, svg }); + } + if (stale.length === 0) { + return { processed: sources.length, regenerated: 0, + svgFiles: await statSvgFiles(sources, srcRoot) }; + } + + let Graphviz; + try { + ({ Graphviz } = await import("@hpcc-js/wasm-graphviz")); + } catch (err) { + console.warn( + `dot: skipped batch (${explainLoadFailure(err)}); existing SVGs retained`, + ); + return { processed: sources.length, regenerated: 0, failed: 0, setupSkipped: true, + svgFiles: await statSvgFiles(sources, srcRoot) }; + } + + let gv; + try { + gv = await Graphviz.load(); + } catch (err) { + console.warn( + `dot: skipped batch (WASM load failed: ${err.message}); existing SVGs retained`, + ); + return { processed: sources.length, regenerated: 0, failed: 0, setupSkipped: true, + svgFiles: await statSvgFiles(sources, srcRoot) }; + } + + let regenerated = 0; + let failed = 0; + for (const { src, svg } of stale) { + try { + const source = await fs.readFile(src, "utf8"); + const svgXml = gv.dot(source); + await fs.writeFile(svg, svgXml, "utf8"); + regenerated++; + } catch (err) { + console.warn( + `dot: skipped ${path.basename(src)} (${err.message}); existing SVG retained`, + ); + failed++; + } + } + return { processed: sources.length, 
regenerated, failed, + svgFiles: await statSvgFiles(sources, srcRoot) }; +} + +async function statSvgFiles(sources, srcRoot) { + const results = []; + for (const src of sources) { + const svgPath = svgFor(src); + try { + const stat = await fs.stat(svgPath); + const srcRel = path.relative(srcRoot, svgPath).replace(/\\/g, "/"); + results.push({ srcPath: svgPath, srcRel, destRel: srcRel, size: stat.size }); + } catch { + // SVG not on disk (render failed or never generated); skip. + } + } + return results; +} + +async function listDotSources(dotRoot) { + try { + const entries = await fs.readdir(dotRoot); + return entries + .filter((n) => n.endsWith(".dot")) + .map((n) => path.join(dotRoot, n)); + } catch (err) { + if (err.code === "ENOENT") return []; + throw err; + } +} + +function svgFor(src) { + return src.replace(/\.dot$/, ".svg"); +} + +async function isUpToDate(svg, src) { + try { + const [srcStat, svgStat] = await Promise.all([ + fs.stat(src), + fs.stat(svg), + ]); + return svgStat.mtimeMs >= srcStat.mtimeMs; + } catch { + return false; + } +} + +function explainLoadFailure(err) { + const msg = err?.message ?? String(err); + if (/cannot find module ['"]@hpcc-js\/wasm-graphviz|cannot find package ['"]@hpcc-js\/wasm-graphviz/i.test(msg)) { + return "@hpcc-js/wasm-graphviz not installed; run `npm install`"; + } + return msg; +} diff --git a/builder/gantt.mjs b/builder/gantt.mjs new file mode 100644 index 00000000..04c2ddd8 --- /dev/null +++ b/builder/gantt.mjs @@ -0,0 +1,202 @@ +// Inline SVG Gantt chart for the build timeline. +// Replaces the client-side Mermaid renderer — no JS runtime needed. 
+ +const COLORS = { + Seeds: { light: "#86c7a3", dark: "#3d8b5e" }, + Spine: { light: "#6eb5d9", dark: "#3c7db0" }, + Render: { light: "#b09cd8", dark: "#8066a8" }, + Write: { light: "#e8a756", dark: "#c08030" }, + Boot: { light: "#e57373", dark: "#c62828" }, + Cold: { light: "#5b7fb5", dark: "#2c4a7c" }, + Env: { light: "#e8a756", dark: "#c08030" }, + Other: { light: "#bbb", dark: "#666" }, +}; + +const SECTION_W = 24; + +const BOOT_STYLE = { + cold: { label: "cold", cls: "cold" }, + warmInit: { label: "warm", cls: "boot" }, + renderEnvInit: { label: "env", cls: "env" }, +}; +const SVG_W = 900; +const CHART_W = SVG_W - SECTION_W - 20; +const ROW_H = 20; +const BAR_H = 14; +const AXIS_H = 28; +const CHAR_W = 6.2; +const BAR_PAD = 4; + +export function renderGantt(grouped) { + const all = [...grouped.values()].flat(); + if (all.length === 0) return ""; + const maxT = Math.max(...all.map(t => t.end)); + if (maxT <= 0) return ""; + + // Any task with a lane ran on a worker — pull it into the Workers + // section, tagged with its original section for bar colour. Leftover + // Render tasks (dispatch, prepDest) fold into Spine. 
+ const seeds = [], spine = [], write = []; + const laneTasks = []; + for (const [section, tasks] of grouped) { + for (const t of tasks) { + if (t.lane != null) { t._color = section; laneTasks.push(t); } + else if (section === "Seeds") seeds.push(t); + else if (section === "Spine" || section === "Render") spine.push(t); + else if (section === "Write") write.push(t); + } + } + const mainSections = [["Seeds", seeds], ["Spine", spine], ["Write", write]]; + + const lanes = new Map(); + for (const t of laneTasks) { + if (!lanes.has(t.lane)) lanes.set(t.lane, []); + lanes.get(t.lane).push(t); + } + for (const tasks of lanes.values()) + tasks.sort((a, b) => a.workerStart - b.workerStart); + const sortedLanes = [...lanes.entries()].sort((a, b) => a[0] - b[0]); + + let rows = sortedLanes.length; + for (const [, tasks] of mainSections) rows += tasks.length; + const h = AXIS_H + rows * ROW_H + 5; + const xOf = t => SECTION_W + (t / maxT) * CHART_W; + + const tick = niceInterval(maxT); + const ticks = []; + for (let t = 0; t <= maxT + 0.5; t += tick) ticks.push(t); + + const o = []; + o.push(`<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${SVG_W} ${h}" style="width:100%;max-width:${SVG_W}px">`); + o.push(`<title>Build task timeline`); + + const css = [ + `.gantt{font-family:system-ui,-apple-system,sans-serif}`, + `.gl{fill:#333}.gs{fill:#333;font-weight:600}.ga{fill:#666}.gg{stroke:#e0e0e0}`, + ...Object.entries(COLORS).map(([s, c]) => `.gb-${s.toLowerCase()}{fill:${c.light}}`), + `html.dark-mode .gl{fill:#e6e1e8}`, + `html.dark-mode .gs{fill:#e6e1e8}`, + `html.dark-mode .ga{fill:#959396}`, + `html.dark-mode .gg{stroke:#44434d}`, + ...Object.entries(COLORS).map(([s, c]) => `html.dark-mode .gb-${s.toLowerCase()}{fill:${c.dark}}`), + ]; + o.push(``); + o.push(``); + + for (const t of ticks) { + const x = rd(xOf(t)); + o.push(``); + o.push(`${fmtMs(t)}`); + } + + let y = AXIS_H; + + // Seeds, Spine (with dispatch / prepDest folded in) + for (const [section, tasks] of 
mainSections.slice(0, 2)) { + if (tasks.length === 0) continue; + y = renderMainSection(o, section, tasks, y, xOf); + } + + // Workers — one row per lane, individual task bars + if (sortedLanes.length > 0) { + o.push(``); + const lx = Math.round(SECTION_W / 2); + const ly = rd(y + sortedLanes.length * ROW_H / 2); + o.push(`Workers`); + for (let li = 0; li < sortedLanes.length; li++) { + const [, tasks] = sortedLanes[li]; + const ty = rd(y + ROW_H / 2 + 3.5); + const by = rd(y + (ROW_H - BAR_H) / 2); + for (const t of tasks) { + const bx = rd(xOf(t.workerStart)); + const bw = rd(Math.max(xOf(t.workerEnd) - xOf(t.workerStart), 1)); + let cls; + if (t._color === "Boot") { + const base = t.id.replace(/:.*/, ""); + cls = `gb-${BOOT_STYLE[base]?.cls ?? "boot"}`; + } else { + cls = `gb-${(t._color || "render").toLowerCase()}`; + } + o.push(``); + const lbl = workerLabel(t); + if (lbl.length * CHAR_W + BAR_PAD * 2 <= bw) + o.push(`${esc(lbl)}`); + } + y += ROW_H; + } + } + + // Write + for (const [section, tasks] of mainSections.slice(2)) { + if (tasks.length === 0) continue; + y = renderMainSection(o, section, tasks, y, xOf); + } + + o.push(``); + return o.join("\n"); +} + +function renderMainSection(o, section, tasks, y, xOf) { + o.push(``); + const cls = `gb-${section.toLowerCase()}`; + const lx = Math.round(SECTION_W / 2); + const ly = rd(y + tasks.length * ROW_H / 2); + o.push(`${esc(section)}`); + for (let i = 0; i < tasks.length; i++) { + const t = tasks[i]; + const bx = rd(xOf(t.start)); + const bw = rd(Math.max(xOf(t.end) - xOf(t.start), 1)); + const by = rd(y + (ROW_H - BAR_H) / 2); + const ty = rd(y + ROW_H / 2 + 3.5); + o.push(``); + if (t.t3 != null) { + const t3x = rd(xOf(t.t3)); + const t3w = rd(Math.max(t3x - (bx + bw), 1)); + const t3h = Math.round(BAR_H / 2); + const t3by = rd(by + BAR_H - t3h); + o.push(``); + } + const lbl = taskLabel(t); + const textW = lbl.length * CHAR_W; + if (textW + BAR_PAD * 2 <= bw) { + o.push(`${esc(lbl)}`); + } else if (bx + 
bw + 4 + textW <= SVG_W) { + o.push(`${esc(lbl)}`); + } else { + o.push(`${esc(lbl)}`); + } + y += ROW_H; + } + return y; +} + +function niceInterval(max) { + for (const c of [100, 200, 250, 500, 1000, 2000, 2500, 5000]) + if (max / c <= 10) return c; + return Math.ceil(max / 10000) * 1000; +} + +function fmtMs(ms) { + return `${Math.floor(ms / 1000)}.${String(ms % 1000).padStart(3, "0")}`; +} + +function taskLabel(t) { + let s = t.id.replace(":", " "); + if (t.workerStart != null) { + const d = t.end - t.start; + if (d > 0) { + const a = Math.round((t.workerStart - t.start) / d * 100); + const b = Math.round((t.workerEnd - t.workerStart) / d * 100); + s += ` (${a}%+${b}%)`; + } + } + return s; +} + +function workerLabel(t) { + const base = t.id.replace(/:.*/, "").replace(/ w\d+$/, ""); + return BOOT_STYLE[base]?.label ?? base; +} + +function rd(n) { return Math.round(n * 10) / 10; } +function esc(s) { return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;"); } diff --git a/builder/highlight.mjs b/builder/highlight.mjs index 688014f8..437b88ec 100644 --- a/builder/highlight.mjs +++ b/builder/highlight.mjs @@ -15,8 +15,6 @@ // import { promises as fs } from "node:fs"; -import { createHighlighter } from "shiki"; - import { loadHighlightTheme } from "./highlight-theme.mjs"; // Fenced-info aliases that select the bundled tB grammar. 
@@ -67,6 +65,7 @@ export async function initHighlighter() { const grammarUrl = new URL("./twinbasic.tmLanguage.json", import.meta.url); const grammarText = await fs.readFile(grammarUrl, "utf8"); const tbGrammar = JSON.parse(grammarText); + const { createHighlighter } = await import("shiki"); shiki = await createHighlighter({ themes: [], langs: [tbGrammar, ...SHIKI_LANGS], diff --git a/builder/mermaid.mjs b/builder/mermaid.mjs deleted file mode 100644 index 483f2395..00000000 --- a/builder/mermaid.mjs +++ /dev/null @@ -1,271 +0,0 @@ -// Phase 11 (B1) mermaid preprocessor: regenerates -// `/assets/images/mmd/*.svg` from the matching `*.mmd` source -// when the SVG is missing or older than its source. Runs as the first -// orchestrator step so the freshly-emitted SVGs land in Phase 1's -// discover sweep naturally; the static-file pass downstream copies them -// to `/assets/images/mmd/` like any other tracked asset. -// -// Idempotent: a second build with no source changes is a no-op (mtime -// check). The `.mmd` is the canonical source; the SVG is a build -// artifact -- editing the .mmd by one character regenerates the SVG on -// the next build. -// -// Drives `puppeteer` + the in-tree `mermaid` package directly. Replaces -// the older `npx mmdc` shell-out that needed `@mermaid-js/mermaid-cli` -// installed in builder/ (along with its own puppeteer-core and a second -// Chrome download). One browser launch covers the whole batch; previously -// every diagram forked a fresh node+chrome via npx. -// -// Failure modes split into two: -// - SETUP (puppeteer/mermaid not installed, Chrome missing): warn + -// leave on-disk SVGs intact + return early. The build continues at -// exit 0 against the previous SVGs -- this is the "dev hasn't run -// `npm install` yet" path and must not break unrelated work. -// - CONTENT (one .mmd has a syntax error, one render throws): warn + -// keep that diagram's old SVG + continue the rest of the batch. 
-// The orchestrator (tbdocs.mjs) sets process.exitCode = 1 on the -// returned `failed` count so a broken diagram surfaces in CI. -// -// Setup recovery: `npm install` at the repo root pulls puppeteer + the -// pinned mermaid; on a fresh machine `npx puppeteer browsers install -// chrome --install-deps` (already in the deploy workflow) lands the -// Chrome binary. -// -// ESM-from-file note: Chromium blocks the `import()` chain that -// mermaid.esm.mjs needs when loaded via file:// (the patched dagre lives -// in a lazily-loaded chunk; the IIFE bundle inlines + minifies past the -// patch). The mermaid-cli authors shipped a request-intercept shim for -// the same reason; we reproduce a stripped-down version mapping ONE root -// (mermaid/dist/) + ONE MIME type (application/javascript) under -// `https://tbdocs-mermaid.invalid`. Without this, lazy chunk loads -- -// including the patched dagre that scripts/patch-dagre.mjs edits -- all -// fail with file:// CORS errors. - -import { promises as fs } from "node:fs"; -import path from "node:path"; -import { createRequire } from "node:module"; -import { fileURLToPath } from "node:url"; - -const __dirname = path.dirname(fileURLToPath(import.meta.url)); -const require_ = createRequire(import.meta.url); -const MMD_REL_DIR = path.join("assets", "images", "mmd"); - -// Dummy origin -- the request-intercept handler resolves these back to -// files under `mermaid/dist/`. Must be a non-resolving host so a real -// network fetch never sneaks past the interceptor. 
-const INTERCEPT_ORIGIN = "https://tbdocs-mermaid.invalid"; - -export async function regenerateMermaid(srcRoot) { - const mmdRoot = path.join(srcRoot, MMD_REL_DIR); - const sources = await listMermaidSources(mmdRoot); - if (sources.length === 0) { - return { processed: 0, regenerated: 0 }; - } - - const stale = []; - for (const src of sources) { - const svg = svgFor(src); - if (!(await isUpToDate(svg, src))) stale.push({ src, svg }); - } - if (stale.length === 0) { - return { processed: sources.length, regenerated: 0 }; - } - - // Lazy-load puppeteer + resolve the mermaid dist directory. Either - // failure is a SETUP problem -- dev hasn't run `npm install`. Warn + - // bail so unrelated build work still runs against the existing SVGs. - let puppeteer; - let mermaidDistDir; - try { - puppeteer = (await import("puppeteer")).default; - mermaidDistDir = path.dirname( - require_.resolve("mermaid/dist/mermaid.esm.mjs"), - ); - } catch (err) { - console.warn( - `mermaid: skipped batch (${explainLoadFailure(err)}); existing SVGs retained`, - ); - return { processed: sources.length, regenerated: 0, failed: 0, setupSkipped: true }; - } - - let browser; - try { - browser = await puppeteer.launch({ - headless: true, - args: [ - "--no-sandbox", - "--disable-dev-shm-usage", - "--disable-gpu", - "--disable-software-rasterizer", - ], - }); - } catch (err) { - console.warn( - `mermaid: skipped batch (${explainLaunchFailure(err)}); existing SVGs retained`, - ); - return { processed: sources.length, regenerated: 0, failed: 0, setupSkipped: true }; - } - - // CONTENT failures (one diagram throws) don't abort the batch -- the - // orchestrator surfaces the `failed` count so every broken diagram - // produces a warning in one run, and the build's exit code reflects - // the overall outcome. 
- let regenerated = 0; - let failed = 0; - try { - const distReal = await fs.realpath(mermaidDistDir); - for (const { src, svg } of stale) { - const ok = await renderOne(browser, src, svg, distReal); - if (ok) regenerated++; - else failed++; - } - } finally { - await browser.close().catch(() => {}); - } - return { processed: sources.length, regenerated, failed }; -} - -async function listMermaidSources(mmdRoot) { - try { - const entries = await fs.readdir(mmdRoot); - return entries - .filter((n) => n.endsWith(".mmd")) - .map((n) => path.join(mmdRoot, n)); - } catch (err) { - if (err.code === "ENOENT") return []; - throw err; - } -} - -function svgFor(src) { - return src.replace(/\.mmd$/, ".svg"); -} - -async function isUpToDate(svg, src) { - try { - const [srcStat, svgStat] = await Promise.all([ - fs.stat(src), - fs.stat(svg), - ]); - return svgStat.mtimeMs >= srcStat.mtimeMs; - } catch { - // SVG missing (or unstattable source -- the renderer will surface it). - return false; - } -} - -// Wires up the intercept, navigates to a bare data:HTML page, dynamic- -// imports mermaid.esm.mjs via the intercept origin, runs mermaid.render -// against the diagram source, and writes the serialised SVG out. -async function renderOne(browser, srcPath, svgPath, mermaidDistReal) { - const definition = await fs.readFile(srcPath, "utf8"); - const page = await browser.newPage(); - let pageErr = null; - try { - await page.setRequestInterception(true); - page.on("request", (req) => interceptRequest(req, mermaidDistReal)); - page.on("pageerror", (err) => { pageErr = err; }); - - // data:HTML carries the container div mermaid.render writes into. - // The page origin is "data:" -- cross-origin from the intercept - // origin -- so the intercept serves CORS-permissive responses. - await page.goto( - "data:text/html;charset=utf-8," + encodeURIComponent( - "
", - ), - ); - - const mermaidEsmUrl = `${INTERCEPT_ORIGIN}/mermaid.esm.mjs`; - const svgXml = await page.evaluate( - async ({ definition, mermaidEsmUrl }) => { - const { default: mermaid } = await import(mermaidEsmUrl); - mermaid.initialize({ startOnLoad: false }); - const container = document.getElementById("container"); - // svgId `my-svg` matches mermaid-cli's default so the diff - // against pre-existing SVGs is just whatever the renderer - // actually changed (id is referenced by every `#my-svg ...` - // CSS rule mermaid scopes into the
${downloadLink}${copyLink}${downloadPng}${copyPng}
`; + const zoomScript = [ + `\n` + - ` `; -} // ---------- §6.2 / §6.3 URL helpers -------------------------------------- diff --git a/builder/worker-pool.mjs b/builder/worker-pool.mjs new file mode 100644 index 00000000..48f41831 --- /dev/null +++ b/builder/worker-pool.mjs @@ -0,0 +1,59 @@ +// Worker pool over node:worker_threads. Lifecycle manager: spawns workers, +// sends them the scheduling SAB, forwards output/error/mainTaskReady +// messages to the scheduler, and terminates workers on destroy. Workers pull +// tasks from the SAB; the pool has no dispatch or queue logic. + +import { Worker } from "node:worker_threads"; + +export class WorkerPool { + constructor(size, workerUrl) { + this._workerUrl = workerUrl; + this.bootTimings = []; + + // Incremented each time sendInit() ships a fresh SAB to the workers. + // runBuild() reads `_buildCount > 0` to detect rebuilds when serve.mjs + // reuses the pool across builds. + this._buildCount = 0; + + // Callbacks wired by the caller after construction. 
+ this.onWorkerDone = null; // ({ done, output, timing, lane }) => void + this.onWorkerError = null; // ({ taskFailed, message, stack }) => void + this.onPerWorkerTiming = null; // ({ perWorkerTiming, taskIdx, timing, lane }) => void + this.onMainTaskReady = null; // () => void + + this._workers = Array.from({ length: size }, (_, i) => this._spawn(i)); + } + + _spawn(lane) { + const spawnTime = Date.now(); + const w = new Worker(this._workerUrl, { workerData: { lane, spawnTime } }); + w.on("message", (msg) => { + if (msg.coldBoot) { this.bootTimings.push({ lane, type: "cold", ...msg.coldBoot }); return; } + if (msg.perWorkerTiming) { this.onPerWorkerTiming?.(msg); return; } + if (msg.done != null) { this.onWorkerDone?.(msg); return; } + if (msg.taskFailed != null) { this.onWorkerError?.(msg); return; } + if (msg.mainTaskReady || msg.triggerMainTask != null) { this.onMainTaskReady?.(); return; } + }); + w.on("error", (err) => { + this.onWorkerError?.({ taskFailed: -1, message: err.message, stack: err.stack }); + }); + return w; + } + + sendInit(sab, ctx, idMapping) { + for (const w of this._workers) { + w.postMessage({ init: true, sab, ctx, idMapping }); + } + this._buildCount++; + } + + broadcastDynamicData(payloadSAB, sharedSAB) { + for (const w of this._workers) { + w.postMessage({ dynamicData: true, payloadSAB, sharedSAB }); + } + } + + destroy() { + return Promise.all(this._workers.map(w => w.terminate())); + } +} diff --git a/builder/write.mjs b/builder/write.mjs index 2c5c2f6d..3c33d0d5 100644 --- a/builder/write.mjs +++ b/builder/write.mjs @@ -35,7 +35,7 @@ export const WRITE_LIMIT = LIMIT; const mkdirCache = new Set(); const mkdirInflight = new Map(); -export async function writePhase(pages, staticFiles, { destRoot, dryRun = false, generatedAssets = [], baseurl = "" } = {}) { +export async function writePhase(pages, staticFiles, { destRoot, dryRun = false, generatedAssets = [], baseurl = "", skipPages = false } = {}) { if (!destRoot) { throw new 
Error("writePhase requires a destRoot"); } @@ -43,9 +43,6 @@ export async function writePhase(pages, staticFiles, { destRoot, dryRun = false, mkdirCache.clear(); mkdirInflight.clear(); - assertNoDestinationCollisions(pages, staticFiles); - await prepareDestination(destRoot, dryRun); - if (dryRun) { const pagesToWrite = pages.filter(p => p.html !== undefined).length; const skipped = pages.length - pagesToWrite; @@ -64,7 +61,7 @@ export async function writePhase(pages, staticFiles, { destRoot, dryRun = false, // generated asset ever land at the same rel as a vendored file, the // generated content wins. No such collision exists today. const [pagesStats, themeStats, staticStats] = await Promise.all([ - writePages(pages, destRoot, LIMIT), + skipPages ? { written: 0, skipped: 0 } : writePages(pages, destRoot, LIMIT), copyTheme(BUILDER_ASSETS, destRoot, LIMIT, baseurl), copyStaticFiles(staticFiles, destRoot, LIMIT, baseurl), ]); @@ -94,18 +91,34 @@ async function writeGeneratedAssets(assets, destRoot, limit, baseurl) { }); } -// ---------- §5.1 prepareDestination ------------------------------------- +// ---------- §5.1 prepareDestinations ------------------------------------ -async function prepareDestination(destRoot, dryRun) { +export async function prepareDestinations(roots, dryRun) { if (dryRun) { - console.log(`[dry-run] would clean ${destRoot}`); + for (const root of roots) console.log(`[dry-run] would clean ${root}`); return; } - if (!isUnderProject(destRoot)) { - throw new Error(`refusing to clean ${destRoot}: not under the project tree`); + await Promise.all(roots.map(async (root) => { + if (!isUnderProject(root)) { + throw new Error(`refusing to clean ${root}: not under the project tree`); + } + await fs.rm(root, { recursive: true, force: true }); + await fs.mkdir(root, { recursive: true }); + })); +} + +// Pre-create all page output directories so writePages can skip mkdir +// entirely. Runs as a separate task concurrently with render workers. 
+export async function preparePageDirs(pages, staticFiles, destRoot, offlineRoot) { + assertNoDestinationCollisions(pages, staticFiles); + const dirs = new Set(); + for (const page of pages) { + if (page.destPath) { + dirs.add(path.dirname(path.join(destRoot, page.destPath))); + if (offlineRoot) dirs.add(path.dirname(path.join(offlineRoot, page.destPath))); + } } - await fs.rm(destRoot, { recursive: true, force: true }); - await fs.mkdir(destRoot, { recursive: true }); + await Promise.all([...dirs].map(d => fs.mkdir(d, { recursive: true }))); } export function isUnderProject(destRoot) { @@ -120,12 +133,10 @@ async function writePages(pages, destRoot, limit) { let skipped = 0; await runLimited(pages, limit, async (page) => { if (page.html === undefined) { - // book.html (layout: book-combined) -- Phase 8 owns it. skipped++; return; } const dest = path.join(destRoot, page.destPath); - await mkdirRec(path.dirname(dest)); await safeWrite(dest, () => fs.writeFile(dest, page.html, "utf8")); written++; }); @@ -187,7 +198,7 @@ async function copyStaticFiles(staticFiles, destRoot, limit, baseurl) { // ---------- §6.4 assertNoDestinationCollisions -------------------------- -function assertNoDestinationCollisions(pages, staticFiles) { +export function assertNoDestinationCollisions(pages, staticFiles) { const pageDests = new Set( pages.filter(p => p.html !== undefined).map(p => p.destPath), ); diff --git a/docs/Documentation/BuildInfo.md b/docs/Documentation/BuildInfo.md new file mode 100644 index 00000000..f1c4d185 --- /dev/null +++ b/docs/Documentation/BuildInfo.md @@ -0,0 +1,14 @@ +--- +title: Build Info +parent: Documentation Development +nav_order: 100 +has_toc: false +permalink: /Documentation/Development/BuildInfo +--- + +# Build Info +{: .no_toc } + +Gantt chart of this build's task timeline. 
+ + diff --git a/docs/Documentation/Builder.md b/docs/Documentation/Builder.md index ad46c598..9401dca8 100644 --- a/docs/Documentation/Builder.md +++ b/docs/Documentation/Builder.md @@ -15,277 +15,362 @@ Detailed technical documentation for the `tbdocs` static site generator at [`bui Module-level documentation lives next to the code: - [`builder/README.md`](https://github.com/twinbasic/documentation/blob/main/builder/README.md) --- quickstart and the per-module map. -- [`builder/PLAN.md`](https://github.com/twinbasic/documentation/blob/main/builder/PLAN.md) --- architecture overview and the full eleven-phase pipeline. -- [`builder/PLAN-1.md`](https://github.com/twinbasic/documentation/blob/main/builder/PLAN-1.md) through [`PLAN-11.md`](https://github.com/twinbasic/documentation/blob/main/builder/PLAN-11.md) --- per-phase specs: inputs, outputs, edge cases, acceptance checks. -- [`builder/FUTURE-WORK.md`](https://github.com/twinbasic/documentation/blob/main/builder/FUTURE-WORK.md) --- open follow-ups, grouped by divergence investigations / deferred enhancements. +- [`builder/PLAN.md`](https://github.com/twinbasic/documentation/blob/main/builder/PLAN.md) --- the original architecture overview from the port. +- [`builder/PLAN-sab-pull-scheduler.md`](https://github.com/twinbasic/documentation/blob/main/builder/PLAN-sab-pull-scheduler.md) --- the current scheduler design: pull model, SAB layout, per-phase rollout notes. +- [`builder/FUTURE-WORK.md`](https://github.com/twinbasic/documentation/blob/main/builder/FUTURE-WORK.md) --- open follow-ups. Sub-pages: -- [Pipeline Stages](Pipeline-Stages) --- complete interface reference: function signatures, per-stage reads/writes, and every exported symbol. +- [Pipeline Stages](Pipeline-Stages) --- complete interface reference: per-task signatures, per-module export tables, scheduler-level concepts. - [Book Configuration](Book-Configuration) --- `_book.yml` key reference for the PDF chapter manifest. 
-- [Extending the Builder](Extending) --- tutorial for adding a new pipeline stage or a markdown-it plugin. +- [Extending the Builder](Extending) --- tutorial for adding a new task or markdown-it plugin. * TOC goes here {:toc} ## Why tbdocs exists -The site was originally built with **Jekyll** + the **just-the-docs** theme. The eleven-phase port to Node.js + a tiny dependency set produces byte-equivalent output to Jekyll modulo a documented allow-list. The win is end-to-end build time (~11s → ~3s) and a 25x faster GENERATE phase --- ten Ruby plugins totalling ~1,460 lines collapsed into four JS modules of ~650 lines. The Ruby toolchain (Gemfile, `_plugins/`, `_includes/`, `_layouts/`, `_sass/`) was retained in tree for one release cycle as reference after the cutover, then dropped --- the project no longer depends on Ruby in any form. +The site was originally built with the just-the-docs Jekyll theme; tbdocs is the Node.js replacement that drives the same content model and the same output structure without a Ruby toolchain. The win once the port was settled is mostly internal: a fixed dependency set, end-to-end build time around 2--3 seconds on a modern laptop, and one process for all three output trees. -## Architecture +The rework documented here is internal. The build moved from a push-style scheduler (the main thread decides what is ready and hands work to workers) to a SAB-based pull scheduler (workers read shared task state and claim work themselves) and three pieces of per-page work that used to run serially on the main thread --- offline rewrite, per-page SEO, and search-index derivation --- now run inside the render workers. The output is byte-equivalent to the previous scheduler; the change is in how the time is spent. -One entry point, ~17 production modules. The content model is fixed (markdown + YAML frontmatter), the output structure is fixed (three trees), the template is one layout with variations. 
+## Architecture at a glance -| File | Role | -|---|---| -| [`tbdocs.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/tbdocs.mjs) | Entry point. Parses CLI flags, dispatches to `runBuild` or `runServe`, prints per-phase timings. | -| [`serve.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/serve.mjs) | Phase 12 dev server: HTTP static file server + recursive watcher + SSE live-reload. | -| [`discover.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/discover.mjs) | Phase 1. Traverses `docs/`, parses frontmatter, classifies each file as a page or a static file. | -| [`nav.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/nav.mjs) | Phase 2 nav substeps: nav-path, integrity check, nav tree, nav levels, breadcrumbs, children. | -| [`seo.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/seo.mjs) | Phase 2 SEO precompute: per-page title / canonical / og: tags. | -| [`book.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/book.mjs) | Phase 2 book chapter resolution + Phase 8 book.html assembly. | -| [`build-info.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/build-info.mjs) | Phase 2 git commit hash + commit date capture. | -| [`data.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/data.mjs) | Phase 2 `_book.yml` loader. | -| [`mermaid.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/mermaid.mjs) | Phase 11 (B1) preprocess: `.mmd` → `.svg` regeneration. | -| [`scss.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/scss.mjs) | Phase 11 (B3) preprocess: compiles `docs/assets/css/just-the-docs-combined.scss` via Dart Sass into the just-the-docs stylesheet. | -| [`render.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/render.mjs) | Phase 3 markdown-it pipeline: GFM admonitions, kramdown-style attributes, deflist, footnotes, header IDs, TOC, relative-link rewriting. 
| -| [`highlight.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/highlight.mjs) | Phase 3 Shiki bootstrap plus the twinBASIC grammar. Emits the just-the-docs wrapper structure. | -| [`highlight-theme.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/highlight-theme.mjs) | Phase 11 (B2) theme loader: reads `themes/*.theme`, derives the palette, emits `tb-highlight.css` and the scope-to-class lookup. | -| [`template.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/template.mjs) | Phase 4 layout. Replaces ~13 Liquid includes with direct JS string concatenation. | -| [`compress.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/compress.mjs) | Phase 4 HTML whitespace compression. | -| [`write.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/write.mjs) | Phase 5 online tree writer. | -| [`paths.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/paths.mjs) | Shared permalink-to-destination-path helper. | -| [`redirects.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/redirects.mjs) | Phase 6 redirect-stub generator. | -| [`sitemap.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sitemap.mjs) | Phase 6 sitemap.xml + robots.txt. | -| [`search.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/search.mjs) | Phase 6 Lunr index emitter (`search-data.json`). | -| [`offline.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/offline.mjs) | Phase 7 offline tree: URL rewriting, JS patching for `file://` browsing. | -| [`pdf.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/pdf.mjs) | Phase 8 sparse PDF source tree. | - -`builder/` lives at the repo root (not under `docs/`) so it is not part of the Jekyll source tree the legacy renderer reads. 
The `build.bat` path writes to `docs/_site/`, `docs/_site-offline/`, and `docs/_site-pdf/` --- the same destinations Jekyll used, so deployment tooling stays unchanged. The `serve.bat` path writes to a separate `docs/_serve/` tree so a one-off `build.bat` run (refreshing the PDF, for example) never clobbers a running serve session's output. - -## Build phases - -| Phase | Module(s) | Job | Time | -|---|---|---|---| -| 1 | `discover.mjs` | Read `.md` / `.html` with frontmatter; enumerate static files | ~120 ms | -| 2 | `nav.mjs` / `seo.mjs` / `book.mjs` / `build-info.mjs` / `data.mjs` | Compute nav tree, SEO, book chapters, git commit info, `_book.yml` | ~60 ms | -| 3 | `render.mjs` + `highlight.mjs` | Markdown → HTML body | ~1-2 s | -| 4 | `template.mjs` + `compress.mjs` | Wrap in layout, anchor headings, compress whitespace | ~200 ms | -| 5 | `write.mjs` | Write `_site/` | ~400 ms | -| 6 | `redirects.mjs` / `sitemap.mjs` / `search.mjs` | Redirect stubs, sitemap.xml, search-data.json, robots.txt | ~100 ms | -| 7 | `offline.mjs` | URL-rewritten copy to `_site-offline/` | ~1,000 ms | -| 8 | `pdf.mjs` + `book.mjs` | Sparse `_site-pdf/` tree (book.html + CSS + images) | ~150 ms | - -Phases 9, 10, and 11 are historical: Phase 9 was a no-output QoL pass, Phase 10 retired Jekyll, Phase 11 introduces the output-changing parity updates. None adds a runtime step. Phase 12 adds the `--serve` dev-server mode (a separate lifecycle, not a build phase; writes to `docs/_serve/` and skips the offline + PDF passes by default so the rebuild loop stays under one second). The per-phase `PLAN-N.md` files retain the implementation history. - -## Dependencies +One entry point, ~28 modules, three output trees, N+1 threads. -A single `package.json` at the repo root carries everything --- the static site generator's deps, the PDF renderer's deps, and the few packages both consume. 
There is no per-`builder/` `package.json` (an earlier split was consolidated; the previous arrangement required `npm ci --prefix builder` in CI and ended up dragging in a duplicate puppeteer-core via `@mermaid-js/mermaid-cli`): +`runBuild()` allocates a `SharedArrayBuffer` holding the scheduling state (task status, dependency counts, successor edges), spawns one worker per available CPU, sends each worker a reference to the SAB, and lets the workers and the main thread compete for ready tasks. There is no central dispatcher; each thread scans the SAB, claims a task it is eligible to run, executes it, and updates the SAB so the next task becomes claimable. The main thread participates on equal footing for tasks marked `runOnMain` --- mostly the ones that mutate the master `pages[]` array or coordinate filesystem layout. -```json -{ - "devDependencies": { - "acorn": "^8.0", - "acorn-walk": "^8.0", - "fast-glob": "^3.3", - "gray-matter": "^4.0", - "html-entities": "^2.6.0", - "htmlparser2": "^12.0.0", - "js-yaml": "^4.1", - "markdown-it": "^14.0", - "markdown-it-attrs": "^4.3", - "markdown-it-deflist": "^3.0", - "markdown-it-footnote": "^4.0", - "mermaid": "11.15.0", - "pdf-lib": "1.17.1", - "puppeteer": "25.0.4", - "sass": "^1.0", - "shiki": "^1.0" - }, - "scripts": { - "postinstall": "node builder/scripts/patch-dagre.mjs" - } -} -``` +The three output trees are unchanged from the earlier design: -No template engine, no framework, no bundler. `acorn` + `acorn-walk` parse the upstream `just-the-docs.js` so the offline patcher can target the AST instead of regex-matching strings; `markdown-it-{attrs,deflist,footnote}` cover the kramdown extensions the legacy renderer supported; `shiki` does the syntax highlighting; `lunr` powers the search index. `mermaid` and `puppeteer` together drive the `.mmd` → `.svg` pre-phase (one headless Chromium per batch, replacing the old per-diagram `npx mmdc` fork); `puppeteer` is shared with the PDF renderer (`book/render-book.mjs`). 
`sass` (Dart Sass) compiles the vendored just-the-docs SCSS plus our customizations into the site stylesheet on every build, replacing the Jekyll-Sass pre-compile step. `pdf-lib` + `html-entities` + `htmlparser2` are the PDF renderer's own toolchain. The `postinstall` runs `builder/scripts/patch-dagre.mjs`, which rewrites mermaid's bundled dagre adapter --- see [Mermaid Dagre Patches](Fixes/Dagre).
+| Tree | Purpose | Phase |
+|---|---|---|
+| `_site/` | Online tree deployed to `docs.twinbasic.com`. | Render fan-out + write |
+| `_site-offline/` | `file://`-browsable mirror with every URL rewritten to a page-relative path. | Per-page rewrite folded into render workers |
+| `_site-pdf/` | Sparse source tree (`book.html` + CSS + images) the PDF renderer consumes. | Assembled after all pages have rendered |
-`mermaid` is **exact-pinned** (`"11.15.0"`, not `"^11.15.0"`). The dagre patches target a chunk filename whose hash component (`dagre-ZXKKJJHT.mjs`) is regenerated on each mermaid release, so a floated range could break the postinstall on a transparent patch bump.
+`builder/` lives at the repo root, not under `docs/`, so the generator source is not part of the content tree it reads. Build outputs go to `docs/_site/`, `docs/_site-offline/`, and `docs/_site-pdf/`; the serve-mode dev server writes to a separate `docs/_serve/` tree so a one-off `build.bat` invocation never clobbers a running serve session's output.
-## Per-module deep dive
+## Module map
-Each subsection covers the design rationale and implementation details for one module. For function signatures, data contracts, and the complete export table of each module, see [Pipeline Stages](Pipeline-Stages). Modules are presented in pipeline order.
+Modules grouped by role. Each entry has one line; deep-dive in [Pipeline Stages](Pipeline-Stages).
-### [tbdocs.mjs](https://github.com/twinbasic/documentation/blob/main/builder/tbdocs.mjs) --- entry point and orchestrator
+**Orchestration and scheduling**
-`_config.yml` is loaded first so its `exclude:` list can be passed to `discover()`. `captureBuildInfo()` is launched as a promise immediately after the config load so the two `git` shell-outs overlap with the I/O-bound discover and the CPU-bound nav computation that follows; the result is `await`ed only once Phase 2's other substeps are done. The shared markdown-it instance is built once via `initHighlighter` + `createMarkdownIt` and stored on `site.markdown` so Phase 2's SEO precompute and Phase 3's body renderer use the same configured pipeline --- titles run through the same dash, quote, and footnote-stripping rules as page body text.
+| File | Role |
+|---|---|
+| [`tbdocs.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/tbdocs.mjs) | Entry point. Defines the static `TASKS` graph, allocates the SAB, spawns the pool, runs the build, injects the Gantt chart. |
+| [`scheduler.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/scheduler.mjs) | Main-thread side of the pull scheduler: claim loop, results map, completion detection, summary printer. |
+| [`worker-pool.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/worker-pool.mjs) | Worker lifecycle wrapper: spawn, send the SAB, forward messages to the scheduler, terminate. No dispatch logic. |
+| [`cpu-worker.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/cpu-worker.mjs) | Worker harness. Runs the pull loop, holds the eight named handlers, drives speculative idle execution. |
+| [`sab-scheduler.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sab-scheduler.mjs) | SAB layout, allocation, task-metadata API. Constants and atomics primitives consumed by both the scheduler and the workers. |
+| [`sab-broadcast.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sab-broadcast.mjs) | JSON-over-SAB pack/unpack for the shared payload (config, link tables, sidebar HTML, etc.) broadcast to every render worker. |
-The drift guard at the end (`if (pages.length < 836)`) sets `process.exitCode = 1` when discover loses pages --- a discovery-rule regression that silently drops content appears as a non-zero exit even though the build itself "succeeded".
+**Discovery and compute**
-### [serve.mjs](https://github.com/twinbasic/documentation/blob/main/builder/serve.mjs) --- Phase 12 dev server
+| File | Role |
+|---|---|
+| [`discover.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/discover.mjs) | Source tree walk; parses frontmatter and classifies each file as a page or a static file. |
+| [`nav.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/nav.mjs) | Sidebar tree, integrity check, breadcrumbs, per-page `navLevels`. |
+| [`seo.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/seo.mjs) | Site-level SEO on main (`computeSiteSeo`); per-page SEO on workers (`computeChunkSeo`). |
+| [`book.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/book.mjs) | Chapter selector resolution (Phase 2 half) + `book.html` assembly (Phase 8 half). |
+| [`build-info.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/build-info.mjs) | Git commit hash + date capture. Runs on a worker so the shell-outs hide behind the main spine. |
+| [`data.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/data.mjs) | Loads `_book.yml`. |
-The 300 ms debounce coalesces rapid file changes into a single rebuild. A lightweight inject middleware splices the SSE client script before `` at HTTP-response time so the on-disk `_serve/` stays byte-identical to what `runBuild --dest docs/_serve` would have produced outside of serve mode.
+**Preprocessing**
-`shouldRebuild` filters watcher events along three axes: prefixes (`_site/`, `_site-offline/`, `_site-pdf/`, `_serve/`, `_pdf/`, `node_modules/`, `.git/`), basename patterns (dotfiles, editor swap files, the `4913` sentinel vim writes), and the specific `assets/images/mmd/*.svg` path. The last bit deserves a callout: those SVGs are emitted by the mermaid pre-phase back under `srcRoot`, so without the filter every `.mmd` edit fires the watcher twice (once on the `.mmd` save, once on the `.svg` write mid-rebuild) and the queued second rebuild is a no-op that triggers a redundant browser reload ~3 s later. The filter treats the `.mmd` as the source of truth and the `.svg` as a build artifact, matching how `_site/` writes are already excluded.
+| File | Role |
+|---|---|
+| [`dot.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/dot.mjs) | Regenerates stale `.dot` → `.svg` via the WASM build of Graphviz (`@hpcc-js/wasm-graphviz`). |
+| [`scss.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/scss.mjs) | Dart Sass over the vendored just-the-docs SCSS. Split across `scssLight` + `scssDark` worker tasks, joined on main. |
-### [discover.mjs](https://github.com/twinbasic/documentation/blob/main/builder/discover.mjs) --- Phase 1
+**Render hot path**
-The `exclude:` list from `_config.yml` is passed in as the `ignore` parameter and forwarded directly to `fast-glob`. It skips every underscore-prefixed file and directory (`_config.yml`, `_book.yml`, `_site/`, `_site-offline/`, `_site-pdf/`, every `_Images/` at any depth), SCSS sources (`**/*.scss`, compiled separately by [`scss.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/scss.mjs)), Mermaid sources (`**/*.mmd`, the `.svg` siblings are kept), and the obvious cache dirs.
+| File | Role |
+|---|---|
+| [`render.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/render.mjs) | markdown-it configuration + plugin stack + `renderPhase`. Built once on main and once per worker. |
+| [`highlight.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/highlight.mjs) | Shiki bootstrap + the bundled twinBASIC grammar. Emits the just-the-docs wrapper structure. |
+| [`highlight-theme.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/highlight-theme.mjs) | Loads `Light.theme` + `Dark.theme`, emits `tb-highlight.css` + scope-to-class lookup. |
+| [`template.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/template.mjs) | `templatePhase` (per-page layout wrap) + `buildInitConfig` + `renderSidebar`. JS template literals; no template engine. |
+| [`compress.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/compress.mjs) | Whitespace compression outside `
` blocks. |
 
-The final `pages.sort(byName)` mirrors Jekyll's `site.pages.sort_by!(&:name)` --- sort by basename, leaving fast-glob's input order to break ties (which `nav_order` then resolves deterministically in Phase 2).
+**Write phase**
 
-### [nav.mjs](https://github.com/twinbasic/documentation/blob/main/builder/nav.mjs) --- Phase 2 navigation
+| File | Role |
+|---|---|
+| [`write.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/write.mjs) | Asset / static-file writer + shared I/O helpers (`mkdirRec`, `runLimited`, `writeFileMkdirp`). |
+| [`paths.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/paths.mjs) | Permalink → destination-path helper. |
+| [`redirects.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/redirects.mjs) | `redirect_from:` stub generator. |
+| [`sitemap.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sitemap.mjs) | `sitemap.xml` + `robots.txt`. |
+| [`search.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/search.mjs) | `deriveSearchEntries` (per-chunk, on workers) + `writeSearchDataFromChunks` (consolidator, on main). |
 
-The shared-state approach is what gives the JS port its 25x speedup over the Ruby plugins it replaces --- each Ruby plugin used to rebuild the same intermediate maps from scratch.
+**Offline and PDF**
 
-The integrity check is the only path that can abort the build mid-Phase-2. Two failure modes: **ambiguity** (multiple pages share the title declared in `parent:` and `grand_parent:` doesn't disambiguate) and **orphan** (no page has that title at all). Both report one error per offending page plus the `srcRel` path so the fix is obvious.
+| File | Role |
+|---|---|
+| [`offline.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/offline.mjs) | Offline-tree writer + just-the-docs.js AST patcher + `search-data.js` wrapper. |
+| [`offline-rewrite.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/offline-rewrite.mjs) | Pure rewrite helpers (`deriveOfflinePageCached`, CSS url() rewrite, site-path set construction). Worker-safe; no node:fs dependency. |
+| [`pdf.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/pdf.mjs) | `_site-pdf/` writer: `book.html` + `tb-highlight.css` + `print.css` + referenced images. |
 
-`sortPages` implements Jekyll's four-bucket sort: numeric `nav_order`, then string `nav_order`, then numeric `title`, then string `title`. `case_insensitive` is opt-in via `_config.yml`. The cycle defence in `buildNavNode` (the `chain.some` check) bounds tree depth at `NAV_TREE_MAX_DEPTH = 16` so a circular `parent:` chain caps out instead of recursing forever.
+**Dev mode and reporting**
 
-### [seo.mjs](https://github.com/twinbasic/documentation/blob/main/builder/seo.mjs) --- Phase 2 SEO precompute
+| File | Role |
+|---|---|
+| [`serve.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/serve.mjs) | Long-lived dev server: HTTP, recursive watcher, SSE reload, persistent worker pool. |
+| [`gantt.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/gantt.mjs) | Inline SVG Gantt chart of the build timeline. Injected into the [Build Info](BuildInfo) page at the end of each build. |
 
-The Liquid filter chain it replaces is `text | markdownify | strip_html | normalize_whitespace | escape_once` --- `renderTitle()` is the JS port (markdown-it render, then the `stripHtml` helper, then `\s+` collapse + trim, then escape only the five HTML-active characters via `HTML_ESCAPE_ONCE_REGEXP`).
+## The pull-based SAB scheduler
 
-834 of 836 page titles on the site are plain ASCII strings where the pipeline collapses to a one-character escape; the remaining two (`Concat.md` and `LineContinuation.md` --- titles containing `&` and `\`) exercise the wrap-and-strip path. The shared markdown-it instance is mandatory; Phase 2 fails fast if the orchestrator forgot to build it via `createMarkdownIt` first.
+The scheduler models the build as a directed acyclic graph of tasks. Each task has predecessors (`expected`), a body (`execute` on main, or a named `handler` on a worker), and an output router (`submit`) that merges its result into the shared `SharedState`.
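As a rough illustration of the task shape described above, here is a minimal single-threaded sketch. Only the field names `expected`, `execute`, and `submit` come from the text; the task bodies, the `runGraph` helper, and the exact object layout are hypothetical and are not taken from `tbdocs.mjs`.

```javascript
// Hypothetical task table: field names follow the prose, the shape is assumed.
const TASKS = {
  config: {
    expected: [],                               // no predecessors: a seed
    execute: () => ({ title: "Docs" }),         // task body
    submit: (state, out) => { state.config = out; },
  },
  discover: {
    expected: ["config"],
    execute: () => ["a.md", "b.md"],
    submit: (state, out) => { state.pages = out; },
  },
  nav: {
    expected: ["discover"],
    execute: (state) => state.pages.length,     // reads what discover submitted
    submit: (state, out) => { state.navCount = out; },
  },
};

// Single-threaded stand-in for the scheduler: repeatedly run every task
// whose predecessors are all done, routing each output through submit().
function runGraph(tasks) {
  const state = {};
  const done = new Set();
  const order = [];
  const names = Object.keys(tasks);
  while (done.size < names.length) {
    const ready = names.filter(
      (name) => !done.has(name) && tasks[name].expected.every((p) => done.has(p)),
    );
    if (ready.length === 0) throw new Error("cycle or missing predecessor");
    for (const name of ready) {
      const out = tasks[name].execute(state);
      tasks[name].submit(state, out);
      done.add(name);
      order.push(name);
    }
  }
  return { state, order };
}

const { state, order } = runGraph(TASKS);
```

The real scheduler distributes the same three roles across threads; this sketch only shows how predecessors gate execution and how `submit` merges results into shared state.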
 
-`stripHtml` and `absoluteUrl` are also exported for [`search.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/search.mjs) (search-index content sanitiser) and for [`sitemap.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sitemap.mjs) / [`redirects.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/redirects.mjs) (absolute URL composition) --- the same byte-for-byte URL helper used for canonical tags is shared with the Phase 6 auxiliary writers.
+In the previous push-style design the main thread held the ready queue. While a `runOnMain` task body was running, the event loop was blocked: worker completion messages waited in the message queue, no new tasks were dispatched, and on a 16-core machine the idle time across all threads added up to roughly a second, which is significant against a sub-two-second build. The current pull design eliminates the round-trip through the main thread: workers read task state directly from a `SharedArrayBuffer`, claim tasks via `Atomics.compareExchange`, and wake siblings via `Atomics.notify` when they finish.
 
-### [book.mjs](https://github.com/twinbasic/documentation/blob/main/builder/book.mjs) --- Phase 2 chapter resolution + Phase 8 assembly
+### Task lifecycle
 
-The largest module by line count (~990 lines), split into two clearly-labelled halves by section comments.
+Each task slot in the SAB has a status:
 
-**§A: Phase 2 chapter resolution.** `resolveBookChapters(bookData, pages)` iterates over every entry / part / chaptered-part-chapter in `_data/book.yml` and resolves its `page` / `pages` / `nav_page` / `nav_pages` / `no_descent` selector schema to a concrete `Array` stored as `_chapters` on the entry. `landing_page` / `foreword_page` are pre-resolved to their `Page` references in the same pass so Phase 8 has no pages-walk left to do. `sortByNavOrder` implements Jekyll's group-by-owning-index sort: each index page and its leaves stay together, group order by lead-item `[nav_order, title]`.
+| Status | Meaning |
+|---|---|
+| `NOT_READY` (0) | One or more predecessors not yet done. |
+| `READY` (1) | All predecessors done. Eligible to be claimed. |
+| `CLAIMED` (2) | A thread has CAS-claimed the slot and is running it. |
+| `DONE` (3) | Body has run; the thread's `submit()` has merged the output into `SharedState`. |
+| `FAILED` (4) | Body threw. The scheduler aborts the build. |
 
-**§B--§F: Phase 8 book.html assembly.** `assembleBook(site, pages)` is the pure-compute walker --- emits the title page, then iterates over `bookData.front_matter` and `bookData.parts` in order, then runs `rewriteBookHrefs` (in-book `href="/X"` → `href="#ch-X"` for any page that contributes to the PDF), then `compressHtml`. The per-chapter body transform in `bookChapterTransform` runs five passes:
+The transitions are atomic: `READY → CLAIMED` via `Atomics.compareExchange` (so two threads cannot claim the same task), and `CLAIMED → DONE` after the executor's `submit()` runs. When a task transitions to `DONE`, its successors get their `depCount` decremented; any whose count hits zero flip to `READY` and the executor's `Atomics.notify` wakes any thread that was sleeping on the notify generation counter.
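The claim and completion transitions can be sketched with plain `Atomics` calls. This is a minimal illustration under stated assumptions: the three-array layout, the `successors` table, and the helper names `tryClaim` / `complete` are hypothetical and do not reflect the real layout in `sab-scheduler.mjs`.

```javascript
// Status codes mirroring the lifecycle table above.
const NOT_READY = 0, READY = 1, CLAIMED = 2, DONE = 3;

// Hypothetical layout: [status x N | depCount x N | notify generation].
const N = 3;
const sab = new SharedArrayBuffer((2 * N + 1) * Int32Array.BYTES_PER_ELEMENT);
const status = new Int32Array(sab, 0, N);
const depCount = new Int32Array(sab, N * 4, N);
const notify = new Int32Array(sab, 2 * N * 4, 1);

// Tiny graph for the demo: task 0 -> task 1 -> task 2.
const successors = [[1], [2], []];
depCount.set([0, 1, 1]);
Atomics.store(status, 0, READY);

// READY -> CLAIMED. compareExchange returns the previous value, so only
// the caller that observed READY wins the slot.
function tryClaim(i) {
  return Atomics.compareExchange(status, i, READY, CLAIMED) === READY;
}

// CLAIMED -> DONE, then release successors whose last predecessor this was.
function complete(i) {
  Atomics.store(status, i, DONE);
  for (const s of successors[i]) {
    // Atomics.sub returns the old value: 1 means the count just hit zero.
    if (Atomics.sub(depCount, s, 1) === 1) {
      Atomics.store(status, s, READY);
    }
  }
  Atomics.add(notify, 0, 1);  // bump the generation counter
  Atomics.notify(notify, 0);  // wake threads waiting on the old generation
}

const firstClaim = tryClaim(0);   // wins the slot
const secondClaim = tryClaim(0);  // loses: the slot is already CLAIMED
complete(0);
const statusAfter = Array.from(status);
```

After `complete(0)`, task 1 is `READY` while task 2 stays `NOT_READY` because its only predecessor has not finished. A sleeping worker would pair this with `Atomics.wait(notify, 0, gen, 50)` on the generation value it last read.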
 
-1. strip the `src="/"` prefix;
-2. unwrap `
` / `` for print;
-3. wrap inter-`` whitespace in `` so pagedjs's page splitter doesn't collapse it at page breaks (12 patterns, longest first);
-4. shift heading levels by `n in [0, 3]` capped at `h7-stub`;
-5. prefix every heading id and intra-chapter `href="#"` with the chapter anchor.
+### Task flags
-Each part and chapter divider page contains the entry's title as an H1/H2 heading (or a silent `

` when `no_outline_entry:` is set), which becomes the PDF bookmark target. When `landing_is_target:` is set on an entry, the heading is instead injected directly into the landing-page article so the PDF bookmark navigates there rather than to the blank divider page; `rewriteBookHrefs`'s landing-H1 strip skips the injected heading via a `data-divider-heading` attribute. `outline_closed:` stamps `data-pdf-bookmark-closed` on the heading (or on the first content article for `no_outline_entry` entries), and `parseOutline` in `book/lib/outline.mjs` reads the attribute to write a negative PDF `/Count` for that bookmark node. Full schema is documented in the `_data/book.yml` file header.
+A handful of bit flags on each task encode the scheduling primitives the build needs. They compose:
-`augmentWithRedirectStubs` synthesises virtual `Page` records from each real page's `redirect_from` so the cross-ref rewriter still captures legacy URLs the way Jekyll's `jekyll-redirect-from` did (its stubs appeared in `site.pages` and got swept into the lookup table). `chapterAnchorFromUrl` is the URL → `ch-…` slug helper that generates both `id="..."` and the `#…` href targets.
+| Flag | Meaning | Used by |
+|---|---|---|
+| `runOnMain` | Body runs on the main thread. The main loop claims; workers skip. | `discover`, `nav`, `markdownInit`, every `submit`-only task. |
+| `on_demand` | Seed task (no predecessors) that is **not** auto-started. Becomes claimable only when a successor would otherwise be runnable. | `warmInit`, `renderEnvInit`, `renderJoin`, `flushJoin`. |
+| `unique_per_worker` | The "done" state is per-lane: lane W's instance counts only for lane W's perspective. | `warmInit`, `renderEnvInit`. |
+| `run_when_idle` | When a worker has no claimable work, it may run this task speculatively. | `warmInit` (overlaps Shiki WASM init with the main spine). |
+| `pin_to_predecessor` | Must run on the same lane that ran a named predecessor. | `flush:i` (pinned to `render:i`). |
+| `survives_reset` | The `perWorkerDone` flag survives an SAB reset between builds in serve mode. | `warmInit` (Shiki stays loaded). |
-### [build-info.mjs](https://github.com/twinbasic/documentation/blob/main/builder/build-info.mjs) --- Phase 2 git capture
+A task can also declare `perWorkerDeps` --- a list of `unique_per_worker` tasks that must have run on **this** lane before the task is claimable. That is how `render:i` declares it needs `renderEnvInit` to have run on whatever lane picks it up.
-Both git shell-outs fall back to `"unknown"` on failure so a tarball install or a sparse checkout never aborts the build.
+### Notify protocol
-### [data.mjs](https://github.com/twinbasic/documentation/blob/main/builder/data.mjs) --- Phase 2 data loader
+The SAB carries a single `notify` Int32 used as a generation counter. Workers that find no claimable work read the counter, perform one more scan to close the race window, then `Atomics.wait(notify, gen, 50)` --- a fifty-millisecond timeout that also serves as a safety net against missed wakeups. Every state transition that could make a task claimable bumps the counter and calls `Atomics.notify`, so any worker sleeping on the old generation returns immediately.
-Replaces the book-specific YAML load that originally lived in [`book.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/book.mjs); the latter retains `loadBookData` as a back-compat wrapper for harnesses that haven't migrated.
+## SAB memory layout
-### [mermaid.mjs](https://github.com/twinbasic/documentation/blob/main/builder/mermaid.mjs) --- Phase 11 (B1) preprocessor
+A single `SharedArrayBuffer` carries every Int32 array the scheduler needs. The sizes are static: `MAX_TASKS = 512`, `MAX_LANES = 64`, `MAX_EDGES = 2048`, total roughly 140 KB. The arrays a reader is most likely to care about:
-Drives `puppeteer` + the in-tree `mermaid` package directly. Earlier this module shelled out to `@mermaid-js/mermaid-cli` via `npx mmdc`, which forked a fresh node + Chrome process per diagram and shipped its own bundled puppeteer-core (forcing a duplicate Chrome download); the direct path collapses both costs into one browser launch for the whole batch and one entry in the dependency tree.
+- `status[i]` --- the task lifecycle enum above.
+- `depCount[i]` --- remaining predecessor count. Decremented atomically on each predecessor's completion.
+- `succOffset[i]` / `succCount[i]` / `succList` --- flat successor edge list. `dispatch.submit()` extends the edge list at runtime to wire the dynamic render and flush tasks.
+- `perWorkerDone[i*MAX_LANES + lane]` --- per-lane done flag for `unique_per_worker` tasks.
+- `flags[i]` --- bitmask of the flags above.
+- `notify` --- generation counter for `Atomics.wait` / `notify`.
+- `buildDone` --- terminal flag set to 1 (success) or 2 (error) by `Scheduler._finish()` / `_abort()`. Workers poll this at the top of each pull-loop iteration and exit when it transitions away from 0.
-The render runs in a single `page.evaluate` that dynamic-imports `mermaid.esm.mjs` and calls `mermaid.render('my-svg', definition, container)`, then serialises the resulting `` via `XMLSerializer`. The SVG id matches mermaid-cli's default so any previously-committed SVG diffs cleanly against the new output. The bare HTML page is a `data:text/html` URL with one `

`; nothing else is loaded.
+The complete layout, allocation helper, and the `readTaskMeta` / `writeTaskMeta` API live in [`sab-scheduler.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sab-scheduler.mjs).
-**The intercept shim.** Chromium blocks the relative-`import()` chain that `mermaid.esm.mjs` triggers when the entry is loaded over `file://`, so requests are routed through a dummy origin `https://tbdocs-mermaid.invalid/*` and `page.setRequestInterception(true)` resolves them back to `node_modules/mermaid/dist/*` --- the same trick mermaid-cli's own `puppeteerIntercept.js` uses, stripped down to one root and one MIME type. The shim is necessary because the alternative (the IIFE bundle `mermaid.min.js`) inlines + minifies past the patched dagre chunk and would silently undo the layout fixes documented in [Mermaid Dagre Patches](Fixes/Dagre).
+## Task DAG by section
-**Failure modes.** Two categories, handled distinctly:
+The pipeline has 28 named static tasks plus 2N dynamic ones (N render chunks + N flush tasks). The Gantt chart groups them into four sections that also organise the discussion below:
-- **Setup** (`puppeteer` import fails, mermaid not installed, `puppeteer.launch()` fails for lack of Chrome): warns once with the recovery command (`npm install` / `npx puppeteer browsers install chrome --install-deps`), retains every on-disk SVG, returns `{ ..., setupSkipped: true }`. The orchestrator does **not** flip the exit code --- a fresh checkout still builds, just without diagram updates.
-- **Per-diagram render** (broken `.mmd` syntax, mermaid render throws inside `page.evaluate`): warns with the parser error including line + column + expected-token list, retains that diagram's previous SVG, **continues** processing the rest of the batch so every broken diagram surfaces in one run, and increments the returned `failed` count. The orchestrator flips `process.exitCode = 1` based on that count so CI catches the bad diagram.
+- **Seeds**: `buildInfo`, `scssLight`, `scssDark`, `config`, `warmInit`, `highlighterInit`, `discover`, `loadData`
+- **Spine**: `nav`, `dot`, `buildInit`, `markdownInit`, `deriveSitemap`, `deriveRedirects`, `resolveBookChapters`
+- **Render**: `dispatch`, `prepDest`, `prepPageDirs`, `renderEnvInit`, `render:i`, `renderJoin`
+- **Write**: `scss`, `flush:i`, `flushJoin`, `writeAssets`, `searchData`, `writeAux`, `writeOffline`, `writePdf`
-### [scss.mjs](https://github.com/twinbasic/documentation/blob/main/builder/scss.mjs) --- Phase 11 (B3) SCSS compiler
+The full task DAG, with every cross-section edge, follows:
-Runs Dart Sass (the [`sass`](https://www.npmjs.com/package/sass) npm package) over `docs/assets/css/just-the-docs-combined.scss` and pushes the result onto `generatedAssets` as `assets/css/just-the-docs-combined.css`. Replaces the Jekyll-era pre-compiled CSS that used to live under `builder/assets/`; editing any SCSS partial now reflects on the next build instead of requiring a re-extraction.
+![Task DAG of the SAB pull scheduler](/assets/images/dot/scheduler-dag.svg)
-Load paths are stacked, searched in order: `docs/_sass/` first (our customizations under `custom/`), then `builder/vendor/just-the-docs/_sass/` (the gem at v0.10.1). The same shadowing Jekyll relied on still applies --- `@import "custom/custom"` resolves to our `docs/_sass/custom/custom.scss` because the load-path order puts our `_sass/` first.
+**[M]** runs on the main thread; **[W]** runs on a worker. Solid arrows are normal predecessor edges (`expected`); dotted arrows are per-lane dependencies (`perWorkerDeps`) or implicit data dependencies between tasks that share state through `SharedState`.
-The entry point replicates the gem's `_includes/css/just-the-docs.scss.liquid` Liquid template as pure SCSS: it imports `support/support`, then `custom/setup`, then `color_schemes/light` (always), then `modules` --- emitting the full light-theme rule set at root. The same import block re-runs inside an `html.dark-mode { ... }` wrapper with `color_schemes/dark` instead so every module rule lands a second time with the dark palette, scoped under the dark-mode class.
+### Seeds
-Failure modes:
+Seeds have no predecessors and become claimable as soon as the build starts (with the exception of `on_demand` seeds that wait for a successor). They saturate the worker pool while the main thread is still walking the source tree.
-- **Setup** (`sass` not installed) is a hard error with a `npm install` hint --- there is no pre-compiled CSS fallback to fall back to.
-- **Content** (syntax error in any SCSS partial) prints the source location, flips `process.exitCode = 1`, and continues the build with the previous `_site/` CSS lingering if any. CI catches the non-zero exit.
+- `config` (main) --- reads `_config.yml` + applies CLI overrides.
+- `buildInfo` (worker) --- two `git` shell-outs. Falls back to `"unknown"` on failure.
+- `scssLight` (worker) --- compiles `just-the-docs-combined.scss` against the light palette.
+- `scssDark` (worker) --- same against the dark palette. The two halves were one ~700 ms compile in the old design; splitting them saves about 200 ms.
+- `scss` (main) --- joins both halves, writes the combined CSS to `_site/` and `_site-offline/`.
+- `dot` (worker) --- regenerates stale `.dot` → `.svg` via the WASM build of Graphviz. WASM init (~50 ms) hides behind the main spine; per-diagram render is synchronous after that.
+- `highlighterInit` (main) --- loads the `Light.theme` + `Dark.theme` palette, emits `tb-highlight.css`. Does not bring up Shiki on main --- workers each init their own.
+- `warmInit` (worker, `on_demand` + `unique_per_worker` + `run_when_idle` + `survives_reset`) --- per-lane Shiki bootstrap. The flag combination means workers run it during the main-thread spine if they have no other claimable work, every render-worker needs it on its own lane, and in serve mode the per-lane done flag survives across rebuilds so the second build skips warmup entirely.
+- `prepDest` (main) --- cleans and recreates the three destination trees. Deferred to after `dispatch` so the wipe does not contend with `discover`'s reads.
+- `prepPageDirs` (main) --- pre-creates every page output directory. Lets `flush:i` skip `mkdir` entirely.
-Upstream Dart Sass emits deprecation warnings against several gem-vendored constructs (`darken()`, root-`@import`); they're upstream noise, not actionable here without forking the gem.
+### Spine
-### [render.mjs](https://github.com/twinbasic/documentation/blob/main/builder/render.mjs) --- Phase 3 markdown pipeline
+Main-thread tasks fed by `discover`. They are mostly cheap; the point is to fork out into independent compute streams as fast as possible after the source tree is known.
-The largest single module (~1,580 lines) and the runtime hot path --- this is what dominates the ~1--2 s build time.
+```
+config → discover ┬→ nav           ┐
+                  ├→ buildInit     ├→ dispatch
+                  ├→ markdownInit  ┘
+                  ├→ deriveRedirects
+                  ├→ loadData → highlighterInit (already running)
+                  ├→ deriveSitemap (deferred)
+                  └→ resolveBookChapters (after deriveSitemap)
+```
-`createMarkdownIt(ctx)` is the configuration heart. The base options (`html: true`, `xhtmlOut: true`, `breaks: false`, `linkify: false`, `typographer: true`, `quotes: "“”‘’"`) match kramdown's defaults. Plugins layer on: `markdown-it-attrs` with the `{:` / `}` delimiters that kramdown uses, `markdown-it-deflist`, `markdown-it-footnote` with the kramdown render rules (`fnref:N` / `reversefootnote` / `
` shapes; see `configureFootnotes`), plus a stack of in-tree plugins:
+- `discover` --- walks `docs/`, classifies pages vs static files, builds `state.pageByDest`.
+- `nav` --- builds the sidebar tree, runs the integrity check (orphan / ambiguous `parent:` aborts the build here), pre-renders the sidebar HTML.
+- `buildInit` --- pre-renders the config-only chrome (SVG sprites, header, search footer, favicon). No nav-tree dependency; runs in parallel with `nav`.
+- `markdownInit` --- builds the link tables, instantiates the shared markdown-it, computes site-level SEO. The serialized link tables and site-level SEO constants travel to render workers as part of the shared SAB payload.
+- `loadData` --- reads `_book.yml`.
+- `deriveRedirects` --- pure derivation of redirect stubs. Forks off `discover` directly.
+- `deriveSitemap` --- absolute-URL list for `sitemap.xml`. Deferred to `dispatch` so it runs while the main thread would otherwise be idle waiting on render workers.
+- `resolveBookChapters` --- resolves the `_book.yml` chapter selectors to `Page` references. Identity-critical: the same `Page` objects must be visible to `writePdf` after the render fan-out has populated `renderedContent`.
-- **`standaloneIalForwardPlugin`** --- kramdown attaches a standalone `{:...}` IAL to the next block, not the previous one; markdown-it-attrs gets that backwards.
-- **`tightLooseListPlugin`** --- kramdown decides per-item whether a list item carries `

` wraps; markdown-it decides at list level. Post-pass hides `paragraph_open` / `paragraph_close` tokens to match.
-- **`looseDeflistPlugin`** --- the same rule applied to `

` bodies, with the narrower trigger (only the `dt` → `dd` blank-line gap counts).
-- **`headerIdPlugin`** --- the `kramdownSlug` algorithm (lowercase, drop characters outside `\p{L}\p{N}\p{M}\p{Pc}\-`, replace spaces with `-`, deduplicate with `-1`, `-2`, ...).
-- **`tocPlugin`** --- detects the `* TOC\n{:toc}` pattern (a bullet list whose token carries a `toc` attribute) and replaces it with the nested `