diff --git a/.github/workflows/tbdocs-gh-pages.yml b/.github/workflows/tbdocs-gh-pages.yml index 0366dcf3..9ffbc139 100644 --- a/.github/workflows/tbdocs-gh-pages.yml +++ b/.github/workflows/tbdocs-gh-pages.yml @@ -176,7 +176,7 @@ jobs: echo "tag=$TAG" >> "$GITHUB_OUTPUT" echo "name=$NAME" >> "$GITHUB_OUTPUT" - name: Create release with offline-site zip - uses: softprops/action-gh-release@v2 + uses: softprops/action-gh-release@v3 with: tag_name: ${{ steps.tag.outputs.tag }} name: ${{ steps.tag.outputs.name }} diff --git a/docs/Documentation/Builder.md b/docs/Documentation/Builder.md index 3c28c1eb..5ddf1d66 100644 --- a/docs/Documentation/Builder.md +++ b/docs/Documentation/Builder.md @@ -30,9 +30,9 @@ Sub-pages: ## Why tbdocs exists -The site was originally built with the just-the-docs Jekyll theme; tbdocs is the Node.js replacement that drives the same content model and the same output structure without a Ruby toolchain. The win once the port was settled is mostly internal: a fixed dependency set, end-to-end build time around 2--3 seconds on a modern laptop, and one process for all three output trees. +The site was originally built with the just-the-docs Jekyll theme; tbdocs is the Node.js replacement that follows the same content model and the same output structure without a Ruby toolchain. The win once the port was settled is mostly internal: a fixed dependency set, end-to-end build time around 2--3 seconds on a modern laptop, and one process for all three output trees. -The rework documented here is internal. The build moved from a push-style scheduler (the main thread decides what is ready and hands work to workers) to a SAB-based pull scheduler (workers read shared task state and claim work themselves) and three pieces of per-page work that used to run serially on the main thread --- offline rewrite, per-page SEO, and search-index derivation --- now run inside the render workers. 
The output is byte-equivalent to the previous scheduler; the change is in how the time is spent. +The rework documented here is internal. The build moved from a push-style scheduler (the main thread decides what is ready and passes work to workers) to a SAB-based pull scheduler (workers read shared task state and claim work themselves), and three pieces of per-page work that used to run serially on the main thread --- offline rewrite, per-page SEO, and search-index derivation --- now run inside the render workers. The output is byte-equivalent to the previous scheduler; the change is in how the time is spent. ## Architecture at a glance @@ -61,7 +61,7 @@ Modules grouped by role. Each entry has one line; deep-dive in [Pipeline Stages] | [`tbdocs.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/tbdocs.mjs) | Entry point. Defines the static `TASKS` graph, allocates the SAB, spawns the pool, runs the build, injects the Gantt chart. | | [`scheduler.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/scheduler.mjs) | Main-thread side of the pull scheduler: claim loop, results map, completion detection, summary printer. | | [`worker-pool.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/worker-pool.mjs) | Worker lifecycle wrapper: spawn, send the SAB, forward messages to the scheduler, terminate. No dispatch logic. | -| [`cpu-worker.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/cpu-worker.mjs) | Worker harness. Runs the pull loop, holds the eight named handlers, drives speculative idle execution. | +| [`cpu-worker.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/cpu-worker.mjs) | Worker harness. Runs the pull loop, holds the eight named handlers, handles speculative idle execution. | | [`sab-scheduler.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sab-scheduler.mjs) | SAB layout, allocation, task-metadata API. 
Constants and atomics primitives consumed by both the scheduler and the workers. | | [`sab-broadcast.mjs`](https://github.com/twinbasic/documentation/blob/main/builder/sab-broadcast.mjs) | JSON-over-SAB pack/unpack for the shared payload (config, link tables, sidebar HTML, etc.) broadcast to every render worker. | @@ -155,11 +155,11 @@ A task can also declare `perWorkerDeps` --- a list of `unique_per_worker` tasks ### Notify protocol -The SAB carries a single `notify` Int32 used as a generation counter. Workers that find no claimable work read the counter, perform one more scan to close the race window, then `Atomics.wait(notify, gen, 50)` --- a fifty-millisecond timeout that also serves as a safety net against missed wakeups. Every state transition that could make a task claimable bumps the counter and calls `Atomics.notify`, so any worker sleeping on the old generation returns immediately. +The SAB holds a single `notify` Int32 used as a generation counter. Workers that find no claimable work read the counter, perform one more scan to close the race window, then `Atomics.wait(notify, gen, 50)` --- a fifty-millisecond timeout that also serves as a safety net against missed wakeups. Every state transition that could make a task claimable bumps the counter and calls `Atomics.notify`, so any worker sleeping on the old generation returns immediately. ## SAB memory layout -A single `SharedArrayBuffer` carries every Int32 array the scheduler needs. The sizes are static: `MAX_TASKS = 512`, `MAX_LANES = 64`, `MAX_EDGES = 2048`, total roughly 140 KB. The arrays a reader is most likely to care about: +A single `SharedArrayBuffer` contains every Int32 array the scheduler needs. The sizes are static: `MAX_TASKS = 512`, `MAX_LANES = 64`, `MAX_EDGES = 2048`, total roughly 140 KB. The arrays a reader is most likely to care about: - `status[i]` --- the task lifecycle enum above. - `depCount[i]` --- remaining predecessor count. 
Decremented atomically on each predecessor's completion. @@ -188,7 +188,7 @@ The full task DAG, with every cross-section edge, follows: ### Seeds -Seeds have no predecessors and become claimable as soon as the build starts (with the exception of `on_demand` seeds that wait for a successor). They saturate the worker pool while the main thread is still walking the source tree. +Seeds have no predecessors and become claimable as soon as the build starts (with the exception of `on_demand` seeds that wait for a successor). They saturate the worker pool while the main thread is still traversing the source tree. - `config` (main) --- reads `_config.yml` + applies CLI overrides. - `buildInfo` (worker) --- two `git` shell-outs. Falls back to `"unknown"` on failure. @@ -215,7 +215,7 @@ config → discover ┬→ nav ┐ └→ resolveBookChapters (after deriveSitemap) ``` -- `discover` --- walks `docs/`, classifies pages vs static files, builds `state.pageByDest`. +- `discover` --- traverses `docs/`, classifies pages vs static files, builds `state.pageByDest`. - `nav` --- builds the sidebar tree, runs the integrity check (orphan / ambiguous `parent:` aborts the build here), pre-renders the sidebar HTML. - `buildInit` --- pre-renders the config-only chrome (SVG sprites, header, search footer, favicon). No nav-tree dependency; runs in parallel with `nav`. - `markdownInit` --- builds the link tables, instantiates the shared markdown-it, computes site-level SEO. The serialized link tables and site-level SEO constants travel to render workers as part of the shared SAB payload. @@ -241,7 +241,7 @@ dispatch ┬→ render:0 ─┬→ flush:0 ─┐ - `renderEnvInit` (worker, `on_demand` + `unique_per_worker`) --- per-lane render environment setup: unpack the shared SAB, reconstruct the link-table Maps, instantiate the worker's own markdown-it. Declared as a `perWorkerDeps` on every `render:i` so the first render claim per lane pulls it in. - `render:i` (worker, dynamic) --- the per-chunk compute. 
Each one runs five sub-stages over its slice of `state.pages`: `renderPhase` (markdown-it body render) → `computeChunkSeo` (per-page SEO fields) → `templatePhase` (just-the-docs layout wrap) → `deriveOfflinePageCached` (offline HTML rewrite) → `deriveSearchEntries` (per-section search entries). Returns a delta containing `renderedContent` per page, plus the per-chunk search entries. -- `flush:i` (worker, dynamic, `pin_to_predecessor`) --- writes the chunk's page HTML to disk on the same worker that rendered it. Online tree always; offline tree too unless `skipOffline`. The pinning is what makes per-chunk flush correct: the worker stashes a batch on its own `_pendingFlush` FIFO at the end of `render`, and only the matching `flush:i` ever drains it. +- `flush:i` (worker, dynamic, `pin_to_predecessor`) --- writes the chunk's page HTML to disk on the same worker that rendered it. Online tree always; offline tree too unless `skipOffline`. The pinning is what makes per-chunk flush correct: the worker stores a batch on its own `_pendingFlush` FIFO at the end of `render`, and only the matching `flush:i` ever drains it. - `renderJoin` (main, `on_demand`) --- barrier that unblocks `searchData`. Its dep count is set to N by `dispatch.submit()`. - `flushJoin` (main, `on_demand`) --- barrier that aggregates per-chunk write stats and gates `writeAux` + `writePdf`. @@ -279,8 +279,8 @@ For a one-page reference, every task and its execution locus: Three pieces of work newly distributed to render workers under the current design: 1. **Per-page SEO** (`computeChunkSeo`) --- was a single Phase 2 main-thread task; now runs per chunk inside `render:i`, between `renderPhase` and `templatePhase`. The values are written into the page objects on the worker and travel back as part of the render delta. -2. **Per-page offline HTML** (`deriveOfflinePageCached`) --- was a Phase 7 main-thread pass that re-read the online tree; now runs per chunk inside `render:i` after `templatePhase`. 
The resulting `offlineHtml` is stashed on the page and written by the matching `flush:i` directly to `_site-offline/`. -3. **Per-chunk search entries** (`deriveSearchEntries`) --- was a Phase 6 main-thread task; now runs per chunk inside `render:i`. Each chunk's entries are stashed at `state.searchChunks[i]` by the render `submit()`; the `searchData` task only flattens, renumbers, and writes the JSON. +2. **Per-page offline HTML** (`deriveOfflinePageCached`) --- was a Phase 7 main-thread pass that re-read the online tree; now runs per chunk inside `render:i` after `templatePhase`. The resulting `offlineHtml` is stored on the page and written by the matching `flush:i` directly to `_site-offline/`. +3. **Per-chunk search entries** (`deriveSearchEntries`) --- was a Phase 6 main-thread task; now runs per chunk inside `render:i`. Each chunk's entries are stored at `state.searchChunks[i]` by the render `submit()`; the `searchData` task only flattens, renumbers, and writes the JSON. Per-chunk page HTML writes were similarly pulled off the main thread: each `flush:i` writes its chunk's pages to disk on the same worker that rendered them, with the pinning enforced by `pin_to_predecessor`. @@ -319,7 +319,7 @@ Each `render:i` runs five sub-stages over its chunk: 4. Offline rewrite (when `!skipOffline`) --- per destination directory, render the first page through `deriveOfflinePage` and slice out the nav block. Subsequent pages in the same directory substitute the sliced nav with a cached output, run the rewriter over the smaller string, and splice the output back in. Saves ~200 ms across the build. 5. `deriveSearchEntries(chunk, env.site)` --- per-section search-index entries. -The worker stashes the writable pages on its own `_pendingFlush` FIFO and returns the deltas. The matching `flush:i` --- pinned to this lane --- claims later, pops the batch, and writes the page HTML to disk. 
The pinning is what guarantees the batch lands on the right worker; the FIFO is what handles the case where a worker has already started a second `render:i` before its first `flush:i` claims. +The worker stores the writable pages on its own `_pendingFlush` FIFO and returns the deltas. The matching `flush:i` --- pinned to this lane --- claims later, pops the batch, and writes the page HTML to disk. The pinning is what guarantees the batch lands on the right worker; the FIFO is what handles the case where a worker has already started a second `render:i` before its first `flush:i` claims. ## Persistent worker pool and serve mode @@ -374,7 +374,7 @@ When adding a new task to `TASKS`, give it a `ganttSection` key matching one of ## Dependencies -A single `package.json` at the repo root carries everything --- the static site generator's deps, the PDF renderer's deps, and the few packages both consume: +A single `package.json` at the repo root contains everything --- the static site generator's deps, the PDF renderer's deps, and the few packages both consume: ```json { @@ -399,7 +399,7 @@ A single `package.json` at the repo root carries everything --- the static site } ``` -No template engine, no framework, no bundler, no postinstall hooks. `acorn` + `acorn-walk` parse the upstream `just-the-docs.js` for the AST-based offline patcher; the `markdown-it-*` packages cover the dialect extensions the legacy parser supported; `shiki` is the syntax highlighter; `@hpcc-js/wasm-graphviz` is the WASM build of Graphviz that renders `.dot` diagram sources; `sass` is Dart Sass for the SCSS compile. `pdf-lib` + `html-entities` + `htmlparser2` + `puppeteer` are the PDF renderer's toolchain (puppeteer drives headless Chromium for the paged.js layout pass). +No template engine, no framework, no bundler, no postinstall hooks. 
`acorn` + `acorn-walk` parse the upstream `just-the-docs.js` for the AST-based offline patcher; the `markdown-it-*` packages cover the dialect extensions the legacy parser supported; `shiki` is the syntax highlighter; `@hpcc-js/wasm-graphviz` is the WASM build of Graphviz that renders `.dot` diagram sources; `sass` is Dart Sass for the SCSS compile. `pdf-lib` + `html-entities` + `htmlparser2` + `puppeteer` are the PDF renderer's toolchain (puppeteer controls headless Chromium for the paged.js layout pass). Node 22+ is required: the SAB scheduler uses `Atomics.wait`, `Atomics.notify`, and `SharedArrayBuffer` --- all baseline in Node 22 without flags. @@ -430,9 +430,9 @@ The build aborts or flips the exit code under a handful of conditions: - **Page-count drift.** `runBuild()` ends with `if (pages.length < 836) process.exitCode = 1` so a discover-rule regression that silently drops content appears as a non-zero exit even though the build itself completed. - **SAB structural validation.** `verifySchedulerSAB(TASKS, views, idMapping)` runs immediately after allocation. A misconfigured `expected`/`perWorkerDeps` list, a duplicate task name, or a successor edge to an unknown task aborts the build before any task runs. -- **DOT render failure.** Per-diagram failures retain the previous SVG and continue the batch so every broken diagram surfaces in one run; the orchestrator flips `process.exitCode = 1` based on the failure count. +- **DOT render failure.** Per-diagram failures retain the previous SVG and continue the batch so every broken diagram appears in one run; the orchestrator flips `process.exitCode = 1` based on the failure count. - **SCSS compile failure.** The light/dark workers warn with the source location and continue with `failed: true`; the joiner sets `process.exitCode = 1`. Existing `_site/` CSS lingers. - **Nav integrity.** Orphan or ambiguous `parent:` declarations throw inside `nav.execute()`, which aborts the build via `Scheduler._abort()`. 
-- **Worker crash.** A worker handler that throws posts `{ taskFailed, message, stack }` to main; the scheduler calls `_abort()`, the build rejects, and the orchestrator surfaces the error with the task name in the message. +- **Worker crash.** A worker handler that throws posts `{ taskFailed, message, stack }` to main; the scheduler calls `_abort()`, the build rejects, and the orchestrator reports the error with the task name in the message. Setup-class failures --- `@hpcc-js/wasm-graphviz` not installed, `sass` missing --- print a one-line recovery hint and continue with stale outputs. They do not flip the exit code; a fresh checkout still builds. diff --git a/docs/Documentation/Building.md b/docs/Documentation/Building.md index 9a20498e..1b8a4e84 100644 --- a/docs/Documentation/Building.md +++ b/docs/Documentation/Building.md @@ -26,7 +26,7 @@ The documentation is rendered to HTML by `tbdocs`, a custom Node.js static site ### Requirements - **Node.js 22+** for `tbdocs` itself. -- **`npm ci`** at the repository root installs everything: the static site generator's deps and the PDF renderer's deps. A single `package.json` at the repo root carries the whole dependency set. The `build.bat` / `serve.bat` wrappers assume the install has run. +- **`npm ci`** at the repository root installs everything: the static site generator's deps and the PDF renderer's deps. A single `package.json` at the repo root contains the whole dependency set. The `build.bat` / `serve.bat` wrappers assume the install has run. - **Chromium** is required only when the PDF book is rendered. It is downloaded once by `npx puppeteer browsers install chrome --install-deps`. The day-to-day `build.bat` / `serve.bat` flow does not need it. ## Building @@ -73,7 +73,7 @@ Diagrams live as `.dot` source files under `docs/assets/images/dot/` and are ref At render time, any markdown image reference to a build-local `.svg` is replaced with the SVG content inlined directly in the HTML. 
Each inlined SVG gets a click-to-zoom overlay and four control links (Download SVG, Copy SVG, Download PNG, Copy PNG). The controls are hidden in print output. See the [SVG inlining](Builder#svg-inlining) section of the Builder page for the implementation details. -The renderer drives `@hpcc-js/wasm-graphviz` directly: one WASM module load (~50 ms) covers the whole batch, then each diagram is a synchronous `gv.dot(src)` call. No headless browser, no in-tree patches, no Chromium dependency for diagrams. Two failure modes are handled distinctly: +The renderer calls `@hpcc-js/wasm-graphviz` directly: one WASM module load (~50 ms) covers the whole batch, then each diagram is a synchronous `gv.dot(src)` call. No headless browser, no in-tree patches, no Chromium dependency for diagrams. Two failure modes are handled distinctly: - **Setup failures** (`@hpcc-js/wasm-graphviz` not installed, WASM load fails) emit a one-line warning, retain the existing on-disk SVGs, and let the build exit 0 --- a fresh checkout without `npm install` still builds against the committed SVGs. - **Content failures** (broken DOT syntax, render throws) emit the error verbatim, leave that diagram's previous SVG in place, continue rendering the rest of the batch, and flip `process.exitCode = 1` so CI catches the bad diagram. diff --git a/docs/Documentation/Extending.md b/docs/Documentation/Extending.md index bd0bed79..ce87b8d4 100644 --- a/docs/Documentation/Extending.md +++ b/docs/Documentation/Extending.md @@ -354,13 +354,13 @@ export function createMarkdownIt(ctx) { ### 3. Verify -Run `build.bat` and open an affected page; for live feedback, use `serve.bat`. A plugin that walks the full token stream on every page runs N+1 times per build (one main thread + N workers), so check the per-task render timing in the summary or the Gantt chart for any spike. +Run `build.bat` and open an affected page; for live feedback, use `serve.bat`. 
A plugin that traverses the full token stream on every page runs N+1 times per build (one main thread + N workers), so check the per-task render timing in the summary or the Gantt chart for any spike. --- ## Adding a render-worker sub-stage -When the new work is per-page CPU compute, slotting it into the existing `render` handler is the cleanest path. Skip the per-task overhead, ride the same fan-out. +When the new work is per-page CPU compute, slotting it into the existing `render` handler is the most direct path. Skip the per-task overhead, ride the same fan-out. ### Where to plug in diff --git a/docs/Documentation/Fixes-PDFLib.md b/docs/Documentation/Fixes-PDFLib.md index 1abe6509..87f0dc8a 100644 --- a/docs/Documentation/Fixes-PDFLib.md +++ b/docs/Documentation/Fixes-PDFLib.md @@ -20,7 +20,7 @@ The root cause of the need for all these patches is the same: pdf-lib is designe **Problem.** `PDFRef.of(objectNumber, generationNumber)` is the factory for every indirect reference in the PDF. The original factory built instances via `Object.create(PDFRef.prototype)` followed by individual property writes. V8 treats objects built that way as transitioning through intermediate hidden-class maps for each write, producing instances roughly twice as large as those built with `new`. Measured on the book: ~60 bytes per instance via the upstream path. With ~226 000 unique indirect references, that is ~13.5 MB of excess heap. Additionally, there was no pool: each call to `PDFRef.of(N, 0)` allocated a new instance even for previously seen object numbers. -**Fix.** Two constructor functions, `_FastRef` (gen=0) and `_FastRefGen` (gen≠0), both with their `prototype` aliased to `PDFRef.prototype`. V8 assigns each a stable hidden class from the first instance. `_FastRef` carries only `objectNumber`; `generationNumber` is provided as a prototype data-property default of `0`, so gen=0 instances need only one inline slot (~16 bytes per instance, down from ~60). 
Gen=0 instances are cached in a dense `pool0` Array indexed by `objectNumber`; gen≠0 instances use a `Map` keyed by `"N M"` string (vanishingly rare: only the free entry at object 0 in Chromium-emitted PDFs). The hot prototype methods `toString`, `sizeInBytes`, and `copyBytesInto` are rewritten to read `objectNumber` and `generationNumber` as plain data-property reads rather than going through the original `tag` string stored on each instance. +**Fix.** Two constructor functions, `_FastRef` (gen=0) and `_FastRefGen` (gen≠0), both with their `prototype` aliased to `PDFRef.prototype`. V8 assigns each a stable hidden class from the first instance. `_FastRef` holds only `objectNumber`; `generationNumber` is provided as a prototype data-property default of `0`, so gen=0 instances need only one inline slot (~16 bytes per instance, down from ~60). Gen=0 instances are cached in a dense `pool0` Array indexed by `objectNumber`; gen≠0 instances use a `Map` keyed by `"N M"` string (vanishingly rare: only the free entry at object 0 in Chromium-emitted PDFs). The hot prototype methods `toString`, `sizeInBytes`, and `copyBytesInto` are rewritten to read `objectNumber` and `generationNumber` as plain data-property reads rather than going through the original `tag` string stored on each instance. ## fast-inflate.mjs @@ -72,9 +72,9 @@ The four-byte case covers all PDFs under 4 GB; the fallback handles larger value ## fast-dict-onebuf.mjs -**Problem.** Each `PDFDict` instance held its key-value pairs in a `Map`. Maps carry ~200 bytes of per-instance overhead when empty and ~50 bytes per entry. On the book, ~260 000 `PDFDict` instances are created during `PDFDocument.load`. As the document grows during parse, the Maps repeatedly doubled their internal hash-table storage and discarded each previous arena to GC. +**Problem.** Each `PDFDict` instance held its key-value pairs in a `Map`. Maps have ~200 bytes of per-instance overhead when empty and ~50 bytes per entry. 
On the book, ~260 000 `PDFDict` instances are created during `PDFDocument.load`. As the document grows during parse, the Maps repeatedly doubled their internal hash-table storage and discarded each previous arena to GC. -**Fix.** A single append-only Array (`main`) shared across all `PDFDict` instances for the document's lifetime. Each `PDFDict` carries one encoded integer (`d`) that packs a `start` index (23 bits) and entry-pair `length` count (16 bits) into a single JavaScript number. `main[start..start+length]` holds alternating key and value references. Mutations that add a new entry either extend the dict's range in-place when it is at the array's high-water mark, or copy the range to the tail first (copy-on-write). `PDFCatalog`, `PDFPageTree`, and `PDFPageLeaf` share the same backing array; `PDFPageLeaf`'s `normalized` and `autoNormalizeCTM` booleans are encoded in two spare bits of `d` (bits 23 and 24). `PDFObjectParser.parseDict` uses a per-parser temp array as a recursion-frame stack, committing each completed frame to `main` as a single contiguous append. +**Fix.** A single append-only Array (`main`) shared across all `PDFDict` instances for the document's lifetime. Each `PDFDict` holds one encoded integer (`d`) that packs a `start` index (23 bits) and entry-pair `length` count (16 bits) into a single JavaScript number. `main[start..start+length]` holds alternating key and value references. Mutations that add a new entry either extend the dict's range in-place when it is at the array's high-water mark, or copy the range to the tail first (copy-on-write). `PDFCatalog`, `PDFPageTree`, and `PDFPageLeaf` share the same backing array; `PDFPageLeaf`'s `normalized` and `autoNormalizeCTM` booleans are encoded in two spare bits of `d` (bits 23 and 24). `PDFObjectParser.parseDict` uses a per-parser temp array as a recursion-frame stack, committing each completed frame to `main` as a single contiguous append. 
The `measure-pass.mjs` pre-pass counts total `dictSlots` in the raw PDF byte stream. Calling `setExpectedDictSlots(n)` before `PDFDocument.load` resizes `main` in-place to the exact required size via `main.length = n`, eliminating V8 growth reallocations during parse. An in-place resize is used rather than replacing the module-level binding; replacing it would invalidate V8's inline-cache slots in every closure that reads `main`, causing a parse-time deoptimisation spike. @@ -129,7 +129,7 @@ An additional optimisation in `parseIndirectObjects`: the upstream implementatio **Problem.** Each `PDFArray` instance allocated a per-instance `this.array = []` in its constructor. On the book, these per-instance allocations contributed ~19 MB of heap. Each `this.array` was a short-lived Array grown on demand, causing V8 to perform repeated backing-store reallocations for small arrays. -**Fix.** The same one-buffer strategy as `fast-dict-onebuf`, applied to `PDFArray`. A single append-only Array (`arrayMain`) shared across all `PDFArray` instances. Each `PDFArray` carries one encoded integer (`d`) packing `start` (24 bits) and `length` (16 bits). `arrayMain[start..start+length]` holds array elements as plain JavaScript references --- no encoding, no decode step on reads. `PDFObjectParser.parseArray` uses a per-parser `_arrayTemp` stack, committing each completed frame to `arrayMain` in one contiguous append. Mutations follow the same copy-on-write logic as `fast-dict-onebuf`. +**Fix.** The same one-buffer strategy as `fast-dict-onebuf`, applied to `PDFArray`. A single append-only Array (`arrayMain`) shared across all `PDFArray` instances. Each `PDFArray` holds one encoded integer (`d`) packing `start` (24 bits) and `length` (16 bits). `arrayMain[start..start+length]` holds array elements as plain JavaScript references --- no encoding, no decode step on reads. 
`PDFObjectParser.parseArray` uses a per-parser `_arrayTemp` stack, committing each completed frame to `arrayMain` in one contiguous append. Mutations follow the same copy-on-write logic as `fast-dict-onebuf`. `setExpectedArraySlots(n)` from `measure-pass.mjs` resizes `arrayMain` in-place before parse for the same reason as `setExpectedDictSlots`: in-place resize preserves V8's inline-cache slots. diff --git a/docs/Documentation/Fixes-PagedJS.md b/docs/Documentation/Fixes-PagedJS.md index ba808974..d3dbe0bf 100644 --- a/docs/Documentation/Fixes-PagedJS.md +++ b/docs/Documentation/Fixes-PagedJS.md @@ -128,7 +128,7 @@ At each call site that now receives the sync sentinel, `_assertSync(result, hook **Problem.** `[PATCH: wrap-content-move]` Upstream moved the `
` content into paged.js's layout container by serialising the entire body to a string via `innerHTML` and reparsing it into a ``. For a large book this serialisation is expensive and destroys the live DOM nodes, requiring a full reparse. -**Fix.** Children are moved directly into a plain `DocumentFragment` owned by the live document via `appendChild`. The fragment is stashed on a marker `` element's `_pagedjsContent` expando so re-entrant calls return the already-moved fragment rather than attempting to move already-detached nodes. +**Fix.** Children are moved directly into a plain `DocumentFragment` owned by the live document via `appendChild`. The fragment is stored on a marker `` element's `_pagedjsContent` expando so re-entrant calls return the already-moved fragment rather than attempting to move already-detached nodes. ### Whitespace filter diff --git a/docs/Documentation/Fixes.md b/docs/Documentation/Fixes.md index b9f65dc3..639f9f86 100644 --- a/docs/Documentation/Fixes.md +++ b/docs/Documentation/Fixes.md @@ -10,7 +10,7 @@ permalink: /Documentation/Development/Fixes # Library Patches {: .no_toc } -Two third-party libraries carry in-tree modifications. `book/lib/paged.browser.js` is a patched copy of paged.js v0.4.3 (MIT); the thirteen `fast-*.mjs` files there are side-effecting shims applied to pdf-lib's live exports before each PDF process phase. This section documents every change: what the upstream behaviour was, why it was unsuitable for the build pipeline, and what was changed. +Two third-party libraries have in-tree modifications. `book/lib/paged.browser.js` is a patched copy of paged.js v0.4.3 (MIT); the thirteen `fast-*.mjs` files there are side-effecting shims applied to pdf-lib's live exports before each PDF process phase. This section documents every change: what the upstream behaviour was, why it was unsuitable for the build pipeline, and what was changed. 
## Sub-pages diff --git a/docs/Documentation/PDF-Generation.md b/docs/Documentation/PDF-Generation.md index 29ea1ebc..42a1b228 100644 --- a/docs/Documentation/PDF-Generation.md +++ b/docs/Documentation/PDF-Generation.md @@ -48,7 +48,7 @@ Always run `build.bat` first to populate `_site-pdf/`. ## render-book.mjs -`book/render-book.mjs` drives the three phases. Its helpers live in `book/lib/`. +`book/render-book.mjs` runs the three phases. Its helpers live in `book/lib/`. ### Phase 1: Render @@ -110,7 +110,7 @@ page.pdf({ Augments the raw PDF from Chromium with a bookmark tree and document metadata, then saves the final output. -The raw buffer from `page.pdf()` is a valid but minimal PDF: it has no `/Outlines` entry and carries Chromium's default metadata. The process phase runs four operations in sequence: +The raw buffer from `page.pdf()` is a valid but minimal PDF: it has no `/Outlines` entry and contains Chromium's default metadata. The process phase runs four operations in sequence: 1. **`measureRawPdf(rawPdf)`** --- traverses the raw bytes without allocating any objects. Returns `dictSlots` and `arraySlots` counts used to pre-size two shim backing arrays before the load (see [`measure-pass.mjs`](#measure-passmjs)). @@ -178,7 +178,7 @@ The function also injects a hidden `