From cddeade38b700eeca58cdc3329df62c36ab20cab Mon Sep 17 00:00:00 2001
From: Kuba Sunderland-Ober
Date: Sun, 17 May 2026 16:25:34 +0200
Subject: [PATCH 1/6] Add timing information to book-href-rewriter and pdflify.

---
 docs/_plugins/book-href-rewrite.rb | 5 +++++
 docs/_plugins/pdfify.rb            | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/docs/_plugins/book-href-rewrite.rb b/docs/_plugins/book-href-rewrite.rb
index 963ff87..d0b7486 100644
--- a/docs/_plugins/book-href-rewrite.rb
+++ b/docs/_plugins/book-href-rewrite.rb
@@ -228,6 +228,8 @@ def self.process(page)
     return if parent_map.empty?
 
     landing_anchors = build_landing_anchors(site)
+    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+
     rewritten = 0
     landings_stripped = 0
     page.output = page.output.gsub(/(]*id="(ch-[^"]+)"[^>]*>)(.*?)(<\/article>)/m) do
@@ -254,6 +256,9 @@ def self.process(page)
       "#{article_open}#{body}#{article_end}"
     end
     Jekyll.logger.info "BookHrefRewrite:", "rewrote #{rewritten} chapter bodies, stripped #{landings_stripped} landing H3s"
+
+    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) * 1000).round(0)
+    Jekyll.logger.info "BookHrefRewrite:", "BookHrefRewriter ran in #{elapsed_ms}ms."
   end
 end
 
diff --git a/docs/_plugins/pdfify.rb b/docs/_plugins/pdfify.rb
index 6e8e0ac..a36777b 100644
--- a/docs/_plugins/pdfify.rb
+++ b/docs/_plugins/pdfify.rb
@@ -88,6 +88,8 @@ def self.run(site, source_root, dest_root)
       return
     end
 
+    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+
     # Wipe the destination tree so previous runs do not leave stale
     # images behind when source pages are deleted or renamed.
     FileUtils.rm_rf(dest)
@@ -132,6 +134,9 @@ def self.run(site, source_root, dest_root)
     book_src.delete
 
     Jekyll.logger.info "Pdfify:", "wrote #{dest_root} -- copied #{copied} file(s) (#{image_paths.size} image(s)#{skipped.zero? ? "" : ", #{skipped} missing"})"
+
+    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time) * 1000).round(0)
+    Jekyll.logger.info "Pdfify:", "Pdfifier ran in #{elapsed_ms}ms."
   end
 
   # Walks book.html for relative `` URLs and returns the

From 3ff66be99f0f8638b1b86cd2235f89be26f269fb Mon Sep 17 00:00:00 2001
From: Kuba Sunderland-Ober
Date: Sun, 17 May 2026 16:41:07 +0200
Subject: [PATCH 2/6] Speed up offlinify by 4s.

---
 docs/_plugins/offlinify.md |  73 +++++++++----------
 docs/_plugins/offlinify.rb | 140 +++++++++++++++++++------------------
 2 files changed, 110 insertions(+), 103 deletions(-)

diff --git a/docs/_plugins/offlinify.md b/docs/_plugins/offlinify.md
index 49b7785..1d1aac4 100644
--- a/docs/_plugins/offlinify.md
+++ b/docs/_plugins/offlinify.md
@@ -41,7 +41,7 @@ After Jekyll's WRITE phase completes, the hook fires `Offlinify.run(site, src_de
 
 4. **Walk the source tree.** For each file:
    - If the file matches a pattern in `site.config["offline_exclude"]` (see [Exclude list](#exclude-list)): skip the copy so the online `_site/` keeps it and the offline tree doesn't.
-   - `.html`: read once, run three transformation passes in order (absolute-URL rewrite, relative-URL rewrite, search-setup injection), write back if any pass changed content.
+   - `.html`: read once, run two transformation passes in order (combined HTML URL rewrite, search-setup injection), write back if any pass changed content.
    - `.css`: read, run the `url()` rewrite, write back.
    - Anything else (images, fonts, JSON, JS): plain `FileUtils.cp` into the offline tree.
 
@@ -53,11 +53,22 @@ After Jekyll's WRITE phase completes, the hook fires `Offlinify.run(site, src_de
 
 ## Transformation passes
 
-### Pass 1: absolute URL rewriting (HTML)
+### HTML URL rewriting
 
-Regex: `\b(href|src)=(["'])(\/(?!\/)[^"']*)\2` — captures `href` or `src` attribute values that start with a single `/` (not `//`, which is protocol-relative).
+A single combined regex matches both absolute and page-relative URLs in `href`/`src` attributes:
 
-For each match, `compute_relative` does the following:
+`\b(href|src)=(["'])(\/(?!\/)[^"']*|(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+)\2`
+
+The third capture (the URL) has two alternatives:
+
+- **Absolute** (`\/(?!\/)[^"']*`): starts with a single `/`, not `//` (protocol-relative). Produced by `relative_url`. Goes through `compute_relative`.
+- **Page-relative** (`(?![#/]|[a-zA-Z][a-zA-Z0-9+.\-]*:)[^"']+`): does not start with `#` (fragment-only — leave alone), `/` (handled by the first alternative), or a `scheme:` prefix (`http:`, `mailto:`, `tel:`, `javascript:`, etc.). Comes from markdown sources verbatim (`[Description](Attributes#description)`-style); Jekyll passes these through without applying `relative_url`, so they reach the rendered HTML without a baseurl prefix. Goes through `compute_rel_url`.
+
+The two alternatives are disjoint at the start of the URL, so a single `gsub` handles both. Inside the block, dispatch on `raw.start_with?("/")`. (An earlier two-regex design ran two full gsubs and re-scanned the file for code-block ranges between them; combining them halved the per-file regex work — see [Performance](#performance).)
+
+#### Absolute-URL path: `compute_relative`
+
+For each absolute-URL match, the steps are:
 
 1. **Split off query/fragment.** `#section` and `?foo=bar` are preserved verbatim onto the rewritten URL.
 
@@ -96,37 +107,26 @@ descend = "Tutorials/CustomControls/Form%20Designer.html"
 result = "../../Tutorials/CustomControls/Form%20Designer.html#section"
 ```
 
-**Code-block skip.** Before the rewrite regex runs, the file's content is scanned once for `` and `` blocks. The byte ranges of their bodies are passed to the regex callback, which returns the match verbatim when the match offset falls inside any range. The skip has two consequences:
-
-- Example URLs in tutorial code samples (e.g. `