Release notes
Detailed release history for wiki-search-index (the npm builder). The README carries the one-line cliff-notes; this page carries the detail.
Fix: HTML entities are now decoded in headings, not just body text. 0.1.1 decoded entities only while reducing Markdown to searchable text (toPlainText); the heading, title, and derived anchor paths still passed entities through verbatim. So a heading written with entities, e.g. ## 4.2.2 &mdash; 2026-05-29, produced the junk anchor #422-mdash-2026-05-29 instead of GitHub's real #422--2026-05-29 (GitHub renders the entity to —, the slugger drops that character, and the two flanking spaces each become a hyphen, giving --). That is a navigation-correctness bug: a result for such a section linked to the page top instead of the heading, and the displayed section title showed the literal &mdash;. The builder now runs one shared entity-decode step ahead of both github-slugger and the stored display text, so anchors and titles match GitHub. Verified live against the rendered wiki anchors (id="user-content-…"). As in 0.1.1, indices for literal-Unicode wikis are byte-identical; only entity-using wikis change, and output stays deterministic so the committed-index git diff --exit-code staleness gate still holds.
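The decode-before-slug ordering can be sketched as follows. This is a minimal illustration, not the builder's code: decodeEntities here is a stand-in that handles only &mdash;, approxSlug is a simplified approximation of GitHub-style slugging rather than the github-slugger package, and headingRecord is a hypothetical helper name.

```javascript
// Stand-in for the shared decode step: handles only &mdash; for brevity.
const decodeEntities = (text) => text.replace(/&mdash;/g, "\u2014");

// Simplified approximation of GitHub-style slugging (NOT the github-slugger package):
// lowercase, drop everything except letters, digits, marks, spaces, "_" and "-",
// then turn each remaining space into a hyphen.
function approxSlug(heading) {
  return heading
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\p{M} _-]/gu, "")
    .replace(/ /g, "-");
}

// Decode once, then derive both the display title and the anchor from the result.
function headingRecord(rawHeading) {
  const title = decodeEntities(rawHeading);
  return { title, anchor: "#" + approxSlug(title) };
}

const raw = "4.2.2 &mdash; 2026-05-29";
console.log("#" + approxSlug(raw));      // #422-mdash-2026-05-29 (entity slugged verbatim)
console.log(headingRecord(raw).anchor);  // #422--2026-05-29 (entity decoded first)
```

Deriving the title and the anchor from the same decoded string is what keeps the two from drifting apart.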
Set the published CLI bin's execute bit. builder/wiki-index.mjs is now tracked in git as 100755 rather than 100644 — tidy packaging hygiene. This does not fix a real bug: npm chmods bin targets executable on install, so the prior 0o644 mode did not actually break npx wiki-search-index or global installs. (The "command not found" seen while testing was a cwd/version collision — running npx wiki-search-index from inside the package's own repo resolves the command to the local checkout, which has no installed .bin — not the file mode.) No code change; the 0.1.1 entity-decoding behavior is unchanged.
HTML entity decoding in text extraction. The builder now resolves HTML entities (the entity forms of —, →, Ӓ, 🔍, …) while reducing Markdown to searchable text. Previously they passed through verbatim and the tokenizer split them into junk terms (&mdash; → mdash, &#128269; → 128269); snippets also showed the literal entity. Now: numeric entities decode generally, so a genuinely letter-valued entity such as the one for α is preserved as a real term, while typographic and symbol entities (em dash, arrows, emoji) decode to characters the tokenizer discards, and snippets render the glyph. Indices for wikis written in literal Unicode (the common case) are byte-identical; only entity-using wikis change. Output stays deterministic, so the committed-index git diff --exit-code staleness gate still holds.
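To make the junk-term effect concrete, here is a rough sketch under stated assumptions: the NAMED table is a tiny illustrative subset of the HTML named entities, and tokenize is a hypothetical tokenizer that keeps runs of Unicode letters and digits. The builder's actual decoder and tokenizer may differ.

```javascript
// Small illustrative table; a full decoder would cover the whole HTML named-entity list.
const NAMED = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'],
  ["mdash", "\u2014"], ["rarr", "\u2192"],
]);

// Decode decimal (&#945;), hex (&#x3B1;) and the named entities listed above.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched rather than throwing.
      return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names pass through unchanged.
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}

// Hypothetical tokenizer: lowercase, then keep runs of letters and digits.
const tokenize = (text) => text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];

const sample = "release &mdash; search &#128269; for &#945;";
console.log(tokenize(sample));
// [ 'release', 'mdash', 'search', '128269', 'for', '945' ]
console.log(tokenize(decodeEntities(sample)));
// [ 'release', 'search', 'for', 'α' ]
```

The em dash and the emoji fall outside the letter and digit classes, so they vanish from the term list, while α is a letter and survives as a real term.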
Related (the hosted app, not the npm package): the bookmarklet's ?from= detection now resolves owner/repo from any github.com/<owner>/<repo> page — the repo root, /actions, /pull/N, etc. — not just /wiki/… pages, so the bookmarklet works from anywhere in a repo.
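One way such owner/repo detection could work is sketched below. The function name repoFromUrl and its return shape are assumptions for illustration; the hosted app's real implementation is not shown here.

```javascript
// Hypothetical helper: take the first two path segments of any github.com URL.
function repoFromUrl(href) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.hostname !== "github.com") return null;
  const [owner, repo] = url.pathname.split("/").filter(Boolean);
  return owner && repo ? { owner, repo } : null;
}

console.log(repoFromUrl("https://github.com/octo/demo"));           // { owner: 'octo', repo: 'demo' }
console.log(repoFromUrl("https://github.com/octo/demo/pull/7"));    // { owner: 'octo', repo: 'demo' }
console.log(repoFromUrl("https://github.com/octo/demo/wiki/Home")); // { owner: 'octo', repo: 'demo' }
console.log(repoFromUrl("https://github.com/octo"));                // null
```

This sketch does not filter GitHub's non-repository paths (for example /settings/…), which a production version would likely need to handle.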
Initial release of the wiki-search-index CLI — compiles a GitHub wiki (or any Markdown docs) into a self-describing v1 search index (see Index Format):
- npx wiki-search-index --wiki ./wiki → <wiki>/search-index.json.
- GitHub-slugger-accurate anchors; deterministic output (sorted, no timestamps) so a CI git diff --exit-code can gate index staleness.
- Owner/repo inferred from the wiki dir's git origin; --repo / --url-template for explicit or non-GitHub sites.
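The deterministic-output property is what makes the staleness gate possible: the same wiki content must always serialize to the same bytes. A minimal sketch of that idea follows, assuming a hypothetical entry shape with url and title fields. The real fields are defined by the Index Format, and the builder's actual serialization may differ.

```javascript
// Recursively rebuild objects with keys in sorted order so JSON.stringify is stable.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, sortKeys(value[key])])
    );
  }
  return value;
}

// Hypothetical entry shape ({ url, title }); entries are ordered by url.
// No timestamps or other run-dependent values are written.
function serializeIndex(entries) {
  const ordered = [...entries].sort((a, b) =>
    a.url < b.url ? -1 : a.url > b.url ? 1 : 0
  );
  return JSON.stringify(sortKeys({ entries: ordered }), null, 2) + "\n";
}

const runA = serializeIndex([{ url: "b#x", title: "B" }, { title: "A", url: "a#y" }]);
const runB = serializeIndex([{ url: "a#y", title: "A" }, { title: "B", url: "b#x" }]);
console.log(runA === runB); // true
```

With output like this committed to the repository, a CI job can rebuild the index and run git diff --exit-code; a non-zero exit means the committed index no longer matches the wiki.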
Ships alongside the hosted search app + the install-from-origin bookmarklet on GitHub Pages.