Release notes
Detailed release history for wiki-search-index (the npm builder). The README carries the one-line cliff-notes; this page carries the detail.
New: fold non-wiki repo files into the index with --file. A project's README is often its real API doc; --file <path> (repeatable) now indexes a repo file alongside the wiki pages, so a search jumps straight to the right README section — results deep-link to the rendered file on GitHub. The folded doc carries a relative page (../blob/<branch>/<path>) that the wiki URL template expands and the browser normalizes on navigation; this is purely a page-value convention, so the index format stays v1 (no new fields). --branch overrides the blob branch (default: the repo's inferred default branch); run the builder from the repo root so --file paths and the branch resolve correctly. The hosted app encodes page per path-segment so the / and .. survive — existing single-segment wiki pages stay byte-identical, so committed indices don't churn. Same-host targets only. See Add Search.
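As a rough illustration of the per-segment encoding and the navigation-time normalization described above, the sketch below encodes a page value one path segment at a time and then lets the standard URL parser resolve the .. segment. The helper name expandPage, the {page} placeholder, and the owner/repo values are hypothetical and chosen only for this example; the hosted app's real template syntax may differ.

```javascript
// Hypothetical helper: expand a URL template with a page value,
// encoding each path segment separately so "/" and ".." survive.
function expandPage(template, page) {
  const encoded = page.split('/').map((segment) => encodeURIComponent(segment)).join('/');
  return template.replace('{page}', encoded);
}

// Assumed template shape, for illustration only.
const template = 'https://github.com/owner/repo/wiki/{page}';

// A plain wiki page has a single segment, so it encodes exactly as before.
const wikiUrl = expandPage(template, 'Index-Format');

// A folded repo file carries a relative page value.
const rawFileUrl = expandPage(template, '../blob/main/README.md');

// The WHATWG URL parser (the same rules a browser applies) resolves the "..".
const fileUrl = new URL(rawFileUrl).href;

console.log(wikiUrl);    // https://github.com/owner/repo/wiki/Index-Format
console.log(rawFileUrl); // https://github.com/owner/repo/wiki/../blob/main/README.md
console.log(fileUrl);    // https://github.com/owner/repo/blob/main/README.md
```

Encoding the whole page value in one call would turn each / into %2F, and the relative path would no longer resolve; splitting first is what keeps single-segment pages unchanged.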
Fix: headings containing Markdown links or images now slug like GitHub. Folding READMEs surfaced that the heading→anchor path decoded HTML entities (0.1.3) but not Markdown: a link [Configuration](#config) leaked its URL into the slug, and a badge [![NPM version][img]][url] leaked its alt text and refs. The builder now reduces a heading to the text GitHub slugs against — drops images, links→text, unwraps inline code — before slugging, and (matching github-slugger) no longer trims edge whitespace, so a heading ending in a badge like # tool [![badge]…] anchors as tool-, not tool. Inline code, * / ~ emphasis, and snake_case already slugged correctly and are untouched. Verified live against rendered READMEs (wiki-search 5/5, node-re2 23/23 anchors). Output stays deterministic and wiki-only indices are byte-identical, so the committed-index git diff --exit-code staleness gate still holds.
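The reduction step can be pictured roughly as follows. This is a minimal sketch, not the builder's code: reduceHeading and approxSlug are hypothetical names, the regexes cover only the simple inline and reference forms, and approxSlug is a simplified stand-in for github-slugger rather than its real character table.

```javascript
// Reduce a Markdown heading to the text that gets slugged.
function reduceHeading(heading) {
  return heading
    // Drop images, inline or reference style: ![alt](src) or ![alt][ref]
    .replace(/!\[[^\]]*\](?:\([^)]*\)|\[[^\]]*\])?/g, '')
    // Replace links with their text: [text](url) or [text][ref]
    .replace(/\[([^\]]*)\](?:\([^)]*\)|\[[^\]]*\])/g, '$1')
    // Unwrap inline code spans
    .replace(/`([^`]*)`/g, '$1');
}

// Simplified stand-in for github-slugger (an approximation only):
// lowercase, drop punctuation, turn each space into a hyphen, no trimming.
function approxSlug(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\p{M}_\- ]/gu, '')
    .replace(/ /g, '-');
}

console.log(approxSlug(reduceHeading('[Configuration](#config)')));  // configuration
console.log(approxSlug(reduceHeading('tool [![badge][img]][url]'))); // tool-
console.log(approxSlug(reduceHeading('Use `npm run build`')));       // use-npm-run-build
```

Because nothing is trimmed, the space left in front of the removed badge becomes the trailing hyphen in tool-.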
Fix: HTML entities are now decoded in headings, not just body text. 0.1.1 decoded entities only while reducing Markdown to searchable text (toPlainText); the heading, title, and derived anchor paths still passed entities through verbatim. So a heading written with entities, e.g. ## 4.2.2 &mdash; 2026-05-29, produced the junk anchor #422-mdash-2026-05-29 instead of GitHub's real #422--2026-05-29 (GitHub renders the entity to —, drops it from the slug, and each of the two flanking spaces becomes a hyphen, giving --). That is a navigation-correctness bug: a result for such a section linked to the page top instead of the heading, and the displayed section title showed the literal &mdash;. The builder now runs one shared entity-decode step ahead of both github-slugger and the stored display text, so anchors and titles match GitHub. Verified live against the rendered wiki anchors (id="user-content-…"). As in 0.1.1, literal-Unicode wikis are byte-identical; only entity-using wikis change, and output stays deterministic so the committed-index git diff --exit-code staleness gate still holds.
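To make the before/after concrete, here is a small sketch of a shared decode step feeding both the display title and the anchor. The names decodeNamedEntities, slugLike, and indexHeading are hypothetical, the entity table is a tiny illustrative subset, and slugLike only approximates github-slugger.

```javascript
// Tiny illustrative subset of named entities (a real table is far larger).
const NAMED = new Map([
  ['mdash', '\u2014'],
  ['rarr', '\u2192'],
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
]);

// Decode known named entities; unknown names are left untouched.
function decodeNamedEntities(text) {
  return text.replace(/&([a-z]+);/gi, (match, name) =>
    NAMED.has(name) ? NAMED.get(name) : match);
}

// Approximation of github-slugger: lowercase, drop punctuation,
// turn each space into a hyphen, no trimming.
function slugLike(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\p{M}_\- ]/gu, '')
    .replace(/ /g, '-');
}

// One decode step, shared by the stored title and the derived anchor.
function indexHeading(rawHeading) {
  const title = decodeNamedEntities(rawHeading);
  return { title, anchor: slugLike(title) };
}

const raw = '4.2.2 &mdash; 2026-05-29';
console.log(slugLike(raw));            // 422-mdash-2026-05-29 (the old, wrong anchor)
console.log(indexHeading(raw).anchor); // 422--2026-05-29
```

Decoding once and reusing the result is what keeps the displayed title and the anchor from drifting apart.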
Set the published CLI bin's execute bit. builder/wiki-index.mjs is now tracked in git as 100755 rather than 100644 — tidy packaging hygiene. This does not fix a real bug: npm chmods bin targets executable on install, so the prior 0o644 mode did not actually break npx wiki-search-index or global installs. (The "command not found" seen while testing was a cwd/version collision — running npx wiki-search-index from inside the package's own repo resolves the command to the local checkout, which has no installed .bin — not the file mode.) No code change; the 0.1.1 entity-decoding behavior is unchanged.
HTML entity decoding in text extraction. The builder now resolves HTML entities (&mdash;, &rarr;, &#1234;, &#128269;, …) while reducing Markdown to searchable text. Previously they passed through verbatim and the tokenizer split them into junk terms (&mdash; → mdash, &#128269; → 128269); snippets also showed the literal entity. Now: numeric entities decode generally, so a genuinely letter-valued entity like &#945; (α) is preserved as a real term, while typographic/symbol entities (em dash, arrows, emoji) decode to characters the tokenizer discards, and snippets render the glyph. Indices for wikis written in literal Unicode (the common case) are byte-identical; only entity-using wikis change. Output stays deterministic, so the committed-index git diff --exit-code staleness gate still holds.
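The effect on tokenization can be sketched like this. Both functions are hypothetical illustrations: decodeNumericEntities handles only decimal and hex numeric references, and tokenize is an assumed letters-and-digits tokenizer, not the builder's actual one.

```javascript
// Decode decimal (&#945;) and hex (&#x3B1;) numeric character references.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const code = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // Leave out-of-range references untouched rather than throwing.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

// Assumed tokenizer: lowercase runs of letters and digits; symbols are discarded.
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

const raw = 'search &#128269; alpha &#945; beta &#x2192;';

console.log(tokenize(raw));
// [ 'search', '128269', 'alpha', '945', 'beta', 'x2192' ]

console.log(tokenize(decodeNumericEntities(raw)));
// [ 'search', 'alpha', 'α', 'beta' ]
```

The letter α survives as a searchable term, while the emoji and the arrow decode to symbols that a letters-and-digits tokenizer drops.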
Related (the hosted app, not the npm package): the bookmarklet's ?from= detection now resolves owner/repo from any github.com/<owner>/<repo> page — the repo root, /actions, /pull/N, etc. — not just /wiki/… pages, so the bookmarklet works from anywhere in a repo.
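A minimal sketch of that kind of detection, assuming the page URL is available as a string. The function name resolveRepo is hypothetical and this is not the hosted app's code; in particular it does not filter out GitHub paths that are not repositories (such as settings pages).

```javascript
// Hypothetical helper: take owner/repo from the first two path segments
// of any github.com URL, regardless of what follows them.
function resolveRepo(pageUrl) {
  let url;
  try {
    url = new URL(pageUrl);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.hostname !== 'github.com') return null;
  const [owner, repo] = url.pathname.split('/').filter(Boolean);
  return owner && repo ? { owner, repo } : null;
}

console.log(resolveRepo('https://github.com/owner/repo'));           // { owner: 'owner', repo: 'repo' }
console.log(resolveRepo('https://github.com/owner/repo/pull/12'));   // { owner: 'owner', repo: 'repo' }
console.log(resolveRepo('https://github.com/owner/repo/wiki/Home')); // { owner: 'owner', repo: 'repo' }
console.log(resolveRepo('https://example.com/owner/repo'));          // null
```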
Initial release of the wiki-search-index CLI — compiles a GitHub wiki (or any Markdown docs) into a self-describing v1 search index (see Index Format):
- npx wiki-search-index --wiki ./wiki → <wiki>/search-index.json.
- GitHub-slugger-accurate anchors; deterministic output (sorted, no timestamps) so a CI git diff --exit-code can gate index staleness.
- Owner/repo inferred from the wiki dir's git origin; --repo / --url-template for explicit or non-GitHub sites.
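Why deterministic output matters for the git diff --exit-code gate can be shown with a small sketch: if serialization depends on insertion order, two builds of the same content can differ byte for byte. The stableStringify helper and the index shape below are hypothetical illustrations, not the builder's actual serializer or the v1 format.

```javascript
// Hypothetical helper: serialize with object keys sorted at every level,
// so equal content always yields identical bytes.
function stableStringify(value) {
  return JSON.stringify(value, (key, val) => {
    if (val && typeof val === 'object' && !Array.isArray(val)) {
      return Object.fromEntries(Object.keys(val).sort().map((k) => [k, val[k]]));
    }
    return val;
  }, 2);
}

// Same content, different insertion order (illustrative shape only).
const buildA = { version: 1, docs: [{ page: 'Home', title: 'Home' }] };
const buildB = { docs: [{ title: 'Home', page: 'Home' }], version: 1 };

console.log(JSON.stringify(buildA) === JSON.stringify(buildB)); // false
console.log(stableStringify(buildA) === stableStringify(buildB)); // true
```

With byte-stable output and no timestamps, rebuilding in CI and running git diff --exit-code on the committed index fails only when the wiki content actually changed.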
Ships alongside the hosted search app + the install-from-origin bookmarklet on GitHub Pages.