feat: replace regex README parser with markdown-it-py AST parser #2971

Merged: vinta merged 16 commits into master from feature/markdown-it-py-parser on Mar 18, 2026

vinta (Owner) commented Mar 18, 2026

Summary

  • Replaced the regex-based README parser with a proper AST parser using markdown-it-py, improving correctness and maintainability
  • Added website/readme_parser.py module with full entry extraction, HTML rendering, and 94 passing unit + integration tests
  • Extended the website with client-side features: column sorting (name, stars, last commit), URL-reflected search/filter state, relative time display, and improved table accessibility/mobile layout
  • Replaced pushed_at with last_commit_at (fetched from default branch) for more accurate recency data
  • Misc: extracted favicon to static SVG, live-reload preview, simplified Makefile, CSS transitions and polish

Test plan

  • make test — all 94 tests pass
  • make build — site builds without errors
  • make preview — verify table sorting, search/filter URL state, expanded rows, and mobile layout in browser

🤖 Generated with Claude Code

vinta and others added 15 commits March 18, 2026 17:20
…ests

Introduce readme_parser.py which parses README.md into structured
section data using the markdown-it-py AST. Includes TypedDicts for
ParsedEntry/ParsedSection, slugify(), render_inline_html(), and
render_inline_text(). Add test_readme_parser.py covering HTML escaping,
link rendering, emphasis, strong, and code_inline for both renderers.

Co-Authored-By: Claude <noreply@anthropic.com>
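
A minimal sketch of the two inline renderers named in this commit, assuming they walk the children of a markdown-it-py `inline` token. The function names come from the commit message; the bodies, the tag mapping, and the choice to escape `html_inline` tokens are assumptions, not the merged implementation.

```python
import html

from markdown_it import MarkdownIt

# Open/close token types that map directly onto a fixed HTML tag.
_SIMPLE_TAGS = {
    "em_open": "<em>",
    "em_close": "</em>",
    "strong_open": "<strong>",
    "strong_close": "</strong>",
    "link_close": "</a>",
}


def render_inline_html(children):
    """Render inline tokens to HTML, escaping all literal text."""
    parts = []
    for tok in children:
        if tok.type in ("text", "html_inline"):
            # Raw HTML from the source is escaped, never passed through.
            parts.append(html.escape(tok.content))
        elif tok.type == "code_inline":
            parts.append(f"<code>{html.escape(tok.content)}</code>")
        elif tok.type == "link_open":
            href = html.escape(str(tok.attrGet("href") or ""), quote=True)
            parts.append(f'<a href="{href}">')
        elif tok.type in ("softbreak", "hardbreak"):
            parts.append(" ")
        elif tok.type in _SIMPLE_TAGS:
            parts.append(_SIMPLE_TAGS[tok.type])
    return "".join(parts)


def render_inline_text(children):
    """Render inline tokens to plain text, dropping all markup."""
    parts = []
    for tok in children:
        if tok.type in ("text", "code_inline"):
            parts.append(tok.content)
        elif tok.type in ("softbreak", "hardbreak"):
            parts.append(" ")
    return "".join(parts)


md = MarkdownIt("commonmark")
inline = md.parseInline("*hi* [site](https://example.com) `a<b` <script>")[0]
html_out = render_inline_html(inline.children)
text_out = render_inline_text(inline.children)
```

Escaping at the token level means a description containing markup-like text cannot inject tags into the generated page, which is what the HTML-escaping tests listed above are checking for.
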
Introduce parse_readme() which uses MarkdownIt to build a full AST
instead of line-by-line regex matching. The function splits the document
at the thematic break, groups nodes by h2 heading, extracts category
descriptions from leading italic paragraphs, and separates the
Categories, Resources, and Contributing sections cleanly.

Add markdown-it-py==4.0.0 (+ mdurl) as a runtime dependency to support
the new parser.

Tests cover section counts, names, slugs, descriptions, content
presence, boundary conditions (no separator, no description), and mixed
description markup.

Co-Authored-By: Claude <noreply@anthropic.com>
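
The split-and-group step can be sketched roughly as follows, using markdown-it-py's `SyntaxTreeNode`. This assumes the sections of interest follow the first thematic break and that a category description is a paragraph made up only of emphasis. The names `parse_sections`, `inline_text`, and `is_italic_paragraph` are hypothetical and do not mirror the real `parse_readme()`.

```python
from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode


def inline_text(node):
    """Concatenate the plain text found under a node."""
    if node.type in ("text", "code_inline"):
        return node.content
    if node.type == "softbreak":
        return " "
    return "".join(inline_text(child) for child in node.children)


def is_italic_paragraph(node):
    """True for a paragraph whose only visible content is emphasis."""
    if node.type != "paragraph" or not node.children:
        return False
    parts = node.children[0].children  # children of the inline node
    has_em = any(p.type == "em" for p in parts)
    only_em = all(
        p.type == "em" or (p.type == "text" and not p.content.strip())
        for p in parts
    )
    return has_em and only_em


def parse_sections(text):
    """Group top-level nodes after the first thematic break by h2 heading."""
    root = SyntaxTreeNode(MarkdownIt("commonmark").parse(text))
    nodes = root.children
    hr_idx = next((i for i, n in enumerate(nodes) if n.type == "hr"), -1)
    sections = []
    for node in nodes[hr_idx + 1:]:  # no separator: use the whole document
        if node.type == "heading" and node.tag == "h2":
            sections.append(
                {"name": inline_text(node), "description": "", "nodes": []}
            )
        elif sections:
            current = sections[-1]
            leading = not current["nodes"] and not current["description"]
            if leading and is_italic_paragraph(node):
                current["description"] = inline_text(node)
            else:
                current["nodes"].append(node)
    return sections


README = """# Awesome

Intro.

---

## Web Frameworks

_Libraries for web apps._

- [flask](https://example.com/flask) - A micro framework.

## Testing

- [pytest](https://example.com/pytest) - A test tool.
"""

sections = parse_sections(README)
```
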
Replace _parse_section_entries stub with full implementation that walks
bullet_list AST nodes to extract ParsedEntry records, including support
for subcategory labels (text-only list items) and also_see nested links.

Add _parse_list_entries, helper finders (_find_inline, _find_first_link,
_find_child), and _extract_description_html with separator stripping.

Extend test suite with TestParseSectionEntries covering flat entries,
link-only entries, subcategorized entries, also_see, entry_count, preview
first-four, and XSS escaping in description HTML.

Co-Authored-By: Claude <noreply@anthropic.com>
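
A rough sketch of walking a `bullet_list` node to pull out entries, assuming a text-only list item is a subcategory label and a list nested under a linked item holds its also-see links. The dictionary shape and the helper bodies are assumptions for illustration; only the helper name `_find_child` is borrowed from the commit message, and its real signature may differ.

```python
import re

from markdown_it import MarkdownIt
from markdown_it.tree import SyntaxTreeNode


def _text(node):
    """Concatenate the plain text found under a node."""
    if node.type in ("text", "code_inline"):
        return node.content
    return "".join(_text(child) for child in node.children)


def _find_child(node, kind):
    """Return the first direct child of the given type, or None."""
    return next((c for c in node.children if c.type == kind), None)


def parse_list_entries(bullet_list, subcategory=""):
    """Turn list items into entry dicts, recursing into nested lists."""
    entries = []
    for item in bullet_list.children:
        para = _find_child(item, "paragraph")
        inline = _find_child(para, "inline") if para else None
        link = _find_child(inline, "link") if inline else None
        nested = _find_child(item, "bullet_list")

        if link is None:
            # Text-only item: treat it as a label for the nested entries.
            label = _text(inline).strip() if inline else ""
            if nested is not None:
                entries.extend(parse_list_entries(nested, label))
            continue

        # Everything after the link is the description; strip the " - " separator.
        pos = next(i for i, c in enumerate(inline.children) if c is link)
        rest = "".join(_text(c) for c in inline.children[pos + 1:])
        description = re.sub(r"^\s*-\s*", "", rest).strip()

        also_see = []
        if nested is not None:
            also_see = [
                {"name": e["name"], "url": e["url"]}
                for e in parse_list_entries(nested)
            ]
        entries.append({
            "name": _text(link),
            "url": str(link.attrs.get("href", "")),
            "description": description,
            "subcategory": subcategory,
            "also_see": also_see,
        })
    return entries


SAMPLE = """- Async
  - [aiohttp](https://example.com/aiohttp) - HTTP client.
- [flask](https://example.com/flask) - Micro framework.
  - [flask-admin](https://example.com/flask-admin) - Admin add-on.
- [bare](https://example.com/bare)
"""

root = SyntaxTreeNode(MarkdownIt("commonmark").parse(SAMPLE))
entries = parse_list_entries(root.children[0])
```
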
Replace the _render_section_html stub with a working implementation that
converts parsed bullet-list nodes into classed div elements (entry,
entry-sub, subcat). Add _render_bullet_list_html to handle nested
structure and XSS escaping. Cover all cases with a new
TestRenderSectionHtml suite.

Co-Authored-By: Claude <noreply@anthropic.com>
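
For illustration, a stdlib-only sketch of emitting the classed divs. It works on plain dictionaries rather than AST nodes, and the mapping of `entry`, `entry-sub`, and `subcat` to top-level entries, entries under a label, and the label itself is an assumption, as are the function names and the input shape.

```python
import html


def _entry_div(entry, css_class):
    """Render one linked entry as a div, escaping every field."""
    name = html.escape(entry["name"])
    url = html.escape(entry["url"], quote=True)
    desc = html.escape(entry.get("description", ""))
    tail = f" {desc}" if desc else ""
    return f'<div class="{css_class}"><a href="{url}">{name}</a>{tail}</div>'


def render_items_html(items):
    """Render entries and subcategory labels as a flat run of divs."""
    lines = []
    for item in items:
        if "url" not in item:
            # Label without a link: a subcategory heading plus its children.
            lines.append(f'<div class="subcat">{html.escape(item["name"])}</div>')
            for child in item.get("children", []):
                lines.append(_entry_div(child, "entry-sub"))
        else:
            lines.append(_entry_div(item, "entry"))
    return "\n".join(lines)


items = [
    {"name": "flask", "url": "https://example.com/flask",
     "description": "Micro <script>alert(1)</script> framework."},
    {"name": "Async", "children": [
        {"name": "aiohttp", "url": "https://example.com/aiohttp"},
    ]},
]
rendered = render_items_html(items)
```
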
Adds TestParseRealReadme covering category count, slug generation,
descriptions, entry counts, previews, content HTML, subcategory
rendering, also-see links, and description link stripping.

Co-Authored-By: Claude <noreply@anthropic.com>

slugify, parse_readme, count_entries, extract_preview, render_content_html,
and related helpers are moved to a dedicated readme_parser module.
build.py now imports from readme_parser rather than defining these inline.
Tests for the removed functions are dropped from test_build.py since they
now live with the module they test.

Co-Authored-By: Claude <noreply@anthropic.com>

The markdown package is no longer used after switching the README parser
to markdown-it-py in the feature branch.

Co-Authored-By: Claude <noreply@anthropic.com>

load_cache was a duplicate of logic now living in build.load_stars.
Switch the call site to the shared helper and remove the redundant
local function and its tests.

Co-Authored-By: Claude <noreply@anthropic.com>

- Remove `_has_description` which duplicated `_extract_description` logic;
  use truthiness of the description string instead
- Remove unused `resources` parameter from `extract_entries`
- Merge two sequential loops in `parse_readme` into a single pass over
  children to find hr, Resources, and Contributing indices

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
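
The merged loop in the last bullet could look something like this. It is a sketch under the assumption that the top-level children can be reduced to `(node_type, heading_text)` pairs; the function name and the tuple representation are hypothetical and stand in for the real AST nodes.

```python
def find_boundaries(children):
    """Find the hr, Resources, and Contributing indices in one pass.

    `children` is a list of (node_type, heading_text) pairs; heading_text
    is only meaningful when node_type is "heading".
    """
    hr_idx = resources_idx = contributing_idx = None
    for i, (node_type, text) in enumerate(children):
        if node_type == "hr":
            if hr_idx is None:  # keep the first thematic break only
                hr_idx = i
        elif node_type == "heading":
            if text == "Resources" and resources_idx is None:
                resources_idx = i
            elif text == "Contributing" and contributing_idx is None:
                contributing_idx = i
    return hr_idx, resources_idx, contributing_idx


children = [
    ("heading", "Awesome Python"),
    ("paragraph", ""),
    ("hr", ""),
    ("heading", "Admin Panels"),
    ("bullet_list", ""),
    ("heading", "Resources"),
    ("bullet_list", ""),
    ("heading", "Contributing"),
    ("paragraph", ""),
]
boundaries = find_boundaries(children)
```
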
Split flat dev group into named groups (build, lint, test) so CI
and production installs can pull only what they need. Add watchdog
for the live-reload preview target.

Co-Authored-By: Claude <noreply@anthropic.com>

Rename site_* targets to bare names (install, fetch_stats, build,
preview). Replace the static preview target with a watchmedo-driven
live-reload loop so file changes trigger automatic rebuilds. Make
the output directory creation idempotent (exist_ok=True) and static
copy incremental (dirs_exist_ok=True) so repeated builds don't wipe
output on each run.

Co-Authored-By: Claude <noreply@anthropic.com>
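
A small sketch of the two flags mentioned above, using only the standard library. The function name `build_site` and the directory layout are assumptions for illustration; the actual build script may organize this differently.

```python
import shutil
import tempfile
from pathlib import Path


def build_site(static_src: Path, output_dir: Path) -> None:
    """Create the output directory and copy static assets into it."""
    # exist_ok=True: no error if a previous build already created the directory.
    output_dir.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok=True (Python 3.8+): copy into an existing tree instead of
    # raising FileExistsError, so nothing has to be deleted first.
    shutil.copytree(static_src, output_dir / "static", dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "static"
    out = Path(tmp) / "output"
    src.mkdir()
    (src / "favicon.svg").write_text("<svg/>")

    build_site(src, out)
    (out / "index.html").write_text("<html></html>")  # stands in for a rendered page
    build_site(src, out)  # second run must neither fail nor remove the page

    page_survived = (out / "index.html").exists()
    asset_copied = (out / "static" / "favicon.svg").exists()
```

Note that `copytree` with `dirs_exist_ok=True` still rewrites files that already exist at the destination; it only avoids removing and recreating the directory.
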
Add authors, readme, license, and project URLs to pyproject.toml.
Move build deps out of the default dependencies list into named groups
(build, lint, test, dev) so each tool group can be installed independently.
uv.lock updated to reflect the new group structure.

Co-Authored-By: Claude <noreply@anthropic.com>

Replace the inline data-URI emoji favicon with a proper Python-logo SVG
served from /static/favicon.svg. Avoids repeated base64 encoding overhead
and allows the icon to be cached and updated independently.

Co-Authored-By: Claude <noreply@anthropic.com>

Replace two-column footer (links left, attribution right) with a
single inline row of slash-separated items. Update attribution text
to 'Made by Vinta' with a link to vinta.ws, align footer links to
match standard anchor color/hover behavior.

Co-Authored-By: Claude <noreply@anthropic.com>

- Add sr-only headings for search/filter and results regions
- Add role=region and aria-label to .table-wrap for landmark navigation
- Add tabindex=0 and focus outline to .table-wrap for keyboard reachability
- Add sr-only text to empty Details column header
- Add role=button to expandable rows
- Add .expand-tags-mobile to show category/group tags in expand row on mobile
- Show .expand-tags-mobile via media query at <=900px breakpoint

Co-Authored-By: Claude <noreply@anthropic.com>

vinta self-assigned this Mar 18, 2026

Switch readme_parser.py from regex-based parsing to markdown-it-py for
more robust and maintainable Markdown AST traversal. Update build pipeline,
templates, styles, and JS to support the new parser output. Refresh GitHub
stars data and update tests to match new parser behavior.

Co-Authored-By: Claude <noreply@anthropic.com>

vinta force-pushed the feature/markdown-it-py-parser branch from 8627419 to 280f250 on March 18, 2026 12:33
vinta merged commit 539edc4 into master on Mar 18, 2026
vinta deleted the feature/markdown-it-py-parser branch on March 18, 2026 12:35