feat: replace regex README parser with markdown-it-py AST parser by vinta · Pull Request #2971 · vinta/awesome-python

vinta · 2026-03-18T12:26:01Z

Summary

Replaced the regex-based README parser with a proper AST parser using markdown-it-py, improving correctness and maintainability
Added website/readme_parser.py module with full entry extraction, HTML rendering, and 94 passing unit + integration tests
Extended the website with client-side features: column sorting (name, stars, last commit), URL-reflected search/filter state, relative time display, and improved table accessibility/mobile layout
Replaced pushed_at with last_commit_at (fetched from default branch) for more accurate recency data
Misc: extracted favicon to static SVG, live-reload preview, simplified Makefile, CSS transitions and polish

Test plan

make test — all 94 tests pass
make build — site builds without errors
make preview — verify table sorting, search/filter URL state, expanded rows, and mobile layout in browser

🤖 Generated with Claude Code

…ests Introduce readme_parser.py which parses README.md into structured section data using the markdown-it-py AST. Includes TypedDicts for ParsedEntry/ParsedSection, slugify(), render_inline_html(), and render_inline_text(). Add test_readme_parser.py covering HTML escaping, link rendering, emphasis, strong, and code_inline for both renderers. Co-Authored-By: Claude <noreply@anthropic.com>
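As a rough illustration of what a helper like slugify() might do, the sketch below turns a section heading into a URL-safe slug. The PR description does not spell out the exact rules used in readme_parser.py, so the lowercase, strip-punctuation, and hyphenate behaviour here is an assumption.

```python
import re


def slugify(name: str) -> str:
    """Turn a heading such as 'Admin Panels' into 'admin-panels'.

    Assumed rules (not confirmed by the PR): lowercase the text, drop
    anything that is not a letter, digit, whitespace, or hyphen, then
    collapse runs of whitespace/hyphens into a single hyphen.
    """
    slug = name.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)
    slug = re.sub(r"[\s-]+", "-", slug)
    return slug.strip("-")
```

With these assumed rules, a heading containing punctuation such as "ORM & Databases" would lose the ampersand and come out as a single hyphenated slug.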

Introduce parse_readme() which uses MarkdownIt to build a full AST instead of line-by-line regex matching. The function splits the document at the thematic break, groups nodes by h2 heading, extracts category descriptions from leading italic paragraphs, and separates the Categories, Resources, and Contributing sections cleanly. Add markdown-it-py==4.0.0 (+ mdurl) as a runtime dependency to support the new parser. Tests cover section counts, names, slugs, descriptions, content presence, boundary conditions (no separator, no description), and mixed description markup. Co-Authored-By: Claude <noreply@anthropic.com>
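The grouping step described above can be sketched with the markdown-it-py token stream. This is a minimal sketch, not the PR's parse_readme(): the function name split_sections, the returned dict shape, and the assumption that the category sections follow the first thematic break are all hypothetical.

```python
from markdown_it import MarkdownIt


def split_sections(markdown_text: str) -> list[dict]:
    """Group the content after the first thematic break by h2 heading.

    Returns one dict per h2 with its title and the number of list items
    found under it (hypothetical shape, for illustration only).
    """
    tokens = MarkdownIt("commonmark").parse(markdown_text)

    # Find the first thematic break; "hr" is the token type markdown-it emits.
    start = next((i for i, t in enumerate(tokens) if t.type == "hr"), -1) + 1

    sections: list[dict] = []
    current = None
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok.type == "heading_open" and tok.tag == "h2":
            # The inline token right after heading_open holds the heading text.
            current = {"name": tokens[i + 1].content, "entries": 0}
            sections.append(current)
        elif tok.type == "list_item_open" and current is not None:
            current["entries"] += 1
    return sections
```

Walking tokens rather than matching lines means the grouping does not depend on how the source Markdown happens to be indented or wrapped, which is the robustness argument the PR makes for the AST approach.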

Replace _parse_section_entries stub with full implementation that walks bullet_list AST nodes to extract ParsedEntry records, including support for subcategory labels (text-only list items) and also_see nested links. Add _parse_list_entries, helper finders (_find_inline, _find_first_link, _find_child), and _extract_description_html with separator stripping. Extend test suite with TestParseSectionEntries covering flat entries, link-only entries, subcategorized entries, also_see, entry_count, preview first-four, and XSS escaping in description HTML. Co-Authored-By: Claude <noreply@anthropic.com>

Replace the _render_section_html stub with a working implementation that converts parsed bullet-list nodes into classed div elements (entry, entry-sub, subcat). Add _render_bullet_list_html to handle nested structure and XSS escaping. Cover all cases with a new TestRenderSectionHtml suite. Co-Authored-By: Claude <noreply@anthropic.com>
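To show the kind of escaping this commit is concerned with, here is a minimal sketch of rendering one entry as a classed div. Only the entry class name comes from the commit message; the function render_entry_html, its parameters, and the exact markup are assumptions made for illustration.

```python
from html import escape


def render_entry_html(name: str, url: str, description: str = "") -> str:
    """Render one entry as <div class="entry">, escaping all user text.

    Hypothetical helper: every piece of README-derived text goes through
    html.escape so that markup in a description cannot reach the page.
    """
    link = f'<a href="{escape(url, quote=True)}">{escape(name)}</a>'
    desc = f" {escape(description)}" if description else ""
    return f'<div class="entry">{link}{desc}</div>'
```

Escaping at render time, rather than trusting the parsed input, keeps the XSS guarantee local to the function that emits HTML.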

Adds TestParseRealReadme covering category count, slug generation, descriptions, entry counts, previews, content HTML, subcategory rendering, also-see links, and description link stripping. Co-Authored-By: Claude <noreply@anthropic.com>

slugify, parse_readme, count_entries, extract_preview, render_content_html, and related helpers are moved to a dedicated readme_parser module. build.py now imports from readme_parser rather than defining these inline. Tests for the removed functions are dropped from test_build.py since they now live with the module they test. Co-Authored-By: Claude <noreply@anthropic.com>

The markdown package is no longer used after switching the README parser to markdown-it-py in the feature branch. Co-Authored-By: Claude <noreply@anthropic.com>

load_cache was a duplicate of logic now living in build.load_stars. Switch the call site to the shared helper and remove the redundant local function and its tests. Co-Authored-By: Claude <noreply@anthropic.com>

- Remove `_has_description` which duplicated `_extract_description` logic; use truthiness of the description string instead
- Remove unused `resources` parameter from `extract_entries`
- Merge two sequential loops in `parse_readme` into a single pass over children to find hr, Resources, and Contributing indices

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
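The single-pass idea in the last item can be sketched on plain data. The node representation below, a list of (kind, text) tuples, and the name find_boundaries are hypothetical stand-ins for the real AST children.

```python
from typing import Optional


def find_boundaries(
    nodes: list[tuple[str, str]],
) -> tuple[Optional[int], Optional[int], Optional[int]]:
    """Locate the hr, Resources, and Contributing indices in one loop.

    Each node is an assumed (kind, text) pair, e.g. ("heading", "Resources").
    Only the first thematic break is recorded.
    """
    hr_idx = resources_idx = contributing_idx = None
    for i, (kind, text) in enumerate(nodes):
        if kind == "hr" and hr_idx is None:
            hr_idx = i
        elif kind == "heading" and text == "Resources":
            resources_idx = i
        elif kind == "heading" and text == "Contributing":
            contributing_idx = i
    return hr_idx, resources_idx, contributing_idx
```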

Split flat dev group into named groups (build, lint, test) so CI and production installs can pull only what they need. Add watchdog for the live-reload preview target. Co-Authored-By: Claude <noreply@anthropic.com>

Rename site_* targets to bare names (install, fetch_stats, build, preview). Replace the static preview target with a watchmedo-driven live-reload loop so file changes trigger automatic rebuilds. Make the output directory creation idempotent (exist_ok=True) and static copy incremental (dirs_exist_ok=True) so repeated builds don't wipe output on each run. Co-Authored-By: Claude <noreply@anthropic.com>
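The two flags mentioned here are standard-library options: exist_ok on Path.mkdir and dirs_exist_ok on shutil.copytree (Python 3.8+). A minimal sketch of a rebuild-safe setup step follows; the function name prepare_output and the static subdirectory layout are assumptions, not the actual build.py code.

```python
import shutil
from pathlib import Path


def prepare_output(static_dir: Path, output_dir: Path) -> Path:
    """Create the output directory and copy static files into it.

    Safe to call repeatedly: exist_ok avoids an error when the directory
    already exists, and dirs_exist_ok lets copytree write into an
    existing tree instead of requiring it to be deleted first.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "static"
    shutil.copytree(static_dir, target, dirs_exist_ok=True)
    return target
```

Because nothing is removed up front, files already present in the output directory survive a second run, which suits a watch-and-rebuild loop.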

Add authors, readme, license, and project URLs to pyproject.toml. Move build deps out of the default dependencies list into named groups (build, lint, test, dev) so each tool group can be installed independently. uv.lock updated to reflect the new group structure. Co-Authored-By: Claude <noreply@anthropic.com>

Replace the inline data-URI emoji favicon with a proper Python-logo SVG served from /static/favicon.svg. Avoids repeated base64 encoding overhead and allows the icon to be cached and updated independently. Co-Authored-By: Claude <noreply@anthropic.com>

Replace two-column footer (links left, attribution right) with a single inline row of slash-separated items. Update attribution text to 'Made by Vinta' with a link to vinta.ws, align footer links to match standard anchor color/hover behavior. Co-Authored-By: Claude <noreply@anthropic.com>

- Add sr-only headings for search/filter and results regions
- Add role=region and aria-label to .table-wrap for landmark navigation
- Add tabindex=0 and focus outline to .table-wrap for keyboard reachability
- Add sr-only text to empty Details column header
- Add role=button to expandable rows
- Add .expand-tags-mobile to show category/group tags in expand row on mobile
- Show .expand-tags-mobile via media query at <=900px breakpoint

Co-Authored-By: Claude <noreply@anthropic.com>

Switch readme_parser.py from regex-based parsing to markdown-it-py for more robust and maintainable Markdown AST traversal. Update build pipeline, templates, styles, and JS to support the new parser output. Refresh GitHub stars data and update tests to match new parser behavior. Co-Authored-By: Claude <noreply@anthropic.com>

vinta and others added 15 commits March 18, 2026 17:20

build: remove markdown dependency, replaced by markdown-it-py

143abbf


refactor: consolidate load_cache into build.load_stars

af3baab


build: restructure dependency groups and add watchdog

74bba50


vinta self-assigned this Mar 18, 2026

vinta force-pushed the feature/markdown-it-py-parser branch from 8627419 to 280f250 March 18, 2026 12:33

vinta merged commit 539edc4 into master Mar 18, 2026

vinta deleted the feature/markdown-it-py-parser branch March 18, 2026 12:35

vinta merged 16 commits into master from feature/markdown-it-py-parser
