feat: replace regex README parser with markdown-it-py AST parser#2971
Merged
…ests Introduce readme_parser.py which parses README.md into structured section data using the markdown-it-py AST. Includes TypedDicts for ParsedEntry/ParsedSection, slugify(), render_inline_html(), and render_inline_text(). Add test_readme_parser.py covering HTML escaping, link rendering, emphasis, strong, and code_inline for both renderers. Co-Authored-By: Claude <noreply@anthropic.com>
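As a rough illustration of the structured data and slug helper described in this commit, here is a minimal stdlib-only sketch. The field names on `ParsedEntry` and `ParsedSection` and the exact slug rules are assumptions made for this example; the PR text names the types and `slugify()` but does not show their definitions.

```python
import re
from typing import TypedDict


class ParsedEntry(TypedDict):
    # Field names are hypothetical; the PR only names the type.
    name: str
    url: str
    description: str  # inline HTML, assumed to be escaped already


class ParsedSection(TypedDict):
    # Field names are hypothetical; the PR only names the type.
    name: str
    slug: str
    description: str
    entries: list[ParsedEntry]


def slugify(name: str) -> str:
    """Turn a heading into a URL-friendly slug (assumed rules)."""
    slug = name.strip().lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # drop punctuation
    slug = re.sub(r"[\s-]+", "-", slug)       # collapse spaces and dashes
    return slug.strip("-")


section: ParsedSection = {
    "name": "Admin Panels",
    "slug": slugify("Admin Panels"),
    "description": "",
    "entries": [],
}
```

A `TypedDict` stays a plain `dict` at runtime, so data shaped like this can be passed to templates or serialized without conversion.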
Introduce parse_readme() which uses MarkdownIt to build a full AST instead of line-by-line regex matching. The function splits the document at the thematic break, groups nodes by h2 heading, extracts category descriptions from leading italic paragraphs, and separates the Categories, Resources, and Contributing sections cleanly. Add markdown-it-py==4.0.0 (+ mdurl) as a runtime dependency to support the new parser. Tests cover section counts, names, slugs, descriptions, content presence, boundary conditions (no separator, no description), and mixed description markup. Co-Authored-By: Claude <noreply@anthropic.com>
Replace _parse_section_entries stub with full implementation that walks bullet_list AST nodes to extract ParsedEntry records, including support for subcategory labels (text-only list items) and also_see nested links. Add _parse_list_entries, helper finders (_find_inline, _find_first_link, _find_child), and _extract_description_html with separator stripping. Extend test suite with TestParseSectionEntries covering flat entries, link-only entries, subcategorized entries, also_see, entry_count, preview first-four, and XSS escaping in description HTML. Co-Authored-By: Claude <noreply@anthropic.com>
Replace the _render_section_html stub with a working implementation that converts parsed bullet-list nodes into classed div elements (entry, entry-sub, subcat). Add _render_bullet_list_html to handle nested structure and XSS escaping. Cover all cases with a new TestRenderSectionHtml suite. Co-Authored-By: Claude <noreply@anthropic.com>
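The rendering step described in this commit could look roughly like the sketch below. It is not the PR's `_render_bullet_list_html`: the function name `render_entries_html`, the input shape (plain dicts), and the rule for when `entry-sub` applies are assumptions for illustration. Only the class names `entry`, `entry-sub`, and `subcat` and the XSS-escaping requirement come from the commit message.

```python
from html import escape


def render_entries_html(items: list[dict]) -> str:
    """Render parsed list items as classed divs, escaping all text.

    Assumed input: dicts with "name" and an optional "url". An item
    without a URL is treated as a subcategory label, and linked items
    that follow it get the "entry-sub" class (an assumption).
    """
    parts = []
    under_subcat = False
    for item in items:
        name = escape(item["name"])
        url = item.get("url")
        if url is None:
            parts.append(f'<div class="subcat">{name}</div>')
            under_subcat = True
            continue
        css = "entry-sub" if under_subcat else "entry"
        # escape() also quotes attribute values by default (quote=True).
        parts.append(f'<div class="{css}"><a href="{escape(url)}">{name}</a></div>')
    return "\n".join(parts)
```

Escaping at the point where strings are interpolated into markup keeps untrusted README text from being interpreted as HTML, whatever the upstream parser produced.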
Adds TestParseRealReadme covering category count, slug generation, descriptions, entry counts, previews, content HTML, subcategory rendering, also-see links, and description link stripping. Co-Authored-By: Claude <noreply@anthropic.com>
slugify, parse_readme, count_entries, extract_preview, render_content_html, and related helpers are moved to a dedicated readme_parser module. build.py now imports from readme_parser rather than defining these inline. Tests for the removed functions are dropped from test_build.py since they now live with the module they test. Co-Authored-By: Claude <noreply@anthropic.com>
The markdown package is no longer used after switching the README parser to markdown-it-py in the feature branch. Co-Authored-By: Claude <noreply@anthropic.com>
load_cache was a duplicate of logic now living in build.load_stars. Switch the call site to the shared helper and remove the redundant local function and its tests. Co-Authored-By: Claude <noreply@anthropic.com>
- Remove `_has_description` which duplicated `_extract_description` logic; use truthiness of the description string instead
- Remove unused `resources` parameter from `extract_entries`
- Merge two sequential loops in `parse_readme` into a single pass over children to find hr, Resources, and Contributing indices

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
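The merged loop mentioned in the last item can be sketched as a single pass that records all three boundaries at once. The `Node` tuple below is a simplified stand-in for the real markdown-it-py AST nodes, and `find_boundaries` is a hypothetical name; the actual `parse_readme` code is not shown in this PR text.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    # Simplified stand-in for an AST node (assumption for illustration).
    type: str
    text: str = ""


def find_boundaries(
    children: list[Node],
) -> tuple[Optional[int], Optional[int], Optional[int]]:
    """Return indices of the first hr and the Resources/Contributing headings."""
    hr_idx = resources_idx = contributing_idx = None
    for i, node in enumerate(children):
        if node.type == "hr" and hr_idx is None:
            hr_idx = i
        elif node.type == "heading" and node.text == "Resources":
            resources_idx = i
        elif node.type == "heading" and node.text == "Contributing":
            contributing_idx = i
    return hr_idx, resources_idx, contributing_idx
```

A missing boundary stays `None`, which lets the caller handle cases such as a README with no separator.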
Split flat dev group into named groups (build, lint, test) so CI and production installs can pull only what they need. Add watchdog for the live-reload preview target. Co-Authored-By: Claude <noreply@anthropic.com>
Rename site_* targets to bare names (install, fetch_stats, build, preview). Replace the static preview target with a watchmedo-driven live-reload loop so file changes trigger automatic rebuilds. Make the output directory creation idempotent (exist_ok=True) and static copy incremental (dirs_exist_ok=True) so repeated builds don't wipe output on each run. Co-Authored-By: Claude <noreply@anthropic.com>
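The two flags named in this commit are standard-library options, and a repeatable output step built on them might look like the sketch below. The function name `prepare_output` and the directory layout are assumptions for illustration, not the PR's actual `build.py` code.

```python
import shutil
from pathlib import Path


def prepare_output(static_dir: Path, out_dir: Path) -> None:
    """Create the output directory and copy static assets into it.

    Safe to call repeatedly: neither call raises if the target exists.
    """
    # exist_ok=True: no error when the directory is already there.
    out_dir.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok=True (Python 3.8+): copy into an existing tree,
    # overwriting same-named files and leaving other files in place.
    shutil.copytree(static_dir, out_dir / "static", dirs_exist_ok=True)
```

Because nothing is deleted up front, a rebuild triggered by a file watcher does not leave the output directory empty while the build is running.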
Add authors, readme, license, and project URLs to pyproject.toml. Move build deps out of the default dependencies list into named groups (build, lint, test, dev) so each tool group can be installed independently. uv.lock updated to reflect the new group structure. Co-Authored-By: Claude <noreply@anthropic.com>
Replace the inline data-URI emoji favicon with a proper Python-logo SVG served from /static/favicon.svg. Avoids repeated base64 encoding overhead and allows the icon to be cached and updated independently. Co-Authored-By: Claude <noreply@anthropic.com>
Replace two-column footer (links left, attribution right) with a single inline row of slash-separated items. Update attribution text to 'Made by Vinta' with a link to vinta.ws, align footer links to match standard anchor color/hover behavior. Co-Authored-By: Claude <noreply@anthropic.com>
- Add sr-only headings for search/filter and results regions
- Add role=region and aria-label to .table-wrap for landmark navigation
- Add tabindex=0 and focus outline to .table-wrap for keyboard reachability
- Add sr-only text to empty Details column header
- Add role=button to expandable rows
- Add .expand-tags-mobile to show category/group tags in expand row on mobile
- Show .expand-tags-mobile via media query at <=900px breakpoint

Co-Authored-By: Claude <noreply@anthropic.com>
Switch readme_parser.py from regex-based parsing to markdown-it-py for more robust and maintainable Markdown AST traversal. Update build pipeline, templates, styles, and JS to support the new parser output. Refresh GitHub stars data and update tests to match new parser behavior. Co-Authored-By: Claude <noreply@anthropic.com>
8627419 to 280f250
Summary
- Replace the regex README parser with `markdown-it-py`, improving correctness and maintainability
- Add a `website/readme_parser.py` module with full entry extraction, HTML rendering, and 94 passing unit + integration tests
- Replace `pushed_at` with `last_commit_at` (fetched from default branch) for more accurate recency data

Test plan
- `make test`: all 94 tests pass
- `make build`: site builds without errors
- `make preview`: verify table sorting, search/filter URL state, expanded rows, and mobile layout in browser

🤖 Generated with Claude Code