[Bug]: md-fit strips meaningful content metadata (usernames, attribution) not just boilerplate #1900
Open
Labels
⚙️ In-progress: Issues, Features requests that are in Progress
🐞 Bug: Something isn't working
Description
crawl4ai version
0.8.6
Expected Behavior
md-fit should strip page chrome (navigation, footers, sidebars, cookie banners) but preserve meaningful content metadata like author names and attribution on user-generated content. Comment attribution (who said what) is semantic content, not boilerplate.
Current Behavior
md-fit strips comment author usernames and profile links from GitHub pages while keeping the comment body and date. The output shows `commented Apr 6, 2026` with no indication of who wrote it.
With -o md, you get:
**[mattheworiordan](https://gist.github.com/mattheworiordan)** commented Apr 6, 2026
With -o md-fit, you get:
commented Apr 6, 2026
The username and profile link are gone. On pages with multiple commenters, it's impossible to tell who said what.
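To check a page for this loss automatically, the two outputs can be compared as plain text. The following is a minimal sketch, not part of crawl4ai: the helper names `lost_attributions` and `orphaned_headers` are hypothetical, and the regular expressions assume the comment header looks exactly like the two sample lines above. It takes the `-o md` and `-o md-fit` outputs as strings (for example, saved from the CLI) and reports which authors disappeared.

```python
import re

# Assumed header shape from the `-o md` sample above:
#   **[user](profile-url)** commented <date>
ATTRIBUTED = re.compile(
    r"\*\*\[(?P<user>[^\]]+)\]\([^)]*\)\*\*\s+commented\s+(?P<date>.+)"
)
# Assumed shape of a header that lost its author: the line starts with "commented".
ORPHANED = re.compile(r"^\s*commented\s+(?P<date>.+?)\s*$")


def lost_attributions(md_full: str, md_fit: str) -> list[tuple[str, str]]:
    """Return (user, date) pairs attributed in md_full but not in md_fit."""
    kept = {(m["user"], m["date"].strip()) for m in ATTRIBUTED.finditer(md_fit)}
    lost = []
    for m in ATTRIBUTED.finditer(md_full):
        key = (m["user"], m["date"].strip())
        if key not in kept:
            lost.append(key)
    return lost


def orphaned_headers(md_fit: str) -> list[str]:
    """Return header lines in md_fit that begin with 'commented' and name no author."""
    return [line.strip() for line in md_fit.splitlines() if ORPHANED.match(line)]


if __name__ == "__main__":
    full = (
        "**[mattheworiordan](https://gist.github.com/mattheworiordan)** "
        "commented Apr 6, 2026\nComment body"
    )
    fit = "commented Apr 6, 2026\nComment body"
    print(lost_attributions(full, fit))
    print(orphaned_headers(fit))
```

Matching on the header text rather than on HTML keeps the check independent of how the fit filter works internally, at the cost of only covering pages whose comment headers follow this one pattern.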
Is this reproducible?
Yes
Inputs Causing the Bug
- URL: https://gist.github.com/mattheworiordan/99ba717f7a8ca5a7f838913722cfe7ac
- Settings: `-o md-fit` (default fit mode, no other flags)
Steps to Reproduce
# 1. Crawl the gist with full markdown - username preserved
crwl crawl https://gist.github.com/mattheworiordan/99ba717f7a8ca5a7f838913722cfe7ac -o md
# Look for: **[mattheworiordan](...)** commented Apr 6, 2026
# 2. Same URL with md-fit - username stripped
crwl crawl https://gist.github.com/mattheworiordan/99ba717f7a8ca5a7f838913722cfe7ac -o md-fit
# Look for: commented Apr 6, 2026 (no username)
Code snippets
# Not a code bug - this is CLI / markdown generation behaviour.
# Reproduces purely via crwl CLI as shown above.
OS
macOS (Darwin 25.2.0)
Python version
3.14.3
Browser
(default Chromium managed by crawl4ai)
Browser version
(crawl4ai default)
Error logs & Screenshots (if applicable)
n/a