
gh-102555: Fix comment parsing in HTMLParser #135664


Status: Open. Wants to merge 4 commits into base: main.


serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Jun 18, 2025

  • "--!>" now ends the comment.
  • "-- >" no longer ends the comment.
  • Support abnormally ended empty comments "<!-->" and "<!--->".
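
The three rules above can be sketched with two regular expressions. This is only an illustration of the behaviour described in the list, not the code from this PR: the helper name `split_comment` and both pattern names are hypothetical, and the sketch assumes an abrupt close directly after `<!--` takes priority over a later `-->`, as the WHATWG comment-start states suggest.

```python
import re

# Hypothetical names for illustration; not taken from the PR.
_ABRUPT_CLOSE = re.compile(r'-?>')    # closes "<!-->" and "<!--->"
_NORMAL_CLOSE = re.compile(r'--!?>')  # closes on "-->" and "--!>"


def split_comment(text, start=0):
    """Return (comment_body, end_index), or None if the comment is unterminated."""
    if not text.startswith('<!--', start):
        raise ValueError('no comment starts at this position')
    body_start = start + 4
    # An abrupt close is only recognised immediately after "<!--".
    match = _ABRUPT_CLOSE.match(text, body_start)
    if match:
        return '', match.end()
    # Otherwise look for the first "-->" or "--!>"; "-- >" does not qualify.
    match = _NORMAL_CLOSE.search(text, body_start)
    if match is None:
        return None
    return text[body_start:match.start()], match.end()


print(split_comment('<!--a--!>b'))     # "--!>" ends the comment
print(split_comment('<!--a-- >b-->'))  # "-- >" is part of the comment body
print(split_comment('<!-->'))          # abnormally ended empty comment
```

Checking the abrupt close with `match` rather than `search` keeps it anchored to the position right after `<!--`, so a `>` later in the comment body is not mistaken for a terminator.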

@serhiy-storchaka serhiy-storchaka changed the title gh-135661: Fix comment parsing in HTMLParser gh-102555: Fix comment parsing in HTMLParser Jun 25, 2025
Contributor

@Privat33r-dev Privat33r-dev left a comment


Solid implementation overall :)

@@ -309,6 +310,21 @@ def parse_html_declaration(self, i):
        else:
            return self.parse_bogus_comment(i)

    # Internal -- parse comment, return length or -1 if not terminated
    # see https://html.spec.whatwg.org/multipage/parsing.html#comment-start-state
    def parse_comment(self, i, report=True):
Contributor

@Privat33r-dev Privat33r-dev Jun 25, 2025


I wonder if the change should be made in _markupbase instead.

cpython/Lib/_markupbase.py

Lines 165 to 175 in c2f2fd4

    def parse_comment(self, i, report=1):
        rawdata = self.rawdata
        if rawdata[i:i+4] != '<!--':
            raise AssertionError('unexpected call to parse_comment()')
        match = _commentclose.search(rawdata, i+4)
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.handle_comment(rawdata[i+4: j])
        return match.end(0)
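
For context, the quoted method relies on a module-level `_commentclose` pattern that is not shown in the excerpt. The sketch below assumes that pattern is `--\s*>` (worth verifying against `Lib/_markupbase.py`) and uses a hypothetical standalone function, `legacy_comment_body`, to show how such a pattern differs from the behaviour this PR describes.

```python
import re

# Assumption: mirrors the _commentclose pattern in Lib/_markupbase.py.
legacy_commentclose = re.compile(r'--\s*>')


def legacy_comment_body(rawdata, i=0):
    """Hypothetical standalone version of the quoted logic.

    Returns the comment body, or None where the method would return -1.
    """
    if rawdata[i:i+4] != '<!--':
        raise AssertionError('unexpected call')
    match = legacy_commentclose.search(rawdata, i + 4)
    if not match:
        return None
    return rawdata[i+4:match.start(0)]


print(legacy_comment_body('<!--a-- >b-->'))  # "-- >" ends the comment early
print(legacy_comment_body('<!--a--!>b-->'))  # "--!>" is not treated as an end
print(legacy_comment_body('<!-->'))          # empty comment is not recognised
```

Under that assumption, all three inputs are handled differently from the rules listed in the PR description, which is what the new `parse_comment` is meant to address.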

If the method is overridden here, the _markupbase version has no other callers in the repository, so the original method becomes dead code.
https://github.com/search?q=repo%3Apython%2Fcpython%20parse_comment&type=code
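
The concern can be illustrated with a minimal sketch. `Base` and `Derived` are hypothetical stand-ins for the base parser class in `_markupbase` and for `HTMLParser`; the method bodies are placeholders, not real parsing code.

```python
class Base:
    """Hypothetical stand-in for the base parser class."""

    def parse_comment(self, i):
        return 'base implementation'

    def run(self):
        # Dispatches to whichever parse_comment the instance's class defines.
        return self.parse_comment(0)


class Derived(Base):
    """Hypothetical stand-in for the subclass that overrides the method."""

    def parse_comment(self, i):
        return 'derived implementation'


print(Derived().run())  # the override is used
print(Base().run())     # only reached if Base is used or subclassed directly
```

If `Derived` is the only subclass in use and it never calls `super().parse_comment()`, the base method is never executed, which is the situation the review comment describes. Third-party subclasses of the base class would still reach it, so whether it is truly dead depends on usage outside the repository.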

2 participants