Extension md_in_html does not recognize tags with hyphens #1246

Open
igordsm opened this issue Apr 28, 2022 · 4 comments
Labels
confirmed Confirmed bug report or approved feature request. extension Related to one or more of the included extensions. feature Feature request. someday-maybe Approved low priority request.


igordsm commented Apr 28, 2022

Web components are custom HTML components that are required to have - in their names. This breaks the current HTML handling because these elements are not recognized as HTML tags. IMHO they should be treated the same as <div> ("block" elements, if I'm not mistaken).

The following was tested in current main with the extension md_in_html active.

input:

<a-b>

asdf

</a-b>

output:

<p><a-b></p>
<p>asdf</p>
<p></a-b></p>

expected:

<a-b>
<p>asdf</p>
</a-b>

I went through the code and might know how to add this, but I would like the maintainers' input before proceeding.

Member

waylan commented Apr 29, 2022

> Web components are custom HTML components that are required to have - in their names.

Can you point us to a spec for this?

@waylan waylan added more-info-needed More information needs to be provided. extension Related to one or more of the included extensions. labels Apr 29, 2022
Author

igordsm commented Apr 29, 2022

The resource I use the most is MDN: https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_custom_elements

The actual specification of valid names is at https://html.spec.whatwg.org/#valid-custom-element-name
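
For readers who want a quick feel for that rule, here is a rough sketch of a name check. It is an ASCII-only approximation of the rule as I understand it from the linked spec section (the spec also allows a range of non-ASCII characters, which this ignores), and the function name `looks_like_custom_element_name` is made up for illustration.

```python
import re

# Names the HTML spec reserves even though they contain a hyphen.
RESERVED_NAMES = frozenset({
    "annotation-xml",
    "color-profile",
    "font-face",
    "font-face-src",
    "font-face-uri",
    "font-face-format",
    "font-face-name",
    "missing-glyph",
})

# ASCII-only approximation (an assumption of this sketch): a lowercase letter,
# then any mix of lowercase letters, digits, ".", "_" and "-", with at least
# one hyphen somewhere after the first character.
_ASCII_NAME = re.compile(r"[a-z][a-z0-9._-]*-[a-z0-9._-]*")


def looks_like_custom_element_name(name: str) -> bool:
    """Return True if `name` passes the simplified custom-element name check."""
    return bool(_ASCII_NAME.fullmatch(name)) and name not in RESERVED_NAMES


candidates = ["a-b", "div", "font-face", "A-b"]
results = {name: looks_like_custom_element_name(name) for name in candidates}
```

Only `a-b` passes here: `div` has no hyphen, `font-face` is reserved, and `A-b` contains an uppercase letter.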

Member

waylan commented May 2, 2022

Thank you for the links. There are two things I need to mention here.

First of all, the way Python-Markdown handles raw HTML is to define a list of known block-level tags. Any content within those block-level tags gets special treatment. Anything outside those known block-level tags is just treated as regular Markdown content, including inline raw HTML elements, which explains the behavior of the sample provided above.

Second, I will note that to use custom elements, the HTML spec requires you to register the elements with the browser first. Without that registration, the browser would have no knowledge of how to handle them. In fact, a custom element could be an inline element or a block-level element.

Given the above, I think that the logical way to support custom elements in Python-Markdown is to require the user to "register" the elements. That is, if you have a custom element which should be treated as a block-level element, you need to inform the Markdown class about it. This would probably make a good candidate for a third party extension (extension to register custom elements), although you can do this without an extension as demonstrated below.

>>> import markdown
>>> src = '''
... <a-b>
...
... asdf
...
... </a-b>
... '''
>>> md = markdown.Markdown()
>>> md.block_level_elements.append('a-b')
>>> md.convert(src)
'<a-b>\n\nasdf\n\n</a-b>'
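
The "extension to register custom elements" idea could look roughly like the sketch below. It only wraps the same `block_level_elements.append` call in an `Extension` subclass; the class name `CustomBlockElements` and its `tags` option are hypothetical and not part of Python-Markdown.

```python
import markdown
from markdown.extensions import Extension


class CustomBlockElements(Extension):
    """Hypothetical extension: treat the configured tag names as block-level."""

    def __init__(self, **kwargs):
        # 'tags' is an option invented for this sketch.
        self.config = {
            "tags": [[], "Custom element names to treat as block-level."],
        }
        super().__init__(**kwargs)

    def extendMarkdown(self, md):
        # Same effect as calling md.block_level_elements.append(...) by hand.
        for tag in self.getConfig("tags"):
            if tag not in md.block_level_elements:
                md.block_level_elements.append(tag)


md = markdown.Markdown(extensions=[CustomBlockElements(tags=["a-b"])])
html = md.convert("<a-b>\n\nasdf\n\n</a-b>")
```

With the tag registered this way, the `<a-b>` block should pass through as raw HTML rather than being wrapped in `<p>` tags, matching the manual example.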

That said, this does not currently work correctly with the md_in_html extension. Specifically, the extension fails to allow Markdown parsing within the element.

>>> md = markdown.Markdown(extensions=['md_in_html'])
>>> md.block_level_elements.append('a-b')
>>> md.convert(src)
'<a-b>\n\nasdf\n\n</a-b>'

This would appear to be because the extension compiles its lists of element types when the class instance is created and therefore does not see the changes made to the Markdown class later (see relevant code here). Ideally, the extension would build its list of element types after all extensions are loaded. I'm open to a PR which makes this change only. However, I do not see any need to add explicit support for custom elements specifically.
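
To illustrate the timing problem in isolation, here is a toy model (not the extension's actual code; `Registry`, `EagerConsumer` and `LazyConsumer` are hypothetical names). One consumer copies the tag list when it is constructed, the other looks the list up each time it is asked.

```python
class Registry:
    """Stand-in for the Markdown instance and its list of block-level tags."""

    def __init__(self):
        self.block_level_elements = ["div", "p"]


class EagerConsumer:
    """Copies the tag list at construction time, so later additions are missed."""

    def __init__(self, registry):
        self.block_tags = set(registry.block_level_elements)

    def is_block(self, tag):
        return tag in self.block_tags


class LazyConsumer:
    """Consults the registry on every call, so later additions are seen."""

    def __init__(self, registry):
        self.registry = registry

    def is_block(self, tag):
        return tag in self.registry.block_level_elements


registry = Registry()
eager = EagerConsumer(registry)
lazy = LazyConsumer(registry)

# The user registers a custom element after both consumers already exist.
registry.block_level_elements.append("a-b")

eager_sees_tag = eager.is_block("a-b")  # False: the snapshot predates the append
lazy_sees_tag = lazy.is_block("a-b")    # True: looked up at call time
```

Deferring the list building until all extensions are loaded, as suggested above, would be one way to get the second behavior.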

@waylan waylan added feature Feature request. someday-maybe Approved low priority request. confirmed Confirmed bug report or approved feature request. and removed more-info-needed More information needs to be provided. labels May 2, 2022
Author

igordsm commented May 9, 2022

Thanks for the detailed feedback, @waylan. I'll try to make a PR with the changes you outlined above this week.
