Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The a element in HTML5 #51

Closed
bytestream opened this issue Dec 10, 2019 · 9 comments
Closed

The a element in HTML5 #51

bytestream opened this issue Dec 10, 2019 · 9 comments

Comments

@bytestream
Copy link
Contributor

The a element may be wrapped around entire paragraphs, lists, tables, and so forth, even entire sections, so long as there is no interactive content within (e.g. buttons or other links).
https://html.spec.whatwg.org/dev/text-level-semantics.html#the-a-element

Example:

<a><table></table></a>

becomes

<a></a><table></table>
@bytestream
Copy link
Contributor Author

Presumably a fairly simple fix?

$def->addElement('a', 'Inline', 'Flow', 'Common');

https://github.com/xemlock/htmlpurifier-html5/blob/master/library/HTMLPurifier/HTMLModule/HTML5/Text.php#L68

xemlock added a commit that referenced this issue Dec 10, 2019
@xemlock
Copy link
Owner

xemlock commented Dec 10, 2019

Hi @bytestream,
This is not a bug of this library, it's how PHP DOMDocument (used by HTMLPurifier as the default lexer) parses the HTML. If you switch lexer implementation to DirectLex, then the parsing works as expected.

$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('Core.LexerImpl', 'DirectLex');

// now <A> won't be autoclosed when a <TABLE> is encountered

@bytestream
Copy link
Contributor Author

Hey, thanks for the quick reply.

It should be default functionality though, given this lib advertises as HTML5 support and an anchor with block level elements is supported in HTML5? DOMDocument doesn't support HTML5 so while that's used, this lib can never fully be HTML5 friendly.

What are the implications of switching to DirectLex? htmlpurifier docs seem to suggest DirectLex is the default PHP 4 lexer which is a bit old school...

@bytestream
Copy link
Contributor Author

I'm not happy with DirectLex as a replacement based on these comments:

Our in-house implementation of a parser. A pure PHP parser, DirectLex has absolutely no dependencies, making it a reasonably good default for PHP4.

It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex)

Perhaps time to look at replacing PH5P with https://github.com/Masterminds/html5-php

@xemlock
Copy link
Owner

xemlock commented Dec 10, 2019

Unfortunately DirectLex is the only working solution for now. And PH5P, the other lexer shipped with HTMLPurifier, also can't handle HTML5 properly.

Switching lexer to html5-php sounds like a proper solution. It's definitely doable, but requires more development time (which unfortunately I don't have at the moment).

@bytestream
Copy link
Contributor Author

Would you accept that here or should I send it to htmlpurifier? It's pretty simple to replace PH5P with masterminds/html5-php and it seems to resolve the issue I've mentioned here.

xemlock added a commit that referenced this issue Dec 10, 2019
Segmentation fault started occuring after adding test case related to issue #51
@xemlock
Copy link
Owner

xemlock commented Dec 10, 2019

I'd gladly accept it here, thanks 🙂. I would prefer, however, to keep the default lexer intact (DOMLex), at least for the 0.1.x versions.

@bytestream
Copy link
Contributor Author

Sent :) but pending feedback for two issues posted to masterminds/html5

@xemlock
Copy link
Owner

xemlock commented Dec 14, 2019

HTML5 parsing issues resolved by #52. Closing.

@xemlock xemlock closed this as completed Dec 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants