-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect HTML parsing, omitted closing tags #482
Comments
I see that this is actually rather a problem of a dependency of this project. I fixed the bug and requested a pull to the dependency (node-htmlparser): tautologistics/node-htmlparser#54 |
I've let @tautologistics know I'm happy to help maintain node-htmlparser; hopefully we can get this pulled in. |
@tautologistics seems to have gone offline, wouldn't it be possible to set up your own up-to-date fork and change the package dependency to that, at least for the time being? |
We're also considering moving to a different html5 parser, either the one from dom.js per #354, or @aredridel's html5. You can currently run the tests with a different parser, so if anyone gets one working that passes at least as many tests as current, plus fixes a bug or two, I'm ready to switch. |
Yeah, the html5 parser should handle this better. node-htmlparser is raw speed but not careful at all. |
Is it possible to use the HTML5 parser with jsdom.env? I'm making calls with {config:{parser:require('html5')}}, but I'm not seeing any difference from the normal parser (specifically, <p> tags aren't closing the previous unclosed <p> tag). |
I'd love to see jsdom's setup routines simplified to fix things like this ;-) |
Pull requests welcome from people who know more about what they want and how jsdom setup works than I do :) |
FWIW, it took some trickery to get the tests to work with html5. From what I recall, one problematic issue is that lib/jsdom.js passes an export from jsdom/browser/index onto its own exports: https://github.com/tmpvar/jsdom-oscon/blob/master/node_modules/jsdom/lib/jsdom.js#L10-11 If jsdom/browser/index is only partially initialized when lib/jsdom.js is sourced, this will pick up undefineds. Note that my patch that added setDefaultParser is mainly intended to run the tests; providing html5 as a parser for individual calls has so far worked for me. |
The parser should take some HTML principles into account.
There are several situations in which you can omit closing tags of some elements.
This is valid HTML5 code and should be parsed as
The parser instead nests the second paragraph into the former one as
This is a major bug in my opinion.
The text was updated successfully, but these errors were encountered: