Incorrect parsing of attributes that contain "<" or ">" characters. #494

Closed
inikulin opened this Issue Sep 17, 2012 · 4 comments

3 participants
Contributor

inikulin commented Sep 17, 2012

jsdom (I'm using v0.2.13) can't correctly handle attributes that contain ">" or "<" characters. Here is a minimal reproduction:

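// Old jsdom API: env(html, scripts, callback); no extra scripts are injected here
require('jsdom').env('<div><img src="test.png" alt=">" /></div>', [],
  function(errors, window) {
    // Serialize the parsed document to see how the attribute was handled
    console.log(window.document.innerHTML);
  });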

Expected output:

<html><body><div><img src="test.png" alt=">"></div></body></html>

Actual output:

<html><body><div><img src="test.png" alt="alt">" /</div></body></html>
Collaborator

domenic commented Oct 5, 2012

I believe this is related to the fact that we don't have a full HTML5-compliant parser to handle people's invalid HTML.

Collaborator

domenic commented Oct 10, 2012

See also #354

saurik commented Jan 3, 2013

I saw this issue and feel the need to point out that this is not "invalid HTML". Even the HTML 2.0 RFC (RFC 1866) states that an attribute value may not contain its own delimiter character but may otherwise contain any character. It mentions ">" only in the context of "historical implementations" (historical as of 1995) that break when an attribute value contains ">", and for that reason says authors "should" use entity references instead.
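Following that "should" amounts to escaping attribute values before writing them out. A minimal sketch, assuming double-quoted attributes and a hypothetical helper named `escapeAttributeValue` (not part of jsdom or any other library):

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted
// HTML attribute value.
function escapeAttributeValue(value) {
  return String(value)
    .replace(/&/g, "&amp;")   // first, so later replacements aren't double-escaped
    .replace(/"/g, "&quot;")  // the delimiter itself must always be escaped
    .replace(/</g, "&lt;")    // defensive choice (an assumption, not an RFC requirement)
    .replace(/>/g, "&gt;");   // the RFC's advice for "historical implementations"
}

console.log('<img src="test.png" alt="' + escapeAttributeValue(">") + '">');
// prints <img src="test.png" alt="&gt;">
```

Only the delimiter and "&" are needed for correctness. Escaping ">" (and "<") just keeps old or naive parsers from ending the tag early.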

However, it is clear that this is allowed, and not merely tolerated through error handling: you can paste that snippet of HTML into the W3C validator and it doesn't even emit a warning for that angle bracket, whether you validate it as HTML or XHTML. Regardless, you really don't need a "full HTML5-compliant parser" to handle this case; even the simplest HTML grammar handles it fine.
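To illustrate the "simplest grammar" point, here is a minimal sketch of a hand-written start-tag tokenizer. The function name `parseStartTag` is hypothetical and this is not jsdom's or any spec parser's code. It ignores character references, duplicate attributes and HTML5 error recovery.

```javascript
// Minimal sketch: split one start tag into its name and attributes.
// A quoted value runs until the next matching quote, so "<" and ">"
// inside it are ordinary characters.
function parseStartTag(tag) {
  let i = 0;
  const n = tag.length;
  if (tag[i] !== "<") throw new Error("expected '<'");
  i++;
  // Tag name: everything up to whitespace, "/" or ">".
  let name = "";
  while (i < n && !/[\s/>]/.test(tag[i])) name += tag[i++];
  const attrs = {};
  while (i < n) {
    // Skip whitespace and a self-closing "/".
    while (i < n && /[\s/]/.test(tag[i])) i++;
    if (i >= n || tag[i] === ">") break; // end of the tag
    // Attribute name: up to whitespace, "=", "/" or ">".
    let attrName = "";
    while (i < n && !/[\s=/>]/.test(tag[i])) attrName += tag[i++];
    while (i < n && /\s/.test(tag[i])) i++;
    let value = ""; // attributes without "=" get an empty value
    if (tag[i] === "=") {
      i++;
      while (i < n && /\s/.test(tag[i])) i++;
      const quote = tag[i];
      if (quote === '"' || quote === "'") {
        // Quoted value: consume everything up to the matching quote.
        const end = tag.indexOf(quote, i + 1);
        if (end === -1) throw new Error("unterminated attribute value");
        value = tag.slice(i + 1, end);
        i = end + 1;
      } else {
        // Unquoted value: ends at whitespace or ">".
        while (i < n && !/[\s>]/.test(tag[i])) value += tag[i++];
      }
    }
    attrs[attrName] = value;
  }
  return { name, attrs };
}

console.log(parseStartTag('<img src="test.png" alt=">" />').attrs.alt);
// prints >
```

A quoted value is consumed with a single search for its closing quote, so a ">" inside it never reaches the check for the end of the tag. The actual output in the report looks as if the tag was cut off at the first ">" instead.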

Contributor

inikulin commented Apr 7, 2013

I'm currently working on an extremely fast, fully spec-compliant HTML5 parser for Node that should fix most of these parsing issues: https://github.com/inikulin/parse5

domenic closed this in b7dafd2 on Apr 15, 2013
