Parse internal entity declarations in internal DTD #367
```diff
@@ -78,7 +78,7 @@ function parse(source,defaultNSMapCopy,entityMap,domBuilder,errorHandler){
 				if(end>start){
 					var xt = source.substring(start,end).replace(/&#?\w+;/g,entityReplacer);
 					locator&&position(start);
-					domBuilder.characters(xt,0,end-start);
+					domBuilder.characters(xt,0,xt.length);
 					start = end
 				}
 			}
```
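The first hunk swaps `end-start` for `xt.length` because `xt` is the text after `&name;` references have been replaced, so its length can differ from the raw source span. A minimal sketch to see the difference, assuming a simplified, hypothetical `entityReplacer` that only resolves the five predefined entities (the parser's real replacer also handles character references and the document's entity map):

```javascript
// Simplified, hypothetical stand-in for the parser's entityReplacer:
// only the predefined entities are resolved; anything else (including
// numeric character references) is returned unchanged.
var predefined = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };

function entityReplacer(ref) {
  var name = ref.slice(1, -1); // strip the leading '&' and trailing ';'
  return Object.prototype.hasOwnProperty.call(predefined, name)
    ? predefined[name]
    : ref;
}

// Mirrors the text-node branch of the hunk: replace entities, then compare
// the length argument before (end-start) and after (xt.length) the change.
function charactersArgs(source, start, end) {
  var xt = source.substring(start, end).replace(/&#?\w+;/g, entityReplacer);
  return { text: xt, oldLength: end - start, newLength: xt.length };
}

var source = '<p>a &lt; b &amp;&amp; c</p>';
var args = charactersArgs(source, 3, source.indexOf('</p>'));
console.log(args.text, args.oldLength, args.newLength); // a < b && c 21 10
```

With `end-start`, the builder would be asked for 21 characters of a 10-character string.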
```diff
@@ -554,6 +554,38 @@ function parseDCC(source,start,domBuilder,errorHandler){//sure start with '<!'
 		domBuilder.endCDATA()
 		return end+3;
 	}
+	if(source.substr(start+2,6) == 'ENTITY'){
+		var end = source.indexOf('>', start+8);
+		var chunk = source.substring(start+8, end);
+		var match = chunk.match(/^\s+(?:(%)\s+)?(\S+)\s+(["'])/);
+		if (!match) {
+			// NOTE: Ignoring unhandled forms of entity declarations
+			return -1;
+		}
+		var declStart = start+8+match[0].length;
+		var name = match[2];
+		var delim = match[3];
+		var delimEnd = source.indexOf(delim, declStart);
+		if (delimEnd > end) {
+			end = source.indexOf('>', delimEnd);
+		}
+		if (match[1] === '%') {
+			// NOTE: Ignoring the PEDef form of entity declaration after forwarding
+			// to the declaration end.
+			return end;
+		}
```
Comment on lines +560 to +576
For landing this PR there are two things I would like to have added regarding this:

I'm short on time at the moment, alas, so I'd appreciate some pointers for collecting those cases, and/or suggestions for where to add new tests (even if that is just a link to some contribution guideline I may have missed). I also agree that warnings are prudent for the range of unsupported entity forms. I just want to know how to balance that with the current "lenient/silent on unsupported" approach in other parts of this code. "Leave things in a better state than when entered" is perfectly viable here; I mostly want to know where to set the bar. (There is the risk of needing to handle the nested DOCTYPE state just to emit warnings. I think we can dodge that, to avoid increasing the SAX parser complexity if it is slated for a bigger overhaul or replacement.)

I started looking into the (number of) cases we need to test vs. support. Note:
Regarding the
regarding the
And I think that https://github.com/xmldom/xmldom/blob/master/test/parse/doctype.test.js is the perfect place for those additional tests. I expect to be able to provide some samples of what I have in mind soon.

Just invested at least another hour into doing the same for
I also think that the current implementation doesn't really support everything I marked as supported in the
or maybe even just
and everything else fails as before. And if we don't find a reliable way to know when the
What do you think?
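To help collect the cases discussed above, the header regex from the diff can be exercised in isolation. A sketch, assuming a hypothetical helper `classifyEntityDecl` that mirrors the slicing in the hunk (the text between `<!ENTITY` and the first `>` is matched, then the value runs to the matching quote); it is a starting point for test cases, not a description of everything the parser does:

```javascript
// Regex copied from the diff: optional '%' (parameter entity), a name,
// then the opening quote of a literal value.
var ENTITY_HEADER = /^\s+(?:(%)\s+)?(\S+)\s+(["'])/;

// Hypothetical helper: classify one complete <!ENTITY ...> declaration.
function classifyEntityDecl(decl) {
  var start = 0; // decl begins with '<!ENTITY' (8 characters)
  var end = decl.indexOf('>', start + 8);
  var chunk = decl.substring(start + 8, end);
  var match = chunk.match(ENTITY_HEADER);
  if (!match) {
    return { kind: 'ignored' }; // e.g. external SYSTEM / PUBLIC entities
  }
  var declStart = start + 8 + match[0].length;
  var delimEnd = decl.indexOf(match[3], declStart); // closing quote
  return {
    kind: match[1] === '%' ? 'parameter' : 'general',
    name: match[2],
    value: decl.substring(declStart, delimEnd)
  };
}

var samples = [
  '<!ENTITY copy "&#169;">',
  "<!ENTITY greeting 'Hello'>",
  '<!ENTITY % common "shared">',
  '<!ENTITY logo SYSTEM "logo.gif">',
  '<!ENTITY arrow "->">' // '>' inside the value
];
samples.forEach(function (s) {
  console.log(s, '=>', JSON.stringify(classifyEntityDecl(s)));
});
```

In the diff, the `parameter` case is recognized but then skipped, the `ignored` case makes `parseDCC` return -1, and the last sample is why the code searches for `>` again after the closing quote.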
```diff
+		var value = source.substring(declStart, delimEnd);
+		// NOTE: This value is not further processed, but treated as PCDATA.
+		// If recursive processing is implemented, ENSURE to avoid an XML Bomb
+		// attack vector!
+		domBuilder.internalEntityDecl(name, value);
+		var nextTagStart = source.indexOf('<', end);
+		var dtdEnd = source.indexOf(']>', end);
+		if (dtdEnd < nextTagStart) {
+			domBuilder.endDTD();
+		}
```
Comment on lines +577 to +586
I just found the following section of the spec:

Does the current implementation support this? (We should add a test if it's not already there.)

Ah, no, this specific recommended redeclaration isn't technically supported (and there is apparently no test for it). Of course these are predefined, so they are already available (and I cannot recall seeing this redeclaration in practice, at least not internally in documents). As I read this, it seems that while redeclarations are allowed to issue warnings, these specific characters (being the

And yes, I believe doing a replacement run on the entity values could open up that can. The Billion Laughs attack doesn't rely on recursion but on exponential growth, which this would enable. It should then be combined with, e.g., monitoring of value growth, such as by carefully counting expansions, as libxml2 does with its entity reference loop detection (and corresponding

Thank you for elaborating on this topic, very much appreciated. I fully agree that there is no need to implement this "exception" to the "first definition is binding" rule for XML entities, since the only allowed option is to redefine them to the same value; so in the worst case there is a warning about this not being supported, but the values are still correct. I just started to think about (X)HTML and entities.

Sorry, my confusion: currently xmldom basically only knows about XHTML and treats HTML the "same way".
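The mitigation mentioned above (counting expansions, in the spirit of libxml2's loop detection) could look roughly like the following. Everything here is an assumption for illustration: `expandEntities`, its `maxExpansions` budget and the error message are hypothetical and not part of xmldom or libxml2. It expands general entity references recursively but aborts once the total number of expansions exceeds the budget, which also stops Billion Laughs-style fan-out and self-referencing entities:

```javascript
// Hypothetical sketch (not xmldom or libxml2 API): expand general entity
// references recursively, but give up once a fixed budget of expansions
// has been spent. Character references such as &#169; are not matched.
function expandEntities(text, entities, maxExpansions) {
  var expansions = 0;

  function expand(value) {
    return value.replace(/&(\w+);/g, function (ref, name) {
      if (!Object.prototype.hasOwnProperty.call(entities, name)) {
        return ref; // unknown entity: leave the reference untouched
      }
      expansions += 1;
      if (expansions > maxExpansions) {
        throw new Error('entity expansion limit exceeded');
      }
      return expand(entities[name]); // nested references are expanded too
    });
  }

  return expand(text);
}

// A harmless nested declaration expands normally.
var safe = { greet: 'Hello, &who;!', who: 'world' };
console.log(expandEntities('&greet;', safe, 100)); // Hello, world!

// Billion Laughs shape: each level references the previous one 10 times,
// so fully expanding &lol9; would take more than a billion expansions.
var lol = { lol0: 'lol' };
for (var i = 1; i <= 9; i++) {
  lol['lol' + i] = new Array(11).join('&lol' + (i - 1) + ';');
}
try {
  expandEntities('&lol9;', lol, 10000);
} catch (e) {
  console.log(e.message); // entity expansion limit exceeded
}
```

Counting expansions bounds the work whether the growth comes from recursion or from exponential fan-out, which is the distinction made in the comment above.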
```diff
+		return end+2;
+	}
 	//<!DOCTYPE
 	//startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
 	var matchs = split(source,start);
```
```diff
@@ -570,11 +602,21 @@ function parseDCC(source,start,domBuilder,errorHandler){//sure start with '<!'
 				sysid = matchs[3][0];
 			}
 		}
-		var lastMatch = matchs[len-1]
 		domBuilder.startDTD(name, pubid, sysid);
-		domBuilder.endDTD();
 
+		var hasInternalDTD = matchs[2][0] === '[';
+		// NOTE: Currently only handles entity declarations and not the full
+		// [internal DTD subset](https://www.w3.org/TR/xml/#NT-doctypedecl).
+		if (hasInternalDTD) {
+			// NOTE: endDTD must be called by the last internal subset item.
+			return matchs[2].index;
+		} else {
+			domBuilder.endDTD();
+		}
+
-		return lastMatch.index+lastMatch[0].length
+		var lastMatch = matchs[len-1];
+		var end = lastMatch.index+lastMatch[0].length;
+		return end;
 	}
 }
 return -1;
```
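Since `endDTD` must now be called by the last internal subset item, the `dtdEnd < nextTagStart` check in the entity branch decides when the DOCTYPE ends. A sketch for probing it, using a hypothetical helper `isLastInternalSubsetItem` that copies the two `indexOf` calls from the diff; the last case shows a limitation worth a test: the `doctypedecl` production ends in `('[' intSubset ']' S?)? '>'`, so whitespace may separate `]` and `>`, and `indexOf(']>')` then returns -1:

```javascript
// Hypothetical helper mirroring the diff: after an entity declaration whose
// closing '>' is at index `end`, compare the next ']>' with the next '<'.
function isLastInternalSubsetItem(source, end) {
  var nextTagStart = source.indexOf('<', end);
  var dtdEnd = source.indexOf(']>', end);
  return dtdEnd < nextTagStart;
}

var doc = '<!DOCTYPE d [<!ENTITY a "1"><!ENTITY b "2">]><d/>';
var firstEnd = doc.indexOf('>', doc.indexOf('<!ENTITY a'));
var secondEnd = doc.indexOf('>', doc.indexOf('<!ENTITY b'));
console.log(isLastInternalSubsetItem(doc, firstEnd));  // false
console.log(isLastInternalSubsetItem(doc, secondEnd)); // true

// Whitespace between ']' and '>' is allowed, so ']>' may not occur at all:
// indexOf returns -1, and the check reports true for every declaration here.
var spaced = '<!DOCTYPE d [<!ENTITY a "1"><!ENTITY b "2">] ><d/>';
var spacedFirstEnd = spaced.indexOf('>', spaced.indexOf('<!ENTITY a'));
console.log(isLastInternalSubsetItem(spaced, spacedFirstEnd)); // true
```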
Just thinking out loud, without checking any spec:
https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-entity-decl
`prototype`?
Good question and spec reference! Since the default `entityMap`s defined in `lib/entities.js` are frozen (using `Object.freeze` from `lib/conventions.js`), this kind of works right now, but only for the predefined entities, not when a declaration is repeated. So I added an explicit check and a redeclaration warning in 311ffc3.

The `prototype` property is defined on the type, not in the prototype, so looking up an undeclared `'prototype'` key returns `undefined` (and a `&prototype;` entity can be defined in an entity declaration).
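The "first declaration is binding" rule and the redeclaration warning can be sketched as follows. The helper `declareEntity`, the `warn` callback and the `Object.create(null)` base map are assumptions for illustration, not the code from 311ffc3. The prototype-less base matters because the `in` operator would otherwise also report names inherited from `Object.prototype`, such as `toString`:

```javascript
// Predefined entities on a prototype-less object, so inherited names such as
// "toString" or "constructor" are never mistaken for declared entities.
var predefined = Object.create(null);
predefined.lt = '<';
predefined.gt = '>';
predefined.amp = '&';
predefined.quot = '"';
predefined.apos = "'";
// Object.freeze may be unavailable in very old runtimes, so guard the call.
if (typeof Object.freeze === 'function') {
  Object.freeze(predefined);
}

// Hypothetical helper: the first declaration is binding, later ones only warn.
function declareEntity(entityMap, name, value, warn) {
  // `in` also finds entries inherited from `predefined` via the prototype chain.
  if (name in entityMap) {
    warn('Entity "' + name + '" redeclared; keeping the first declaration.');
    return false;
  }
  entityMap[name] = value;
  return true;
}

var warnings = [];
function warn(message) {
  warnings.push(message);
}

var local = Object.create(predefined); // per-document map
declareEntity(local, 'x', 'first', warn);
declareEntity(local, 'x', 'second', warn);    // warns, keeps "first"
declareEntity(local, 'lt', '&#38;#60;', warn); // warns, predefined value stays
declareEntity(local, 'prototype', 'p', warn);  // allowed: just another name
console.log(local.x, local.lt, local.prototype, warnings.length); // first < p 2
```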
Also, the prototypical inheritance is an old manoeuvre which ought to be efficient, but at least moving it out to a utility could be wise. I did consider using `Object.assign` here, but AFAIK that is an ES6 feature. (If a future overhaul decides to use, e.g., `hasOwnProperty` tests, or to replace the `entityMap` with an ES6 `Map` here, this would of course have to be reworked.)
Please be aware that it will not be frozen in a runtime where the `Object.freeze` method is not available, so thank you for adding that check. I think it would be good to add a test for that, and also for the `prototype` entity, just to be sure it doesn't break in the future.
I just played with this approach a bit and, after checking how the `entityMap` is currently used to replace entities, filed bug #370. So once that bug is fixed, your current solution will no longer work.
I assume we won't get around copying the entities. (I wouldn't want to touch the SAX parser more than absolutely required right now, since we have plans to get rid of it: #55.)
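The concern about the prototype-based map can be reproduced in isolation. A sketch, assuming a lookup that only trusts own properties via `hasOwnProperty` (a hypothetical `lookupOwn`, standing in for the kind of check anticipated earlier in this thread; the actual fix for #370 is not reproduced here): entries inherited through `Object.create` are visible to plain property access but invisible to an own-property lookup.

```javascript
var predefinedEntities = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };

// Hypothetical lookup that only accepts own properties of the map.
function lookupOwn(map, name) {
  return Object.prototype.hasOwnProperty.call(map, name) ? map[name] : undefined;
}

// Per-document map that inherits the predefined entities via its prototype.
var documentEntities = Object.create(predefinedEntities);
documentEntities.copy = '\u00a9'; // declared in the internal subset

console.log(documentEntities.lt);                 // < (found on the prototype)
console.log(lookupOwn(documentEntities, 'lt'));   // undefined (not an own property)
console.log(lookupOwn(documentEntities, 'copy')); // © (own property)
```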
The mentioned bug has now been fixed, and the fix was released as 0.8.1.
Please update your branch (I can also do that if you prefer), after which I assume some tests will fail. The solution should be to copy the `entityMap`.
Since we cannot just rely on `Object.assign`, I will provide something similar as part of `lib/conventions.js` in the next few days (in a separate PR).
Has been done as part of #379
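Copying, as suggested above, avoids the own-property problem and leaves the shared predefined map untouched. The sketch uses a hypothetical ES5-compatible `assignOwn` helper as a stand-in for `Object.assign`; the helper actually added to `lib/conventions.js` in #379 may differ in name and behavior.

```javascript
// Hypothetical ES5-compatible stand-in for Object.assign with a single source:
// copies own enumerable properties of `source` onto `target`.
function assignOwn(target, source) {
  if (target === null || typeof target !== 'object') {
    throw new TypeError('target must be an object');
  }
  for (var key in source) {
    if (Object.prototype.hasOwnProperty.call(source, key)) {
      target[key] = source[key];
    }
  }
  return target;
}

var predefinedEntities = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };
if (typeof Object.freeze === 'function') {
  Object.freeze(predefinedEntities); // shared map stays read-only
}

// Each document gets its own mutable copy of the predefined entities.
var documentEntities = assignOwn({}, predefinedEntities);
documentEntities.copy = '\u00a9';

console.log(Object.prototype.hasOwnProperty.call(documentEntities, 'lt')); // true
console.log('copy' in predefinedEntities);                                 // false
```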