DocumentType XML serialization doesn't handle the presence of double quotes in system ID #71

Open
cscott opened this issue Jul 2, 2021 · 0 comments

In https://w3c.github.io/DOM-Parsing/#dfn-xml-serializing-a-documenttype-node we read:

  2. If the require well-formed flag is true and the node's systemId attribute contains characters that are not matched by the XML Char production or that contains both a """ (U+0022 QUOTATION MARK) and a "'" (U+0027 APOSTROPHE), then throw an exception; the serialization of this node would not be a well-formed document type declaration.
    ...
  9. If the node's systemId is not the empty string then append the following, in the order listed, to markup:
    9.1 " " (U+0020 SPACE);
    9.2 """ (U+0022 QUOTATION MARK);
    9.3 The value of the node's systemId attribute;
    9.4 """ (U+0022 QUOTATION MARK).

The intention here seems to be to surround the systemId with single quotes if it contains a double quote, and with double quotes otherwise, throwing an exception only if the systemId attribute contains both a single quote and a double quote. But that good idea got lost between step 2 and step 9: step 9 always surrounds the systemId with double quotes.
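To make the gap concrete, here is a minimal sketch of just the two quoted steps, transcribed as literally as I can. The function names `hasNonXmlChar` and `appendSystemIdAsWritten` are hypothetical, the regular expression is my own approximation of the XML Char production, and a plain `Error` stands in for whatever exception the spec intends. The elided steps are not modelled.

```javascript
// Hypothetical helper: approximates "not matched by the XML Char production".
// The /u flag makes the class work on code points, so lone surrogates are rejected.
function hasNonXmlChar(s) {
  return /[^\u{9}\u{A}\u{D}\u{20}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}]/u.test(s);
}

// Hypothetical transcription of step 2 and step 9 only; returns the text
// that step 9 would append to markup for the given systemId.
function appendSystemIdAsWritten(systemId, requireWellFormed) {
  // Step 2: throws only when BOTH kinds of quote are present.
  if (
    requireWellFormed &&
    (hasNonXmlChar(systemId) ||
      (systemId.includes('"') && systemId.includes("'")))
  ) {
    throw new Error("not a well-formed document type declaration");
  }
  // Step 9: always uses U+0022 QUOTATION MARK as the delimiter.
  if (systemId === "") {
    return "";
  }
  return ' "' + systemId + '"';
}

// A systemId containing only a double quote passes step 2, yet step 9
// wraps it in double quotes, so the literal is terminated early.
console.log(appendSystemIdAsWritten('foo"bar', true)); // prints:  "foo"bar"
```

So with the require well-formed flag set, the steps as written raise no exception for `foo"bar` and still emit a system literal that is not well-formed.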

One of two fixes should be made: A. Tweak step 2 to remove the mention of U+0027 APOSTROPHE and just throw the exception if the systemId contains U+0022 QUOTATION MARK; or B. Change steps 9.2 and 9.4 to both say "U+0022 QUOTATION MARK if the node's systemId does not contain a U+0022 QUOTATION MARK, otherwise U+0027 APOSTROPHE".
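For comparison, a rough sketch of how the two options might look if implemented. The names `hasNonXmlChar`, `appendSystemIdOptionA` and `appendSystemIdOptionB` are hypothetical, the XML Char check is my own approximation, and a plain `Error` stands in for the spec's exception. Only the systemId handling is modelled.

```javascript
// Hypothetical helper: approximates "not matched by the XML Char production".
function hasNonXmlChar(s) {
  return /[^\u{9}\u{A}\u{D}\u{20}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}]/u.test(s);
}

// Option A: step 2 rejects any double quote; step 9 is unchanged.
function appendSystemIdOptionA(systemId, requireWellFormed) {
  if (requireWellFormed && (hasNonXmlChar(systemId) || systemId.includes('"'))) {
    throw new Error("not a well-formed document type declaration");
  }
  return systemId === "" ? "" : ' "' + systemId + '"';
}

// Option B: step 2 is unchanged; steps 9.2 and 9.4 pick the delimiter
// based on whether the systemId contains a double quote.
function appendSystemIdOptionB(systemId, requireWellFormed) {
  if (
    requireWellFormed &&
    (hasNonXmlChar(systemId) ||
      (systemId.includes('"') && systemId.includes("'")))
  ) {
    throw new Error("not a well-formed document type declaration");
  }
  if (systemId === "") {
    return "";
  }
  const quote = systemId.includes('"') ? "'" : '"';
  return " " + quote + systemId + quote;
}

console.log(appendSystemIdOptionB('foo"bar', true)); // prints:  'foo"bar'
console.log(appendSystemIdOptionB("foo'bar", true)); // prints:  "foo'bar"
```

Under option A, `foo"bar` throws when the require well-formed flag is true. Under option B it serializes with apostrophes as delimiters. In both sketches a systemId containing both kinds of quote still produces ill-formed output when the flag is false, which matches the existing behaviour of skipping the check in that mode.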

Option B is what Firefox appears to do:

$doc = (new DOMParser()).parseFromString("<!DOCTYPE root SYSTEM 'foo\"bar'><root><child>text</child></root>", "text/xml");
(new XMLSerializer()).serializeToString($doc)

outputs

<!DOCTYPE root SYSTEM 'foo"bar'>
<root><child>text</child></root>
wmfgerrit pushed a commit to wikimedia/mediawiki-libs-Dodo that referenced this issue Jul 3, 2021