Some entities in text content are not converted/escaped when serializing #58
Comments
Is the second > in your question written as I think having something that looks like a closing tag bracket in text or values is "uncritical" for XML parsers, which cannot be said about I have also seen that other parsers (at least one of the abandoned domjs or libxmljs) treat those things in a more consistent manner. If we decided to change that (after having a stable test suite running on every change), it might be considered a breaking change, even if it's not really a difference (from my current understanding). So my guess is that such a change might not come very soon. |
Thanks -- I do mean a tag bracket character. For example, if a node value contains HTML text where entities have been converted. You load it with the DOMParser, then when you XMLSerialize it the entities are reconverted, except the closing bracket Example: gets serialized as |
Yes. That's what I was also referring to. But another short search in the old repo shows that there are already multiple issues around it there and also PRs. |
According to XML spec https://www.w3.org/TR/xml/#NT-CharData Quote from spec
So |
@sarod Do I understand you correctly that you agree to a "subset" of the original question, namely for the case of Do you think we should just always convert it, even though the specification you are quoting also says (emphasis mine)
? I think I'm convinced that we should take care of the |
to adhere to the XML specification https://www.w3.org/TR/xml/#NT-CharData xmldom#58 (comment)
The only thing left to do here is to add (non standard) options to the serializer / But I don't consider this a very important feature compared to other topics. |
There is some more related information in #22 which is the older one but I will still mark that one duplicate. |
Sorry for the late answer
Yes.
Limiting the conversion to the ']]>' case has the advantage of minimizing the changes in the generated XML, so it is likely to cause fewer breaking changes for consumers of the library. That would be my recommendation, but the choice is yours. |
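For illustration, the "minimal" option discussed above could be sketched roughly as follows. The helper name `escapeTextMinimal` is hypothetical and is not part of the xmldom API; the sketch only assumes the CharData rule that '>' must be escaped when it appears in the sequence ']]>', while '&' and '<' are always escaped.

```javascript
// Hypothetical helper (not an xmldom function): escape text content while
// touching '>' only where the XML CharData production requires it.
function escapeTextMinimal(text) {
  return text
    .replace(/&/g, "&amp;")       // '&' first, so later entities are not double-escaped
    .replace(/</g, "&lt;")        // '<' is never allowed literally in text content
    .replace(/\]\]>/g, "]]&gt;"); // '>' is escaped only as part of ']]>'
}

// A lone '>' is left alone; the ']]>' sequence is neutralised.
console.log(escapeTextMinimal("a > b ]]> c"));
```

With this approach, output for documents that never contain ']]>' in text stays byte-for-byte the same, which is the compatibility argument made in the comment above.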
@sarod Thx for the clarification. This issue is left open just for the more general topic of controlling the general behavior of serializing entities, hence the changed title. |
@karfau the spec seems to suggest that https://www.w3.org/TR/2016/WD-DOM-Parsing-20160517/ is the spec covering the method in question.
step 5: Return the result of running the XML serialization algorithm
step 14: Append to markup the result of the XML serialization of node's attributes
step 9.3: The result of serializing an attribute value given attr's value attribute
step 3: Text: """ Otherwise, attribute value is a string. Return the value of attribute value, first replacing any occurrences of the following:
NOTE This matches behavior present in browsers, and goes above and beyond the grammar requirement in the XML specification's AttValue production [XML10] by also replacing ">" characters. The correct parsing of the spec is that XML10 did not require |
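As a rough sketch of what the quoted step describes, attribute values could be escaped like this. The function name `escapeAttributeValue` is an assumption made for illustration, and the four replacements are the ones that step lists as far as I recall the DOM Parsing draft ('&', '"', '<' and '>'); this is not the code that ended up in xmldom.

```javascript
// Hypothetical helper following the "serializing an attribute value" step
// of the DOM Parsing draft; not the actual xmldom implementation.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, "&amp;")   // must run first to avoid double-escaping
    .replace(/"/g, "&quot;")  // the value is emitted inside double quotes
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");   // beyond what XML 1.0 AttValue requires, per the NOTE
}

// Serialize a single attribute the way the spec's algorithm would.
const markup = '<foo bar="' + escapeAttributeValue(">") + '"/>';
console.log(markup);
```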
@SheetJSDev Thank you for digging deeper.
But the lines you quote are talking about I will update the labels accordingly. |
A bunch of issues related to
new XMLSerializer().serializeToString(new DOMParser().parseFromString('<foo bar=">"/>', 'text/xml').documentElement)
In Chrome:
> new XMLSerializer().serializeToString(new DOMParser().parseFromString('<foo bar=">"/>', 'text/xml').documentElement)
< '<foo bar="&gt;"/>'
In version 0.8.1:
> const { XMLSerializer, DOMParser } = require("@xmldom/xmldom")
undefined
> new XMLSerializer().serializeToString(new DOMParser().parseFromString('<foo bar=">"/>', 'text/xml').documentElement)
'<foo bar=">"/>' |
You are right, it looks like I confused myself with all the different threads on this issue. With the links you provided, it makes sense to me that this is a bug, and it will be fixed soon. Thank you for insisting. https://stackblitz.com/edit/js-xmldom58?devToolsHeight=33&file=index.js |
in both attributes and text content Fixes #58 https://w3c.github.io/DOM-Parsing/#xml-serializing-a-text-node https://w3c.github.io/DOM-Parsing/#serializing-an-element-s-attributes
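To illustrate why always escaping '>' in text content is safe for consumers, here is a small round-trip sketch. Both `escapeText` and `unescapeText` are hypothetical helpers written for this example, assuming the text-node rule of the linked DOM Parsing section ('&', '<' and '>' are replaced); they are not functions exported by xmldom.

```javascript
// Hypothetical helper: escape text content, always converting '>'.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical inverse covering only the three entities produced above.
function unescapeText(markup) {
  return markup
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // '&amp;' last, so '&amp;lt;' decodes to '&lt;'
}

const original = "a ]]> b & <c>";
const escaped = escapeText(original);
console.log(escaped, unescapeText(escaped) === original);
```

Any conforming parser decodes the escaped '>' back to the same character, so the change affects the serialized bytes but not the parsed content.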
Using the master branch this issue should be resolved now. |
Looks good:
|
I have noticed that node values/strings containing > are not converted to &gt; when serializing the DOM. Is this due to any flag I should be using?