Facing the need to implement XML Canonicalization (for validating and building XML Signatures), I cannot easily reuse pugixml for the basic steps, because the parser seems to always remove eols, except within identified PCDATA content when parse_ws_pcdata has been set.
<doc>
<sub>
text
</sub>
</doc>
For instance, I can't find a way to keep the eol / whitespace between <doc> and <sub>, although parse_ws_pcdata does keep the eols within <sub></sub>.
When traversing the tree resulting from such an eol-preserving parse, I would expect to see a PCDATA node as the first child of <doc>, before its sibling child element <sub>. That would allow me to adjust whatever eols might be in there in order to keep only one, which is one of the transformation steps I need to apply before outputting the canonicalized text.
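The "keep only one eol" adjustment described above could look roughly like the sketch below. It assumes the whitespace has already been obtained as a string (for example from a PCDATA node's value). `collapse_eols` is a hypothetical helper name, not part of pugixml, and the function covers only this one step rather than the full canonicalization rules.

```cpp
#include <string>

// Hypothetical helper (not part of pugixml): collapse every run of
// line-break characters ('\r' and/or '\n') into a single '\n'.
// All other characters, including spaces and tabs, are copied unchanged.
std::string collapse_eols(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    bool in_eol_run = false;  // true while we are inside a run of line breaks
    for (char c : text) {
        if (c == '\r' || c == '\n') {
            if (!in_eol_run) {
                out.push_back('\n');  // emit one '\n' for the whole run
            }
            in_eol_run = true;
        } else {
            out.push_back(c);
            in_eol_run = false;
        }
    }
    return out;
}
```

If the eols do end up in the tree as PCDATA nodes, the result could presumably be written back with the node's `set_value()` before producing the output.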
To avoid any unwanted behavior when outputting canonicalized content, I can write my own output code based on a tree traversal, but with the eols already lost at parse time, that gets me nowhere.
omascia changed the title from "As an option, allow parsing all eol into the document model (as PCDATA holding those)" to "XML Canonicalization: as an option, allow parsing eol's into the document model (as PCDATA holding those)" on Dec 11, 2023
You're right! :)
I was misled by my real test, which had PIs too. It looks like no PCDATA is kept between a PI and an element, or between multiple successive PIs. I will reassemble a more complete, real-life sample and build from there. Anyway, I overlooked the side issue that I will not be able to control whitespace such as eols around closing tags. The whole idea of relying on a DOM parser to modify the content (namespace updates and attribute sorting) and then output the required canonicalized form is probably wrong.
I think this issue should be closed.
Thank you Arseny.