Merge text and CDATA events in serde deserializer #474
Comments
I ran some experiments with XmlBeans 5.0.0, a popular Java library for working with XML, using the following XSD:
<xs:schema xmlns:this="types.xsd"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="types.xsd"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified"
>
  <xs:element name="Str" type="xs:string"/>
</xs:schema>
It skips comments and processing instructions and merges text and CDATA sections, as suggested in the issue description. All whitespace is significant (namespace definition
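The behavior observed above (comments and processing instructions skipped, text and CDATA merged, whitespace preserved) can be sketched roughly as follows. This is only an illustration: the `Event` enum and `merge_text_and_cdata` are hypothetical names invented for this sketch, not the API of XmlBeans or of this project.

```rust
/// Hypothetical, simplified event type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
    CData(String),
    Comment(String),
    PI(String),
}

/// Drops comments and processing instructions and concatenates every run of
/// adjacent text / CDATA events into a single `Text` event.
/// Whitespace is kept exactly as it appears in the input events.
fn merge_text_and_cdata(events: Vec<Event>) -> Vec<Event> {
    let mut out: Vec<Event> = Vec::new();
    for event in events {
        match event {
            // Skipped entirely, so they never break up a run of character data.
            Event::Comment(_) | Event::PI(_) => {}
            Event::Text(s) | Event::CData(s) => {
                if let Some(Event::Text(prev)) = out.last_mut() {
                    // Extend the run that is already being collected.
                    prev.push_str(&s);
                } else {
                    out.push(Event::Text(s));
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

With this sketch, a text event, a comment, and a CDATA event inside one element collapse into a single `Text` event containing both strings.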
Unfortunately, this is not an easy task because of the trim feature, which is enabled for the serde deserializer. It means that whitespace between a CDATA section and text will be trimmed, and that is not easy to fix: to do it correctly, we would need unbounded lookahead to handle situations like this:
text
<!--comment 1-->
<!--comment 2-->
...
<!--comment N--><![CDATA[cdata section]]>
We should not trim between text and CDATA, but we should trim between text and a tag. Because comments should not change the content of the document, that document is equivalent to:
text
...
<![CDATA[cdata section]]>
("text" + N newlines + "cdata section"). Solving #460 first will probably make this easier to implement.
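The lookahead problem described above can be illustrated with a minimal sketch. It assumes a hypothetical, fully buffered event list (`Event` and `trim_with_lookahead` are names made up here, not this project's API) and treats both text and CDATA as character data, which is an assumption of the sketch. A streaming reader would have to buffer an unknown number of comments to make the same decision.

```rust
/// Hypothetical, simplified event type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
    CData(String),
    Comment(String),
}

fn is_char_data(event: &Event) -> bool {
    matches!(event, Event::Text(_) | Event::CData(_))
}

/// Drops comments and trims a text event only on the sides that touch a tag
/// (or the document boundary), never on a side that touches text or CDATA.
fn trim_with_lookahead(events: &[Event]) -> Vec<Event> {
    let mut out = Vec::new();
    // Whether the last non-comment event seen so far was text or CDATA.
    let mut prev_is_char_data = false;
    for (i, event) in events.iter().enumerate() {
        match event {
            // Comments do not affect content and do not count as neighbours.
            Event::Comment(_) => continue,
            Event::Text(text) => {
                // Unbounded lookahead: skip any number of comments to find
                // the next event that actually contributes to the content.
                let next_is_char_data = events[i + 1..]
                    .iter()
                    .find(|e| !matches!(e, Event::Comment(_)))
                    .map_or(false, is_char_data);
                let mut s = text.as_str();
                if !prev_is_char_data {
                    s = s.trim_start();
                }
                if !next_is_char_data {
                    s = s.trim_end();
                }
                if !s.is_empty() {
                    out.push(Event::Text(s.to_string()));
                }
            }
            other => out.push(other.clone()),
        }
        prev_is_char_data = is_char_data(event);
    }
    out
}
```

In this sketch the newlines between "text" and the CDATA section survive because the next significant event is CDATA, while whitespace directly before a closing tag is trimmed away.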
CDATA sections cannot contain the sequence "]]>". When that sequence appears in the data, the data should be split into two pieces, and each piece should be put in its own CDATA section. Currently the serde deserializer processes only one CDATA event at a time, which means that deserializing data split this way would fail or wrongly return "]]" instead of "]]>".
To fix that we should merge CDATA events, but there are some ambiguities that should be investigated:
Should <![CDATA[one]]>two be deserialized as onetwo?
Should CDATA sections separated by comments be deserialized as onetwo? Currently all comments are skipped at a very early stage, so the deserializer sees <![CDATA[one]]><![CDATA[two]]>.
Should CDATA sections separated by processing instructions be deserialized as onetwo? Currently all processing instructions are skipped at a very early stage, so the deserializer sees <![CDATA[one]]><![CDATA[two]]>.
Should <![CDATA[one]]> two be deserialized as onetwo?
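To make the splitting rule from the issue description concrete, here is a minimal sketch using only the standard library. The function names are hypothetical and are not this project's API. The split is placed between "]]" and ">", which is one of the two valid choices (the other is after the first "]"). The reader is deliberately naive and only understands back-to-back CDATA sections.

```rust
/// Splits `data` so that no piece contains "]]>": every occurrence is cut
/// between "]]" and ">".
fn split_cdata(data: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    let mut rest = data;
    while let Some(pos) = rest.find("]]>") {
        // Keep "]]" in the current piece; ">" starts the next one.
        let (head, tail) = rest.split_at(pos + 2);
        pieces.push(head);
        rest = tail;
    }
    pieces.push(rest);
    pieces
}

/// Writes `data` as one or more consecutive CDATA sections.
fn write_cdata(data: &str) -> String {
    split_cdata(data)
        .iter()
        .map(|piece| format!("<![CDATA[{}]]>", piece))
        .collect()
}

/// Returns the contents of the consecutive CDATA sections at the start of
/// `xml`, one entry per section. Stops at the first thing that is not a
/// complete CDATA section.
fn read_cdata_sections(xml: &str) -> Vec<&str> {
    let mut sections = Vec::new();
    let mut rest = xml;
    while let Some(body) = rest.strip_prefix("<![CDATA[") {
        match body.find("]]>") {
            Some(end) => {
                sections.push(&body[..end]);
                rest = &body[end + 3..];
            }
            // Unterminated section: stop instead of guessing.
            None => break,
        }
    }
    sections
}
```

Reading only the first section of `write_cdata("]]>")` yields "]]", which is the wrong result described above. Concatenating all sections restores the original "]]>", which is why the events need to be merged.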