CDATA should not be treated as markup #82
Comments
Sorry for the late answer! I am not very fond of the idea:
I'm happy to be convinced otherwise but I'm not sure this is the right thing to do. |
Nothing to apologise for. We all have stuff to do. Yeah, you are wise to not be fond of the idea, because I stated it wrong. There was some confusion in my mind as to what 'markup' actually means. Let me restate my suggestion in the terminology of
In terms of I was not aware that CDATA is deprecated. I can use the link you sent me to argue that my parser doesn't need to support it either. So, thanks for that. I am gladly dropping the requirement that CDATA should be supported through What do you think about dropping the |
This is only about HTML and DOM, while quick-xml targets XML. Many things will break if you don’t handle CDATA correctly, on the web or elsewhere. |
Just to clear up a possible misunderstanding: |
In this case I think we could even have an external crate to parse CDATA (so other XML libraries may use it), and it would be an opt-in feature. |
@tafia So, I would just call a decode method implemented by the external crate instead of the one provided by quick_xml? |
Yes, for the time being at least, I think it is better to iterate this way. Is this a problem? I am not clear yet how it should look exactly on your API side. Please leave this issue open so that anyone who wants to help or is looking for the same feature will have some info. |
I think providing the content of CDATA as it is, is the correct behavior for an XML parser. An XML parser decodes XML content while reading. The data in CDATA is already in raw form, so there is no need for decoding. So
I think changing By the way, I'm new to this library. I have only read some source code and tests to reason about it. Sorry if I misunderstood something. |
@fatihpense What is the use case for differentiating between CDATA and 'normal' encoding? I am always interested in the final decoded text. I find it a bit of a pain that I have to handle an arbitrary number of CDATA events any time I expect character data. Yet I see that my suggestion would ruin any use case that requires differentiating between the two encoding styles. |
@pacman82 I work on enterprise integrations. We do a lot of XML mapping, and people can put meaning and business logic into every corner of the spec. On rare occasions we even deal with invalid XML. So I think it is better to give more power to the library user when possible. In Java SAX, this situation is solved by two interfaces. ContentHandler, which gives you characters as Edit: ContentHandler doesn't guarantee delivering character data in one function call even when there is no CDATA (I think that is for performance reasons). So in your "push parsing" SAX code you get chardata part, startCDATA, chardata part, endCDATA. I understand you need convenience over configuration for most tasks. I think your needs are better served by another library or by new easy-mode methods in the same library. |
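To make the "easy-mode" idea concrete, here is a rough sketch of a helper that merges adjacent text and CDATA events into a single text event, while the underlying event stream still keeps the distinction for users who need it. The `Event` enum and the `coalesce_text` function are hypothetical and defined only for this illustration; they are not quick-xml's actual types or API.

```rust
/// Hypothetical, simplified event type for illustration only.
/// This is NOT quick-xml's `Event`.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    End(String),
    /// Character data whose entity references are already resolved.
    Text(String),
    /// Raw content of a CDATA section (no unescaping applies to it).
    CData(String),
}

/// Merge every run of adjacent `Text`/`CData` events into one `Text` event.
/// All other events pass through unchanged and act as separators.
fn coalesce_text(events: Vec<Event>) -> Vec<Event> {
    let mut out: Vec<Event> = Vec::new();
    for ev in events {
        match ev {
            Event::Text(s) | Event::CData(s) => {
                if let Some(Event::Text(prev)) = out.last_mut() {
                    // Extend the run that is already in progress.
                    prev.push_str(&s);
                } else {
                    // Start a new run of character data.
                    out.push(Event::Text(s));
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

With this, a consumer who only cares about the final text sees one `Text` event between a start and an end tag, and a consumer who cares about the encoding style simply does not call the helper.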
@fatihpense Thank you for explaining your use case. While I do think that it is an invalid use of XML to give semantic meaning to the way character data is masked, I cannot deny that such things may happen. |
How would you feel about making quick-xml not emit CDATA `Event`s? Instead, I would suggest handling CDATA as part of `decode()`. After all, writing `<![CDATA[<example>]]>` is just another way of writing `&lt;example&gt;`. To make matters worse, the two approaches may be mixed: `&lt;exa<![CDATA[mple>]]>`. With the current API it is really hard to get this right.
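As a minimal sketch of what such a combined decode step could look like, the standalone function below turns raw character data into its final text, treating CDATA sections as literal and resolving entity references everywhere else. The names `decode_char_data` and `unescape` are made up for this example and are not part of quick-xml. The sketch also assumes only the five predefined XML entities matter (no numeric character references or custom entities).

```rust
/// Resolve the five predefined XML entities; any other `&` is kept verbatim.
/// (Assumption: numeric and custom entities are out of scope for this sketch.)
fn unescape(s: &str) -> String {
    let known = [
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&amp;", '&'),
        ("&quot;", '"'),
        ("&apos;", '\''),
    ];
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        rest = &rest[amp..];
        match known.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Not a recognized entity: keep the ampersand as-is.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Decode raw character data that may freely mix escaped text and
/// CDATA sections into the text it represents.
fn decode_char_data(raw: &str) -> String {
    const OPEN: &str = "<![CDATA[";
    const CLOSE: &str = "]]>";
    let mut out = String::new();
    let mut rest = raw;
    while let Some(start) = rest.find(OPEN) {
        // Text before the CDATA section is ordinary escaped text.
        out.push_str(&unescape(&rest[..start]));
        let body = &rest[start + OPEN.len()..];
        match body.find(CLOSE) {
            Some(end) => {
                // CDATA content is literal: no entity processing.
                out.push_str(&body[..end]);
                rest = &body[end + CLOSE.len()..];
            }
            None => {
                // Unterminated section: this sketch keeps the remainder literally.
                out.push_str(body);
                rest = "";
            }
        }
    }
    out.push_str(&unescape(rest));
    out
}
```

All three spellings from the issue text decode to the same string under this sketch, which is the equivalence the proposal relies on. The price is that the caller can no longer tell which spelling was used.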