Why remove CDataSection? #142

Harbs · 2016-01-05T12:14:55Z

I'm in the process of implementing an XML processing library and I see that CDataSection and Node.CDATA_SECTION_NODE are being removed. What is the rationale for that decision?

I do not see any other way of determining that an XML node is CDATA. I need to know whether a node is plain text or CDATA to handle it correctly.

annevk · 2016-01-05T12:40:43Z

It might make its way back because it was too hard to remove from browsers, but why would you need to distinguish between these at the data level?

Harbs · 2016-01-05T13:16:10Z

Huh? How else is one to preserve the intent of the XML content?

Basically, I'm implementing E4X support for cross-compiled ActionScript. (The compiler maps E4X features that are not compatible with standard JavaScript to standard function calls.) Plain text and CDATA content are handled differently in various places internally. I can give more details if you need them, but first I'm trying to understand the rationale for removing it.

Why get rid of potentially useful information?

annevk · 2016-01-05T13:39:50Z

CDATA sections are just a "convenient" way to write text in XML; treating them as a distinct type is a bug.

Harbs · 2016-01-05T13:51:55Z

Not exactly. CDATA text is not encoded while plain text is. Whether it's an entirely distinct type or there's some meta-data describing the content, maintaining the distinction is important.

There are applications where CDATA sections in XML files are NOT the same as escaped string content. Round-tripping XML files in those cases would produce incorrect output.

I do agree that it's not a fully distinct type, but simply losing the information would be very bad for XML processing.
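To make the round-tripping concern concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which (like the proposed DOM change) exposes CDATA content as ordinary text. The sample document is hypothetical, and this illustrates the general behavior of a parser that drops the distinction, not browser DOM behavior specifically.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: script-like content wrapped in a CDATA section.
source = "<script><![CDATA[if (a < b && c) run();]]></script>"

element = ET.fromstring(source)

# The parser hands back plain text; nothing records that it came from CDATA.
text = element.text

# Serializing escapes the markup characters instead of restoring the CDATA section.
round_tripped = ET.tostring(element, encoding="unicode")

print(text)
print(round_tripped)
```

The text content survives intact, but the serialized form no longer matches the input byte for byte, which is the kind of difference being described here.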

domenic · 2016-01-05T13:53:06Z

Maybe the misconception here is that you think the DOM is supposed to support round-tripping? It in general does not.

Harbs · 2016-01-05T13:59:59Z

Why is this different from the handling of whitespace? XML parsing in the DOM does some very funky things in an attempt to preserve whitespace. https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Whitespace_in_the_DOM

I don't see why CDATA should be any less important.

annevk · 2016-01-05T14:04:08Z

Because CDATA is syntax, not content. See e.g., the definition of canonical XML. Anyway, it doesn't really matter; we most likely have to reintroduce CDATA nodes due to web content.
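The canonical XML point can be checked with a short sketch. It assumes Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` implements C14N 2.0; the two sample documents are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Two hypothetical documents with the same character content,
# one written with a CDATA section and one with escaped text.
with_cdata = "<a><![CDATA[x < y]]></a>"
with_escapes = "<a>x &lt; y</a>"

canonical_cdata = canonicalize(with_cdata)
canonical_escaped = canonicalize(with_escapes)

# Canonicalization replaces CDATA sections with their escaped character content,
# so both inputs end up with the same canonical form.
same = canonical_cdata == canonical_escaped
print(canonical_cdata, canonical_escaped, same)
```

Under canonicalization the two spellings are indistinguishable, which is the sense in which CDATA is treated as syntax rather than content.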

Harbs · 2016-01-05T14:16:24Z

IIUC, some of the white-space handling can be considered "syntax" as well. "Syntax" becomes "content" when the content of the syntax is important.

Either way, if current DOM3 behavior is future-proof, I'm happy leaving this discussion alone.

ArkadiuszMichalski · 2016-01-06T01:42:03Z

@Harbs There was an idea to expose a boolean attribute, Text.serializeAsCDATA, so that after removing CDATASection (as an unnecessary node type) you could still get the functionality of CDATASection through a Text node (see the old P&S spec: https://rawgit.com/whatwg/domparsing/edc795ccfdc03e396197bf81a0f550105930e90b/source.html). But it ended up being just an idea.
You can find more information in this still-open bug (and in the other bugs it cites): https://www.w3.org/Bugs/Public/show_bug.cgi?id=27386

travisleithead · 2016-01-06T02:18:58Z

If we bring back CDATASection in DOM, then I'm happy to re-add that API to produce a serialization that includes CDATA.

Harbs · 2016-01-06T08:07:25Z

Thanks for that link. Reading through that bug, it looks like making it a Text node would have the problem of merging adjacent non-CDATA nodes during normalization. I don't see how serializeAsCDATA would help for that.

Based on what I've read here, I'm working under the assumption that the following will be future-proof:

- nodeType == 4 (CDATA_SECTION_NODE) is a valid check for CDATA
- the [CDATA node].data is exactly the contents contained within the original CDATA tags
- CDATA will not be merged with adjacent text nodes

If any of these assumptions are wrong, please let me know. Even if changing CDATA does not break things now, the chances of things breaking after the next release of FlexJS will be much higher. I believe that E4X will be a very popular feature of FlexJS.
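The three assumptions above can be expressed as checks. This is only a sketch against Python's `xml.dom.minidom`, which follows the older DOM model that keeps CDATASection nodes; it says nothing about what browsers will do, and the sample markup is hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample: a CDATA section sitting between two runs of plain text.
doc = parseString("<r>before<![CDATA[<b>raw</b>]]>after</r>")
root = doc.documentElement

# Assumption 1: nodeType == 4 (CDATA_SECTION_NODE) identifies the CDATA node.
node_types = [child.nodeType for child in root.childNodes]
cdata = root.childNodes[1]
is_cdata = cdata.nodeType == Node.CDATA_SECTION_NODE

# Assumption 2: .data holds exactly what was between the CDATA delimiters.
cdata_data = cdata.data

# Assumption 3: normalization does not merge the CDATA node with adjacent text.
root.normalize()
count_after_normalize = len(root.childNodes)

print(node_types, is_cdata, cdata_data, count_after_normalize)
```

In minidom all three hold, because `normalize()` only merges nodes whose type is TEXT_NODE. Whether the same stays true in browser DOM implementations is exactly the open question in this thread.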

annevk · 2016-01-06T08:42:21Z

It's still not clear to me why you think it's semantically distinct from text. I haven't checked with others yet if they also consider it a lost cause to remove it.

Harbs · 2016-01-06T10:05:41Z

I was looking for a good practical example, and this is the only one I came up with at the moment, but I do think it's valid.

Take a look at this (Flash) web app here: http://www.radii8.com/demo/ It's basically an online design tool for building apps. One of the formats that it supports is MXML. Script blocks within MXML have to be contained within CDATA blocks. I have never tried to see whether script blocks as XML encoded text (as opposed to CDATA blocks) will compile, but trying to edit the file with all the code converted into gobbledygook would be a nightmare. So, even though from an XML structure perspective CDATA is syntax, in this case, the "syntax" clearly becomes "content".

One of our goals with FlexJS is to minimize the pain of porting Flash apps to HTML apps. Flash apps in general tend to do a lot more XML processing than HTML ones (probably because of the native E4X support). Hence preserving XML processing native to E4X is an important goal. I have no data on how prevalent CDATA blocks are in XML files, but if preserving legibility of the files is important (and I think it is), preserving CDATA blocks is important.

I hope this is helpful in explaining where I'm coming from...

Harbs · 2016-01-06T14:39:33Z

FWIW, another data point:
It seems like E4X treats CDATA as content. The CDATA node when parsed in E4X is considered a normal text node, but it includes the CDATA tags in the text. The E4X spec is a bit vague on this point, but I do think that's the intention and that's definitely the implementation as I've observed it in ActionScript. I tried to test what the implementation was in SpiderMonkey (which is still in use in Adobe's ExtendScript), but I was having some problems getting my tests to work.

annevk · 2016-03-14T13:06:17Z

Duping this against https://www.w3.org/Bugs/Public/show_bug.cgi?id=27386.

annevk closed this as completed Mar 14, 2016
