Consider changing fragment serialization to be compatible with current implementations #944

Closed
smaug---- opened this Issue Mar 25, 2016 · 27 comments

Projects

None yet

9 participants

@smaug----
Collaborator

See https://bugzilla.mozilla.org/show_bug.cgi?id=1259723

I don't have strong opinion on this. Gecko and Blink at least seem to have different behavior than what the spec defines, and they have had that for a long time. So changing it might be risky.
But then, maybe Edge is following the spec here without compatibility issues?

@annevk
Member
annevk commented Mar 26, 2016

Test by @zcorpan: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3504. I don't have Edge. Anyone?

The specification here seems much better given that it doesn't require modification of the DOM to get something that roundtrips. But if nobody wants to fix this...

@domenic
Member
domenic commented Mar 30, 2016

Edge logs "FAIL"

@annevk
Member
annevk commented Mar 31, 2016

Okay, so what do we think? Do we make roundtripping worse to be compatible with implementations or hope one of the implementers is brave enough to fix this? I guess I'm happy to align with implementations, even though it's crappy. The specification did try for a bit.

@smaug----
Collaborator

Have web devs asked for the change (in implementations)? I couldn't find any related bug in bmo
(except https://bugzilla.mozilla.org/show_bug.cgi?id=1259723).

@zcorpan
Member
zcorpan commented Mar 31, 2016

I found this: https://phabricator.wikimedia.org/T44667

A serializer must emit an additional newline if the pre content (without the optional newline) in the DOM starts with a newline. Sadly, the innerHTML method in modern browsers does not seem to implement this, so we have to work around it.

@zcorpan zcorpan added a commit to w3c/web-platform-tests that referenced this issue Mar 31, 2016
@zcorpan zcorpan Test initial LF in pre/textarea/listing with innerHTML
The HTML spec requires this for proper round-tripping, but only
Presto has implemented it so far.

Ref. whatwg/html#944
46b7917
@zcorpan
Member
zcorpan commented Mar 31, 2016

Tests for the spec's current behavior at w3c/web-platform-tests#2734

@domenic
Member
domenic commented Mar 31, 2016

so we have to work around it.

This kind of implies to me that if browsers did try to fix this, they'd start breaking code.

@zcorpan
Member
zcorpan commented Mar 31, 2016

That's a risk, but it could also fix many more sites that are currently "broken" because they don't work around it.

It seems like the Wikipedia people wanted the browsers to fix this; the Chromium bug blocks https://bugs.chromium.org/p/chromium/issues/detail?id=463348

Also this behavior is shipped in Opera 12 and I couldn't find any bug reports about anything being broken. I don't know where in Wikipedia the difference would manifest itself. I tried previewing <pre>\n\nx</pre> in my Wikipedia "sandbox" in Opera 12 but that didn't have too many linebreaks...

cc @Krinkle

@Krinkle
Member
Krinkle commented Mar 31, 2016

@zcorpan The sandbox test you did most likely did not trigger the relevant code path. The wikitext markup is currently parsed server-side and transformed into HTML and then cached/served to readers as-is. There is no interaction with a DOM there.

This serialisation bug affected Wikipedia in its WYSIWYG editor (VisualEditor; demo).

The Node.js server (Parsoid) that converts wikitext into an annotated HTML5 DOM for visual editing currently works around this bug to ensure newlines are preserved. It works around it by adding a data attribute storing the fact that one new line must be prepended to the contents upon reserialisation (example; downstream bug report).

Since this logic is universally applied; Yes, I would expect a change in this regard to cause a regression.

We might be able to move the workaround client-side instead of server-side. Then we'd have to do a "feature test" to see if the current browser exhibits the buggy behaviour or not. Though it'd be a bit painful given it'd be synchronous DOM interaction just for an old silly bug. But then again, a "feature flag" seems rather overkill for something like this.

There's also something swirling in my head that suggests maybe this only affects parsing of entire HTML documents (e.g. during navigation or via DOMParser), and not via Element#innerHTML. Not sure if that's true or possible in current browsers.

@catrope
catrope commented Apr 1, 2016

Let me clarify a few things.

First of all, the spec didn't always say what it says currently. Until about 2009, it required the behavior that browsers implement now. Then someone realized this caused problems, so they changed the spec, but the change was clearly wrong and inconsistent so no browser implemented it. In 2013, the spec was changed to what it is now. I don't have links handy for this change (because it was made right before I tried to report the issue), but IIRC it was in late January 2013.

Secondly, T44667 is a red herring, because it's only about an edge case involving zero newlines vs one newline, which is not directly related to this bug. The important code working around this bug is already client-side and already uses feature detection, so it won't regress if browsers are fixed (we already need this for Opera 12). See wikimedia/VisualEditor/blob/master/src/ve.utils.js#L1021

As for innerHTML, that was definitely affected last I checked, because you can run foo.innerHTML = foo.innerHTML; multiple times and that'll eat a newline each time you run it.

@zcorpan
Member
zcorpan commented Apr 1, 2016

Thank you!

In baba6fa the spec's serialization was changed from always emitting the newline to only doing so if the data starts with a newline. I suppose this was around the time it was implemented in Presto.

So if I understand correctly, Wikipedia works around this, but won't regress if browsers change to match the spec because it already bug-checks to deal with Opera 12? (I tested @Krinkle's links in Opera 12 and they appeared to work the same between Opera 12 and Chromium-based Opera for <pre>\n\nx</pre>.)

@zcorpan
Member
zcorpan commented Apr 5, 2016

Given the information so far I suggest we leave the spec as is. Please reopen if new issues come up.

@zcorpan zcorpan closed this Apr 5, 2016
@annevk
Member
annevk commented Apr 5, 2016

@zcorpan is there a single implementer willing to change though?

@jdm
jdm commented Apr 27, 2016

https://bugzilla.mozilla.org/show_bug.cgi?id=838954 is on the list of things we're intending to work on soon.

@jdm jdm added a commit to w3c/web-platform-tests that referenced this issue Apr 28, 2016
@zcorpan @jdm zcorpan + jdm Test initial LF in pre/textarea/listing with innerHTML (#2734)
* Test initial LF in pre/textarea/listing with innerHTML

The HTML spec requires this for proper round-tripping, but only
Presto has implemented it so far.

Ref. whatwg/html#944

* fixup! Test initial LF in pre/textarea/listing with innerHTML

Address @jdm's comment.
7234d01
@ivanzr ivanzr added a commit to ivanzr/web-platform-tests that referenced this issue Jun 29, 2016
@zcorpan @ivanzr zcorpan + ivanzr Test initial LF in pre/textarea/listing with innerHTML (#2734)
* Test initial LF in pre/textarea/listing with innerHTML

The HTML spec requires this for proper round-tripping, but only
Presto has implemented it so far.

Ref. whatwg/html#944

* fixup! Test initial LF in pre/textarea/listing with innerHTML

Address @jdm's comment.
a2ad330
@smaug----
Collaborator

@zcorpan Do you know whether blink would be willing to make the change?
There is now a patch for Gecko to enable the spec behavior in Nightly builds.

@smaug----
Collaborator

http://dev.ckeditor.com/ticket/14814#ticket
So, this might not be web compatible enough.

@zcorpan zcorpan reopened this Sep 19, 2016
@zcorpan
Member
zcorpan commented Sep 20, 2016

Thanks. Without coordinating shipping this change in multiple browsers at the same time, I suppose this is unlikely to be successful. Do people still want to try to push this through or should we revert the spec and advice Web developers to insert the LF themselves if they want innerHTML to roundtrip?

cc @tkent-google @cdumez @travisleithead

@cdumez
Contributor
cdumez commented Sep 20, 2016

Seems like a risky change, especially given @smaug---- 's feedback.

@catrope
catrope commented Sep 20, 2016

It sounds like ckeditor was not expecting spec-compliant behavior from browsers, because almost all of them behave in the same non-compliant way. This probably means ckeditor breaks in Opera 12, but few people use that any more.

Wikipedia's visual editor works in both cases, because it detects whether the browser is compliant or eats newlines, and adjusts its behavior accordingly. We did this for future-proofing but also for compatibility with Opera 12.

@smaug----
Collaborator

FWIW, we're backing out the change in Nightly, and feels like we can't do the change ever unless every other browser do it also around the same time.

@zcorpan zcorpan self-assigned this Sep 23, 2016
@zcorpan zcorpan added a commit that referenced this issue Sep 23, 2016
@zcorpan zcorpan Don't serialize an extra LF in <pre>, <textarea>, <listing>
This was implemented in Presto in ~2012, and recently implemented
in Gecko, but it broke CKEditor (http://dev.ckeditor.com/ticket/14814#ticket)
so it is being backed out again in Gecko.

Fixes #944.
7a568a4
@zcorpan
Member
zcorpan commented Sep 23, 2016

PR to change the spec: #1815
PR to fix test: w3c/web-platform-tests#3814
I also resolved/commented the issues for Chromium, WebKit and Edge.

@domenic domenic closed this in #1815 Sep 24, 2016
@domenic domenic added a commit that referenced this issue Sep 24, 2016
@zcorpan @domenic zcorpan + domenic Don't serialize an extra LF in <pre>, <textarea>, <listing>
This was implemented in Presto in ~2012, and recently implemented
in Gecko, but it broke CKEditor (http://dev.ckeditor.com/ticket/14814#ticket)
so it is being backed out again in Gecko.

Fixes #944.
2aa0000
@cscott
Contributor
cscott commented Oct 18, 2016

It's quite sad that HTML doesn't round trip now. :(

Speaking from the Wikimedia side, we'd be much happier if we could always parse(serialize(doc)) and get back the same document. Previously only Visual Editor was affected by this, since on server-side the domino serialization implementation was spec-compliant. Now we won't be able to rely on "standard" serialization behavior on either server or client side. :(

@cscott cscott added a commit to fgnass/domino that referenced this issue Oct 18, 2016
@cscott cscott Remove newline in serialization of <pre>, <listing>, <textarea>.
This means that conforming HTML doesn't cleanly roundtrip when
serialized and reparsed, which is quite unfortunately.

Regardless: whatwg/html#944
c6e6daa
@zcorpan
Member
zcorpan commented Oct 18, 2016

It is indeed sad.

Note that what you do on the server side is up to you, you don't have to follow the spec. You could for example have non-compliant parser and serializer that roundtrips a lot better than the spec does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment