[css-text] Question re white space processing rules for U+000D #855

Closed
jfkthame opened this issue Jan 5, 2017 · 10 comments · Fixed by #3106

@jfkthame
Contributor

jfkthame commented Jan 5, 2017

AFAICT from https://drafts.csswg.org/css-text-3/#white-space-processing, a lone CR (U+000D) character should be treated just like a lone LF (U+000A) or a CRLF pair: it is a segment break, which will be transformed to a preserved line feed, removed, or transformed to a space (U+0020), depending on the value of white-space and possibly the context of the segment break.

However, none of the browsers I have tested so far (Firefox, Chrome, Safari, Edge) appear to behave this way; rather, they all discard the lone CR.

Testcase: https://people-mozilla.org/~jkew/tests/cr.html

Am I misunderstanding something here, should the spec be changed to better match actual behavior, or do we expect all the browsers to change to match the spec?
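
For readers following along, here is a minimal sketch of the spec reading described above, in which CRLF, a lone LF and a lone CR each count as one segment break. The `Token` type and `tokenizeSegmentBreaks` helper are hypothetical names made up for illustration; they are not taken from the spec or from any browser.

```typescript
// Hypothetical illustration of the spec reading in this comment:
// CRLF, lone LF and lone CR are each one segment break.
type Token = { kind: "text" | "segment-break"; value: string };

function tokenizeSegmentBreaks(input: string): Token[] {
  const tokens: Token[] = [];
  // CRLF is listed first so that the pair is matched as a single break.
  const pattern = /\r\n|\r|\n/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(input)) !== null) {
    if (m.index > last) {
      tokens.push({ kind: "text", value: input.slice(last, m.index) });
    }
    tokens.push({ kind: "segment-break", value: m[0] });
    last = m.index + m[0].length;
  }
  if (last < input.length) {
    tokens.push({ kind: "text", value: input.slice(last) });
  }
  return tokens;
}
```

Under this reading, `"a\rb"` yields text, segment break, text. The tested browsers instead behave as if the lone CR were not there.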

@upsuper
Member

upsuper commented Jan 6, 2017

(cc myself)

@dauwhe dauwhe added the css-text-3 Current Work label Jan 6, 2017
@fantasai
Collaborator

Probably we should make the spec match the implementations here.

@css-meeting-bot
Member

The CSS Working Group just discussed Question re white space processing rules for U+000D, and agreed to the following resolutions:

  • RESOLVED: Accept the change in https://github.com/w3c/csswg-drafts/issues/855
The full IRC log of that discussion:
<dael> Topic: Question re white space processing rules for U+000D
<dael> Github topic: https://github.com/w3c/csswg-drafts/issues/855
<dael> Rossen: Is dauwhe on?
<dael> fantasai: Rossen can we have WD resolution?
<dael> Rossen: Oh, yes.
<fantasai> TabAtkins, would you mind adding the at-risk thing and pushing to /TR?
<dael> Rossen: Is anyone prepared to talk on this?
<dael> TabAtkins: I can do it based on thread.
<dael> TabAtkins: According to the text spec a lone U+000D is the same as a proper line break.
<dael> TabAtkins: Impls don't do that, they drop it from white space processing.
<dael> TabAtkins: There's a test case in the issue. It appears it's consistent. We should prob change spec to match impl.
<dael> Florian: To clarify, is that the one Macs used before OS X?
<dael> TabAtkins: Someone used it at some point.
<dael> fantasai: Yes.
<dael> Florian: So it's not likely something on the rise, it's the other way around.
<dael> myles: Matching impl is a good idea.
<dael> dbaron: This is about carriage returns getting to the CSS level. Classic Mac could have processed before.
<zcorpan> or textContent = "foo\rbar"
<zcorpan> html parser normalizes
<dael> TabAtkins: I don't think HTML parser does magic.
<dael> gsnedders: It replaces them with line feed (000A)
<dael> TabAtkins: Nevermind. He's just using entities in test example.
<dael> Florian: Then go for it.
<dael> Florian: Make the suggested change to match impl.
<dael> Rossen: Okay. Any objections to accepting the change?
<dael> RESOLVED: Accept the change in https://github.com/w3c/csswg-drafts/issues/855
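
To illustrate the point zcorpan and gsnedders raise in the log: a CR written literally in HTML source is turned into a line feed before CSS ever sees it, so a lone CR mostly reaches CSS when a script sets it directly (or, as in the test case, via a character reference). The sketch below approximates that newline normalization with a hypothetical `normalizeNewlines` helper; it is an illustration, not the parser's actual code.

```typescript
// Hypothetical approximation of the newline normalization applied to HTML source:
// CRLF pairs become LF, then any remaining lone CR becomes LF.
function normalizeNewlines(source: string): string {
  return source.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

// Markup text goes through normalization, so no CR survives.
const fromMarkup = normalizeNewlines("foo\rbar\r\nbaz");

// A string assigned from script (e.g. textContent = "foo\rbar") skips that step,
// so the CR is still present when it reaches CSS white space processing.
const fromScript = "foo\rbar";
```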

@fantasai
Collaborator

fantasai commented Mar 5, 2018

So, as I was working on this I realized that making a character invisible is underdefined: what effect does it have on surrounding characters? Does it break joining? Afaict, Chrome drops form feeds and carriage returns from rendering entirely, whereas Gecko treats them similar to zwsp.

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%D8%B4%D8%AA%0AVA%3Cbr%3E%0A%D8%B4%26%23x000c%3B%D8%AA%0AV%26%23x000c%3BA%3Cbr%3E%0A%D8%B4%26%23x000d%3B%D8%AA%0AV%26%23x000d%3BA%3Cbr%3E%0A%D8%B4%26%23x200b%3B%D8%AA%0AV%26%23x200b%3BA%0A%0A

What do we want to do here?

@css-meeting-bot
Member

The Working Group just discussed Lone CRs, and agreed to the following resolutions:

  • RESOLVED: Treat lone CRs as 0 width spaces
The full IRC log of that discussion:
<fantasai> Topic: Lone CRs
<fantasai> https://drafts.csswg.org/css-text-3/issues-lc-2013#issue-138
<fantasai> github: https://github.com/w3c/csswg-drafts/issues/855
<dael> fantasai: Jonathan Kew reports that per spec a lone CR is just like a lone line feed and it's a segment break. Browsers discard the carriage returns. If you put them in your source they're transformed. However, if you inject one via JS into the DOM text content then it disappears.
<dael> fantasai: If they used DHTML and didn't escape it would happen. We accepted the change and then I realized that making a character invisible is underdefined.
<dael> myles: Can we pretend it never existed?
<fantasai> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%D8%B4%D8%AA%0AVA%3Cbr%3E%0A%D8%B4%26%23x000c%3B%D8%AA%0AV%26%23x000c%3BA%3Cbr%3E%0A%D8%B4%26%23x000d%3B%D8%AA%0AV%26%23x000d%3BA%3Cbr%3E%0A%D8%B4%26%23x200b%3B%D8%AA%0AV%26%23x200b%3BA%0A%0A
<dael> fantasai: Or treat similar to 0 width
<dael> myles: For kerning you want it not to be similar to 0 width
<dael> fantasai: I think this is a weird edge case.
<dael> myles: You're saying because it's an edge case we should do the simple thing.
<dael> fantasai: For pages authored with CRs they're taken care of by the HTML parsing algo. This is just people who stick a character inside.
<dael> fantasai: Gecko treats it like a 0 width space and Chrome drops it. Someone help me decide between the 2 behaviors.
<dael> florian: Dropping seems simpler.
<dael> myles: Harder to impl because it changes the size of the data structure
<dael> koji: Difference between drop and 0 width?
<dael> fantasai: There's a test case.
<dael> myles: In an impl where strings are never copied you'd have to remove the character and shift the future to pretend it doesn't happen.
<dael> astearns: We have 1 browser with 0 width and another inclined.
<dael> fantasai: fremy what does edge do?
<dael> fremy: I can't load page.
<dael> eae: We wouldn't object.
<dael> myles: There are other characters treated as 0 width.
<dael> astearns: fremy do you care?
<dael> fremy: I don't know what Edge does.
<dael> fantasai: There's an IRC test case.
<dael> fremy: It's so broken we need to fix it.
<dael> astearns: Prop: Treat lone CRs as 0 width spaces
<dael> RESOLVED: Treat lone CRs as 0 width spaces
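
As a rough illustration of the two behaviors weighed in this log, the sketch below contrasts dropping a lone CR with mapping it to a zero width space (U+200B). Both helper names are hypothetical and neither is meant to describe how Chrome or Gecko is actually implemented; it only shows myles's point that dropping changes the length of the text while substitution keeps offsets stable.

```typescript
// A CR that is not followed by LF, i.e. not part of a CRLF pair.
const LONE_CR = /\r(?!\n)/g;

// Hypothetical "drop" strategy: the character vanishes, so the string gets shorter.
function dropLoneCR(text: string): string {
  return text.replace(LONE_CR, "");
}

// Hypothetical "zero width" strategy: same length, CR replaced by U+200B.
function loneCRToZWSP(text: string): string {
  return text.replace(LONE_CR, "\u200B");
}

const sample = "V\rA";
const dropped = dropLoneCR(sample);    // "VA": neighbours become adjacent
const zeroWidth = loneCRToZWSP(sample); // "V\u200BA": neighbours stay separated
```

With the drop strategy the V and A become adjacent, so kerning or joining between them could apply; with the substitution they stay separated by a character, which is the difference the test case in fantasai's comment probes.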

@frivoal
Collaborator

frivoal commented Apr 13, 2018

I was talking with @FremyCompany after this discussion, and we were thinking that this resolution is inconsistent with #1990. Since we've decided that control characters should be visible, applying the same treatment to this one as well could make sense.

@frivoal frivoal added the Agenda+ label Jun 6, 2018
@frivoal
Collaborator

frivoal commented Jun 6, 2018

Agenda+ to follow up on the previous comment, and see if we want to resolve the inconsistency between the resolution in this issue and the one in #1990

fantasai added a commit that referenced this issue Jun 7, 2018
@css-meeting-bot
Member

The Working Group just discussed form feeds and carriage returns, and agreed to the following:

  • RESOLVED: revert the previous lone cr resolution, and treat it as any control character
The full IRC log of that discussion:
<fantasai> topic: form feeds and carriage returns
<fantasai> github: https://github.com/w3c/csswg-drafts/issues/855
<frremy> florian: In Berlin, we made two conflicting resolutions without noticing
<frremy> florian: frremy noticed after, and raised that
<frremy> florian: one is that we should render control characters
<frremy> florian: two is that cr should not be rendered
<frremy> florian: that is not very consistent, we didn't need to specialize cr given the first resolution
<frremy> florian: no strong opinion, but we could change this
<fantasai> frremy: Edge already does render the CR block
<fantasai> frremy: I thought rendering of chars was not enabled, but Rossen enabled it, but ...
<fantasai> frremy: I'm proposing to move to Edge behavior, consistent with first resolution
<frremy> myles: is there a proposal?
<fantasai> florian: Proposal is discard resolution about treating CRs specially, treat them just like any other control character
<frremy> florian: rescind the resolution we made for lone CR
<frremy> florian: then the resolution we made for control chars will apply
<frremy> astearns: i like the consistent behavior
<frremy> heycam: I'm fine with that
<frremy> florian: is there anybody objecting?
<frremy> (no objection)
<frremy> RESOLVED: revert the previous lone cr resolution, and treat it as any control character
<frremy> dbaron: we don't think there is much content like this?
<frremy> myles: no, because it would be converted by the html parser
<frremy> myles: so the only way would be a script doing that
<frremy> myles: I haven't seen it
<frremy> florian: does the javascript parser also do that?
<frremy> frremy: you can't have line breaks in string
<frremy> xidorn: but with backticks you can
<frremy> florian: nobody is going to be writing es6 code on a OS 8 mac
<frremy> dbaron: OS 9 does something even different
<frremy> (general agreement it should be rare enough)
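
A minimal sketch of what "treat it as any control character" could look like as a predicate. This is only an illustration of the resolution as recorded here: the helper names are hypothetical, and the set of exempt characters (tab and line feed) is an assumption made for the example, not something stated in this thread.

```typescript
// Unicode general category Cc covers U+0000..U+001F and U+007F..U+009F.
function isControl(codePoint: number): boolean {
  return (codePoint >= 0x00 && codePoint <= 0x1f) ||
         (codePoint >= 0x7f && codePoint <= 0x9f);
}

// ASSUMPTION for illustration only: tab and line feed keep their own
// white space handling and are therefore not rendered visibly.
const ASSUMED_EXEMPT: number[] = [0x09, 0x0a];

// Hypothetical predicate: under the reverted resolution a lone CR (U+000D)
// gets no special case and falls in with the other control characters.
function rendersAsVisibleControl(codePoint: number): boolean {
  return isControl(codePoint) && ASSUMED_EXEMPT.indexOf(codePoint) === -1;
}
```

For example, `rendersAsVisibleControl(0x0d)` is true under these assumptions, while `rendersAsVisibleControl(0x0a)` is false.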

frivoal added a commit to frivoal/wpt that referenced this issue Apr 23, 2019
frivoal added a commit to web-platform-tests/wpt that referenced this issue Apr 23, 2019
@frivoal frivoal added the Tested Memory aid - issue has WPT tests label Apr 25, 2019
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jun 5, 2019
…l characters, a=testonly

Automatic update from web-platform-tests
[css-test] Handle U+00C0 as other control characters

As per w3c/csswg-drafts#855

--

wpt-commits: eea2d659e470be96d0f8fc4d5b535c7faa8fee82
wpt-pr: 16443
mykmelez pushed a commit to mykmelez/gecko that referenced this issue Jun 6, 2019
marcoscaceres pushed a commit to web-platform-tests/wpt that referenced this issue Jul 23, 2019
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 3, 2019
…visible, a=testonly

Automatic update from web-platform-tests[css-text] Control characters should be visible (#13389)

Tests for w3c/csswg-drafts#1990 and w3c/csswg-drafts#855
--

wpt-commits: 434ca4744845966d5f3f87355f41ccc6f777376f
wpt-pr: 13389

UltraBlame original commit: 3dcedb9143bb1eb83104848a61127b1d97a39bed
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 3, 2019
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 3, 2019
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 4, 2019

UltraBlame original commit: bb7f44b44ba3a6d39f87e777a88ee56b7f75a892
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 4, 2019
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 4, 2019
moz-wptsync-bot pushed a commit to web-platform-tests/wpt that referenced this issue Aug 8, 2020
That prevents preceding whitespace from getting collapsed.

When there's a single lone CR (so `a\rb`) our behavior here diverges
from Chrome's but matches Safari's. We treat it as ZWSP.

That matches the initial resolution of [1], but then there have been
various doing and undoings of that resolution, so it's not totally clear
to me what the correct behavior per spec should be. I think "treat it as
other control character"? But I haven't dug into what that implies, so
for now I've just kept behavior there as-is.

[1]: w3c/csswg-drafts#855

Differential Revision: https://phabricator.services.mozilla.com/D86188

bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1657437
gecko-commit: b579c12907dc3bb210177454a696ccc181c1ded0
gecko-integration-branch: autoland
gecko-reviewers: jfkthame
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 8, 2020
moz-wptsync-bot pushed a commit to web-platform-tests/wpt that referenced this issue Aug 9, 2020
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Aug 16, 2020
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Aug 16, 2020
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Aug 16, 2020
ambroff pushed a commit to ambroff/gecko that referenced this issue Nov 4, 2020