
[css-text] Question re white space processing rules for U+000D #855

Closed
jfkthame opened this issue Jan 5, 2017 · 10 comments

7 participants
@jfkthame commented Jan 5, 2017

AFAICT from https://drafts.csswg.org/css-text-3/#white-space-processing, a lone CR (U+000D) character should be treated just like a lone LF (U+000A) or a CRLF pair: it is a segment break, which will be transformed to a preserved line feed, removed, or transformed to a space (U+0020), depending on the value of white-space and possibly the context of the segment break.
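To make that reading concrete, here is a minimal Python sketch of what the spec text appeared to require. It models the spec text, not what the tested browsers did. CRLF, a lone LF, and a lone CR each count as one segment break, which is either preserved as a line feed or turned into a space depending on `white-space`. The function name and the `preserve` flag are hypothetical, and the context-dependent removal case is deliberately left out.

```python
import re

# Under the spec reading above, a segment break is CRLF, a lone LF, or a
# lone CR. CRLF is listed first so that the pair matches as one break.
SEGMENT_BREAK = re.compile(r"\r\n|\r|\n")


def transform_segment_breaks(text: str, preserve: bool) -> str:
    """Hypothetical helper, not taken from any browser or spec.

    preserve=True models a white-space value that keeps line breaks
    (e.g. `pre`); preserve=False models one that turns them into
    spaces (e.g. `normal`). The rule that removes some segment breaks
    based on the surrounding characters is not modelled here.
    """
    replacement = "\n" if preserve else " "
    return SEGMENT_BREAK.sub(replacement, text)


print(repr(transform_segment_breaks("foo\rbar", preserve=True)))   # 'foo\nbar'
print(repr(transform_segment_breaks("a\r\nb\rc", preserve=False)))  # 'a b c'
```

Under this reading a lone CR behaves exactly like LF. The browsers tested in this report discarded it instead.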

However, none of the browsers I have tested so far (Firefox, Chrome, Safari, Edge) appear to behave this way; rather, they all discard the lone CR.

Testcase: https://people-mozilla.org/~jkew/tests/cr.html

Am I misunderstanding something here, should the spec be changed to better match actual behavior, or do we expect all the browsers to change to match the spec?

@upsuper (Member) commented Jan 6, 2017

(cc myself)

@dauwhe dauwhe added the css-text-3 label Jan 6, 2017

@fantasai fantasai added the Agenda+ label Apr 19, 2017

@fantasai (Collaborator) commented Apr 19, 2017

Probably we should make the spec match the implementations here.

@css-meeting-bot (Member) commented May 10, 2017

The CSS Working Group just discussed Question re white space processing rules for U+000D, and agreed to the following resolutions:

  • RESOLVED: Accept the change in https://github.com/w3c/csswg-drafts/issues/855
The full IRC log of that discussion
<dael> Topic: Question re white space processing rules for U+000D
<dael> Github topic: https://github.com/w3c/csswg-drafts/issues/855
<dael> Rossen: Is dauwhe on?
<dael> fantasai: Rossen can we have WD resolution?
<dael> Rossen: Oh, yes.
<fantasai> TabAtkins, would you mind adding the at-risk thing and pushing to /TR?
<dael> Rossen: Is anyone prepared to talk on this?
<dael> TabAtkins: I can do it based on thread.
<dael> TabAtkins: According to the text spec a lone U+000D is the same as a proper line break.
<dael> TabAtkins: Impls don't do that; they drop it from white space processing.
<dael> TabAtkins: There's a test case in the issue. It appears it's consistent. We should prob change spec to match impl.
<dael> Florian: To clarify, is that the one used to be used before OS10?
<dael> TabAtkins: Someone used it at some point.
<dael> fantasai: Yes.
<dael> Florian: So it's not likely something on the rise, it's the other way around.
<dael> ?? Matching impl is a good idea.
<myles> s/??/myles/
<plinss> s/one used/one macs used/
<dael> dbaron: This is about carriage returns getting to the CSS level. Classic Mac could have processed before.
<zcorpan> or textContent = "foo\rbar"
<zcorpan> html parser normalizes
<dael> TabAtkins: I don't think HTML parser does magic.
<dael> gsnedders: It replaces them with [missed]
<dael> TabAtkins: Never mind. He's just using entities in text example.
<gsnedders> s/[missed]/line feed (000A)/
<dael> s/text/test
<myles> s/[missed]/line feeds/
<dael> Florian: Then go for it.
<dbaron> s/carrage/carriage/
<dael> Florian: Make the suggested change to match impl.
<dael> Rossen: Okay. Any objections to accepting the change?
<dael> RESOLVED: Accept the change in https://github.com/w3c/csswg-drafts/issues/855
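
The point raised in the log is that CRs in page source rarely reach CSS. The HTML Standard's input-stream preprocessing replaces each CRLF pair with LF and then each remaining CR with LF, before tokenization. Character references such as `&#x000d;` are expanded later, during tokenization, so they survive as U+000D, as does a string assigned from script (e.g. `textContent = "foo\rbar"`). Below is a minimal Python sketch of that normalization step; the function name is hypothetical.

```python
def normalize_newlines(source: str) -> str:
    """Sketch of the HTML input-stream newline normalization.

    CRLF pairs are replaced first so that each pair yields a single LF;
    any CR left after that is a lone CR and also becomes LF.
    """
    return source.replace("\r\n", "\n").replace("\r", "\n")


parsed = normalize_newlines("foo\r\nbar\rbaz")
print(repr(parsed))     # 'foo\nbar\nbaz'
print("\r" in parsed)   # False: no CR is left for CSS to see
```

The order of the two replacements matters: replacing lone CRs first would turn each CRLF into two line feeds.
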

@fantasai (Collaborator) commented Mar 5, 2018

So, as I was working on this I realized that making a character invisible is underdefined: what effect does it have on surrounding characters? Does it break joining? Afaict, Chrome drops form feeds and carriage returns from rendering entirely, whereas Gecko treats them similar to zwsp.

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%D8%B4%D8%AA%0AVA%3Cbr%3E%0A%D8%B4%26%23x000c%3B%D8%AA%0AV%26%23x000c%3BA%3Cbr%3E%0A%D8%B4%26%23x000d%3B%D8%AA%0AV%26%23x000d%3BA%3Cbr%3E%0A%D8%B4%26%23x200b%3B%D8%AA%0AV%26%23x200b%3BA%0A%0A

What do we want to do here?
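The two behaviors described above can be modelled at the string level. Below is a minimal Python sketch with hypothetical function names. Dropping removes the CR outright, which shifts the offsets of everything after it. Substituting U+200B ZERO WIDTH SPACE keeps the string length and leaves a character boundary between the neighbors. The sketch says nothing about shaping, kerning, or joining, which is exactly the underdefined part of the question.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def drop_cr(text: str) -> str:
    """Model of dropping the CR from rendering entirely."""
    return text.replace("\r", "")


def cr_as_zwsp(text: str) -> str:
    """Model of treating the CR like a zero-width space."""
    return text.replace("\r", ZWSP)


source = "V\rA"
for model in (drop_cr, cr_as_zwsp):
    rendered = model(source)
    # With drop_cr, "A" moves from offset 2 to offset 1, so any offsets
    # stored against the original text would need remapping.
    print(model.__name__, len(rendered), rendered.index("A"))
# prints "drop_cr 2 1", then "cr_as_zwsp 3 2"
```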

@fantasai fantasai removed the Needs Edits label Mar 5, 2018

@css-meeting-bot (Member) commented Apr 11, 2018

The Working Group just discussed Lone CRs, and agreed to the following resolutions:

  • RESOLVED: Treat lone CRs as 0 width spaces
The full IRC log of that discussion
<fantasai> Topic: Lone CRs
<fantasai> https://drafts.csswg.org/css-text-3/issues-lc-2013#issue-138
<fantasai> github: https://github.com/w3c/csswg-drafts/issues/855
<dael> fantasai: Jonathan Kew reports a lone CR is just like a lone line feed and it's a segment break. Browsers discard the carriage returns. If you put them in your source they're transformed. However if you inject it via JS into the DOM text content then it disappears.
<dael> fantasai: If they used DHTML and didn't escape it, it would happen. We accepted the change and then I realized that making a character invisible is underdefined.
<dael> myles: Can we pretend it never existed?
<fantasai> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%D8%B4%D8%AA%0AVA%3Cbr%3E%0A%D8%B4%26%23x000c%3B%D8%AA%0AV%26%23x000c%3BA%3Cbr%3E%0A%D8%B4%26%23x000d%3B%D8%AA%0AV%26%23x000d%3BA%3Cbr%3E%0A%D8%B4%26%23x200b%3B%D8%AA%0AV%26%23x200b%3BA%0A%0A
<dael> fantasai: Or treat similar to 0 width
<dael> myles: For kerning you want it not to be similar to 0 width
<dael> fantasai: I think this is a weird edge case.
<dael> myles: You're saying because it's an edge case we should do the simple thing.
<dael> fantasai: For pages authored with CRs they're taken care of by the HTML parsing algo. This is just people who stick a character inside.
<dael> fantasai: Gecko treats it like a 0 width space and Chrome drops it. Someone help me decide between the 2 behaviors.
<dael> florian: Dropping seems simpler.
<dael> myles: Harder to impl because it changes the size of the data structure
<dael> koji: Difference between drop and 0 width?
<dael> fantasai: There's a test case.
<dael> myles: In an impl where strings are never copied you'd have to remove the character and shift the future to pretend it doesn't happen.
<dael> astearns: We have 1 browser with 0 width and another inclined.
<dael> fantasai: fremy what does edge do?
<dael> fremy: I can't load page.
<dael> eae: We wouldn't object.
<dael> myles: There are other characters treated as 0 width.
<dael> astearns: fremy do you care?
<dael> fremy: I don't know what Edge does.
<dael> fantasai: There's an IRC test case.
<dael> fremy: It's so broken we need to fix it.
<dael> astearns: Prop: Treat lone CRs as 0 width spaces
<dael> RESOLVED: Treat lone CRs as 0 width spaces

@frivoal (Collaborator) commented Apr 13, 2018

I was talking with @FremyCompany after this discussion, and we were thinking that this resolution is inconsistent with #1990: since we've decided that control characters should be visible, applying the same treatment to this one as well could make sense.

@frivoal frivoal added the Agenda+ label Jun 6, 2018

@frivoal (Collaborator) commented Jun 6, 2018

Agenda+ to follow up on the previous comment, and see if we want to resolve the inconsistency between the resolution in this issue and the one in #1990

fantasai added a commit that referenced this issue Jun 7, 2018

@frivoal frivoal added the Agenda+ F2F label Jul 1, 2018

@css-meeting-bot (Member) commented Jul 3, 2018

The Working Group just discussed form feeds and carriage returns, and agreed to the following:

  • RESOLVED: revert the previous lone cr resolution, and treat it as any control character
The full IRC log of that discussion
<fantasai> topic: form feeds and carriage returns
<fantasai> github: https://github.com/w3c/csswg-drafts/issues/855
<frremy> florian: In Berlin, we made two conflicting resolutions without noticing
<frremy> florian: frremy noticed after, and raised that
<frremy> florian: one is that we should render control characters
<frremy> florian: two is that cr should not be rendered
<frremy> florian: that is not very consistent, we didn't need to specialize cr given the first resolution
<frremy> florian: no strong opinion, but we could change this
<fantasai> frremy: Edge already does render the CR block
<fantasai> frremy: I thought rendering of chars was not enabled, but Rossen enabled it, but ...
<fantasai> frremy: I'm proposing to move to Edge behavior, consistent with first resolution
<frremy> myles: is there a proposal?
<fantasai> florian: Proposal is discard resolution about treating CRs specially, treat them just like any other control character
<frremy> florian: rescind the resolution we made for lone CR
<frremy> florian: then the resolution we made for control chars will apply
<frremy> astearns: i like the consistent behavior
<frremy> heycam: I'm fine with that
<frremy> florian: is there anybody objecting?
<frremy> (no objection)
<frremy> RESOLVED: revert the previous lone cr resolution, and treat it as any control character
<frremy> dbaron: we don't think there is much content like this?
<frremy> myles: no, because it would be converted by the html parser
<frremy> myles: so the only way would be a script doing that
<frremy> myles: I haven't seen it
<frremy> florian: does the JavaScript parser also do that?
<frremy> frremy: you can't have line breaks in strings
<frremy> xidorn: but with backticks you can
<frremy> florian: nobody is going to be writing ES6 code on an OS 8 Mac
<frremy> dbaron: OS 9 does something even different
<frremy> (general agreement it should be rare enough)

@frivoal frivoal self-assigned this Oct 3, 2018

frivoal added a commit to frivoal/web-platform-tests that referenced this issue Oct 5, 2018

frivoal added a commit to web-platform-tests/wpt that referenced this issue Oct 8, 2018

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Oct 12, 2018

Bug 1497380 [wpt PR 13389] - [css-text] Control characters should be visible, a=testonly

Automatic update from web-platform-tests
[css-text] Control characters should be visible (#13389)

Tests for w3c/csswg-drafts#1990 and w3c/csswg-drafts#855
--

wpt-commits: 434ca4744845966d5f3f87355f41ccc6f777376f
wpt-pr: 13389

xeonchen pushed a commit to xeonchen/gecko-cinnabar that referenced this issue Oct 15, 2018


@fantasai (Collaborator) commented Jan 3, 2019

Reopening because the WG discussed handling of lone CRs in #855, but not form feeds (or at least, they weren't minuted as having been discussed). The question is, should they continue to be handled as zwsp as resolved in response to #855 or should they also be affected by the resolution saying CRs are rendered just like any other control character?

@css-meeting-bot (Member) commented Jan 23, 2019

The CSS Working Group just discussed Question re white space processing rules for U+000D, and agreed to the following:

  • RESOLVED: amend the previous resolution to include form feeds so they're processed same as lone CRs
The full IRC log of that discussion
<dael> Topic: Question re white space processing rules for U+000D
<dael> github: https://github.com/w3c/csswg-drafts/issues/855#issuecomment-451191125
<dael> fantasai: We discussed handling lone carriage returns. The last couple of resolutions talked about carriage returns, but not form feeds. In the current spec, form feeds are different from every other control character. They are not rendered.
<dael> fantasai: Wanted to know what we wanted to do. Treat as 0 width space or no?
<fantasai> https://www.w3.org/TR/css-text-3/#white-space-processing
<dael> fantasai: Spec section ^
<fantasai> Form feeds (U+000C) (that are not segment breaks) are rendered as a zero-width space (U+200B). Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), and form feed (U+000C), must be rendered as a visible glyph which the UA must synthesize if the glyphs found in the font are not visible and otherwise treated as any other character of the Other Symbols (So) general category and
<fantasai> Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its code point value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.
<dael> astearns: I don't have a strong opinion but makes sense to treat form feeds in a consistent way
<dael> astearns: Anyone have a reason to treat form feeds differently?
<dael> astearns: Obj to amend the previous resolution to include form feeds so they're processed same as lone CRs?
<dael> RESOLVED: amend the previous resolution to include form feeds so they're processed same as lone CRs
<dael> astearns: That will require test changes and that will get us feedback
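
The spec text quoted above allows substituting glyphs from the Control Pictures block. For C0 controls this is a fixed offset: U+0000 to U+001F map to U+2400 to U+241F, and DEL (U+007F) maps to U+2421 SYMBOL FOR DELETE. C1 controls (U+0080 to U+009F) have no control picture. Below is a minimal Python sketch following the resolutions in this thread. Lone CR and form feed are treated like any other control character, while tab and line feed are exempt. It assumes segment breaks have already been processed. The function names and the `<U+XXXX>` fallback are hypothetical stand-ins for a UA-generated code point glyph.

```python
import unicodedata

EXEMPT = {"\t", "\n"}  # tab and line feed keep their usual handling


def control_picture(ch: str) -> str:
    """Return a visible stand-in for one Cc character."""
    cp = ord(ch)
    if cp <= 0x1F:
        # e.g. U+000D -> U+240D SYMBOL FOR CARRIAGE RETURN
        return chr(0x2400 + cp)
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    # Hypothetical fallback: spell out the code point, roughly what a UA
    # might do when no suitable glyph exists (e.g. for C1 controls).
    return f"<U+{cp:04X}>"


def make_controls_visible(text: str) -> str:
    return "".join(
        control_picture(ch)
        if unicodedata.category(ch) == "Cc" and ch not in EXEMPT
        else ch
        for ch in text
    )


print(repr(make_controls_visible("a\rb\x0cc\td\x85")))
```

Here the lone CR and the form feed both become Control Pictures symbols, the tab is left alone, and U+0085 falls through to the code point fallback.
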

@fantasai fantasai closed this Jan 30, 2019

frivoal added a commit to frivoal/web-platform-tests that referenced this issue Apr 23, 2019

frivoal added a commit to web-platform-tests/wpt that referenced this issue Apr 23, 2019

@frivoal frivoal added the Tested label Apr 25, 2019

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jun 5, 2019

Bug 1550241 [wpt PR 16443] - [css-test] Handle U+00C0 as other control characters, a=testonly

Automatic update from web-platform-tests
[css-test] Handle U+00C0 as other control characters

As per w3c/csswg-drafts#855

--

wpt-commits: eea2d659e470be96d0f8fc4d5b535c7faa8fee82
wpt-pr: 16443

mykmelez pushed a commit to mykmelez/gecko that referenced this issue Jun 6, 2019
