[css-text] Question re white space processing rules for U+000D #855
(cc myself)
Probably we should make the spec match the implementations here.
The CSS Working Group just discussed
The full IRC log of that discussion
<dael> Topic: Question re white space processing rules for U+000D
<dael> Github topic: https://github.com//issues/855
<dael> Rossen: Is dauwhe on?
<dael> fantasai: Rossen can we have WD resolution?
<dael> Rossen: Oh, yes.
<fantasai> TabAtkins, would you mind adding the at-risk thing and pushing to /TR?
<dael> Rossen: Is anyone prepared to talk on this?
<dael> TabAtkins: I can do it based on thread.
<dael> TabAtkins: According to the text spec a lone U+000D is the same as a proper line break.
<dael> TabAtkins: Impl don't do that, they drop it from white space processing.
<dael> TabAtkins: There's a test case in the issue. It appears it's consistent. We should prob change spec to match impl.
<dael> Florian: To clarify, is that the one used to be used before OS10?
<dael> TabAtkins: Someone used it at some point.
<dael> fantasai: Yes.
<dael> Florian: So it's not likely something on the rise, it's the other way around.
<dael> ?? Matching impl is good idea.
<myles> s/??/myles/
<plinss> s/one used/one macs used/
<dael> dbaron: This is about carrage returns getting to the CSS level. Classic mac could have processed before.
<zcorpan> or textContent = "foo\rbar"
<zcorpan> html parser normalizes
<dael> TabAtkins: I don't think HTML parser does magic.
<dael> gsnedders: It replaces them with [missed]
<dael> TabAtkins: Nevermind. He's just using entities in text example.
<gsnedders> s/[missed]/line feed (000A)/
<dael> s/text/test
<myles> s/[missed]/line feeds/
<dael> Florian: Then go for it.
<dbaron> s/carrage/carriage/
<dael> Florian: Make the suggested change to match impl.
<dael> Rossen: Okay. Any objections to accepting the change?
<dael> RESOLVED: Accept the change in https://github.com//issues/855
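The point zcorpan and gsnedders raise is that a CR written literally in HTML source never reaches CSS, because the HTML input stream is preprocessed so that every CRLF pair and every remaining lone CR becomes LF before tokenization. Only a CR produced by a character reference (as in the test case) or inserted by script survives into the DOM. A minimal Python sketch of that ordering, assuming a drastically simplified parser; `normalize_newlines`, `decode_hex_refs` and `parse_text` are hypothetical helpers, not a real parser API:

```python
import re

def normalize_newlines(source: str) -> str:
    # Input-stream preprocessing: CRLF -> LF first, then any lone CR -> LF.
    return source.replace("\r\n", "\n").replace("\r", "\n")

# Hypothetical, deliberately tiny decoder for hexadecimal numeric character
# references such as "&#x000d;". Real parsers decode references in the
# tokenizer, i.e. after newline normalization has already happened.
_HEX_REF = re.compile(r"&#x([0-9a-fA-F]+);")

def decode_hex_refs(text: str) -> str:
    return _HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)

def parse_text(source: str) -> str:
    # Normalization runs on the raw source; reference decoding runs later.
    return decode_hex_refs(normalize_newlines(source))

# A literal CR in the markup is turned into LF and never reaches CSS...
assert parse_text("foo\rbar") == "foo\nbar"
# ...but an escaped CR survives, just like one set via
# element.textContent = "foo\rbar".
assert parse_text("foo&#x000d;bar") == "foo\rbar"
```

The sketch ignores everything else a tokenizer does; it only shows why the issue is limited to escaped or script-inserted CRs.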
So, as I was working on this, I realized that making a character invisible is underdefined: what effect does it have on surrounding characters? Does it break joining? AFAICT, Chrome drops form feeds and carriage returns from rendering entirely, whereas Gecko treats them similarly to ZWSP. What do we want to do here?
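The two behaviors differ in what the neighbouring characters "see". A minimal Python sketch contrasting them, assuming that adjacency in the string is a fair stand-in for whether a shaper could kern or join two glyphs (a crude proxy, not real shaping); `drop_cr`, `cr_as_zwsp` and `adjacent_pairs` are hypothetical names:

```python
ZWSP = "\u200b"

def drop_cr(text: str) -> str:
    # "Drop" model (Chrome, per the comment): the CR is removed entirely,
    # so the characters on either side become neighbours.
    return text.replace("\r", "")

def cr_as_zwsp(text: str) -> str:
    # "ZWSP" model (Gecko, per the comment): the CR stays as an invisible
    # character that still occupies a position between its neighbours.
    return text.replace("\r", ZWSP)

def adjacent_pairs(text: str) -> set:
    # Character pairs that sit side by side: a rough proxy for candidates
    # of kerning or cursive joining.
    return set(zip(text, text[1:]))

sample = "V\rA"
assert drop_cr(sample) == "VA"
assert ("V", "A") in adjacent_pairs(drop_cr(sample))         # V/A may kern
assert ("V", "A") not in adjacent_pairs(cr_as_zwsp(sample))  # ZWSP intervenes
assert len(cr_as_zwsp(sample)) == len(sample)                # offsets unchanged
```

The last assertion reflects the implementation trade-off discussed in the minutes: replacing the CR keeps string offsets stable, while dropping it shifts everything that follows.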
The Working Group just discussed
The full IRC log of that discussion
<fantasai> Topic: Lone CRs
<fantasai> https://drafts.csswg.org/css-text-3/issues-lc-2013#issue-138
<fantasai> github: https://github.com//issues/855
<dael> fantasai: Jonathan Kew reports a lone CR is just like a lone line feed and it's a segment break. Browsers discard the carriage returns. If you put them in your source they're transformed. However if you inject it via JS into the DOM text content then it disappears.
<dael> fantasai: If they used DHTML and not escape it would happen. We accepted the change and then I realized that making a character invisible is underdefined.
<dael> myles: Can we pretend it never existed?
<fantasai> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%D8%B4%D8%AA%0AVA%3Cbr%3E%0A%D8%B4%26%23x000c%3B%D8%AA%0AV%26%23x000c%3BA%3Cbr%3E%0A%D8%B4%26%23x000d%3B%D8%AA%0AV%26%23x000d%3BA%3Cbr%3E%0A%D8%B4%26%23x200b%3B%D8%AA%0AV%26%23x200b%3BA%0A%0A
<dael> fantasai: Or treat similar to 0 width
<dael> myles: For kerning you want it not to be similar to 0 width
<dael> fantasai: I think this is a weird edge case.
<dael> myles: You're saying because it's an edge case we should do the simple thing.
<dael> fantasai: For pages authored with CRs they're taken care of by HTML parsing algo. This is just people who stick a character inside.
<dael> fantasai: Gecko treats like 0 width space and Chrome drops. Someone help me decide between the 2 behaviors.
<dael> florian: Dropping seems simpler.
<dael> myles: Harder to impl because changes size of data structure
<dael> koji: Difference between drop and 0 width?
<dael> fantasai: There's a test case.
<dael> myles: In an impl where strings are never copied you'd have to remove the character and shift the future to pretend it doesn't happen.
<dael> astearns: We have 1 browser with 0 width and another inclined.
<dael> fantasai: fremy what does Edge do?
<dael> fremy: I can't load page.
<dael> eae: We wouldn't object.
<dael> myles: There are other characters treated as 0 width.
<dael> astearns: fremy do you care?
<dael> fremy: I don't know what Edge does.
<dael> fantasai: There's an IRC test case.
<dael> fremy: It's so broken we need to fix it.
<dael> astearns: Prop: Treat lone CRs as 0 width spaces
<dael> RESOLVED: Treat lone CRs as 0 width spaces
I was talking with @FremyCompany after this discussion, and we think this resolution is inconsistent with #1990: since we've decided that control characters should be visible, applying the same treatment to this one could make sense as well.
Agenda+ to follow up on the previous comment, and see if we want to resolve the inconsistency between the resolution in this issue and the one in #1990 |
The Working Group just discussed
The full IRC log of that discussion
<fantasai> topic: form feeds and carriage returns
<fantasai> github: https://github.com//issues/855
<frremy> florian: In Berlin, we made two conflicting resolutions without noticing
<frremy> florian: frremy noticed after, and raised that
<frremy> florian: one is that we should render control characters
<frremy> florian: two is that CR should not be rendered
<frremy> florian: that is not very consistent, we didn't need to specialize CR given the first resolution
<frremy> florian: no strong opinion, but we could change this
<fantasai> frremy: Edge already does render the CR block
<fantasai> frremy: I thought rendering of chars was not enabled, but Rossen enabled it, but ...
<fantasai> frremy: I'm proposing to move to Edge behavior, consistent with first resolution
<frremy> myles: is there a proposal?
<fantasai> florian: Proposal is discard resolution about treating CRs specially, treat them just like any other control character
<frremy> florian: rescind the resolution we made for lone CR
<frremy> florian: then the resolution we made for control chars will apply
<frremy> astearns: I like the consistent behavior
<frremy> heycam: I'm fine with that
<frremy> florian: is there anybody objecting?
<frremy> (no objection)
<frremy> RESOLVED: revert the previous lone CR resolution, and treat it as any control character
<frremy> dbaron: we don't think there is much content like this?
<frremy> myles: no, because it would be converted by the HTML parser
<frremy> myles: so the only way would be a script doing that
<frremy> myles: I haven't seen it
<frremy> florian: does the JavaScript parser also do that?
<frremy> frremy: you can't have line breaks in string
<frremy> xidorn: but with backticks you can
<frremy> florian: nobody is going to be writing ES6 code on an OS 8 Mac
<frremy> dbaron: OS 9 does something even different
<frremy> (general agreement it should be rare enough)
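Under the final resolution a lone CR gets no special case: it falls under the #1990 rule that control characters are rendered as visible glyphs, the same as a form feed. A minimal Python sketch of that rule, assuming tab and LF are excluded because they have their own white-space handling; the use of the Unicode Control Pictures block (U+2400 to U+2421) and the bracketed hex fallback are purely illustrative stand-ins for whatever glyph a UA synthesizes, and `render_controls` is a hypothetical name:

```python
import unicodedata

def visible_control_glyph(ch: str) -> str:
    # Illustrative stand-in for a UA-synthesized glyph; the spec does not
    # mandate Control Pictures. Any visible glyph (e.g. a hex box) would do.
    cp = ord(ch)
    if cp < 0x20:
        return chr(0x2400 + cp)  # e.g. U+000D -> U+240D SYMBOL FOR CARRIAGE RETURN
    if cp == 0x7F:
        return "\u2421"          # SYMBOL FOR DELETE
    return f"[{cp:04X}]"         # C1 controls: hex-box placeholder

def render_controls(text: str, keep=("\t", "\n")) -> str:
    # Every general-category Cc character becomes visible, except those
    # in `keep`, which are left for white-space processing.
    return "".join(
        visible_control_glyph(ch)
        if unicodedata.category(ch) == "Cc" and ch not in keep
        else ch
        for ch in text
    )

assert render_controls("V\rA") == "V\u240dA"    # lone CR: visible glyph
assert render_controls("V\x0cA") == "V\u240cA"  # form feed: same treatment
assert render_controls("a\tb\nc") == "a\tb\nc"  # tab and LF untouched
```

Because the CR is replaced rather than removed, this model also keeps string offsets stable, like the zero-width-space approach it supersedes.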
…l characters, a=testonly Automatic update from web-platform-tests [css-test] Handle U+00C0 as other control characters As per w3c/csswg-drafts#855 -- wpt-commits: eea2d659e470be96d0f8fc4d5b535c7faa8fee82 wpt-pr: 16443
…visible, a=testonly Automatic update from web-platform-tests[css-text] Control characters should be visible (#13389) Tests for w3c/csswg-drafts#1990 and w3c/csswg-drafts#855 -- wpt-commits: 434ca4744845966d5f3f87355f41ccc6f777376f wpt-pr: 13389 UltraBlame original commit: 3dcedb9143bb1eb83104848a61127b1d97a39bed
…l characters, a=testonly Automatic update from web-platform-tests [css-test] Handle U+00C0 as other control characters As per w3c/csswg-drafts#855 -- wpt-commits: eea2d659e470be96d0f8fc4d5b535c7faa8fee82 wpt-pr: 16443 UltraBlame original commit: bb7f44b44ba3a6d39f87e777a88ee56b7f75a892
That prevents preceding whitespace from getting collapsed. When there's a single lone CR (so `a\rb`) our behavior here diverges from Chrome's but matches Safari's. We treat it as ZWSP. That matches the initial resolution of [1], but then there have been various doings and undoings of that resolution, so it's not totally clear to me what the correct behavior per spec should be. I think "treat it as other control character"? But I haven't dug into what that implies, so for now I've just kept behavior there as-is. [1]: w3c/csswg-drafts#855 Differential Revision: https://phabricator.services.mozilla.com/D86188 bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1657437 gecko-commit: b579c12907dc3bb210177454a696ccc181c1ded0 gecko-integration-branch: autoland gecko-reviewers: jfkthame
AFAICT from https://drafts.csswg.org/css-text-3/#white-space-processing, a lone CR (U+000D) character should be treated just like a lone LF (U+000A) or a CRLF pair: it is a segment break, which will be transformed to a preserved line feed, removed, or transformed to a space (U+0020), depending on the value of `white-space` and possibly the context of the segment break. However, none of the browsers I have tested so far (Firefox, Chrome, Safari, Edge) appear to behave this way; rather, they all discard the lone CR.
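A minimal Python sketch of the spec reading described above, where CRLF, lone LF and lone CR are all segment breaks. It is simplified under stated assumptions: where segment breaks are not preserved each one simply becomes a space, and both the context-dependent removal (e.g. between certain East Asian characters) and ordinary space collapsing are omitted. `process_segment_breaks` is a hypothetical name:

```python
import re

# CRLF is tried first so that a CR LF pair counts as one segment break.
SEGMENT_BREAK = re.compile(r"\r\n|\r|\n")

# white-space values that preserve segment breaks as forced line feeds.
PRESERVING = {"pre", "pre-wrap", "pre-line", "break-spaces"}

def process_segment_breaks(text: str, white_space: str) -> str:
    # Preserved breaks become LF; otherwise (normal, nowrap) each becomes
    # a space in this simplified model.
    replacement = "\n" if white_space in PRESERVING else " "
    return SEGMENT_BREAK.sub(replacement, text)

# Under this reading a lone CR behaves exactly like LF or CRLF...
assert process_segment_breaks("a\rb", "pre") == "a\nb"
assert process_segment_breaks("a\r\nb", "pre") == "a\nb"
assert process_segment_breaks("a\rb", "normal") == "a b"
# ...whereas the browsers tested here discard it, so "a\rb" shows as "ab".
```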
Testcase: https://people-mozilla.org/~jkew/tests/cr.html
Am I misunderstanding something here, should the spec be changed to better match actual behavior, or do we expect all the browsers to change to match the spec?