[css-text] Render U+2028 LINE SEPARATOR as a forced line break #6992

tabatkins · 2022-01-26T18:28:19Z

I'd like to propose that U+2028 be rendered as a forced line break.

The changes to the CSS Text Module Level 3 draft would be minimal; for example:

In Section 3, append the sentence "U+2028 LINE SEPARATOR is always a forced line break."
In Section 4.1, exclude U+2028 from the definition of "other space separators."
Optionally, add a "U+2028" column to the table in Section 3, with "Forced line break" in every row.

The rationale is straightforward:

Unicode is very clear about the purpose of U+2028.
There are many circumstances in which it is useful to represent visible line breaks in text strings without additional markup.
There is solid precedent for a character with whitespace behaviour that supersedes all the CSS white-space options, U+00A0 NO-BREAK SPACE.
The essential layout functionality needed to implement U+2028 as a forced line break is not new; browsers already have it if they support "white-space: pre-line".
Current browsers typically render U+2028 as a visible glyph, such as an empty black box. Many developers find this surprising; most likely, it would be less surprising for U+2028 LINE SEPARATOR to be rendered as a line separator, as befits its name.

For reference, the Unicode Standard 14.0 defines U+2028 LINE SEPARATOR as an "unambiguous separator character". By my reading, it could hardly be more clear as to what U+2028 is intended to represent, and what the most sensible rendering should be:

5.8 Newline Guidelines

[...]

Line Separator and Paragraph Separator

A paragraph separator—independent of how it is encoded—is used to indicate a separation between paragraphs. A line separator indicates where a line break alone should occur, typically within a paragraph. [...] For comparison, line separators basically correspond to HTML <BR>, and paragraph separators to older usage of HTML <P> (modern HTML delimits paragraphs by enclosing them in <P>...</P>).

[...]

Recommendations

The Unicode Standard defines two unambiguous separator characters: U+2029 paragraph separator (PS) and U+2028 line separator (LS). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous.
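As a small illustration from outside the browser (not part of the CSS proposal itself), Python's built-in `str.splitlines()` documents U+2028 and U+2029 among the line boundaries it recognizes, while a plain `split("\n")` knows nothing about them:

```python
# str.splitlines() treats U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR
# as line boundaries, alongside \n, \r, \r\n and a few other characters.
text = "first line\u2028second line\u2029next paragraph"

lines = text.splitlines()
print(lines)  # ['first line', 'second line', 'next paragraph']

# Splitting on "\n" alone ignores U+2028/U+2029, so the text stays in one piece.
print(len(text.split("\n")))  # 1
```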

I'd appreciate hearing your thoughts and suggested next steps on this.

Thanks very much!


kennyluck · 2022-01-26T19:37:07Z

zestyping · 2022-01-26T20:28:35Z

Thanks, @tabatkins!

I can't edit the issue description directly, but here it is with the markup fixed up to render correctly on GitHub: [Copied into OP]

xfq · 2022-01-27T09:11:04Z

I tested the rendering of this character in various browsers and editors, for your reference.

In Chromium it is rendered as a box with a cross (font: Hiragino Kaku Gothic ProN).

In Firefox, Safari, and iCab, it doesn't display at all.

In Visual Studio Code, the editor will emit a warning when it detects this character. See microsoft/vscode#96142

In Atom, it is not rendered. See atom/atom#12157

In Sublime Text 4, it is rendered as <0x2028>.

In TextEdit it is rendered as a forced line break.

In GNU Emacs (27.2) it is rendered as horizontal whitespace instead of a line break, even after enabling whitespace-mode.

In Vim (8.2) it is the same.

For the applications I tested, only TextEdit renders this character as a newline.

See also:

https://lists.w3.org/Archives/Public/www-international/2014JulSep/0084.html (some past discussions in this WG)
"Newlines in HTML may be represented either as U..." whatwg/html#243 (related discussions in the HTML spec)
https://www.w3.org/TR/unicode-xml/#Line (if you have questions about the status of this document, please see Why was this document deprecated? unicode-xml#28 )

zestyping · 2022-01-27T17:41:13Z

Thank you for doing this research, @xfq !

fantasai · 2022-01-27T22:09:12Z

I think this issue is filed on the basis of some misunderstandings.

Sections 3 and 4 are concerned with document white space characters, specifically U+0020, U+0009, and segment breaks. U+2028 LINE SEPARATOR is not included explicitly, and unless the host language defines it as a segment break, it is not affected by any of the rules therein.
U+2028 is not an "other space separator". U+2028 belongs to category Zl, not Zs.
CSS Text 3 already normatively references UAX14's forced break behavior for U+2028, see section 5.1 list item 2.
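The category distinction in the second point is easy to check with Python's standard `unicodedata` module. This is only a quick sketch; the characters chosen for comparison are arbitrary examples:

```python
import unicodedata

# General_Category values: U+2028 is Zl (Separator, line) and U+2029 is Zp
# (Separator, paragraph), whereas ordinary spaces are Zs (Separator, space).
for cp in (0x0020, 0x00A0, 0x3000, 0x2028, 0x2029):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")

# U+0020 SPACE: Zs
# U+00A0 NO-BREAK SPACE: Zs
# U+3000 IDEOGRAPHIC SPACE: Zs
# U+2028 LINE SEPARATOR: Zl
# U+2029 PARAGRAPH SEPARATOR: Zp
```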

CSS3 Text has, technically, required LS to be treated as a forced break for at least a decade. If browsers are not treating it as such, that should be considered a bug against them. Closing as invalid (not a spec issue).

@zestyping Copied your fixed markup into the OP! Thanks for caring about this issue, I hope your concern can motivate the browsers to fix this longstanding problem.

xfq · 2022-01-28T05:12:50Z

Browser bug reports: Gecko • Blink • Webkit

Since this code point isn't directly mentioned in css-text, I'm not quite sure if we need to add a relevant test in WPT.

fantasai · 2022-01-28T07:57:23Z

@xfq Tests for any behavior specced in css-text-3, even if indirectly, are welcome in WPT. :) Probably best to do it as a test for all BK/NL characters.
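Purely as an illustration of "a test for all BK/NL characters" (the tests that actually landed are in web-platform-tests/wpt#37696 and may be structured quite differently), a hypothetical generator of test/reference pairs could look like this. The helper name `make_case`, the `<div>` markup, and the use of `<br>` as the reference rendering are assumptions; the character list is the one given in the note later in this thread:

```python
# Hypothetical test-case generator; not the actual WPT tests.
BK_NL = {
    0x000B: "LINE TABULATION",      # BK
    0x000C: "FORM FEED",            # BK
    0x0085: "NEXT LINE",            # NL
    0x2028: "LINE SEPARATOR",       # BK
    0x2029: "PARAGRAPH SEPARATOR",  # BK
}

WHITE_SPACE_VALUES = ("normal", "nowrap", "pre", "pre-wrap", "pre-line", "break-spaces")


def make_case(cp: int, white_space: str) -> tuple[str, str]:
    """Return (test_html, reference_html) for one character and white-space value."""
    style = f'style="white-space: {white_space}"'
    # Insert the literal character rather than a numeric character reference:
    # the HTML parser remaps &#x85; to U+2026 HORIZONTAL ELLIPSIS for
    # windows-1252 compatibility, so a reference would never produce NEL.
    test = f"<div {style}>a{chr(cp)}b</div>"
    # Expected rendering: "a" and "b" on separate lines, as with <br>.
    reference = f"<div {style}>a<br>b</div>"
    return test, reference


cases = {(cp, ws): make_case(cp, ws) for cp in BK_NL for ws in WHITE_SPACE_VALUES}
print(len(cases))  # 30
```

Generating every white-space value matters here because the behavior under test is supposed to hold regardless of white-space, unlike segment breaks.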

zestyping · 2022-01-28T10:17:43Z

@fantasai Thank you for clarifying this! I do see now that Section 4.1 did not mean to refer to U+2028 when defining "other space separators".

CSS3 Text has, technically, required LS to be treated as a forced break for at least a decade. If browsers are not treating it as such, that should be considered a bug against them.

Can this be taken as an official statement on the WG's intended interpretation of LS? I would be delighted to know that treating U+2028 as a forced line break is already the behaviour that CSS Text 3 intends to specify!

I can imagine browser developers not finding this to be obvious from the spec. If this interpretation is not clear to them, would it be appropriate for me to point them at this comment thread as an authoritative ruling?

Here is why I suspect they might find it rather subtle. CSS Text 3 mentions many other relevant characters by code point (such as U+000A, U+0020, etc.) and name (CARRIAGE RETURN, IDEOGRAPHIC SPACE, etc.). Yet U+2028 is never mentioned anywhere in the entire spec. Neither LINE SEPARATOR nor its abbreviation LSEP is mentioned anywhere. Neither the "Line Separator" category nor its abbreviation "Zl" is mentioned anywhere. An ordinary person can wonder "I wonder why U+2028 doesn't render as a line break", search for the spec, arrive at CSS Text 3, search the entire document for every imaginable term related to U+2028, and find nothing — indeed, that was my experience, and what led me to file this issue. And, of course, we have the empirical evidence of a decade of browser development oblivious to this rule.

Would the CSS editors be willing to consider making this a little more explicit? I can think of one small change that would clear this all up.

As you pointed out, Section 5.1, bullet point 2 says "lines always break at each preserved forced break character".

Regardless of the 'white-space' value, lines always break at each preserved forced break character: thus for all values, line-breaking behavior defined for the BK and NL Unicode line breaking classes must be honored. [UAX14]

But there is no definition for the term "forced break character" in the spec. If you assume that a "forced break character" has something to do with a "forced line break", then the term "preserved forced break character" is nonsensical: "forced line break" is defined in terms of preserved characters, so there can be no such thing as a non-preserved forced break character. If you instead start by trying to understand the term "preserved", you find that it is defined only as part of the term "preserved white space", wherein the default meaning of "white space" is "document white space characters", which consists of U+0020, U+0009, and segment breaks; so "preserved" has no meaning when applied to other characters like U+2028.

Fixing this is easy; delete the confusing term and simplify the bullet point to:

Regardless of the white-space value, Unicode characters with the mandatory break property (BK) must be treated as forced line breaks. This includes U+000C, U+2028, and U+2029 [UAX14].

(I am omitting VT and NEL here because UAX#14 says "implementations are not required to support the VT character" and "implementations are not required to support the NEL character".)

zestyping · 2022-01-29T00:29:30Z

@xfq Thank you for filing https://bugs.webkit.org/show_bug.cgi?id=235753 !

frivoal · 2022-01-31T02:52:07Z

Can this be taken as an official statement on the WG's intended interpretation of LS? I would be delighted to know that treating U+2028 as a forced line break is already the behaviour that CSS Text 3 intends to specify!

I'd agree with that interpretation. css-text-3 states that:

[…] line-breaking behavior defined for the BK and NL Unicode line breaking classes must be honored. [UAX14]

UAX14 states that U+2028 has the non-tailorable BK class, and that “The text after [it] starts at the beginning of the line”.

There's a level of indirection, which may make it non obvious on a casual read, but I think it's unambiguous that this is the expected behavior.

CSS Text 3 mentions many other relevant characters by code point (such as U+000A, U+0020, etc.) and name (CARRIAGE RETURN, IDEOGRAPHIC SPACE, etc.). Yet U+2028 is never mentioned anywhere in the entire spec

css-text-3 mentions those characters where special css-specific processing going beyond (or against) Unicode is needed. For the rest, as stated in 1.5, “CSS is built on Unicode. UAs […] must adhere to all normative requirements of the Unicode Core Standard, except where explicitly overridden by CSS.” So css-text-3 cannot be implemented correctly without referencing Unicode (and in particular UAX14), which in the case of U+2028, gives us a definitive normative answer.

That said, if an editorial change can make this clearer, I'd be happy to take that on.

Fixing this is easy; delete the confusing term and simplify the bullet point to:

Regardless of the white-space value, Unicode characters with the mandatory break property (BK) must be treated as forced line breaks. This includes U+000C, U+2028, and U+2029. [UAX14]

I don't think this quite works. That covers the BK class, but leaves out preserved segment breaks (U+000A).

Also

I am omitting VT and NEL here because UAX#14 says "implementations are not required to support…

I am interpreting css-text-3 to be going beyond Unicode here, removing the optionality, and adding a requirement that this be supported for the sake of interoperability, so I'd rather keep it.

How about

Preserved segment breaks, and—regardless of the white-space value—any Unicode character with the BK or NL line breaking class, must be treated as forced line breaks. [UAX14]
Note: As of Unicode 14, the BK and NL classes include U+000B, U+000C, U+0085, U+2028, and U+2029.
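The proposed rule can be sketched as a small predicate plus a line splitter. This is a hypothetical model, not browser code: it hard-codes the BK/NL list from the note above, treats only U+000A as a segment break, and ignores everything else a real line breaker does (soft wrap opportunities, white space collapsing, bidi):

```python
# Hypothetical model of the proposed rule; the names here are illustrative.
BK_NL = frozenset("\u000b\u000c\u0085\u2028\u2029")  # from the note above

# white-space values under which segment breaks are preserved.
PRESERVES_SEGMENT_BREAKS = frozenset({"pre", "pre-wrap", "pre-line", "break-spaces"})


def is_forced_break(ch: str, white_space: str) -> bool:
    """True if ch forces a line break under the given white-space value."""
    if ch in BK_NL:
        return True  # BK/NL: forced break regardless of white-space
    if ch == "\n":
        # Segment break (modeled as U+000A only): forced only when preserved.
        return white_space in PRESERVES_SEGMENT_BREAKS
    return False


def forced_lines(text: str, white_space: str) -> list[str]:
    """Split text at forced breaks; the next line starts right after each break."""
    lines, start = [], 0
    for i, ch in enumerate(text):
        if is_forced_break(ch, white_space):
            lines.append(text[start:i])  # the break character itself is not drawn
            start = i + 1
    lines.append(text[start:])
    return lines


print(forced_lines("a\u2028b\nc", "normal"))    # ['a', 'b\nc']
print(forced_lines("a\u2028b\nc", "pre-line"))  # ['a', 'b', 'c']
```

With `white-space: normal`, U+2028 still forces a break but the newline does not (real white space processing would then collapse it, which this sketch leaves out); with `pre-line`, both break.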

zestyping · 2022-02-04T23:40:38Z

@frivoal That looks great! I agree with your reasoning. Thank you for the careful review and clarification.

frivoal · 2022-02-07T09:56:25Z

@fantasai does the proposal at the bottom of #6992 (comment) look reasonable to you, or do you think I missed something?

zestyping · 2022-03-30T00:59:30Z

@fantasai I see that the first sentence of @frivoal's suggestion made it into https://www.w3.org/TR/css-text-4/:

Regardless of the white-space value, lines always break at each preserved forced break character: thus for all values, line-breaking behavior defined for the BK and NL Unicode line breaking classes must be honored. [UAX14]

but not the second sentence:

Note: As of Unicode 14, the BK and NL classes include U+000B, U+000C, U+0085, U+2028, and U+2029.

Any particular reason why this should not be included? I realize these code points are implied by reference to UAX14, but it seems nice to be explicit, especially given that plenty of other code points are mentioned by number in this draft.

fantasai · 2022-04-20T02:18:46Z

@zestyping As noted in #6992 (comment), that sentence was always there: https://www.w3.org/TR/css-text-3/#line-break-details

fantasai · 2022-04-20T03:19:21Z

Updated the specs to use Florian's rephrasing. As for a note listing all the individual codepoints... I think it's better to just make sure there's testcases in WPT.

See w3c/csswg-drafts#6992

…rs creating line breaks, a=testonly Automatic update from web-platform-tests Add tests for BK and NL Unicode characters creating line breaks See w3c/csswg-drafts#6992 -- wpt-commits: a8ee96901b9eabf3876d38d3328bf1320b115ca6 wpt-pr: 37696

tabatkins added the css-text-4 label Jan 26, 2022

fantasai added Closed Rejected as Invalid css-text-3 Current Work and removed css-text-4 labels Jan 27, 2022

fantasai closed this as completed Jan 27, 2022

xfq mentioned this issue Jan 28, 2022

Unicode Line Separator (U+2028) not rendered correctly ajaxorg/ace#3178

Closed

fantasai added the Needs Testcase (WPT) label Jan 28, 2022

fantasai reopened this Jan 31, 2022

fantasai added a commit that referenced this issue Apr 20, 2022

[css-text] Use Florian's rephrasing. #6992

c8ac454

fantasai added the Closed Accepted as Editorial label Apr 20, 2022

frivoal removed the Closed Rejected as Invalid label Apr 20, 2022

fantasai closed this as completed Apr 20, 2022

frivoal added a commit to frivoal/wpt that referenced this issue Dec 29, 2022

Add tests for BK and NL Unicode characters creating line breaks

d0db16a

See w3c/csswg-drafts#6992

frivoal mentioned this issue Dec 29, 2022

Add tests for BK and NL Unicode characters creating line breaks web-platform-tests/wpt#37696

Merged

frivoal removed the Needs Testcase (WPT) label Dec 29, 2022

frivoal added the Tested Memory aid - issue has WPT tests label Dec 29, 2022

frivoal added a commit to frivoal/wpt that referenced this issue Dec 29, 2022

Add tests for BK and NL Unicode characters creating line breaks

672d27f

See w3c/csswg-drafts#6992

frivoal added a commit to web-platform-tests/wpt that referenced this issue Dec 29, 2022

Add tests for BK and NL Unicode characters creating line breaks

a8ee969

See w3c/csswg-drafts#6992

Mouvedia mentioned this issue Jul 15, 2024

feat(biome_css_analyze): implement noIrregularWhitespaceCss biomejs/biome#3428

Merged
