[css-text-4] Remove collapsible line breaks adjacent to word separators #3481

fantasai · 2019-01-03T17:53:16Z

We have rules in place that eliminate line breaks if they are adjacent to ZWSP, leaving behind the ZWSP when assembling the paragraph text from multiple lines of source text. However, we didn't consider explicit word separators such as the Ethiopic word space. Probably all “word separators” (other than space and nbsp) should have the same behavior as ZWSP here.
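For readers following along, here is a minimal sketch of the behavior described above. It assumes a simplified model of the line-break transformation: spaces and tabs around a break are dropped, runs of breaks collapse to one, and the East Asian rules are ignored entirely. The function name `transform_line_breaks` and the set `PROPOSED_SEPARATORS` are hypothetical names for illustration, and which characters belong in that set is exactly what this issue leaves open.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the character the current rule special-cases

# Hypothetical extension set for illustration only; the thread has not
# settled which characters belong here.
PROPOSED_SEPARATORS = frozenset({
    "\u1361",  # ETHIOPIC WORDSPACE
    "\u1680",  # OGHAM SPACE MARK
})


def transform_line_breaks(text, removable=frozenset({ZWSP})):
    """Turn each line break into a space, unless a neighbouring character
    is in `removable`, in which case the break is dropped and the
    neighbouring character is kept."""
    # Drop spaces/tabs around each break and collapse runs of breaks
    # (a simplification of the spec's white space processing).
    text = re.sub(r"[ \t]*\n[ \t\n]*", "\n", text)
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if prev in removable or nxt in removable:
            continue  # break removed, separator left behind
        out.append(" ")
    return "".join(out)


# Source text broken after an Ethiopic wordspace, with indentation on the next line.
source = "a\u1361\n  b"
current = transform_line_breaks(source)
proposed = transform_line_breaks(source, frozenset({ZWSP}) | PROPOSED_SEPARATORS)
```

Under the current rule, `current` keeps the wordspace and also gains a regular space; under the proposed extension, `proposed` keeps only the wordspace.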

asmusf · 2019-01-04T03:29:20Z

Tibetan intersyllabic tsheg?

fantasai · 2019-01-14T23:53:43Z

@asmusf I was thinking about that, yes. Also the Ogham space mark.

r12a · 2019-02-26T06:39:08Z

Seems to me that there are a number of characters that are or were (in archaic script use) used in place of spaces. But I'm surprised that this is an issue. @fantasai could you point to the part of the spec that is in question?

fantasai · 2019-02-27T04:10:30Z

@r12a https://www.w3.org/TR/css-text-3/#line-break-transform Where we handle ZWSP, it might make sense to handle other word separators that aren't spaces.

r12a · 2019-05-17T13:00:22Z

I suspect that what distinguishes ZWSP and TSEK in these circumstances is that [Thai etc. character][ZWSP][white space] is likely to be an error, whereas [Tibetan character][tsek][white space] is not (even when spaces in Tibetan would theoretically use NBSP), or if it is an error this can only be detected by understanding the text and/or the intention of the author. Same goes for the Ethiopic word space.

I suspect that, mostly, content authors just need to be careful about how they compose the source text, so that spans of text that shouldn't include spaces don't, even if they are using an editor or tool that wraps lines automatically. It seems to me that that's also the approach you'd need to take when composing text in archaic hangul styles, where they didn't use spaces between words.

(Btw, this probably has implications for some aspects of Semantic linefeeds if the language of the text used doesn't employ spaces as word separators.)

fantasai · 2019-09-11T02:46:33Z

I think the goal should be that other languages are not at a significant disadvantage in how they organize their source code, i.e. make semantic linefeeds possible for all languages where we can plausibly do so without breaking existing content.

fantasai · 2019-09-11T02:48:16Z

I'm not sure what that means for what characters we should consider... I'm pretty sure that the Ogham space mark and Ethiopic word space should collapse with subsequent spaces, it doesn't make sense to want both. But for Tibetan, I'm not sure, does it really use spaces after tsek marks? (I know they do after shad, but that's a different character.) @r12a

css-meeting-bot · 2019-09-17T05:04:54Z

The CSS Working Group just discussed Collapsible breaks adjacent to word separators.

The full IRC log of that discussion

<fantasai> Topic: Collapsible breaks adjacent to word separators
<heycam> github: https://github.com//issues/3481
<heycam> fantasai: we generally have this concept in CSS and HTML that you can use white space to format your source, and we collapse white space down to a single space
<heycam> ... including line breaks
<heycam> ... for Chinese and Japanese which don't use spaces, we have some rules to remove the space otherwise you will be forced to put all paras on one line
<heycam> ... there are some rules for doing that based on character classes
<heycam> ... what we didn't consider thoroughly is languages that use a word separator that's not a space
<heycam> ... we do special case ZWSP, for Thai and other languages
<heycam> ... but we don't have something similar for Ethiopic word space
<heycam> ... probably don't also want a regular space there
<heycam> ... proposal is when there's a word separator character adjacent to a line break, the line break just goes away
<heycam> ... I think the characters that are affected here are Ogham space mark and Ethiopic word space and the Tibetan tsek
<heycam> AmeliaBR: does this map to something in Unicode? or do we need to maintain this list?
<koji> https://drafts.csswg.org/css-text-3/#word-separator
<heycam> r12a: I think there is something, not sure if it's fit for this purpose
<heycam> r12a: archaic scripts have other examples
<heycam> y
<heycam> fantasai: [reads definition in the spec right now for word-spacing]
<heycam> florian: we need to maintain a list
<heycam> myles: let's ask Unicode to do it
<heycam> ... if there is such a facility for these character lists, hard to believe it's specific for the web platform
<heycam> ... and not needed in text editors for example
<heycam> ... I don't think the web specs should maintain this list
<heycam> florian: I agree with part of your statement, should try to work this out with Unicode
<heycam> ... this one specifically maybe, but some are specifically web platform relatively
<heycam> ... since this is relevant to turning HTML markup into text
<heycam> myles: there are many different markup languages...
<heycam> fantasai: there are 2 questions
<heycam> ... if we want to do this, and then whether we maintain the list or if Unicode should
<heycam> addison: i think we want to do some research
<heycam> ... space or no space is a classic problem
<heycam> ... I would be surprised if there weren't something, but don't know off the top of my head
<heycam> ... would be happy to engage
<heycam> myles: if this is a classical problem, it's been solved, and we should figure out how it's been solved in the past and re-use that solution
<heycam> fantasai: looking at some of the stuff in css-text, we have a concept of word separators
<heycam> ... and it includes a set of code points
<heycam> ... it excludes Ogham space mark
<heycam> ... since it would cause text to not join any more
<heycam> ... so general usage in Unicode, which is text processing and segmentation, is not going to account for that concern, since they don't deal with typesetting
<heycam> ... so there's gonna be some aspects of how we're using Unicode codepoints with specific requirements that haven't come up in Unicode's context so far
<heycam> ... unbreaking lines is something that's been hard to explain to them
<heycam> myles: maybe we shouldn't be unbreaking them?
<heycam> fantasai: too late for that!
<heycam> addison: fwiw I've had to write this code in the past, and it's not any fun
<heycam> ... it may have been individually solved but not written down
<fantasai> fantasai: HTML has been unbreaking lines for as long as it has existed, we want to make that ability available to more languages
<heycam> r12a: like with the other issues, we need to look in more detail
<heycam> ... the Tsek is a syllable separator, not the same as a word joiner
<heycam> ... you could end a line with a Tsek, then start with more Tibetan on the next line, with indentation, and no real reason to join those together necessarily
<heycam> fantasai: you wouldn't make the Tsek go away, just avoid the extra space going in there
<heycam> ACTION: i18n to look this issue of word separators next to newlines
<trackbot> Error finding 'i18n'. You can review and register nicknames at <https://www.w3.org/Style/CSS/Tracker/users>.
<addison> action: addison: ensure we respond to css 3481
<trackbot> Error finding 'addison'. You can review and register nicknames at <https://www.w3.org/Style/CSS/Tracker/users>.
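
On the question raised in the log of whether the character list maps to something in Unicode: one quick way to explore it is to look at the General_Category of the candidates mentioned in this thread. The sketch below only uses Python's standard `unicodedata` module; the `CANDIDATES` list and the `describe` helper are assumptions made for illustration, not a list taken from any spec.

```python
import unicodedata

# Characters mentioned in this thread, plus space and nbsp for comparison.
# This is an illustrative list, not a normative one.
CANDIDATES = [
    "\u0020",  # space
    "\u00a0",  # no-break space
    "\u200b",  # zero width space (ZWSP)
    "\u1361",  # Ethiopic word space
    "\u1680",  # Ogham space mark
    "\u0f0b",  # Tibetan tsheg (tsek)
]


def describe(ch):
    """Return (code point, Unicode name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(*describe(ch))
```

With current Unicode data, the Ogham space mark falls in the same category as the ordinary space (Zs), the Ethiopic word space and the tsheg are classed as punctuation (Po), and ZWSP is a format character (Cf). The general category alone therefore does not pick out the set under discussion, which is consistent with the view in the log that some explicit list is needed, whoever ends up maintaining it.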

css-meeting-bot · 2020-01-24T15:19:40Z

The CSS Working Group just discussed "Removing collapsible linebreaks", and agreed to the following:

RESOLVED: Punt "removing collapsible linebreaks adjacent to word separators" to level 4

The full IRC log of that discussion

<TabAtkins> Topic: Removing collapsible linebreaks
<astearns> github: https://github.com//issues/3481
<TabAtkins> fantasai: Proposal is to defer to level 4
<TabAtkins> astearns: Anyone concerned about punting?
<TabAtkins> astearns: reading thru the issue, lots of words I don't know...
<TabAtkins> astearns: We discussed previously and didn't get a conclusion
<TabAtkins> fantasai: Looks like it'll need more research and digging.
<TabAtkins> fantasai: I think we should get the spec done and defer this.
<TabAtkins> RESOLVED: Punt "removing collapsible linebreaks adjacent to word separators" to level 4

fantasai added css-text-3 Current Work and i18n-tracker labels Jan 3, 2019

himorin mentioned this issue Jan 7, 2019

[css-text-3] Remove collapsible line breaks adjacent to word separators w3c/i18n-activity#627

Open

fantasai added the Tracked in DoC label Jan 15, 2019

fantasai mentioned this issue Sep 10, 2019

[css-text-4] timing of the virtual word boundary insertion and of the U+200B conversion #4261

Open

css-meeting-bot mentioned this issue Sep 17, 2019

[css-text] Clarify whether soft breaks exist at boundaries of an inline element with word-break:break-all #3897

Closed

frivoal added Agenda+ F2F Closed Deferred labels Jan 22, 2020

mozilla-apprentice mentioned this issue Jan 24, 2020

[css-text-3] Remove collapsible line breaks adjacent to word separators mozilla/wg-decisions#199

Closed

astearns removed the Agenda+ F2F label Jan 24, 2020

frivoal added css-text-4 and removed css-text-3 Current Work labels Jun 5, 2020

frivoal changed the title ~~[css-text-3] Remove collapsible line breaks adjacent to word separators~~ [css-text-4] Remove collapsible line breaks adjacent to word separators Jun 5, 2020

fantasai added Needs Design / Proposal and removed Closed Deferred labels Dec 14, 2020
