Languages / writing systems with 2 line breaking conventions in common use? #11

frivoal · 2019-10-16T18:16:44Z

Are there writing systems other than Korean/Hangul that meet the following criteria:

- has two different line-breaking conventions in relatively common use, between which document authors (or possibly readers) may want to switch:
  i. one of which allows breaking between any letters (letter = grapheme cluster)
  ii. the other of which disallows breaking between the letters of a word and only allows breaking at spaces
- has variant (i) as the "default" behavior, in the sense of being the one invoked by CSS's word-break: normal
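To make the two conventions concrete, here is a minimal Python sketch (illustration only, not UAX #14 and not any browser's algorithm). It assumes each code point is one grapheme cluster, which holds for precomposed Hangul syllables, and treats U+0020 SPACE as the only word separator. The function name break_opportunities and the convention labels "i"/"ii" are hypothetical.

```python
def break_opportunities(text: str, convention: str) -> list[int]:
    """Return every index i at which a line may break before text[i].

    convention "i":  break between any two letters
    convention "ii": break only at spaces (whole words stay together)

    Simplifications (assumptions, not UAX #14): each code point is one
    grapheme cluster (true for precomposed Hangul syllables) and U+0020
    SPACE is the only word separator. No break is placed before a space.
    """
    if convention not in ("i", "ii"):
        raise ValueError(f"unknown convention: {convention!r}")
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue  # never break before a space
        if prev == " " or convention == "i":
            breaks.append(i)
    return breaks


print(break_opportunities("한국어 문장", "i"))   # [1, 2, 4, 5]
print(break_opportunities("한국어 문장", "ii"))  # [4]
```

Under (i) every syllable boundary is a candidate; under (ii) only the position after the space is.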

Context: the CSS-WG is planning to introduce a new value to the word-break property that behaves like normal except for Hangul, where it would have behavior (ii) (the same as keep-all). If this is only useful for Korean, then the name of the value can be specific to Korean (e.g. keep-all-hangul). If some other language would want to use it, then the value should be named something more generic, and the behavior adjusted to handle that other language as well.

keep-all is insufficient for this need because not all content can be language-tagged (user-generated content in an editable text field, for instance, isn't). keep-all is not appropriate as a default for all languages, nor is it appropriate for all content that contains some amount of Korean: multilingual content exists, and keep-all would not be appropriate for Korean mixed with Japanese, for instance. So we need a second value that's like normal, but with behavior (ii) instead of (i) for Hangul.
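A rough Python sketch of how such a value might differ from normal on mixed Korean/Japanese text. Everything here is an assumption for illustration: the value name keep-all-hangul is only the candidate name mentioned above, the script detection covers just a few Unicode blocks, and all other UAX #14 rules are ignored.

```python
# Simplified script detection (assumption: only these ranges matter here).
HANGUL_SYLLABLES = (0xAC00, 0xD7A3)
OTHER_CJK = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (main block only)
]


def in_range(ch: str, lo: int, hi: int) -> bool:
    return lo <= ord(ch) <= hi


def breaks_around(ch: str, value: str) -> bool:
    """True if ch permits a break on either side of it under `value`."""
    if any(in_range(ch, lo, hi) for lo, hi in OTHER_CJK):
        return True  # Japanese/Chinese: breakable under both values
    if in_range(ch, *HANGUL_SYLLABLES):
        return value == "normal"  # Hangul: convention (i) only under normal
    return False  # anything else: treated like a letter of a spaced word


def hypothetical_breaks(text: str, value: str) -> list[int]:
    """Break indices under word-break: normal or a hypothetical keep-all-hangul."""
    if value not in ("normal", "keep-all-hangul"):
        raise ValueError(f"unknown value: {value!r}")
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue  # never break before a space
        if prev == " " or breaks_around(prev, value) or breaks_around(cur, value):
            breaks.append(i)
    return breaks


mixed = "한국어 日本語"
print(hypothetical_breaks(mixed, "normal"))           # [1, 2, 4, 5, 6]
print(hypothetical_breaks(mixed, "keep-all-hangul"))  # [4, 5, 6]
```

The Japanese part keeps its character-by-character breaks while the Korean word stays whole, which is the combination that neither normal nor keep-all gives on its own. In this sketch a boundary between Hangul and a kanji remains breakable under keep-all-hangul because the kanji side allows it; that is a choice made for the sketch, not something this thread settles.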


frivoal · 2019-10-16T18:27:14Z

Additionally, if there are languages with two line breaking behaviors in common use, where the default (as in, the behavior of word-break: normal) is the other way around and which would benefit from being able to opt into a normal-with-break-all-for-a-certain-script, that too would be useful to know.

r12a · 2019-10-24T17:12:48Z

Hmm. Not sure.

http://w3c.github.io/elreq/#ethiopic_line_breaking and http://w3c.github.io/elreq/#ethiopic_hyphenation indicate that languages using the Ethiopic script break character by character, regardless of whether space or the word-separator are used between words. However, major browsers actually break on word boundaries (space or word-sep), and i'm not sure whether that might be establishing a new expectation. @dyacob any thoughts on that?

frivoal · 2020-01-21T18:02:36Z

As far as I can tell, browsers do that because Unicode tells them to: https://www.unicode.org/Public/UCD/latest/ucd/LineBreak.txt classifies Ethiopic syllables as AL, which by UAX14 prohibits breaks between pairs of such letters.
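A small Python sketch of why the AL classification yields word-by-word wrapping. It applies only a handful of UAX #14 rules (LB7, LB18, LB21, LB28, LB31). Assumptions, as we read LineBreak.txt: Ethiopic syllables are AL and U+1361 ETHIOPIC WORDSPACE is BA; treating every other non-space character as AL is a deliberate oversimplification. The break_all flag only roughly mimics CSS word-break: break-all, i.e. the character-by-character breaking elreq describes.

```python
ETHIOPIC_WORDSPACE = "\u1361"  # assumed line-break class BA


def lb_class(ch: str) -> str:
    """Very reduced line-break classification (assumption, see lead-in)."""
    if ch == " ":
        return "SP"
    if ch == ETHIOPIC_WORDSPACE:
        return "BA"
    return "AL"  # Ethiopic syllables, and (simplistically) everything else


def uax14_subset_breaks(text: str, break_all: bool = False) -> list[int]:
    """Break indices using only a few UAX #14 rules, applied in rule order."""
    breaks = []
    for i in range(1, len(text)):
        before, after = lb_class(text[i - 1]), lb_class(text[i])
        if after == "SP":
            continue  # LB7: never break before a space
        if before == "SP":
            breaks.append(i)  # LB18: break after spaces
            continue
        if after == "BA":
            continue  # LB21: no break before BA (the wordspace)
        if before == "AL" and after == "AL" and not break_all:
            continue  # LB28: AL x AL
        breaks.append(i)  # LB31: break everywhere else, e.g. after BA


    return breaks


spaced = "ሰላም ዓለም"                # words separated by U+0020
wordspaced = "ሰላም\u1361ዓለም"       # words separated by ETHIOPIC WORDSPACE
print(uax14_subset_breaks(spaced))                  # [4]
print(uax14_subset_breaks(wordspaced))              # [4]
print(uax14_subset_breaks(spaced, break_all=True))  # [1, 2, 4, 5, 6]
```

With AL × AL in force, the only opportunities are after the space or the wordspace; dropping that rule restores breaks between syllables.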

But given the explanation in elreq, that actually makes sense: when Ethiopic was primarily written with word separators, using a break-all style of line breaking was fine, but with the advent of spaces, line breaking anywhere becomes somewhat ambiguous.

So, what elreq currently describes seems to be the historic reality that breaking between all letters was the common practice. What it doesn't say is whether there's a continued desire for this behavior.

r12a · 2024-01-24T14:54:00Z

@dyacob is it reasonable to assert that, although it is mostly used for historic text, some modern content authors of text using Ethiopic orthographies still sometimes want the line to break before the last character that fits, rather than wrapping whole words? This is so that Florian can decide whether to give his new word-break property value a generic or a Korean-specific name.

Do you know of other orthographies that behave like Korean?

Personally, i think a generic name would be best because even if modern content authors generally don't expect the text to break like Korean, people writing expository texts about archaic scripts will probably also need this (?).

dyacob · 2024-01-25T22:25:08Z

@r12a I think that is very reasonable to say, particularly for content authors targeting web media. In print media, the desire is greater to have the inner-word breaking. I would imagine that other scripts that historically used a printed wordspace would behave like Ethiopic with respect to breaking.

I don't know of other scripts that behave like Korean ("unbreakable", if I'm understanding it correctly).

r12a mentioned this issue Oct 24, 2019: Does Ethiopic text also get wrapped by word? w3c/elreq#116 (Open)

frivoal mentioned this issue Jan 23, 2020: [css-text] Need additional value of word-break for Korean w3c/csswg-drafts#4285 (Open)

aphillips mentioned this issue Jul 11, 2023: Keep track of line-breaking in Korean for i18n-discuss#11 w3c/i18n-actions#16 (Open)

r12a mentioned this issue Dec 19, 2023: contact Daniel and check the Ethiopic word boundary situation w3c/i18n-actions#65 (Closed)
