Languages / writing systems with 2 line breaking conventions in common use? #11
Comments
Additionally, if there are languages with two line breaking behaviors in common use, where the default (as in, the behavior of |
Hmm. Not sure. http://w3c.github.io/elreq/#ethiopic_line_breaking and http://w3c.github.io/elreq/#ethiopic_hyphenation indicate that languages using the Ethiopic script break character by character, regardless of whether a space or the word separator is used between words. However, major browsers actually break on word boundaries (space or word separator), and I'm not sure whether that might be establishing a new expectation. @dyacob any thoughts on that?
As far as I can tell, browsers do that because Unicode tells them to: https://www.unicode.org/Public/UCD/latest/ucd/LineBreak.txt classifies Ethiopic syllables as AL, which by UAX14 prohibits breaks between pairs of such letters. But given the explanation in elreq, that actually makes sense: when Ethiopic was primarily written with word separators, using a break-all style of line breaking was fine, but with the advent of spaces, line breaking anywhere becomes somewhat ambiguous. So, what elreq currently describes seems to be the historic reality that breaking between all letters was the common practice. What it doesn't say is whether there's a continued desire for this behavior.
@dyacob is it reasonable to assert that, although it is mostly used for historic text, some modern content authors of text using Ethiopic orthographies still sometimes want the line to break before the last character that fits, rather than wrapping whole words? This is so that Florian can decide whether to name his line-break property value with a generic or a Korean-specific name. Do you know of other orthographies that behave like Korean? Personally, I think a generic name would be best because even if modern content authors generally don't expect the text to break like Korean, people writing expository texts about archaic scripts will probably also need this.(?)
@r12a I think that is very reasonable to say, particularly for content authors targeting web media. In print media, the desire is greater to have the inner-word breaking. I would imagine that other scripts that historically used a printed wordspace would behave like Ethiopic with respect to breaking. I don't know of other scripts that behave like Korean ("unbreakable" if I'm understanding it correctly).
Are there writing-systems other than Korean/Hangul that meet the following criteria:
word-break: normal
Context: the CSS-WG is planning to introduce a new value to the `word-break` property, that behaves like `normal` except for hangul, where it would have behavior (ii) (the same as `keep-all`). If this is only useful to Korean, then the name of the value can be specific to Korean (i.e. `keep-all-hangul`). If some other language would want to use it, then the value should be named something more generic, and the behavior adjusted to handle that other language as well.

The reason `keep-all` is insufficient to serve this need is that not all content can be language tagged (for instance, user generated content in an editable text field isn't), and `keep-all` is neither appropriate as a default for all languages, nor is it appropriate for all content that contains any amount of Korean: multilingual content exists, and `keep-all` would not be appropriate for Korean mixed with Japanese (for instance). So we need a second value that's like `normal`, but with behavior (ii) instead of (i) for hangul.
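To make the difference between the three behaviors concrete, here is a rough Python sketch of where line breaks would be allowed in mixed Korean and Japanese text. It is only an illustration under loud assumptions: the character ranges are a tiny subset of what UAX14 covers, the function names are hypothetical, `keep-all-hangul` is merely the candidate name floated in this issue, and the choice to suppress breaks only between two hangul syllables is a guess at the intended behavior, not something the CSS-WG has specified.

```python
def is_hangul(ch: str) -> bool:
    # Precomposed Hangul syllables only (assumption: jamo are ignored).
    return "\uac00" <= ch <= "\ud7a3"


def is_han_or_kana(ch: str) -> bool:
    # CJK Unified Ideographs plus hiragana and katakana (simplified ranges).
    return "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff"


def break_opportunities(text: str, mode: str) -> list[int]:
    """Return indices i such that a line break is allowed before text[i].

    mode is one of "normal", "keep-all", or the hypothetical
    "keep-all-hangul". Only spaces and CJK/hangul pairs are modelled.
    """
    result = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue  # never break before a space
        if prev == " ":
            result.append(i)  # always allowed after a space
            continue
        prev_cjk = is_hangul(prev) or is_han_or_kana(prev)
        cur_cjk = is_hangul(cur) or is_han_or_kana(cur)
        if not (prev_cjk and cur_cjk):
            continue  # other letter pairs: no break in this sketch
        if mode == "normal":
            result.append(i)  # break between any two CJK/hangul characters
        elif mode == "keep-all":
            pass  # no breaks inside any CJK or hangul run
        elif mode == "keep-all-hangul":
            # Assumed rule: keep hangul runs together, break elsewhere.
            if not (is_hangul(prev) and is_hangul(cur)):
                result.append(i)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result


sample = "한국어 日本語"
print(break_opportunities(sample, "normal"))           # [1, 2, 4, 5, 6]
print(break_opportunities(sample, "keep-all"))         # [4]
print(break_opportunities(sample, "keep-all-hangul"))  # [4, 5, 6]
```

With `keep-all`, the Japanese word loses its usual break opportunities, which is the problem described above for Korean mixed with Japanese; the hangul-only variant keeps them while still wrapping the Korean word as a unit.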