[css-text] Allow alias for language hyphenation #5270

Open
sujato opened this issue Jun 30, 2020 · 7 comments
Labels
css-text-4, i18n-tracker

Comments

@sujato

sujato commented Jun 30, 2020

The CSS spec provides for hyphenation of text, leaving the choice of language up to the UA:

https://www.w3.org/TR/css-text-4/#hyphenation

Currently Firefox offers the best support, but even it supports only a fairly small subset of the world's languages.

https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens

The thing is, it is sometimes better to have imperfect hyphenation than none at all. No hyphenation can result in a broken UI and unreadable text, whereas imperfect hyphenation might work fine, or at worst be merely inelegant.

I work with texts in Pali and Sanskrit, which can have very long words formed by compounding. There is no browser support for hyphenation for these, nor is there likely to be. Surely these are not the only languages affected. Here is a typical example, rendered in Firefox:

Screenshot from 2020-06-30 09-26-16

It is possible to hack around this by activating hyphens and setting lang='la':

Screenshot from 2020-06-30 09-25-58

This is identical to the result that a proper Pali hyphenation would produce. Note that in traditional Indic orthography, there is no concept of a correct breakpoint; scribes merely wrote to the end of the line and continued on the next line. Thus the traditional practice would agree with the idea that sometimes any breakpoint is better than none.
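
For reference, the workaround amounts to something like the following. This is only a minimal sketch: the `sutta` class name is made up for illustration, and it relies on the UA shipping a Latin hyphenation resource, as Firefox did for the screenshot above.

```css
/* Workaround sketch: the markup mislabels Pali text as Latin, e.g.
   <p class="sutta" lang="la">...</p>
   (the class name "sutta" is hypothetical). */
.sutta:lang(la) {
  /* Ask the UA to hyphenate automatically using its rules for the
     declared language, which here is Latin rather than Pali. */
  hyphens: auto;
}
```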

However, it's obviously not a good idea to deliberately set a false language. Hence my proposal:

Allow the CSS to declare a language alias for hyphenation.

So the text language is unaffected, and the HTML does not change. But the user can declare via CSS something like:

hyphenate-alias-languages: pli, la;

Meaning: "for the purpose of hyphenation, Latin may be substituted for Pali."

Such substitution would apply only if explicit support for that language is missing. So if lang='pli' is set in the HTML and the UA supports Pali hyphenation, that is used; if not, the UA looks for Latin support instead.

@faceless2

That seems very reasonable to me, although I'd suggest a syntax more like:

:lang(pli) { hyphenate-language-fallback: la; }

I would have gone with hyphenate-language-override to align with font-language-override, but @sujato states the intention is to provide an equivalence only if support for the intended language is missing. CSS uses the term "fallback" for this type of concept in css-counter-style-3. It should probably accept a comma-separated list - I don't think the presence of a hyphenation dictionary for Latin can be guaranteed!
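
With a list, a rule might look something like this. To be clear, this is a sketch only: neither the property name nor the list syntax exists in any spec or browser, and `it` (Italian) is an arbitrary second choice picked just to show the ordering.

```css
/* Hypothetical property: proposed in this thread only. */
:lang(pli) {
  hyphens: auto;
  /* Assumed semantics: consulted in order, and only when the UA has no
     Pali hyphenation resource. Latin first, then Italian. */
  hyphenate-language-fallback: la, it;
}
```

Under that reading, a UA with Pali support ignores the list entirely, and a UA with none of the listed languages behaves as it does today.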

@sujato
Author

sujato commented Jul 1, 2020

Oh yes, that looks much better, thanks!

And yes, a comma-separated list would be ideal.

AmeliaBR added the i18n-tracker label Jul 1, 2020
@AmeliaBR
Contributor

AmeliaBR commented Jul 1, 2020

Labelling this to get feedback from internationalization experts, but I agree that this sounds like a good proposal. We definitely don't want authors to hack around with incorrect language tags just to get hyphenation!

@fantasai
Collaborator

fantasai commented Jul 1, 2020

Wouldn't it make more sense to build this information into CLDR and have the aliasing built into the browser?

@sujato
Author

sujato commented Jul 2, 2020

@fantasai I'm not really sure how all this works, but my concern would be that this should be left up to the site designer. It's hard to say that X language hyphenation will be an adequate fallback for Y language in all cases; whereas it is, I think, possible to say that it will work in this case. As with most typographic refinements, there are pluses and minuses, and the site designer would need to weigh them up.

@xfq
Member

xfq commented Jul 3, 2020

@sujato One way to solve this issue is to come up with a reasonable default set of aliases (if possible) and add the aliases to the user agent style sheet (or just add them to css-text as the default hyphenation behavior), and the author can override them in their own style sheet.
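
Roughly, and assuming the hypothetical `hyphenate-language-fallback` property suggested earlier in this thread, that could look like the sketch below. The default alias shown is illustrative, not a vetted recommendation, and `it` is an arbitrary language chosen for the example.

```css
/* User agent style sheet: a hypothetical default alias. */
:lang(pli) {
  hyphenate-language-fallback: la;
}

/* Author style sheet: author-origin declarations take precedence over
   user-agent ones in the cascade, so this replaces the default list. */
:lang(pli) {
  hyphenate-language-fallback: it, la;
}
```

Whether there should also be a keyword for turning the default aliasing off altogether would be a separate question.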

@sujato
Author

sujato commented Jul 3, 2020

Sure, that might work, so long as it is possible to override the defaults. Personally I prefer to let people opt in, but I will leave that to the experts!


6 participants