[css-text] For most languages, hyphens:auto should not hyphenate Capitalized words #3927
When auto-hyphenation is in use, I believe that in most languages - with German being the major exception - it would be preferable for browsers not to hyphenate capitalized words, which will often be proper nouns. In many cases authors and readers will prefer that names (of people, companies, etc) not be split, and in addition hyphenation rules designed for the "normal" words of a language may fail to hyphenate many names appropriately.
(https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 was recently filed against Gecko about this issue.)
The CSS Text 3 spec explicitly does not specify exactly where hyphenation opportunities occur when
For CSS Text 4, perhaps a property should be introduced to allow authors to explicitly control this behavior; e.g.
Well, what's "the right thing" for a browser to do regarding hyphenation of capitalized words? I don't think there's a clear answer to that, although I do think browsers should try for a sensible default behavior, and in https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 we just made the suggested adjustment for Firefox.
The problem is that in some cases authors/users may prefer that proper names not be hyphenated (as requested in the Mozilla bug). We can't reliably identify proper names in general text, but we can use capitalized words as the best available proxy for this (except in German). The drawback is that we'll also suppress hyphenation of non-names at the beginning of sentences. In some cases this cost may be too high, and it'd be preferable to allow hyphenation of capitalized words after all. I don't think a single hard-coded behavior will ever satisfy all use cases.
(A further refinement to the heuristic -- not yet implemented -- would be to make the behavior dependent on line width, so that as line width is reduced, constraints on what may be hyphenated are relaxed.)
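To make the heuristic discussed above concrete, here is a minimal Python sketch. It is not Gecko's implementation: the function name, the `line_width_em` parameter, and the 20em threshold are all hypothetical, chosen only to illustrate the "capitalized words, except in German" proxy and the suggested line-width relaxation.

```python
def may_auto_hyphenate(word, lang, line_width_em=None, narrow_threshold_em=20.0):
    """Return False when the capitalization heuristic would suppress hyphenation.

    All names and the default threshold are illustrative assumptions,
    not taken from any browser or from the CSS Text spec.
    """
    # German capitalizes all nouns, so a capital letter says nothing
    # about whether the word is a proper name.
    if lang.lower().split("-")[0] == "de":
        return True
    # Hypothetical refinement: on narrow lines, relax the constraint
    # and allow capitalized words to be hyphenated after all.
    if line_width_em is not None and line_width_em < narrow_threshold_em:
        return True
    # Otherwise treat an initial uppercase letter as a proxy for "proper name".
    return not word[:1].isupper()
```

For example, `may_auto_hyphenate("Mozilla", "en")` returns `False`, while the same word on a 12em-wide line, or any capitalized word tagged `de`, is allowed to break. Note that the sketch also suppresses hyphenation of ordinary words at the start of a sentence, which is exactly the drawback described above.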
Note that systems such as TeX (the
Obviously, authors can override the browser's heuristics by adding markup to individual names; the question here is what kind of default behavior, and how much author control, we can/should offer for (the overwhelming majority of) text that does not have that level of detailed markup.
I'm not sure that not hyphenating capitalized words in English is a rule and hyphenating them in German is an exception, and not the other way around. At least, AFAIK, in Russian there is no special case for capitalized words regarding hyphenation (only abbreviations are not hyphenated). Maybe some more statistics are needed?
I don't believe there are (in general) firm rules about this in either direction; it's a judgement call, and may depend on the specific content and the context in which it's being presented, as well as the individual preferences of the author/typographer.
As such, I think the best we can do in CSS is to offer some guidance as to good default behaviors for browsers -- and further information regarding typical usage in various languages may be helpful -- together with adequate controls so that authors can achieve the results they want.
Hello, I’m the person filing these bugs. I appreciate the discussion. For the record, here's the bug I sent to Blink, too: https://bugs.chromium.org/p/chromium/issues/detail?id=963039&can=2&q=hyphen%20proper%20nouns
I do recognize that it will be difficult to find the perfect solution that works for everyone, but I think there can be more sensible defaults. People don’t like it when their name gets broken at the end of a line. Companies don’t like it when their own materials add hyphens into the middle of their brand names.
Hyphenation should be a progressive enhancement. Over the last 10+ years, I haven’t been able to use it in a professional setting, because I’m always asked to turn it off the instant someone sees their brand name or their own name broken across a line. That’s not an enhancement. I understand that we can turn it off on a case-by-case basis with
I wonder, too, if we could add a new value to the
Note that there are already multiple properties proposed for controlling hyphenation in CSS Text 4, and other open issues suggesting more control. So adding a single new keyword likely wouldn't be enough.
I'm happy to add a note to CSS Text saying that UAs might want to use heuristics to suppress hyphenation in proper nouns, but I don't think we should define those heuristics in the spec.
("Capitalized words except in German" might want to be "Capitalized words except in German and except after periods", or in a CSS-to-PDF renderer used in publication workflows, even "Capitalized words except in German and except after periods unless we saw it capitalized not after a period." I don't think we'll come up with the ideal heuristics here.)
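As a rough illustration of the last variant mentioned above ("except after periods unless we saw it capitalized not after a period"), the following Python sketch makes two passes over a text. The function name and the very naive sentence detection (splitting on `.`, `!`, `?`) are assumptions made for this example only. It also ignores the German exception.

```python
import re

SENTENCE_END = {".", "!", "?"}

def hyphenation_suppressed(text):
    """Return (word, suppress) pairs for each word in the text.

    Hypothetical sketch: a capitalized word is suppressed only if the same
    capitalized form also occurs somewhere that is not sentence-initial.
    """
    # Words are runs of letters; sentence-ending punctuation is kept as tokens.
    tokens = re.findall(r"[^\W\d_]+|[.!?]", text)

    # Pass 1: collect words seen capitalized in mid-sentence position.
    seen_mid_sentence = set()
    at_sentence_start = True
    for tok in tokens:
        if tok in SENTENCE_END:
            at_sentence_start = True
            continue
        if tok[0].isupper() and not at_sentence_start:
            seen_mid_sentence.add(tok)
        at_sentence_start = False

    # Pass 2: suppress every occurrence of those words, including
    # occurrences that happen to start a sentence.
    return [
        (tok, tok[0].isupper() and tok in seen_mid_sentence)
        for tok in tokens
        if tok not in SENTENCE_END
    ]
```

With the input `"Mozilla ships Firefox. Firefox hyphenates words. Hyphenation helps."`, both occurrences of `Firefox` are suppressed, while `Hyphenation` and the sentence-initial `Mozilla` are not. The latter shows why even this more elaborate heuristic is far from ideal.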
The CSS Working Group just discussed
The full IRC log of that discussion
<Rossen_> Topic: hyphens:auto should not hyphenate Capitalized words
<Rossen_> github: https://github.com//issues/3927
<una> florian: so the issue being raised is that in some langs, when words are capitalized you should hyphenate and in some they should not
<una> ... we should bake this into the spec
<una> ... i'd like to close this as wontfix or rejected bc we already say this is dict based within the logic of the lang-based resource
<una> fantasai: I would go a little farther and say that we should only put a note and not change normative requirements and talk about proper nouns
<una> ... it can suggest i.e. in English you may want to suppress hyphenation words that are proper nouns and mixed case
<una> ... I would like to leave the heuristics up to the user agent and not bake anything into the spec
<Rossen_> ack dauwhe
<una> dave: in english should capital letters be hyphenated? maybe... I wouldn't want anything baked into the spec that says what should happen
<una> AmeliaBR: the rec is more to add a suggested note to add in your hyphenation dictionaries you should consider this
<una> ... at least one browser has agreed
<una> ... not sure this is a normative requirement
<una> Rossen_: so proposed resolution for this is to add a note and no normative change
<una> RESOLVED: Add a note to the spec and close with no normative change
<una> florian: myles, a while back you raised 3566 - should we reopen?
referenced this issue
Jun 5, 2019
That hyphenation somehow implies that Germans pronounce the word eye-too-ness instead of eye-toons or eye-tjoons. It seems "not ideal" for reasons other than capitalization; just as loan words generally aren't regular.