[css-text] For most languages, hyphens:auto should not hyphenate Capitalized words #3927

Closed
jfkthame opened this issue May 13, 2019 · 17 comments

Comments

@jfkthame

commented May 13, 2019

When auto-hyphenation is in use, I believe that in most languages - with German being the major exception - it would be preferable for browsers not to hyphenate capitalized words, which will often be proper nouns. In many cases authors and readers will prefer that names (of people, companies, etc) not be split, and in addition hyphenation rules designed for the "normal" words of a language may fail to hyphenate many names appropriately.

(https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 was recently filed against Gecko about this issue.)

The CSS Text 3 spec explicitly does not specify exactly where hyphenation opportunities occur when hyphens:auto is used. However, I would suggest adding an informative note to the spec, suggesting that browsers may want to suppress auto-hyphenation of capitalized words except when the hyphenation language in use is German.

For CSS Text 4, perhaps a property should be introduced to allow authors to explicitly control this behavior; e.g. hyphenation-capitalized-words: auto | yes | no, where yes and no would have the obvious meaning, and auto would tell the browser to use whatever heuristics it may have, such as considering the current language.

@Crissov

Contributor

commented May 13, 2019

name, .name, ::proper-noun {
  hyphens: none;
}

if there is sufficient markup or if there was such a semantic pseudo element.

@litherum

Contributor

commented May 14, 2019

Is a new property really worth it? Is this something authors are asking to be able to control? Can the browser just do it right in the first place?

@jfkthame

Author

commented May 14, 2019

Well, what's "the right thing" for a browser to do regarding hyphenation of capitalized words? I don't think there's a clear answer to that, although I do think browsers should try for a sensible default behavior, and in https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 we just made the suggested adjustment for Firefox.

The problem is that in some cases authors/users may prefer that proper names not be hyphenated (as requested in the Mozilla bug). We can't reliably identify proper names in general text, but we can use capitalized words as the best available proxy for this (except in German). This has the drawback that we'll also suppress hyphenation of non-names at the beginning of sentences; in some cases that cost may be too great, and it'd be preferable to allow hyphenation of capitalized words after all. I don't think a single hard-coded behavior will ever satisfy all use cases.
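
As a rough illustration of the proxy described above, here is a minimal Python sketch. It is not Gecko's actual implementation; the function name `suppress_auto_hyphenation` and the use of the primary language subtag to detect German are assumptions made for the example.

```python
def suppress_auto_hyphenation(word: str, lang: str) -> bool:
    """Return True if auto-hyphenation should be skipped for this word.

    Hypothetical helper: treats an initial capital as a proxy for
    "proper noun", except when the content language is German.
    """
    # Assumption: only the primary subtag matters ("de-AT" counts as German).
    primary = lang.lower().split("-")[0]
    if primary == "de":
        # German capitalizes all nouns, so capitalization is not a useful signal.
        return False
    # word[:1] avoids an IndexError on the empty string.
    return word[:1].isupper()


print(suppress_auto_hyphenation("Washington", "en"))       # True
print(suppress_auto_hyphenation("washington", "en"))       # False
print(suppress_auto_hyphenation("Donaudampfschiff", "de")) # False
# The drawback mentioned above: an ordinary word that happens to start
# a sentence is also left unhyphenated.
print(suppress_auto_hyphenation("Hyphenation", "en"))      # True
```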

(A further refinement to the heuristic -- not yet implemented -- would be to make the behavior dependent on line width, so that as line width is reduced, constraints on what may be hyphenated are relaxed.)

Note that systems such as TeX (the \uchyph parameter) and InDesign (the "Hyphenate Capitalized Words" option in paragraph formatting) do expose this question to authors, recognizing that there is not a simple "correct" behavior that the application can universally use.

Obviously, authors can override the browser's heuristics by adding markup to individual names; the question here is what kind of default behavior, and how much author control, we can/should offer for (the overwhelming majority of) text that does not have that level of detailed markup.

@SelenIT

Collaborator

commented May 14, 2019

I'm not sure that not hyphenating capitalized words in English is the rule and hyphenating them in German is the exception, rather than the other way around. At least, AFAIK, in Russian there is no special case for capitalized words regarding hyphenation (only abbreviations are not hyphenated). Maybe some more statistics are needed?

@jfkthame

Author

commented May 14, 2019

I don't believe there are (in general) firm rules about this in either direction; it's a judgement call, and may depend on the specific content and the context in which it's being presented, as well as the individual preferences of the author/typographer.

As such, I think the best we can do in CSS is to offer some guidance as to good default behaviors for browsers -- and further information regarding typical usage in various languages may be helpful -- together with adequate controls so that authors can achieve the results they want.

@litherum

Contributor

commented May 14, 2019

WebKit just got a bug about this too (possibly filed by the same person) https://bugs.webkit.org/show_bug.cgi?id=197889

@AmeliaBR

Contributor

commented May 14, 2019

> Note that systems such as TeX … and InDesign … do expose this question to authors.

This is a very good argument for adding a new property. Does anyone have other examples?

@revoltpuppy

commented May 14, 2019

Hello, I’m the person filing these bugs. I appreciate the discussion. For the record, here's the bug I sent to Blink, too: https://bugs.chromium.org/p/chromium/issues/detail?id=963039&can=2&q=hyphen%20proper%20nouns

I do recognize that it will be difficult to find the perfect solution that works for everyone, but I think there can be more sensible defaults. People don’t like it when their name gets broken at the end of a line. Companies don’t like it when their own materials add hyphens into the middle of their brand names.

Hyphenation should be a progressive enhancement. Over the last 10+ years, I haven’t been able to use it in a professional setting, because I’m always asked to turn it off the instant someone sees their brand name or their own name broken across a line. That’s not an enhancement. I understand that we can turn it off on a case-by-case basis with .name or something similar, but that puts the burden on content owners to wrap every name in a span. That’s not an enhancement either.

I wonder, too, if we could add a new value to the hyphens property, all, instead of having a whole separate property. auto would be updated to hyphenate capitalized words based on language (e.g. in German, but not English) and all would hyphenate capitalized words regardless of language. Or keep auto as currently defined and add no-capitalized-words as the new value.

@AmeliaBR

Contributor

commented May 14, 2019

> I wonder, too, if we could add a new value to the hyphens property, all, instead of having a whole separate property.

Note that there are already multiple properties proposed for controlling hyphenation in CSS Text 4, and other open issues suggesting more control. So adding a single new keyword likely wouldn't be enough.

@fantasai

Collaborator

commented May 15, 2019

I'm happy to add a note to CSS Text saying that UAs might want to use heuristics to suppress hyphenation in proper nouns, but I don't think we should define those heuristics in the spec.

("Capitalized words except in German" might want to be "Capitalized words except in German and except after periods", or in a CSS-to-PDF renderer used in publication workflows, even "Capitalized words except in German and except after periods unless we saw it capitalized not after a period." I don't think we'll come up with the ideal heuristics here.)
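
To make the last of those variants concrete, here is a hedged Python sketch of "capitalized words, except in German, and except after periods, unless we saw it capitalized not after a period". It is purely illustrative: the function name `protected_words`, the regex tokenizer, and the choice to treat `!` and `?` like periods are all assumptions, and it needs the whole text up front, which is part of why this suits publication workflows better than browsers.

```python
import re


def protected_words(text: str, lang: str) -> set:
    """Return the capitalized words this heuristic would refuse to hyphenate.

    Hypothetical helper. A capitalized word is protected only if it occurs
    capitalized somewhere other than at the start of a sentence.
    """
    # German is exempt: capitalization there does not indicate a name.
    if lang.lower().split("-")[0] == "de":
        return set()

    # Assumption: words are runs of letters; ".", "!" and "?" end a sentence.
    tokens = re.findall(r"[^\W\d_]+|[.!?]", text)

    seen_mid_sentence = set()
    at_sentence_start = True
    for token in tokens:
        if token in ".!?":
            at_sentence_start = True
            continue
        if token[0].isupper() and not at_sentence_start:
            seen_mid_sentence.add(token)
        at_sentence_start = False
    return seen_mid_sentence


sample = "Reading is fun. We visited Reading. Hyphenation helps."
print(protected_words(sample, "en"))  # {'Reading'}
print(protected_words(sample, "de"))  # set()
```

"Hyphenation" appears only at the start of a sentence, so it stays eligible for hyphenation; "Reading" is protected everywhere because it was also seen capitalized mid-sentence.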

@revoltpuppy

commented May 15, 2019

The last one, “Capitalized words, except in German, and except after periods, unless we saw it capitalized not after a period,” is the best heuristic I’ve seen so far, and the fact that it’s used in publication workflows backs that up.

@Crissov

Contributor

commented May 16, 2019

“Capitalized” should probably mean “contains a capital letter”, not “begins with a capital letter”, to capture “iTunes” and the like.
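
The difference between the two readings can be sketched in a few lines of Python. Both function names are hypothetical and only illustrate the distinction; neither reflects what any browser does.

```python
def begins_with_capital(word: str) -> bool:
    """True if the first character is an uppercase letter."""
    return word[:1].isupper()


def contains_capital(word: str) -> bool:
    """True if any character in the word is an uppercase letter."""
    return any(ch.isupper() for ch in word)


for w in ("Mozilla", "iTunes", "hyphenation"):
    print(w, begins_with_capital(w), contains_capital(w))
# Mozilla True True
# iTunes False True
# hyphenation False False
```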

@jfkthame

Author

commented May 16, 2019

That's a good point, although in practice I wonder how many such names are actually long enough that hyphenation rules are likely to apply to them? Current browsers don't appear to find a hyphenation opportunity in "iTunes", for example, regardless of casing.

@jfkthame

Author

commented May 16, 2019

...when using English rules; however, I notice that with lang=de, we can hyphenate "iTu-nes". That's probably not ideal.

@fantasai

Collaborator

commented May 16, 2019

@revoltpuppy To be clear, that was a hypothetical example. :) Not very practical for browsers, but much more practical for publication workflows.

@xfq xfq added the i18n-tracking label Jun 1, 2019

@frivoal frivoal added Agenda+ Agenda+ F2F and removed Agenda+ labels Jun 3, 2019

@css-meeting-bot

Member

commented Jun 5, 2019

The CSS Working Group just discussed hyphens:auto should not hyphenate Capitalized words, and agreed to the following:

  • RESOLVED: Add A note to the spec and close with no normative change
The full IRC log of that discussion:
<Rossen_> Topic: hyphens:auto should not hyphenate Capitalized words
<Rossen_> github: https://github.com//issues/3927
<una> florian: so the issue being raised is that in some langs, when words are capitalized you should hyphenate and in some they should not
<una> ... we should bake this into the spec
<una> ... i'd like to close this as wontfix or rejected bc we already say this is dict based within the logic of the lang-based resource
<dauwhe> q+
<una> fantasai: I would go a little farther and say that we should only put a note and not change normative requirements and talk about proper nouns
<una> ... it can suggest i.e. in English you may want to suppress hyphenation of words that are proper nouns and mixed case
<una> ... I would like to leave the heuristics up to the user agent and not bake anything into the spec
<Rossen_> ack dauwhe
<una> dave: in english should capital letters be hyphenated? maybe... I wouldn't want anything baked into the spec that says what should happen
<astearns> s/dave/dauwhe/
<una> AmeliaBR: the rec is more to add a suggested note to add in your hyphenation dictionaries you should consider this
<una> ... at least one browser has agreed
<una> ... not sure this is a normative requirement
<una> Rossen_: so proposed resolution for this is to add a note and no normative change
<una> RESOLVED: Add A note to the spec and close with no normative change
<una> florian: myles, a while back you raised 3566 - should we reopen?

@asmusf

commented Jun 6, 2019

> ...when using English rules; however, I notice that with lang=de, we can hyphenate "iTu-nes". That's probably not ideal.

That hyphenation somehow implies that Germans pronounce the word eye-too-ness instead of eye-toons or eye-tjoons. It seems "not ideal" for reasons other than capitalization; just as loan words generally aren't regular.
