[css-fonts] Propose adding lang as a font-face descriptor #1744

Closed
kahsieh opened this issue Aug 19, 2017 · 20 comments
Labels
Closed Rejected as Wontfix by CSSWG Resolution; Commenter Timed Out (Assumed Satisfied); css-fonts-4 (Current Work); i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response.)

Comments

@kahsieh

kahsieh commented Aug 19, 2017

See https://drafts.csswg.org/css-fonts-3/#at-font-face-rule

Per the discussion in #1736, we note that generic fonts typically map to more than one system font, with lang affecting which font is chosen. Adding lang as a @font-face descriptor might be useful for similar reasons, for example with CJK:

  • Chinese, Japanese, and Korean have partially overlapping character sets, so unicode-range alone cannot specify fonts for each language. This can be an issue on pages containing text in more than one lang.
  • Even if a single font (or @font-face) contains glyphs for all the characters in some text, that font may not be correct for every lang in the text. (A single codepoint can map to different variant glyphs depending on lang, and fonts reflect this.)

This addition would allow fonts to resolve lang-appropriately when a font family specified with @font-face is applied to a page with multiple langs.

@litherum
Contributor

Exposing an implementation detail of existing font selection facilities is not a sufficient use case to justify a new web-facing feature.

@AmeliaBR
Contributor

Here's a specific use case:

The Noto collection of fonts includes sans & serif fonts for most modern languages. For CJK, that means multiple fonts for the same code points, optimized for each language. They offer both subsetted font files (with just the glyphs for one language) and full CJK fonts, with the specified language as the default and the others available through OpenType features. Of course, the full fonts are much larger files.

For a website that is primarily in one language, but wants to have compatible glyphs for occasional content in another language, it would be preferable to be able to use the subsetted font files, only downloading the additional language when it is required:

@font-face {
  font-family: Noto Sans;
  src: url(.../noto-sans-jp.woff2);
  lang: ja;
}
@font-face {
  font-family: Noto Sans;
  src: url(.../noto-sans-kr.woff2);
  lang: ko;
}
@font-face {
  font-family: Noto Sans;
  src: url(.../noto-sans.woff2);
  unicode-range: /* whatever would be the ascii/latin range */
}

The last rule shows how we can currently use unicode-range to do language fallbacks where the scripts use different Unicode code points. But we can't do it for language-based switching of glyphs for a single codepoint. That is only available through OpenType features in a single font file, or through the browser font selection algorithm for generic font families.

@kahsieh
Author

kahsieh commented Aug 20, 2017

@AmeliaBR Thank you for the specific use case—it does seem to me that the CJK case is limited by the standard, rather than by existing implementations. Due to Han unification, rendering a character completely correctly requires both the codepoint and the lang.

I'll add the example from my post in #1736 of lang causing a glyph change on the same codepoint given font-family: sans-serif:
[Image: the same codepoint rendered with font-family: sans-serif; left: zh, right: ja]

@tabatkins
Member

:shakes fist at Han unification again:

@litherum Yeah, while this is definitely part of the built-ins, it's relevant for any page that wants to do high-quality rendering of both Chinese and Japanese text with a single combined font-family.

@fantasai fantasai added the css-fonts-4 Current Work label Sep 5, 2017
@faceless2

I also think this is a good idea, particularly if we define the "lang" descriptor as a list of languages, and define the highest-priority @font-face match (assuming all other @font-face matching criteria are equal) as the one with the longest common prefix. For example:

@font-face {
  font-family: thefont;
  src: url(thefont-zh-simp.otf);
  lang: zh
}
@font-face {
  font-family: thefont;
  src: url(thefont-zh-trad.otf);
  lang: zh-HK zh-TW
}
  • Language of "zh-TW" will match both rules, but the second has the higher priority.
  • Language of "zh-SG" will match the first rule due to the common prefix "zh"
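A minimal Python sketch of this priority rule. It assumes (my reading, not stated in the comment) that each tag in the descriptor behaves like an RFC 4647 basic-filtering range: it matches an element language if it equals it, or is a prefix of it ending at a hyphen, compared case-insensitively. The more subtags the matching tag has, the higher the priority. All function names are hypothetical.

```python
from typing import Dict, List, Optional


def tag_matches(face_tag: str, element_lang: str) -> bool:
    """Basic-filtering style match: face_tag equals element_lang, or is a prefix
    of it that ends at a subtag boundary ("zh" matches "zh-TW", not "zho")."""
    face, elem = face_tag.lower(), element_lang.lower()
    return elem == face or elem.startswith(face + "-")


def face_score(face_langs: List[str], element_lang: str) -> int:
    """Subtag count of the most specific matching tag in the face's lang list; 0 if none match."""
    return max(
        (len(tag.split("-")) for tag in face_langs if tag_matches(tag, element_lang)),
        default=0,
    )


def best_face(faces: Dict[str, List[str]], element_lang: str) -> Optional[str]:
    """Return the face with the highest score. Ties keep the earlier face
    (dict insertion order stands in for CSS source order). None means no face
    declares a matching lang, so ordinary font matching would apply."""
    best: Optional[str] = None
    best_score = 0
    for name, langs in faces.items():
        score = face_score(langs, element_lang)
        if score > best_score:
            best, best_score = name, score
    return best


# The two @font-face rules from the example above.
faces = {
    "thefont-zh-simp.otf": ["zh"],
    "thefont-zh-trad.otf": ["zh-HK", "zh-TW"],
}

print(best_face(faces, "zh-TW"))  # thefont-zh-trad.otf
print(best_face(faces, "zh-SG"))  # thefont-zh-simp.otf
```

Reading "common prefix" as a prefix of the element's tag, rather than any shared leading subtags, is what keeps a face tagged zh-Hant from matching an element tagged zh-Hans.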

@svgeesus
Contributor

svgeesus commented Feb 2, 2018

Due to Han unification, rendering a character completely correctly requires both the codepoint and the lang

Han unification is a clear use case for language-specific font rendering, but not the only one. In Cyrillic, for example, there are differences between Russian, Serbian and Bulgarian rendering of the same Unicode code point. (A good explanation).

As to whether we need a new descriptor though, things are less clear. OpenType already has a language feature, and Fonts 3 Language-specific display already states:

If the content language of the element is known according to the rules of the document language, user agents are required to infer the OpenType language system from the content language and use that when selecting and positioning glyphs using an OpenType font.

If that is not enough, there is also the font-language-override property.

I see the motivation for the proposed descriptor, but also see it re-inventing the OpenType language system with a combination of a new descriptor and a link to a (subsetted, single-language) font.

This seems at odds with the direction the font industry seems to be headed, with multi-language fonts which can have quite careful and nuanced typography for different languages, depending of course on the effort and interest of the font designer.

The matching scheme mentioned by @faceless2 looks the same as BCP 47 language tagging. OpenType language tagging, on the other hand, uses a single level of 4-byte tags, so for example Traditional Chinese is "ZHT " (with a trailing space).
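To make the contrast concrete, here is a small Python sketch of how a user agent might map hierarchical BCP 47 tags onto flat OpenType language-system tags. The table is an illustrative subset, not the OpenType registry, and treating a bare zh as Simplified Chinese is an assumption. Trailing subtags are dropped one at a time until a known prefix is found (in the spirit of RFC 4647 lookup).

```python
from typing import Optional

# Illustrative subset of BCP 47 (lowercased) -> OpenType language-system tags.
# OpenType tags are always 4 bytes, so 3-letter tags carry a trailing space.
BCP47_TO_OT = {
    "zh-hant-hk": "ZHH ",
    "zh-hk": "ZHH ",
    "zh-hant": "ZHT ",
    "zh-tw": "ZHT ",
    "zh-hans": "ZHS ",
    "zh-cn": "ZHS ",
    "zh": "ZHS ",  # assumption: bare "zh" treated as Simplified
    "ja": "JAN ",
    "ko": "KOR ",
}


def opentype_lang_tag(bcp47: str) -> Optional[str]:
    """Drop trailing subtags until the remaining prefix is in the table."""
    subtags = bcp47.lower().split("-")
    while subtags:
        tag = BCP47_TO_OT.get("-".join(subtags))
        if tag is not None:
            return tag
        subtags.pop()
    return None  # no mapping; the font's default language system would apply


print(repr(opentype_lang_tag("zh-Hant-TW")))  # 'ZHT '
print(repr(opentype_lang_tag("ja-JP")))       # 'JAN '
```

The lossy direction is visible here: several distinct BCP 47 tags collapse onto one OpenType tag, so a descriptor matched against the document's lang values carries more information than the font's own tags.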

@faceless2

Yes, assuming a suitably enabled font you can do all of this with font-language-override, or Unicode variation selectors. But I believe the intention of the original poster was to prevent the font being loaded at all if it doesn't apply. That's not so much an issue with Cyrillic/Arabic/Urdu, but more useful with the bigger Chinese/Japanese fonts.

@patrickdark
Contributor

I think it should be more clear what problem needs solving:

Given a traditional Chinese document with a run of Japanese text, a property declaration such as font-family: "Traditional Chinese Font", "Simplified Chinese Font", "Japanese Font"; should just work with glyphs from the appropriate script displayed, including characters with one code point but multiple, language-specific glyphs.

Instead of using a language descriptor, a better approach may be to use a language exclusion descriptor. So, for the aforementioned "Traditional Chinese Font", one might declare something along the lines of exclude-lang: "*-hans", "*-jpan" in its @font-face rule and this would prevent it from being used to render text marked as ja-jpan, zh-hans, zho-hans, etc.

This approach also avoids having to make a decision about how to label a font that contains characters from multiple languages.
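One way to read the proposed patterns is as RFC 4647 extended-filtering ranges, where * stands for any subtag. The Python sketch below implements that matching (RFC 4647 section 3.3.2). Applying it as exclude-lang semantics is my assumption, and the function names are hypothetical.

```python
from typing import List


def extended_filter_match(lang_range: str, tag: str) -> bool:
    """RFC 4647 section 3.3.2 extended filtering, compared case-insensitively."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtag must match exactly unless the range has a wildcard there.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1  # wildcard in the range: skip it
        elif ti >= len(t):
            return False  # range has subtags left but the tag is exhausted
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False  # a singleton (e.g. "x") in the tag stops the scan
        else:
            ti += 1  # skip a non-matching tag subtag
    return True


def excluded(exclude_lang: List[str], element_lang: str) -> bool:
    """True if any exclusion pattern matches the element's language."""
    return any(extended_filter_match(p, element_lang) for p in exclude_lang)


# Patterns from the comment above for a "Traditional Chinese Font".
EXCLUDE = ["*-hans", "*-jpan"]

print(excluded(EXCLUDE, "zh-Hans"))     # True
print(excluded(EXCLUDE, "zh-Hant-TW"))  # False
```

Under this reading, a plain ja tag is not excluded by *-jpan, because it carries no script subtag.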

@kahsieh
Author

kahsieh commented Feb 7, 2018

@svgeesus As mentioned by @faceless2, the intention is to skip loading the font if it doesn't apply. In @AmeliaBR's example with Noto fonts, subsetted, single-language fonts are already offered for use on the web because the full multi-language fonts are large. This somewhat accounts for the use of the BCP 47 language tags rather than OpenType language tags: the lang descriptor would be matched with HTML language tag(s) and not font file language tag(s).

@patrickdark I think it's okay if a font contains characters from multiple languages. The author need only specify lang in @font-face for languages that should be rendered in specific fonts, and then there can be fallback behavior for other languages. It seems to me that using an exclude-lang descriptor could get unwieldy. For example, a site about the use of Classical Chinese might use various combinations of the zh-Hans, zh-Hant, ja-JP, ko-KR, and vi-VN language tags on its pages.

@xfq xfq added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Mar 13, 2018
@svgeesus
Contributor

This looks like a good topic for f2f discussion at TPAC, perhaps together with I18n and the Fonts WG.

@faceless2

For what it's worth, we've implemented this locally (with a prefix, naturally) and it's working well.

Referencing the steps for font matching outlined at
https://drafts.csswg.org/css-fonts/#font-style-matching, we have effectively added a step before 5a which is "sort the remaining fonts first by how well they match the element's language, based on BCP47, and secondly on the order they're defined in CSS".

So for example: you have three matching fonts with languages of "zh-hant", "zh" and no language. If the font is requested by an element with a language of "zh-hans", it will match "zh". If the element language is "en", all of the fonts have equal priority based on the language match so are prioritised by the order they're defined in CSS, exactly as they are in the current spec.

This approach is backwards compatible: with no languages on the @font-face rule, the matching algorithm runs as it is now. And, if a language is defined, it won't prevent it from matching an element with a different language: it will only change the priority they're matched on.
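The extra sorting step might look roughly like this in Python. It is a sketch that assumes the same prefix-style BCP 47 match as above (a face's lang must equal, or be a hyphen-bounded prefix of, the element's lang), and the face names are hypothetical. Because Python's sorted() is stable, faces with equal language scores keep their CSS source order, which is what makes the step backwards compatible.

```python
from typing import List, Optional, Tuple


def lang_score(face_lang: Optional[str], element_lang: str) -> int:
    """0 for a face with no lang, or whose lang does not prefix-match the element;
    otherwise the number of subtags in the face's lang."""
    if face_lang is None:
        return 0
    face, elem = face_lang.lower(), element_lang.lower()
    if elem == face or elem.startswith(face + "-"):
        return len(face.split("-"))
    return 0


def order_faces(faces: List[Tuple[str, Optional[str]]], element_lang: str) -> List[str]:
    """faces holds (name, lang) pairs in CSS source order. Better language matches
    sort first; sorted() is stable, so ties keep CSS order. No face is dropped."""
    ranked = sorted(faces, key=lambda face: -lang_score(face[1], element_lang))
    return [name for name, _ in ranked]


# The three-font example: "zh-hant", "zh" and no language, in CSS order.
faces = [("font-zh-hant", "zh-hant"), ("font-zh", "zh"), ("font-nolang", None)]

print(order_faces(faces, "zh-hans"))  # ['font-zh', 'font-zh-hant', 'font-nolang']
print(order_faces(faces, "en"))       # ['font-zh-hant', 'font-zh', 'font-nolang']
```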

I have one more argument in favour of this. In the PDF world where we're coming from, every font has to be embedded. So as well as time taken to download the font and better language-specific shapes for the glyphs, we need this to reduce the size of the final document. For example:

@font-face {
  font-family: Noto;
  src: url(NotoSerifCJKzh-Regular.otf);
  lang: "zh"
}
@font-face {
  font-family: Noto;
  src: url(NotoSerifCJKjp-Regular.otf);
  lang: "ja"
}

<p lang="ja">日本の字</p>

Without language matching (and presuming Noto Chinese has no hiragana glyphs for the sake of example), we'd embed two fonts: the kanji would match the Chinese font, as it has the higher priority, and only the hiragana の would match the Japanese font. With language matching, the Japanese font would be used for all four characters, so only one font would be embedded.
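The embedding argument can be checked with a toy per-character fallback model in Python. The coverage sets are hypothetical and follow the comment's presumption that the Chinese face has no hiragana; putting the Japanese face first stands in for what matching lang="ja" would do.

```python
from typing import Dict, List, Set

# Hypothetical glyph coverage: the Chinese face lacks hiragana (no の).
COVERAGE: Dict[str, Set[str]] = {
    "NotoSerifCJKzh-Regular.otf": set("日本字"),
    "NotoSerifCJKjp-Regular.otf": set("日本の字"),
}


def fonts_used(text: str, face_order: List[str]) -> List[str]:
    """For each character, take the first face in priority order that covers it.
    Returns the distinct faces needed, in first-use order; characters that no
    face covers are ignored in this toy model."""
    used: List[str] = []
    for ch in text:
        for face in face_order:
            if ch in COVERAGE[face]:
                if face not in used:
                    used.append(face)
                break
    return used


text = "日本の字"
css_order = ["NotoSerifCJKzh-Regular.otf", "NotoSerifCJKjp-Regular.otf"]
lang_order = ["NotoSerifCJKjp-Regular.otf", "NotoSerifCJKzh-Regular.otf"]

print(len(fonts_used(text, css_order)))   # 2
print(len(fonts_used(text, lang_order)))  # 1
```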

@asmusf

asmusf commented Sep 29, 2018

Just to put that one on the record: language tuning may be appropriate for Latin-script fonts as well. It appears that Polish and French use some of the same accents but with different angles, to give just one example that was widely discussed at one point in Unicode. The standard ended up "unifying" the two language versions of these characters, meaning that one either has to use some compromise font or be able to select a font variant by language.
There are probably other examples.

@drott
Collaborator

drott commented Oct 11, 2018

I think if the different glyph sets and language specific features are delivered in one font file, then lang attributes plus mapping to the OpenType mechanism is sufficient.

If the font family is split into language specific fonts for bandwidth savings and language specific subsetting, then I believe this can already be implemented using :lang() pseudo selectors and CSS custom properties, while still keeping the feel of one font family and avoiding unnecessary downloads. I made an example in this codepen.

If one of the divs is removed, the other font does not download.

<div lang="a">a (shows in cursive)</div>
<div lang="b">b (shows in sans-serif)</div>
@font-face {
  font-family: Pacifico;
  src: url(//fonts.gstatic.com/s/pacifico/v12/FwZY7-Qmy14u9lezJ-6H6MmBp0u-.woff2)
}

@font-face {
  font-family: Roboto;
  src: url(//fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu4mxKKTU1Kg.woff2)
}

:root :lang(a) {
  --site-font-family: Pacifico;
}

:root :lang(b) {
  --site-font-family: Roboto;
}

* {
  font-family: var(--site-font-family);
  font-size: 50px;
}

Edit: Works in Chrome 70, Safari TP, and Firefox 62.

@faceless2

Your suggestion requires using a different font family per language - this proposal was about being able to do this within a single family.

The original author was referring to mappings for the generic fonts (serif, sans-serif) which makes that a requirement. For non-generic fonts, having to artificially split Noto into "Noto-Japanese", "Noto-Chinese" etc. is (I believe) the problem this is trying to solve.

@drott
Collaborator

drott commented Oct 11, 2018

The example works just as well for web font files that have the same family name in their name table, if such fonts are placed in the src: url(), e.g. it would work for language specific versions of Noto CJK. In the CSS you can use the custom property to refer to the font by the same, aliased, family name.

For generic fonts, the user agent usually provides customisations to allow script/language specific generic font preferences.

@bobbytung

Regarding IRG N 2074, “a proposal for a HK character set”: when referring to Chinese, I recommend using language-script-region tags such as zh-Hant-HK and zh-Hant-TW, which better indicate the correct character set.

http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg45/IRG45.htm

Also consider supporting the BCP 47 math tag “Zmth”; I have just been dealing with the STIX font for mathematics...

https://blogs.msdn.microsoft.com/murrays/2015/02/14/math-language-tag/

@litherum
Contributor

Yes, @drott's solution using :lang(a) is compelling.

@css-meeting-bot
Member

The CSS Working Group just discussed Adding lang as a font-face descriptor.

The full IRC log of that discussion:
<emilio> Topic: Adding lang as a font-face descriptor
<emilio> Github: https://github.com//issues/1744
<emilio> myles_: This issue is about the desire of making a font only used by elements that are on a particular language
<emilio> myles_: So that each element with a particular language gets the right font attached to it
<emilio> myles_: My particular feeling is that this is the latest step in a long sequence of changes to add more styling capabilities to elements
<emilio> myles_: selectors already do that, in font-face we already have a bunch of other descriptor, and this moves into the model of adding more styling to font-face
<emilio> myles_: I don't want to implement all of CSS in @font-face, there are examples on the issue on how to do it using style rules
<addison> q+
<astearns> ack addison
<gregwhitworth> +1 to myles_
<emilio> addison: If you were to put this in there's a bunch of interesting problems you need to specify, like how the lang tags match and stuff
<emilio> addison: ???
<emilio> addison: If you name a language for a font-face, you don't render it anywhere else? How do you enforce that?
<emilio> emilio: Isn't there a `:lang` selector you could use for that?
<emilio> drott: that's the example of the issue, yeah
<myles_> s/add more styling capabilities to elements/add more styling capabilities to @font-face/
<emilio> drott: I share myles_' concerns and conflicting semantics between `:lang` an this
<emilio> drott: I think the intent is using the font in a more finegrained way than unicode-range and such, and I think the example also covers this, you can use css variables to use it as the same font, but not sure if I missed any part of the use case
<emilio> addison: I think the idea is to affect how font-fallback works.
<emilio> addison: that's pretty common for CJK
<emilio> myles_: we already have facilities to do this with CSS
<emilio> astearns: looks like preference is to use existing mechanisms for that, and not use descriptors, we should put the minutes in the issue and let people comment
<emilio> florian: does a note / suggestion in the spec about how to solve this problem seem useful?
<emilio> astearns: we should probably wait for more info from the proposers here
<r12a> q+
<astearns> ack r12a
<emilio> r12a: While seeking further info, one thing to ask is how would you treat with multiple languages as a single language
<emilio> astearns: can that be handled by selectors?
<emilio> nods

@fantasai
Collaborator

fantasai commented Dec 1, 2018

For cases where language variants are within a font file, CSS already requires looking up the correct variant using the element's content language. In cases where the font should switch, the :lang() selector can do this already:

[lang]:lang(a) { font-family: Font For Language A, fallback for A, etc; }
[lang]:lang(b) { font-family: Font For Language B, fallback for B, etc; }

The CSSWG believes this should be an adequate solution, so the suggestion is to close this issue as wontfix. Let us know if there's anything we missed and should reconsider.

@kahsieh
Author

kahsieh commented Dec 1, 2018

I agree that the :lang() selector solution is compelling, but I see a couple of potential issues with it. The most common use case is embedding some text from another language within a page that's mostly in one language. In that case, if we don't use the star selector, var(--site-font-family) is substituted on the element where font-family is declared, and descendants inherit that computed font-family instead of picking up their own value of the variable. So the following slight modification of @drott's example doesn't have the expected behavior in Firefox 63 or Chrome 70 (Codepen link):

<div lang="a" class="myclass">
  a (shows in cursive)
  <div lang="b">
    b (shows in sans-serif)
  </div>
</div>
@font-face {
  font-family: Pacifico;
  src: url(//fonts.gstatic.com/s/pacifico/v12/FwZY7-Qmy14u9lezJ-6H6MmBp0u-.woff2)
}

@font-face {
  font-family: Roboto;
  src: url(//fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu4mxKKTU1Kg.woff2)
}

:root :lang(a) {
  --site-font-family: Pacifico;
}

:root :lang(b) {
  --site-font-family: Roboto;
}

.myclass {
  font-family: var(--site-font-family);
  font-size: 50px;
}

Also, I think using CSS custom properties as a surrogate for a font family name is a little hacky, but perhaps that's fine. To include both serif and sans-serif Noto CJK fonts in a page, we would have to do something like this:

@font-face { font-family: NotoSansTC; src: url(NotoSansTC-Light.otf); }
@font-face { font-family: NotoSansJP; src: url(NotoSansJP-Light.otf); }
:root :lang(zh) { --NotoSans-Light-font-family: NotoSansTC; }
:root :lang(ja) { --NotoSans-Light-font-family: NotoSansJP; }

@font-face { font-family: NotoSerifTC; src: url(NotoSerifTC-Light.otf); }
@font-face { font-family: NotoSerifJP; src: url(NotoSerifJP-Light.otf); }
:root :lang(zh) { --NotoSerif-Light-font-family: NotoSerifTC; }
:root :lang(ja) { --NotoSerif-Light-font-family: NotoSerifJP; }

.my-sans-text {
  font-family: var(--NotoSans-Light-font-family);
}
.my-serif-text {
  font-family: var(--NotoSerif-Light-font-family);
}
