[css-inline-3] top metrics for non-Western non-CJK writing systems with obvious top edge #5244

Open
fantasai opened this issue Jun 19, 2020 · 4 comments
Labels
css-inline-3 Current Work i18n-hlreq Hebrew language enablement i18n-sealreq Southeast Asian language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. topic: text edge control

Comments

@fantasai
Collaborator

fantasai commented Jun 19, 2020

[This issue has been annotated in the spec for a while, but doesn't seem to have a corresponding GH issue, so filing one here.]

Both Thai and Hebrew are writing systems with strong top edges (similar to Latin/CJK). But while OpenType defines multiple top edge metrics (cap-height, x-height, ideographic, and hanging), none of these necessarily coincides with the Hebrew or Thai top metrics, which in a given font will often fall somewhere between the x-height and the cap-height, but not consistently in the same place across fonts.

If initial-letter-align and text-edge are to treat all writing systems as equal citizens of the Web, we need metrics for them in OpenType, and we need values for them in CSS that will select those metrics.

Note: See also CSSWG OpenType liaison statement.

@faceless2

faceless2 commented Jul 28, 2020

I don't think we can realistically expect the baseline table to provide this information, now or in the immediate future. Even if baselines for the world's scripts were added in the next OpenType revision, every font would need updating before they could be used. I also don't think we should be itemising every one of these baselines as a list of idents that can be set in initial-letter-align.

Pulling the metrics from the ink bounds of a representative glyph for the script seems like the best option - this is already proposed for Hebrew. I did some testing - first, here are the results of using "cap-height" and the alphabetic baseline for the 16 Hebrew fonts at fonts.google.com, plus Noto Sans and Noto Sans Serif. This is what we'll get if we get rid of the "hebrew" keyword and fall back to the default:

[image: Hebrew test fonts with the top edge taken from cap-height]

Awful. Lots of glyphs have big gaps at the top, which (in our implementation) currently causes the first line to run flush to the margin. Next, the top alignment point is taken from the horizontal center of the ink-outline of U+05BE (Hebrew Maqaf) as suggested in the spec:

[image: Hebrew test fonts with the top edge taken from the ink-top of U+05BE (Hebrew Maqaf)]

Better, but not so great. But using the ink-top in the horizontal center of U+05D4 (Hebrew He) works really well:

[image: Hebrew test fonts with the top edge taken from the ink-top of U+05D4 (Hebrew He)]

So I think in general the idea of pulling alignment points from glyph outlines is a good one. We're already making use of glyph outlines in initial-letter anyway. So long as each script uses the same mechanism (i.e. choosing the point in the horizontal (or vertical) center at the appropriate edge of the glyph outline), adding a new script is no more than determining which glyph is representative. After the initial implementation, that should be fairly low cost both for developers and for anyone wanting to propose a new script.
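
To make the mechanism concrete, here is a rough sketch of "ink-top at the horizontal center" in Python. It is only an illustration, not what our implementation does: it assumes the glyph outline has already been flattened into closed polygons of (x, y) points in font units (real outlines contain curves), and the function name and the sample glyph are made up for the example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Contour = List[Point]  # closed polygon; the last point connects back to the first


def ink_top_at_center(contours: List[Contour]) -> Optional[float]:
    """Return the highest y at which the vertical line through the horizontal
    center of the ink bounds crosses the outline, or None for an empty glyph."""
    xs = [x for contour in contours for x, _ in contour]
    if not xs:
        return None
    center_x = (min(xs) + max(xs)) / 2.0

    top: Optional[float] = None
    for contour in contours:
        for i, (x0, y0) in enumerate(contour):
            x1, y1 = contour[(i + 1) % len(contour)]
            if min(x0, x1) <= center_x <= max(x0, x1):
                if x0 == x1:
                    # Vertical segment lying exactly on the center line.
                    y = max(y0, y1)
                else:
                    # Linear interpolation along the segment to x == center_x.
                    t = (center_x - x0) / (x1 - x0)
                    y = y0 + t * (y1 - y0)
                if top is None or y > top:
                    top = y
    return top


# Hypothetical glyph: a block whose main top edge sits at y=700, with a
# narrow flourish on the left that rises to y=800.
sample_glyph = [[(0, 0), (500, 0), (500, 700), (100, 700), (100, 800), (0, 800)]]
print(ink_top_at_center(sample_glyph))  # 700.0; the bounding-box top would be 800
```

Sampling at the center rather than taking the top of the bounding box is what keeps a stray flourish from moving the alignment point.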

To that end I'd suggest we consider something like initial-letter-align: auto, which would determine the Unicode script from the first non-common character of the text following the initial letter, exactly as we're doing now for the script inside the initial-letter. We can then simply (and briefly) list the alignment points for each script, e.g.

  • Latn: over=cap-height, under=alphabetic baseline.
  • Hebr: over=U+05D4, under=alphabetic baseline.
  • Deva: over=BASE.hang or U+0915 if not defined, under=alphabetic baseline.
  • Beng: over=BASE.hang or U+0995 if not defined, under=alphabetic baseline.
  • Hans, Hant: over=BASE.icft or U+6C38 if not defined, under=BASE.icfb or U+6C38 if not defined.

If further control over which baseline to select is required (and, the more I think about this, the more I doubt it is) then perhaps something like initial-letter-align: [alphabetic | hanging | ideographic] || [<string> <string>?] - to let you select a baseline pair as we do now, and/or specify a glyph (or under and over glyphs) directly in case the baseline isn't available.
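
For what it's worth, the value grammar suggested above is simple to parse. Below is a sketch of a parser for it, assuming the syntax exactly as written (one optional baseline keyword and one optional group of one or two adjacent strings, in either order). Nothing here is specified anywhere; the function and type names are made up, and the parser deliberately does not decide which string is the over glyph and which is the under glyph.

```python
import re
from typing import NamedTuple, Optional, Tuple

KEYWORDS = {"alphabetic", "hanging", "ideographic"}
# A double-quoted string, a single-quoted string, or a bare token.
TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')


class InitialLetterAlign(NamedTuple):
    baseline: Optional[str]
    glyphs: Tuple[str, ...]  # zero, one, or two strings, in source order


def parse_initial_letter_align(value: str) -> InitialLetterAlign:
    baseline: Optional[str] = None
    glyphs = []
    glyphs_closed = False  # set once a keyword follows the string group
    for match in TOKEN.finditer(value):
        dq, sq, bare = match.groups()
        if bare is not None:
            keyword = bare.lower()  # CSS keywords are ASCII case-insensitive
            if keyword not in KEYWORDS or baseline is not None:
                raise ValueError(f"unexpected token: {bare!r}")
            baseline = keyword
            if glyphs:
                glyphs_closed = True
        else:
            if glyphs_closed or len(glyphs) == 2:
                raise ValueError("expected at most two adjacent strings")
            glyphs.append(dq if dq is not None else sq)
    if baseline is None and not glyphs:
        raise ValueError("empty value")
    return InitialLetterAlign(baseline, tuple(glyphs))
```

For example, parse_initial_letter_align('hanging "\u0915"') yields the keyword plus one glyph, while 'hanging alphabetic' and '"a" hanging "b"' are rejected.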

@litherum

This comment has been minimized.

@fantasai
Collaborator Author

fantasai commented Aug 13, 2020

Please, let's not mix up this issue, which is about finding metrics for a given script, with the issue of whether the question of “which script” should be automatically determined. There is enough complexity in just this one issue.

@fantasai
Collaborator Author

> So I think in general the idea of pulling alignment points from glyph outlines is a good one.

Using glyph outlines is an acceptable heuristic for simply-styled fonts, and if UAs want to implement that I would be thrilled. But it is not as good as having the font designer set the metric themselves. The font designer can account for the effects of flourishes, stroke variations, and other artistic effects correctly. We can only guess that the middle of the glyph is the least likely to be affected by such things, and try to pick a character that has a wide target to measure. So while measuring the glyph is a great tactic for handling fonts and font formats that don't have relevant metrics, that doesn't mean the need for metrics goes away.

As for maintaining a database of ideal glyphs to measure for these things... that should definitely not be the job of the CSS specs. We could make a jointly-maintained registry with i18n for the time being. But ideally I think Unicode and OpenType should be collaborating on this. There should be optional metrics in OpenType to provide this info; there should be defined fallback heuristics based on glyph outlines for when the font is missing those metrics so that implementers have a reference for all the scripts they are unfamiliar with; and the CSSWG should not be the ones maintaining this heuristics registry.


5 participants