[css-text-decor] Clarifying skip-ink:auto behavior in relation to CJK text #4276

Closed
jfkthame opened this issue Sep 4, 2019 · 21 comments
Labels
Closed Accepted by CSSWG Resolution Commenter Satisfied Commenter has indicated satisfaction with the resolution / edits. css-text-decor-4 i18n-clreq Chinese language enablement i18n-jlreq Japanese language enablement i18n-klreq Korean language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@jfkthame
Contributor

jfkthame commented Sep 4, 2019

See Issue 8 in CSS Text Decoration Module Level 4.

Currently, it appears that WebKit and Blink browsers forcibly disable ink-skipping behavior (as if text-decoration-skip-ink:none were in effect) for a hardcoded list of “CJK” characters.

However, they disagree on the exact set of characters for which skipping is disabled. In particular, Blink seems to add a large but (apparently) rather ad hoc collection of punctuation and other symbols that are not especially associated with CJK text. Hence, these characters don’t get ink-skipping behavior in Chrome even though they may equally well be used in the context of Latin or other scripts, not just in CJK contexts. On the other hand, in WebKit they do get skipped, whether used in Latin or CJK contexts. (Compare testcase: https://jsfiddle.net/rhLjauq4/ in Chrome vs Safari.)

In Gecko’s implementation (not yet enabled in release builds, but available for testing in Firefox nightlies), we’ve taken a slightly different approach. Rather than a fixed set of characters for which ink-skipping is disabled, which is problematic precisely because of the large number of “common” characters — mainly punctuation and symbols — that are used in both CJK and non-CJK contexts, the decision whether to disable skip-ink is taken on a per-script-run basis. CJK ideographs are of course recognised as belonging to a CJK script, but in addition, “common” characters will be merged into the same script run when used in a CJK context, and so the same non-skipping behavior will be applied to them in CJK context only.

I think it would be helpful for authors if there were agreement as to which characters are or aren’t eligible for skip-ink behavior when auto is in effect. Given the substantial number of ambiguous “common” characters, a simple partitioning of individual characters (as currently implemented in WebKit and Blink, AFAICT) is not a particularly good solution. I expect layout engines already do some kind of script-run analysis in order to handle font selection and shaping appropriately, and therefore using script runs as the basis for deciding when to disable ink skipping should not be overly burdensome, and I believe it results in more useful behavior. Would the WG and other browser developers be prepared to converge on this approach, and include it in the text-decoration spec?
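
As a rough illustration of the per-script-run decision described above, here is a minimal sketch. It assumes script runs have already been resolved by the shaping pipeline and are passed in as (script, text) pairs. The function name, the run format and the set of script codes treated as CJK are all assumptions made for this example, not taken from Gecko or from the spec.

```python
# ISO 15924 script codes treated as CJK here. This set is an assumption
# for illustration, not a list taken from any browser or from the spec.
CJK_SCRIPTS = {"Hani", "Hira", "Kana", "Hang", "Bopo"}


def skip_ink_per_run(runs, skip_ink="auto"):
    """Return (text, ink_skipping_enabled) for each (script, text) run."""
    decisions = []
    for script, text in runs:
        if skip_ink == "none":
            enabled = False
        else:
            # "auto": disable skipping only inside runs resolved as CJK.
            enabled = script not in CJK_SCRIPTS
        decisions.append((text, enabled))
    return decisions


# Runs as a shaping pipeline might report them: the slash in the second
# run was merged into the Han run because it occurs in a CJK context.
runs = [("Latn", "see a/b, "), ("Hani", "中文/日本語")]
for text, enabled in skip_ink_per_run(runs):
    print(repr(text), "skip ink" if enabled else "no skipping")
```

The same slash character ends up with different behavior in the two runs, which is the property a fixed per-character list cannot provide.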

@myakura
Contributor

myakura commented Sep 5, 2019

perhaps related: #707

@fantasai
Collaborator

@jfkthame Define “in CJK context”?

@jfkthame
Contributor Author

I'm assuming browsers already do script run analysis as part of the rendering process. A Script=Common character, then, is regarded as occurring in CJK context (and therefore skip-ink should not apply to it) if script run analysis assigned it to a run of CJK text.

I'm not sure an exactly-specified algorithm for script run resolution is necessary here (though see http://www.unicode.org/reports/tr24/tr24-29.html#Implementation for some guidance); the details could be left as a quality of implementation issue, at least initially.
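
One possible way to resolve Script=Common characters into runs is sketched below. It is deliberately simplistic: the code point ranges in `base_script` are a rough stand-in for the real Unicode Script property, and the rule used (a Common character adopts the script of the preceding character, or of the following one at the start of the text) is only one of several reasonable choices. All names here are hypothetical; UTR #24 discusses the finer points, such as paired punctuation.

```python
from itertools import groupby


def base_script(ch):
    """Very rough script lookup by code point range (illustrative only)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
        return "Hani"
    if 0x3041 <= cp <= 0x3096:
        return "Hira"
    if 0x30A1 <= cp <= 0x30FA:
        return "Kana"
    if 0xAC00 <= cp <= 0xD7A3:
        return "Hang"
    if ch.isalpha() and cp < 0x0250:
        return "Latn"
    return "Zyyy"  # Common: punctuation, digits, spaces, symbols


def resolve_scripts(text):
    scripts = [base_script(ch) for ch in text]
    # Forward pass: a Common character adopts the preceding script.
    for i in range(1, len(scripts)):
        if scripts[i] == "Zyyy":
            scripts[i] = scripts[i - 1]
    # Backward pass: leading Common characters adopt the following script.
    for i in range(len(scripts) - 2, -1, -1):
        if scripts[i] == "Zyyy":
            scripts[i] = scripts[i + 1]
    return scripts


def script_runs(text):
    """Split text into (script, substring) runs after resolving Common."""
    runs, pos = [], 0
    for script, group in groupby(resolve_scripts(text)):
        n = len(list(group))
        runs.append((script, text[pos:pos + n]))
        pos += n
    return runs


print(script_runs("a/b 中/文"))
```

With this rule the first slash joins the Latin run and the second joins the Han run, so only the second one would be treated as occurring in CJK context.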

@litherum
Contributor

WebKit uses the Unicode block to detect this.
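
For comparison, a per-character check based on Unicode blocks might look roughly like the sketch below. The block list is hypothetical and chosen only for illustration; it is not WebKit's actual table, and `can_skip_ink` is an invented name.

```python
# Hypothetical block list for illustration; not WebKit's actual table.
NO_SKIP_BLOCKS = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def can_skip_ink(ch):
    """True if ink skipping stays enabled for this single character."""
    cp = ord(ch)
    return not any(lo <= cp <= hi for lo, hi in NO_SKIP_BLOCKS)


# The slash is eligible for skipping even between two ideographs,
# because the decision ignores the surrounding text.
print([can_skip_ink(c) for c in "中/文"])
```

A check of this shape is context-free: a character in a shared block such as Basic Latin punctuation gets the same answer whether it sits in Latin or CJK text.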

@litherum
Contributor

Wasn’t there some discussion about this being done by a user-agent style sheet by selecting elements with the lang attribute set to CJK languages?

@kojiishi
Contributor

Blink uses a heuristic, and it is different from the script-run analysis we use for shaping.

@kojiishi
Contributor

kojiishi commented Oct 11, 2019

We also include some non-CJK characters because we've got rather strong feedback saying they look poor and weird in Blink, although they look ok in WebKit. This depends on the font rasterizer, how we render underlines, how we skip ink, and a few more factors. I prefer to keep auto as a value for such purposes. We may want to adjust them as our underlying technology changes. I support adding all for interoperability as you suggested in #4277.

@jfkthame
Contributor Author

Wasn’t there some discussion about this being done by a user-agent style sheet by selecting elements with the lang attribute set to CJK languages?

Yes - I would have liked to try this in Firefox, but it seems there's too much CJK content that appears on pages without being lang-tagged. E.g. if I search via an English-language search engine for a Chinese phrase, I get lots of result links that are in Chinese, but are not tagged as lang=zh. If skipping is enabled based on the fact that they're on an en-GB page, they look terrible.
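
To make the failure mode concrete, here is a small sketch of gating purely on the inherited language tag. The function name and the list of language subtags are assumptions for this example; no browser is known to implement exactly this.

```python
CJK_LANGS = ("zh", "ja", "ko")


def skip_ink_for_lang(lang):
    """Decide ink skipping purely from the (inherited) lang tag."""
    primary = lang.lower().split("-")[0] if lang else ""
    return primary not in CJK_LANGS


# A Chinese result link on an English page inherits the page language,
# so skipping stays enabled for it unless the link itself is tagged.
results = [
    ("en-GB", "Example page"),
    ("en-GB", "中文搜索结果"),
    ("zh-Hans", "中文搜索结果"),
]
for lang, text in results:
    print(lang, text, "skip ink" if skip_ink_for_lang(lang) else "no skipping")
```

The second entry is the problematic one: identical text to the third, but treated as Latin because nothing tagged it otherwise.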

@jfkthame
Contributor Author

We also include some non-CJK characters because we've got rather strong feedback saying they look poor and weird in Blink

OK, so I assume this is why characters like dagger† and double-dagger‡ are included in the set of non-inkskippable chars in Blink. But I wonder - was the feedback that they "looked poor" referring to them in a Latin-script context, or was it in relation to use of these characters in a CJK context? ISTM that failing to ink-skip them within English text, where nearby descenders are being skipped, looks odd.

@himorin himorin added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Oct 11, 2019
@kojiishi
Contributor

kojiishi commented Oct 12, 2019

was the feedback that they "looked poor" referring to them in a Latin-script context, or was it in relation to use of these characters in a CJK context?

There are two cases; one is that they are unified code points, and we don't implement a smarter IsCJK function for those code points. Dagger and double-dagger are in this category. We hope to improve this in future.

The other case is some characters in Latin-script context. IIRC Consolas has rather tall slashes (solidus), and URLs using Consolas looked poor (jsbin), and URLs with underlines were too common to ignore. There may be a few more such fonts, and "//" skipping ink in proportional fonts looked even weirder. Blink's internal function CanTextDecorationSkipInk() implements this.

We checked WebKit's behavior. On Mac/iOS, monospace fonts have shorter glyphs, and WebKit has fewer gaps than Blink does. Blink rounds in the direction that widens the gap, and the rounding is done in CSS pixels, not device pixels, which may make it worse. We hope to improve this too, so I'd feel better if we didn't define them in the spec.

Note: I checked the above test on a current Gecko build too; it looks like Gecko leaves more of a gap between glyphs and underlines, so they don't interfere.
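
A small worked example of why the rounding grid matters. The numbers and the helper below are invented for illustration and are not Blink's code; the only assumption carried over from the comment is that gap edges are rounded outward, so the gap can only grow.

```python
import math


def widen_gap(start, end, device_pixel_ratio=1.0, snap_to="css"):
    """Round a gap outward (start down, end up) on the chosen pixel grid.

    Inputs and result are in CSS pixels.
    """
    scale = 1.0 if snap_to == "css" else device_pixel_ratio
    return (math.floor(start * scale) / scale, math.ceil(end * scale) / scale)


# A glyph crosses the underline between 10.6 and 13.4 CSS px on a 2x display.
print(widen_gap(10.6, 13.4, 2.0, snap_to="css"))     # (10.0, 14.0)
print(widen_gap(10.6, 13.4, 2.0, snap_to="device"))  # (10.5, 13.5)
```

Snapping on the CSS pixel grid turns a 2.8px intersection into a 4px gap, while snapping on the device pixel grid of a 2x display gives a 3px gap.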

@kojiishi
Contributor

FYI, sharing some of the feedback we got at blink-dev when we shipped, in case this explains it better: 1, 2

@r12a r12a added i18n-clreq Chinese language enablement i18n-jlreq Japanese language enablement i18n-klreq Korean language enablement labels Oct 24, 2019
@css-meeting-bot
Member

The CSS Working Group just discussed [css-text-decor] Clarifying skip-ink:auto behavior in relation to CJK text, and agreed to the following:

  • RESOLVED: fully specify an algorithm that specifies ink skipping that references other specifications that isn't codepoint-by-codepoint
The full IRC log of that discussion
<Rossen__> Topic: [css-text-decor] Clarifying skip-ink:auto behavior in relation to CJK text
<astearns> github: https://github.com//issues/4276
<faceless> jkew: the issue is with text-decoration-skip-ink: browsers have generally chosen not to apply this to CJK text because in practice it clashes with most of the glyphs and looks terrible.
<faceless> jkew: what troubles me is that WebKit/Chrome have chosen to skip this for a particular set of glyphs, but there's a disconnect as to which glyphs are skipped. In particular Blink has chosen to skip a number of punctuation characters
<faceless> jkew: I was hoping the spec could pin this down to work on a sequence of script characters, so that punctuation surrounded by CJK is CJK.
<faceless> jkew: I'd like to settle on what we do in Firefox, which is better. At the moment the spec doesn't define it
<Rossen__> q?
<faceless> myles: consistency is good but what's the motivation? bug reports?
<faceless> jkew: I'm sure we did have reports
<faceless> myles: when you started implementing? Or was it issues around the specific characters?
<faceless> jkew: initially we simply implemented and found the same issues in CJK as everyone else
<koji> q+
<faceless> myles: in the absence of specific bug reports and users are not complaining, maybe we should leave it as it is?
<florian> q+
<faceless> jensimmons: can we perhaps specify it and see what comes from that?
<Rossen__> ack koji
<faceless> koji: I'm generally with myles on this. we had reports that our slashes looked quite bad. when looking at gecko they don't look bad
<jensimmons> jensimmons: is the desire of one browser to not put in the effort a reason to not spec interop. If interop on this is ideal, we can spec it and then each browser can make decisions about prioritization. (is the point I was making)
<faceless> koji: so I believe we should add slashes to the list. So this is a heuristic. It's not testable. But I understand that if gecko gets reports that say the inconsistency is troubling then this is an issue
<Rossen__> ack florian
<faceless> florian: the spec is very vague, it says you can skip but not why. Even if we don't go all the way to defining a list, we may want to clarify the intent of this. That will not help with the immediate concern about interop, but it will help for anyone trying to understand or implement this
<faceless> myles: I can add some text about that
<Rossen__> ack dbaron
<faceless> dbaron: I think the situation today is if we don't define things, everyone will just copy what chrome does. So if what Chrome does is right, lets put that in the spec as we're going to copy it anyway. If not, put in the spec what is right.
<tantek> I feel like that needs to be repeated at the start of every CSSWG meeting
<faceless> myles: is not keen on that idea
<Rossen__> q?
<faceless> tab: if we do whatever chrome does, it should be a choice made because chrome is doing the right thing. I want something written down because it will be a compat issue
<faceless> myles: if no-one has bug reports, it's not a compat issue yet. maybe we wait until the first report
<faceless> tab: we have enough issues to know that's not the best approach
<Rossen__> q?
<Rossen__> ack dbaron
<faceless> dbaron: we've found that compat constraints get stricter over time. The longer things are out on the web, they require interop and expect it to get better over time. So if we find things that aren't we should fix that early
<faceless> dbaron: with the lack of bug reports, we have a cultural bias - filing them requires that you speak english and this is not the sort of bug report that english speakers will file
<tantek> ^^^ great FAQ answer for "Did you get a bug report?"
<koji> q+
<faceless> myles: I'm not going to push back on this. I would prefer that the approach taken is that text describing this is a reference to another spec, not a list of characters.
<faceless> koji: I'm fine to have some text added that allows the UA to have some heuristics. Our bug report was opposite. We had strong opinions. people said "don't just disable skipping because slashes look bad"
<faceless> myles: how would you formulate that in a spec? a list that need to be skipped and the rest are undefined? something else?
<faceless> koji: not strong on specifics, but if we got reports on a specific code point we could add that, but leave others undefined.
<faceless> rossen: who's going to write this up?
<faceless> myles: I volunteer jkew
<faceless> rossen: next action, jkew to modify the spec which - as myles suggests, references unicode - with a suggested approach that allows flexibility:
<faceless> ACTION: add specifics into ink-skipping details TBD. And that it's done by reference.
<faceless> ACTION: fully specify an algorithm that specifies ink skipping that references other specifications that isn't codepoint-by-codepoint
<Rossen__> RESOLVED: fully specify an algorithm that specifies ink skipping that references other specifications that isn't codepoint-by-codepoint
<faceless> fantasai: who's doing this?
<faceless> rossen: jkew

@macnmm

macnmm commented Jan 29, 2020

When I read "in CJK context" I assume this brings with it a number of CJK typographic conventions like where underline normally is placed relative to the CJK embox, and thus whether the skip-ink feature would be usable as in a typical Latin-based descender-skipping context. Lined emphasis in CJK did not typically skip (or draw behind) the glyphs, but it does get positioned such that collisions are more rare, making it quite different from typical Latin underline. I would think whether or not the underline is contiguous is also a factor in determining behavior (and CJK context). Thoughts?

@jfkthame
Contributor Author

jfkthame commented Feb 3, 2020

When I read "in CJK context" I assume this brings with it a number of CJK typographic conventions like where underline normally is placed relative to the CJK embox, and thus whether the skip-ink feature would be usable as in a typical Latin-based descender-skipping context.

I think this is what text-underline-position: under is intended to achieve, and if this were applied, the problem of skip-ink behavior producing poor results for CJK text would largely be mitigated. But not all browsers support text-underline-position: under, and even when they do, there'll be lots of existing content that doesn't apply it.

(Fonts designed primarily for CJK use could also set their underlineOffset parameter so as to place the underline lower than fonts designed for Latin; but even if they did this -- which many don't -- not all browsers respect the font's setting anyway.)

I suppose one option would be to have browsers automatically switch underline positions between a default that's suitable for Latin text and a position for CJK text on a character-by-character basis, but I expect this could give very messy results for mixed-script content. Turning ink-skipping on/off based on the script of the text seems like a better mitigation, although the "right" solution is for authors to use the tools -- such as text-underline-position and text-underline-offset -- that allow them to place the underline more appropriately.

@jfkthame
Contributor Author

jfkthame commented Feb 4, 2020

Given that there is not a single canonical algorithm for determining the boundaries of script runs in arbitrary text, and given that some implementors want to retain the freedom to make adjustments based on user feedback, I don’t think we can or should currently specify precise, normative requirements for how text-decoration-skip-ink behaves down to the level of every individual character.

However, the spec could usefully include a (non-normative) note to offer guidance to implementors and pointers to the Unicode specifications that should provide the basis for behavior here.

I've opened #4737 with some proposed draft text for consideration; I hope this will be a useful starting point.

@fantasai
Collaborator

Agenda+ to review @jfkthame’s proposed changes in https://github.com/w3c/csswg-drafts/pull/4737/files

@litherum
Contributor

I thought the WG agreed to gate this skipping behavior on the lang= attribute, rather than Unicode properties. Am I misremembering?

@faceless2

The proposal from @jfkthame is fairly close to the same process you have to go through for OpenType layout - each character has to be assigned to a script, with common, inherited or unknown characters adopting the script of their neighbouring characters. These are OpenType Script codes rather than Unicode Script codes, but for the scripts discussed in the proposal there's no ambiguity mapping between them.

I don't think the exact algorithm is specified - I think lots of people use HarfBuzz these days, perhaps someone who knows it can comment better (ref)

I can't comment on whether the proposed algorithm is the right one. But if it is, and if you're doing OpenType layout and can retrieve the script property it assigns to each run of text, then reusing it for ink skipping seems to be a good idea.

@jfkthame
Contributor Author

I thought the WG agreed to gate this skipping behavior on the lang= attribute, rather than Unicode properties. Am I misremembering?

I think you're misremembering (at least according to my memory, which of course is fallible!)

The trouble with gating on lang= is that it's so often not appropriately set. This is particularly true for dynamic pages (e.g. search results) and pages hosting user-generated content, where the author cannot predict what languages will be present, but there are also cases of primarily-CJK pages tagged with lang=en, or no lang attribute at all, where using the attribute to control skipping behavior will give inferior results.

@litherum
Contributor

litherum commented Apr 22, 2020

Ah, yes, the resolution in #4276 (comment) describes that it will be done via "other specifications", which seems to preclude using lang=. WebKit uses Unicode today, rather than lang=. It sounds like Chrome, WebKit, and Firefox are all in agreement in this direction. It appears I was misremembering. Sorry for the noise.

@css-meeting-bot
Member

The CSS Working Group just discussed Clarifying skip-ink:auto behavior in relation to CJK text, and agreed to the following:

  • RESOLVED: Accept PR #4737, close issue #4276
The full IRC log of that discussion
<astearns> topic: Clarifying skip-ink:auto behavior in relation to CJK text
<astearns> github: https://github.com//issues/4276
<fantasai> https://github.com//pull/4737/files
<TabAtkins> fantasai: There's a PR from jfkthame about this issue. Looks correct to me, wanted to check with the WG
<fantasai> https://github.com//issues/4276
<TabAtkins> astearns: I see myles had a question that was answered in the issue
<TabAtkins> myles: It looks like an issue to me, not a PR
<fantasai> https://github.com//pull/4737/files
<TabAtkins> AmeliaBR: The last part of the edit is an in-spec issue asking for other non-CJK scripts which want this behavior.
<TabAtkins> AmeliaBR: Is that something to discuss now, or leave until later?
<TabAtkins> fantasai: Later. That's a question for impls and i18n
<TabAtkins> astearns: So proposed resolution is to accept the PR and close this issue. Objections?
<TabAtkins> RESOLVED: Accept PR #4737, close issue #4276
