[css-text-decor] Clarifying skip-ink:auto behavior in relation to CJK text #4276
Comments
perhaps related: #707 |
@jfkthame Define “in CJK context”? |
I'm assuming browsers already do script run analysis as part of the rendering process. A Script=Common character, then, is regarded as occurring in CJK context (and therefore skip-ink should not apply to it) if script run analysis assigned it to a run of CJK text. I'm not sure an exactly-specified algorithm for script run resolution is necessary here (though see http://www.unicode.org/reports/tr24/tr24-29.html#Implementation for some guidance); the details could be left as a quality of implementation issue, at least initially. |
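To make "assigned to a run of CJK text" concrete, here is a minimal sketch of one possible resolution pass, in which Script=Common characters adopt the script of a neighbouring character. It is an illustration only, not what any browser does: `base_script`, `resolve_scripts`, `script_runs` and the `CJK_RANGES` table are hypothetical names, and the table is a crude stand-in for the real Unicode Script property described in UAX #24.

```python
from itertools import groupby

# Hypothetical, heavily simplified script lookup. A real engine would use the
# Unicode Script property (UAX #24); these few blocks are only an assumption
# made to keep the sketch self-contained.
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def base_script(ch: str) -> str:
    """Classify one character as 'CJK', 'Other' (any other letter) or 'Common'."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in CJK_RANGES):
        return "CJK"
    if ch.isalpha():
        return "Other"
    return "Common"  # punctuation, symbols, digits, spaces


def resolve_scripts(text: str) -> list[str]:
    """Give each Common character the script of a neighbouring character."""
    scripts = [base_script(ch) for ch in text]
    # Forward pass: a Common character adopts the script that precedes it.
    prev = None
    for i, script in enumerate(scripts):
        if script == "Common":
            if prev is not None:
                scripts[i] = prev
        else:
            prev = script
    # Backward pass: leading Common characters adopt the script that follows.
    following = None
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] == "Common":
            if following is not None:
                scripts[i] = following
        else:
            following = scripts[i]
    return scripts


def script_runs(text: str) -> list[tuple[str, str]]:
    """Group the text into (script, substring) runs after resolution."""
    pairs = zip(resolve_scripts(text), text)
    return [
        (script, "".join(ch for _, ch in group))
        for script, group in groupby(pairs, key=lambda pair: pair[0])
    ]
```

With this sketch, the dagger in `script_runs("abc†")` stays in the non-CJK run, while in `script_runs("漢字†")` it is merged into the CJK run. Preferring the preceding script is just one possible tie-break for characters that sit between two scripts; that choice is exactly the kind of detail that could be left to the implementation.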
WebKit uses the Unicode block to detect this. |
Wasn’t there some discussion about this being done by a user-agent style sheet by selecting elements with the lang attribute set to CJK languages? |
Blink uses a heuristic, and it is different from the script-run analysis we use for shaping. |
We also include some non-CJK characters because we received rather strong feedback saying they look poor and weird in Blink, although they look OK in WebKit. This depends on the font rasterizer, how we render underlines, how we skip ink, and a few more factors. I prefer to keep |
Yes - I would have liked to try this in Firefox, but it seems there's too much CJK content that appears on pages without being lang-tagged. E.g. if I search via an English-language search engine for a Chinese phrase, I get lots of result links that are in Chinese, but are not tagged as lang=zh. If skipping is enabled based on the fact that they're on an en-GB page, they look terrible. |
OK, so I assume this is why characters like dagger† and double-dagger‡ are included in the set of non-inkskippable chars in Blink. But I wonder - was the feedback that they "looked poor" referring to them in a Latin-script context, or was it in relation to use of these characters in a CJK context? ISTM that failing to ink-skip them within English text, where nearby descenders are being skipped, looks odd. |
There are two cases. One is that they are unified code points, and we don't implement a smarter IsCJK function for those code points. Dagger and double-dagger are in this category. We hope to improve this in future. The other case is some characters in a Latin-script context. IIRC Consolas has rather tall slashes (solidus), and URLs using Consolas looked poor (jsbin), and URLs with underlines were too common to ignore. Maybe there are a few more such fonts, and "//" skipping ink in proportional fonts looked even weirder. Blink's internal function CanTextDecorationSkipInk() implements this. We checked WebKit's behavior. On Mac/iOS, monospace fonts have shorter glyphs, and WebKit has fewer gaps than Blink does. Blink rounds in the direction that widens the gap, and the rounding is done in CSS pixels, not device pixels, which may make it worse. We hope to improve this too, so I would feel better if we don't define them in the spec. Note: I checked the above test on a current Gecko build too; it looks like Gecko leaves more of a gap to the underline, so that they don't interfere. |
The CSS Working Group just discussed
The full IRC log of that discussion
<Rossen__> Topic: [css-text-decor] Clarifying skip-ink:auto behavior in relation to CJK text
<astearns> github: https://github.com//issues/4276
<faceless> jkewL the issue is that text-decoration-skip-ink, browser have chosen generally not to apply this to CJK text because in practice it clases with most of the glyphs and looks terrible.
<faceless> s/jkewL/jkew
<faceless> jkew: what troubles me is that the webkit/chrome have chosen to skip this for a particular set of glyphs, but there's a disconnect as to which glyphs are skipped. In particular Blink has chosen to skip a number of punctuation chararacters
<faceless> jkew: I was hoping to the spec could pin this down to work on a sequence of script characters, so that punctuation surrounded by CJK is CJK.
<faceless> jkew: I'd like to settle on what we do in Firefox, which is better. At the moment the spec doesn't define it
<Rossen__> q?
<faceless> myles: consistancy is good but what the motivation? bug reports?
<faceless> jkew: I'm sure we did have reports
<faceless> myles: when you started implementing? Or was it issues around the specific characters?
<faceless> jkew: initially we simply implemented and found the same issues in CJK as everyone else
<koji> q+
<faceless> myles: in the absence of specific bug reports and users are not complaining, maybe we should leave it as it is?
<florian> q+
<faceless> jensimmons: can we perhaps specify it and see what comes form that?
<Rossen__> ack koji
<faceless> koji: i"m generally with myles on this. we had reports that our slashes looked quite bad. when looking at gecko they don't look bad
<jensimmons> jensimmons: is the desire of one browser to not put in the effort a reason to not spec interop. If interop on this is ideal, we can spec it and then each browser can make decisions about prioritization. (is the point I was making)
<faceless> kohi: so I believe we shuold add slashes to the list. So this is a heuristic. It's not testable. But I understand that if gecko gets reports that says the inconsistancy is troubling then this is an issue
<Rossen__> ack florian
<skk> s/kohi/koji
<faceless> florian: the spec is very vague, it says you can skip but not why. Even if we don't go all the way to defining a list, we may want to clarify the intent of this. That will not help with the immediate concern about interop, but it will help for anyone trying to understand or implement this
<faceless> myles: I can add some text about that
<Rossen__> ack dbaron
<faceless> dbaron: I think the situation today is if we don't define things, everyone will just copy what chrome does. So if what Chrome does is right, lets put that in the spec as we're going to copy it anyway. If not, put in the spec what is right.
<tantek> I feel like that needs to be repeated at the start of every CSSWG meeting
<faceless> myles: is not keen on that idea
<Rossen__> q?
<faceless> tab: if we do whatever chrome does, it should be an choice made because chroms is doing the right thing. I want' something written down because it will be a compat issue
<faceless> myles: if no-one has bug reports, it's not a compat issue yet. maybe we wait until the first report
<faceless> tab: we have enough issues to know that's not the best aproach
<faceless> s/aproach/approach
<Rossen__> q?
<Rossen__> ack dbaron
<faceless> dbaron: we've found that compat constraints get stricter over time. The longer things are out on the web, they require interop and expect it to get better over time. So if we find things that aren't we should fix that early
<faceless> dbaron: with the lack of bug reports, we have a cultural bias - filing them requires that you speak english and this is not the sort of bug report that english speakers will file
<tantek> ^^^ great FAQ answer for "Did you get a bug report?"
<koji> q+
<faceless> myles: I'm not going to push back on this. I would prefer that the approach taken is that text describing this is a reference to another spec, not a list of characters.
<faceless> koji: I'm fine to have some text added that allows the UA to have some heuristics. Our bug report was opposite. We had strong opinions. people said "don't just disable skipping because slashes look bad"
<faceless> myles: how would you formulate that in a spec? a list that need to be skipped and the rest are undefined? something else?
<faceless> koji: not strong on specifics, but if we got reports on a specific code point we could add that, but leave others undefined.
<faceless> rossen: who's going to write this up?
<faceless> myles: I volunteer jkew
<faceless> rossen: next action, jkew to modify the spec which - as myles suggests, references unicode - with a suggested approach that allows flexibility:
<faceless> ACTION: add specifics into ink-skipping details TBD. And that it's done by reference.
<faceless> ACTION: fully specify an algorithm that specifies ink skipping that references other specifications that isn't codepoint-by-codepoint
<Rossen__> RESOLVED: fully specify an algorithm that specifies ink skipping that references other specifications that isn't codepoint-by-codepoint
<faceless> fantasai: who's doing this?
<faceless> rossen: jkew |
When I read "in CJK context" I assume this brings with it a number of CJK typographic conventions like where underline normally is placed relative to the CJK embox, and thus whether the skip-ink feature would be usable as in a typical Latin-based descender-skipping context. Lined emphasis in CJK did not typically skip (or draw behind) the glyphs, but it does get positioned such that collisions are more rare, making it quite different from typical Latin underline. I would think whether or not the underline is contiguous is also a factor in determining behavior (and CJK context). Thoughts? |
I think this is what (Fonts designed primarily for CJK use could also set their I suppose one option would be to have browsers automatically switch underline positions between a default that's suitable for Latin text and a position for CJK text on a character-by-character basis, but I expect this could give very messy results for mixed-script content. Turning ink-skipping on/off based on the script of the text seems like a better mitigation, although the "right" solution is for authors to use the tools -- such as |
Given that there is not a single canonical algorithm for determining the boundaries of script runs in arbitrary text, and given that some implementors want to retain the freedom to make adjustments based on user feedback, I don’t think we can or should currently specify precise, normative requirements for how However, the spec could usefully include a (non-normative) note to offer guidance to implementors and pointers to the Unicode specifications that should provide the basis for behavior here. I've opened #4737 with some proposed draft text for consideration; I hope this will be a useful starting point. |
Agenda+ to review @jfkthame’s proposed changes in https://github.com/w3c/csswg-drafts/pull/4737/files |
I thought the WG agreed to gate this skipping behavior on the lang= attribute, rather than Unicode properties. Am I misremembering? |
The proposal from @jfkthame is fairly close to the same process you have to go through for OpenType layout: each character has to be assigned to a script, with common, inherited or unknown characters adopting the script of their neighbouring characters. These are OpenType Script codes rather than Unicode Script codes, but for the scripts discussed in the proposal there's no ambiguity in mapping between them. I don't think the exact algorithm is specified. I think lots of people use HarfBuzz these days, so perhaps someone who knows it can comment better (ref). I can't comment on whether the proposed algorithm is the right one. But if it is, and if you're doing OpenType layout and can retrieve the script property it assigns to each run of text, then reusing it for ink skipping seems to be a good idea. |
I think you're misremembering (at least according to my memory, which of course is fallible!) The trouble with gating on |
Ah, yes, the resolution in #4276 (comment) describes that it will be done via "other specifications", which seems to preclude using |
The CSS Working Group just discussed
The full IRC log of that discussion
<astearns> topic: Clarifying skip-ink:auto behavior in relation to CJK text
<astearns> github: https://github.com//issues/4276
<fantasai> https://github.com//pull/4737/files
<TabAtkins> fantasai: There's a PR from jfkthame about this issue. Looks correct to me, wanted to check with the WG
<fantasai> https://github.com//issues/4276
<TabAtkins> astearns: I see myles had a question that was answered in the issue
<TabAtkins> myles: It looks like an issue to me, not a PR
<fantasai> https://github.com//pull/4737/files
<TabAtkins> AmeliaBR: The last part of th eedit is an in-spec issue asking for other non-CJK scripts which want this behavior.
<TabAtkins> AmeliaBR: Is that something to discuss now, or leave until later?
<TabAtkins> fantasai: Later. That's a question for ipmls and i18n
<TabAtkins> astearns: So proposed reoslution is to accept the PR and close this issue. Objections?
<TabAtkins> RESOLVED: Accept PR #4737, close issue #4276 |
See Issue 8 in CSS Text Decoration Module Level 4.
Currently, it appears that WebKit and Blink browsers forcibly disable ink-skipping behavior (as if text-decoration-skip-ink:none were in effect) for a hardcoded list of “CJK” characters.
However, they disagree on the exact set of characters for which skipping is disabled. In particular, Blink seems to add a large but (apparently) rather ad hoc collection of punctuation and other symbols that are not especially associated with CJK text. Hence, these characters don’t get ink-skipping behavior in Chrome even though they may equally well be used in the context of Latin or other scripts, not just in CJK contexts. On the other hand, in WebKit they do get skipped, whether used in Latin or CJK contexts. (Compare testcase: https://jsfiddle.net/rhLjauq4/ in Chrome vs Safari.)
In Gecko’s implementation (not yet enabled in release builds, but available for testing in Firefox nightlies), we’ve taken a slightly different approach. Rather than a fixed set of characters for which ink-skipping is disabled, which is problematic precisely because of the large number of “common” characters — mainly punctuation and symbols — that are used in both CJK and non-CJK contexts, the decision whether to disable skip-ink is taken on a per-script-run basis. CJK ideographs are of course recognised as belonging to a CJK script, but in addition, “common” characters will be merged into the same script run when used in a CJK context, and so the same non-skipping behavior will be applied to them in CJK context only.
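The difference between a fixed character set and a per-script-run decision can be sketched roughly as follows. This is a hedged illustration only: `NO_SKIP_CHARS`, `skip_flags_fixed_set` and `skip_flags_per_run` are hypothetical names, the tiny character set stands in for a real hardcoded list, and the `(script, text)` runs are assumed to come from whatever script-run analysis the engine already performs.

```python
# Hypothetical stand-in for a hardcoded "never skip ink" list: two ideographs
# plus the dagger, which is a Script=Common character used in many scripts.
NO_SKIP_CHARS = set("漢字†")


def skip_flags_fixed_set(text: str) -> list[tuple[str, bool]]:
    """Per-character approach: a listed character never gets ink-skipping,
    whatever text surrounds it."""
    return [(ch, ch not in NO_SKIP_CHARS) for ch in text]


def skip_flags_per_run(runs: list[tuple[str, str]]) -> list[tuple[str, bool]]:
    """Per-run approach: every character in a run shares the run's decision.

    `runs` holds (script, text) pairs produced elsewhere by script-run
    analysis; the script labels used here are assumptions for illustration.
    """
    flags = []
    for script, run_text in runs:
        allow_skip = script != "CJK"  # disable skipping for CJK runs only
        flags.extend((ch, allow_skip) for ch in run_text)
    return flags
```

For the Latin word "gyp†", the fixed-set function reports that the dagger may not skip ink even though the neighbouring descenders do, whereas `skip_flags_per_run([("Latin", "gyp†")])` allows skipping for all four characters. Passing `[("CJK", "漢字†")]` disables skipping for the dagger together with the ideographs.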
I think it would be helpful for authors if there were agreement as to which characters are or aren’t eligible for skip-ink behavior when auto is in effect. Given the substantial number of ambiguous “common” characters, a simple partitioning of individual characters (as currently implemented in WebKit and Blink, AFAICT) is not a particularly good solution. I expect layout engines already do some kind of script-run analysis in order to handle font selection and shaping appropriately, and therefore using script runs as the basis for deciding when to disable ink skipping should not be overly burdensome, and I believe it results in more useful behavior. Would the WG and other browser developers be prepared to converge on this approach, and include it in the text-decoration spec?