How much can be fixed using font rules or CGJ? #493

r12a · 2017-09-27T12:10:42Z

4.3 Examples
http://www.unicode.org/reports/tr53/

The Amiri font is able to reproduce the expected results for examples 2a and 2b without reordering the combining characters. I find myself wondering whether other things could be addressed this way.

example 2a in Amiri

example 2b

I also wonder how much further one can get with the use of CGJ.

behnam · 2017-10-03T19:05:00Z

@r12a, I think every desired sequence is possible with CGJ. Basically, inserting a CGJ between every pair of marks would, in many cases, make the result with UAOA the same as the sequence without UAOA.

Or maybe I misunderstood the question here?
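To make the CGJ point concrete, here is a minimal sketch using Python's standard `unicodedata` module. It only demonstrates canonical ordering under NFC, not UAOA itself, and the choice of BEH, SHADDA and FATHA as the test sequence is an assumption made for illustration. CGJ (U+034F) has combining class 0, so marks on either side of it are never reordered across it.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH (base letter)
FATHA = "\u064E"   # ARABIC FATHA, canonical combining class 30
SHADDA = "\u0651"  # ARABIC SHADDA, canonical combining class 33
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, canonical combining class 0

# Typed order: shadda first, then fatha.
plain = BEH + SHADDA + FATHA
# Same marks, with a CGJ inserted between them.
with_cgj = BEH + SHADDA + CGJ + FATHA

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_cgj = unicodedata.normalize("NFC", with_cgj)

# Without CGJ, canonical ordering sorts the marks by combining class,
# so fatha (30) ends up before shadda (33).
print([f"U+{ord(c):04X}" for c in nfc_plain])  # ['U+0628', 'U+064E', 'U+0651']

# With CGJ, the class-0 character splits the mark run and the typed order survives.
print([f"U+{ord(c):04X}" for c in nfc_cgj])    # ['U+0628', 'U+0651', 'U+034F', 'U+064E']
```

The same blocking effect should apply to any reordering step that works on runs of non-zero combining class, which is presumably why CGJ is suggested as the escape hatch.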

khaledhosny · 2017-10-03T20:20:23Z

Dealing with the broken canonical ordering of Arabic marks was very painful while developing Amiri; I don't even remember all the font hacks I had to do. In general, I'm personally in favor of a solution that works for the common uses without requiring special behavior from fonts or excessive use of CGJ.

behnam · 2017-10-03T20:38:39Z

@khaledhosny, in case it's not clear from my comments, I'm not against the algorithm in general. In fact, I believe it has potential, but it needs to be documented better and put in the right place.

From all the discussions, it sounds like it's only supposed to be used for text rendering (fonts), not for text processing, transformation, or transfer. If so, maybe it belongs in OpenType or another standard, as a normative step. The fact that it's being proposed as a generic algorithm for processing and editing Arabic text (that's how it looks at the moment, at least) is not a good thing for users of the script or for applications trying to understand and support it.

More importantly, the examples provided to justify the need for the algorithm are based on corner cases of a single use case: Quranic text. The document completely lacks any analysis of how the algorithm impacts average users across the variety of use cases of the script. In fact, by claiming to be "within the stability requirements of Unicode", it's claiming to have no effect on average use cases. (More in #495)

The selection rationale for the MCM list sounds arbitrary and needs data to support the claimed differences between "small seen" and "small meem" marks and such. The same is true for other similar claims in the document, which

behnam · 2017-10-13T22:04:35Z

I submitted a long individual feedback. Here's the part related to this issue:

4. Consequences of the Algorithm: Semantics

With UAOA applied on text during rendering, some strings collapse into a single sequence. Basically, there are plenty of strings X and Y, where toNFC(X) ≠ toNFC(Y), but UAOA(toNFC(X)) = UAOA(toNFC(Y)).

This changes the semantics of existing text encoded in Unicode, since the rendering will be different afterwards. The document is not clear about this semantic change and only claims to be “correcting” all the problems.

The proposal suggests using CGJ to preserve the old semantics when needed. The document needs to be clearer about how to do that. In fact, there should be a clear algorithm for converting a string X so that its semantics are preserved when the (rendering) interpretation changes, since for a couple of decades users have been storing text under the current semantics of the encoding, which has been the only way to do so recommended by Unicode.
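The collapse described in this feedback, where toNFC(X) ≠ toNFC(Y) but the reordered strings are equal, can be illustrated with a toy stand-in. The sketch below is an assumption for illustration only: `toy_reorder` and `FRONT_MARKS` are hypothetical and do not reproduce the actual UAOA rules or the MCM list in UTR #53. They only show how any rule that moves certain marks to the front of a mark run makes NFC-distinct strings indistinguishable afterwards.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH (base letter)
MARK_A = "\u06E1"  # ARABIC SMALL HIGH DOTLESS HEAD OF KHAH, combining class 230
MARK_B = "\u06DF"  # ARABIC SMALL HIGH ROUNDED ZERO, combining class 230

# Hypothetical set of marks the toy rule moves to the front of a mark run.
# This is NOT the MCM list from UTR #53.
FRONT_MARKS = {MARK_A}

def toy_reorder(text):
    """Toy stand-in for a rendering-time reordering step (not the real UAOA)."""
    out, run = [], []

    def flush():
        # Stable sort: marks in FRONT_MARKS first, everything else in typed order.
        out.extend(sorted(run, key=lambda ch: ch not in FRONT_MARKS))
        run.clear()

    for ch in text:
        if unicodedata.combining(ch):  # non-zero class: part of a mark run
            run.append(ch)
        else:                          # class 0 (base letter, CGJ, ...) ends the run
            flush()
            out.append(ch)
    flush()
    return "".join(out)

x = unicodedata.normalize("NFC", BEH + MARK_A + MARK_B)
y = unicodedata.normalize("NFC", BEH + MARK_B + MARK_A)

# Both marks share combining class 230, so NFC keeps their typed order.
print(x != y)                            # True
# After the toy reordering, the two strings are identical.
print(toy_reorder(x) == toy_reorder(y))  # True
```

Under this toy rule, inserting a CGJ between the two marks in Y would end the mark run early and keep Y distinct from X, which is the kind of conversion step the feedback asks the document to spell out.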

r12a · 2017-10-16T13:16:02Z

Set label to Waiting. Reassess comment after UTR53 has been updated.

r12a added pending Issue not yet sent to WG, or raised by tracker tool & needing labels. s:utr53 labels Sep 27, 2017

r12a added waiting and removed pending Issue not yet sent to WG, or raised by tracker tool & needing labels. labels Oct 16, 2017

r12a added the needs-resolution i18n expects this item to be resolved to their satisfaction. label Nov 16, 2018

r12a added close? The related issue was closed by the Group but open here and removed waiting labels Dec 4, 2020

aphillips closed this as completed Jul 22, 2021

r12a added the t:typ_misc 8.7 Miscellaneous label Aug 10, 2022
