How much can be fixed using font rules or CGJ? #493

Closed
r12a opened this issue Sep 27, 2017 · 5 comments
Labels
close? The related issue was closed by the Group but open here needs-resolution i18n expects this item to be resolved to their satisfaction. s:utr53 t:typ_misc 8.7 Miscellaneous

Comments

@r12a
Contributor

r12a commented Sep 27, 2017

4.3 Examples
http://www.unicode.org/reports/tr53/

The Amiri font is able to reproduce the expected results for examples 2a and 2b without reordering the combining characters. I find myself wondering whether other things could be addressed this way.

[screenshot: example 2a in Amiri]

[screenshot: example 2b]

I also wonder how much further one can get with the use of CGJ.

@r12a r12a added pending Issue not yet sent to WG, or raised by tracker tool & needing labels. s:utr53 labels Sep 27, 2017
@behnam
Member

behnam commented Oct 3, 2017

@r12a, I think every desired sequence is possible with CGJ. Basically, inserting a CGJ between every pair of adjacent marks should make the result with UAOA the same as the original sequence without UAOA (in many cases).
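
As a rough illustration of why CGJ can pin mark order, here is a minimal Python sketch using only the standard `unicodedata` module. It shows the canonical reordering done by normalization, not UAOA itself. The choice of beh, shadda, and fatha is just an example, and the variable names are made up for illustration.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, canonical combining class 0
BEH = "\u0628"     # ARABIC LETTER BEH
SHADDA = "\u0651"  # ARABIC SHADDA, ccc 33
FATHA = "\u064E"   # ARABIC FATHA, ccc 30

# Without CGJ, normalization sorts the marks by combining class,
# so the typed order (shadda, fatha) becomes (fatha, shadda).
typed = BEH + SHADDA + FATHA
normalized = unicodedata.normalize("NFC", typed)

# With CGJ between the marks, each mark sits in its own run,
# so there is nothing to reorder and the typed order survives.
guarded = BEH + SHADDA + CGJ + FATHA
guarded_normalized = unicodedata.normalize("NFC", guarded)

print([f"U+{ord(c):04X}" for c in normalized])
print([f"U+{ord(c):04X}" for c in guarded_normalized])
```

Whether a given renderer then displays the CGJ-separated sequence as intended is a separate question that this sketch does not address.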

Or maybe I misunderstood the question here?

@khaledhosny

Dealing with the broken canonical ordering of Arabic marks was very painful while developing Amiri; I don't even remember all the font hacks I had to do. In general, I’m personally in favor of a solution that would work for the common uses without requiring special behavior from fonts or excessive use of CGJ.

@behnam
Member

behnam commented Oct 3, 2017

@khaledhosny, in case it's not clear from my comments, I'm not against the algorithm in general. In fact, I believe it has potential, but it needs to be documented better and put in the right place.

From all the discussions, it sounds like it's only supposed to be used for text rendering (fonts), not for text processing, transforming, or transferring. If so, maybe it belongs in OpenType or other standards, as a normative step. The fact that it's being proposed as a generic algorithm for processing and editing Arabic text (that's what it looks like at the moment, at least) is not a good thing for the users of the script or for applications trying to understand and support the script.

More importantly, the examples provided to justify the need for the algorithm are based on corner cases of a single use case: Quranic text. The document completely lacks any analysis of how the algorithm impacts average users across the variety of use cases of the script. In fact, by claiming to be "within the stability requirements of Unicode", it's claiming to have no effect on average use cases. (More in #495)

The selection rationale for the MCM list sounds arbitrary and needs data for the claims of differences between "small seen" and "small meem" marks and such. The same is true for other similar claims in the document, which

@behnam
Member

behnam commented Oct 13, 2017

I submitted a long individual feedback. Here's the part related to this issue:


4. Consequences of the Algorithm: Semantics

With UAOA applied on text during rendering, some strings collapse into a single sequence. Basically, there are plenty of strings X and Y, where toNFC(X) ≠ toNFC(Y), but UAOA(toNFC(X)) = UAOA(toNFC(Y)).

Basically, this is changing the semantics of existing text encoded in Unicode, since the rendering will be different afterwards. The document is not clear about this semantic change and only claims to be “correcting” all the problems.

The proposal suggests using CGJ to preserve the old semantics when needed. The document needs to be clearer about how to preserve the semantics. In fact, there should be a clear algorithm to convert a string X so that its semantics are preserved when the (rendering) interpretation changes, since for a couple of decades users have been storing text under the current semantics of the encoding, which has been the only way recommended by Unicode.
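
One possible shape for such a conversion, sketched in Python purely as an assumption and not taken from the UTR53 draft: insert a CGJ between every pair of adjacent combining marks, so that no run of marks is longer than one and nothing in it can be reordered. The helper name `guard_marks_with_cgj` is hypothetical. The sketch assumes that a CGJ (combining class 0) ends a run of marks for the rendering-time reordering just as it does for normalization.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0

def guard_marks_with_cgj(text: str) -> str:
    """Hypothetical helper: put a CGJ between adjacent marks with non-zero ccc."""
    out = []
    prev_is_mark = False
    for ch in text:
        is_mark = unicodedata.combining(ch) != 0
        if is_mark and prev_is_mark:
            out.append(CGJ)  # split the run of marks at this point
        out.append(ch)
        prev_is_mark = is_mark
    return "".join(out)

# Stored text already in NFC: beh, fatha (ccc 30), shadda (ccc 33).
stored = "\u0628\u064E\u0651"
converted = guard_marks_with_cgj(stored)
print([f"U+{ord(c):04X}" for c in converted])
```

A blanket conversion like this changes the stored code point sequence, which can affect searching and comparison. A real procedure would more likely insert CGJ only where the new interpretation would actually reorder the marks.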

@r12a r12a added waiting and removed pending Issue not yet sent to WG, or raised by tracker tool & needing labels. labels Oct 16, 2017
@r12a
Contributor Author

r12a commented Oct 16, 2017

Set label to Waiting. Reassess comment after UTR53 has been updated.

@r12a r12a added the needs-resolution i18n expects this item to be resolved to their satisfaction. label Nov 16, 2018
@r12a r12a added close? The related issue was closed by the Group but open here and removed waiting labels Dec 4, 2020
@r12a r12a added the t:typ_misc 8.7 Miscellaneous label Aug 10, 2022