How much can be fixed using font rules or CGJ? #493
Comments
@r12a, I think every sequence desired is possible with Or maybe I misunderstood the question here?
Dealing with the broken canonical ordering of Arabic marks was very painful while developing Amiri; I don't even remember all the font hacks I had to do. In general, I’m personally in favor of a solution that works for the common uses without requiring special behavior from fonts or excessive use of CGJ.
@khaledhosny, in case it's not clear from my comments, I'm not against the algorithm in general. In fact, I believe it has potential, but it needs to be documented better and put in the right place. From all the discussions, it sounds like it's only supposed to be used for text rendering (fonts), not for text processing, transforming, or transferring. If so, maybe it belongs in OpenType or other standards, as a normative step. The fact that it's being proposed as a generic algorithm for processing and editing Arabic text (that's how it looks at the moment, at least) is not a good thing for the users of the script or for applications trying to understand and support the script. More importantly, the examples provided to justify the need for the algorithm are based on corner cases of a single use case: Quranic text. The document completely lacks any analysis of how the algorithm impacts average users across the variety of use cases of the script. In fact, by claiming to be "within the stability requirements of Unicode", it's claiming to have no effect on average use cases. (More in #495) The selection rationale for the MCM list sounds arbitrary and needs data to back the claims of differences between "small seen" and "small meem" marks and such. The same is true for other similar claims in the document, which
I submitted a long piece of individual feedback. Here's the part related to this issue:
4. Consequences of the Algorithm: Semantics
With UAOA applied to text during rendering, some strings collapse into a single sequence. That is, there are plenty of strings X and Y where toNFC(X) ≠ toNFC(Y), but UAOA(toNFC(X)) = UAOA(toNFC(Y)). This changes the semantics of existing text encoded in Unicode, since the rendering will be different afterwards. The document is not clear about this semantic change and only claims to be “correcting” all the problems. The proposal suggests using CGJ to preserve the old semantics when needed. The document needs to be clearer about how to preserve the semantics. In fact, there should be a clear algorithm for converting a string X so that its semantics are preserved when the (rendering) interpretation changes, since for a couple of decades users have been storing text under the current semantics of the encoding, which has been the only way to do so recommended by Unicode.
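To make the collapse described above concrete, here is a minimal Python sketch. It does not implement UAOA or the UTR #53 algorithm; `toy_reorder` and the `PRIORITY` set are hypothetical stand-ins that only mimic the general idea of moving selected marks to the front of a run of marks that share a canonical combining class. The two sample strings differ only in the order of two class-230 marks, so NFC keeps them distinct, while the toy render-time reordering maps both to the same sequence.

```python
import unicodedata
from itertools import groupby

# Hypothetical priority set, for illustration only (not the real MCM list).
PRIORITY = {"\u0654"}  # ARABIC HAMZA ABOVE

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

def toy_reorder(text: str) -> str:
    """Toy stand-in for a render-time reordering step (NOT UAOA itself).

    Within each run of marks sharing the same non-zero combining class,
    marks in PRIORITY are moved to the front; relative order is otherwise
    preserved because list.sort is stable.
    """
    out = []
    for ccc, run in groupby(text, key=unicodedata.combining):
        chars = list(run)
        if ccc != 0:
            chars.sort(key=lambda ch: ch not in PRIORITY)
        out.extend(chars)
    return "".join(out)

# BEH followed by MADDAH ABOVE and HAMZA ABOVE, in both orders.
X = "\u0628\u0653\u0654"
Y = "\u0628\u0654\u0653"

distinct_under_nfc = nfc(X) != nfc(Y)
collapsed_by_toy = toy_reorder(nfc(X)) == toy_reorder(nfc(Y))
print(distinct_under_nfc, collapsed_by_toy)
```

Any text that relied on the distinction between X and Y before such a step is introduced would need an explicit conversion to keep rendering as it did.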
Set label to Waiting. Reassess comment after UTR53 has been updated.
4.3 Examples
http://www.unicode.org/reports/tr53/
The Amiri font is able to reproduce the expected results for examples 2a and 2b without reordering the combining characters. I find myself wondering whether other things could be addressed this way.
example 2a in Amiri
example 2b
I also wonder how much further one can get with the use of CGJ.
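As a rough illustration of what CGJ (U+034F COMBINING GRAPHEME JOINER) can do at the encoding level, the following Python sketch uses the standard `unicodedata` module to show that normalization reorders shadda and fatha after a base letter, and that a CGJ placed between them keeps the typed order. The sample strings are chosen for illustration and are not taken from the UTR #53 examples; how a given font then renders the CGJ sequence is a separate question that this sketch does not address.

```python
import unicodedata

BEH, FATHA, SHADDA, CGJ = "\u0628", "\u064E", "\u0651", "\u034F"

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

# Shadda typed before fatha: canonical ordering sorts marks by combining
# class, so the fatha (lower class) ends up before the shadda.
plain = BEH + SHADDA + FATHA
reordered = nfc(plain)

# CGJ has combining class 0, so it splits the run of marks and the
# typed order survives normalization.
with_cgj = BEH + SHADDA + CGJ + FATHA
preserved = nfc(with_cgj)

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in preserved])
```

The cost is an extra invisible character in the stored text, which is part of why heavy reliance on CGJ is seen as undesirable in this thread.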