Use of CGJ #498

Closed
r12a opened this issue Oct 4, 2017 · 3 comments
Labels
close? (The related issue was closed by the Group but open here) · needs-resolution (i18n expects this item to be resolved to their satisfaction.) · s:utr53 · t:typ_misc (8.7 Miscellaneous)

Comments

@r12a
Contributor

r12a commented Oct 4, 2017

4.2 Override Mechanism for Exceptions
http://www.unicode.org/reports/tr53/

The default display order implemented by the UAOA will be correct for most uses. However, there are situations where a different mark order is desired. In these cases, U+034F COMBINING GRAPHEME JOINER (CGJ) can be used to achieve the desired display order.
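For context, CGJ can override mark order because it has canonical combining class 0, so canonical reordering during normalization never moves a mark across it; that is its general role in the Unicode Standard, independent of UTR #53. A quick check with Python's standard `unicodedata` module, using beh + shadda + fatha as an arbitrary example (the constant and helper names are just for illustration):

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH (base letter, combining class 0)
SHADDA = "\u0651"  # ARABIC SHADDA, combining class 33
FATHA = "\u064E"   # ARABIC FATHA, combining class 30
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0

plain = BEH + SHADDA + FATHA
joined = BEH + SHADDA + CGJ + FATHA

def code_points(s: str) -> list[str]:
    return [hex(ord(c)) for c in s]

# Canonical ordering sorts adjacent marks by combining class,
# so NFC moves the fatha (30) in front of the shadda (33).
print(code_points(unicodedata.normalize("NFC", plain)))
# → ['0x628', '0x64e', '0x651']

# CGJ has combining class 0, so the marks on either side of it
# are never swapped and the author's order survives NFC.
print(code_points(unicodedata.normalize("NFC", joined)))
# → ['0x628', '0x651', '0x34f', '0x64e']
```

This only shows normalization behaviour; whether the UAOA treats CGJ the same way is what the question here is about.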

Question:
It isn't really clear to me, but I'm guessing that the expectation is that content authors will use CGJ in their content before the UAOA rules are applied, rather than as an additional step in ordering text per the UAOA rules.

Editorial suggestion:
If that assumption holds, I'm surprised that the UAOA algorithm in section 3.2 doesn't mention that CGJ is a non-starter. For example, the step "Move any shadda characters ... to the beginning of S" should presumably not apply to a shadda that is preceded by CGJ.
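A sketch of the shadda step under two assumptions the draft does not spell out: that S is a maximal run of characters with nonzero canonical combining class, and that any other character (including CGJ, whose combining class is 0) ends the run. `move_shadda_first` is a hypothetical name, not from UTR #53. Under these assumptions a shadda placed after CGJ starts its own run, so the step leaves it where the author put it:

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH
FATHA = "\u064E"   # ARABIC FATHA (combining class 30)
SHADDA = "\u0651"  # ARABIC SHADDA (combining class 33)
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER (combining class 0)

def move_shadda_first(text: str) -> str:
    """Hypothetical version of 'Move any shadda characters ... to the
    beginning of S', applied to every run S of characters with nonzero
    combining class. Any class-0 character, CGJ included, ends a run."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        # Shaddas go first; every other mark keeps its relative order.
        out.extend([c for c in run if c == SHADDA] + [c for c in run if c != SHADDA])
        run.clear()

    for ch in text:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)

# NFC order (fatha before shadda) is rearranged for display.
print(move_shadda_first(BEH + FATHA + SHADDA) == BEH + SHADDA + FATHA)  # True
# A shadda preceded by CGJ starts a new run, so nothing moves.
print(move_shadda_first(BEH + FATHA + CGJ + SHADDA) == BEH + FATHA + CGJ + SHADDA)  # True
```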

@r12a r12a added the pending (Issue not yet sent to WG, or raised by tracker tool & needing labels.) and s:utr53 labels Oct 4, 2017
@behnam
Member

behnam commented Oct 13, 2017

I submitted a long piece of individual feedback. Here's the part related to this issue:


4. Consequences of the Algorithm: Semantics

With UAOA applied to text during rendering, distinct strings can collapse into a single sequence. Basically, there are plenty of strings X and Y where toNFC(X) ≠ toNFC(Y), but UAOA(toNFC(X)) = UAOA(toNFC(Y)).

Basically, this changes the semantics of existing text encoded in Unicode, since its rendering will be different afterwards. The document is not clear about this semantic change and only claims to be “correcting” all the problems.
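One way such pairs X and Y arise: NFC keeps marks of equal combining class in their original relative order, so two strings that differ only in that order stay distinct after NFC, and a reordering step that then moves one of those marks to a fixed position erases the difference. The sketch below assumes, purely for illustration, a step that moves U+0654 ARABIC HAMZA ABOVE to the front of its run of marks; that choice of mark and the helper `move_marks_first` are hypothetical, not taken from the draft. U+0653 and U+0654 both have combining class 230.

```python
import unicodedata

BEH = "\u0628"           # ARABIC LETTER BEH
MADDAH_ABOVE = "\u0653"  # ARABIC MADDAH ABOVE (combining class 230)
HAMZA_ABOVE = "\u0654"   # ARABIC HAMZA ABOVE (combining class 230)

def move_marks_first(text: str, targets: set[str]) -> str:
    """Hypothetical reordering step: within each run of characters with
    nonzero combining class, move the characters in `targets` to the front."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        out.extend([c for c in run if c in targets] + [c for c in run if c not in targets])
        run.clear()

    for ch in text:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)

x = BEH + MADDAH_ABOVE + HAMZA_ABOVE
y = BEH + HAMZA_ABOVE + MADDAH_ABOVE
nx, ny = unicodedata.normalize("NFC", x), unicodedata.normalize("NFC", y)

# Equal combining classes are never swapped by NFC, so the two stay distinct.
print(nx != ny)  # True
# After the hypothetical step, both end up as the same sequence.
print(move_marks_first(nx, {HAMZA_ABOVE}) == move_marks_first(ny, {HAMZA_ABOVE}))  # True
```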

The proposal suggests using CGJ to preserve the old semantics when needed. The document needs to be clearer about how to do so. In fact, there should be a clear algorithm for converting a string X so that it keeps its semantics when the (rendering) interpretation changes, since for a couple of decades users have been storing text according to the current semantics of the encoding, which has been the only way of doing so that Unicode has recommended.
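One possible shape for such a conversion, sketched only for the shadda step and assuming (as the proposal implies) that CGJ ends a run of marks: insert CGJ before every shadda that follows some other mark in the same run, so the reordering has nothing left to move. `protect_shadda` and `move_shadda_first` are hypothetical names, not from UTR #53.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH
FATHA = "\u064E"   # ARABIC FATHA (combining class 30)
SHADDA = "\u0651"  # ARABIC SHADDA (combining class 33)
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER (combining class 0)

def move_shadda_first(text: str) -> str:
    """Hypothetical shadda step: move shaddas to the front of each run of
    characters with nonzero combining class (CGJ, class 0, ends a run)."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        out.extend([c for c in run if c == SHADDA] + [c for c in run if c != SHADDA])
        run.clear()

    for ch in text:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)

def protect_shadda(text: str) -> str:
    """Hypothetical conversion: insert CGJ before any shadda that follows
    another (non-shadda) mark in the same run, so the step has nothing to move."""
    out: list[str] = []
    other_mark_in_run = False
    for ch in text:
        if ch == SHADDA:
            if other_mark_in_run:
                out.append(CGJ)  # end the current run; this shadda starts a new one
                other_mark_in_run = False
        elif unicodedata.combining(ch):
            other_mark_in_run = True
        else:
            other_mark_in_run = False  # a class-0 character ends the run
        out.append(ch)
    return "".join(out)

stored = unicodedata.normalize("NFC", BEH + FATHA + SHADDA)
protected = protect_shadda(stored)
print(move_shadda_first(protected) == protected)  # True: order is preserved
print(protected.replace(CGJ, "") == stored)       # True: only CGJ was added
```

This covers a single step; a complete conversion would have to account for every reordering step in the algorithm and for how they interact.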

@r12a
Contributor Author

r12a commented Oct 16, 2017

The original comment above may be sent after UTR53 has been updated, if it is still relevant. Setting a status of Waiting for now.

@r12a r12a added the waiting label and removed the pending label (Issue not yet sent to WG, or raised by tracker tool & needing labels.) Oct 16, 2017
@r12a r12a removed the waiting label Dec 11, 2017
@r12a
Contributor Author

r12a commented Dec 11, 2017

Resend the comment.

@r12a r12a added the needs-resolution label (i18n expects this item to be resolved to their satisfaction.) Nov 16, 2018
@r12a r12a added the close? label (The related issue was closed by the Group but open here) Dec 4, 2020
@r12a r12a added the t:typ_misc label (8.7 Miscellaneous) Aug 10, 2022