Add worked examples of case folding [I18N-ACTION-974] #214
Merge commits to my fork
- Add a new subsection illustrating the interplay between normalization and case folding
- Modify recommendation text to reference new examples
Here's how i'm framing the topic for myself. Does it help? (Either for planning your document explanation, or for showing me where i'm missing something.)

If you case fold the precomposed (NFC) character ᾌ [U+1F8C GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI] (which is the most common way to represent this combination of base and diacritic characters) and you just run case fold transformations, you end up with: [U+1F04 GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA + U+03B9 GREEK SMALL LETTER IOTA]

If you start from a fully decomposed (NFD) sequence representing the same letter, ᾌ [U+0391 GREEK CAPITAL LETTER ALPHA + U+0313 COMBINING COMMA ABOVE + U+0301 COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI], you end up with: [U+03B1 GREEK SMALL LETTER ALPHA + U+0313 COMBINING COMMA ABOVE + U+0301 COMBINING ACUTE ACCENT + U+03B9 GREEK SMALL LETTER IOTA]

Clearly, these two don't match code point for code point, and some normalisation will be necessary. However, in both of those cases, the acute accent is associated with the alpha base character.

If, however, you begin with the half-precomposed sequence ᾌ [U+1F88 GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI + U+0301 COMBINING ACUTE ACCENT], case folding gives: [U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI + U+03B9 GREEK SMALL LETTER IOTA + U+0301 COMBINING ACUTE ACCENT]

Here the acute accent follows the iota rather than the alpha, so this produces a sequence that can't be normalised to match the others!

The way to resolve this problem is to normalise all the text beforehand to NFD. The half-precomposed sequence then becomes [U+0391 + U+0313 + U+0301 + U+0345], identical to the fully decomposed sequence above. If you now case-fold that sequence, it produces a match for all cases.

What i'm still not clear about is why you need to double-normalise.
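The three starting points described in this comment can be checked with a short script. This is only a sketch using Python's built-in `str.casefold()` (full case folding) and `unicodedata.normalize()`; the helper `cps` and the variable names are made up for illustration and are not part of the document under review.

```python
import unicodedata


def cps(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Three canonically equivalent spellings of the same letter (names are illustrative).
precomposed = "\u1F8C"                   # ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
decomposed = "\u0391\u0313\u0301\u0345"  # fully decomposed (NFD) sequence
half = "\u1F88\u0301"                    # ALPHA WITH PSILI AND PROSGEGRAMMENI + acute

# Case fold only, with no normalisation at all.
for label, s in (("precomposed", precomposed),
                 ("decomposed", decomposed),
                 ("half-precomposed", half)):
    print(f"{label:17}", cps(s.casefold()))

# Normalise to NFD first, then case fold: all three now give the same sequence.
nfd_first = [unicodedata.normalize("NFD", s).casefold()
             for s in (precomposed, decomposed, half)]
print("NFD then fold    ", cps(nfd_first[0]))
print("all equal:", len(set(nfd_first)) == 1)
```

In the half-precomposed case the folded output has U+0301 after U+03B9, which is why no later normalisation can move the accent back onto the alpha.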
@r12a Thanks, that's a useful and clear explanation and I'll borrow heavily from it in my next revision. I just now posted my "work in progress" version.

Double normalization is needed for two different reasons. The first (pre-case-fold) normalization handles the prosgegrammeni problem. The second (post-case-fold) normalization handles all of the other cases that flow from case folding, by canonicalizing the code points and the ordering of combining marks (case folding can produce denormalized values for both). Hopefully the just-posted text makes that clearer, although more work is needed on that.
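A rough sketch of the two-step approach described here, in Python. The function names `caseless_key` and `caseless_match`, the `pre_normalize` switch, and the choice of NFC as the final form are assumptions made for illustration; the spec text may settle on different wording or a different target form.

```python
import unicodedata


def caseless_key(text, pre_normalize=True, form="NFC"):
    """Build a comparison key: optional NFD, then case fold, then normalise again.

    pre_normalize and form are hypothetical knobs used only in this sketch.
    """
    if pre_normalize:
        # First pass: handles the prosgegrammeni (U+0345) edge case.
        text = unicodedata.normalize("NFD", text)
    folded = text.casefold()
    # Second pass: case folding can leave the text denormalised.
    return unicodedata.normalize(form, folded)


def caseless_match(a, b, **options):
    """Compare two strings by their caseless keys."""
    return caseless_key(a, **options) == caseless_key(b, **options)


precomposed = "\u1F8C"
decomposed = "\u0391\u0313\u0301\u0345"
half = "\u1F88\u0301"

# Folding alone leaves different code point sequences for equivalent input.
print(precomposed.casefold() == decomposed.casefold())

# The post-fold normalisation is enough for these two...
print(caseless_match(precomposed, decomposed, pre_normalize=False))

# ...but not for the half-precomposed sequence, which needs the pre-fold NFD.
print(caseless_match(precomposed, half, pre_normalize=False))
print(caseless_match(precomposed, half))
```

With the pre-fold step switched off, only the sequences that keep the accent on the alpha compare equal; with it on, all three spellings produce the same key.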
Note, btw, that i think it really helps to have the names in addition to the code point values, for people to understand what's going on. (They are all marked up in my comment, so if you're able to see the edited version that may help you. If you can't, i could send you a plain text version.)
Btw, my Greek char app now has a casefold function (in the drop down menu). You'll also find it very useful to use the normalisation switch on the panel lower down, which allows you to transform the text to NFC or NFD, or prevents any normalisation as you paste/type if you set to 'None'. https://r12a.github.io/pickers/grek/
@r12a I always use the I will probably make a new subsection out of the prosgegrammeni note and add discussion of why the pre-normalization step is optional and the workaround for the affected characters. Then I need to update 3.2.2.3/3.2.2.4 to reflect that.
…ion with case fold. Added notes about the optional step. Adjusted text.
- Moved 'user-supplied values' next to 'syntactic content'
- Changed 'natural language content' to 'textual content'
- Added definition of 'natural language' with link to LTLI
- Modified all references to match
- Assorted cleanup. Some light editing of sections being addressed.
- Added a > 4 character example for hex notation (smiling cat)
concept of "localizable content" clear.
- Remove subscript 16 from hex range of code point and style consistently with document.
- Remove 'termref' from one anchor for consistent styling.
- Change example3 from 'coverImage' to 'downloadLocation' (which is not localizable)
- Fix spacing in example3
Adds new examples illustrating how casefolding and normalization interact.