Add worked examples of case folding [I18N-ACTION-974] #214

aphillips · 2020-12-02T00:57:24Z

Adds new examples illustrating how case folding and normalization interact.

Merge commits to my fork

- Add a new subsection illustrating the interplay between normalization and case folding
- Modify recommendation text to reference new examples

r12a · 2020-12-04T16:52:29Z

Here's how i'm framing the topic for myself. Does it help? (Either for planning your document explanation, or for showing me where i'm missing something.)

If you start from the precomposed (NFC) character ᾌ [U+1F8C GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI] (which is the most common way to represent this combination of base and diacritic characters) and you just run the case fold transformations, you end up with:
ἄι [U+1F04 GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA + U+03B9 GREEK SMALL LETTER IOTA]

If you start from a fully decomposed (NFD) sequence representing the same letter, ᾌ [U+0391 GREEK CAPITAL LETTER ALPHA + U+0313 COMBINING COMMA ABOVE + U+0301 COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI]
you end up with
ἄι [U+03B1 GREEK SMALL LETTER ALPHA + U+0313 COMBINING COMMA ABOVE + U+0301 COMBINING ACUTE ACCENT + U+03B9 GREEK SMALL LETTER IOTA]

Clearly, these two don't match, and some normalisation will be necessary. However, in both of those cases, the acute accent is associated with the alpha base character.
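
A minimal Python sketch of the two cases above, assuming that `str.casefold()` applies Unicode full case folding and that `unicodedata.normalize` covers these characters in the Python build being used. The `code_points` helper is only there for display and is not part of any proposed text.

```python
import unicodedata

def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Precomposed (NFC) form: GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
precomposed = "\u1F8C"
# Fully decomposed (NFD) form of the same letter
decomposed = unicodedata.normalize("NFD", precomposed)

folded_from_nfc = precomposed.casefold()
folded_from_nfd = decomposed.casefold()

print(code_points(decomposed))       # U+0391 U+0313 U+0301 U+0345
print(code_points(folded_from_nfc))  # U+1F04 U+03B9
print(code_points(folded_from_nfd))  # U+03B1 U+0313 U+0301 U+03B9

# The folded strings differ code point for code point...
print(folded_from_nfc == folded_from_nfd)  # False
# ...but normalising the folded result brings them back together.
print(unicodedata.normalize("NFD", folded_from_nfc) == folded_from_nfd)  # True
```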

If, however, you begin with the half-precomposed sequence ᾌ [U+1F88 GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI + U+0301 COMBINING ACUTE ACCENT]
you end up with
ἀί [U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI + U+03B9 GREEK SMALL LETTER IOTA + U+0301 COMBINING ACUTE ACCENT]
where the acute accent is associated with the iota.

This produces a sequence that can't be normalised to match the others!
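
The same point as a rough Python check, again assuming `str.casefold()` performs full case folding. The two inputs are canonically equivalent before folding, but once folded no normalisation form makes them equal, because the acute accent has moved onto the iota.

```python
import unicodedata

full = "\u1F8C"        # alpha with psili, oxia and prosgegrammeni (precomposed)
half = "\u1F88\u0301"  # alpha with psili and prosgegrammeni + combining acute accent

# Before folding, the two strings are canonically equivalent.
same_before = unicodedata.normalize("NFD", full) == unicodedata.normalize("NFD", half)
print(same_before)  # True

folded_full = full.casefold()  # U+1F04 U+03B9
folded_half = half.casefold()  # U+1F00 U+03B9 U+0301

# After folding, try every normalisation form on both results.
results = {
    form: unicodedata.normalize(form, folded_full) == unicodedata.normalize(form, folded_half)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(results)  # every value is False
```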

The way to resolve this problem is to normalise all the text beforehand to NFD. Then
ᾌ [U+1F8C GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI]
and
ᾌ [U+1F88 GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI + U+0301 COMBINING ACUTE ACCENT]
both end up the same as the decomposed version, i.e.
ᾌ [U+0391 GREEK CAPITAL LETTER ALPHA + U+0313 COMBINING COMMA ABOVE + U+0301 COMBINING ACUTE ACCENT + U+0345 COMBINING GREEK YPOGEGRAMMENI]

If you now case-fold that sequence, it produces a match for all cases.
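
A short sketch of that fix in Python, under the same assumption about `str.casefold()`. The dictionary keys are just labels for this illustration.

```python
import unicodedata

inputs = {
    "precomposed": "\u1F8C",
    "half-precomposed": "\u1F88\u0301",
    "decomposed": "\u0391\u0313\u0301\u0345",
}

# Normalise to NFD first, then case fold.
folded = {
    label: unicodedata.normalize("NFD", text).casefold()
    for label, text in inputs.items()
}

for label, text in folded.items():
    print(label, " ".join(f"U+{ord(c):04X}" for c in text))

# All three inputs now fold to the same sequence:
# U+03B1 U+0313 U+0301 U+03B9
print(len(set(folded.values())) == 1)  # True
```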

What i'm still not clear about is why you need to double-normalise.

aphillips · 2020-12-04T17:11:27Z

@r12a Thanks, that's a useful and clear explanation and I'll borrow heavily from it in my next revision. I just now posted my "work in progress" version.

The need for double normalization is this: the first (pre-case-fold) normalization handles the prosgegrammeni problem. The second (post-case-fold) normalization handles all of the other cases that flow from case folding, by canonicalizing the code points and the ordering of combining marks (the case fold itself can produce denormalized values). Hopefully the just-posted text makes that clearer (although more work is needed on that).
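
As a rough illustration of what each step contributes, here is a Python sketch. The function names are hypothetical, NFD is assumed as the target form, and `str.casefold()` is assumed to implement full case folding.

```python
import unicodedata

def fold_then_normalize(s):
    """Post-fold normalization only: NFD(casefold(s))."""
    return unicodedata.normalize("NFD", s.casefold())

def normalize_fold_normalize(s):
    """Both steps: NFD(casefold(NFD(s)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

# The post-fold step alone handles ordinary precomposed/decomposed differences,
# e.g. U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE vs. 'a' + U+030A COMBINING RING ABOVE.
ring_match = fold_then_normalize("\u00C5") == fold_then_normalize("a\u030A")
print(ring_match)  # True

# It does not handle the prosgegrammeni case...
greek_post_only = fold_then_normalize("\u1F8C") == fold_then_normalize("\u1F88\u0301")
print(greek_post_only)  # False

# ...which needs the pre-fold normalization as well.
greek_both = normalize_fold_normalize("\u1F8C") == normalize_fold_normalize("\u1F88\u0301")
print(greek_both)  # True
```

Skipping the first step is only safe when the input cannot contain U+0345 or a character that decomposes to it, which is the situation the optional-step discussion is about.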

r12a · 2020-12-04T17:16:53Z

Note, btw, that i think it really helps to have the names in addition to the code point values, for people to understand what's going on. (They are all marked up in my comment, so if you're able to see the edited version that may help you. If you can't, i could send you a plain text version.)

r12a · 2020-12-04T17:21:21Z

Btw, my Greek char app now has a casefold function (in the drop down menu). You'll also find it very useful to use the normalisation switch on the panel lower down, which allows you to transform the text to NFC or NFD, or prevents any normalisation as you paste/type if you set to 'None'. https://r12a.github.io/pickers/grek/

aphillips · 2020-12-04T17:39:05Z

@r12a I always use the char [U+1234 NAME GOES HERE] for individual characters (with styles) when done.

I will probably make a new subsection out of the prosgegrammeni note and add discussion of why the pre-normalization step is optional and the workaround for the affected characters. Then I need to update 3.2.2.3/3.2.2.4 to reflect that.

- Moved 'user-supplied values' next to 'syntactic content'
- Changed 'natural language content' to 'textual content'
- Added definition of 'natural language' with link to LTLI
- Modified all references to match
- Assorted cleanup. Some light editing of sections being addressed.
- Added a > 4 character example for hex notation (smiling cat)

- Remove subscript 16 from hex range of code point and style consistently with document.
- Remove 'termref' from one anchor for consistent styling.
- Change example3 from 'coverImage' to 'downloadLocation' (which is not localizable)
- Fix spacing in example3

aphillips added 2 commits October 9, 2020 08:55

Merge pull request #2 from w3c/gh-pages

176e8e9

Merge commits to my fork

Add worked examples of casefold matching [I18N-ACTION-974]

45ccdb8

- Add a new subsection illustrating the interplay between normalization and case folding
- Modify recommendation text to reference new examples

aphillips requested a review from r12a December 2, 2020 00:57

Work in progress from 2020-12-03 telecon

bc9d5cd

aphillips added 6 commits December 11, 2020 13:37

Incorporate Richard's suggest text with many many edits to boot.

b3b4712

Added additional examples of the compatibility normalization interact…

fc619b7

…ion with case fold. Added notes about the optional step. Adjusted text.

Fix a typo in section 3.2.7

8f3f242

Significant edits and rewrites of the terminology section to make the

5a88ebb

concept of "localizable content" clear.

aphillips merged commit 21284b0 into w3c:gh-pages May 6, 2021
