Commit

Addressed #190 with @r12a's text.
aphillips committed Jan 23, 2019
1 parent 03a8ea2 commit b65d8ac
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions index.html
@@ -935,14 +935,13 @@ <h4>Canonical vs. Compatibility Equivalence</h4>
</section>
<section id="composition_decomposition">
<h4>Composition vs. Decomposition</h4>
-<p>These two types of Unicode-defined equivalence are then grouped by another pair of variations: "decomposition" and "composition". In "decomposition", separable logical parts of a visual character are broken out into a sequence of base characters and combining marks and the resulting code points are put into a fixed, canonical order. In "composition", the decomposition is performed and then combining marks are recombined to the extent possible with their base characters.</p>
+<p>These two types of Unicode-defined equivalence are then grouped by another pair of variations: "decomposition" and "composition". In "decomposition", separable logical parts of a visual character are broken out into a sequence of base characters and combining marks and the resulting code points are put into a fixed, canonical order. In "composition", the decomposition is performed and then combining marks are recombined according to certain rules with their base characters.</p>
<div class="warning">
<p>Roughly speaking, <abbr title="Normalization Form C">NFC</abbr> is defined such that each combining character sequence (a base character followed by one or more combining characters) is replaced, as far as possible, by a canonically equivalent precomposed character.</p>

-<p>It is rather important to notice what this does <strong>not</strong> mean. The resulting character sequence can still contain combining marks, since not all character sequences have a precomposed equivalent. Also, as we've seen, many scripts require the use of combining marks, such as the Devanagari vowels in <a href="#graphemeExample">this example</a>. In other cases, a given base character and combining mark is not replaced with a precomposed character because the combination is blocked by various normalization rules. For example, another combining mark might be between the two characters. Some scripts have specific exceptions to the composition rules.</p>
-
-<p>What NFC gives the user is a string that can be compared to other NFC strings for equality with the minimum number of combining marks for that purpose.</p>
+<p>It is rather important to notice what this does <strong>not</strong> mean. The resulting character sequence can still contain combining marks, since not all character sequences have a precomposed equivalent. Indeed, as we've seen, many scripts offer no alternative to the use of combining marks, such as the Devanagari vowels in <a href="#graphemeExample">this example</a>. In other cases, a given base character and combining mark is not replaced with a precomposed character because the combination is blocked by normalization rules. For example, some Indic scripts do not compose certain sequences of base plus diacritic, even though a matching precomposed character exists, due to composition exclusion rules. Composition may also be blocked by another combining mark between the two characters that would otherwise combine.</p>
</div>
+
</section>
<section id="normalization_forms">
<h4>Unicode Normalization Forms</h4>

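The cases described in the revised warning text can be checked with a short script. The following is a minimal sketch, not part of the commit, using Python's standard `unicodedata` module. The helper names `nfc` and `code_points` and the choice of sample strings are illustrative assumptions; the results depend on the Unicode data bundled with the Python build, although these particular mappings have been stable across Unicode versions.

```python
import unicodedata


def nfc(text):
    """Return text in Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """List the code points of text as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Sample inputs chosen for illustration; they are not taken from the spec text.
EXAMPLES = [
    # A precomposed equivalent exists, so NFC composes the pair to U+00E9.
    ("composes", "e\u0301"),
    # No precomposed "q with acute" exists, so the combining mark remains.
    ("no precomposed form", "q\u0301"),
    # U+0958 DEVANAGARI LETTER QA is a composition exclusion: NFC leaves it
    # decomposed as U+0915 U+093C even though the precomposed character exists.
    ("composition exclusion", "\u0958"),
    # U+0305 COMBINING OVERLINE has the same combining class (230) as
    # U+0301 COMBINING ACUTE ACCENT, so it blocks "a" + acute from composing.
    ("blocked by intervening mark", "a\u0305\u0301"),
]

for label, sample in EXAMPLES:
    print(f"{label}: {code_points(sample)} -> {code_points(nfc(sample))}")
```

In the last case, removing the overline lets the same base and acute accent compose to U+00E1, which is what makes it an instance of blocking rather than of a missing precomposed character.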