From 4f0b1c09a4ac423ebcddda2e8231866e0772e8e2 Mon Sep 17 00:00:00 2001
From: "@aphillips"
Date: Fri, 27 Oct 2017 11:23:09 -0700
Subject: [PATCH] Extensive work on the examples, addition of new examples, styling, and minor word tweaks. Includes work on #122

---
 index.html | 223 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 146 insertions(+), 77 deletions(-)

diff --git a/index.html b/index.html
index 26be61b..6ae7ee8 100644
--- a/index.html
+++ b/index.html
@@ -368,10 +368,46 @@

Terminology and Notation

U+092F U+0942 U+0928 U+093F U+0915 U+094B U+0921). However, most users would identify this
- word as containing four units of text—यू, नि, को, and ड. Each of the
+ word as containing four units of text. Each of the
first three graphemes consists of two characters: a syllable and a modifying vowel character. So the word contains seven Unicode
- characters, but only four graphemes.

+ characters, but only four graphemes:
+
+ Word         यूनिकोड
+ Graphemes    यू | नि | को | ड
+ Code Points  य | ू | न | ि | क | ो | ड
+              U+092F | U+0942 | U+0928 | U+093f | U+0915 | U+094b | U+0921
+

Terminology Examples
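The distinction between code points and graphemes in this hunk can be checked with a short script. This is a minimal sketch, not part of the patch: the helper `approximate_graphemes` is a hypothetical name, and it only attaches combining marks (general category `M*`) to the preceding base character. That is a deliberate simplification of the grapheme cluster rules in UAX #29, adequate for this word but not for arbitrary text.

```python
import unicodedata


def approximate_graphemes(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a simplified stand-in for UAX #29 segmentation
    that only handles plain base + mark sequences.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc, and Me are all combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The word from the example, written as the seven code points it contains.
word = "\u092F\u0942\u0928\u093F\u0915\u094B\u0921"
code_points = [f"U+{ord(ch):04X}" for ch in word]
graphemes = approximate_graphemes(word)

print(len(word), code_points)
print(len(graphemes), graphemes)
```

Python's `len()` counts code points, so it reports seven for this word, while the grouping yields the four user-perceived units the text describes.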
@@ -1117,23 +1153,59 @@

Identical-Appearing Characters and the Limitations of Normalization

But two logically distinct characters or grapheme clusters can still look the same or very similar. When a pair of graphemes look identical (or very similar), they are called homographs. When a pair of graphemes look similar or are homographs but actually represent logically different characters or character sequences, they are said to be confusable.

- One example of this are the letters U+03A1 (Ρ), U+0420 (Р),
- and U+0050 (P). These letters look identical in most fonts (that is, they are homographs),
- but they are encoded separately as part of the
- alphabets used in the Greek, Cyrillic, and Latin scripts respectively. Unicode Normalization
- will not fold these characters together.
- Examples of identical or identical-seeming appearance can appear
- even within a single script. Some examples of this include:
-

+ + +

Examples of identical or identical-seeming appearance can appear even within a single script. This can take the form of similarly shaped characters, such as "0" and "O" or "l" and "1". But other scripts or the use of different compatibility characters can present much less readily distinguished variations. In some cases, Unicode Normalization brings these together, but in many other cases it does not.
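The claim that normalization merges some look-alikes but not others can be illustrated as follows. This is a sketch under stated assumptions: `normalization_merges` is a hypothetical helper, the three letters are the Greek, Cyrillic, and Latin ones cited in this patch, and the ANGSTROM SIGN pair is an example chosen here (not taken from the patch) of a case that normalization does fold.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_merges(a, b):
    """Return the normalization forms under which a and b become equal."""
    return [
        form
        for form in FORMS
        if unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
    ]


# Greek capital rho, Cyrillic capital er, Latin capital P.
rho, er, latin_p = "\u03A1", "\u0420", "\u0050"
print(normalization_merges(rho, er))
print(normalization_merges(rho, latin_p))

# ANGSTROM SIGN vs. LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom, a_ring = "\u212B", "\u00C5"
print(normalization_merges(angstrom, a_ring))
```

The cross-script letters stay distinct under every form, so detecting them needs a separate confusability check; the ANGSTROM SIGN pair is unified by all four forms.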

+

Characters that are identical or confusable in appearance can present spoofing and other security risks. This can be true within a single script or for similar characters in separate scripts. For further discussion and examples of homoglyphs and confusability,
@@ -1400,92 +1472,89 @@

Other Types of Equivalence

specific and shouldn't be overlooked by specifications or implementations as an additional consideration.

Another similar example is called digit shaping. Some scripts,
- such as Arabic, have their own digit characters for the numbers from 0 to 9.
+ such as Arabic or Thai, have their own digit characters for the numbers from 0 to 9.
In some Web applications, the familiar ASCII digits are replaced for display purposes with the local digit shapes. In other cases, the text actually might contain the Unicode characters for the local digits. Users attempting to search a document might expect that typing one form of digit will find the equivalent digits.
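One way a search feature might meet that expectation is to fold every decimal digit to its ASCII form before comparing. The sketch below is an assumption about how an implementation could do this, not something the specification prescribes: `fold_digits` is a hypothetical helper built on the standard `unicodedata.decimal` lookup.

```python
import unicodedata


def fold_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent.

    Hypothetical helper for search matching; other characters pass through.
    """
    folded = []
    for ch in text:
        # decimal() returns the default (None) for non-decimal characters.
        value = unicodedata.decimal(ch, None)
        folded.append(str(value) if value is not None else ch)
    return "".join(folded)


arabic_2017 = "\u0662\u0660\u0661\u0667"  # Arabic-Indic digits
thai_2017 = "\u0E52\u0E50\u0E51\u0E57"    # Thai digits

print(fold_digits(arabic_2017))
print(fold_digits(thai_2017))

# Unicode Normalization alone leaves these digits untouched.
print(unicodedata.normalize("NFKC", arabic_2017) == arabic_2017)
```

Folding both the query and the document text this way lets either digit form match the other, which normalization by itself does not provide.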

-