diff --git a/index.html b/index.html
index 26be61b..6ae7ee8 100644
--- a/index.html
+++ b/index.html
@@ -368,10 +368,46 @@
+Word | यूनिकोड
+Graphemes | यू | नि | को | ड
+Code Points | य | ू | न | ि | क | ो | ड
+ | U+092F | U+0942 | U+0928 | U+093F | U+0915 | U+094B | U+0921
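The relationship between the rows of the table above can be sketched in Python. This is a minimal illustration, not part of the change itself: the names `WORD`, `code_points`, and `approximate_clusters` are hypothetical, and the clustering rule (attach every combining mark to the preceding character) is a deliberate simplification that happens to work for this word. It is not the full grapheme cluster algorithm of UAX #29.

```python
import unicodedata

# The word from the table, written with escapes so each code point is explicit.
WORD = "\u092F\u0942\u0928\u093F\u0915\u094B\u0921"


def code_points(text):
    """Return the U+XXXX label for every code point in text."""
    return [f"U+{ord(ch):04X}" for ch in text]


def approximate_clusters(text):
    """Group each combining mark (general category M*) with the character before it.

    Simplified assumption: this ignores the other rules of UAX #29.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    print(code_points(WORD))
    print(approximate_clusters(WORD))
```

The seven code points collapse into four user-perceived units because the three vowel signs are combining marks that attach to the preceding consonant.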
confusable.
-One example of this are the letters U+03A1 (Ρ), U+0420 (Р),
- and U+0050 (P). These letters look identical in most fonts (that is, they are homographs),
- but they are encoded separately as part of the
- alphabets used in the Greek, Cyrillic, and Latin scripts respectively. Unicode Normalization
- will not fold these characters together.
Examples of identical or identical-seeming appearance can appear
- even within a single script. Some examples of this include:
-U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE ࢡ which is similar to the sequence U+0628 U+0654 بٔ
- (ARABIC LETTER BEH followed by ARABIC HAMZA ABOVE)
-U+17D2 KHMER SIGN COENG such as U+17D2 U+178F
- (ក្ត) and U+17D2 U+178A (ក្ដ) (each shown here, for legibility, with the
- base character U+1780 KHMER LETTER KA ក)
-U+0133 LATIN SMALL LIGATURE IJ ij (versus individual letters ij in sequence)
Examples of identical or identical-seeming appearance can appear even within a single script. This can take the form of similarly shaped characters, such as "0" and "O" or "l" and "1". But other scripts or the use of different compatibility characters can present much less readily distinguished variations. In some cases, Unicode Normalization brings these together, but in many other cases it does not.
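The point that normalization folds some look-alikes but not others can be checked with Python's standard `unicodedata` module. This is a small sketch. The helper name `same_after` is hypothetical, and the pairs tested are only the ones mentioned in this hunk.

```python
import unicodedata


def same_after(form, a, b):
    """Return True if a and b are identical once both are normalized to form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# U+0133 LATIN SMALL LIGATURE IJ is a compatibility character:
# NFKC folds it to the two letters "ij", but NFC leaves it alone.
ligature_nfkc = same_after("NFKC", "\u0133", "ij")
ligature_nfc = same_after("NFC", "\u0133", "ij")

# U+08A1 has no decomposition, so it never matches U+0628 U+0654.
beh_hamza = same_after("NFKC", "\u08A1", "\u0628\u0654")

# Greek U+03A1, Cyrillic U+0420, and Latin U+0050 stay distinct in every form.
cross_script = same_after("NFKC", "\u03A1", "\u0420") or same_after("NFKC", "\u0420", "P")

# Similarly shaped ASCII characters are also untouched.
zero_vs_o = same_after("NFKC", "0", "O")

if __name__ == "__main__":
    print(ligature_nfkc, ligature_nfc, beh_hamza, cross_script, zero_vs_o)
```

Only the first comparison succeeds, so an application that needs to detect confusable strings cannot rely on normalization alone.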
+Characters that are identical or confusable
in appearance can present spoofing and
other security risks. This can be true within a single script or for similar characters in
separate scripts. For further discussion and examples of homoglyphs and confusability,
@@ -1400,92 +1472,89 @@
 Another similar example is called digit shaping. Some scripts,
- such as Arabic, have their own digit characters for the numbers from 0 to 9.
+ such as Arabic or Thai, have their own digit characters for the numbers from 0 to 9.
 In some Web applications, the familiar ASCII digits are replaced for display purposes with the local digit shapes. In other cases, the text actually might contain the Unicode characters for the local digits. Users attempting to search a document might expect that typing one form of digit will find the equivalent digits.
-
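One way a search feature might meet the expectation described above is to fold every decimal digit to its ASCII equivalent before comparing. The sketch below is an assumption about how that could be done, not a technique the document prescribes. The names `fold_digits` and `digit_insensitive_find` are hypothetical. It relies on `unicodedata.decimal`, which returns a value only for characters that have a decimal digit value.

```python
import unicodedata


def fold_digits(text):
    """Replace every Unicode decimal digit in text with its ASCII digit."""
    folded = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digit characters
        folded.append(str(value) if value is not None else ch)
    return "".join(folded)


def digit_insensitive_find(document, query):
    """Return the index of query in document, ignoring digit shape, or -1."""
    return fold_digits(document).find(fold_digits(query))


ARABIC_INDIC_123 = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three
THAI_123 = "\u0E51\u0E52\u0E53"          # Thai digits one, two, three

if __name__ == "__main__":
    print(fold_digits(ARABIC_INDIC_123), fold_digits(THAI_123))
    print(digit_insensitive_find("page " + THAI_123, "123"))
```

Because each digit maps to exactly one ASCII character, the index returned by the folded search is also valid in the original text.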