Confusable definition #127
Comments
Sounds good. I would introduce the term "exact homoglyph/homograph" for code points that look truly identical, and point to the UCD file "Intentional.txt" that lists all of them. I would add to the examples: 01DD and 0259, and/or the three cases of capital D with stroke. These look very nicely identical and are more easily understood by readers used to the Latin script (which, by definition, includes anyone reading this text in the original English).
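For readers who want to inspect the suggested code points, here is a minimal sketch using Python's standard `unicodedata` module. The first pair (01DD and 0259) comes from the comment above. The second group is only a guess at what "the three cases of capital D with stroke" refers to (U+00D0, U+0110, U+0189); the thread does not confirm it. The helper names `describe` and `survives_normalization` are made up for illustration.

```python
import unicodedata

# First group: named explicitly in the comment.
# Second group: an ASSUMPTION about which three "D with stroke" letters
# were meant; the thread does not list them.
LOOKALIKE_GROUPS = [
    ["\u01DD", "\u0259"],
    ["\u00D0", "\u0110", "\u0189"],
]


def describe(ch):
    """Return the code point and its Unicode character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def survives_normalization(group, form="NFKC"):
    """True if every character in the group is still distinct after normalizing."""
    normalized = {unicodedata.normalize(form, ch) for ch in group}
    return len(normalized) == len(group)


for group in LOOKALIKE_GROUPS:
    for ch in group:
        print(describe(ch))
    print("  still distinct after NFKC:", survives_normalization(group))
```

If the characters in a group remain distinct even under NFKC, normalization alone cannot be relied on to make them compare equal, which is the limitation the section under discussion describes.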
@r12a: adopted your wording with revisions to make it consistent with the preceding introduction of the term homograph. @asmusf: what UCD file is that? 10.0.0 doesn't have it and I don't see it in UTR36. Where am I forgetting to look? I could add more examples, but hunting about for the characters and such seems overkill. The 01DD and 0259 example is not nearly as familiar as the "P" example given and isn't particularly different from the P example (which security folks remember as part of the "paypal bug" in IDNA or which causes consternation because one is an ASCII letter).
On 10/25/2017 8:41 PM, Addison Phillips wrote:
security
suggest 'still' -> 'also'
This seems to be suggesting that homographs and confusables are different things, and that the logical difference only applies to confusables, which I find confusing. I think this needs more work. Btw, for the P example, you may want to say that they actually represent different letters of the alphabet, i.e. the pronunciation is different from the Latin. It's not just that there are copies of the same letter for each alphabet.
Homographs and confusables are (slightly) separate concepts. There are confusables that are not exact homographs (1 vs. lowercase-L). There are homographs that are not confusable (À vs À, where one is U+00C0 and one is U+0041 U+0300) because they are logically the same thing. The "P" example was difficult to convey. I was looking for a way to say that more elegantly than I ended up with. Perhaps go from:
To:
I wanted to mention that they were separate alphabets to draw attention to the fact that each alphabet is complete unto itself. Some letters, after all, are more closely related between the separate scripts.
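The distinction drawn in this comment can be checked with a small sketch, again using Python's standard `unicodedata` module. The À code points are the ones given above. For the "P" example, the choice of U+0050 (Latin) against U+0420 (Cyrillic ER) is an assumption about which characters the spec text uses, and `same_after_nfc` is a hypothetical helper name.

```python
import unicodedata


def same_after_nfc(a, b):
    """Compare two strings after applying NFC to both."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00C0"       # À as a single code point
decomposed = "\u0041\u0300"  # A followed by COMBINING GRAVE ACCENT
latin_p = "\u0050"           # LATIN CAPITAL LETTER P
cyrillic_er = "\u0420"       # CYRILLIC CAPITAL LETTER ER (assumed "P" example)

# Logically the same thing: normalization makes the two forms equal.
print(same_after_nfc(precomposed, decomposed))  # True

# Logically different letters that look alike: normalization does not help.
print(same_after_nfc(latin_p, cyrillic_er))     # False

# Confusable but not an exact homograph: digit one vs. lowercase L.
print(same_after_nfc("1", "l"))                 # False
```

The first comparison is the case normalization is designed for; the other two remain distinct after normalization and have to be handled by other means.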
But that's not what the text says. How about this: One or more graphemes that look identical (or very similar) are called homographs. The character sequences underlying homographs may be alternative ways of expressing the same logical grapheme, or may represent different graphemes that just happen to look alike. In the latter case, the character sequences are said to be "confusable".
Btw, we need to think about making images for the examples, because (a) the different Ps may actually look less identical if your system substitutes different fonts, and (b) even my system doesn't display the ARABIC LETTER BEH WITH HAMZA ABOVE, so good luck to anyone else in understanding the point there ;-)
Regarding your previous comment, I think that you make a good point and I'll make the change.
Would you believe Times New Roman?
It appears to be working in my browser now, even though my browser's default font is set to something else. Did you add the letter to the webfont?
I have not yet regenerated the webfont. Could be either (a) a fix to iOS or (b) I got the fallback order in the font names correct ;-). I will regenerate the webfont as part of the cleanup for publication.
The latter comments were veering off-track, so I raised a new issue at #188. I'm happy to close the current issue (discussion about confusables).
2.3 Identical-Appearing Characters and the Limitations of Normalization
https://w3c.github.io/charmod-norm/#normalizationLimitations
Shouldn't that say 'two logically different characters'?
Or perhaps better:
When graphemes look similar but actually represent things that are logically different, they are said to be confusable.