U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' not always rendered correctly #125
Comments
Screenshot on my system, with buggy fonts highlighted in red: Creating these kinds of ligatures, especially RIAL and ALLAH, is very common in fonts. The bug here seems to be that the font assigns U+FDF2 to a ligature glyph for the second joining segment of the word ALLAH (which is LLAH), instead of creating a composed glyph for U+FDF2 using the ligature. CLDR data, which is our primary source for character support, lacks any information about ligatures (and their possible code points). Seeing that this bug is common, especially in the more open-source fonts, I think we can cover the topic in ALReq and maybe even provide an Annex with some details about the important ligatures and their implementation in fonts (like the detail here that the ligature doesn't get the U+FDF2 code point, but U+FDF2 uses the ligature). What do you think? |
Since U+FDF2 is a presentation form character, I think we shouldn’t say much more than discouraging the use of presentation forms in text input. As for the fonts, though they indeed break the glyph for U+FDF2, the ligatures for |
Right, @khaledhosny. True that we want to discourage them in text. So, the question is, do we want to cover the issue for the sake of improving font development processes and font products for the script? Since the topic is not exactly text layout, I think it could be a separate (wiki) document, or maybe an annex on font development. |
I agree this does not belong in the main document; an annex on Arabic font development best practices might be a good idea. |
My thinking is:
HTML code to test your fonts: @behnam and @khaled, +1 to cover font development best practices. |
The Unicode Standard 11.0.0 says the following in section 9.2 Arabic Presentation Forms-A: U+FB50–U+FDFF, Word Ligatures (this was added in Unicode 7.0.0):
|
I decided it was time for me to explore this a little more deeply. Here are some other results. I created a test page at: Here are some results I screen-captured on my Mac. Grey backgrounds, from a very quick scan, indicate things I think are probably incorrect. Essentially, this whole thing is quite broken, it seems. (Which is surprising given the content involved.) |
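For anyone who wants to reproduce this kind of comparison locally, here is a minimal sketch of a script that generates a test page. It is not the test page referred to above. The font list only reuses names mentioned in this thread, and `build_test_page`, `FONTS`, `SAMPLES`, and the output file name are hypothetical names chosen for illustration.

```python
from html import escape

# Fonts named in this thread; edit to match what is installed locally.
FONTS = ["Arial", "Courier New", "Amiri"]

# The code point under discussion next to two spelled-out sequences.
SAMPLES = {
    "U+FDF2": "\uFDF2",
    "alef lam lam heh": "\u0627\u0644\u0644\u0647",
    "with shadda and superscript alef": "\u0627\u0644\u0644\u0651\u0670\u0647",
}


def build_test_page(fonts=FONTS, samples=SAMPLES):
    """Return an HTML table with one row per font and one column per sample."""
    header = "<tr><th>font</th>" + "".join(
        f"<th>{escape(label)}</th>" for label in samples
    ) + "</tr>"
    rows = []
    for font in fonts:
        cells = "".join(
            f'<td style="font-family:{escape(font)};font-size:48px">{text}</td>'
            for text in samples.values()
        )
        rows.append(f"<tr><th>{escape(font)}</th>{cells}</tr>")
    return (
        '<!DOCTYPE html>\n<html lang="ar" dir="rtl">\n<meta charset="utf-8">\n'
        "<table>\n" + header + "\n" + "\n".join(rows) + "\n</table>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical output file name.
    with open("fdf2-test.html", "w", encoding="utf-8") as f:
        f.write(build_test_page())
```

Note that a browser silently falls back to another font when the named one is missing or lacks a glyph, so each cell should be checked (for example with the browser's developer tools) to confirm which font actually rendered it.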
Arial overcompensating by adding a double shadda/alif is very surprising (and somewhat hilarious) to me given how commonly that font is used. Then again, I guess very little about non-Latin text not working on computers should surprise me anymore 😩 |
My perception is that, contrary to what Unicode suggests, Arabic users expect bare [alef] lam lam heh to ligate and that is what almost all Arabic fonts do. Arabic non-God name words that would match the same sequence of letters are very uncommon to the extent that I never encountered any of them until I was researching this very issue. In Amiri I approached this from the other end; actively matching sequences that are unlikely to be the name of God and unligating them, e.g. خالله does not ligate, but فالله ligates while فالَله does not. |
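To make the three examples above concrete, here is a toy sketch of a "ligate unless the context looks unlikely" rule. It is not Amiri's actual OpenType logic, which is not described in this thread. The function name `should_ligate` and the set of allowed one-letter prefixes are assumptions picked only so that the three quoted words behave as described.

```python
# Bare alef, lam, lam, heh with no marks in between.
ALLAH_CORE = "\u0627\u0644\u0644\u0647"

# Hypothetical set of one-letter prefixes after which the ligature is kept
# (feh and waw). A real font would need a much more careful list.
ALLOWED_PREFIXES = {"\u0641", "\u0648"}


def should_ligate(word: str) -> bool:
    """Toy rule: ligate only a bare core, optionally after an allowed prefix."""
    if not word.endswith(ALLAH_CORE):
        # Covers the case where a vowel mark sits between the letters.
        return False
    prefix = word[: -len(ALLAH_CORE)]
    return prefix == "" or prefix in ALLOWED_PREFIXES


examples = {
    "khah + core": "\u062E\u0627\u0644\u0644\u0647",
    "feh + core": "\u0641\u0627\u0644\u0644\u0647",
    "feh + core with fatha on first lam": "\u0641\u0627\u0644\u064E\u0644\u0647",
}
for label, word in examples.items():
    print(label, should_ligate(word))
```

Under these assumptions the word with khah and the word with fatha are left unligated, while the word with feh ligates, matching the behaviour described in the comment.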
When I discussed this issue with @roozbehp he had some examples of Persian words that do this, IIRC. Just to lay it out, there are multiple issues here, of varying severity:
|
As @r12a notes in https://r12a.github.io/scripts/arabic/block#charFDF2 the compatibility decomposition for FDF2 is <alif, lam, lam, heh> (“≈ [isolated] 0627 0644 0644 0647”). While the (non normative) reference glyph is a ligature <alif, lam, lam, shadda, superscript alif, heh>, this hasn’t always been the case. In the Appendix H. New Characters of the Unicode Standard 1.1, the reference glyph used is a ligature <alif, lam, lam, heh> without shadda nor superscript alif. |
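The compatibility decomposition quoted above can be checked directly against the Unicode Character Database. The following is a small sketch using Python's standard `unicodedata` module; the helper name `compat_decomposition` is made up for this example, and the data reflects whatever Unicode version the local Python build ships with.

```python
import unicodedata

ALLAH_LIGATURE = "\uFDF2"


def compat_decomposition(ch):
    """Split the UCD decomposition field into (tag, list of hex code points)."""
    fields = unicodedata.decomposition(ch).split()
    tag = fields[0] if fields and fields[0].startswith("<") else None
    code_points = [f for f in fields if not f.startswith("<")]
    return tag, code_points


tag, code_points = compat_decomposition(ALLAH_LIGATURE)
print(unicodedata.name(ALLAH_LIGATURE))
print(tag, code_points)
for cp in code_points:
    # Name each letter in the decomposition.
    print(cp, unicodedata.name(chr(int(cp, 16))))

# NFKC normalization applies the same compatibility mapping.
nfkc = unicodedata.normalize("NFKC", ALLAH_LIGATURE)
print([f"{ord(c):04X}" for c in nfkc])
```

The decomposition contains neither shadda (U+0651) nor superscript alef (U+0670), so any text normalized with NFKC or NFKD loses whatever marks the font's FDF2 glyph happened to show.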
The production process changed between Unicode 2.x and 3.0. From that point on, different custom software was used with an entirely new collection of TrueType fonts. With many upgrades, both to the software and the font collection, that process is still very much in place today. Every update of the font collection bears the risk of unintentional changes, and not all of them are caught by reviewers. Therefore, it would take some digging to find out whether the change from a glyph matching the decomposition to a glyph adding shadda and alif was indeed intentional at the time. |
I was curious to see if any fonts have FDF2 as alif, lam, lam, heh without shadda and superscript alif. I managed to find a handful:
There are most probably more. Including these, there are also more typefaces that do not ligate <lam, lam, heh> (regardless of what FDF2 they have). Some of these do have an optional discretionary ligature feature that does the ligature.
There may also be fonts that do FDF2 with shadda but no superscript alif, like https://www.linotype.com/1079191/hasan-alquds-unicode-regular-product.html?site=webfonts&format=ot-ttf&branding=std |
U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' (ﷲ) is supposed to render as alef-lam-lam-heh (with diacritics), but in some fonts, including Courier New, the alef is missing.
http://www.fileformat.info/info/unicode/char/fdf2/fontsupport.htm
The code point could conceivably mean "the main l-l-h ligature in 'allah'"; however, the spec decomposes it as a-l-l-h, so all fonts should render the leading alef.