
Arabic numerals (Draft)

Najib Tounsi edited this page May 28, 2017 · 14 revisions

Topics to talk about

Different families
Origin
Issues related to Bidi, numeral+signs, etc.
Issues related to families (fonts, keyboards, etc.)
What else...

Arabic Numeral Writing

There are two main families of numerals in Arabic script. The first is the European digits, also known as Arabic digits (Unicode range U+0030-U+0039); the second is the Arabic-Indic digits (Unicode range U+0660-U+0669). The latter gave rise to a variant used for Persian and Urdu, the Extended (or Eastern) Arabic-Indic digits (Unicode range U+06F0-U+06F9), in which the digits 4, 5 and 6 have different glyphs. The following table summarizes those families:

Arabic Numerals

TODO: a few words here on the historical origin of these three families and why they differ, although they share the same Indian origin.

Digits on the first row are predominant in Western Arabic regions, while the second-row digits are used in most Middle East countries, sometimes along with the former. Persian (and Urdu) mostly use the third category.
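Because the three ranges listed above are each ten consecutive code points, converting between families is a fixed offset per digit. The following is a minimal Python sketch of that mapping; the function names are illustrative assumptions, not part of any standard API.

```python
# Code point of digit zero in each family (ranges as given in the text above).
EUROPEAN_ZERO = 0x0030               # 0-9
ARABIC_INDIC_ZERO = 0x0660           # U+0660-U+0669
EXTENDED_ARABIC_INDIC_ZERO = 0x06F0  # U+06F0-U+06F9


def digit_table(src_zero, dst_zero):
    """Build a str.translate table mapping one digit family onto another."""
    return {src_zero + i: dst_zero + i for i in range(10)}


def to_arabic_indic(text):
    """Replace European digits with Arabic-Indic digits (hypothetical helper)."""
    return text.translate(digit_table(EUROPEAN_ZERO, ARABIC_INDIC_ZERO))


def to_extended_arabic_indic(text):
    """Replace European digits with Extended Arabic-Indic digits."""
    return text.translate(digit_table(EUROPEAN_ZERO, EXTENDED_ARABIC_INDIC_ZERO))


def to_european(text):
    """Replace digits of either Arabic-Indic family with European digits."""
    table = digit_table(ARABIC_INDIC_ZERO, EUROPEAN_ZERO)
    table.update(digit_table(EXTENDED_ARABIC_INDIC_ZERO, EUROPEAN_ZERO))
    return text.translate(table)
```

Only the digits are touched; all other characters, including separators and signs, pass through unchanged.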

Arabic numbers are written with the least significant digit on the right and the most significant digit on the left. That arrangement is identical to the Western one, even though Arabic script is written from right to left. Numbers with many digits may use a decimal separator and a thousands separator.

Western digits use the comma (U+002C) and the full stop (U+002E) as decimal or thousands separators:

  • 1.234,5 in Western (francophone) regions
  • 1,234.5 in Middle-East (anglophone) regions

Thin space (U+2009) or narrow no-break space (U+202F) may also be used as a thousands separator.

Arabic-Indic numerals use two specific separators:

  • Arabic Decimal Separator ٫ (U+066B), which resembles a comma
  • Arabic Thousands Separator ٬ (U+066C)
  • Example: ١٬٢٣٤٫٥
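The example above can be reproduced by formatting a number in the anglophone convention and then substituting digits and separators. This is a minimal Python sketch under that assumption; `format_arabic_indic` is a hypothetical helper, not a library function, and real applications would more likely rely on locale data.

```python
ARABIC_DECIMAL_SEPARATOR = "\u066B"    # ٫
ARABIC_THOUSANDS_SEPARATOR = "\u066C"  # ٬

# Translation table: European separators and digits -> Arabic-Indic ones.
_TABLE = {
    ord(","): ARABIC_THOUSANDS_SEPARATOR,
    ord("."): ARABIC_DECIMAL_SEPARATOR,
}
_TABLE.update({0x0030 + i: 0x0660 + i for i in range(10)})


def format_arabic_indic(value, decimals=1):
    """Format a number with Arabic-Indic digits and Arabic separators."""
    # First produce the "1,234.5" form, then swap every character at once.
    western = f"{value:,.{decimals}f}"
    return western.translate(_TABLE)
```

For instance, `format_arabic_indic(1234.5)` yields the string shown in the example bullet. The digits stay in the same logical order as in the European form; only the code points change.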

An important fact to note here is the bidirectional category of these numbers.

  • European Digits (U+0030..U+0039) are of category "EN - European number",
  • Arabic-Indic Digits (U+0660..U+0669) are "AN - Arabic number",
  • Extended Arabic-Indic Digits (U+06F0..U+06F9) are classified "EN - European number", differently from their counterpart just above.
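These categories can be checked against the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module; the helper name `bidi_class_of_range` is an assumption made for illustration.

```python
import unicodedata


def bidi_class_of_range(first, last):
    """Return the set of bidi classes found in an inclusive code point range."""
    return {unicodedata.bidirectional(chr(cp)) for cp in range(first, last + 1)}


european = bidi_class_of_range(0x0030, 0x0039)
arabic_indic = bidi_class_of_range(0x0660, 0x0669)
extended_arabic_indic = bidi_class_of_range(0x06F0, 0x06F9)

print("European digits:", european)
print("Arabic-Indic digits:", arabic_indic)
print("Extended Arabic-Indic digits:", extended_arabic_indic)
```

Each range should report a single class, matching the list above: EN, AN and EN respectively.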

The difference in bidi category between Arabic-Indic digits and Extended Arabic-Indic digits is due to the difference in bidi behavior desired in Arabic vs. Persian. (TODO: the origin of this decision.)

As a consequence, a sentence like "Five is written ۵ in Iran and ٥ in Egypt" will be displayed as follows in an RTL context:

"‫Five is written ۵ in Iran and ٥ in Egypt‬"

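which may seem surprising: the Extended Arabic-Indic digit ۵ (EN) stays inside the left-to-right run of Latin text, while the Arabic-Indic digit ٥ (AN) splits that run in two.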

Arabic Numerals and other usages

Numbers do not always appear alone; they may come with other characters such as currency symbols, fraction signs, and decimal and/or thousands separators (math expressions are excluded here). Note that there are an Arabic percent sign ٪ (U+066A) and an Arabic-Indic per mille sign ؉ (U+0609), as well as the Arabic decimal separator ٫ (U+066B) and the Arabic thousands separator ٬ (U+066C), mostly used with Arabic-Indic digits. Numerals can also be separated by (or mixed with) spaces or other signs, e.g. phone numbers such as +12 34 56 78 89 or car licence plates such as 123 د‎ 4.

Particular attention is needed here. Firstly, numbers have weak directionality with regard to the Bidi algorithm. For example, alongside a number, certain characters that would otherwise behave as neutrals, such as the plus and minus signs and currency symbols, are likely to be treated as part of the number rather than as neutrals.
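The signs mentioned above carry their own weak bidi classes in the Unicode Character Database (for example ES for European separators, ET for European terminators, CS for common separators), which is what lets the Bidi algorithm attach them to an adjacent number. A minimal Python sketch that looks these classes up; the selection of characters is illustrative only.

```python
import unicodedata

# Characters that commonly accompany numbers (illustrative selection).
SIGNS = {
    "plus sign": "+",
    "hyphen-minus": "-",
    "dollar sign": "$",
    "percent sign": "%",
    "comma": ",",
    "full stop": ".",
    "Arabic percent sign": "\u066A",
    "Arabic decimal separator": "\u066B",
    "Arabic thousands separator": "\u066C",
    "space": " ",
}

# Map each name to the bidi class of its character.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in SIGNS.items()}

for name, cls in bidi_classes.items():
    print(f"{name}: {cls}")
```

Note that the space is classified as whitespace (WS), not as a number separator or terminator, so digit groups separated by spaces are not held together in the same way.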

Secondly, the placement of the accompanying signs and symbols may depend on the region, generally Middle East vs. North-West Africa. This is not to mention punctuation signs.

  • The percent sign is placed on the left, after the number (٪١٢ not ١٢٪), without a space (٪ ١٢). With European numbers, the % percent sign is sometimes used. @@ images to put here @@

  • Arabic decimal and thousands separators follow the same rules as for European numbers ( ١٬٢٣٤٫٥٦ ). European separators are used with European numbers (1.234,56 or 1,234.56).

  • Fractions could be written, for one half say, @@ 1/2 or 2/1 @@

Other points:

  • How can one tell whether a sign (space, comma ...) is a separator between numbers or a sign within a number? Is 12 34 56 78 90 a phone number or a sequence of separate numbers, which may be reordered in an RTL context? A tip is to use a syntax like 12.34.56.78.90 or 12-34-56-78-90 for phone numbers.

  • A string like the licence plate above, 123 د‎ 4, would require a tag or a control character, but this is not always desirable.

  • etc.
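As one possible illustration of the control-character approach mentioned above, a string can be wrapped in Unicode directional isolates so that its internal order does not depend on the surrounding text. This is a sketch only: the `isolate` helper is hypothetical, and whether isolates (rather than markup or other controls) are appropriate depends on the application.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text, direction="ltr"):
    """Wrap text in a directional isolate (hypothetical helper)."""
    return f"{_OPENERS[direction]}{text}{PDI}"


# A phone number kept as one left-to-right unit inside RTL text.
phone = isolate("12 34 56 78 90", "ltr")
```

The wrapped string carries two extra invisible characters, which is one reason this is not always desirable: they survive copy and paste and can interfere with searching or comparison.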

Note: We do not cover math expressions here. @@ See elsewhere @@

Other topics to talk about: keyboard layouts with respect to regions, which digits are used by default in different operating systems and applications ...