diff --git a/index.html b/index.html
index ec4442a..4bb2297 100644
--- a/index.html
+++ b/index.html
@@ -223,18 +223,18 @@

Direction set from right to left.

-

Unicode Bidirectional Algorithm - [[!BIDI]] details an algorithm for rendering right-to-left text and covers a myriad of - situations in mixing different kinds of characters. A simpler explanation of the basics of - the algorithm exists in the W3C article Unicode +

Unicode Bidirectional Algorithm (or + bidi algorithm, for short) [[!BIDI]] details an algorithm for + rendering right-to-left text and covers a myriad of situations in mixing different kinds of + characters. A simpler explanation of the basics of the algorithm exists in the W3C article + Unicode Bidirectional Algorithm basics. [[UBA-BASICS]] You can refer to these documents for more information about Unicode’s bidirectional algorithm.

-

A brief overview of the bidirectional (bidi for short) - algorithm follows, because the direction is an essential part of how Arabic script is - used.

+

A brief overview of the bidirectional algorithm follows, because the direction is + an essential part of how Arabic script is used.

The characters of a text are digitally stored and transferred in the same order that they @@ -303,9 +303,10 @@

Direction -

Unicode has a bidi category property defined for each character - that is used to determine the direction of each character. All the Arabic letters are marked - as right-to-left characters, while Latin characters have the left-to-right category.

+

Unicode has a bidi class (or bidi + type) property defined for each character that is used to determine the direction of + each character. All the Arabic letters are marked as right-to-left characters, while Latin + characters have the left-to-right category.
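
As a rough illustration of the bidi class property described above, the following sketch looks up the class of a few characters with Python's standard `unicodedata` module. The sample characters and the `bidi_class` helper name are chosen here for illustration and are not part of the original text.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the short alias of the Unicode bidi class of one character."""
    return unicodedata.bidirectional(ch)


# Illustrative sample characters (an assumption of this sketch).
samples = {
    "ARABIC LETTER BEH": "\u0628",
    "LATIN SMALL LETTER A": "a",
    "EXCLAMATION MARK": "!",
    "SPACE": " ",
}

classes = {name: bidi_class(ch) for name, ch in samples.items()}
for name, cls in classes.items():
    print(f"{name}: {cls}")
```

Arabic letters report `AL` (right-to-left Arabic letter), Latin letters report `L`, and punctuation such as the exclamation mark reports the neutral class `ON`.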

Some characters, mostly punctuations, are neutral. The @@ -347,19 +348,20 @@

Joining Forms @@ -381,14 +383,14 @@

Joining Forms @@ -398,50 +400,49 @@

Joining Categories

-

There are different categories of letters based on their joining - behavior: -

+

There are different categories of letters based on their joining behavior:

- - + +
- Right-joining letters only have two forms of final and - isolated. + Right-joining letters only have two forms of final and isolated.
- Most of Arabic letters are either dual-joining or right-joining. + Most Arabic letters are either dual-joining or right-joining.
One joining form of U+0621 ARABIC LETTER HAMZA.
- Non-Joining letters only have one form: isolated. + Non-Joining letters only have one form: isolated.
@@ -457,26 +458,29 @@

Joining Rules
    -
  1. Letters of each word join together whenever possible, implicitly.
  2. +
  3. Letters of each word join together whenever possible, + implicitly.
  4. -
  5. In some languages, like Persian and Urdu, there are words—mostly, but not limited to, - compound words—that require explicit breaks in the joining of letters, although joining - would otherwise be possible.
  6. +
  7. In some languages, like Persian and Urdu, there are words—mostly, + but not limited to, compound words—that require explicit breaks in the joining of + letters, although joining would otherwise be possible.
  8. -
  9. In certain cases, a letter can be in a join-to-left form +
  10. In certain cases, a letter can be in a join-to-left form without actually connecting to anything on the left, whether there’s any letter or not. This is often seen in list counters, abbreviations, and other cases where letters do not - have a word context, or are taken out of their original word context.
  11. + have a word context, or are taken out of their original word context. + -
  12. In rare cases of words splitting where letters are joined, first letter of the second - half will be in a join-to-right form without any previous +
  13. In rare cases of words splitting where letters are joined, first + letter of the second half will be in a join-to-right form without any previous letter. This behavior is limited to special cases like blanking specific letters of a word, line breaks in a paragraph, and word breaks across poetry verses. No standalone - word can have any letters in join-to-right form without - joining on the right-hand side.
  14. + word can have any letters in join-to-right form without joining on the right-hand + side. +
@@ -518,15 +522,15 @@

Disjoining Enforcement one word) which would normally join together, but should not. In Unicode, for such a case, a special character should be used to enforce disjoining of these letters. This character is called U+200C ZERO WIDTH NON-JOINER, or - ZWNJ for short.

+ ZWNJ for short.

-
+
TBD: ZWNJ example.
- Example of using ZWNJ for disjoining - enforcement. + Example of using ZWNJ for disjoining + enforcement.
@@ -541,19 +545,18 @@

Joining Enforcement joining form when it would not happen normally. For example, some abbreviation methods use Initial Form of letters, when possible, for every letter in the abbreviation. Again, in Unicode, a special character should be used to enforce joining on this letter. This - character is called U+200D ZERO WIDTH JOINER, or ZWJ for short.

+ character is called U+200D ZERO WIDTH JOINER, or + ZWJ for short.

-

Besides ZWJ, there’s another special Unicode character, - U+0640 ARABIC TATWEEL, which enforces joining behavior (join - causing) on letters next to it. But, in contrast to ZWJ, - TATWEEL has a glyph shape, looking like a hyphen and usually - as wide as the SPACE glyph, which connects to the letters on the main joining line - (a.k.a. base-line). So, using TATWEEL would give a similar - Joining Enforcement behavior, but has a side effect of wider length for the letter, which - is not always desired. That’s why it’s highly recommended to only use ZWJ for joining control.

+

Besides ZWJ, there’s another special Unicode character, U+0640 ARABIC TATWEEL, which enforces joining behavior (join causing) on + letters next to it. But, in contrast to ZWJ, TATWEEL has a glyph shape, + looking like a hyphen and usually as wide as the SPACE glyph, which connects to the + letters on the main joining line (a.k.a. base-line). So, using TATWEEL would give + a similar Joining Enforcement behavior, but has a side effect of wider length for the + letter, which is not always desired. That’s why it’s highly recommended to only use + ZWJ for joining control.

@@ -561,14 +564,12 @@

Joining Enforcement "TBD: TATWEEL example.">
- Example of using ZWJ (recommended) and TATWEEL (not recommended) for joining - enforcement. + Example of using ZWJ (recommended) and TATWEEL (not recommended) for + joining enforcement.

- In Unicode, ZWNJ and ZWJ are called - Joining Control Characters. + In Unicode, ZWNJ and ZWJ are called Joining Control Characters.
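
To make the roles of the joining control characters and TATWEEL more concrete, here is a minimal sketch using Python's standard `unicodedata` module. The Persian sample word (می‌خواهم, written with escapes below) is an assumption chosen for illustration and is not the example planned for the figures above.

```python
import unicodedata

ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER: enforces disjoining
ZWJ = "\u200d"      # ZERO WIDTH JOINER: enforces joining
TATWEEL = "\u0640"  # ARABIC TATWEEL: join-causing, but has a visible glyph

# Name and general category of each character.
info = {
    ch: (unicodedata.name(ch), unicodedata.category(ch))
    for ch in (ZWNJ, ZWJ, TATWEEL)
}
for ch, (name, category) in info.items():
    print(f"U+{ord(ch):04X} {name} ({category})")

# Illustrative Persian word: MEEM, FARSI YEH, ZWNJ, KHAH, WAW, ALEF, HEH, MEEM.
# The ZWNJ keeps FARSI YEH from joining to the following KHAH.
word = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"
without_zwnj = word.replace(ZWNJ, "")
print(len(word), len(without_zwnj))
```

ZWNJ and ZWJ are invisible format characters (category `Cf`), whereas TATWEEL is a modifier letter (category `Lm`) with its own glyph, which is why it widens the text.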

Joining-Disjoining Enforcement @@ -576,19 +577,18 @@

Joining-Disjoining EnforcementTwo enforcement methods mentioned above can be combined together to form a - Joining-Disjoining Enforcement method, that enables - Joining Rule 3 for cases when there’s a dual-joining/right-joining letter after a - join-to-left letter, which should not be joined to its - previous letter.

+ Joining-Disjoining Enforcement method, that enables
Joining Rule 3 for cases when there’s a + dual-joining/right-joining letter after a join-to-left letter, which + should not be joined to its previous letter.

TBD: ZWJ+ZWNJ example.
- Example of using <ZWJ, ZWNJ> for joining-disjoining enforcement. + Example of using <ZWJ, ZWNJ> for + joining-disjoining enforcement.

@@ -612,13 +612,14 @@

Joining Segments

-

A sequence of letters that join together are called a Joining - Segment. Regardless of language, joining segments have no - direct relationship to syllables.

+

A sequence of letters that join together is called a Joining Segment. + Regardless of language, joining segments have no direct relationship to + syllables.

-

Two types of joining segments exist: closed and open.

+

Two types of joining segments exist: closed and open.

@@ -626,23 +627,23 @@

Closed Joining Segments

-

Joining Segments usually have a closed form, meaning that they start in a non-join-to-right form and end in a non-join-to-left - form. Closed joining segments are the result of segments - either start and end with their normal behavior (Joining Rule - 1), or by disjoining enforcement (Joining Rule 2).

+

Joining Segments usually have a closed form, meaning that they start in a + non-join-to-right form and end in a non-join-to-left form. Closed + joining segments result either from segments starting and ending with their normal + behavior (Joining Rule 1), or from disjoining enforcement (Joining + Rule 2).

There are two possible types of closed segments:

    -
  • Single-Letter Closed Segment, which contains only one letter that - is in its Isolated form.
  • +
  • Single-Letter Closed Segment, which contains only one letter that is in + its Isolated form.
  • -
  • Multi-Letter Closed Segment, which contains more than one letter, +
  • Multi-Letter Closed Segment, which contains more than one letter, starting with an Initial form, zero or more Medial forms, and ending with a Final form.
@@ -665,27 +666,25 @@

Open Joining Segments

-

Under the certain cases, as noted in Joining Rules 3 and 4, - joining segments can start with a join-to-right form, or end with an join-to-left - form, or both.

+

In certain cases, as noted in Joining Rules 3 + and 4, joining segments can start with a + join-to-right form, or end with a join-to-left form, or both.

There are three possible types of these segments:

    -
  • Open-On-Left Segment, which contains one or more Dual-Joining - letters, starting with an Initial form and continuing with zero or more Medial - forms.
  • +
  • Open-On-Left Segment, which contains one or more Dual-Joining letters, + starting with an Initial form and continuing with zero or more Medial forms.
  • -
  • Open-On-Right Segment, which starts with zero or more Medial Form +
  • Open-On-Right Segment, which starts with zero or more Medial Form letters, and ends with a Final Form letter.
  • -
  • Open-On-Both-Sides Segment, which contains one or more - Dual-Joining letters, all in their Medial Form.
  • +
  • Open-On-Both-Sides Segment, which contains one or more Dual-Joining + letters, all in their Medial Form.
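
The closed and open segment types listed above can be summarized as a small classifier over the sequence of letter forms in a segment. This is only a sketch: the function name, the lowercase form labels, and the returned strings are assumptions made for illustration, and the input is expected in logical (first-to-last) order.

```python
from typing import List

FORMS = {"isolated", "initial", "medial", "final"}


def classify_segment(forms: List[str]) -> str:
    """Classify one joining segment from its letter forms in logical order."""
    if not forms:
        raise ValueError("a joining segment contains at least one letter")
    if any(f not in FORMS for f in forms):
        raise ValueError("unknown joining form")
    if forms == ["isolated"]:
        return "single-letter closed"
    if "isolated" in forms:
        raise ValueError("an isolated letter cannot be part of a longer segment")
    # Every letter strictly inside the segment joins on both sides.
    if any(f != "medial" for f in forms[1:-1]):
        raise ValueError("inner letters must be in medial form")
    if len(forms) > 1 and (forms[0] == "final" or forms[-1] == "initial"):
        raise ValueError("segment is broken at its first or last letter")

    starts_join_to_right = forms[0] in ("medial", "final")
    ends_join_to_left = forms[-1] in ("medial", "initial")

    if starts_join_to_right and ends_join_to_left:
        return "open-on-both-sides"
    if ends_join_to_left:
        return "open-on-left"
    if starts_join_to_right:
        return "open-on-right"
    return "multi-letter closed"


print(classify_segment(["initial", "medial", "final"]))  # multi-letter closed
print(classify_segment(["initial", "medial"]))           # open-on-left
print(classify_segment(["final"]))                       # open-on-right
```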
@@ -708,22 +707,21 @@

Non-Joining Characters

-

Arabic Letters, two Joining Control Characters (ZWNJ and ZWJ), and TATWEEL are the only characters used in the Arabic writing system with - joining behavior.

+

Arabic Letters, two Joining Control Characters (ZWNJ and ZWJ), and + TATWEEL are the only characters used in the Arabic writing system with joining + behavior.

Arabic diacritics, other Unicode non-spacing marks, and most - Unicode format control characters are considered transparent in joining behavior.

+ Unicode format control characters are considered transparent in joining behavior.

All other Unicode characters in Arabic script (as well as Latin and many other major scripts) are non-joining and do not take any joining forms other than Isolated.

-

For the details of how Arabic Cursive Joining algorithm +

For more details on the Arabic Cursive Joining algorithm, please refer to the chapter Middle East-I — Modern and Liturgical Scripts of The Unicode Standard. [[!UNICODE]]

@@ -824,12 +822,12 @@

Arabic Style and Calligraphy Korʼan.

-

In general we group under the generic term Naskh - (copy/inscription) the scripts which are meant for reading at smaller sizes and are - suitable for books and texts to be read, e.g. the Korʼan, and as - Kufic the highly stylized font styles used for ornamentation and - more styled writings. Nevertheless, the rich evolution of the Arabic script led to the - distinctive enumeration of a number of additional named styles.

+

In general we group under the generic term Naskh (copy/inscription) the + scripts which are meant for reading at smaller sizes and are suitable for books and texts + to be read, e.g. the Korʼan, and as Kufic the highly stylized font + styles used for ornamentation and more styled writings. Nevertheless, the rich evolution of + the Arabic script led to the distinctive enumeration of a number of additional named + styles.

Similarly, two other generic terms are used to classify styles : Mabsut (wa @@ -1042,205 +1040,195 @@

Arabic Script and Typography simplest Naskh style?

-
    -
  1. -

    Multi-level baselines -

    +
    +
    Multi-level baselines
    -

    Letters may join through a finely inclined line

    +

    Letters may join through a finely inclined line

    -
    - -
    +
    + +
    -

    or two, square-ended lines

    +

    or two, square-ended lines

    -
    - -
    +
    + +
    -

    Multilevel baselines don't occur in all fonts. The above examples use the Arabic - Typesetting font. Compare those examples to to more typical fonts:

    +

    Multilevel baselines don't occur in all fonts. The above examples use the Arabic + Typesetting font. Compare those examples to more typical fonts:

    -

    - -

    -
  2. +

    normal Font +

    + -
  3. -

    Multi-context joining -

    +
    +
    Multi-context joining
    -

    Rendering of letters depends not only on their place in the word (initial, medial, - final) but also on their neighboring letters, i.e. the letter they join with. Each - letter has a different appearance in each combination.

    +

    Rendering of letters depends not only on their place in the word (initial, medial, + final) but also on their neighboring letters, i.e. the letter they join with. Each letter + has a different appearance in each combination.

    -
    - Different initial shape of noon +
    + Different initial shape of noon -
    - Initial letter noon, showing many different forms. -
    -
    +
    + Initial letter noon, showing many different forms. +
    +
    -

    Fonts don't always comply with or respect this kind of tuning. To do so, fonts need many glyphs in order to adapt to each - context. In more modern typefaces some of these connections are implemented by - ligatures, but ligatures can't capture or cover all joining behavior.

    +

    Fonts don't always comply with or respect this kind of tuning. To do so, fonts need many glyphs in order to adapt to each + context. In more modern typefaces some of these connections are implemented by ligatures, + but ligatures can't capture or cover all joining behavior.

    -

    In the two left most words, the initial noon differs in that one raises a kind of - stroke. This property of raising a stroke is common for a number of letters (beh, teh, - noon, theh) which are taller than their connected letters in order to be distinguished - in some contexts, such as- vs. Beh without stroke after seen , or to resolve ambiguity. See also the section about teeth letters - below.

    -
  4. +

    In the two leftmost words, the initial noon differs in that one raises a kind of + stroke. This property of raising a stroke is common for a number of letters (beh, teh, + noon, theh) which are taller than their connected letters in order to be distinguished in + some contexts, such as+ vs. Beh without stroke after seen , or to resolve ambiguity. See also the section about teeth letters + below.

    + -
  5. -

    Words as groups of letters -

    +
    +
    Words as groups of letters
    -

    A word shape is not (only) a "horizontal" connections of letters, but of groups of - letters (syntagmes).

    +

    A word shape is not (only) a "horizontal" connection of letters, but of groups of + letters (syntagmes).

    -

    Example two words in some nice Naskh font.

    +

    Example of two words in some nice Naskh font.

    - - +
    - Groups of letters are colored blue or red -
    + - - - + + + - - - -
    + Groups of letters are colored blue or red +
    Aleph and two groups of letters to form a word -
    Aleph and two groups of letters to form a word + two other group of letters -
    + two other group of letters + + + + -

    To compare with the same words in more usual font:

    +

    To compare with the same words in more usual font:

    - - +
    - Can't really say letter groups. Rather a "horizontal sequence of letters of almost - same width". -
    + - - - + + + - - - -
    + Can't really say letter groups. Rather a "horizontal sequence of letters of almost + same width". +
    same word in more normal font -
    same word in more normal font + same word in default font -
    + same word in default font + + + + -

    Group combinations cannot be covered by general or usual ligatures.

    -
  6. +

    Group combinations cannot be covered by general or usual ligatures.

    + -
  7. -

    Vertical joining

    +
    +
    Vertical joining
    -

    Groups of letters may also "join" vertically (top down) instead of right to left. - And not all fonts permit this.

    +

    Groups of letters may also "join" vertically (top down) instead of right to left. And + not all fonts permit this.

    - - - - +
    Vertical joining -
    + + + - + - - + + - - + + - + - - - -
    Vertical joining + vs.vs.horizontal joining -
    horizontal joining +
    Joining happens almost vertical
    Joining happens almost vertical - + Joining happens horizontal
    + Joining happens horizontal + + + -

    Once again, some fonts try standard ligatures, but this is not ligature. This is - rather (good) writing practice/style.

    +

    Once again, some fonts try standard ligatures, but this is not a ligature. This is + rather (good) writing practice/style.

    -

    One should note that all this characteristics has not only an aesthetic side, but - also play a role in justification. It is at the discretion of (hand writing) authors to - chose the best kind of joining to suit the desired line width. Should then be a general - rule on that. But to achieve such justification would require sophisticated - algorithms.

    -
  8. +

    One should note that all these characteristics not only have an aesthetic side, but also + play a role in justification. It is at the discretion of (handwriting) authors to choose + the best kind of joining to suit the desired line width. There should then be a general rule on + that. But achieving such justification would require sophisticated algorithms.

    + -
  9. -

    The so called teeth letters. -

    +
    +
    The so called teeth letters.
    -

    Letters having uniform medial shape, align in a kind of teeth.

    +

    Letters having a uniform medial shape align like a row of teeth.

    -
    - -
    +
    + +
    -

    Even in the teeth context letter shape may vary. It's not the same letters (in red) - which raise the stroke in the two figures.

    -
  10. -
+

Even in the teeth context letter shape may vary. It's not the same letters (in red) + which raise the stroke in the two figures.

+ @@ -1571,12 +1559,13 @@

Preferred Terminology
    -
  • European Numerals are 0, 1, 2, 3, 4, 5, - 6, 7, 8, 9. They are also referred to as Western Arabic - Numerals or simply as Arabic Numerals. Although these - are terminologically correct terms, to avoid confusions we will refrain from using these - phrases to refer to these numerals. European Numerals or - ASCII Digits are used instead;
  • +
  • + European Numerals are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. They are also referred + to as Western Arabic Numerals or simply as Arabic Numerals. Although + these are terminologically correct terms, to avoid confusion we will refrain from + using these phrases to refer to these numerals. European Numerals or ASCII + numerals are used instead; +
  • Arabic-Indic Numerals are Preferred Terminology "ltr">٩;
  • -
  • Eastern Arabic-Indic Numerals are ۰, ۱, - ۲, ۳, ۴, ۵, ۶, ۷, ۸, ۹;
  • +
  • Eastern Arabic-Indic Numerals are ۰, ۱, ۲, ۳, ۴, ۵, ۶, ۷, ۸, ۹;
  • -
  • Extended Arabic-Indic Numerals same as Eastern Arabic-Indic - Numerals.
  • +
  • Extended Arabic-Indic Numerals same as Eastern Arabic-Indic Numerals.
  • -
  • Western Arabic Numerals same as European Numerals;
  • +
  • Western Arabic Numerals same as European Numerals;
  • -
  • Eastern Arabic Numerals is used to refer to both Arabic-Indic and - Eastern Arabic-Indic Numerals. Should be avoided due to ambiguity;
  • +
  • Eastern Arabic Numerals is used to refer to both Arabic-Indic and Eastern + Arabic-Indic Numerals. Should be avoided due to ambiguity;
  • -
  • Indic Numerals should be avoided to refer to either of Arabic-Indic - or Eastern Arabic-Indic numerals.
  • +
  • Indic Numerals should be avoided to refer to either of Arabic-Indic or + Eastern Arabic-Indic numerals.
  • -
  • Digit, Numeral digit, and Numeral - are used as synonyms.
  • +
  • Digit, Numeral digit, and Numeral are used as + synonyms.
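
The three numeral sets named above can be told apart by their Unicode code points. The following sketch assumes the standard ranges U+0660 to U+0669 for Arabic-Indic and U+06F0 to U+06F9 for Eastern (Extended) Arabic-Indic numerals. The constant names and the `to_european` helper are hypothetical, introduced only for illustration.

```python
import unicodedata

EUROPEAN = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
EASTERN_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))

# Map every Arabic-Indic and Eastern Arabic-Indic numeral to its European
# counterpart, using the digit value recorded in the Unicode database.
_TO_EUROPEAN = {
    ord(ch): str(unicodedata.digit(ch))
    for ch in ARABIC_INDIC + EASTERN_ARABIC_INDIC
}


def to_european(text: str) -> str:
    """Replace Arabic-Indic and Eastern Arabic-Indic numerals in text."""
    return text.translate(_TO_EUROPEAN)


print(unicodedata.name(ARABIC_INDIC[0]))
print(unicodedata.name(EASTERN_ARABIC_INDIC[0]))
print(to_european("\u0661\u0662\u0663"), to_european("\u06f4\u06f5\u06f6"))
```

Note that the Unicode character names use "EXTENDED ARABIC-INDIC DIGIT" for the set this document calls Eastern Arabic-Indic Numerals.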
@@ -2172,9 +2159,9 @@

Justification

Of the four basic justification methods (flush left, flush right, justified, and centered), justified is the most challenging, as it requires changing the widths of the lines - to a pre-defined measure. Measure refers to the width of a column - of text. In a justified paragraph the width of all the lines should be the same as the - paragraph’s measure (except, of course, the last line).

+ to a pre-defined measure. Measure refers to the width of a column of text. In a + justified paragraph the width of all the lines should be the same as the paragraph’s measure + (except, of course, the last line).

In Arabic there are six mechanisms for changing the width of a line of text. Each one has @@ -2183,10 +2170,9 @@

Justification

An important factor in the application of these mechanisms is their success in creating an - even color. The color of the text refers to the amount of - ink/blackness used to print or show a block of text. Color describes the density of the text - against its background. Poorly justifying paragraphs can create uneven distribution of - color.

+ even color. The color of the text refers to the amount of ink (or blackness) used + to print or show a block of text. Color describes the density of the text against its + background. Poorly justifying paragraphs can create uneven distribution of color.

These mechanisms are not exclusive. Quite the contrary, they are commonly used