Commit

Addresses #14
Mention of using the language subtag registry instead of making your own
list.

Also some cleanup of the language tagging of the Arabic examples.
aphillips committed Jan 29, 2023
1 parent b73b570 commit d0c0799
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions index.html
@@ -351,7 +351,15 @@ <h3>Defining language values</h3>
</details>
</div>

-<p>BCP 47 contains one RFC dedicated to the syntax and subtags of language tags, and another dedicated to how to match two or more subtags. (This topic needs more detail, and may merit being a separate section.)</p>
+<p>BCP 47 contains one RFC dedicated to the syntax and subtags of language tags, and another dedicated to how to match two or more subtags.</p>
+
+<aside class="issue"><p>The topic of matching language tags needs more detail, and may merit being a separate section.</p></aside>
+
+<div class="req" id="use_lstr">
+<p class="advisement">Specifications SHOULD refer to the IANA Language Subtag Registry instead of providing lists of codes extracted from ISO 639, ISO 3166, or other standards.</p>
+</div>
+
+<p>As part of BCP 47, IANA maintains the language subtag registry, which is a publicly available, machine-readable list of valid subtags for use in language tags. While this registry is based on underlying ISO standards, such as ISO 639 (languages) and ISO 3166 (regions), the list is actively maintained, stabilized, and comprehensive in ways that other lists found on the Internet may not be. Each of the subtag types is kept in sync with parent standards with the help and participation of those standards maintainers. These include the various parts of ISO 639 (639-1, 639-2, 639-3), the ISO 15924 script codes, and ISO 3166 and UN M.49 region codes. Extracting or making your own list of codes or referring to ones found elsewhere can lead to maintenance problems or confusion.</p>
</section>
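
Note on the added paragraph: because the registry is machine-readable, a consumer can index it directly instead of keeping a hand-copied code list. The following is a minimal sketch, assuming the record layout described in BCP 47 (records separated by `%%` lines, `Field: value` pairs, folded continuation lines that start with whitespace). `SAMPLE` is a hand-written excerpt in that layout, not a copy of the live file, and its `File-Date` is illustrative; `parse_registry` and `valid_subtags` are hypothetical helper names.

```python
from collections import defaultdict

# Hand-written excerpt in the registry's record format (assumption: not the live file).
SAMPLE = """\
File-Date: 2023-01-01
%%
Type: language
Subtag: ar
Description: Arabic
Suppress-Script: Arab
Scope: macrolanguage
%%
Type: language
Subtag: zxx
Description: No linguistic content
Description: Not applicable
Scope: special
%%
Type: script
Subtag: Arab
Description: Arabic
%%
Type: region
Subtag: EG
Description: Egypt
"""


def parse_registry(text):
    """Yield one dict per record; each field name maps to a list of values."""
    record = defaultdict(list)
    last_field = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: emit what has been collected so far.
            if record:
                yield dict(record)
            record, last_field = defaultdict(list), None
        elif line[:1] in (" ", "\t") and last_field:
            # Folded continuation of the previous field's value.
            record[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            record[last_field].append(value.strip())
    if record:
        yield dict(record)


def valid_subtags(text):
    """Index subtags by type, lowercased because tags compare case-insensitively."""
    index = defaultdict(set)
    for rec in parse_registry(text):
        # Records without both fields (for example the File-Date header) are skipped.
        if "Type" in rec and "Subtag" in rec:
            index[rec["Type"][0]].add(rec["Subtag"][0].lower())
    return index


records = list(parse_registry(SAMPLE))
index = valid_subtags(SAMPLE)
print("ar" in index["language"], "arab" in index["script"], "eg" in index["region"])
```

A real consumer would read the published registry file rather than an embedded string, and would also need to handle records keyed by `Tag` instead of `Subtag` as well as private-use ranges, which this sketch ignores.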


@@ -951,15 +959,15 @@ <h3>Generating or requiring creation of mixed direction strings</h3>
<p>Such a label, when generated in a right-to-left language, might not display correctly when assembled by the system from various substrings. Spillover effects can happen when text that has mixed left-to-right and right-to-left text are used in a larger label or token. A device label, such as a monitor name, might include strings such as the brand name ("Dell", "HP", etc.), part number ("S2721H", "A157-B", etc.), device capabilities ("75 Hz", "4ms", etc.), screen resolution (1024x768), and so forth. These often include ASCII letters, digits, and punctuation. For example, the English label for a monitor might be:</p>

<pre>
-Brand A123B (1920 x 1080) 36" monitor 75 Hz, 4ms, built-in speakers
+Brand A123B (1920 x 1080) 36" monitor 75 Hz, 4ms, built-in speakers
</pre>

<p>A naive translation to Arabic might look like this:</p>

<pre dir="rtl">
&#x0645;&#x0627;&#x0631;&#x0643;&#x0629; A123B (1920 x 1080) 36" &#x0634;&#x0627;&#x0634;&#x0629; &#x0627;&#x0644;&#x0643;&#x0645;&#x0628;&#x064A;&#x0648;&#x062A;&#x0631;, 75 Hz, 4 &#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;, &#x0645;&#x0643;&#x0628;&#x0631;&#x0627;&#x062A; &#x0635;&#x0648;&#x062A; &#x0645;&#x062F;&#x0645;&#x062C;&#x0629;</pre>

-<p>Notice how the part number (<kbd>A123B</kbd>) is separated from the brand name (<kbd>&#x0645;&#x0627;&#x0631;&#x0643;&#x0629;</kbd>), the measurement <kbd>36</kbd> and the marker for inches (<kbd>"</kbd>) have become separated and that the values <kbd>75 Hz</kbd> (where the measurement is in ASCII) and <kbd>4 ms</kbd> (where the measurement has an Arabic translation <kbd>&#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</kbd>) both separate the number from the measurement. Generating labels from a sequence of string tokens requires extra care to ensure that the complete string is "bidirectionally clean" and will display properly to the user. Adding isolating bidirectional controls to the above string produces better results:</p>
+<p>Notice how the part number (<kbd lang="zxx" translate="no">A123B</kbd>) is separated from the brand name (<kbd lang="ar" dir="rtl" translate="no">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629;</kbd>), the measurement <kbd>36</kbd> and the marker for inches (<kbd>"</kbd>) have become separated and that the values <kbd>75 Hz</kbd> (where the measurement is in ASCII) and <kbd>4 ms</kbd> (where the measurement has an Arabic translation <kbd lang="ar" dir="rtl" translate="no">&#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</kbd>) both separate the number from the measurement. Generating labels from a sequence of string tokens requires extra care to ensure that the complete string is "bidirectionally clean" and will display properly to the user. Adding isolating bidirectional controls to the above string produces better results:</p>

<pre dir="rtl">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629; A123B &#x2066;(1920 x 1080)&#x2069; &#x2066;36"&#x2069; &#x0634;&#x0627;&#x0634;&#x0629; &#x0627;&#x0644;&#x0643;&#x0645;&#x0628;&#x064A;&#x0648;&#x062A;&#x0631;, &#x2067;75 Hz&#x2069;, &#x2067;4 &#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;&#x2069;, &#x0645;&#x0643;&#x0628;&#x0631;&#x0627;&#x062A; &#x0635;&#x0648;&#x062A; &#x0645;&#x062F;&#x0645;&#x062C;&#x0629;
</pre>
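
Note on the hunk above: the isolated version of the label relies on the Unicode directional isolate controls LRI (U+2066), RLI (U+2067), and PDI (U+2069). The following is a rough sketch of how a label generator might wrap each token before joining them; `isolate` and `build_label` are hypothetical helper names, and the direction chosen for each token mirrors the example string only as an assumption about what the author intended.

```python
# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE (direction taken from the content)
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text in an isolate; direction is 'ltr', 'rtl', or None for first-strong."""
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI


def build_label(parts, separator=" "):
    """Join (text, direction) tokens, isolating each one from its neighbours."""
    return separator.join(isolate(text, direction) for text, direction in parts)


label = build_label([
    ("A123B", None),            # part number: let the content decide
    ("(1920 x 1080)", "ltr"),   # resolution
    ('36"', "ltr"),             # size with inch mark
    ("75 Hz", "rtl"),           # number and unit kept together
])
print(ascii(label))  # ascii() makes the invisible controls visible as escapes
```

Isolating every token is a conservative choice: the controls are invisible and prevent one token's direction from reordering its neighbours, at the cost of a few extra code points per label.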