Commit

Address teleconference comments.
aphillips committed Feb 2, 2023
1 parent 7825d50 commit bb58dfa
Showing 1 changed file with 38 additions and 32 deletions.
70 changes: 38 additions & 32 deletions index.html
@@ -950,44 +950,43 @@ <h3><em>Detecting &amp; matching direction (TBD)</em></h3>
</section>

<section id="bidi_gen" class="subtopic">
<h3>Generating or requiring creation of mixed direction strings</h3>
<h3>Concatenation of strings</h3>

<p class="reviewComments"><a href="https://github.com/w3c/i18n-activity/labels/t%3Abidi_gen" target="_blank">See related review comments.</a></p>

<p>Specifications for APIs, protocols, or document formats sometimes provide [=natural language=] content fields, such as implementation or user-generated labels or descriptions. In addition to bidirectional text requirements found in the preceding sections, specifications can also need to provide guidance to users or content authors.</p>
<div class="req" id="bidi_gen_advice">
<p class="advisement">When a specification needs to suggest the creation or generation of a display name or other string value and the value could potentially come from multiple sub-strings, guidance SHOULD be provided about how to avoid problems with text directionality.</p>
</div>

<aside class="example" title="Example of bidi generation guidance">
<div class="note" role="note" id="bidi-gen-note-example">
<p class="example_note">If <em><code>_field_name_</code></em> contains or might contain [= bidirectional text =], care should be used to ensure that the string will display correctly without the application needing to process the string. For more information see <a href="https://www.w3.org/International/questions/qa-bidi-unicode-controls"><cite>How to use Unicode controls for bidi text</cite></a>.</p>
</div>
</aside>

<p>Specifications for APIs, protocols, or document formats sometimes require an implementation or end user to create display names or descriptions. When such a string is assembled from separate parts, it can result in problems with presentation or understanding due to the way that the <cite>Unicode Bidirectional Algorithm</cite> [[UAX9]] processes the assembled string.</p>

<aside class="example" title="Generating a display label">
<p>Suppose a specification provides a descriptive field in an API that is meant to be filled in by the implementation at runtime. One example of this is the <code>label</code> field in the [window-placement] API. The value of <code>label</code> might take various different implementation-dependent forms that include natural language text generated by the system or user-agent.</p>
<aside class="example" title="Spillover issues in a generated display name">
<p>Suppose a specification provides a field in an API that is meant to be filled in by the implementation at runtime. One example of this is the <code>label</code> field in the [[window-placement]] API. The value of <code>label</code> might take various different implementation-dependent forms that include natural language text generated by the system or user-agent.</p>

<p>Such a label, when generated in a right-to-left language, might not display correctly when assembled by the system from various substrings. Spillover effects can happen when text that has mixed left-to-right and right-to-left text are used in a larger label or token. A device label, such as a monitor name, might include strings such as the brand name ("Dell", "HP", etc.), part number ("S2721H", "A157-B", etc.), device capabilities ("75 Hz", "4ms", etc.), screen resolution (1024x768), and so forth. These often include ASCII letters, digits, and punctuation. For example, the English label for a monitor might be:</p>
<p>Such a label might include multiple pieces of information describing a given screen. In the example shown here, the label contains the brand name (<kbd>Brand</kbd>), part number (<kbd>A123B</kbd>), resolution (<kbd>(1920 x 1080)</kbd>), size (<kbd>36" monitor</kbd>), as well as various features (refresh rate, response time, and the presence of speakers). The resulting string in English might look like this (color has been added to make the effects more visible):</p>

<p style="background-color:#fdfdfd;indent:10px" dir="ltr"><code>
Brand A123B (1920 x 1080) 36" monitor, 75 Hz, 4ms, built-in speakers
<span style="color:red">Brand A123B</span> <span style="color:blue">(1920 x 1080)</span> <span style="color:green">36"</span> <span style="color:orange">monitor</span>, <span style="color:purple">75 Hz</span>, <span style="color:brown">4ms</span>, <span style="color:darkgrey">built-in speakers</span>
</code></p>

<p>A naive translation to Arabic might look like this:</p>
<p>This display string is merely the concatenation of the various sub-strings, some of which are separated by commas. If the implementation assembling this string were running in a locale that uses a right-to-left language (such as Arabic, used in the examples shown here), the result of the same concatenation might look something like this:</p>

<p style="background-color:#fdfdfd;indent:10px" dir="rtl"><code>
&#x0645;&#x0627;&#x0631;&#x0643;&#x0629; A123B (1920 x 1080) 36" &#x0634;&#x0627;&#x0634;&#x0629; &#x0627;&#x0644;&#x0643;&#x0645;&#x0628;&#x064A;&#x0648;&#x062A;&#x0631;, 75 Hz, 4 &#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;, &#x0645;&#x0643;&#x0628;&#x0631;&#x0627;&#x062A; &#x0635;&#x0648;&#x062A; &#x0645;&#x062F;&#x0645;&#x062C;&#x0629;</code></p>
<span style="color:red">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629; A123B</span> <span style="color:blue">(1920 x 1080)</span> <span style="color:green">36"</span> <span style="color:orange">&#x0634;&#x0627;&#x0634;&#x0629; &#x0627;&#x0644;&#x0643;&#x0645;&#x0628;&#x064A;&#x0648;&#x062A;&#x0631;</span>, <span style="color:purple">75 Hz</span>, <span style="color:brown">4 &#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</span>, <span style="color:darkgrey">&#x0645;&#x0643;&#x0628;&#x0631;&#x0627;&#x062A; &#x0635;&#x0648;&#x062A; &#x0645;&#x062F;&#x0645;&#x062C;&#x0629;</span></code></p>

<p>Notice how different parts of the description have become separated: the brand name (<kbd lang="ar" dir="rtl" translate="no">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629;</kbd>) is no longer next to the part number (<kbd lang="zxx" translate="no">A123B</kbd>); the measurements <kbd>36"</kbd>, <kbd>75 Hz</kbd>, and <kbd>4 ms</kbd> (where the measurement has an Arabic translation <kbd lang="ar" dir="rtl" translate="no">&#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</kbd>) are each broken up and mixed together. The resulting string is unintelligible. Generating labels from such a sequence of string tokens requires extra care in order to display properly to the user. The addition of isolating bidirectional controls (either Unicode control characters or markup) to the above string produces better results:</p>
<p>The logical sequence of characters remains the same, but the visual presentation is no longer intelligible. Notice how different parts of the description have become separated: the brand name (<kbd lang="ar" dir="rtl" translate="no">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629;</kbd>) is no longer next to the part number (<kbd lang="zxx" translate="no">A123B</kbd>); the measurements <kbd>36"</kbd>, <kbd>75 Hz</kbd>, and <kbd>4 ms</kbd> (where the measurement has an Arabic translation <kbd lang="ar" dir="rtl" translate="no">&#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</kbd>) are each broken up and mixed together. Generating labels from such a sequence of string tokens requires extra care in order to display properly to the user. The addition of isolating bidirectional controls (either Unicode control characters or markup) to the above string produces better results:</p>

<p style="background-color:#fdfdfd;indent:10px" dir="rtl"><code>
<span dir="rtl">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629; A123B</span> <span dir="ltr">(1920 x 1080)</span> <span dir="rtl"><span dir="ltr">36"</span> &#x0634;&#x0627;&#x0634;&#x0629; &#x0627;&#x0644;&#x0643;&#x0645;&#x0628;&#x064A;&#x0648;&#x062A;&#x0631;</span>, <span dir="rtl">75 Hz</span>, <span dir="rtl">4 &#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</span>, <span dir="rtl">&#x0645;&#x0643;&#x0628;&#x0631;&#x0627;&#x062A; &#x0635;&#x0648;&#x062A; &#x0645;&#x062F;&#x0645;&#x062C;&#x0629;</span>
<span style="color:red" dir="rtl">&#x0645;&#x0627;&#x0631;&#x0643;&#x0629; A123B</span> <span style="color:blue" dir="ltr">(1920 x 1080)</span> <span style="color:green" dir="ltr">36"</span> <span style="color:orange" dir="rtl">&#x0634;&#x0627;&#x0634;&#x0629; &#x0627;&#x0644;&#x0643;&#x0645;&#x0628;&#x064A;&#x0648;&#x062A;&#x0631;</span>, <span style="color:purple" dir="rtl">75 Hz</span>, <span style="color:brown" dir="rtl">4 &#x0645;&#x0644;&#x0644;&#x064A; &#x062B;&#x0627;&#x0646;&#x064A;&#x0629;</span>, <span style="color:darkgrey" dir="rtl">&#x0645;&#x0643;&#x0628;&#x0631;&#x0627;&#x062A; &#x0635;&#x0648;&#x062A; &#x0645;&#x062F;&#x0645;&#x062C;&#x0629;</span>
</code></p>
</aside>
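
<p>As a minimal sketch of the Unicode control character approach mentioned in the example above, the following Python wraps each sub-string in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069) before concatenating. The names <code>isolate</code> and <code>build_label</code>, the sample parts, and the comma separator are assumptions made for illustration; they are not part of any specification or API.</p>

```python
from typing import Iterable

# Hypothetical helpers for illustration only; not taken from any specification.
FSI = "\u2068"  # U+2068 FIRST STRONG ISOLATE
PDI = "\u2069"  # U+2069 POP DIRECTIONAL ISOLATE


def isolate(part: str) -> str:
    """Wrap one sub-string so its direction does not spill over onto neighbors."""
    return FSI + part + PDI


def build_label(parts: Iterable[str], separator: str = ", ") -> str:
    """Concatenate sub-strings, isolating each one from the others."""
    return separator.join(isolate(part) for part in parts)


# Sample parts: an Arabic brand word plus part number, a resolution, a refresh rate.
parts = ["\u0645\u0627\u0631\u0643\u0629 A123B", "(1920 x 1080)", "75 Hz"]
label = build_label(parts)
```

<p>The isolates add characters to the stored value, so this sketch suits plain-text fields. Where the value is emitted as markup, an equivalent result can usually be had with <code>dir</code> attributes, as in the example above, without changing the string data.</p>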

<div class="req" id="bidi_gen_advice">
<p class="advisement">Specifications that require or suggest system or user-generated natural language text values SHOULD provide guidance for the generation of bidirectional labels for those languages that require it.</p>
</div>

<aside class="example" title="Example of bidi generation guidance">
<div class="note" role="note" id="bidi-gen-note-example">
<p class="example_note">If <em><code>_field_name_</code></em> contains or might contain [= bidirectional text =], care should be used to ensure that the string will display correctly without the application needing to process the string. For more information see <a href="https://www.w3.org/International/questions/qa-bidi-unicode-controls"><cite>How to use Unicode controls for bidi text</cite></a>.</p>
</div>
</aside>


</section>
</section>

@@ -2834,20 +2833,27 @@ <h3>Specifying sort and search functionality</h3>
</aside>


<p>Applications often need to organize sets of information or content. Frequently this involves sorting the content so that users can find what they are looking for. Many data types, such as numbers or dates, are easily sorted by comparing the values. When it comes to textual information, however, the nature of character encodings and user expectations regarding "alphabetical" order brings some additional complexity.</p>
<p>Applications often need to organize sets of information or content. Frequently this involves sorting the content. Many data types, such as numbers or dates, are easily sorted by comparing the values. When it comes to textual information, however, the nature of character encodings and user expectations regarding "alphabetical" order brings some additional complexity.</p>

<p>One key choice is whether the sorting of textual data will be strictly internal or whether the results will be shown to users and thus need to be sorted in a [=locale=]-sensitive manner (that is, following the sorting rules of a specific language or culture).</p>

<section id="internal_sort" class="subtopic">
<h4>Program Internal Sorting</h4>

<div class="req" id="char_sort_internal_only">
<p class="advisement">Specifications or implementations that require a program-internal, fast, and deterministic sorting of text which is not intended for human viewing or interaction SHOULD specify that strings are sorted according to their definition of string. For string types based on UTF-16 (such as DOMString or JavaScript), specify <em>ascending code unit</em> order. For data that uses scalar value strings (such as USVString or many XML processes), specify <em>ascending code point</em> order.</p>
<p class="advisement">Specifications or implementations that require a program-internal, fast, and deterministic sorting of text which is not intended for human viewing or interaction SHOULD specify that strings are sorted according to their definition of string. For scalar value strings (such as <a href="https://webidl.spec.whatwg.org/#idl-USVString">USVString</a> or many XML processes), specify <em>ascending code point</em> order. For string types based on UTF-16 (such as <a href="https://webidl.spec.whatwg.org/#idl-DOMString">DOMString</a> or in many JavaScript APIs), specify <em>ascending code unit</em> order. </p>
<details class="links"><summary>explanations &amp; examples</summary>
<a href="#char_string">Defining 'string'</a>
</details>
</div>

<p>One key choice is whether the sorting of textual data will be shown to users and thus need to be [=locale=]-sensitive (that is, following the sorting rules of a specific language or culture) or whether the sorting is strictly internal. There are two potential internal sorting sequences: ordering by Unicode [=code point=] or ordering by [=code unit=]. For either type of ordering, the resulting list will not match any particular alphabet or lexicographical order.</p>
<p>There are two potential internal sorting sequences: ordering by Unicode [=code point=] or ordering by UTF-16 [=code unit=]. For either type of ordering, the resulting list will not match any particular alphabetic or lexicographical order.</p>

<p>Sorting by [=code point=] makes sense when strings are stored and processed as a sequence of code points, such as in a <a href="https://webidl.spec.whatwg.org/#idl-USVString">USVString</a>. Sorting by [=code unit=] makes sense when strings are stored and processed using the underlying encoding, such as in a <a href="https://webidl.spec.whatwg.org/#idl-DOMString">DOMString</a>.</p>

<!--
<p>For example, consider JavaScript's function <code>Array.prototype.sort</code> applied to an <code>Array</code> of <code>String</code> values. In JavaScript, a String is a sequence of UTF-16 code units. This ordering compares the 16-bit (UTF-16) code units in each string, so [=supplementary characters=], which are encoded as a [=surrogate pair=], compare differently in this sort order than when ordering by code point.</p>
-->

<aside class="example" title="Code point vs. code unit ordering">
<p>Consider two strings, one containing <span class="codepoint" translate="no"><bdi lang="ja">&#x1f63a;</bdi> [<span class="uname">U+1F63A SMILING CAT FACE WITH OPEN MOUTH</span>]</span> and the other containing <span class="codepoint" translate="no"><bdi lang="ja">&#xff5e;</bdi> [<span class="uname">U+FF5E FULLWIDTH TILDE</span>]</span>.</p>
@@ -2853,22 +2866,21 @@ <h3>Specifying sort and search functionality</h3>
<p>Note that UTF-8 <em>code unit order</em> (that is, when sorting by byte values in UTF-8 encoded byte strings) is the same as code point order.</p>
</aside>
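
<p>The difference between the two internal orderings can be sketched in Python, whose built-in string comparison orders by code point. UTF-16 code unit order is approximated here by comparing the big-endian UTF-16 bytes of each string. The helper name <code>utf16_code_unit_key</code> is an assumption for illustration, and the sketch assumes the strings contain no lone surrogates, which Python cannot encode by default.</p>

```python
cat = "\U0001F63A"  # U+1F63A, a surrogate pair (D83D DE3A) in UTF-16
tilde = "\uFF5E"    # U+FF5E, a single UTF-16 code unit


def utf16_code_unit_key(s: str) -> bytes:
    """Hypothetical sort key: big-endian UTF-16 bytes compare like code units."""
    return s.encode("utf-16-be")


# Code point order: U+FF5E sorts before U+1F63A.
by_code_point = sorted([cat, tilde])

# Code unit order: the lead surrogate D83D sorts before FF5E, so the cat comes first.
by_code_unit = sorted([cat, tilde], key=utf16_code_unit_key)

# UTF-8 byte order agrees with code point order.
by_utf8_bytes = sorted([cat, tilde], key=lambda s: s.encode("utf-8"))
```

<p>The two results differ only because one of the strings contains a [=supplementary character=]; for strings limited to the Basic Multilingual Plane the two orderings agree.</p>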


</section>
<section id="human_sorting" class="subtopic">
<h4>Human-visible Sorting</h4>

<p>Specifications or applications that need to deal with sorting natural language text for display to users face some additional complexity. Unicode defines a default collation (sorting) order as part of the <cite>Unicode Collation Algorithm</cite> [[UTS10]], which is then tailored to meet the needs of specific languages, [=locales=], and cultures.</p>

<div class="req" id="char_sort_units">
<p class="advisement">Software that sorts or searches text for display to users SHOULD do so on the basis of appropriate collation units and ordering rules for the relevant language and/or application.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod/#sec-CollationUnits">Units of collation, C006</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
</details>
</div>

<p>Specifications or applications that need to deal with sorting natural language text for display to users face some additional complexity. Unicode defines a default collation (sorting) order as part of the <cite>Unicode Collation Algorithm</cite> [[UTS10]], which can then be tailored to meet the needs of specific languages, <a>locales</a>, and cultures.</p>

<aside class="issue" id="char_sort_user_issue">
<p>The following requirement is somewhat unclear for specification authors. There are many places where what I'd want to advise specs to do is follow the language (locale) of the given document or of the application or to provide controls so that the application can choose appropriately. The "current user", where it means "operating system" or "user agent host system's locale" or "browser's localization" is not always what is expected.</p>
</aside>

<div class="req" id="char_sort_user">
<p class="advisement">Where searching or sorting is done dynamically, particularly in a multilingual environment, the 'relevant language' SHOULD be determined to be that of the current user, and may thus differ from user to user.</p>
<p class="advisement">When sorting text for presentation to users, the sort order SHOULD be tailored according to the most appropriate [=locale=] for the specific user in that application; thus the presentation order may differ from user to user.</p>
<details class="links"><summary>explanations &amp; examples</summary>
<p><a href="https://www.w3.org/TR/charmod/#sec-CollationUnits">Units of collation, C007</a>, in <cite>Character Model for the World Wide Web: Fundamentals</cite></p>
</details>
@@ -2902,6 +2907,7 @@ <h3>Specifying sort and search functionality</h3>
</div>
</section>
</section>
</section>


