Editorial improvements #46

r12a · 2020-06-30T15:53:26Z

aphillips

Generally looks really good, some comments below.

aphillips · 2020-06-30T16:21:28Z

index.html

@@ -113,11 +113,12 @@ <h3>Terminology</h3>
  <p>A <dfn data-lt="agreement|agreements|serialization agreement|serialization agreements|serialization">serialization agreement</dfn> (or "agreement" for short) is the common understanding between a producer and consumer about the serialization of string metadata: how it is to be understood, serialized, read, transmitted, removed, etc.</p>
  <p><dfn data-lt="language negotiation">Language negotiation</dfn> is any process which selects or filters content based on language. Usually this implies selecting content in a single language (or falling back to some meaningful default language that is available) by finding the best matching values when several languages or locales [[LTLI]] are present in the content. Some common language negotiation algorithms include the Lookup algorithm in [[BCP47]] or the BestFitMatcher in [[ECMA-402]].</p>
  <p><dfn>LTR</dfn> stands for "left-to-right" and refers to the inline base direction of left-to-right [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the left side of the page in horizontal text. It's used for scripts such as Latin, Cyrillic, Devanagari, and many others.</p>
-  <p><dfn>RTL</dfn> stands for "right-to-left" and refers to the inline base direction of right-to-left [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the right side of the page in horizontal text. It's used for scripts such as Arabic, Hebrew, Syriac, and a few others.</p>
+  <p><dfn>RTL</dfn> stands for "right-to-left" and refers to the inline base direction of right-to-left [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the right side of the page in horizontal text. It's used for scripts such as Arabic, Hebrew, Syriac, and  others.</p>


s/and  others/and others/ (two spaces to one).

Perhaps a full rephrase is in order here, such as:

It's used for a variety of scripts which include Arabic, Hebrew, and Syriac among others.

Good idea. I added a few other scripts, too.

aphillips · 2020-06-30T16:23:46Z

index.html

 <p><dfn>Bidi isolation</dfn> often needs to be applied to a range of text in order to prevent the automatic rules of the Unicode Bidirectional Algorithm incorrectly ordering that content in relation to the surrounding text. For example, numbers following right-to-left text in memory are automatically positioned to the left of that text by the Bidi Algorithm, but sometimes need to appear to the right. Another example occurs when lists of RTL items occur in a LTR sentence: the Bidi Algorithm will automatically assume that the order of items in the list should be "3 ,2 ,1", but actually what's needed is "1, 2, 3". In HTML, bidi isolation can be applied to a range of text by enclosing it in an element with a <code class="kw" translate="no">dir</code> attribute. In plain text there are Unicode formatting characters that can do the job. These mechanisms remove unwanted 'spillover effects'.</p>
-<p>Unicode code points are associated with properties relating to text direction. Generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found. <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole.</p>
+<p> <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole. Unicode code points are associated with properties relating to text direction: generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found.</p>


staying with the theme, change:

generally, Arabic and Hebrew letters

to

generally, letters in right-to-left scripts such as Arabic and Hebrew
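
For readers following the thread, first-strong detection as described in the paragraph above can be sketched roughly as follows. This is a simplified illustration, not the text of the spec: the function name `first_strong_direction` and the `default` parameter are made up for this example, and the sketch ignores the UAX #9 rule that skips characters inside isolates.

```python
import unicodedata

# Bidi classes treated as "strong" by the Unicode Bidirectional Algorithm.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text, default="ltr"):
    """Guess a base direction from the first strongly directional character.

    Hypothetical helper for illustration only. Weak and neutral characters
    (digits, punctuation, spaces) are skipped; if no strong character is
    found, the caller-supplied default is returned.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return default
```

A string that starts with a Latin letter but is otherwise Hebrew would be guessed as LTR by this heuristic, which is why the document recommends explicit metadata over guessing.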

aphillips · 2020-06-30T16:31:30Z

index.html

 <p><dfn>Bidi isolation</dfn> often needs to be applied to a range of text in order to prevent the automatic rules of the Unicode Bidirectional Algorithm incorrectly ordering that content in relation to the surrounding text. For example, numbers following right-to-left text in memory are automatically positioned to the left of that text by the Bidi Algorithm, but sometimes need to appear to the right. Another example occurs when lists of RTL items occur in a LTR sentence: the Bidi Algorithm will automatically assume that the order of items in the list should be "3 ,2 ,1", but actually what's needed is "1, 2, 3". In HTML, bidi isolation can be applied to a range of text by enclosing it in an element with a <code class="kw" translate="no">dir</code> attribute. In plain text there are Unicode formatting characters that can do the job. These mechanisms remove unwanted 'spillover effects'.</p>
-<p>Unicode code points are associated with properties relating to text direction. Generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found. <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole.</p>
+<p> <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole. Unicode code points are associated with properties relating to text direction: generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found.</p>
+<p><dfn>Base direction</dfn>  determines whether items of content will be arranged <em>left-to-right</em> or <em>right-to-left</em>, relative to each other in bidirectional text. The focus of the Unicode Bidirectional Algorithm (UBA) is the way individual adjacent characters <em>of the same direction</em> are arranged relative to each other. However, when there are clumps of both LTR and RTL character sequences, or when there are weak characters such as punctuation, the relative placement of these items depends on the surrounding directional context (the base direction). </p>


This paragraph doesn't make sense to me, particularly the emphasized part about same direction? Perhaps:

Base direction determines the starting point and general progression of content, either left-to-right or right-to-left, relative to each other in bidirectional text. The focus of UBA is the way in which adjacent logical characters are arranged relative to each other visually. When characters are of the same direction, this is primarily driven by the characters themselves. However, when there are clumps of both LTR and RTL character sequences, or when there are weak characters such as punctuation, the relative placement of these items depends on the surrounding directional context (which stems from the base direction).

Here's my suggestion (i'll add to the update i'm making):

Base direction determines the general arrangement and progression of content when bidirectional text is displayed. The UBA is primarily focused on arranging adjacent characters, based on character properties. Base direction works at a higher level, and dictates (a) the visual order and direction in which runs of strongly-typed LTR and RTL characters are displayed, and (b) where there are weakly-typed characters such as punctuation, the placement of those items relative to the other content.
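
The diff context above mentions that, in plain text, Unicode formatting characters can apply bidi isolation and set a base direction for a range of text. A minimal sketch of that idea, assuming the isolate controls LRI, RLI, FSI and PDI are acceptable in the target environment (the helper name `isolate` is hypothetical):

```python
# Unicode isolate formatting characters (UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text in an isolate so it cannot affect the surrounding text.

    direction is "ltr", "rtl", or None. With None, FSI is used and the
    base direction inside the isolate is left to first-strong detection.
    """
    openers = {"ltr": LRI, "rtl": RLI, None: FSI}
    if direction not in openers:
        raise ValueError(f"unsupported direction: {direction!r}")
    return f"{openers[direction]}{text}{PDI}"


# Insert an RTL string into an LTR message while keeping its base direction.
message = "Result: " + isolate("בינלאומי!", "rtl")
```

The document itself cautions against relying on formatting characters being available, so this is an option for plain-text contexts rather than a replacement for direction metadata.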

aphillips · 2020-06-30T16:38:40Z

index.html

@@ -382,7 +376,7 @@ <h3 id="string_specific_direction">String-specific directional information</h3>
 <p class="advisement" id="bp-using_rlm_lrm">If relying on first-strong heuristics, encourage content developers to use RLM/LRM at the beginning of a string where it is necessary to force a particular base direction, but do not prepend one of these characters to existing strings.</p>
 <p class="advisement" id="bp-rlm_lrm_availability">Do not rely on the availability of RLM/LRM formatting characters in most cases.</p>
 <p>If string data is being provided by users or content developers in web forms or other simple environments, users may not be able to enter these formatting characters.  In fact, most users will probably be unaware that such characters exist, or how to use them.  A web form can render their use unnecessary for immediate inspection if it sets the base direction for the input (which it should).</p>
-<p class="advisement" id="bp-inferring_from_language">If metadata is not available and cannot otherwise be provided, specifications MAY allow a base direction to be <a href="#script_subtag">interpolated from available language metadata</a>.</p>
+<p class="advisement" id="bp-inferring_from_language">(Only) if metadata is not available and cannot otherwise be provided, specifications MAY allow a base direction to be <a href="#script_subtag">interpolated from available language metadata</a>.</p>


I think this ought to be reversed:

Specifications SHOULD NOT allow a base direction to be interpolated from available language metadata unless direction metadata is not available and cannot otherwise be provided.

Although the original MAY might be closer to being nice about it:

Specifications that cannot otherwise provide direction metadata or for situations where metadata is not provided MAY allow a base direction to be interpolated from available language metadata.

What do you think?

I liked your first alternative, since we need to make it clear that this is not fundamentally a bad approach, rather than an opportunity.
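
To illustrate the fallback being discussed, here is a rough sketch of interpolating a base direction from a language tag only when no direction metadata exists. The script and language tables are deliberately tiny and are assumptions made for this example, as are the function and parameter names; a real implementation would need complete data.

```python
# Illustrative, incomplete tables (assumptions for this sketch only).
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"}
DEFAULT_SCRIPT = {
    "ar": "Arab", "fa": "Arab", "ur": "Arab", "he": "Hebr",
    "en": "Latn", "fr": "Latn", "ru": "Cyrl",
}


def direction_from_language(tag, explicit_direction=None):
    """Return "ltr", "rtl", or None if nothing can be inferred.

    Explicit direction metadata always wins; the language tag is only
    consulted as a last resort.
    """
    if explicit_direction in ("ltr", "rtl"):
        return explicit_direction
    subtags = tag.replace("_", "-").split("-")
    language = subtags[0].lower()
    script = None
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # singleton: extension or private-use subtags follow
        if len(subtag) == 4 and subtag.isalpha():
            script = subtag.title()  # script subtags are four letters
            break
    if script is None:
        script = DEFAULT_SCRIPT.get(language)
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"
```

Returning `None` for unknown languages keeps the guess from masquerading as real metadata, which fits the SHOULD NOT framing of the first alternative.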

aphillips · 2020-06-30T16:47:54Z

index.html

@@ -577,7 +571,7 @@ <h4 id="bidiCase1">Final punctuation</h4>
  <p lang="he" dir="rtl" style="font-size: 1.8em; color: grey;">תוצאה: "בינלאומי!"</p>

 <p>The Hebrew characters are automatically displayed right-to-left by applying the Unicode Bidirectional Algorithm (UBA). However, in a LTR context the UBA cannot make the exclamation mark appear to the left of the Hebrew text, where it belongs, unless the base direction is set to RTL around the inserted string.</p>
-<p>In HTML this can be done by inserting the string into a <code class="kw" translate="no">dir</code> attribute with the value <code class="kw" translate="no">rtl</code>. That yields the following:</p>
+<p>In HTML this can be be achieved for a LTR context by inserting the string into a <code class="kw" translate="no">dir</code> attribute with the value <code class="kw" translate="no">rtl</code>. That yields the following:</p>


Two thoughts here:

"...inserting the string into a dir attribute with the value..." reads oddly. Probably it should say "...inserting the string into an element with a dir attribute with the value..."

I'm not sure your edit adds anything? It's true that in an RTL context this is kind of a no-op---provided there aren't other bidi issues involved. In HTML, the dir attribute also activates bidi isolation, so a case like an LTR run at the end (and outside) the quoted text would also be solved. Consider

ברוך הבא ל- W3C!

Edit 1 made. Thanks for catching.

Wrt point 2, i wanted to indicate the context of the example. As you say, the implications are different for RTL contexts, and i didn't want to get into that complexity but i did want to ensure that my frame of reference is clearly delineated.
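
As a concrete illustration of "inserting the string into an element with a dir attribute", a producer generating HTML might do something like the following. This is a sketch under assumptions: the helper name `wrap_with_dir` is hypothetical and a `span` is used only as a convenient inline element.

```python
from html import escape


def wrap_with_dir(text, direction, lang=None):
    """Wrap text in a span carrying dir (and optionally lang) attributes.

    In HTML the dir attribute both sets the base direction of the content
    and isolates it from the surrounding text.
    """
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unsupported direction: {direction!r}")
    attrs = f' dir="{direction}"'
    if lang:
        attrs += f' lang="{escape(lang, quote=True)}"'
    # Escape the content so inserted strings cannot inject markup.
    return f"<span{attrs}>{escape(text)}</span>"


snippet = "Result: " + wrap_with_dir("בינלאומי!", "rtl", lang="he")
```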

iherman · 2020-07-01T13:51:00Z

index.html

@@ -354,7 +348,7 @@ <h3 id="string_specific_language">String-specific language information</h3>

 <p class="advisement" id="bp-lang_field_based_metadata">Use field-based metadata or string datatypes to indicate the language and the base direction for individual natural language strings.</p>

-<p>There is widespread low-level support for natural language string metadata because the use of metadata for storage and interchange of the language of data values is long-established and widely supported in the basic infrastructure of the Web. This includes language attributes in [[XML]] and [[HTML]]; string types in schema languages (e.g. [[xmlschema11-2]]) or the various RDF specifications including [[JSON-LD]]; or protocol- or document format-specific provisions for language.</p>
+<p> Low-level support for natural language string metadata is widespread because the use of metadata for storage and interchange of the language of data values is long-established and widely supported in the basic infrastructure of the Web. This includes language attributes in [[XML]] and [[HTML]]; string types in schema languages (e.g. [[xmlschema11-2]]) or the various RDF specifications including [[JSON-LD]]; or protocol- or document format-specific provisions for language.</p>


Sorry to be a bit picky...

various RDF specifications including [[JSON-LD]]

is not really precise. As far as core RDF is concerned there is only one (family of) spec; in this case the reference should probably be https://www.w3.org/TR/rdf11-concepts/. Then there are several specifications for the serialization of the general RDF concepts, of which JSON-LD is one. The example that you give below is in JSON-LD; it may be worth noting, then, that in this document the JSON-LD serialization is used for the examples, but it also applies to, e.g., Turtle.

Oh yes, many edits still need to be made relating to the current state, mostly in the section about best practices but also elsewhere. Addison has an action to work on that, and will hopefully take your comment into account. I'm not really intending to touch that stuff - i was just improving the english.
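
To make the point about serializations concrete, the sketch below emits the same language-tagged string once as a JSON-LD value object and once as a Turtle literal. The subject IRI `http://example.org/book` and the helper names are assumptions for illustration; the Turtle escaping covers only backslashes and double quotes, so it assumes single-line strings.

```python
import json

SUBJECT = "http://example.org/book"  # hypothetical resource
TITLE = "http://purl.org/dc/terms/title"


def as_json_ld(text, language):
    """Serialize a language-tagged string as a JSON-LD value object."""
    doc = {"@id": SUBJECT, TITLE: {"@value": text, "@language": language}}
    return json.dumps(doc, ensure_ascii=False)


def as_turtle(text, language):
    """Serialize the same statement as a single Turtle triple."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{SUBJECT}> <{TITLE}> "{escaped}"@{language} .'


json_ld = as_json_ld("בינלאומי!", "he")
turtle = as_turtle("בינלאומי!", "he")
```

Both outputs carry the language tag; neither form shown here carries a base direction, which is the gap the document is concerned with.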

r12a · 2020-07-01T13:59:44Z

I have begun (still a long way to go) to create a separate article about use cases and requirements at https://w3c.github.io/i18n-drafts/articles/lang-bidi-use-cases/index.en

Editorial improvements

5f2953c

r12a requested a review from aphillips June 30, 2020 15:53

aphillips reviewed Jun 30, 2020


Updated per Addison's comments

af14fe4

iherman reviewed Jul 1, 2020


r12a merged commit bdf6e03 into gh-pages Jul 3, 2020

r12a deleted the r12a-patch-1 branch July 3, 2020 16:03
