Editorial improvements #46

Merged
merged 2 commits into from
Jul 3, 2020
r12a
Contributor

@r12a r12a commented Jun 30, 2020

@r12a r12a requested a review from aphillips June 30, 2020 15:53
Contributor

@aphillips aphillips left a comment

Generally looks really good, some comments below.

index.html Outdated
@@ -113,11 +113,12 @@ <h3>Terminology</h3>
<p>A <dfn data-lt="agreement|agreements|serialization agreement|serialization agreements|serialization">serialization agreement</dfn> (or "agreement" for short) is the common understanding between a producer and consumer about the serialization of string metadata: how it is to be understood, serialized, read, transmitted, removed, etc.</p>
<p><dfn data-lt="language negotiation">Language negotiation</dfn> is any process which selects or filters content based on language. Usually this implies selecting content in a single language (or falling back to some meaningful default language that is available) by finding the best matching values when several languages or locales [[LTLI]] are present in the content. Some common language negotiation algorithms include the Lookup algorithm in [[BCP47]] or the BestFitMatcher in [[ECMA-402]].</p>
<p><dfn>LTR</dfn> stands for "left-to-right" and refers to the inline base direction of left-to-right [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the left side of the page in horizontal text. It's used for scripts such as Latin, Cyrillic, Devanagari, and many others.</p>
<p><dfn>RTL</dfn> stands for "right-to-left" and refers to the inline base direction of right-to-left [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the right side of the page in horizontal text. It's used for scripts such as Arabic, Hebrew, Syriac, and a few others.</p>
<p><dfn>RTL</dfn> stands for "right-to-left" and refers to the inline base direction of right-to-left [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the right side of the page in horizontal text. It's used for scripts such as Arabic, Hebrew, Syriac, and others.</p>
Contributor

s/and  others/and others/ (two spaces to one).

Perhaps a full rephrase is in order here, such as:

It's used for a variety of scripts which include Arabic, Hebrew, and Syriac among others.

Contributor Author

Good idea. I added a few other scripts, too.

index.html Outdated
<p><dfn>Bidi isolation</dfn> often needs to be applied to a range of text in order to prevent the automatic rules of the Unicode Bidirectional Algorithm incorrectly ordering that content in relation to the surrounding text. For example, numbers following right-to-left text in memory are automatically positioned to the left of that text by the Bidi Algorithm, but sometimes need to appear to the right. Another example occurs when lists of RTL items occur in a LTR sentence: the Bidi Algorithm will automatically assume that the order of items in the list should be "3 ,2 ,1", but actually what's needed is "1, 2, 3". In HTML, bidi isolation can be applied to a range of text by enclosing it in an element with a <code class="kw" translate="no">dir</code> attribute. In plain text there are Unicode formatting characters that can do the job. These mechanisms remove unwanted 'spillover effects'.</p>
<p>Unicode code points are associated with properties relating to text direction. Generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found. <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole.</p>
<p> <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole. Unicode code points are associated with properties relating to text direction: generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found.</p>
Contributor

staying with the theme, change:

generally, Arabic and Hebrew letters

to

generally, letters in right-to-left scripts such as Arabic and Hebrew

Contributor Author

done
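To make the first-strong detection definition in the diff above concrete, here is a minimal Python sketch. It is illustrative only, not proposed spec text: the function name `first_strong_direction` is hypothetical, and it assumes the standard-library `unicodedata.bidirectional()` as the source of each code point's Bidi_Class. Following the spirit of UAX #9 rules P2/P3, it skips anything inside an isolate (LRI/RLI/FSI ... PDI) and returns None when no strong character is found, leaving the fallback to the caller.

```python
import unicodedata
from typing import Optional

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(text: str) -> Optional[str]:
    """Guess a base direction from the first strongly-directional character.

    Returns "ltr" for Bidi_Class L, "rtl" for R or AL, and None when the
    string has no strong character outside isolates (caller picks a default).
    """
    depth = 0  # how many isolates (LRI/RLI/FSI ... PDI) we are currently inside
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            depth += 1
        elif bidi_class == "PDI":
            depth = max(depth - 1, 0)  # an unmatched PDI is simply ignored
        elif depth == 0:
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return None


print(first_strong_direction("Hello, world"))         # ltr
print(first_strong_direction('תוצאה: "בינלאומי!"'))  # rtl
print(first_strong_direction("\u200fW3C!"))           # rtl (RLM is a strong R character)
print(first_strong_direction("2020!"))                # None (digits and punctuation are not strong)
```

The RLM case also illustrates the `bp-using_rlm_lrm` advisement: a leading RLM or LRM is itself a strong character, so it is what first-strong detection picks up.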

index.html Outdated
<p><dfn>Base direction</dfn> determines whether items of content will be arranged <em>left-to-right</em> or <em>right-to-left</em>, relative to each other in bidirectional text. The focus of the Unicode Bidirectional Algorithm (UBA) is the way individual adjacent characters <em>of the same direction</em> are arranged relative to each other. However, when there are clumps of both LTR and RTL character sequences, or when there are weak characters such as punctuation, the relative placement of these items depends on the surrounding directional context (the base direction). </p>
Contributor

This paragraph doesn't make sense to me, particularly the emphasized part about same direction? Perhaps:

Base direction determines the starting point and general progression of content, either left-to-right or right-to-left, relative to each other in bidirectional text. The focus of UBA is the way in which adjacent logical characters are arranged relative to each other visually. When characters are of the same direction, this is primarily driven by the characters themselves. However, when there are clumps of both LTR and RTL character sequences, or when there are weak characters such as punctuation, the relative placement of these items depends on the surrounding directional context (which stems from the base direction).

Contributor Author

Here's my suggestion (i'll add to the update i'm making):

Base direction determines the general arrangement and progression of content when bidirectional text is displayed. The UBA is primarily focused on arranging adjacent characters, based on character properties. Base direction works at a higher level, and dictates (a) the visual order and direction in which runs of strongly-typed LTR and RTL characters are displayed, and (b) where there are weakly-typed characters such as punctuation, the placement of those items relative to the other content.

index.html Outdated
@@ -382,7 +376,7 @@ <h3 id="string_specific_direction">String-specific directional information</h3>
<p class="advisement" id="bp-using_rlm_lrm">If relying on first-strong heuristics, encourage content developers to use RLM/LRM at the beginning of a string where it is necessary to force a particular base direction, but do not prepend one of these characters to existing strings.</p>
<p class="advisement" id="bp-rlm_lrm_availability">Do not rely on the availability of RLM/LRM formatting characters in most cases.</p>
<p>If string data is being provided by users or content developers in web forms or other simple environments, users may not be able to enter these formatting characters. In fact, most users will probably be unaware that such characters exist, or how to use them. A web form can render their use unnecessary for immediate inspection if it sets the base direction for the input (which it should).</p>
<p class="advisement" id="bp-inferring_from_language">If metadata is not available and cannot otherwise be provided, specifications MAY allow a base direction to be <a href="#script_subtag">interpolated from available language metadata</a>.</p>
<p class="advisement" id="bp-inferring_from_language">(Only) if metadata is not available and cannot otherwise be provided, specifications MAY allow a base direction to be <a href="#script_subtag">interpolated from available language metadata</a>.</p>
Contributor

I think this ought to be reversed:

Specifications SHOULD NOT allow a base direction to be interpolated from available language metadata unless direction metadata is not available and cannot otherwise be provided.

Although the original MAY might be closer to being nice about it:

Specifications that cannot otherwise provide direction metadata or for situations where metadata is not provided MAY allow a base direction to be interpolated from available language metadata.

What do you think?

Contributor Author

I liked your first alternative, since we need to make it clear that this is not fundamentally a bad approach, rather than an opportunity.
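For the fallback discussed in the `bp-inferring_from_language` advisement (guessing a base direction from language metadata only when direction metadata is unavailable), a minimal Python sketch of the script-subtag approach might look like this. Everything here is an assumption for illustration: the function names are hypothetical, `RTL_SCRIPTS` is only a small subset of RTL ISO 15924 codes, `DEFAULT_SCRIPT` is a deliberately tiny stand-in for real likely-subtags data, and the BCP 47 parsing is simplified (no grandfathered or private-use handling).

```python
from typing import Optional

# Illustrative subset of ISO 15924 script codes that are written right-to-left.
RTL_SCRIPTS = {"arab", "hebr", "syrc", "thaa", "nkoo", "adlm"}

# Hypothetical, deliberately tiny default-script table. A real implementation
# would use likely-subtags data (e.g. from CLDR) instead.
DEFAULT_SCRIPT = {
    "ar": "arab", "fa": "arab", "he": "hebr", "yi": "hebr", "dv": "thaa",
    "en": "latn", "ru": "cyrl", "hi": "deva",
}


def script_subtag(lang_tag: str) -> Optional[str]:
    """Return the (lowercased) script subtag of a BCP 47 tag, if present."""
    subtags = lang_tag.lower().split("-")
    if len(subtags[0]) < 2 or not subtags[0].isalpha():
        return None  # private-use ("x-...") or irregular tag: not handled here
    for sub in subtags[1:]:
        if len(sub) == 3 and sub.isalpha():
            continue  # extended language subtag, e.g. "yue" in "zh-yue-Hant"
        if len(sub) == 4 and sub.isalpha():
            return sub  # script subtags are exactly four letters
        break  # region, variant or extension: no script subtag present
    return None


def infer_base_direction(lang_tag: str) -> Optional[str]:
    """Fallback guess of base direction from a language tag (None if unknown)."""
    script = script_subtag(lang_tag) or DEFAULT_SCRIPT.get(lang_tag.lower().split("-")[0])
    if script is None:
        return None  # no reasonable guess: leave the base direction unset
    return "rtl" if script in RTL_SCRIPTS else "ltr"


print(infer_base_direction("az-Arab-IR"))  # rtl
print(infer_base_direction("ar-Latn"))     # ltr (romanized Arabic)
print(infer_base_direction("he"))          # rtl (via the default-script table)
print(infer_base_direction("tlh"))         # None
```

The "ar-Latn" case shows why an explicit script subtag is checked before the primary language: the language alone would suggest the wrong direction.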

index.html Outdated
@@ -577,7 +571,7 @@ <h4 id="bidiCase1">Final punctuation</h4>
<p lang="he" dir="rtl" style="font-size: 1.8em; color: grey;">תוצאה: "בינלאומי!"</p>

<p>The Hebrew characters are automatically displayed right-to-left by applying the Unicode Bidirectional Algorithm (UBA). However, in a LTR context the UBA cannot make the exclamation mark appear to the left of the Hebrew text, where it belongs, unless the base direction is set to RTL around the inserted string.</p>
<p>In HTML this can be done by inserting the string into a <code class="kw" translate="no">dir</code> attribute with the value <code class="kw" translate="no">rtl</code>. That yields the following:</p>
<p>In HTML this can be achieved for a LTR context by inserting the string into a <code class="kw" translate="no">dir</code> attribute with the value <code class="kw" translate="no">rtl</code>. That yields the following:</p>
Contributor

Two thoughts here:

  1. "...inserting the string into a dir attribute with the value..." reads oddly. Probably it should say "...inserting the string into an element with a dir attribute with the value..."

  2. I'm not sure your edit adds anything. It's true that in an RTL context this is kind of a no-op, provided there aren't other bidi issues involved. In HTML, the dir attribute also activates bidi isolation, so a case like an LTR run at the end of (and outside) the quoted text would also be solved. Consider:

ברוך הבא ל- W3C!

Contributor Author

Edit 1 made. Thanks for catching.

Wrt point 2, i wanted to indicate the context of the example. As you say, the implications are different for RTL contexts, and i didn't want to get into that complexity but i did want to ensure that my frame of reference is clearly delineated.
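The final-punctuation fix in this hunk uses an HTML element with a `dir` attribute, which both sets the base direction and isolates the inserted string. In plain text, the equivalent mechanism is the set of Unicode formatting characters mentioned in the bidi isolation definition: LRI (U+2066), RLI (U+2067) or FSI (U+2068), closed by PDI (U+2069). A minimal Python sketch of both, assuming the inserted string's direction is known; the helper names `isolate_text` and `isolate_html` are hypothetical.

```python
import html
from typing import Optional

# Unicode isolate formatting characters (UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_text(s: str, direction: Optional[str] = None) -> str:
    """Isolate s in plain text; direction=None lets first-strong decide (FSI)."""
    openers = {"ltr": LRI, "rtl": RLI, None: FSI}
    if direction not in openers:
        raise ValueError(f"unsupported direction: {direction!r}")
    return openers[direction] + s + PDI


def isolate_html(s: str, direction: str = "auto") -> str:
    """Isolate s in HTML: an element with a dir attribute is bidi-isolated."""
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unsupported direction: {direction!r}")
    return f'<span dir="{direction}">{html.escape(s)}</span>'


# An RTL string inserted into an LTR template, as in the final-punctuation case.
inserted = "בינלאומי!"
print("Result: " + isolate_html(inserted, "rtl"))  # Result: <span dir="rtl">בינלאומי!</span>
plain = "Result: " + isolate_text(inserted, "rtl")
```

Because the isolate (or the `dir` element) ends before any following text, an LTR run placed after the inserted string is also kept from being reordered with it, which is the point raised in the review above.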

@@ -354,7 +348,7 @@ <h3 id="string_specific_language">String-specific language information</h3>

<p class="advisement" id="bp-lang_field_based_metadata">Use field-based metadata or string datatypes to indicate the language and the base direction for individual natural language strings.</p>

<p>There is widespread low-level support for natural language string metadata because the use of metadata for storage and interchange of the language of data values is long-established and widely supported in the basic infrastructure of the Web. This includes language attributes in [[XML]] and [[HTML]]; string types in schema languages (e.g. [[xmlschema11-2]]) or the various RDF specifications including [[JSON-LD]]; or protocol- or document format-specific provisions for language.</p>
<p> Low-level support for natural language string metadata is widespread because the use of metadata for storage and interchange of the language of data values is long-established and widely supported in the basic infrastructure of the Web. This includes language attributes in [[XML]] and [[HTML]]; string types in schema languages (e.g. [[xmlschema11-2]]) or the various RDF specifications including [[JSON-LD]]; or protocol- or document format-specific provisions for language.</p>
Member

Sorry to be a bit picky...

various RDF specifications including [[JSON-LD]]

is not really precise. As far as core RDF is concerned, there is only one specification (or family of specifications); in this case the reference should probably be https://www.w3.org/TR/rdf11-concepts/. Then there are several specifications for the serialization of the general RDF concepts, of which JSON-LD is one. The example that you give below is in JSON-LD; it may be worth noting, then, that this document uses the JSON-LD serialization for its examples, but the same applies to, e.g., Turtle.

Contributor Author

Oh yes, there are many edits relating to the current state that need to be made, mostly in the section about best practices but also elsewhere. Addison has an action to work on that, and will hopefully take your comment into account. I'm not really intending to touch that stuff - i was just improving the english.
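The `bp-lang_field_based_metadata` advisement in this hunk recommends carrying language and base direction as fields alongside each natural-language string. Since JSON-LD came up as one RDF serialization, here is a minimal Python sketch that maps such a string to a JSON-LD 1.1 value object using the `@value`, `@language` and `@direction` keywords. The `LocalizableString` class and its method names are hypothetical; they are not the document's own definitions or any library's API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalizableString:
    """Hypothetical container: a string plus its language and direction fields."""

    value: str
    language: Optional[str] = None   # BCP 47 tag, e.g. "he"
    direction: Optional[str] = None  # "ltr", "rtl", or None when unknown

    def __post_init__(self) -> None:
        if self.direction not in (None, "ltr", "rtl"):
            raise ValueError(f"unsupported direction: {self.direction!r}")

    def to_jsonld(self) -> dict:
        """Serialize as a JSON-LD 1.1 value object, omitting unknown fields."""
        obj = {"@value": self.value}
        if self.language is not None:
            obj["@language"] = self.language
        if self.direction is not None:
            obj["@direction"] = self.direction
        return obj

    @classmethod
    def from_jsonld(cls, obj: dict) -> "LocalizableString":
        return cls(obj["@value"], obj.get("@language"), obj.get("@direction"))


title = LocalizableString("בינלאומי!", language="he", direction="rtl")
print(json.dumps({"title": title.to_jsonld()}, ensure_ascii=False))
# {"title": {"@value": "בינלאומי!", "@language": "he", "@direction": "rtl"}}
```

Omitting a field when it is unknown, rather than guessing, keeps the serialization honest about missing metadata; the same fields could be mapped onto other RDF serializations such as Turtle.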

@r12a
Contributor Author

r12a commented Jul 1, 2020

I have begun (still a long way to go) to create a separate article about use cases and requirements at https://w3c.github.io/i18n-drafts/articles/lang-bidi-use-cases/index.en

@r12a r12a merged commit bdf6e03 into gh-pages Jul 3, 2020
@r12a r12a deleted the r12a-patch-1 branch July 3, 2020 16:03

3 participants