Editorial improvements #46
Generally looks really good, some comments below.
index.html
Outdated
@@ -113,11 +113,12 @@ <h3>Terminology</h3> | |||
<p>A <dfn data-lt="agreement|agreements|serialization agreement|serialization agreements|serialization">serialization agreement</dfn> (or "agreement" for short) is the common understanding between a producer and consumer about the serialization of string metadata: how it is to be understood, serialized, read, transmitted, removed, etc.</p> | |||
<p><dfn data-lt="language negotiation">Language negotiation</dfn> is any process which selects or filters content based on language. Usually this implies selecting content in a single language (or falling back to some meaningful default language that is available) by finding the best matching values when several languages or locales [[LTLI]] are present in the content. Some common language negotiation algorithms include the Lookup algorithm in [[BCP47]] or the BestFitMatcher in [[ECMA-402]].</p> | |||
<p><dfn>LTR</dfn> stands for "left-to-right" and refers to the inline base direction of left-to-right [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the left side of the page in horizontal text. It's used for scripts such as Latin, Cyrillic, Devanagari, and many others.</p> | |||
<p><dfn>RTL</dfn> stands for "right-to-left" and refers to the inline base direction of right-to-left [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the right side of the page in horizontal text. It's used for scripts such as Arabic, Hebrew, Syriac, and a few others.</p> | |||
<p><dfn>RTL</dfn> stands for "right-to-left" and refers to the inline base direction of right-to-left [[UAX9]]. This is the base text direction used by languages whose starting character progression begins on the right side of the page in horizontal text. It's used for scripts such as Arabic, Hebrew, Syriac, and others.</p> |
s/and  others/and others/ (two spaces to one).
Perhaps a full rephrase is in order here, such as:
It's used for a variety of scripts which include Arabic, Hebrew, and Syriac among others.
Good idea. I added a few other scripts, too.
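As a side note on the language negotiation definition in the hunk above, which names the Lookup algorithm in [[BCP47]]: the following is a minimal Python sketch of that style of matching, assuming progressive truncation of each requested tag. The function name lookup and its parameters are hypothetical, and wildcard ranges ("*") are not handled.

```python
def lookup(available, requested, default):
    """Return the best tag in `available` for the priority list `requested`."""
    # Matching is case-insensitive; map lowercased tags back to the original spelling.
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in requested:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            # Truncate from the end and try the shorter tag.
            subtags.pop()
            # A single-character subtag (such as "x") cannot end a tag, so drop it too.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    # Nothing matched: fall back to the caller's default.
    return default
```

For example, a request for "zh-Hant-TW" against content available in "en", "de" and "zh-Hant" falls back one step and selects "zh-Hant".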
index.html
Outdated
<p><dfn>Bidi isolation</dfn> often needs to be applied to a range of text in order to prevent the automatic rules of the Unicode Bidirectional Algorithm incorrectly ordering that content in relation to the surrounding text. For example, numbers following right-to-left text in memory are automatically positioned to the left of that text by the Bidi Algorithm, but sometimes need to appear to the right. Another example occurs when lists of RTL items occur in a LTR sentence: the Bidi Algorithm will automatically assume that the order of items in the list should be "3 ,2 ,1", but actually what's needed is "1, 2, 3". In HTML, bidi isolation can be applied to a range of text by enclosing it in an element with a <code class="kw" translate="no">dir</code> attribute. In plain text there are Unicode formatting characters that can do the job. These mechanisms remove unwanted 'spillover effects'.</p> | ||
<p>Unicode code points are associated with properties relating to text direction. Generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found. <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole.</p> | ||
<p> <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole. Unicode code points are associated with properties relating to text direction: generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found.</p> |
staying with the theme, change:
generally, Arabic and Hebrew letters
to
generally, letters in right-to-left scripts such as Arabic and Hebrew
done
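For readers following this thread, first-strong detection as described in the paragraph under review can be sketched in Python roughly as follows. This is an illustration only: the function name and the fallback parameter are hypothetical, and the sketch does not skip text inside isolates as the full Unicode Bidirectional Algorithm does.

```python
import unicodedata


def first_strong_direction(text, fallback="ltr"):
    """Guess a base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            # Strong left-to-right, e.g. Latin or Han characters.
            return "ltr"
        if bidi_class in ("R", "AL"):
            # Strong right-to-left, e.g. Hebrew (R) or Arabic (AL) letters.
            return "rtl"
        # Weak and neutral characters (digits, punctuation, spaces) are skipped.
    # No strong character found: the caller decides what to assume.
    return fallback
```

Because RLM (U+200F) has bidi class R, a string such as "\u200fW3C" is detected as RTL even though its visible characters are all Latin.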
index.html
Outdated
<p><dfn>Bidi isolation</dfn> often needs to be applied to a range of text in order to prevent the automatic rules of the Unicode Bidirectional Algorithm incorrectly ordering that content in relation to the surrounding text. For example, numbers following right-to-left text in memory are automatically positioned to the left of that text by the Bidi Algorithm, but sometimes need to appear to the right. Another example occurs when lists of RTL items occur in a LTR sentence: the Bidi Algorithm will automatically assume that the order of items in the list should be "3 ,2 ,1", but actually what's needed is "1, 2, 3". In HTML, bidi isolation can be applied to a range of text by enclosing it in an element with a <code class="kw" translate="no">dir</code> attribute. In plain text there are Unicode formatting characters that can do the job. These mechanisms remove unwanted 'spillover effects'.</p> | ||
<p>Unicode code points are associated with properties relating to text direction. Generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found. <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole.</p> | ||
<p> <dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate base direction for the string as a whole. Unicode code points are associated with properties relating to text direction: generally, Arabic and Hebrew letters have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found.</p> | ||
<p><dfn>Base direction</dfn> determines whether items of content will be arranged <em>left-to-right</em> or <em>right-to-left</em>, relative to each other in bidirectional text. The focus of the Unicode Bidirectional Algorithm (UBA) is the way individual adjacent characters <em>of the same direction</em> are arranged relative to each other. However, when there are clumps of both LTR and RTL character sequences, or when there are weak characters such as punctuation, the relative placement of these items depends on the surrounding directional context (the base direction). </p> |
This paragraph doesn't make sense to me, particularly the emphasized part about same direction? Perhaps:
Base direction determines the starting point and general progression of content, either left-to-right or right-to-left, relative to each other in bidirectional text. The focus of UBA is the way in which adjacent logical characters are arranged relative to each other visually. When characters are of the same direction, this is primarily driven by the characters themselves. However, when there are clumps of both LTR and RTL character sequences, or when there are weak characters such as punctuation, the relative placement of these items depends on the surrounding directional context (which stems from the base direction).
Here's my suggestion (i'll add to the update i'm making):
Base direction determines the general arrangement and progression of content when bidirectional text is displayed. The UBA is primarily focused on arranging adjacent characters, based on character properties. Base direction works at a higher level, and dictates (a) the visual order and direction in which runs of strongly-typed LTR and RTL characters are displayed, and (b) where there are weakly-typed characters such as punctuation, the placement of those items relative to the other content.
index.html
Outdated
@@ -382,7 +376,7 @@ <h3 id="string_specific_direction">String-specific directional information</h3> | |||
<p class="advisement" id="bp-using_rlm_lrm">If relying on first-strong heuristics, encourage content developers to use RLM/LRM at the beginning of a string where it is necessary to force a particular base direction, but do not prepend one of these characters to existing strings.</p> | |||
<p class="advisement" id="bp-rlm_lrm_availability">Do not rely on the availability of RLM/LRM formatting characters in most cases.</p> | |||
<p>If string data is being provided by users or content developers in web forms or other simple environments, users may not be able to enter these formatting characters. In fact, most users will probably be unaware that such characters exist, or how to use them. A web form can render their use unnecessary for immediate inspection if it sets the base direction for the input (which it should).</p> | |||
<p class="advisement" id="bp-inferring_from_language">If metadata is not available and cannot otherwise be provided, specifications MAY allow a base direction to be <a href="#script_subtag">interpolated from available language metadata</a>.</p> | |||
<p class="advisement" id="bp-inferring_from_language">(Only) if metadata is not available and cannot otherwise be provided, specifications MAY allow a base direction to be <a href="#script_subtag">interpolated from available language metadata</a>.</p> |
I think this ought to be reversed:
Specifications SHOULD NOT allow a base direction to be interpolated from available language metadata unless direction metadata is not available and cannot otherwise be provided.
Although the original MAY might be closer to being nice about it:
Specifications that cannot otherwise provide direction metadata or for situations where metadata is not provided MAY allow a base direction to be interpolated from available language metadata.
What do you think?
I liked your first alternative, since we need to make it clear that this is not fundamentally a bad approach, rather than an opportunity.
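To make the fallback under discussion concrete, here is a rough Python sketch in which explicit direction metadata always wins and the language tag is consulted only when it is missing. The function names and both lookup tables are hypothetical and deliberately incomplete; a real implementation would need full script and likely-script data.

```python
# Illustrative, incomplete set of right-to-left script subtags.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"}

# Illustrative, incomplete guesses at the usual script for a bare language subtag.
LIKELY_SCRIPT = {"ar": "Arab", "fa": "Arab", "ur": "Arab",
                 "he": "Hebr", "yi": "Hebr", "dv": "Thaa"}


def direction_from_language(tag):
    """Return "rtl", "ltr", or None when the tag gives no usable hint."""
    subtags = tag.split("-")
    script = None
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # extension or private-use section: no script subtag beyond here
        if len(subtag) == 4 and subtag.isalpha():
            script = subtag.title()  # explicit script subtag, e.g. "Arab"
            break
    if script is None:
        script = LIKELY_SCRIPT.get(subtags[0].lower())
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"


def resolve_direction(explicit_direction, language_tag):
    """Prefer explicit direction metadata; infer from language only as a last resort."""
    if explicit_direction in ("ltr", "rtl"):
        return explicit_direction
    return direction_from_language(language_tag) if language_tag else None
```

Note that "az-Arab" and "az-Latn" give different answers, which is why a script subtag, where present, is more reliable than the language subtag alone.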
index.html
Outdated
@@ -577,7 +571,7 @@ <h4 id="bidiCase1">Final punctuation</h4> | |||
<p lang="he" dir="rtl" style="font-size: 1.8em; color: grey;">תוצאה: "בינלאומי!"</p> | |||
<p>The Hebrew characters are automatically displayed right-to-left by applying the Unicode Bidirectional Algorithm (UBA). However, in a LTR context the UBA cannot make the exclamation mark appear to the left of the Hebrew text, where it belongs, unless the base direction is set to RTL around the inserted string.</p> | |||
<p>In HTML this can be done by inserting the string into a <code class="kw" translate="no">dir</code> attribute with the value <code class="kw" translate="no">rtl</code>. That yields the following:</p> | |||
<p>In HTML this can be be achieved for a LTR context by inserting the string into a <code class="kw" translate="no">dir</code> attribute with the value <code class="kw" translate="no">rtl</code>. That yields the following:</p> |
Two thoughts here:
- "...inserting the string into a dir attribute with the value..." reads oddly. Probably it should say "...inserting the string into an element with a dir attribute with the value..."
- I'm not sure your edit adds anything? It's true that in an RTL context this is kind of a no-op, provided there aren't other bidi issues involved. In HTML, the dir attribute also activates bidi isolation, so a case like an LTR run at the end (and outside) the quoted text would also be solved. Consider
ברוך הבא ל- W3C!
Edit 1 made. Thanks for catching.
Wrt point 2, i wanted to indicate the context of the example. As you say, the implications are different for RTL contexts, and i didn't want to get into that complexity but i did want to ensure that my frame of reference is clearly delineated.
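As an aside on the two mechanisms mentioned in this thread and in the Bidi isolation definition (an element with a dir attribute in HTML, Unicode formatting characters in plain text), here is a minimal Python sketch of a producer applying each. The function names are hypothetical, and the plain-text variant assumes the isolate controls U+2066 to U+2069 are the formatting characters meant.

```python
import html

# Unicode isolate controls: LRI, RLI, FSI open an isolate; PDI closes it.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def isolate_html(text, direction):
    """Wrap an inserted string in an element whose dir attribute sets its base direction."""
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unsupported direction: {direction!r}")
    # Escape the text so that markup characters in the data stay inert.
    return f'<span dir="{direction}">{html.escape(text)}</span>'


def isolate_plain(text, direction=None):
    """Wrap a string in isolate controls; FSI (first-strong) is used when direction is unknown."""
    opener = {"ltr": LRI, "rtl": RLI}.get(direction, FSI)
    return opener + text + PDI
```

With the example from the diff, isolate_html("בינלאומי!", "rtl") places the exclamation mark inside an RTL span, so it can be displayed to the left of the Hebrew text even in an LTR context.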
@@ -354,7 +348,7 @@ <h3 id="string_specific_language">String-specific language information</h3> | |||
<p class="advisement" id="bp-lang_field_based_metadata">Use field-based metadata or string datatypes to indicate the language and the base direction for individual natural language strings.</p> | |||
<p>There is widespread low-level support for natural language string metadata because the use of metadata for storage and interchange of the language of data values is long-established and widely supported in the basic infrastructure of the Web. This includes language attributes in [[XML]] and [[HTML]]; string types in schema languages (e.g. [[xmlschema11-2]]) or the various RDF specifications including [[JSON-LD]]; or protocol- or document format-specific provisions for language.</p> | |||
<p> Low-level support for natural language string metadata is widespread because the use of metadata for storage and interchange of the language of data values is long-established and widely supported in the basic infrastructure of the Web. This includes language attributes in [[XML]] and [[HTML]]; string types in schema languages (e.g. [[xmlschema11-2]]) or the various RDF specifications including [[JSON-LD]]; or protocol- or document format-specific provisions for language.</p> |
Sorry to be a bit picky...
various RDF specifications including [[JSON-LD]]
is not really precise. As far as the core RDF is concerned there is only one (family of) spec; in this case the reference should probably be https://www.w3.org/TR/rdf11-concepts/. Then there are several specifications for the serialization of the general RDF concepts, of which JSON-LD is one. The example that you give below is in JSON-LD; it may be worth noting, then, that in this document the JSON-LD serialization is used for the examples, but it also applies to, e.g., Turtle.
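To illustrate the kind of field-based metadata the best practice in this hunk refers to, here is a small Python sketch that builds a JSON-LD style value object carrying language and base direction alongside the string. It assumes the @value, @language and @direction keywords of JSON-LD 1.1; the helper name localizable_string and the "title" property are hypothetical, and no @context is shown.

```python
import json


def localizable_string(value, language=None, direction=None):
    """Build a value object that keeps language and direction next to the string."""
    obj = {"@value": value}
    if language:
        obj["@language"] = language
    if direction:
        obj["@direction"] = direction
    return obj


# Hypothetical document fragment: one natural language string with its metadata.
doc = {"title": localizable_string("בינלאומי!", "he", "rtl")}
print(json.dumps(doc, ensure_ascii=False, indent=2))
```

The same three pieces of information could be carried by another RDF serialization such as Turtle; JSON-LD is used here only because the examples in the document use it.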
Oh yes, many edits relating to the current state still need to be made, mostly in the section about best practices but also elsewhere. Addison has an action to work on that, and will hopefully take your comment into account. I'm not really intending to touch that stuff - i was just improving the English.
I have begun (still a long way to go) to create a separate article about use cases and requirements at https://w3c.github.io/i18n-drafts/articles/lang-bidi-use-cases/index.en