[I18N-ACTION-1046] Add warning about Unicode tag characters #56

Merged: 18 commits merged into w3c:gh-pages on Mar 3, 2022

aphillips
Contributor

@aphillips commented Jul 8, 2021

This also includes some minor updates and incorporates Felix's pull request (which was against my repo???) for action-1036.


aphillips and others added 7 commits June 18, 2021 09:23
merging in richard's styling changes
Proposal of a definition for metadata. This implements https://www.w3.org/International/track/actions/1036
- Added a new best practice to "general"
- Added description quoting Unicode's "strong discourage"
- Added a warning box to the section on Unicode tag characters
- Added a .uname style, since I needed to quote a character name
- Make the UBA definition consistent with i18n-glossary
- Rearrange items to be more logically sequenced
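For readers unfamiliar with the characters this PR warns about, here is a rough Python sketch (not taken from string-meta; all function names are hypothetical) that detects code points from the Unicode Tags block (U+E0000..U+E007F) and decodes the ASCII text spelled by a deprecated U+E0001 LANGUAGE TAG sequence. Note that tag characters also appear legitimately in emoji tag sequences (such as subdivision flags), so a consumer should not simply strip every tag character it finds.

```python
# Hypothetical helpers (not from string-meta) for spotting Unicode tag
# characters, which Unicode strongly discourages for conveying language.
TAGS_FIRST, TAGS_LAST = 0xE0000, 0xE007F            # the Tags block
LANGUAGE_TAG = 0xE0001                              # deprecated LANGUAGE TAG
TAG_ASCII_FIRST, TAG_ASCII_LAST = 0xE0020, 0xE007E  # TAG SPACE..TAG TILDE


def contains_tag_characters(text: str) -> bool:
    """True if any code point in text falls in the Tags block."""
    return any(TAGS_FIRST <= ord(ch) <= TAGS_LAST for ch in text)


def decode_language_tags(text: str) -> list[str]:
    """Return the ASCII strings spelled after each U+E0001 LANGUAGE TAG."""
    found = []
    i = 0
    while i < len(text):
        if ord(text[i]) == LANGUAGE_TAG:
            j = i + 1
            spelled = []
            # Each tag character mirrors an ASCII character offset by 0xE0000.
            while j < len(text) and TAG_ASCII_FIRST <= ord(text[j]) <= TAG_ASCII_LAST:
                spelled.append(chr(ord(text[j]) - 0xE0000))
                j += 1
            found.append("".join(spelled))
            i = j
        else:
            i += 1
    return found


def to_tag_sequence(ascii_text: str) -> str:
    """Build a LANGUAGE TAG sequence for testing only (do not emit in real data)."""
    return chr(LANGUAGE_TAG) + "".join(chr(0xE0000 + ord(c)) for c in ascii_text)


sample = to_tag_sequence("en-us") + "Hello"
print(contains_tag_characters(sample))   # True
print(decode_language_tags(sample))      # ['en-us']
```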
@aphillips requested a review from @r12a on July 14, 2021 16:08
@aphillips
Contributor Author

@r12a Could you take a close look at the changes I just added to the glossary? I applied the "definition" style to definitions and, in the course of doing that, rearranged the order to be more logical (fewer "forward" definitions). I also made the UBA definition more or less consistent with i18n-glossary and made light edits to the "base direction" materials. It may be easier to view my edits from my repo, i.e. https://aphillips.github.io/string-meta

index.html (outdated)
@@ -68,18 +68,34 @@ <h2 id="introduction">Introduction</h2>
<section id="terminology">
<h3>Terminology</h3>
<p>This section defines terminology necessary to understand the contents of this document. Most of the terms defined here are specific to this document. Terminology borrowed from other Internationalization documents have a link to the original definition.</p>

<p class="definition"><dfn data-lt="metadata|metadata">Metadata</dfn> is information about data defined in terms of functions, form and scope. In this document, the function of metadata is to express information about direction and language. The form for direction metadata is described in [[[#bidi-approaches]]], the form for language metadata is described in [[[#language-approaches]]]. In this document, the scope for both types of metadata is a string or a set of strings. In absence of direction or language metadata, defaults apply, see [[[#resource_wide_defaults]]].</p>
Contributor

I think there should be a comma after 'information about data'; otherwise it's slightly ambiguous.

Contributor

... although I'm not actually clear what is meant by "defined in terms of functions, form and scope", anyway.

Contributor

The sentence about 'in this document..' is a good addition. But the following sentences worry me.

Rather than 'The form for xxx metadata is described in' I think you mean 'Various possible ways of representing xxx metadata are described in...'. (Note that not all are recommended.)

But actually, I'm not convinced that we need those two links anyway. If you're going to point to them, then why not link to the two sections that contain our actual recommendations, too? But if you do that, you're largely just repeating what's clear anyway in the table of contents.

Finally, 'In absence of' -> 'In the absence of'.

Contributor Author

@aphillips Jul 19, 2021

This was Felix's pull request verbatim. I have removed the bottom linked part of the paragraph and rewritten the definition. See what you think (after the next PR).

index.html (outdated)

<p class="definition"><dfn data-lt="unicode bidirectional algorithm|uba|bidi algorithm">Unicode Bidirectional Algorithm</dfn> (<q>UBA</q>) or <em>Bidi algorithm</em>. This is the name for the rules described in <a href="http://www.unicode.org/reports/tr9/"><cite>Unicode Standard Annex #9: <q>Unicode Bidirectional Algorithm</q></cite></a> [[UAX9]]. Those rules describe how inline bidirectional text should be rendered. The effects of the bidi algorithm depend on the <a>base direction</a> and the directional properties of the characters to which it is applied.</p>

<p class="definition"><dfn>Base direction</dfn> is the initial direction applied to a paragraph and determines the general arrangement and progression of content in the <a>bidi algorithm</a> when bidirectional text is displayed.</p>
Contributor

Base direction is the initial direction applied to both paragraphs and inline runs. In this document, however, we are only concerned with the base direction of a string as a whole. That's probably similar to the paragraph level when the string is stored in a manifest, but it may well become the direction of an inline run when included in the rendered text. Perhaps we should say something like:

"Base direction is the initial direction applied to a paragraph or a defined run of inline text, and determines .... In this document we are concerned with identifying the base direction of a whole string, and we do not talk about how to define base direction for runs of text within a string."

Contributor Author

I don't understand "base direction" as being applied to "inline runs" save where we are speaking of isolated runs of text. Otherwise isn't the direction of a run of text its embedding level over the base direction?

That would make your suggestion more like "... In this document we are concerned with identifying the base direction of a whole string and with applying that base direction when displaying strings in a given context, and we do not talk about how to determine the direction of runs of text within a string."

Contributor

Embedding levels change the base direction for an inline run of text. So if one is defining what base direction is, one needs to take that into account. Of course, the more important reason for having a definition of base direction in this document is to say what's out of scope for the document, i.e., embedded base direction changes within a string.

Contributor Author

Hm. I never call what embedding levels are changing the "base direction". I just call it the direction of that run. The base direction (in my mind) is the floor (whether one is starting with L or R as the direction or 0 or 1 as the level I suppose).

It is super unhelpful that UAX #9 contains no definitions for this stuff (I looked), and I suspect we should contribute definitions to the Unicode glossary as well. Have a look at my edits just now and see what you think?

index.html (outdated)

<p class="definition"><dfn>First-strong detection</dfn> is an algorithm that looks for the first strongly-directional character in a string, and then uses that to guess at the appropriate <a>base direction</a> for the string as a whole. Unicode code points are associated with properties relating to text direction: generally, letters in right-to-left scripts such as Arabic and Hebrew have a strong RTL direction, whereas Latin and Han characters have a strong LTR direction. Other characters, such as punctuation, only have a weak intrinsic directionality, and the actual directionality is determined according to the context in which they are found.</p>
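As an illustration of the first-strong definition above, here is a minimal Python sketch, assuming Python's `unicodedata.bidirectional()` as the source of each character's bidi class. It treats class L as LTR and classes R/AL as RTL, skips weak and neutral characters, and, as the UBA does when finding a paragraph's first strong character, ignores text between an isolate initiator (LRI, RLI, FSI) and its matching PDI. It does not handle paragraph separators. The function name and the `default` fallback are hypothetical, not taken from string-meta.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess a base direction from the first strongly-directional character."""
    isolate_depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("LRI", "RLI", "FSI"):
            isolate_depth += 1          # ignore content inside isolates
        elif bidi_class == "PDI":
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi_class == "L":
                return "ltr"            # e.g. Latin or Han letters
            if bidi_class in ("R", "AL"):
                return "rtl"            # e.g. Hebrew (R) or Arabic (AL)
    return default                      # no strong character found


print(first_strong_direction("Hello, world"))                # ltr
print(first_strong_direction("\u05e9\u05dc\u05d5\u05dd"))   # rtl (Hebrew)
print(first_strong_direction("123 \u0627\u0644"))          # rtl (digits are weak)
print(first_strong_direction("!?", default="auto"))         # auto
```

The last call shows why a fallback is needed: a string made only of punctuation or digits has no strong character, so the "guess" has to come from somewhere else, such as resource-wide defaults.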

<p class="note">If you are unfamiliar with bidirectional or right-to-left text, there is a basic introduction <a href="https://www.w3.org/International/articles/inline-bidi-markup/uba-basics">here</a>. This will give you a basic grasp of how the <a>Unicode Bidirectional Algorithm</a> works and the interplay between it and the base direction, which will stand you in good stead for reading this document. Additional materials can be found in the Internationalization Working Group's <a href="https://www.w3.org/International/techniques/developing-specs.en?open=text_direction#text_direction">Techniques Index</a>.</p>

<p class="definition"><dfn data-lt="natural language">Natural Language</dfn> The spoken, written, or signed communications used by human beings. [[LTLI]]</p>
Contributor

Terms such as this one, and several that follow, that do not begin a sentence should be followed by a full stop, to make things clearer for people using voice browsers.

Contributor

Some definitions, such as Syntactic Content, point to LTLI or CHARMOD-NORM. I know that that's where these definitions came from, but if we're going to keep the links, I think we should point to The Glossary instead.

Contributor Author

Okay.

Contributor Author

The first comment is a good one, and I note that we don't actually define some of the terms. I have had a go at it just now.

index.html (outdated)
@@ -348,28 +372,38 @@ <h2 id="defining_bidi_keywords">Defining Bidirectional Keywords in Specification
"language": "ar"
}</pre>

<p><strong>Example of a <a>display direction attribute</a>.</strong> If the above JSON were received by a process that was assembling a Web page for display, it might be filling in a template similar to the top line in this example. Here the <code>dir</code> attribute from [[HTML]] is an example of a <a>display direction attribute</a>.</p>
<p><strong>Example of a <a>display direction attribute</a>.</strong> If the above JSON were received by a process that was assembling a Web page for display, it might be filling in a template similar to the top line in this example to produce markup like the second line. Here the <code>dir</code> attribute from [[HTML]] is an example of a <a>display direction attribute</a>.</p>
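To make the template step described above concrete, here is a hedged Python sketch. It assumes a JSON field shaped like the example, with hypothetical `value`, `direction` and `language` keys, and a hypothetical `<h1>` template (the actual template in the document is not reproduced here). It copies the direction metadata into the HTML `dir` attribute, falling back to `auto` for missing or unrecognized values, and escapes everything it inserts.

```python
import json
from html import escape

# Hypothetical template; not the template used in string-meta.
TEMPLATE = '<h1 dir="{direction}" lang="{language}">{value}</h1>'
ALLOWED_DIRECTIONS = {"ltr", "rtl", "auto"}   # keywords HTML's dir attribute accepts


def fill_template(field: dict) -> str:
    """Render a string field, mapping its direction metadata onto dir."""
    direction = field.get("direction", "auto")
    if direction not in ALLOWED_DIRECTIONS:
        direction = "auto"   # let the browser estimate direction from content
    return TEMPLATE.format(
        direction=direction,
        language=escape(field.get("language", "")),
        value=escape(field["value"]),
    )


received = json.loads('{"title": {"value": "\\u0645\\u0631\\u062d\\u0628\\u0627", '
                      '"direction": "rtl", "language": "ar"}}')
print(fill_template(received["title"]))
# <h1 dir="rtl" lang="ar">مرحبا</h1>
```

Validating the value before copying it is a design choice of this sketch: the metadata comes from outside the page, so only keywords that `dir` understands are passed through.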
Contributor

I have a number of comments on section 2.6.

Contributor

'field direction value' refers to the value of the direction field, whereas 'display direction attribute' refers to the attribute, rather than its value. I think that better terms would be, respectively, 'direction field' and 'direction attribute'. We could then talk about the direction field value, etc. where needed.

Contributor

The definitions for the above 2 terms are hard to grasp without examples, but very easy with. I therefore suggest splitting EXAMPLE 2 so that each of the 2 examples it contains occurs immediately after the appropriate definition.

Contributor

the title structure has a value direction which represents the field direction of the value field.

???

How about: "... the title structure has a direction field, whose value represents the direction of the title." Unless you had an RDF-like @ value at the end of the string, I don't think you can say that the direction field represents the content of the value field, though you could say "which represents the value of the title's value field" (which is somewhat clunky).

Contributor

Use the field name...

I'd be inclined to write "Consider using the field name...", since I don't think this is an absolute or strong recommendation like the others, which we really do want people to observe.

Contributor

indicates a base direction of left-to-right, in exactly the same manner indicated by CSS writing modes [CSS-WRITING-MODES-4]

Remind me why we are pointing to the CSS and HTML specs here, rather than to the UBA? If they contain something different from the UBA, we should probably explain that.

Contributor Author

These are all great comments and I must admit that I'm not keen on 'field direction value' and 'display direction attribute'. See what you think of the edits.

comments.
- Changed definitions to shorter 'direction value' and 'direction
  attribute'
- Split example into two parts.
- Edits for consistency.
@aphillips
Contributor Author

It could get worse: The direction of the value of the title value is given by the direction value value. ;-)

We're defining here the two uses of direction. Would direction metadata work better than direction value?

@r12a
Contributor

r12a commented Jul 19, 2021

I think it's better than 'direction value', yes. But attributes also provide metadata – what's different is the mechanism used. I wonder whether there's another alternative: tuple? property?

@r12a
Contributor

r12a commented Jul 19, 2021

Perhaps 'key'? As in key/value pairs.

@r12a
Contributor

r12a commented Jul 19, 2021

An attribute contains a name eq value sequence (per the XML spec).

JSON has key/value pairs.

So how about balancing the 'direction attribute' term with the term 'direction key/value pair'?

direction definitions.

After discussing with @r12a on the phone, I changed 'direction value' to
'directional metadata field' and changed 'base direction' to 'starting
base direction'. I suspect another round of edits will be forthcoming
after this :-).
@aphillips
Contributor Author

So how about balancing the 'direction attribute' term with the term 'direction key/value pair' ?

I went a different direction with directional metadata field since k-v pairs are just one way of doing this. Admittedly most data fields and data structures are basically key-value pairs. I also made edits (see d1c091e) to make clear(er) that attributes can be used to transmit or store metadata also.

- Corrected other references, although 'base direction' links probably
  need more attention.
- Fixed one spacing issue.

Note well: I copied the definition from @r12a's PR against
i18n-glossary, but removed some text and made light edits that are not
consistent with my comments on his PR.
- add lint-ignore to local <dfn> that are not referenced later
@aphillips
Contributor Author

@r12a Is this ready to merge yet? I want to do other work on string-meta (to answer my action items and to help with the explainer), but am blocked by these changes. I think we discussed them in telecon and have fixed all the issues?

- Remove most definitions in favor of i18n-glossary
- Add xref to i18n-glossary
- Change all `dfn` tags to `a` tags
- Change all references to "paragraph base direction" to use "paragraph
  direction"
- Move "if you are unfamiliar" note to top of terminology section
- Remove "natural language" note
@aphillips
Contributor Author

Per I18N-ACTION-1125, merging this PR will include the results of 573cb9b. @r12a should take a look at the resulting terminology section.

@aphillips merged commit 5efd013 into w3c:gh-pages on Mar 3, 2022
3 participants