I18N review

Ivan Herman edited this page Jun 28, 2019 · 4 revisions

Intro

This is a shortened version of the i18n questionnaire. Following the Short i18n review checklist, the original list has been pruned to retain only the sections that are relevant to the JSON-LD specs. Comments have been added in italics where necessary and possible.

The glaring issue is, of course, the text direction question, which is now the subject of a separate discussion. That discussion may end up as a topic for a separate Working Group (see a draft charter), and the JSON-LD Working Group should follow what is happening there.


Language

Language basics

  1. It should be possible to associate a language with any piece of natural language text that will be read by a user. more
  2. Where possible, there should be a way to label natural language changes in inline text. more
    • Not applicable
  3. Consider whether it is useful to express the intended linguistic audience of a resource, in addition to specifying the language used for text processing. more
    • Not applicable
  4. A language declaration that indicates the text processing language for a range of text must associate a single language value with a specific range of text. more
  5. Use the HTML lang and XML xml:lang language attributes where appropriate to identify the text processing language, rather than creating a new attribute or mechanism. more
    • Not really applicable, though the user may use the rdf:HTML datatype with these attributes
  6. It should be possible to associate a metadata-type language declaration (which indicates the intended use of the resource rather than the language of a specific range of text) with multiple language values. more
    • This may translate into expressing language changes within one literal, which can be done using rdf:HTML
  7. Attributes that express the language of external resources should not use the HTML lang and XML xml:lang language attributes, but should use a different attribute when they represent metadata (which indicates the intended use of the resource rather than the language of a specific range of text). more
    • Not applicable: JSON-LD does not add any metadata to external references (i.e., URLs).

Defining language values

  1. Values for language declarations must use BCP 47. more
  2. Refer to BCP 47, not to RFC 5646. more
  3. Be specific about what level of conformance you expect for language tags. The word "valid" has special meaning in BCP 47. Generally "well-formed" is a better choice.
  4. Reference BCP47 for language tag matching.
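
As an illustration of what "language tag matching" in item 4 refers to, here is a minimal sketch of the basic filtering scheme from RFC 4647 (the matching part of BCP 47). The function name `basic_filter_match` and the sample tags are assumptions made for this example; nothing here is defined by JSON-LD, and the sketch does not check that tags are well-formed.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Return True if `tag` matches `language_range` under basic filtering.

    A range matches a tag when, compared case-insensitively, it equals the
    tag or equals a prefix of the tag that is followed by "-". The wildcard
    range "*" matches every tag.
    """
    if language_range == "*":
        return True
    rng = language_range.lower()
    candidate = tag.lower()
    return candidate == rng or candidate.startswith(rng + "-")


# Hypothetical sample tags, for illustration only.
tags = ["de", "de-CH", "de-Latn-DE", "den", "en-GB"]
matches = [t for t in tags if basic_filter_match("de", t)]
print(matches)  # ['de', 'de-CH', 'de-Latn-DE']
```

Note that "den" is not matched by the range "de": the prefix has to end at a subtag boundary.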

Declaring language at the resource level

  1. The specification should indicate how to define the default text-processing language for the resource as a whole. more
  2. Content within the resource should inherit the language of the text-processing declared at the resource level, unless it is specifically overridden.
  3. Consider whether it is necessary to have separate declarations to indicate the text-processing language versus metadata about the expected use of the resource. more
  4. If there is only one language declaration for a resource, and it has more than one language tag as a value, it must be possible to identify the default text-processing language for the resource. more

Establishing the language of a content block

  1. By default, blocks of content should inherit any text-processing language set for the resource as a whole. more
  2. It should be possible to indicate a change in language for blocks of content where the language changes. more
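
The inheritance and override behaviour described in the two lists above can be sketched as follows. This is a deliberately simplified model, not the JSON-LD expansion algorithm: the helper name `effective_language` and the sample document are assumptions for this example, and term-level language definitions in the context are ignored. It assumes that a plain string inherits the default `@language` of the context, while a value object uses its own `@language` (or none, if it omits it).

```python
from typing import Any, Optional


def effective_language(value: Any, default_language: Optional[str]) -> Optional[str]:
    """Resolve the language of a single property value (simplified)."""
    if isinstance(value, str):
        # A plain string inherits the resource-level default.
        return default_language
    if isinstance(value, dict) and "@value" in value:
        # A value object overrides the default; without "@language"
        # it is treated as a plain value with no language.
        return value.get("@language")
    return None


# Hypothetical document, for illustration only.
doc = {
    "@context": {"@language": "en", "title": "http://purl.org/dc/terms/title"},
    "title": [
        "Hello",
        {"@value": "Bonjour", "@language": "fr"},
        {"@value": "ISBN 0-00"},
    ],
}

default = doc["@context"].get("@language")
languages = [effective_language(v, default) for v in doc["title"]]
print(languages)  # ['en', 'fr', None]
```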

Establishing the language of inline runs

  1. It should be possible to indicate language for spans of inline text where the language changes. more
    • Not applicable

Text direction

JSON-LD does not currently handle text direction due to the deficiencies of RDF. See the separate discussion that MAY lead to a possible solution, albeit possibly only in a later version of JSON-LD. The items below are answered WITH THE ASSUMPTION that a separate @direction is introduced (now or later) alongside @language.

Basic requirements

  1. It must be possible to indicate base direction for each individual paragraph-level item of natural language text that will be read by someone. more
  2. It must be possible to indicate base direction changes for embedded runs of inline bidirectional text for all natural language text that will be read by someone. more
    • This is doable using rdf:HTML
  3. Annotating right-to-left text must require the minimum amount of effort for people who work natively with right-to-left scripts. more

Background information

  1. Do not assume that direction can be determined from language information. more

Base direction values

  1. Values for the default base direction should include left-to-right, right-to-left, and auto. more
    • It is unclear whether auto is meaningful in JSON-LD, but it can easily be added if necessary

Handling direction in markup

  1. The spec should indicate how to define a default base direction for the resource as a whole, ie. set the overall base direction. more
  2. The default base direction, in the absence of other information, should be LTR. more
    • This can be defined alongside @direction, just as it is for @language.
  3. The content author must be able to indicate parts of the text where the base direction changes. At the block level, this should be achieved using attributes or metadata, and should not rely on Unicode control characters.
    • This can only be done using rdf:HTML. JSON-LD does not provide information about the inside of a literal.
  4. It must be possible to also set the direction for content fragments to auto. This means that the base direction will be determined by examining the content itself.
    • Not applicable
  5. If the overall base direction is set to auto for plain text, the direction of content paragraphs should be determined on a paragraph by paragraph basis.
  6. To indicate the sides of a block of text relative to the start and end of its contained lines, you should use 'before' and 'after' (maybe block-start/block-end – the terminology is changing), rather than 'top' and 'bottom'.
    • Not applicable
  7. To indicate the start/end of a line you should use 'start' and 'end' rather than 'left' and 'right'.
    • Not applicable
  8. Provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.
    • Not applicable

Handling base direction for strings

  1. Provide metadata constructs that can be used to indicate the base direction of any natural language string. more
  2. Specify that consumers of strings should use heuristics, preferably based on the Unicode Standard first-strong algorithm, to detect the base direction of a string except where metadata is provided. more
    • Not applicable
  3. Where possible, define a field to indicate the default direction for all strings in a given resource or document. more
  4. Do NOT assume that creating a document-level default without the ability to change direction for any string is sufficient. more
  5. If metadata is not available due to legacy implementations and cannot otherwise be provided, specifications MAY allow a base direction to be interpolated from available language metadata. more
    • This is the current assumption, actually
  6. Specifications MUST NOT require the production or use of paired bidi controls. more
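
For reference, the first-strong heuristic mentioned in item 2 can be sketched as below. This is a rough approximation rather than the full rule from the Unicode Bidirectional Algorithm (it does not skip text inside isolates, for instance). The function names and the fallback to "ltr" are assumptions for this example; in particular, the sketch models what a consumer of strings might do, not anything JSON-LD itself defines.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strongly directional character.

    Returns None when the text contains no strong character
    (for example, only digits and punctuation).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def base_direction(text: str, declared: Optional[str] = None) -> str:
    """Prefer declared metadata, then the heuristic, then an assumed LTR default."""
    return declared or first_strong_direction(text) or "ltr"


print(first_strong_direction("hello"))                         # ltr
print(first_strong_direction("\u05e9\u05dc\u05d5\u05dd abc"))  # rtl (Hebrew first)
print(first_strong_direction("123 \u0645\u0631\u062d\u0628\u0627"))  # rtl (digits are not strong)
print(base_direction("123"))                                   # ltr (fallback)
```

The third call shows why the heuristic skips weak characters: leading digits do not decide the direction, the first Arabic letter does.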

Setting base direction for inline or substring text

For the whole section: JSON-LD does not, and should not, provide means to specify the "internals" of a literal or the way receiving applications handle literals. (rdf:HTML can be used for this if necessary, but that is not a matter for JSON-LD.) This makes all the points in this section not applicable.

  1. It must be possible to indicate spans of inline text where the base direction changes. If markup is available, this is the preferred method. Otherwise your specification must require that Unicode control characters are recognized by the receiving application, and correctly implemented.
    • JSON-LD cannot make assumptions about, or impose requirements on, the receiving application; in this case it "just" transfers data.
  2. It must be possible to also set the direction for a span to auto. This means that the base direction will be determined by examining the content itself. A typical approach here would be to set the direction based on the first strong directional character outside of any markup. more
  3. If users use Unicode bidirectional control characters, the isolating RLI/LRI/FSI with PDI characters must be supported by the application and recommended (rather than RLE/LRE with PDF) by the spec.
  4. Use of RLM/LRM should be appropriate, and expectations of what those controls can and cannot do should be clear in the spec. more
  5. For markup, provide dedicated attributes for control of base direction and bidirectional overrides; do not rely on the user applying style properties to arbitrary markup to achieve bidi control.
  6. For markup, allow bidi attributes on all inline elements in markup that contain text.
  7. For markup, provide attributes that allow the user to (a) create an embedded base direction or (b) override the bidirectional algorithm altogether; the attribute should allow the user to set the direction to LTR or RTL or the aforementioned Auto in either of these two scenarios.
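
Although this section is outside the scope of JSON-LD, the isolating controls named in item 3 may be easier to picture with a small sketch of what a receiving application could do when inserting a string into surrounding text. The helper name `isolate` and its `direction` parameter are assumptions for this example; only the four code points (LRI U+2066, RLI U+2067, FSI U+2068, PDI U+2069) come from Unicode.

```python
# Unicode isolate controls: initiators for each direction, plus the terminator.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap `text` in a directional isolate.

    `direction` is "ltr", "rtl", or "auto"; "auto" uses FSI, which lets the
    bidi algorithm pick the direction from the first strong character.
    """
    initiator = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return f"{initiator}{text}{PDI}"


wrapped = isolate("\u05e9\u05dc\u05d5\u05dd", "rtl")
print([f"U+{ord(c):04X}" for c in wrapped])
# ['U+2067', 'U+05E9', 'U+05DC', 'U+05D5', 'U+05DD', 'U+2069']
```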