Skip to content

Commit

Permalink
annotation language, added a possible scenario for language labelling…
Browse files Browse the repository at this point in the history
… of the target
  • Loading branch information
r12a committed May 20, 2016
1 parent 407a9d9 commit d6499c7
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions notes/annotation-language-use-cases.html
Expand Up @@ -28,15 +28,15 @@ <h2>Definitions</h2>
<p>The question is, how will the recorded language information be used - do we need separate properties for language metadata and text-processing or not?</p>

<h2>Language(s) of the target</h2>
<p>When applied to a target, language information is almost certainly metadata, since it isn't likely that the target will be operated on or changed in a language sensitive fashion. Section 3.2.1 has a use case where Beatrice's target is labelled as 'en', whereas her body is 'fr'. I'm not sure what the value of recording the language of the target is in this case. I must admit that i'm not entirely sure how the language information about the target will be useful later, unless it has something to do with language negotiation. <span class="ed">AAAAA</span></p>
<p>When applied to a target, language information is almost certainly metadata, since it isn't likely that the target will be operated on or changed in a language sensitive fashion. Section 3.2.1 has a use case where Beatrice's target is labelled as 'en', whereas her body is 'fr'. I'm not sure what the value of recording the language of the target is in this case. However, if she is annotating a document that is available in more than one language, and the same content is available in both (ie. it's a translation), then she might point to more than one target (with appropriate differences in the selector). In this case, someone going from the annotation to the target may be able to choose which target to refer to(?). <span class="ed">AAAAA</span></p>
<h2>Language(s) of the body</h2>
<h3>Use case 1</h3>
<p>Kensuke is reading an old Tibetan manuscript from the Dunhuang collection. The tool he is using to read the manuscript has access to annotations created by scholars working in the various languages of the International Dunhuang Project, who are commenting on the text. The section of the manuscript he is currently looking at has commentaries by people writing in Chinese, Japanese and Russian. Each of these commentaries is stored in a separate body within the same annotation. Each commentary is mainly written in the language of the scholar, but may contain excerpts from the manuscript and other sources written in Tibetan as well quoted text in Chinese and English.</p>
<p>Kensuke speaks Japanese, so he wants to be presented with the Japanese commentary.</p>
<p>The body containing the Japanese commentary has a <code class="kw" translate="no">language</code> property set to <code class="kw" translate="no">ja</code> (Japanese). The tool he is using knows that he wants to read Japanese commentaries, and it uses this information to select and present to him the text contained in that body. This is language information being used as metadata – it indicates to the application doing the retrieval that the intended consumer of the information wants Japanese. </p>
<p>The Japanese commentary for this particular annotation starts with a sentence in Japanese, but later contains some excerpts from Chinese and Tibetan sources. It's possible for the value of the <code class="kw" translate="no">language</code> property, when used as metadata, to contain three language tags, <code class="kw" translate="no">ja,zh,bo</code> (japanese, chinese, and tibetan, respectively), but i'm not sure how useful that is in this particular use case. <span class="ed">BBBBB</span></p>
<p>Having identified the relevant annotation text to present to Kensuke, his application has to then display it so that he can read it. It's important to apply the correct font to the text. Ideographic characters such as 雪, 父 and 垔 have slight but important differences in Japanese vs Chinese fonts, and it's important not to apply a Chinese font to the Japanese text that Kensuke is reading, but is also important to use a Chinese font for the Chinese quoted text. There are also language-specific differences in the way text is wrapped at the end of a line. For these reasons we need to identify the actual language of the text to which the font or the wrapping algorithm will be applied. </p>
<p>If the <code class="kw" translate="no">language</code> property value contains only <code class="kw" translate="no">ja</code>, that's a good indicator that the application should use the Japanese font unless instructed otherwise. If, however, the <code class="kw" translate="no">language</code> property has the value <code class="kw" translate="no">ja,zh,bo</code>, it's not clear what the default font should be. In that case, we need an alternative way to indicate that the first sentence in the text presented to Kensuke is actually in Japanese.</p>
<p>Having identified the relevant annotation text to present to Kensuke, his application has to then display it so that he can read it. It's important to apply the correct font to the text. Ideographic characters such as 雪, 父 and 垔 have slight but important differences in Japanese vs Chinese fonts, and it's important not to apply a Chinese font to the Japanese text that Kensuke is reading, but is also important to use a Chinese font for the Chinese quoted text. There are also language-specific differences in the way text is wrapped at the end of a line. For these reasons we need to identify the actual language of the text to which the font or the wrapping algorithm will be applied. Also, a voice browser will need to know whether to use Japanese or Chinese pronunciations for the ideographic characters contained in the annotation body text.</p>
<p>If the <code class="kw" translate="no">language</code> property value contains only <code class="kw" translate="no">ja</code>, that's a good indicator that the application should expect the fiinstructed otherwise. If, however, the <code class="kw" translate="no">language</code> property has the value <code class="kw" translate="no">ja,zh,bo</code>, it's not clear what the default font should be. In that case, we need an alternative way to indicate that the first sentence in the text presented to Kensuke is actually in Japanese.</p>
<p>We also need a way to indicate the change of language to Chinese and Tibetan later in the commentary for this annotation, so that appropriate fonts and wrapping algorithms can be applied there.</p>
<p>&nbsp;</p>
</body>
Expand Down

0 comments on commit d6499c7

Please sign in to comment.