Further improvements of "try it" examples
aphillips committed Aug 9, 2023
1 parent 18518f2 commit 56189c6
Showing 1 changed file with 54 additions and 1 deletion.
55 changes: 54 additions & 1 deletion questions/qa-backwards-deletion.en.html
@@ -58,6 +58,11 @@
var explainer = document.getElementById('exampleExplainer');
explainer.innerHTML = "moo";
}

function reset(what, value) {
var item = document.getElementById(what);
item.value = value;
}
</script>
</head>

@@ -196,15 +201,32 @@ <h3 id="combining_marks">Combining marks</h3>

<p><img src="./qa-backwards-deletion-data/backwards-deletion.png" alt="Hindi backwards deletion"></p>

<div style="border:1px solid black; background-color:#aaa;">
<h4>Try It</h4>
<p><input id="tryHindi" type="text" name="tryHindi" lang="hi" style="font-size:24pt;" value="&#x92f;&#x942;&#x928;&#x93f;&#x915;&#x94b;&#x921;">
<button type="button" onclick="reset('tryHindi', '&#x92f;&#x942;&#x928;&#x93f;&#x915;&#x94b;&#x921;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
</div>

<p>One reason suggested for the difference between delete and backspace behavior is that removing the base character (which is always the first character in a Unicode character sequence, and thus the first code point encountered in forward deletion) usually consumes any combining marks associated with it. That way, combining marks associated with the base character aren't left over to combine with the preceding sequence of characters or, if there are no preceding characters, to be "orphaned" and form an invalid sequence.</p>

<p>Backspace, meanwhile, can safely remove combining characters attached to a given base character without causing other characters in the character sequence to change meaning. One reason sometimes given for this behavior is that it allows characters that have been built up using multiple keypresses or other input mechanisms to be corrected without retyping the whole sequence.</p>
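
<p>The contrast between the two operations can be modeled in a few lines of JavaScript. This is only a sketch: <code>deleteForward</code> and <code>backspace</code> are hypothetical helpers, not part of this page's script, and real editors do not necessarily remove exactly one code point per backspace. It assumes a runtime that provides <code>Intl.Segmenter</code>.</p>

```javascript
// Sketch only: hypothetical helpers that model the behavior described above.
const graphemes = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

// Forward delete with the cursor at the start: remove the whole first cluster.
function deleteForward(text) {
  for (const { segment } of graphemes.segment(text)) {
    return text.slice(segment.length);
  }
  return text; // empty string: nothing to delete
}

// Backspace with the cursor at the end: remove only the last code point.
function backspace(text) {
  const codePoints = Array.from(text); // Array.from iterates by code point
  codePoints.pop();
  return codePoints.join('');
}

const yu = '\u092F\u0942'; // base letter U+092F followed by vowel sign U+0942
console.log(JSON.stringify(deleteForward(yu))); // ""
console.log(backspace(yu) === '\u092F');        // true
```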

<p>Tamil presents the same concept in a visually more striking way. The syllable <samp>&#xb95;&#xbcb;</samp> (pronounced like 'ko') looks as if it is made of three units. However, it consists of a two-character sequence (U+0B95 U+0BCB). What's more, the base character is the one visually in the middle. These characters still behave the same as those in Hindi (or other languages): cursoring, selection, and forward deletion move over the pair as a single unit. Backspacing deletes the combining mark first.</p>

<div style="border:1px solid black; background-color:#aaa;">
<h4>Try It</h4>
<p><input id="tryTamil" type="text" name="tryTamil" lang="ta" style="font-size:24pt;" value="&#xb95;&#xbcb;">
<button type="button" onclick="reset('tryTamil', '&#xb95;&#xbcb;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
</div>
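
<p>To check the stored order of the Tamil syllable, it can help to list its code points. The sketch below uses a hypothetical helper, <code>toCodePoints</code>, and again assumes <code>Intl.Segmenter</code> is available. It shows that the base character comes first in memory even though it is drawn in the middle, and that the pair forms one grapheme cluster.</p>

```javascript
// Hypothetical helper: format each code point of a string as U+XXXX.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const ko = '\u0B95\u0BCB'; // the Tamil syllable from the example above
console.log(toCodePoints(ko).join(' ')); // U+0B95 U+0BCB

// Split into grapheme clusters: the unit used for cursoring and selection.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const clusters = Array.from(segmenter.segment(ko), (s) => s.segment);
console.log(clusters.length); // 1
```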

<p>Indic scripts, such as the Devanagari and Tamil examples above, are not the only scripts affected by this. The same behavior can be found for combining marks in many languages. For example, consider the first cluster in this Thai word: <q>คืออะไร</q>.</p>

<!-- TODO [get better example; demonstrate middle cursor deletion effects in Thai] -->
<div style="border:1px solid black; background-color:#aaa;">
<h4>Try It</h4>
<p><input id="tryThai" type="text" name="tryThai" lang="th" style="font-size:24pt;" value="คืออะไร">
<button type="button" onclick="reset('tryThai', 'คืออะไร')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
</div>

<p>Some character sequences can be written in either a "composed" or a "decomposed" form, and the choice affects how selection and deletion behave. For example, Korean syllables can be written either in a precomposed form or as a sequence of conjoining letters (called <em>jamo</em>). Here's one example: </p>

@@ -229,23 +251,53 @@ <h3 id="combining_marks">Combining marks</h3>

<p>When written in the precomposed form, each Korean character remains atomic for all operations. When composed from jamo, most systems allow backspacing into the character (while treating the character as atomic for selection and forward deletion).</p>

<div style="border:1px solid black; background-color:#aaa;">
<h4>Try It</h4>
<p><label for="tryKoreanA" style="width:30%">U+AC01</label> <input id="tryKoreanA" type="text" name="tryKoreanA" lang="ko" style="font-size:24pt;" value="&#xac01;">
<button type="button" onclick="reset('tryKoreanA', '&#xac01;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
<p><label for="tryKoreanB" style="width:30%">U+1100 U+1161 U+11A8</label> <input id="tryKoreanB" type="text" name="tryKoreanB" lang="ko" style="font-size:24pt;" value="&#x1100;&#x1161;&#x11a8;">
<button type="button" onclick="reset('tryKoreanB', '&#x1100;&#x1161;&#x11a8;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
</div>
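
<p>The two Korean inputs hold canonically equivalent text, which can be confirmed with the standard <code>String.prototype.normalize</code> method. The variable names below are illustrative only. A page that wants consistent editing behavior might normalize input to one form, although whether that is appropriate depends on the application.</p>

```javascript
const precomposed = '\uAC01';      // one precomposed syllable
const jamo = '\u1100\u1161\u11A8'; // the same syllable as three conjoining jamo

// The strings differ code point by code point...
console.log(precomposed === jamo);                  // false
console.log(Array.from(jamo).length);               // 3

// ...but normalization converts one form into the other.
console.log(precomposed.normalize('NFD') === jamo); // true
console.log(jamo.normalize('NFC') === precomposed); // true
```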

<p>Korean is just one example of this. Other cases, less common in practice, also help illustrate this "character duality". While most Latin script text with accents is encoded as precomposed characters, it is possible to encode most accented letters as a base character followed by one or more combining marks. When this decomposed sequence is used, the behavior is similar to Korean: cursor movement, text selection, and forward deletion include the base character and all of its associated accents, while backspacing deletes the combining marks one at a time before the base character is reached.</p>

<p>For example, the character U+01FA Latin Capital Letter A with Ring Above and Acute can also be composed as the sequence U+0041 U+030A U+0301. Both behave like a single letter for selection and forward deletion, but backspacing reveals the decomposed structure of the latter:</p>

<p><img src="./qa-backwards-deletion-data/latin-backspace-progression.png" alt="Latin script backspace progression"></p>

<div style="border:1px solid black; background-color:#aaa;">
<h4>Try It</h4>
<p><label for="tryLatinPre" style="display:inline-block">Precomposed Latin U+01FA</label>
<input id="tryLatinPre" type="text" name="tryLatinPre" lang="en" style="font-size:24pt;" value="&#x1fa;">
<button type="button" onclick="reset('tryLatinPre', '&#x1fa;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>

<p><label for="tryLatinDecomp" style="display:inline-block">Decomposed U+0041 U+030A U+0301</label>
<input id="tryLatinDecomp" type="text" name="tryLatinDecomp" lang="en" style="font-size:24pt;" value="A&#x30a;&#x301;">
<button type="button" onclick="reset('tryLatinDecomp', 'A&#x30a;&#x301;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
</div>
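
<p>The backspace progression for the decomposed Latin letter can be approximated by removing one code point per keypress. <code>backspaceSteps</code> is a hypothetical helper written for this illustration, and actual behavior varies by platform.</p>

```javascript
// Hypothetical helper: list what remains after each code-point backspace.
function backspaceSteps(text) {
  const codePoints = Array.from(text); // iterate by code point
  const steps = [];
  while (codePoints.length > 0) {
    codePoints.pop();
    steps.push(codePoints.join(''));
  }
  return steps;
}

const decomposed = 'A\u030A\u0301'; // A + ring above + acute
const precomposedA = '\u01FA';      // the same letter as one code point

console.log(backspaceSteps(decomposed).length);            // 3
console.log(backspaceSteps(precomposedA).length);          // 1
console.log(decomposed.normalize('NFC') === precomposedA); // true
```

<p>In this model the decomposed form takes three backspaces to clear (acute, then ring, then the base letter), while the precomposed form disappears in one.</p>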

<section>
<h3>Exceptions</h3>

<p>There are also exceptions to these general rules.</p>

<p>For example, depending on your platform, emoji sequences sometimes behave as if they were atomic characters. A "family" emoji such as 👨‍👩‍👧‍👧, when it is composed as an emoji sequence (here it is U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F467), might be treated as a single unit of text for selection and for both forward and backward deletion on some platforms, while on other platforms the individual characters might be accessible to both the cursor and deletion.</p>

<p>Another counterexample appears in some Indic script languages, where some conjuncts are formed with multiple base characters. An example from the Devanagari script is the syllable <em>kshi</em> (&#x0915;&#x094d;&#x0937;&#x093f;), which is formed using the sequence U+0915 U+094D U+0937 U+093F. The characters U+0915 and U+0937 are both base characters and technically this forms two grapheme clusters. However, in many fonts (and to many users) this character sequence forms a single "shape" perceived to be a single unit of text. In spite of this perception, though, on some browsers the user can both cursor into the conjunct and forward delete only a part of the sequence.</p>

<div style="border:1px solid black; background-color:#aaa;">
<h4>Try It</h4>
<p><label for="tryExceptionA" style="display:inline-block">Family Emoji</label>
<input id="tryExceptionA" type="text" name="tryExceptionA" lang="en" style="font-size:24pt;" value="👨‍👩‍👧‍👧">
<button type="button" onclick="reset('tryExceptionA', '👨‍👩‍👧‍👧')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
<p><label for="tryExceptionB" style="display:inline-block">Hindi <em>kshi</em></label>
<input id="tryExceptionB" type="text" name="tryExceptionB" lang="hi" style="font-size:24pt;" value="&#x0915;&#x094d;&#x0937;&#x093f;">
<button type="button" onclick="reset('tryExceptionB', '&#x0915;&#x094d;&#x0937;&#x093f;')" style="font-family:monospace;font-size:9pt;">Reset</button></p>
</div>
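
<p>A rough way to see how a given runtime groups these two sequences is to count code units, code points, and grapheme clusters. <code>countClusters</code> is a hypothetical helper. The result for <em>kshi</em> is deliberately left without an expected value, since it may depend on which version of the Unicode segmentation rules the runtime implements.</p>

```javascript
// The family emoji and the Devanagari conjunct discussed above.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F467}';
const kshi = '\u0915\u094D\u0937\u093F';

// Hypothetical helper: number of grapheme clusters the runtime reports.
function countClusters(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text)).length;
}

console.log(family.length);             // 11 UTF-16 code units
console.log(Array.from(family).length); // 7 code points
console.log(countClusters(family));     // 1
console.log(countClusters(kshi));       // varies with the runtime's Unicode data
```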

</section>

<!--
<section>
<div style="width:80%; background: #dfdfdf; border: 1px solid black">
<h4>Try it yourself!</h4>
@@ -274,6 +326,7 @@ <h4>Try it yourself!</h4>
</form>
</section>
-->

</section>
<section>