Tamil: 2.8 Text boundaries & selection #20

r12a · 2018-08-10T15:22:05Z

2.8 Text boundaries & selection
https://w3c.github.io/iip/gap-analysis/taml-gap.html#boundaries

Comment from Muthu:

There are only two sequences of characters that form conjuncts in Tamil, and neither is native to Tamil: ஶ்ரீ and க்ஷ. No other CHC combinations form conjuncts, so in every other CHC sequence it should be possible to place the cursor between the H and the following C. This issue was fixed in Android Oreo and iOS 12. The problem still exists in many places, and testing is needed to identify which browsers handle this correctly and which do not.
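
The rule described above can be sketched in a few lines of Python. This is only an illustration of the expected behaviour, not how any platform or browser implements it. The names `CONJUNCTS` and `tamil_clusters` are hypothetical, and the use of Unicode general categories `Mn`/`Mc` to attach the pulli and vowel signs to the preceding consonant is a simplifying assumption.

```python
import unicodedata

# Assumption: the only two conjunct sequences, as named in the comment above.
# SHA + PULLI + RA + VOWEL SIGN II, and KA + PULLI + SSA.
CONJUNCTS = (
    "\u0BB6\u0BCD\u0BB0\u0BC0",
    "\u0B95\u0BCD\u0BB7",
)


def tamil_clusters(text):
    """Split text into units that a cursor should not be placed inside.

    A pulli followed by a consonant is treated as a boundary unless the
    sequence is one of the two conjuncts listed in CONJUNCTS.
    """
    clusters = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        # Consume a whole conjunct if one starts here, else a single character.
        for conjunct in CONJUNCTS:
            if text.startswith(conjunct, i):
                i += len(conjunct)
                break
        else:
            i += 1
        # Attach any following combining marks (pulli, vowel signs).
        while i < n and unicodedata.category(text[i]) in ("Mn", "Mc"):
            i += 1
        clusters.append(text[start:i])
    return clusters


# PA, KA+PULLI, KA, MA+PULLI: the cursor can sit between the pulli and the next KA.
print(tamil_clusters("\u0BAA\u0B95\u0BCD\u0B95\u0BAE\u0BCD"))
# PA, then KA+PULLI+SSA+VOWEL SIGN I kept together as one conjunct cluster.
print(tamil_clusters("\u0BAA\u0B95\u0BCD\u0BB7\u0BBF"))
```

Comparing the output of a sketch like this with where a given browser actually lets the cursor stop is one way to check the behaviour reported here.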

r12a · 2018-08-10T15:23:07Z

This issue was discussed in a meeting.

ACTION: Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].

View the transcript

alolita: if you want to translate a historical text into tamil how will it be translated? with or without conjuncts?
muthu: in modern languages they write phonetically and pulli remains visible
r12a: w3c/ilreq#31 is a related issue
<scribe> ACTION: r12a to raise tamil segmentation issue in our repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed in recent platforms - you can now put the cursor between
muthu: yes
neha: the segmentation rules for akshara @@@
https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: tamil doesn't fall in line with other scripts for handling of clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model - vivek is saying that doesn't cover tamil because it's a different
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition refers to one encoding for all indian scripts
<alolita> akshat: this is the IS13194 definition
akshat: there are two actual definitions today, iscii 13194 lists all conjuncts
<alolita> akshat: the other definition is from unicode
akshat: when unicode came around it broke away individual scripts into separate code pages, unlike iscii,
<alolita> akshat: unicode instead allocated different code pages for each indian language script
<alolita> akshat: in the ilreq document, the scripts and segmentation definitions are not clear
akshat: ilreq doc is unicode specific but doesn't clarify in terms of what scripts are supported - the definition is oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not captured by ilreq
... for tamil we don't need new categories to add to this definition
... definition talks about CHC but in tamil it's only applicable for the two conjuncts
alolita: going back to muthu and vivek, there should be a clear definition for tamil so that can be used as foundation for unicode
... having the clarification of differences is needed - that's a gap

r12a added the drafting label Aug 10, 2018

r12a added the l:ta Tamil language & script label Aug 10, 2018

r12a added the i:segmentation Grapheme/word segmentation & selection label May 14, 2024
