
Tamil: 2.8 Text boundaries & selection #20

Open
r12a opened this issue Aug 10, 2018 · 1 comment
Labels: drafting, i:segmentation (Grapheme/word segmentation & selection), l:ta (Tamil language & script)

Comments

@r12a
Contributor

r12a commented Aug 10, 2018

2.8 Text boundaries & selection
https://w3c.github.io/iip/gap-analysis/taml-gap.html#boundaries

Comment from Muthu:

There are only two character sequences that form conjuncts in Tamil, ஶ்ரீ and க்ஷ, and neither is native to Tamil. No other CHC (consonant + halant + consonant) combination forms a conjunct, so it should be possible to place the cursor between the H and the second C. This was fixed in Android Oreo and iOS 12, but the problem still exists in many places, and we need to check which browsers support this behaviour and which do not.
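A minimal sketch of the cursor-stop behaviour described above, assuming a simplified model: each cluster is a base character plus any following combining marks (Unicode general category Mn or Mc), and the only C + pulli + C sequences kept together are the two conjuncts named in the comment. The function name `tamil_clusters` is hypothetical; this is not the UAX #29 algorithm or any browser's actual implementation.

```python
import unicodedata

# Hypothetical illustration only: a simplified cluster model for Tamil.
PULLI = "\u0BCD"  # TAMIL SIGN VIRAMA, called pulli in Tamil

# The only two conjunct-forming sequences named in the issue.
CONJUNCTS = (
    "\u0BB6" + PULLI + "\u0BB0\u0BC0",  # ஶ்ரீ: SHA + pulli + RA + vowel sign II
    "\u0B95" + PULLI + "\u0BB7",        # க்ஷ: KA + pulli + SSA
)


def is_mark(ch: str) -> bool:
    """True for combining marks such as Tamil vowel signs and the pulli."""
    return unicodedata.category(ch) in ("Mn", "Mc")


def tamil_clusters(text: str) -> list[str]:
    """Split text into cursor-stop units under the simplified model."""
    clusters = []
    i = 0
    while i < len(text):
        # Keep either conjunct together as a single unit.
        for conj in CONJUNCTS:
            if text.startswith(conj, i):
                j = i + len(conj)
                break
        else:
            # Any other character starts a new cluster on its own.
            j = i + 1
        # Attach following combining marks (vowel signs, pulli) to the base.
        # A pulli ends the cluster, so the next consonant starts a new one.
        while j < len(text) and is_mark(text[j]):
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters
```

With this model, `tamil_clusters("க்க")` returns `["க்", "க"]`, so a cursor stop falls between the pulli and the second க, while க்ஷ and ஶ்ரீ each stay a single unit.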

@r12a r12a added the drafting label Aug 10, 2018
@r12a
Contributor Author

r12a commented Aug 10, 2018

This issue was discussed in a meeting.

  • ACTION: Raise Tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: if you want to translate a historical text into Tamil, how will it be translated: with or without conjuncts?
muthu: in modern usage people write phonetically, and the pulli remains visible
r12a: w3c/ilreq#31 is a related issue
<scribe> ACTION: r12a to raise tamil segmentation issue in our repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed on recent platforms; you can now put the cursor between the H and the C
muthu: yes
neha: the segmentation rules for akshara @@@
https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: Tamil doesn't follow the other scripts in how it handles clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model; vivek is saying that it doesn't cover Tamil because Tamil handles clusters differently
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition uses a single encoding for all Indian scripts
<alolita> akshat: this is the IS13194 definition
akshat: there are two actual definitions today; ISCII (IS 13194) lists all conjuncts
<alolita> akshat: the other definition is from Unicode
akshat: when Unicode came along, it split the individual scripts into separate code pages, unlike ISCII
<alolita> akshat: Unicode instead allocated a separate code page to each Indian script
<alolita> akshat: in the ilreq document, the script and segmentation definitions are not clear
akshat: the ilreq doc is Unicode-specific but doesn't clarify which scripts are supported; the definition is oriented towards Devanagari-based languages, except for Santali
... but the Bengali, Malayalam and Gurmukhi requirements are not captured by ilreq
... for Tamil we don't need to add new categories to this definition
... the definition talks about CHC, but in Tamil it applies only to the two conjuncts
alolita: going back to muthu and vivek, there should be a clear definition for Tamil so that it can be used as a foundation for Unicode
... we need the differences clarified; that's a gap

@r12a r12a added the l:ta Tamil language & script label Aug 10, 2018
@r12a r12a added the i:segmentation Grapheme/word segmentation & selection label May 14, 2024