Requirements of Alphabetic Counter Styles for Indian languages #36

Open
vermaprashant1 opened this issue Mar 24, 2021 · 27 comments

@vermaprashant1

The TDIL (Technology Development for Indian Languages) programme of the Ministry of Electronics and Information Technology, Govt. of India, runs a standardization activity for Indian languages.
Many counter styles for Indian languages are defined in https://www.w3.org/TR/predefined-counter-styles/, but they are numeric counters only. Alphabetic counter styles for Indian languages are currently missing. For seamless access, alphabetic counter styles also need to be defined for all Indian languages so that users can apply them on the web. This can be achieved using the CLDR (Common Locale Data Repository) of the Unicode Consortium, where approved character lists are defined for all Indian languages. TDIL also participates in this: it collected CLDR data from language experts and submitted it to Unicode, where it goes through the review process and is validated by the members and technical committee of the Unicode Consortium.
Code snippets for alphabetic counters can be defined in https://www.w3.org/TR/predefined-counter-styles/ based on the characters defined in CLDR. The Hindi CLDR data is available at https://st.unicode.org/cldr-apps/v#/hi/Alphabetic_Information/ for reference. CLDR data for the other Indian languages is available at the same link.

@r12a
Contributor

r12a commented Mar 24, 2021

@vermaprashant1 Thank you for these pointers. I'd be happy to add Indian alphabetic styles, but we need some additional information first, which is not available from CLDR (as far as i know). Are you able to supply the following for each language?

  1. When you refer us to the exemplar characters in CLDR, i assume that you mean those listed under Others: index. (The list under Main letters is usually not appropriate.) These are described as " “shortcut” letters for quickly jumping to sections of a sorted, indexed list (for an example, see mu.edu)." It's not necessarily the case that such a list of characters is also appropriate for a list using counter styles. Can you confirm suitability for each language?
  2. What are the appropriate suffixes/prefixes for lists? (eg. in the list item you are reading here right now, there is a suffix using U+002E FULL STOP plus U+0020 SPACE.) There should at least be a default specified, but there may be more than one style of suffix/prefix usage for a given counter style – i'm thinking of adapting the document to show those as alternatives, going forward.
  3. What happens when the list of items to which the counter style is applied is longer than the list of alphabetic characters? It's important to know whether this is a fixed list, one that starts again once it reaches the end, or one that adds characters going forward, such as aa., ab., ac., etc.
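
The three behaviours asked about in item 3 correspond to three values of the `system` descriptor in CSS Counter Styles. The following is a minimal sketch only: the style names and the `continuing` class are hypothetical, and the Latin letters are stand-ins for whatever symbol set a given language needs.

```css
/* Illustrative only: names are hypothetical, symbols are placeholders. */

/* Fixed: each symbol is used once; items beyond the last symbol
   are rendered with the fallback style (decimal by default). */
@counter-style sketch-fixed {
  system: fixed;
  symbols: a b c;
  suffix: ". ";
}

/* Cyclic: starts again from the first symbol once the end is reached
   (a, b, c, a, b, c, ...). */
@counter-style sketch-cyclic {
  system: cyclic;
  symbols: a b c;
  suffix: ". ";
}

/* Alphabetic: keeps going by adding characters
   (a, b, c, aa, ab, ac, ba, ...). */
@counter-style sketch-alphabetic {
  system: alphabetic;
  symbols: a b c;
  suffix: ". ";
}

ol.continuing { list-style-type: sketch-alphabetic; }
```

Which of the three is right for a given language is exactly the information requested above; the sketch does not answer that question.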

Ideally, we'd also like to have evidence of use in the form of scans or at least pointers to online content, showing the counter style in use. Can you help with that? (We may want to add any pictures to the Type Samples repo at https://w3c.github.io/type-samples/)

If you can provide the above information/confirmation i'd be glad to add these things, as i mentioned. However, it would really help a lot if you yourself could either raise a PR, or add the proposed content to this thread.

@r12a
Contributor

r12a commented Mar 24, 2021

By the way, there are also some aspects of the CLDR index lists that are probably not applicable for counter style listings. For example, if you go to Marathi you get this list:

‍ ॐ ं ः अ आ इ ई उ ऊ ऋ ऌ ए ऐ ऑ ओ औ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल व श ष स ह ळ ऽ ॅ ्

which includes the following characters which i doubt are suitable for a counter-style:

ॐ U+0950: DEVANAGARI OM
ं U+0902: DEVANAGARI SIGN ANUSVARA
ः U+0903: DEVANAGARI SIGN VISARGA
ऽ U+093D: DEVANAGARI SIGN AVAGRAHA
ॅ U+0945: DEVANAGARI VOWEL SIGN CANDRA E
् U+094D: DEVANAGARI SIGN VIRAMA

These are fairly noticeable, but there may be less obvious things to consider (such as whether ksha should be in the list). So it's important to check each of the index lists against actual usage to derive the definitive list for the custom counter styles entry.
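
Purely to illustrate the filtering step described above, here is roughly what a rule could look like if only the six characters listed are dropped from the Marathi index list. The style name is hypothetical, the suffix is a placeholder, and the remaining set has not been checked against actual usage, so this should not be read as a proposed definition.

```css
/* Sketch only: the CLDR index list for Marathi minus the six
   characters called out above. Not validated against real usage;
   further letters may need to be removed or added. */
@counter-style sketch-marathi-index {
  system: alphabetic;   /* assumption: continues by doubling */
  symbols: अ आ इ ई उ ऊ ऋ ऌ ए ऐ ऑ ओ औ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल व श ष स ह ळ;
  suffix: ". ";         /* placeholder suffix */
}
```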

@vermaprashant1
Author

Thanks Richard. We are glad to participate in this activity.

  1. As you suggested, it is important to collate from CLDR the list of characters that are appropriate for defining counter styles for Indian languages.
  2. Regarding suffixes/prefixes, the default suffix can be a full stop followed by a space. But we also need to collect use cases for other suffixes/prefixes, such as क), (क) etc. Currently, however, different suffixes are not supported in web technologies such as HTML, although they can be used in other applications such as MS Word.
  3. The list for Indian languages is not fixed. For longer lists, it is preferable to continue in the same manner as for Latin-script languages such as English:
    क ख ग ....... ह कक कख कग.......
    It is also preferable not to include characters like Ksha, Gya, Shra etc.
    Currently, alphabetic counter styles for Indian languages are not frequently used online by the industry, because they are not yet implemented in web technologies such as HTML and CSS. Once implemented, they might be used by web/digital developers for localization purposes. Alphabetic lists are, however, used in printed magazines and textbooks, in PDF documents, etc.
    To achieve the above, we are trying to collect use cases of alphabetic counter styles for different Indian languages for submission. Once we have gathered the related information, we will share it through the W3C repository.
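
The suffix/prefix variants and the continuation behaviour described in items 2 and 3 could be expressed roughly as follows. This is a sketch under stated assumptions: the style names are hypothetical, and the three letters are only a stand-in for the full list, which is not given in this thread at this point.

```css
/* Hypothetical names; three letters stand in for the full list. */
@counter-style sketch-deva-alpha {
  system: alphabetic;   /* after the last symbol: कक कख कग ... */
  symbols: क ख ग;       /* placeholder subset, not the real list */
  suffix: ". ";         /* default: full stop plus space */
}

/* Same symbols, rendered as क) */
@counter-style sketch-deva-alpha-paren {
  system: extends sketch-deva-alpha;
  suffix: ") ";
}

/* Same symbols, rendered as (क) */
@counter-style sketch-deva-alpha-parens {
  system: extends sketch-deva-alpha;
  prefix: "(";
  suffix: ") ";
}
```

Using `system: extends` keeps the letter list in one place, so each suffix/prefix variant only states what differs.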

@r12a
Contributor

r12a commented Mar 29, 2021

Great! I'll look forward to hearing from you then. Thanks.

Btw, i'm hoping that Blink will support customisable counter-styles very soon. Perhaps not the next release, but the one after. That will mean that these styles can be used for both Gecko and Blink browsers, which will be great progress. They will of course fall back to the defaults for users of WebKit browsers, but perhaps once the others are supporting this then WebKit will speed up its support too.

@vermaprashant1
Author

vermaprashant1 commented Apr 7, 2021

@richard We have collected some scans of counter styles used in different Indian languages/scripts (attached), along with some online references. Some of the online references are given below:
Kannada
Malayalam
Tamil
Assamese
Nepali
Manipuri
How can these help in defining predefined alphabetic counter styles, as discussed in the section?
Is there currently any mechanism to define an alphabetic Hindi counter style without using symbols, as we can for the Devanagari numeric counter style?
Please guide us on further actions.

[Attached scans, one per language: Assamese, Bengali, Gujarati, Hindi, Kannada, Maithili, Malayalam, Manipuri, Nepali, Tamil]

@r12a
Contributor

r12a commented Apr 7, 2021

@vermaprashant1 thank you for the scans. Could you label them? For example, i can't tell which is Maithili and which is Assamese. (You should be able to edit the comment.)

I still need you to clarify, for each of the styles:

  1. what is the definitive and complete set of symbols to use for each style (eg. what is the full set of letters to use for Manipuri, in order)
  2. what prefix and/or suffix do you want to specify (just the ones in the scans?)

I assume that you expect all of these styles to be 'alphabetic', ie. there is no end of range limitation specified, and after all the symbols are used we start doubling them (ie. for English that would be a, b, c, ... z, aa, ab...)

Is there currently any mechanism to define an alphabetic Hindi counter style without using symbols, as we can for the Devanagari numeric counter style?

I'm not clear what you are asking here. Are you asking whether it's possible to define the style without a particular prefix/suffix? Or are you perhaps asking whether it's possible to have no prefix/suffix?

@vermaprashant1
Author

@vermaprashant1 thank you for the scans. Could you label them? For example, i can't tell which is Manipuri and which is Assamese. (You should be able to edit the comment.)

I have labelled all the attachments. Kindly let me know if anything else is needed.

I still need you to clarify, for each of the styles:

  1. what is the definitive and complete set of symbols to use for each style (eg. what is the full set of letters to use for Manipuri, in order)

For this, we are communicating with the language experts to obtain the complete list preferred by the community.

  1. what prefix and/or suffix do you want to specify (just the ones in the scans?)

In most of the scans we find only one type of prefix/suffix, namely ().

I assume that you expect all of these styles to be 'alphabetic', ie. there is no end of range limitation specified, and after all the symbols are used we start doubling them (ie. for English that would be a, b, c, ... z, aa, ab...)

We are also discussing this matter with the experts.

Is there currently any mechanism to define an alphabetic Hindi counter style without using symbols, as we can for the Devanagari numeric counter style?

I'm not clear what you are asking here. Are you asking whether it's possible to define the style without a particular prefix/suffix? Or are you perhaps asking whether it's possible to have no prefix/suffix?

I want to ask: if we give the keyword 'devanagari' with system: numeric, then it will automatically generate a numbered list in Devanagari. But to define a Hindi alphabetic list we have to add the 'symbol' keyword containing the character codes we want, as defined in the ready-made counter styles.

Also, some languages have more than one script; for example, Sindhi and Kashmiri are written in both Devanagari and Perso-Arabic scripts. How can we define counter styles for both scripts of such a language in the same code?

@r12a
Contributor

r12a commented Apr 13, 2021

I want to ask: if we give the keyword 'devanagari' with system: numeric, then it will automatically generate a numbered list in Devanagari. But to define a Hindi alphabetic list we have to add the 'symbol' keyword containing the character codes we want, as defined in the ready-made counter styles.

Are you referring to the difference between some styles that are supported by the browser already without the user needing to create any CSS code, such as the devanagari style? When the spec author was writing the Counter Styles spec he added those that were supported already by more than one browser to the spec, but it was rather arbitrary as to what was baked in and what wasn't. (Personally, i didn't think that was a good idea, since it creates a kind of odd situation where you may need to check whether or not you need the CSS code.) But i think that going forward there are no plans to add more styles to the Counter Styles spec – authors are expected to create them using CSS code. I also expect several of the styles currently described by the CS spec to be overwritten by authors because they want to use a different prefix/suffix, or use a different set or order of symbols (eg. for Greek).
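
To make the distinction concrete, here is a small sketch of a built-in style used directly, next to an author-defined style that reuses it with a different prefix/suffix. The class names and the name `devanagari-parens` are hypothetical.

```css
/* Built in: no @counter-style rule is needed for this one. */
ol.builtin { list-style-type: devanagari; }

/* Author-defined variant: reuses the built-in Devanagari digits
   but changes the prefix and suffix. The name is hypothetical. */
@counter-style devanagari-parens {
  system: extends devanagari;
  prefix: "(";
  suffix: ") ";
}

ol.custom { list-style-type: devanagari-parens; }
```

An alphabetic style has no built-in equivalent to extend, which is why its `symbols` have to be spelled out in the author's own rule.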

Also, some languages have more than one script; for example, Sindhi and Kashmiri are written in both Devanagari and Perso-Arabic scripts. How can we define counter styles for both scripts of such a language in the same code?

In your CSS style sheet you can define styles for each style of list you want. Different styles may be developed for different scripts, to support different prefix/suffix, to use a different order for certain types of counter in your doc, etc. So you define each style in the stylesheet and give it a name you like (it doesn't have to have the name in the ready-made counter styles doc). Then, when you define the styling for a particular list, you say list-style-type: myCounterName to apply the style you want to that list (or chapter headings, or figure numbering, etc.)
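
As a rough sketch of that approach for a language written in two scripts: all style names below are hypothetical, the language tags in the selectors are an assumption about how the page is marked up, and the symbol lists are short placeholders rather than the real letter sets for any language.

```css
/* Hypothetical names; symbol lists are placeholders only. */
@counter-style sketch-list-deva {
  system: alphabetic;
  symbols: क ख ग;
  suffix: ") ";
}

@counter-style sketch-list-arab {
  system: alphabetic;
  symbols: ا ب پ;
  suffix: ") ";
}

/* Pick a style per list, here keyed on the list's language tag. */
ol:lang(sd-Deva) { list-style-type: sketch-list-deva; }
ol:lang(sd-Arab) { list-style-type: sketch-list-arab; }

/* The same style can number other things, e.g. headings.
   counter() does not add the prefix/suffix, so spacing is explicit. */
body { counter-reset: chapter; }
h2 { counter-increment: chapter; }
h2::before { content: counter(chapter, sketch-list-deva) " "; }
```

Both rules live in the same stylesheet; which one applies is decided per list by the selector, not by the counter style itself.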

Am i getting closer to answering your questions?

@vermaprashant1
Author

Yes, Thanks

@r12a
Contributor

r12a commented Aug 11, 2021

hi @vermaprashant1 Did you get any further with your investigations into these counter styles?

Note that Blink (Chrome) now also supports user-defined styles, and i'm hoping that WebKit (Safari, iOS) will move towards supporting it too. So it makes a lot of sense to add these styles now to https://www.w3.org/TR/predefined-counter-styles/

I look forward to hearing from you.

@vermaprashant1
Author

Hi Richard,

We are collecting more counter style resources in addition to the styles shared with you earlier. We are also in the process of collecting, from language experts, the character sets that define the counter styles, based on the use cases. But this requires more consultation with other experts and with Indian State Governments.
Also, I want to know whether we can propose more than one counter style for a particular language (if that community uses different counter styles).

Thanks.

@xfq
Member

xfq commented Aug 17, 2021

Also, I want to know whether we can propose more than one counter style for a particular language (if that community uses different counter styles).

This is possible. If you look at https://w3c.github.io/predefined-counter-styles/ , you will find that many languages/scripts contain different counter styles.

@r12a
Contributor

r12a commented Sep 6, 2021

hello @vermaprashant1 here is the information i was able to gather and information i still need for each of the above:

Santali

https://sat.wikipedia.org/wiki/%E1%B1%9C%E1%B1%A9%E1%B1%9C%E1%B1%9A%E1%B1%9E_%E1%B1%9B%E1%B1%9A%E1%B1%A8%E1%B1%A1%E1%B1%9A%E1%B1%A2%E1%B1%9F

This list was long enough for me to establish the full components:

system: alphabetic;
symbols: ᱚ ᱛ ᱜ ᱝ ᱞ ᱟ ᱠ ᱡ ᱢ ᱣ ᱤ ᱥ ᱦ ᱧ ᱨ ᱩ ᱪ ᱫ ᱬ ᱭ ᱮ ᱯ ᱰ ᱱ ᱲ ᱳ ᱴ ᱵ ᱶ ᱷ;
suffix: ')';
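
For reference, the three descriptors above would sit inside an `@counter-style` rule along these lines. This is only a sketch: the style name and the `:lang(sat)` selector are hypothetical, while the descriptors are copied unchanged from the lines above.

```css
/* The name santali-alpha-sketch is hypothetical; the descriptors
   are the ones derived from the Wikipedia list above. */
@counter-style santali-alpha-sketch {
  system: alphabetic;
  symbols: ᱚ ᱛ ᱜ ᱝ ᱞ ᱟ ᱠ ᱡ ᱢ ᱣ ᱤ ᱥ ᱦ ᱧ ᱨ ᱩ ᱪ ᱫ ᱬ ᱭ ᱮ ᱯ ᱰ ᱱ ᱲ ᱳ ᱴ ᱵ ᱶ ᱷ;
  suffix: ')';
}

/* Assumes the list is inside content tagged as Santali. */
ol:lang(sat) { list-style-type: santali-alpha-sketch; }
```

Because the system is alphabetic, the item after ᱷ would be rendered as ᱚᱚ, then ᱚᱛ, and so on.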

Bengali

https://allresultnet.com/ssc-bangla-mcq-solution/

The green text is a numeric style with a . separator. We already have this style documented at https://w3c.github.io/predefined-counter-styles/#bengali-styles Am i missing something?

https://www.youtube.com/watch?v=3oFEeMkOJBE&ab_channel=AmlaN

Not sure what you wanted me to get out of this video. Unfortunately, I don't have time to work that out by watching. Please summarise the key points.

https://jobsandhan.com/mcq-questions-answers/bengali-language/

This appears to be an alphabetic list with a . suffix. Only 4 items are listed, however. To create a rule we need to know what letters are in the full list. For example, the list begins with consonants, rather than independent vowel letters. Do the latter appear at all? What consonants appear in the full list - are some letters excluded (eg. what about khanda ta)? Is the order the same as that in the Unicode block? etc.

https://www.gksolves.com/2020/10/bengali-literature-Questions-And-Answers.html

Alphabetic list with ) suffix. Same problem: need to know what's in the full list.

Odia

https://www.studiestoday.com/sample-papers-languages-cbse-class-9-sample-paper-odia-language-203890.html

Alphabetic list with prefix ( and suffix ). Same problem.

https://unacademy.com/lesson/odia-grammar-expected-questions-part-1-in-odia/5W824JLO

Video. Same problem.

https://www.studiestoday.com/sample-papers-languages-cbse-class-11-oriya-sample-paper-set-219619.html

Same as first sample for Odia(?)

@vermaprashant1
Author

hello @vermaprashant1 here is the information i was able to gather and information i still need for each of the above:

Santali

https://sat.wikipedia.org/wiki/%E1%B1%9C%E1%B1%A9%E1%B1%9C%E1%B1%9A%E1%B1%9E_%E1%B1%9B%E1%B1%9A%E1%B1%A8%E1%B1%A1%E1%B1%9A%E1%B1%A2%E1%B1%9F

This list was long enough for me to establish the full components:

system: alphabetic;
symbols: ᱚ ᱛ ᱜ ᱝ ᱞ ᱟ ᱠ ᱡ ᱢ ᱣ ᱤ ᱥ ᱦ ᱧ ᱨ ᱩ ᱪ ᱫ ᱬ ᱭ ᱮ ᱯ ᱰ ᱱ ᱲ ᱳ ᱴ ᱵ ᱶ ᱷ;
suffix: ')';

As per the Wikipedia resource, the code you mentioned above contains the full set of components. The resource shows the list continuing after the ᱷ character, working like aa ab ac .......

Bengali

https://allresultnet.com/ssc-bangla-mcq-solution/

The green text is a numeric style with a . separator. We already have this style documented at https://w3c.github.io/predefined-counter-styles/#bengali-styles Am i missing something?

https://www.youtube.com/watch?v=3oFEeMkOJBE&ab_channel=AmlaN

Not sure what you wanted me to get out of this video. Unfortunately, I don't have time to work that out by watching. Please summarise the key points.

https://jobsandhan.com/mcq-questions-answers/bengali-language/

This appears to be an alphabetic list with a . suffix. Only 4 items are listed, however. To create a rule we need to know what letters are in the full list. For example, the list begins with consonants, rather than independent vowel letters. Do the latter appear at all? What consonants appear in the full list - are some letters excluded (eg. what about khanda ta)? Is the order the same as that in the Unicode block? etc.

https://www.gksolves.com/2020/10/bengali-literature-Questions-And-Answers.html

Alphabetic list with ) suffix. Same problem: need to know what's in the full list.

It is difficult to locate an online resource that covers the full list. However, we are consulting Bengali experts to resolve this.

Odia

https://www.studiestoday.com/sample-papers-languages-cbse-class-9-sample-paper-odia-language-203890.html

Alphabetic list with prefix ( and suffix ). Same problem.

https://unacademy.com/lesson/odia-grammar-expected-questions-part-1-in-odia/5W824JLO

Video. Same problem.

https://www.studiestoday.com/sample-papers-languages-cbse-class-11-oriya-sample-paper-set-219619.html

Same as first sample for Odia(?)

No need to go through the videos; we will summarize them. We will share the complete list with you after consultation with the experts committee.
Thanks

@vermaprashant1
Author

@r12a Under its Web Standardization activity, Technology Development for Indian Languages (TDIL) of the Ministry of Electronics and Information Technology has collected the full character sets for alphabetic counter styles in Indian languages from various experts, based on their wider usage in publishing. Please let me know the next steps needed for submission.

@r12a
Contributor

r12a commented Apr 5, 2022

@vermaprashant1 do you have a link to this information?

@vermaprashant1
Author

@vermaprashant1 do you have a link to this information?

No, I have a document for the same.

@r12a
Contributor

r12a commented Apr 6, 2022

Can we see it? Normally we would look at your information and reproduce what's needed in https://www.w3.org/TR/2021/NOTE-predefined-counter-styles-20210609/, unless you see a problem with that.

@vermaprashant1
Author

vermaprashant1 commented May 11, 2022

@r12a ,
Please find the complete sets of characters for alphabetic listing, received from experts for Indian languages, in Alphabetic listing for Indian Languages-final.pdf. Please also find Use-cases web-samples.pdf for the same.

@r12a
Contributor

r12a commented May 18, 2022

@vermaprashant1 Just so you are aware, i have begun writing this information up in the WG Note. Thanks for providing it. I'll drop a note here when it's ready, so you can check it.

@vermaprashant1
Author

@r12a Thanks..

@r12a
Contributor

r12a commented Jun 23, 2022

@vermaprashant1 please check the pull request at #46

@r12a
Contributor

r12a commented Jun 23, 2022

@vermaprashant1 I didn't add a style for Nepali because there is no list of characters.

From the 2nd item in the commentary provided i'm led to think that we may need two separate lists, in fact, one for consonants and the other for vowels? Perhaps that's in addition to one full list?

@vermaprashant1
Author

@r12a Ok
I will get back to you with the preferred listing used for the Nepali language, after consulting other experts on whether the two listings should be treated as separate lists or not.

@r12a
Contributor

r12a commented Jun 23, 2022

Btw, i'd be curious to know whether they also would be able to provide a set of rules for Newar, using the Newa script.

@vermaprashant1
Author

I will check. However, we are associated only with experts of the Devanagari script.
