Indicating which character collection is used #84

Open
murata2makoto opened this issue Dec 9, 2021 · 0 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

murata2makoto (Collaborator) commented Dec 9, 2021

Unicode has 92,865 CJK ideographic characters, but each language uses only a small subset. Annex A of ISO/IEC 10646 lists character collections relevant to Japanese text. (Note: Annex A provides collections for other languages as well.) Each of the listed character collections contains fewer than 10,000 characters.

Assistive technologies (e.g., Japanese TTS) are unlikely to handle all 92,865 CJK ideographic characters. According to a 2015 report from a Japanese ministry, most TTS engines support only the 6,355 characters in JIS X 0208. I have not heard of significant improvements since then.

Moreover, authors of textbooks or books for children use even smaller subsets for pedagogical reasons. For example, 1,006 CJK ideographic characters are taught in Japanese compulsory education.

I thus think that accessibility metadata should be able to indicate (1) which character collection is used as a basis and (2) which characters beyond the specified collection are used as exceptions, which are sometimes necessary. I believe this would be useful for other CJK countries as well. Moreover, since no language and no TTS engine supports all Unicode characters, I expect it would benefit everybody.
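
To make points (1) and (2) concrete, here is a rough Python sketch of how such metadata might be derived from a text. Everything in it is an assumption for illustration: the `baseCollection` and `exceptions` keys are hypothetical and not part of any existing vocabulary, `TOY_COLLECTION` is a five-character stand-in rather than a real ISO/IEC 10646 Annex A collection, and the ideograph check covers only the main CJK Unified Ideographs block and Extension A.

```python
def is_cjk_ideograph(ch: str) -> bool:
    """Rough check (assumption): only the main CJK Unified Ideographs
    block (U+4E00..U+9FFF) and Extension A (U+3400..U+4DBF) are covered."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def describe_character_use(text: str, collection_name: str, collection: frozenset) -> dict:
    """Return hypothetical metadata: the base collection name plus the
    ideographs in `text` that fall outside that collection."""
    ideographs = {ch for ch in text if is_cjk_ideograph(ch)}
    exceptions = sorted(ideographs - collection)
    return {"baseCollection": collection_name, "exceptions": exceptions}


# Toy stand-in for a real character collection (hypothetical, 5 characters).
TOY_COLLECTION = frozenset("日本語山川")

metadata = describe_character_use("日本語の山と川と鬱", "toy-collection", TOY_COLLECTION)
print(metadata)
# {'baseCollection': 'toy-collection', 'exceptions': ['鬱']}
```

Kana such as の and と are ignored because they are not ideographs; only 鬱 is reported, since it is the one ideograph outside the toy collection.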

@xfq xfq added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Dec 12, 2021