Add more information about chinese character sets & combining marks #627

Merged (2 commits) on Jul 12, 2024

r12a commented Jul 11, 2024

Addresses comments in #619

netlify bot commented Jul 11, 2024

Deploy Preview for clreq ready!

Name Link
🔨 Latest commit 276afff
🔍 Latest deploy log https://app.netlify.com/sites/clreq/deploys/6690c1e499cba20008c304ea
😎 Deploy Preview https://deploy-preview-627--clreq.netlify.app

r12a requested a review from xfq on July 11, 2024 15:36

r12a commented Jul 11, 2024

@xfq I'd appreciate a quick turnaround on this review, if possible, so that I can prep the doc for publication. Thanks.

@@ -174,11 +174,11 @@ <h2>Chinese Script Overview</h2>

 <p>Words are not separated by spaces or any other character. There is no case distinction. The visual forms of characters don't interact.</p>

-<p>In its 'main' category, CLDR lists 2,210 characters for the Simplified Chinese orthography, and 2,180 for Traditional Chinese. Combined, this includes 3,026 unique characters, and an overlap of 1,064 characters. A working set of characters for modern Chinese may include 3 times this number, and the Unicode Standard includes approaching 100,000 Han characters, many of which are archaic or esoteric.</p>
+<p>In its 'main' category, CLDR lists 2,210 characters for the Simplified Chinese orthography, and 2,180 for Traditional Chinese. Combined, this includes 3,026 unique characters, and an overlap of 1,064 characters. A working set of characters for modern Chinese may include 3 times this number, and number of characters in the Unicode Standard approaches 100,000 Han code points, many of which are archaic or esoteric. In fact, various regions define their own character sets, such as the 3,500 characters in the Tier I Table of <span lang="zh">通用规范汉字表</span> (General Standard Chinese Characters) in Mainland China, the 4,808 characters in the Taiwanese <span lang="zh">常用“国字”标准字体表</span> (Chart of Standard Forms of Common National Characters), the 4,759 characters in <span lang="zh">常用字字形表</span> (Common Chinese Characters) in Hong Kong SAR, or the sets of <span lang="zh">欢乐伙伴</span> (&quot;Happy Buddy&quot;) characters for Singaporean primary schools.</p>
xfq (Member) commented:

The task force prefers to remove the CLDR numbers unless they have reliable sources.

See discussions in https://www.w3.org/2024/05/08-clreq-minutes.html#t01

r12a (Contributor Author) commented:

OK. I commented that text out because I wanted the information to remain available to me for future discussions regarding CLDR. The character sets now mentioned are all fairly small, relatively speaking, so I added the figure of 10,000 to give an impression of the size of the repertoire needed for things such as text editors. (I know that the Mainland China basic repertoire is substantially smaller than the Taiwanese one, and this is only an indicator.)

resources/index.html (outdated, resolved)
xfq (Member) left a comment:

As a first version, I think it is OK, but we need to continue discussing some details with the task force (after the publication).

r12a merged commit 17d5e8d into gh-pages on Jul 12, 2024
4 checks passed
r12a deleted the r12a-patch-1 branch on July 12, 2024 06:52