Which language code for Hanja? #23

Open
fantasai opened this issue Oct 4, 2018 · 9 comments

fantasai commented Oct 4, 2018

For texts written in Hanja, which is the preferred lang tag, ko-Hant or ko-Hani?

asmusf commented Oct 5, 2018

Good question. Hani is neutral on the simplified/traditional axis. Since that axis doesn't apply to Hanja, Hani might be preferable (assuming the text truly is Hanja only).

Member

behnam commented Oct 5, 2018

ISO 15924 does define:

  • Kore/287 as "Korean (alias for Hangul + Han)",
  • Hani/500 as "Han (Hanzi, Kanji, Hanja)",
  • Hans/501 as "Han (Simplified variant)", and
  • Hant/502 as "Han (Traditional variant)".

From these, it looks like the Han part of the Korean writing system is supposed to be tagged with Hani/500.

I don't have data on what the common practices are, though. Interested to see if there's anything significant.
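
The four codes listed above can be put in a small lookup table. This is only a sketch: `ISO_15924`, `describe`, and `codes_mentioning` are names made up for illustration, and the table holds only the entries quoted in this comment, not the full standard.

```python
# Entries copied from the list above; this is NOT the full ISO 15924 table.
ISO_15924 = {
    "Kore": (287, "Korean (alias for Hangul + Han)"),
    "Hani": (500, "Han (Hanzi, Kanji, Hanja)"),
    "Hans": (501, "Han (Simplified variant)"),
    "Hant": (502, "Han (Traditional variant)"),
}


def describe(code):
    """Return 'Code/number: name' for a script code, ignoring case."""
    key = code.strip().title()  # script codes are conventionally title case
    number, name = ISO_15924[key]
    return f"{key}/{number}: {name}"


def codes_mentioning(word):
    """List the codes whose name contains the given word (case-insensitive)."""
    needle = word.lower()
    return [code for code, (_, name) in ISO_15924.items() if needle in name.lower()]


print(describe("hani"))
print(codes_mentioning("Hanja"))
```

Among these four entries, only the name of Hani mentions Hanja, which is the basis for the reading above.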

frivoal commented Oct 5, 2018

Since Korean written in Hanja has only ever been written in traditional ideographs, wouldn't using -Hant have a better chance of matching the right (traditional) glyphs in a font that has several variants? Or will fonts pick the correct variant based on ko- regardless of -Hant vs -Hani?
Similarly, could one use ko-Hans intentionally to get simplified glyphs for Han-unified characters, for example in a transcription of an old Korean text made for the sake of a modern mainland China reader?

asmusf commented Oct 5, 2018 via email

frivoal commented Oct 5, 2018

> Your comment assumes that the simplified / traditional shape difference is a font style matter

I'm assuming that it is the case for some characters, due to Han unification, but not for all.

In the case of Chinese, where the Hans / Hant contrast is relevant, the script designator has an impact on which glyph gets picked in the case of ambiguous ones, in addition to the language. I am just wondering whether it also has an impact if the language is ko-, and if that's the case, whether that informs which of Hant or Hani is more appropriate.

soon-bum commented Oct 7, 2018

In Korea, we use the code for the Korean character set (Kore/287),
which includes 11,172 Hangul + 4,888 Hanja + 567 other special characters.
The 4,888 Hanja characters differ considerably from Hans/501 characters,
and slightly from Hant/502 characters, in their shape.
These characters correspond to a subset of Hani/500.
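
The counts quoted above can be checked with a little arithmetic. This is a sketch under one assumption that the comment itself does not state: that the 11,172 Hangul figure refers to the precomposed syllables Unicode encodes at U+AC00..U+D7A3. The variable names are illustrative.

```python
# Counts as quoted in the comment above.
HANGUL, HANJA, OTHER = 11_172, 4_888, 567
total = HANGUL + HANJA + OTHER  # size of the whole repertoire described

# Assumption: 11,172 is the number of precomposed Hangul syllables,
# i.e. 19 initial consonants x 21 vowels x 28 finals (27 plus "no final").
syllables_by_jamo = 19 * 21 * 28

# The same count taken from the Unicode Hangul Syllables range.
syllables_by_range = 0xD7A3 - 0xAC00 + 1

print(total, syllables_by_jamo, syllables_by_range)
```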

frivoal commented Oct 7, 2018

> The 4,888 Hanja characters are [...] slightly different from Hant/502 characters in their shape.
> These characters are corresponding to the subset of Hani/500.

Thanks, that helps me understand why Hani instead of Hant. I had mistakenly assumed that since there had been no simplification, "traditional" was appropriate.

(As far as I am concerned, we can close this issue)

Contributor

r12a commented Oct 9, 2018

The IANA subtag registry suggests Hani, since that's the only subtag that mentions Hanja (see https://r12a.github.io/app-subtags/?find=hanja):

    type: script
    subtag: Hani
    description: Han, Hanzi, Kanji, Hanja
    added: 2005-10-16
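
A record like the one above is easy to read mechanically. The sketch below parses only the `key: value` layout as quoted here. `parse_record` is a hypothetical helper, not part of any registry tooling, and it ignores features of the full registry file format such as folded continuation lines.

```python
# The record exactly as quoted above.
RECORD = """\
type: script
subtag: Hani
description: Han, Hanzi, Kanji, Hanja
added: 2005-10-16
"""


def parse_record(text):
    """Turn 'key: value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields


record = parse_record(RECORD)
mentions_hanja = "hanja" in record["description"].lower()
print(record["subtag"], mentions_hanja)
```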

> I am just wondering whether it also has an impact if the language is ko-, and if that's the case,

Absent a font-family assignment, the ko language tag causes browsers to pick a Korean font. See this test and the results. Note, btw, that the glyph shape chosen in Firefox on my MacBook is different from any of zh-hant, zh-hans, or ja. (Font applied is Apple SD Gothic Neo.)

[Screenshot of the test results, 2018-10-09]

All the above assumes, as Asmus said, that we're really making a point of the fact that the text contains only Hanja characters; otherwise I think that ko or, less likely, ko-Kore is more appropriate. (As usual, it depends why you're using the lang attribute.)
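
The rule of thumb in this comment (ko-Hani only when the text is purely Hanja, plain ko otherwise) could be sketched as below. This is a rough heuristic, not an established algorithm: `is_han` and `suggest_lang_tag` are hypothetical names, the text is assumed to be already known to be Korean, and Han characters are recognised by their Unicode character names via Python's standard `unicodedata` module.

```python
import unicodedata


def is_han(ch):
    """True if the character's Unicode name marks it as a CJK ideograph."""
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def suggest_lang_tag(text):
    """Suggest 'ko-Hani' for Hanja-only Korean text, plain 'ko' otherwise.

    Assumption: the caller already knows the text is Korean. Digits,
    spaces, and punctuation are ignored when deciding.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if letters and all(is_han(ch) for ch in letters):
        return "ko-Hani"
    return "ko"


print(suggest_lang_tag("訓民正音"))   # Hanja only
print(suggest_lang_tag("한자 漢字"))  # mixed Hangul and Hanja
```

As the comment says, whether this distinction is worth making depends on why the lang attribute is being used in the first place.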

frivoal commented Oct 10, 2018

> All the above assumes, as Asmus said, that we're really making a point of the fact that the text contains only hanja characters, otherwise i think that ko or less likely ko-kore is more appropriate. (As usual, it depends why you're using the lang attribute.)

Right. The context for this question is https://drafts.csswg.org/css-text-3/#script-tagging where we are specifically discussing unusual/historic writing styles (such as pure Hanja in Korean), and the effect that should have on CSS-controlled typesetting. Any of -Hant or -Hani (or, for that matter, -Hans) would be sufficient to clue CSS in to the fact that this is a text written with Chinese-style typography, but we wanted to make sure to use the most appropriate script tag in the example.
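
To illustrate the point that any of the three Han script subtags would give the same signal, here is a minimal sketch of pulling the script subtag out of a language tag. It is not how any browser or the CSS specification implements this: `script_subtag` and `uses_han_typography` are hypothetical helpers, and the parsing is a simplification that treats the first four-letter alphabetic subtag after the language as the script, without handling private-use or grandfathered tags.

```python
# Script subtags named in this thread as signalling Chinese-style typography.
HAN_SCRIPTS = {"Hani", "Hans", "Hant"}


def script_subtag(tag):
    """Return the script subtag in title case, or None if there is none.

    Simplified: the first four-letter ASCII alphabetic subtag after the
    primary language subtag is taken to be the script.
    """
    for part in tag.split("-")[1:]:
        if len(part) == 4 and part.isascii() and part.isalpha():
            return part.title()
    return None


def uses_han_typography(tag):
    """True if the tag carries one of the Han script subtags."""
    return script_subtag(tag) in HAN_SCRIPTS


for tag in ("ko-Hani", "ko-Hant", "ko-Hans", "ko-Kore", "ko"):
    print(tag, uses_han_typography(tag))
```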
