Which language code for Hanja? #23

fantasai · 2018-10-04T21:46:19Z

For texts written in Hanja, which is the preferred lang tag, ko-Hant or ko-Hani?


asmusf · 2018-10-05T00:20:45Z

Good question. Hani is neutral on the simplified/traditional axis, so, since that distinction doesn't apply to Hanja, it might be preferable (assuming the text truly is Hanja only).

behnam · 2018-10-05T00:27:17Z

ISO 15924 does define:

Kore/287 as "Korean (alias for Hangul + Han)",
Hani/500 as "Han (Hanzi, Kanji, Hanja)",
Hans/501 as "Han (Simplified variant)", and
Hant/502 as "Han (Traditional variant)".

From these, it looks like the Han part of the Korean writing system is supposed to be tagged with Hani/500.
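The four entries above can be held in a small lookup table. Here is a minimal Python sketch that makes the observation concrete: Hani is the only one of these whose name mentions Hanja. The table and the `codes_mentioning` helper are hypothetical, and they contain only the entries quoted here, not the full ISO 15924 list.

```python
# Hypothetical lookup table containing only the four ISO 15924 entries quoted above:
# four-letter code -> (numeric code, English name).
ISO_15924 = {
    "Kore": (287, "Korean (alias for Hangul + Han)"),
    "Hani": (500, "Han (Hanzi, Kanji, Hanja)"),
    "Hans": (501, "Han (Simplified variant)"),
    "Hant": (502, "Han (Traditional variant)"),
}


def codes_mentioning(term: str) -> list[str]:
    """Return the codes whose English name contains `term` (case-insensitive)."""
    needle = term.lower()
    return [code for code, (_, name) in ISO_15924.items() if needle in name.lower()]


print(codes_mentioning("Hanja"))  # ['Hani']
```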

I don't have data on what the common practices are, though. Interested to see if there's anything significant.

frivoal · 2018-10-05T08:27:19Z

Since Korean written in Hanja has only ever been written in traditional ideographs, wouldn't using -Hant have a better chance of matching the right (traditional) glyphs in a font that has several variants? Or will fonts pick the correct variant based on ko- regardless of -Hant vs -Hani?
Similarly, could one use ko-Hans intentionally to get simplified glyphs for Han-unified characters (for example, in a transliteration of an old Korean text prepared for a modern mainland Chinese reader)?

asmusf · 2018-10-05T09:04:24Z (via email)

Your comment assumes that the simplified/traditional shape difference is a font style matter; that is not generally the case, as each form has its own character code. With regard to font styles, Japanese, Korean, Chinese and Traditional Chinese all use slightly different styles for the same characters. However, for Japanese and Korean, you cannot capture that in the script designator alone; you must use the language subtag as well.

frivoal · 2018-10-05T09:49:20Z

> Your comment assumes that the simplified / traditional shape difference is a font style matter

I'm assuming that it is the case for some characters (though not all), due to Han unification.

In the case of Chinese, where the Hans/Hant contrast is relevant, the script designator has an impact, in addition to the language, on which glyph gets picked for ambiguous characters. I am just wondering whether it also has an impact if the language is ko-, and if that's the case, whether that informs which of Hant or Hani is more appropriate.

soon-bum · 2018-10-07T00:24:09Z

In Korea, we are using the code for the Korean character set (Kore/287),
which includes 11,172 Hangul + 4,888 Hanja + 567 other special characters.
The 4,888 Hanja characters are much different from Hans/501 characters,
and slightly different from Hant/502 characters in their shape.
These characters correspond to a subset of Hani/500.

frivoal · 2018-10-07T01:43:04Z

> The 4,888 Hanja characters are [...] slightly different from Hant/502 characters in their shape.
> These characters correspond to a subset of Hani/500.

Thanks, that helps me understand why Hani rather than Hant. I had mistakenly assumed that since there had been no simplification, "traditional" was appropriate.

(As far as I am concerned, we can close this issue)

r12a · 2018-10-09T09:13:04Z

The IANA subtag registry suggests Hani, since that's the only subtag that mentions Hanja (see https://r12a.github.io/app-subtags/?find=hanja):

    type: script
    subtag: Hani
    description: Han, Hanzi, Kanji, Hanja
    added: 2005-10-16
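The entry above is written as "field: value" lines. Here is a minimal Python sketch of reading one such record into a dictionary. The function name `parse_record` is hypothetical, and it handles only a single record like the one shown, not the full registry file.

```python
def parse_record(text: str) -> dict[str, list[str]]:
    """Parse one 'field: value' record into a dict of field -> list of values.

    Field names are lowercased. A field may repeat (e.g. several descriptions),
    so every value is kept in a list. A line that starts with whitespace is
    treated as a continuation of the previous value.
    """
    fields: dict[str, list[str]] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and last_key is not None:
            fields[last_key][-1] += " " + line.strip()  # continuation line
            continue
        key, _, value = line.partition(":")
        last_key = key.strip().lower()
        fields.setdefault(last_key, []).append(value.strip())
    return fields


RECORD = """\
type: script
subtag: Hani
description: Han, Hanzi, Kanji, Hanja
added: 2005-10-16
"""

rec = parse_record(RECORD)
print(rec["subtag"][0], rec["description"][0])  # Hani Han, Hanzi, Kanji, Hanja
```

Lowercasing the field names lets the same function read both the lowercase form shown above and capitalized field names.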

> I am just wondering whether it also has an impact if the language is ko-, and if that's the case,

Absent a font-family assignment, the ko subtag causes browsers to pick a Korean font. See this test and the results. Note, btw, that the glyph shape chosen in Firefox on my MacBook is different from any of zh-Hant, zh-Hans, or ja. (The font applied is Apple SD Gothic Neo.)

All the above assumes, as Asmus said, that we're really making a point of the fact that the text contains only Hanja characters; otherwise I think that ko, or less likely ko-Kore, is more appropriate. (As usual, it depends why you're using the lang attribute.)
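Here is a minimal Python sketch of the rule just described: use ko-Hani only when the text is really Hanja-only, and otherwise use plain ko. The helper names are hypothetical. The rule "every letter is a Han ideograph" is an illustrative assumption, not a W3C recommendation. The sketch also assumes the caller already knows the language is Korean.

```python
import unicodedata


def is_han(ch: str) -> bool:
    # Han ideographs have Unicode names starting with one of these prefixes,
    # e.g. "CJK UNIFIED IDEOGRAPH-5927" for 大.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def suggest_korean_tag(text: str) -> str:
    """Hypothetical rule: 'ko-Hani' if every letter is Han, else plain 'ko'."""
    letters = [ch for ch in text if ch.isalpha()]  # ignores spaces and punctuation
    if letters and all(is_han(ch) for ch in letters):
        return "ko-Hani"
    return "ko"


print(suggest_korean_tag("大韓民國"))  # ko-Hani
print(suggest_korean_tag("대한민국"))  # ko
```

Mixed Hangul and Hanja text falls back to ko, in line with the point above that ko (or ko-Kore) is the more appropriate tag there.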

frivoal · 2018-10-10T00:00:48Z

> All the above assumes, as Asmus said, that we're really making a point of the fact that the text contains only Hanja characters; otherwise I think that ko, or less likely ko-Kore, is more appropriate. (As usual, it depends why you're using the lang attribute.)

Right. The context for this question is https://drafts.csswg.org/css-text-3/#script-tagging, where we are specifically discussing unusual/historic writing styles (such as pure Hanja in Korean) and the effect they should have on CSS-controlled typesetting. Either -Hant or -Hani (or, for that matter, -Hans) would be sufficient to clue CSS into the fact that this text is written with Chinese-style typography, but we wanted to make sure to use the most appropriate script tag in the example.
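Any of -Hani, -Hant or -Hans in the tag signals Han text. Here is a minimal Python sketch of extracting the script subtag from a tag such as ko-Hani and checking it against those three. The helper names are hypothetical. The parser is also simplified: it is not a full RFC 5646 implementation, does not consult the subtag registry, and does not handle grandfathered or private-use-only tags.

```python
from typing import Optional

# Only the three Han script subtags discussed here; Kore (Hangul + Han) and
# other combined scripts are deliberately not counted by this sketch.
HAN_SCRIPTS = {"Hani", "Hans", "Hant"}


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a simple BCP 47 tag, title-cased, or None.

    Assumes the tag starts with a language subtag, optionally followed by up
    to three 3-letter extlang subtags, then an optional 4-letter script.
    """
    parts = tag.split("-")
    i = 1
    # Skip extlang subtags (3 letters), which may precede the script.
    while i < len(parts) and i <= 3 and len(parts[i]) == 3 and parts[i].isalpha():
        i += 1
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        return parts[i].title()  # script subtags are conventionally title-case
    return None


def has_han_script_subtag(tag: str) -> bool:
    return script_subtag(tag) in HAN_SCRIPTS


print(script_subtag("ko-Hani"), has_han_script_subtag("ko-Hani"))  # Hani True
```

Matching is case-insensitive (ko-hani and ko-HANI give the same result), since BCP 47 tags are case-insensitive.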

r12a mentioned this issue Oct 9, 2018

Which language code for Hanja? #23 w3c/i18n-activity#602

Closed

r12a added the question label Oct 30, 2018

r12a mentioned this issue Oct 30, 2018

Which language code for Hanja? w3c/i18n-activity#605

Closed
