Support Japanese phonetic names to autocomplete attribute #5821

Open
agektmr opened this issue Aug 13, 2020 · 21 comments
Labels
addition/proposal (New features or enhancements), i18n-jlreq (Notifies Japanese script experts of relevant issues), i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response), topic: forms

Comments

@agektmr

agektmr commented Aug 13, 2020

Browsers' address autofill has been missing phonetic name fields, which are a fundamental part of everyday life for Japanese people. Without a phonetic name, many Japanese people's names are hard to pronounce correctly.

One of my business partners in Japan told me that people at their customer center need to make phone calls and pronounce their customers' names. Pronouncing a name wrong is considered impolite.

The phonetic name is such a foundational component and already supported in:

  • Google Contacts
  • Apple's addressbook
  • Apple Pay

The Payment Request API is also considering adding phonetic name support.

I have sent a request to add this in the Chrome password manager as well.
https://bugs.chromium.org/p/chromium/issues/detail?id=1115953

Please consider adding phonetic names in autocomplete attribute as follows:

  • phonetic-given-name
  • phonetic-family-name
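
As a rough illustration of the request above, the sketch below models an autofill profile that stores phonetic names alongside the regular ones. The `phonetic-given-name` and `phonetic-family-name` tokens are the values proposed in this issue, not standard `autocomplete` values, and `NameProfile`, `TOKEN_TO_ATTR`, and `autofill` are hypothetical names that do not describe any browser's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NameProfile:
    """A hypothetical saved autofill entry."""
    given_name: str
    family_name: str
    phonetic_given_name: str = ""
    phonetic_family_name: str = ""


# Maps autocomplete tokens to profile attributes.
# The two "phonetic-" tokens are the proposed (non-standard) values.
TOKEN_TO_ATTR = {
    "given-name": "given_name",
    "family-name": "family_name",
    "phonetic-given-name": "phonetic_given_name",
    "phonetic-family-name": "phonetic_family_name",
}


def autofill(profile, tokens):
    """Return {token: value} for every recognised token that has a stored value."""
    filled = {}
    for token in tokens:
        attr = TOKEN_TO_ATTR.get(token)
        if attr and getattr(profile, attr):
            filled[token] = getattr(profile, attr)
    return filled


profile = NameProfile("太郎", "山田", "タロウ", "ヤマダ")
form_fields = ["family-name", "phonetic-family-name", "tel"]
print(autofill(profile, form_fields))
```

Unrecognised tokens such as `tel` are skipped here, since this sketch only covers name fields.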
himorin added the i18n-jlreq and i18n-tracker labels on Aug 13, 2020
@rniwa

rniwa commented Aug 13, 2020

Doesn't phonetic come in the form of katakana or hiragana? Does this proposal care which one is used? Also, it's unclear what happens in Chinese. There is pinyin and then bopomofo. What do we do with romanization of Cantonese?

@Yay295
Contributor

Yay295 commented Aug 13, 2020

As far as I understand it this would just be two new labels; there's no auto-translation going on here. You could put whatever you want into these fields, and your browser would be able to recognize and save them by those labels.

@rniwa

rniwa commented Aug 14, 2020

I mean the problem is that different forms require different types of characters to be submitted. So if a form is asking for katakana and you autofill hiragana, it's not gonna work.
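
For context on how mechanical the hiragana/katakana mismatch is, here is a minimal sketch of converting between the two. It relies on the fact that hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 sit at a fixed offset of 0x60 in Unicode. The function names are hypothetical, and nothing in this thread says a browser or site performs this conversion.

```python
def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana letters (U+3041..U+3096) to katakana; leave the rest untouched."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana letters (U+30A1..U+30F6) to hiragana; leave the rest untouched."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


print(hiragana_to_katakana("やまだ"))
print(katakana_to_hiragana("タロウ"))
```

Characters outside those ranges, such as the prolonged sound mark (U+30FC) or kanji, pass through unchanged.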

@agektmr
Author

agektmr commented Aug 14, 2020

I think katakana or hiragana doesn't matter much because similar issues are already happening in other fields.

  • Should name, address, etc be written in Japanese or in Roman chars?
  • Should address-lines include numbers as ASCII chars (ex: "3") or Japanese chars (ex: "３")? This is a very well-known and common problem in Japanese forms, as many companies ask users to enter Japanese chars but people prefer ASCII.
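
The ASCII versus fullwidth question in the list above can be illustrated with Unicode compatibility normalization. The sketch below uses Python's standard `unicodedata.normalize` with NFKC to fold fullwidth digits and letters to ASCII, and a translation table for the opposite direction (fullwidth forms U+FF01..U+FF5E are the ASCII range U+0021..U+007E shifted by 0xFEE0). The helper names are hypothetical. Note that NFKC also folds other compatibility characters (for example, halfwidth katakana becomes fullwidth), so it is broader than a digits-only conversion.

```python
import unicodedata

# ASCII '!'..'~' map to fullwidth U+FF01..U+FF5E at a fixed offset of 0xFEE0.
_ASCII_TO_FULLWIDTH = {cp: cp + 0xFEE0 for cp in range(0x21, 0x7F)}


def normalize_width(text: str) -> str:
    """Fold fullwidth ASCII variants (and other compatibility forms) via NFKC."""
    return unicodedata.normalize("NFKC", text)


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII characters with their fullwidth counterparts."""
    return text.translate(_ASCII_TO_FULLWIDTH)


print(normalize_width("３丁目"))
print(to_fullwidth("3-2"))
```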

At least in Chrome, my current mitigation is to keep multiple autofill entries: a Japanese version, an English version, etc.

Regarding Chinese romanization, I'm not very familiar with it, unfortunately. If it's something these proposed fields can solve, why not?

@xfq
Contributor

xfq commented Aug 14, 2020

Heteronyms are rare in Chinese, and with the standardization of pronunciation, they become fewer and fewer.

A Japanese kanji usually has more than two pronunciations, so this problem is more serious in Japanese than in Chinese. Personally, I have never seen phonetic names in Chinese forms, but I often see them in Japanese forms.

(FWIW, https://www.w3.org/International/questions/qa-personal-names#japanese has an example for Japanese.)

@rniwa

rniwa commented Aug 15, 2020

> I think katakana or hiragana doesn't matter much because similar issues are already happening in other fields.
>
>   • Should name, address, etc be written in Japanese or in Roman chars?
>   • Should address-lines include numbers as ASCII chars (ex: "3") or Japanese chars (ex: "３")? This is a very well-known and common problem in Japanese forms, as many companies ask users to enter Japanese chars but people prefer ASCII.
>
> At least in Chrome, my current mitigation is to keep multiple autofill entries: a Japanese version, an English version, etc.

It doesn't sound right that we're adding a Web-facing feature for which users would have to have mitigations / workarounds. Can't we solve this once & for all? For example, we could specify, as an authoring requirement, that phonetic names in Japanese should be written in katakana.

> Heteronyms are rare in Chinese, and with the standardization of pronunciation, they become fewer and fewer.
>
> A Japanese kanji usually has more than two pronunciations, so this problem is more serious in Japanese than in Chinese. Personally, I have never seen phonetic names in Chinese forms, but I often see them in Japanese forms.

I have no doubt that this is a more serious issue in Japanese. On the other hand, are we sure there is no other language (e.g. Sub-Saharan African languages) in which this is an issue? It seems to me that if the solution is specific to Japanese, using a generic name like "phonetic" isn't useful. If, on the other hand, there are other languages in which this is relevant, then we should probably study use cases pertaining to those languages as well.

@rniwa

rniwa commented Aug 15, 2020

I had a conversation with my colleagues; maybe we should bring back inputmode=hiragana and inputmode=katakana, since that would also allow the UA to automatically change the input method to the appropriate type (macOS offers a specific "keyboard", a.k.a. IME, for hiragana & katakana).

Then autocomplete=family-name inputmode=katakana would mean family name in katakana for auto fill purposes.

annevk added the addition/proposal and topic: forms labels on Aug 17, 2020
@annevk
Member

annevk commented Aug 17, 2020

cc @whatwg/forms

@kourge

kourge commented Aug 20, 2020

In general, Chinese doesn't have nanori like Japanese does, but there are a few valid use cases.

  • Heteronyms: like @xfq said, they exist in Chinese, but it's becoming increasingly rare to find people whose name has an unusual pronunciation. That's not to say they don't exist, but parents would rather that this kind of trouble be nipped in the bud.
  • As a workaround for rare characters that do not yet have their own codepoint: for example, prior to the existence of U+5586, David Tao's name was frequently rendered as 吉吉. Setting phonetic-given-name to Zhé (pinyin) or ㄓㄜˊ (zhuyin) would be a good way to indicate that 吉吉 is meant to be read as a single character.
  • As a way to hint which Chinese language should be used to pronounce the name: not all Chinese names are pronounced in Mandarin by default, since Mandarin is one of many Chinese languages. For example, the family name 吳 can be read as wu (Mandarin, pinyin), ng4 (Cantonese, jyutping), ngô͘ (Hokkien, POJ), and so on.

To dive a little deeper into that last point: I'm more or less approaching the same topic @rniwa broached, but from the other direction: I think the contents of phonetic-given-name and phonetic-family-name can be used to infer which Chinese language acts as the preferred way to pronounce a name.
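
A first step toward that kind of inference might be detecting which script a phonetic value is written in. The sketch below is only that first step: it classifies a string by the Unicode character names reported by Python's `unicodedata.name`. `dominant_script` and `SCRIPTS` are hypothetical names. It cannot tell pinyin, jyutping, and POJ apart, since all three are written in Latin letters, so mapping a value to a specific Chinese language would need further heuristics that are not shown here.

```python
import unicodedata
from collections import Counter

# Script prefixes as they appear at the start of Unicode character names.
SCRIPTS = ("HIRAGANA", "KATAKANA", "BOPOMOFO", "LATIN")


def dominant_script(value: str) -> str:
    """Return the most frequent known script in value, or 'unknown'."""
    counts = Counter()
    for ch in value:
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break  # tone marks, digits, kanji, etc. match nothing and are ignored
    return counts.most_common(1)[0][0].lower() if counts else "unknown"


for sample in ("ㄓㄜˊ", "Zhé", "ng4", "ヤマダ"):
    print(sample, dominant_script(sample))
```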

(The above is a summary of a discussion between @CYBAI and myself.)

@agektmr
Author

agektmr commented Aug 20, 2020

> autocomplete=family-name inputmode=katakana

Using inputmode as a way to allow alternate expressions of the same data sounds like an interesting idea to me. But are you suggesting reusing family-name for them, or is it just a typo? If you did mean to reuse family-name, what is the benefit of it over an explicit phonetic-family-name?

I don't know much about how browsers handle this data internally, but having an explicit phonetic-family-name seems cleaner to me.

@rniwa

rniwa commented Aug 20, 2020

> autocomplete=family-name inputmode=katakana
>
> Using inputmode as a way to allow alternate expressions of the same data sounds like an interesting idea to me. But are you suggesting reusing family-name for them, or is it just a typo? If you did mean to reuse family-name, what is the benefit of it over an explicit phonetic-family-name?

Yes, I'm suggesting that we don't add a new value to autocomplete but rather have the browser figure out what kind of family-name the text field wants based on the value of inputmode.

The benefits of using inputmode are:

  1. We can explicitly say katakana vs hiragana for inputmode instead of having to create katakana-family-name and hiragana-family-name. If some Chinese users wanted to use pinyin or zhuyin as ways of expressing their pronunciations, they could use inputmode=pinyin or inputmode=zhuyin.
  2. It also benefits the users that don't use auto-completion by hinting UA to pick the right kind of software keyboard / input method.
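
To make the idea above concrete, here is a minimal sketch of how stored autofill data might be keyed by both the `autocomplete` token and the `inputmode` value. The `katakana` and `hiragana` keys are the `inputmode` values proposed in this thread, not values in the current HTML standard, and `PROFILE` and `pick_value` are hypothetical names rather than any UA's real data model.

```python
# Stored variants per autocomplete token; the "" key is the default representation.
PROFILE = {
    "family-name": {"": "山田", "katakana": "ヤマダ", "hiragana": "やまだ"},
    "given-name": {"": "太郎", "katakana": "タロウ", "hiragana": "たろう"},
}


def pick_value(profile, autocomplete, inputmode=""):
    """Pick the variant matching inputmode, falling back to the default one."""
    variants = profile.get(autocomplete, {})
    return variants.get(inputmode, variants.get(""))


# autocomplete=family-name inputmode=katakana
print(pick_value(PROFILE, "family-name", "katakana"))
# autocomplete=family-name with no inputmode
print(pick_value(PROFILE, "family-name"))
```

Falling back to the default representation for an unrecognised `inputmode` is a design choice made for this sketch; a UA could equally decline to fill the field.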

> I don't know much about how browsers handle this data internally, but having an explicit phonetic-family-name seems cleaner to me.

This suggestion seems strictly worse because "phonetic" doesn't tell UA whether this is katakana, hiragana, pinyin, or zhuyin, and it doesn't let UA pick the right kind of input method / software keyboard for users that are not using autocompletion.

@agektmr
Author

agektmr commented Sep 26, 2020

Hi, sorry about my long delay in responding.

I agree that your suggestion of using autocomplete and inputmode together sounds like a good idea, rather than adding alternate representations to each autocomplete item. Adding katakana and hiragana to inputmode also seems sensible, as there are such expectations in Japanese forms, but as far as I know, there's no keyboard that constrains itself to entering only katakana or hiragana. Using pattern would be the better approach for that purpose.
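
As a sketch of the kind of constraint a pattern could express, the check below accepts only fullwidth katakana letters (U+30A1..U+30FA) plus the prolonged sound mark (U+30FC). The HTML `pattern` attribute is matched against the whole field value, which `re.fullmatch` mirrors here. The exact character range and the `is_katakana` name are assumptions made for illustration; a real form might also want to allow spaces or the middle dot.

```python
import re

# Fullwidth katakana letters plus the prolonged sound mark.
KATAKANA_ONLY = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def is_katakana(value: str) -> bool:
    """True if value is non-empty and consists only of fullwidth katakana."""
    return KATAKANA_ONLY.fullmatch(value) is not None


print(is_katakana("ヤマダ"))  # katakana
print(is_katakana("やまだ"))  # hiragana
```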

But anyway, your suggestion sounds like the way to go.

As I'm not an engineer building a browser myself, I'll consult my colleagues to see if this proposal sounds sensible to them and if changing structure of autofill data is something our team can consider.

@rniwa

rniwa commented Sep 26, 2020

> I agree that your suggestion of using autocomplete and inputmode together sounds like a good idea, rather than adding alternate representations to each autocomplete item. Adding katakana and hiragana to inputmode also seems sensible, as there are such expectations in Japanese forms, but as far as I know, there's no keyboard that constrains itself to entering only katakana or hiragana. Using pattern would be the better approach for that purpose.

In macOS, you can choose Hiragana and Katakana "keyboards" (basically equivalent of IME on Windows):
[Image: Screenshot of Input Sources in Keyboard System Preferences]

It would still allow you to convert the typed katakana or hiragana to kanji and to each other, but the important thing is the default behavior. It would help people start typing their names in the right characters. I always find it annoying that I have to manually switch between different keyboard types when I'm filling in forms on macOS.

I don't think there is an equivalent software keyboard for iOS / iPadOS though.

@jcayzac

jcayzac commented Oct 16, 2020

@rniwa I love the idea of using inputmode for this. Regarding Japanese specifically, though, wouldn't there be a need to support the other modes beside just fullwidth hiragana and katakana? I've stumbled on forms that required fullwidth romaji already, and many bank forms require halfwidth katakana.

@agektmr
Author

agektmr commented Oct 16, 2020

I support @jcayzac's idea. The same approach can be applied to other autocomplete properties in Japanese forms, not just names.

@sideshowbarker
Contributor

sideshowbarker commented Oct 16, 2020

> I've stumbled on forms that required fullwidth romaji already

In my experience it’s common to see sites with form fields that require fullwidth input, period. That is, they have form fields that’ll accept romaji if it’s fullwidth, but that’ll also accept (and basically, expect) kanji or (fullwidth) kana.

But I believe such sites are user-hostile, and a site's requirement for fullwidth romaji is not something we should facilitate.

> and many bank forms require halfwidth katakana.

I agree it’s also common to see sites with form fields that require halfwidth katakana input.

But I believe that doing that is even more user-hostile than forcing users to input fullwidth romaji.

> Regarding Japanese specifically, though, wouldn't there be a need to support the other modes beside just fullwidth hiragana and katakana?

I don’t think we should be doing anything to facilitate sites that require halfwidth katakana input. I think we should instead be doing everything we can to discourage and eradicate halfwidth katakana input.

The fullwidth-romaji input case is more of a gray area. In my experience, sites are not usually intentionally requiring users to input fullwidth romaji explicitly — it’s just that they have form fields that are intended for input of kanji or kana, but without the consideration of the fact that some people who are going to use that form don’t normally write their names in kanji or kana, but instead with non-kanji/non-kana characters.

So I think the real solution to the user problems with those kinds of forms is for the creators of the sites to not be requiring fullwidth input for any of their form fields to begin with, but instead just accepting any character input for them.

I definitely don’t think a good solution would be for us to add more mechanisms to the web platform that further proliferation of those kinds of sites — I mean, by facilitating the ability of the creators of those kinds of sites to continue forcing a user-hostile user experience onto their users. We should make it harder for those sites to keep doing that, not easier.

@jcayzac

jcayzac commented Oct 16, 2020

@sideshowbarker Thanks for the reply. I overall agree with everything you say regarding the eradication of user-hostile input modes (for which conversions are trivial anyway). Let's kill both halfwidth katakana/hiragana and fullwidth romaji please :) It would be great if any proposal augmenting the inputmode attribute to support transliterations would cite this as an explicit design goal, still.

@jcayzac

jcayzac commented Oct 16, 2020

The only argument against killing halfwidth katakana/hiragana that I can think of is user agents with reduced screen estate. Feature phones are still a thing, in Japan, and models targeting the aging population are not capable of displaying wide text using the fullwidth variant.

Admittedly, this could probably be simply called a typeface issue, though.

@sideshowbarker
Contributor

sideshowbarker commented Oct 16, 2020

> The only argument against killing halfwidth katakana/hiragana that I can think of is user agents with reduced screen estate. Feature phones are still a thing, in Japan, and models targeting the aging population are not capable to display long text using the fullwidth variant.
>
> Admittedly, this could probably be simply called a typeface issue, though.

Right, but anyway that’s all related to the display side of things rather than the input side, right?

To be clear: I didn’t mean to suggest we do anything to hamper the display side of halfwidth kana.

If someone chooses to use halfwidth kana for some reason, that’s great (e.g., people doing things with it in messages on Twitter and other social media). Written Japanese is maybe unique in the variety of expressive richness it provides, and halfwidth kana is a part of it.

I just mean to be critical only about the input side of halfwidth kana — to say that I don’t think any sites should be forcing users to input halfwidth kana.


In my experience, the sites that make users input halfwidth kana aren’t motivated by love for the expressive richness of written Japanese, but are instead being incredibly lazy and making users do work that they should be doing themselves — because it seems like in most cases, the reason those sites are requiring users to input data in halfwidth kana is just because that’s how the sites are storing that data on the backend (I guess in some legacy database where they already have a ton of data stored in halfwidth kana). But the sites could instead accept normal kana input, and trivially convert that to halfwidth before storing it.
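
A minimal sketch of that server-side conversion, assuming a backend that must store halfwidth katakana: the table is built by inverting NFKC normalization over the halfwidth block U+FF61..U+FF9F, and voiced kana are first decomposed with NFD so that, for example, ダ becomes ﾀ followed by the halfwidth voiced sound mark. `to_halfwidth_katakana` is a hypothetical helper, not code from any site mentioned in this thread.

```python
import unicodedata

# NFKC maps each halfwidth form (U+FF61..U+FF9F) to a single fullwidth or
# combining character; inverting that gives a fullwidth -> halfwidth table.
_TO_HALFWIDTH = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp)
    for cp in range(0xFF61, 0xFFA0)
}


def to_halfwidth_katakana(text: str) -> str:
    """Convert fullwidth katakana (and related punctuation) to halfwidth forms."""
    out = []
    for ch in text:
        if "\u30a1" <= ch <= "\u30fc":
            # Split voiced kana into base + combining mark, e.g. ダ -> タ + U+3099.
            parts = unicodedata.normalize("NFD", ch)
        else:
            parts = ch
        out.extend(_TO_HALFWIDTH.get(part, part) for part in parts)
    return "".join(out)


stored = to_halfwidth_katakana("ヤマダ")
print(stored)
# NFKC converts the stored value back to fullwidth for display.
print(unicodedata.normalize("NFKC", stored))
```

Characters with no halfwidth form, such as kanji or hiragana, are passed through unchanged in this sketch.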

@jcayzac

jcayzac commented Oct 16, 2020

> I support @jcayzac's idea. The same approach can be applied to other autocomplete properties in Japanese forms, not just names.

Yes, some forms also ask you to input katakana for your address, although I don't understand what could be a valid rationale for that. I guess this could be used for accessibility when your address is rendered to a third party using a screenreader, but I don't believe the frontend developers responsible for any such form I had the displeasure of filling in the past have ever cared about a11y.

EDIT: Ah, I think maybe it's used for customer support over the phone, so that staff can enter address details in a search box without knowing anything about the actual names in the address.

@rniwa

rniwa commented Oct 16, 2020

> I support @jcayzac's idea. The same approach can be applied to other autocomplete properties in Japanese forms, not just names.
>
> Yes, some forms also ask you to input katakana for your address, although I don't understand what could be a valid rationale for that.

The support for the full range of Kanji characters is a relatively new addition in corporate computer systems. There were a lot of legacy banking systems that used to or maybe still only store data in Katakana. That's how many COBOL programmers still have jobs. In theory, bank employees should be able to figure out how to read addresses written in Kanji and transcribe the equivalent Katakana but that requires work so they often require users to type in Katakana anyway.

There is even an infamous horror story: when two major banks in Japan merged, some customers got a notice asking them to shorten their mailing addresses, because the merged bank ended up using the older of the two banks' systems (that bank had more political power), which supported fewer characters in the mailing address.


9 participants