input[maxlength] breaks a grapheme cluster down into pieces #7861

Open
saschanaz opened this issue Apr 26, 2022 · 5 comments
Labels
i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response), needs implementer interest (Moving the issue forward requires implementers to express interest), topic: forms

Comments

@saschanaz
Member

saschanaz commented Apr 26, 2022

<input maxlength="7">

Copy-pasting 🏳️‍⚧ into that input twice results in 🏳️‍⚧🏳 in both Firefox and Chrome, which is fairly unexpected for users, and websites can't really control this. (mastodon/mastodon#18038)

Can we specify that the cluster should not be broken down and instead be prevented altogether, so that the result can be 🏳️‍⚧ instead of 🏳️‍⚧🏳? (One counterpoint would be that this can't be done consistently across all browsers, as the set of clusters changes every time a Unicode update happens, but I'm not sure how that could cause an actual interoperability issue in this case.)

(#1467 shows maxlength is complicated as WebKit counts the emoji as a single character, but that's about the counting of the characters.)
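
For illustration, here is a minimal sketch of the two outcomes described above. It assumes a runtime that provides `Intl.Segmenter`, and it assumes the flag is encoded as U+1F3F3 U+FE0F U+200D U+26A7 (5 UTF-16 code units, no trailing variation selector), which is consistent with the reported result. The helper names `truncateByCodeUnits` and `truncateByGraphemes` are hypothetical and are not taken from any browser implementation.

```javascript
// Hypothetical helpers, for illustration only.

// The flag from the report: U+1F3F3 U+FE0F U+200D U+26A7
// (5 UTF-16 code units, 1 grapheme cluster). Assumed encoding.
const flag = "\u{1F3F3}\uFE0F\u200D\u26A7";

// Cut at a fixed number of UTF-16 code units, ignoring cluster boundaries.
function truncateByCodeUnits(text, maxLength) {
  return text.slice(0, maxLength);
}

// Keep only the leading grapheme clusters that fit entirely within
// maxLength code units; the first cluster that does not fit ends the loop.
function truncateByGraphemes(text, maxLength) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  for (const { segment } of segmenter.segment(text)) {
    if (result.length + segment.length > maxLength) break;
    result += segment;
  }
  return result;
}

const pasted = flag + flag; // 10 code units
const cut = truncateByCodeUnits(pasted, 7);   // flag followed by a bare U+1F3F3
const whole = truncateByGraphemes(pasted, 7); // just one flag
console.log(cut.length, whole.length); // 7 5
```

The second helper still measures the limit in code units, so it only changes where the cut falls, not how `maxlength` is counted.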

@domenic
Member

domenic commented Apr 26, 2022

This kind of falls at the boundary between HTML and UI Events, and UI Events is unfortunately not that well maintained... Maybe the best we can do is add something to HTML.

Is this kind of limitation something browsers are interested in implementing?

@domenic added the "needs implementer interest" and "topic: forms" labels on Apr 26, 2022
@saschanaz
Member Author

I can take a look at the implementation some day if @annevk is okay with that.

@annevk
Member

annevk commented Apr 26, 2022

Yeah, seems like a reasonable thing to fix.

I suppose an argument could be made that how maxlength is enforced for user input is a UI decision and browsers should be allowed to decide how to trim excessive inputs. (Websites could grab the paste event and set value directly if they don't like that.)
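
As a rough sketch of the workaround mentioned in the parenthetical, a page could intercept `paste` and insert only the whole grapheme clusters that fit. This assumes `Intl.Segmenter` is available and that the limit is counted in UTF-16 code units; `clampPaste` and `installPasteGuard` are hypothetical names, and the sketch does not cover typing, drag and drop, or input method editors.

```javascript
// Hypothetical helper: returns the part of `pasted` that can replace the
// current selection without pushing the value past maxLength UTF-16 code
// units and without splitting a grapheme cluster of the pasted text.
function clampPaste(value, selectionStart, selectionEnd, pasted, maxLength) {
  const selected = selectionEnd - selectionStart;
  const remaining = maxLength - (value.length - selected);
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let accepted = "";
  for (const { segment } of segmenter.segment(pasted)) {
    if (accepted.length + segment.length > remaining) break;
    accepted += segment;
  }
  return accepted;
}

// Hypothetical wiring for a page. `input` is assumed to be an <input>
// element that has a maxlength attribute (maxLength is -1 without one).
function installPasteGuard(input) {
  input.addEventListener("paste", (event) => {
    if (input.maxLength < 0) return; // no limit set: keep default behaviour
    event.preventDefault();
    const pasted = event.clipboardData.getData("text/plain");
    const accepted = clampPaste(
      input.value, input.selectionStart, input.selectionEnd,
      pasted, input.maxLength);
    // Replace the selection and put the caret after the inserted text.
    input.setRangeText(accepted, input.selectionStart, input.selectionEnd, "end");
  });
}
```

Changing the value from script does not fire an `input` event, so a page that relies on that event would need to dispatch one itself after the `setRangeText` call.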

@r12a added the "i18n-tracker" label on Apr 26, 2022
@aphillips
Contributor

I tend to agree with @annevk that this could be browser specific. It might also be useful for the browser to indicate to the user visually that truncation has occurred. For example, some emoji sequences can be really long. This family emoji (👨🏻‍👩🏼‍👧🏾‍👧🏿) has 12 code points (don't forget skin tone selectors). With maxlength=7 and grapheme truncation, the input or paste would just appear to fail.

I think that specifying maxlength in terms of graphemes rather than code points would probably be best for end-users (that is, I think WebKit in #1467 is more user-friendly). However, this is probably not consistent with the expectations of page authors (if I said maxlength=7, I don't expect to get 7 x 12 code point family emojis = 84 code points as input).

Note that while the example features emoji, this also affects languages that use combining marks to form e.g. syllables. For example, <input maxlength=5> and यूनिकोड results in यूनिक (the last conjunct should be को). While grapheme clusters in natural languages don't tend to reach the absurd lengths that emoji sequences do, they can still be reasonably long (3-4 code points and rarely more), and truncating them in the middle damages the meaning. The definition of grapheme clusters in Unicode is imperfect (leading to various permutations, such as "extended" grapheme clusters and on-going work to fully describe cluster boundaries). @r12a can provide more detail.
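
To make the different units concrete, the following sketch reports three "lengths" for the same string. It assumes `Intl.Segmenter` is available and uses its default (extended) grapheme cluster rules, which are only one of the possible definitions mentioned above; `measure` is a hypothetical helper name.

```javascript
// Hypothetical helper that reports several "lengths" of one string.
function measure(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const clusters = Array.from(segmenter.segment(text), (s) => s.segment);
  return {
    codeUnits: text.length,              // UTF-16 code units
    codePoints: Array.from(text).length, // Unicode code points
    graphemes: clusters.length,          // extended grapheme clusters
    clusters,
  };
}

// यूनिकोड, written with escapes so the code points are visible.
const word = "\u092F\u0942\u0928\u093F\u0915\u094B\u0921";
const sizes = measure(word);
console.log(sizes.codeUnits, sizes.codePoints, sizes.graphemes); // 7 7 4
console.log(sizes.clusters.join(" | "));
```

Cutting this word at 5 code units lands inside the third cluster, which is why the vowel sign on क is lost in the example above.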

@domenic
Member

domenic commented Apr 27, 2022

Thanks @aphillips for the great reminder about how complicated the world of text is :).

For the HTML Standard, I guess the question is whether we say anything, and if so, how. I was thinking of expanding the existing text:

User agents may prevent the user from causing the element's API value to be set to a value whose length is greater than the element's maximum allowed value length.

by adding a paragraph or sentence like:

If user agents implement such a restriction, they should take special care in cases where multiple code units are entered at once, such as via pasting or using an input method editor. For example, if pasting यूनिकोड into an <input maxlength=5> field truncates the value to the first 5 code units, the result is यूनिक, but a more semantically correct (????) truncation would be यूनिको. Similarly, [... give examples of emoji situations ...]. The best user interface for such situations is not clear, so user agents might want to experiment and report back to the spec on what they find meets users' expectations.
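
The thread does not settle which cluster-aware truncation is preferable. As a sketch only, the two obvious candidates can be compared side by side, using a limit of 5 code units as in the earlier यूनिकोड example. This assumes `Intl.Segmenter` is available; `truncateWholeClusters` and its `"drop"` / `"keep"` policy names are hypothetical.

```javascript
// Hypothetical helper comparing two cluster-aware policies for a limit
// expressed in UTF-16 code units.
//   "drop": discard the cluster that would cross the limit.
//   "keep": finish the cluster that crosses the limit, then stop.
function truncateWholeClusters(text, maxLength, policy) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  for (const { segment } of segmenter.segment(text)) {
    if (result.length >= maxLength) break;
    const fits = result.length + segment.length <= maxLength;
    if (!fits && policy === "drop") break;
    result += segment;
    if (!fits) break; // "keep": the crossing cluster is the last one added
  }
  return result;
}

// यूनिकोड, written with escapes so the code points are visible.
const sample = "\u092F\u0942\u0928\u093F\u0915\u094B\u0921";
const dropped = truncateWholeClusters(sample, 5, "drop"); // यूनि   (4 code units)
const kept = truncateWholeClusters(sample, 5, "keep");    // यूनिको (6 code units)
console.log(dropped.length, kept.length); // 4 6
```

Note that the "keep" policy produces a value longer than the limit, which sits awkwardly with the existing sentence about values whose length is greater than the maximum allowed value length, while "drop" can leave the field shorter than the limit.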


5 participants