"character" is not defined #73

r12a · 2017-09-15T15:46:59Z

[from Addison Phillips]

https://w3c.github.io/input-events/#interface-InputEvent-Attributes

In section 5.1.2 there are multiple places where the term "character" is used without definition. It would be better to clearly define this to mean a Unicode code point.

johanneswilm · 2018-06-24T21:28:09Z

@r12a @aphillips sorry for the late reply, somehow I had missed this.

I am OK with defining the term "character". But I cannot find any suitable definition of the term in the W3C repositories that does not use the word "character" to explain what a character is, and we clearly cannot link to such a definition, because it would be circular. Wikipedia's definition of code point is even broader: "Many code points represent single characters but they can also have other meanings, such as for formatting." [1]

[1] https://en.wikipedia.org/wiki/Code_point
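To illustrate the point in the Wikipedia quote, here is a short sketch using Python's standard `unicodedata` module. Each sample is exactly one code point, but the Unicode General_Category shows that only the first is a letter on its own; the others are a combining mark and a format control. The sample choice is just for illustration.

```python
import unicodedata

# Each string below is exactly one Unicode code point.
samples = ["a", "\u0300", "\u200d"]

for cp in samples:
    # General_Category: "Ll" = lowercase letter, "Mn" = nonspacing (combining) mark,
    # "Cf" = format control, such as ZERO WIDTH JOINER.
    print(f"U+{ord(cp):04X}", unicodedata.category(cp), unicodedata.name(cp))
```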

xfq · 2018-06-25T04:12:15Z

FWIW, in Infra:

A code point is a Unicode code point and is represented as a four-to-six digit hexadecimal number, typically prefixed with "U+".
[...]
Code points are sometimes referred to as characters and in certain contexts are prefixed with "0x" rather than "U+".
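The Infra "U+" notation can be reproduced with a format string. This is a minimal Python sketch; the helper name `to_u_plus` is hypothetical. Padding to four hex digits matches the usual convention, and code points above U+FFFF naturally take five or six digits, up to U+10FFFF.

```python
def to_u_plus(cp: str) -> str:
    """Format a single code point in "U+" notation (hypothetical helper)."""
    if len(cp) != 1:
        raise ValueError("expected exactly one code point")
    # At least four uppercase hex digits; larger values simply use more digits.
    return f"U+{ord(cp):04X}"

print(to_u_plus("a"))           # U+0061
print(to_u_plus("\u00e0"))      # U+00E0
print(to_u_plus("\U0001F600"))  # U+1F600
print(to_u_plus("\U0010FFFF"))  # U+10FFFF
```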

johanneswilm · 2018-11-01T08:15:14Z

Based on the meeting at TPAC, we are waiting for a suggestion from @r12a on how to adjust the explanatory note text.

aphillips · 2021-07-13T13:31:19Z

Updating this issue as part of I18N's regular clean-up cycle. There is now a definition in the spec:

https://w3c.github.io/input-events/#definitions

This defines "character" as:

A character is an extended grapheme cluster. [UAX29]

I'm not sure that this is what is intended, given that some input events (backward deletion, certain cursoring operations) may operate on a code point basis. This needs a read-through to determine. In addition, it looks like we owe some text based on a meeting at TPAC. I'll set our tracking issue to "needs attention" and add it to our action list.
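To make the difference between the two units concrete, here is a simplified Python sketch that counts code points and user-perceived characters in a decomposed à. It assumes a cluster is one base code point followed by any combining marks (General_Category M*); this is only an approximation of UAX #29 extended grapheme clusters, which also cover Hangul jamo, emoji ZWJ sequences, regional indicators, CR LF and more. The helper `approx_clusters` is hypothetical; real code should use a full UAX #29 segmenter.

```python
import unicodedata

def approx_clusters(text: str) -> list[str]:
    """Split text into user-perceived characters (simplified, not full UAX #29).

    Assumption: a cluster is a base code point plus any following marks.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # attach the mark to the current cluster
        else:
            clusters.append(ch)  # any other code point starts a new cluster
    return clusters

decomposed = "a\u0300"  # U+0061 U+0300, renders as à
print(len(decomposed), len(approx_clusters(decomposed)))  # 2 1
```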

johanneswilm · 2021-07-13T14:24:12Z

@aphillips See also previous discussion here: #71 (comment).

aphillips · 2021-07-13T14:27:05Z

@johanneswilm I didn't really re-read Input Events this morning when making comments, and relying on memory can be tricky. Cursoring/selection changes are something I know we've talked about somewhere, but perhaps not in Input Events :-)

For backward deletion without an IME, yes: generally speaking, backward deletion works on a code point basis. Try a sequence like U+0061 U+0300 (à). Even simple editors like Notepad will delete the accent separately from the base letter when using Backspace (even though you cannot select them separately). This is, of course, only true for decomposed input: the precomposed U+00E0 (à) deletes as a single code point.

Languages that rely on combining marks, such as those written in Indic scripts, depend on this behavior so that users can correct typos. Of course, some of these also use IMEs.
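A minimal Python sketch of the Backspace behavior described above, assuming Backspace simply removes the last code point (the helper `backspace_code_point` is hypothetical, not an editor or browser API). It also uses the standard `unicodedata.normalize` to show how NFC and NFD relate the decomposed and precomposed forms of à, plus a Devanagari consonant with a dependent vowel sign.

```python
import unicodedata

def backspace_code_point(text: str) -> str:
    """Delete the last code point (hypothetical model of a simple editor)."""
    return text[:-1]

decomposed = "a\u0300"   # U+0061 U+0300, renders as à
precomposed = "\u00e0"   # U+00E0, also renders as à
ki = "\u0915\u093f"      # DEVANAGARI LETTER KA + VOWEL SIGN I

print(repr(backspace_code_point(decomposed)))   # 'a' (only the accent is removed)
print(repr(backspace_code_point(precomposed)))  # '' (the whole letter is removed)
print(backspace_code_point(ki) == "\u0915")     # True (only the vowel sign is removed)

# NFC composes the pair into U+00E0; NFD splits U+00E0 back into two code points.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```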

johanneswilm · 2021-07-13T14:33:22Z

@aphillips You are right, but after rereading that discussion, I believe we were aware of this difference at the time we included the definition. We only use the definition of "character" for the "insertTranspose" input type, in which case it really is swapping two characters, and it is never done on a code point basis.

But I might be wrong. At any rate, I think the last we officially heard was that we would receive a PR from @r12a so if we can get that now, that would be preferable.

aphillips · 2021-07-13T15:01:36Z

@johanneswilm I'm working on getting that PR (or at least evaluating if more work is needed) from I18N (probably @r12a or I), but I think it'll probably be at least a few days while we remind ourselves of where we left this. Transposition of characters should be done on a grapheme cluster basis for sure. Stay tuned.
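Why transposition needs grapheme clusters can be shown with a small Python sketch. It assumes a transpose command swaps the last two user-perceived characters of the text (a common editor behavior; the exact caret semantics of insertTranspose are not modeled here), and it reuses the same simplified base-plus-marks segmentation, which is an approximation rather than full UAX #29. All helper names are hypothetical.

```python
import unicodedata

def approx_clusters(text: str) -> list[str]:
    # Simplified segmentation (assumption): a base code point plus any
    # following marks. Not a full UAX #29 implementation.
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def transpose_last_two_code_points(text: str) -> str:
    """Swap the last two code points (the wrong unit for transposition)."""
    if len(text) < 2:
        return text
    return text[:-2] + text[-1] + text[-2]

def transpose_last_two_clusters(text: str) -> str:
    """Swap the last two approximate grapheme clusters."""
    clusters = approx_clusters(text)
    if len(clusters) < 2:
        return text
    clusters[-2], clusters[-1] = clusters[-1], clusters[-2]
    return "".join(clusters)

typed = "ba\u0300"  # "bà" typed with a decomposed à, where "àb" was meant
# Code point swap gives "b\u0300a": the accent ends up on "b".
print(transpose_last_two_code_points(typed) == "b\u0300a")  # True
# Cluster swap gives "a\u0300b", i.e. "àb" as intended.
print(transpose_last_two_clusters(typed) == "a\u0300b")     # True
```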

aphillips · 2022-03-07T23:55:47Z

Reviewing this today (2022-03-07), it appears we didn't put in a PR. I have reviewed the current WD: @johanneswilm's description is correct. The term character is used only once in the document, in the description of the insertTranspose input type.

The I18N WG is admittedly pedantic about character encoding jargon. In this case, the meaning of "character" is intended to be a "user-perceived character", aka a grapheme or grapheme cluster. I would suggest:

Remove the definition of character from the Terminology section, since the term is used only once in the entire document. This will prevent future revisions from accidentally using the term in a different way.
Replace the term 'character' in insertTranspose with the term grapheme, linking to its definition in [I18N-GLOSSARY]. (We created the I18N glossary after the last comments on this thread, and it can be referenced via Specref.)

Would you prefer a PR for this?

link to grapheme definition in i18n-glossary, fixes #73

r12a added the i18n-needs-resolution label Sep 15, 2017

aphillips mentioned this issue Sep 15, 2017

"character" is not defined w3c/i18n-activity#210

Closed

johanneswilm added a commit that referenced this issue Jun 25, 2018

improve wording on unicode terminology, relates to #71 #73

8e8aaf5

johanneswilm added a commit that referenced this issue Sep 13, 2023

Update links to i18n-glossary, fixes #73

0d026c9

siusin closed this as completed in 58ec6b3 Sep 14, 2023

siusin added a commit that referenced this issue Sep 14, 2023

Merge pull request #139 from w3c/grapheme-definition

383151e

link to grapheme definition in i18n-glossary, fixes #73
