Handwriting Recognition API #591

Closed
1 task done
wacky6 opened this issue Dec 17, 2020 · 13 comments
Labels
Missing: Multi-stakeholder support Lack of multi-stakeholder support Progress: in progress Progress: propose closing we think it should be closed but are waiting on some feedback or consensus Review type: CG early review An early review of general direction from a Community Group Topic: Input Topic: native platform integration Features that enable web sites to integrate better with native platforms Venue: WICG

Comments

@wacky6

wacky6 commented Dec 17, 2020

HIQaH! QaH! TAG!

I'm requesting a TAG review of Handwriting Recognition API.

Handwriting is a widely used input method. One key use case is recognizing text as users draw it. This feature already exists on many operating systems (e.g. handwriting input methods). However, the web platform doesn't have this capability today, so developers need to integrate third-party libraries (or cloud services) or develop native apps.

We want to add handwriting recognition capability to the web platform, so developers can use the existing handwriting recognition features available on the operating system.

Further details:

  • I have reviewed the TAG's API Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done: unknown
  • Existing major pieces of multi-stakeholder review or discussion of this design: GitHub issues
  • Major unresolved issues with or opposition to this design:
    • Complex script text segmentation: issue 6
  • This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

Thanks.

@wacky6 wacky6 added Progress: untriaged Review type: CG early review An early review of general direction from a Community Group labels Dec 17, 2020
@atanassov atanassov self-assigned this Jan 6, 2021
@torgo torgo added this to the 2021-01-11-week milestone Jan 6, 2021
@cynthia cynthia self-assigned this Jan 13, 2021
@cynthia cynthia added Progress: in progress Topic: Input Topic: native platform integration Features that enable web sites to integrate better with native platforms labels Jan 26, 2021
@cynthia
Member

cynthia commented Jan 26, 2021

We briefly looked at this during our F2F today, and had a couple early review questions:

  1. How does this work when it comes to writing vertically or right-to-left?
  2. Same question, but in the context of code-switching? (e.g. English word in between Arabic text?)
  3. Does it make sense for this to be stuck directly on to navigator? Could you let us know why it is there?

@r12a do you have any input on this?

@cynthia
Member

cynthia commented Jan 26, 2021

What's the metric of the cartesians in the explainer? Are they physical pixels, logical pixels, or something else?

@r12a

r12a commented Jan 26, 2021

@cynthia WICG/handwriting-recognition#4

I also made a bunch of other i18n-related comments (see the issue list).

@wacky6
Author

wacky6 commented Jan 28, 2021

Vertical writing.

Here I assume you mean a language that can be written both horizontally and vertically.

Google's recognizer generally returns characters in the order they were written (for the above type of languages). So it works in both writing directions (e.g. rtl, ltr, top-bottom). Our metrics show that vertical writing isn't commonly used by our users, so this feature hasn't received much attention recently.

We aren't sure how other recognizers work. Some may only work with one direction (and not work at all for vertical writing). Some may ignore the character writing order.

WDYT about having a hint about writing direction, in case some recognizers need this information? Note that some recognizers may disregard this hint altogether.

RTL writing

For RTL languages, the recognizer already knows it should process text from right to left.

For LTR languages, but characters written from right to left (e.g. "hello" written in "olleh" order). It's a rare/uncommon scenario. I'm not sure what the correct interpretation is. The user perhaps wants the text to be interpreted as "hello", but it's really up to the recognizer to decide what it will output. Either output can be considered valid IMO.


Mixed scripts.

The recognizer could determine the writing direction by looking at the time each character was written and at the characters' spatial relations. The same applies to code-switching.

For example,

  • Unidirectional text "ABC". The writing direction can be learned by looking at the order of each character (A->B->C or C->B->A).
  • Mixed: "AB cba CD" (upper-case and lower-case stand for two different scripts). The writing order could be "A->B->C->D->a->b" or "A->B->C->D->b->a".
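The unidirectional case in the list above can be sketched as follows. This is only an illustration of inferring direction from writing order and position, not how any particular recognizer works. The `TimedChar` shape and the `inferDirection` function are hypothetical names invented for this example and are not part of the proposed API.

```typescript
// Hypothetical shape (an assumption for this sketch): one recognized
// character with its horizontal centre and the time it was started.
interface TimedChar {
  label: string;
  x: number; // horizontal centre, in any consistent unit
  t: number; // start time in milliseconds
}

type Direction = "ltr" | "rtl" | "unknown";

// Walk the characters in the order they were written and count
// rightward versus leftward steps between consecutive characters.
function inferDirection(chars: TimedChar[]): Direction {
  const byTime = [...chars].sort((a, b) => a.t - b.t);
  let rightward = 0;
  let leftward = 0;
  for (let i = 1; i < byTime.length; i++) {
    const dx = byTime[i].x - byTime[i - 1].x;
    if (dx > 0) rightward++;
    else if (dx < 0) leftward++;
  }
  if (rightward > leftward) return "ltr";
  if (leftward > rightward) return "rtl";
  return "unknown";
}

// "ABC" written A->B->C, left to right.
const abc: TimedChar[] = [
  { label: "A", x: 10, t: 0 },
  { label: "B", x: 30, t: 100 },
  { label: "C", x: 50, t: 200 },
];
// The same three positions, written C->B->A, right to left.
const cba: TimedChar[] = [
  { label: "C", x: 50, t: 0 },
  { label: "B", x: 30, t: 100 },
  { label: "A", x: 10, t: 200 },
];

console.log(inferDirection(abc)); // "ltr"
console.log(inferDirection(cba)); // "rtl"
```

A majority vote like this says nothing useful about the mixed case, where the direction changes partway through the text.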

This being said, existing recognizers (those available on the market) don't support mixed scripts (e.g. English + Arabic). They will recognize text as if the text is written in a single script (e.g. recognize Arabic characters as English characters, and give less-ideal results).

I don't think we should try to solve the mixed script problem if the underlying implementations haven't solved it. Our solution may not work for them. Or, if the implementation is advanced, it doesn't care whether we provide this information / hint.


Why navigator object

We chose navigator because we prefer it over the alternatives (e.g. window, global constructor):

We expect the handwriting recognizer to interact with platform-specific APIs and to support different features on different platforms. Given these feature differences, navigator seems the natural place.

We don't have particular preferences on where the methods are. Are you suggesting we put the methods behind an attribute (e.g. navigator.handwritingService.doSomething())?


What's the metric of the cartesians in the explainer

The explainer examples use logical pixels.

The recognizer doesn't particularly care about the measurement unit, as long as all provided coordinates are measured in the same way (i.e. don't mix logical pixels and device pixels).

The recognizer implementation normalizes the coordinates and performs recognition in relative terms (e.g. relative to the smallest character / block in the drawing).
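A minimal sketch of why the unit doesn't matter, as long as it is consistent. Note the assumption: this example normalizes against the bounding box of the whole drawing, which is a simpler stand-in for the "smallest character / block" approach described above. `Point` and `normalizeDrawing` are hypothetical names for this illustration only.

```typescript
// Hypothetical point type for this sketch; not the proposed API.
interface Point {
  x: number;
  y: number;
}

// Translate and uniformly scale a drawing so that its bounding box
// fits inside the unit square. Aspect ratio is preserved.
function normalizeDrawing(strokes: Point[][]): Point[][] {
  const all: Point[] = [];
  for (const stroke of strokes) all.push(...stroke);
  if (all.length === 0) return strokes.map(() => []);

  const xs = all.map((p) => p.x);
  const ys = all.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  // Fall back to 1 so a single dot does not divide by zero.
  const extent =
    Math.max(Math.max(...xs) - minX, Math.max(...ys) - minY) || 1;

  return strokes.map((stroke) =>
    stroke.map((p) => ({
      x: (p.x - minX) / extent,
      y: (p.y - minY) / extent,
    }))
  );
}

// The same stroke measured in logical pixels and in device pixels
// (assuming a device pixel ratio of 2).
const logical: Point[][] = [[{ x: 10, y: 20 }, { x: 30, y: 60 }]];
const device: Point[][] = [[{ x: 20, y: 40 }, { x: 60, y: 120 }]];

// Both lines print [[{"x":0,"y":0},{"x":0.5,"y":1}]]
console.log(JSON.stringify(normalizeDrawing(logical)));
console.log(JSON.stringify(normalizeDrawing(device)));
```

Mixing the two units inside one drawing would break this, because a single scale factor would no longer apply to every point.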

@cynthia
Member

cynthia commented May 11, 2021

@wacky6, thank you for your patience! @atanassov and I looked at this during our F2F. Your response covers most of the questions we had - thanks a lot.

WDYT about having a hint about writing direction, in case some recognizers need this information? Note that some recognizers may disregard this hint altogether.

I think having that as an extension point would be useful. If there is some specific sort of post-processing that needs to be done based on this before it hits the recognizer, it feels like this information could be useful to expose.

For LTR languages, but characters written from right to left (e.g. "hello" written in "olleh" order). It's a rare/uncommon scenario.

We agree that this isn't an important scenario to handle. Our concern about RTL was mostly about languages that are actually written right to left.

I don't think we should try to solve the mixed script problem if the underlying implementations haven't solved it.

If it's an unsolved problem, I think we don't need to delve too much into this.

We don't have particular preferences on where the methods are. Are you suggesting we put the methods behind an attribute (e.g. navigator.handwritingService.doSomething())?

Yes, this was one of the reasons we asked this question.

We were also a bit curious about three different tabs initiating multiple recognition contexts: is anything shared? (This question is based on the navigator layering.)

(More comments based on the discussion with @atanassov to come in a bit.)

@atanassov

During our May 2021 vf2f, @cynthia and I did another pass at this review. Thank you for all of the answers.

Regarding adding a direction hint to the recognizer - we found that to be a useful futureproofing feature and recommend that you do.

After going over the privacy & security questionnaire I am still not clear if the API exposes additional fingerprinting capabilities. With exposure of strokes, ordering of strokes and timing of strokes, I worry that models can be trained to easily recognize patterns for various disabilities. This will be a very unfortunate byproduct of this API. Is this something that you considered and could expand on?

@wacky6
Author

wacky6 commented May 13, 2021

Hi, @cynthia

Writing direction
We'll add a direction hint to indicate the expected parsing / reading direction.

This lets a recognizer for "en" and "ar" differentiate the following two outputs for the text "نشاط التدويل، W3C":

  • JS String: W3C, [Arabic characters]
  • JS String: [Arabic characters], W3C

Could you confirm this addresses your concern?
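As a rough illustration of how such a hint could select between the two outputs listed above, assume the recognizer has already split the drawing into runs and knows each run's horizontal position. `DirectionHint`, `RecognizedRun` and `joinRuns` are hypothetical names for this sketch. The actual shape of the hint is not specified in this thread.

```typescript
// Hypothetical types for this sketch; the real hint shape is undecided.
type DirectionHint = "ltr" | "rtl";

interface RecognizedRun {
  text: string;
  left: number; // left edge of the run, in any consistent unit
}

// Order runs into a logical (in-memory) string: leftmost run first for
// "ltr", rightmost run first for "rtl".
function joinRuns(runs: RecognizedRun[], hint: DirectionHint): string {
  const ordered = [...runs].sort((a, b) =>
    hint === "ltr" ? a.left - b.left : b.left - a.left
  );
  return ordered.map((r) => r.text).join(", ");
}

// "W3C" is drawn on the left, the Arabic run on the right.
const runs: RecognizedRun[] = [
  { text: "W3C", left: 0 },
  { text: "[Arabic characters]", left: 120 },
];

console.log(joinRuns(runs, "ltr")); // "W3C, [Arabic characters]"
console.log(joinRuns(runs, "rtl")); // "[Arabic characters], W3C"
```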

Navigator
Recognition contexts are isolated for each recognizer. Different tabs don't share the recognition context. But they may use the same recognition service on the OS (e.g. process).

Are there any documents on navigator layering?


Hi, @atanassov

The handwriting process looks like this:
User input --(1)--> Stroke Data --(2)--> Text

Websites can already collect handwriting and analyze it. All they need is some user input (step 1) and some analysis code. For example, they can ask the user to draw on a canvas, use PointerEvent to collect the drawing, then send everything to a server for analysis (they don't have to use our API).

Our API is at step 2. It converts stroke data (represented with our proposed HandwritingStroke and HandwritingDrawing) to some text.

Websites can already analyze handwriting in JavaScript. Our API makes this easier (call a method instead of supplying a bunch of JavaScript code) and more efficient (run in native code / accelerators). In short, our API isn't introducing anything that the Web can't already do.
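Step 1 can be sketched in a few lines, which shows how little a site needs in order to collect stroke data without this API. `StrokePoint` and `StrokeCollector` are hypothetical names for this illustration. They are not the proposed HandwritingStroke or HandwritingDrawing interfaces, and the wiring to a canvas is shown only as comments so the sketch also runs outside a browser.

```typescript
// Hypothetical types for this sketch; not the proposed API.
interface StrokePoint {
  x: number;
  y: number;
  t: number; // timestamp in milliseconds
}

// Groups pointer samples into strokes: one stroke per down...up sequence.
class StrokeCollector {
  private strokes: StrokePoint[][] = [];
  private current: StrokePoint[] | null = null;

  pointerDown(x: number, y: number, t: number): void {
    this.current = [{ x, y, t }];
  }

  pointerMove(x: number, y: number, t: number): void {
    // Ignore hover movement when no stroke is in progress.
    if (this.current) this.current.push({ x, y, t });
  }

  pointerUp(): void {
    if (this.current) {
      this.strokes.push(this.current);
      this.current = null;
    }
  }

  // Return a copy so callers cannot mutate the collected data.
  getStrokes(): StrokePoint[][] {
    return this.strokes.map((s) => s.map((p) => ({ ...p })));
  }
}

// In a page, this would be wired to PointerEvent roughly like so:
//   canvas.addEventListener("pointerdown", (e) =>
//     collector.pointerDown(e.offsetX, e.offsetY, e.timeStamp));
//   canvas.addEventListener("pointermove", (e) =>
//     collector.pointerMove(e.offsetX, e.offsetY, e.timeStamp));
//   canvas.addEventListener("pointerup", () => collector.pointerUp());

// Simulated input: a two-point stroke followed by a single dot.
const collector = new StrokeCollector();
collector.pointerMove(1, 1, 0); // ignored, no stroke in progress
collector.pointerDown(0, 0, 10);
collector.pointerMove(5, 5, 20);
collector.pointerUp();
collector.pointerDown(9, 9, 50);
collector.pointerUp();

console.log(collector.getStrokes().length); // 2
```

The collected arrays can then be serialized and sent to a server, or passed to analysis code running in the page.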

@cynthia
Member

cynthia commented Aug 31, 2021

Thank you for your feedback. We've discussed this in a breakout and concluded that this proposal is good to move forward - we'll discuss further in the plenary and close if everyone agrees. Thanks for bringing this to our attention.

As for the navigator layering, we don't have any formal recommendations - we'll discuss this in the plenary and provide feedback afterwards.

@r12a

r12a commented Aug 31, 2021

This being said, existing recognizers (those available on the market) don't support mixed scripts (e.g. English + Arabic). They will recognize text as if the text is written in a single script (e.g. recognize Arabic characters as English characters, and give less-ideal results).

I don't think we should try to solve the mixed script problem if the underlying implementations haven't solved it. Our solution may not work for them. Or, if the implementation is advanced, it doesn't care whether we provide this information / hint.

Sure, text written in English rarely has Arabic text in it, but that's not true at all the other way around. Text written in Arabic and all the other languages that use RTL scripts will contain LTR Latin script text on a regular basis. Not only that, but they will also contain numbers, and those are written LTR within the RTL flow. Same goes for expressions, numeric ranges, etc. for some languages. For example, in Hebrew you'll write "Score: 82" as

[Image: Screenshot 2021-08-31 at 16 27 12]

I don't think you'd want the text stored in memory to become "Score: 28".

Or how about: "No parking: 08:00 - 20:00". Will the text stored in memory indicate that you can't park during the day, or overnight – it depends on the direction in which the range is read, and that will depend on the rules of the language being used.

Note that WICG/handwriting-recognition#4 already raises some of these issues, but as yet has no response.

Sorry, but I don't buy that you don't have to consider how this would work if implementations don't currently enable handwriting recognition properly for large percentages of the people on the planet. Our mission is to make the World Wide Web accessible worldwide. I think some thought has to be given to how to address the needs of the currently underserved millions of potential users.

@r12a

r12a commented Aug 31, 2021

Of course, if the recogniser recognises strokes and stores characters in the order they are written, then that may provide a solution, because someone writing "Score: 82" in Hebrew will write the 8 before the 2 (leaving a gap for it to fit). If the conversion of strokes to characters takes place after an input is completed, however, then mixed direction text will require parsing for direction changes.

Note, however, that in the former case, where strokes are converted on-the-fly, it's not straightforward either, since Arabic and Hebrew graphemes tend to be only half-written during the initial pass, and those graphemes are completed after the word is completed (eg. the top bar for scripts such as Devanagari).
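The "Score: 82" case can be made concrete with a small sketch. It compares storing characters in the order they were written against reading them back purely by position from right to left. The `WrittenChar` shape, the placeholder labels "H1" and "H2" (standing in for Hebrew letters) and the coordinates are all assumptions made up for this illustration.

```typescript
// Hypothetical shape for this sketch: a recognized character with the
// time it was written and its horizontal position.
interface WrittenChar {
  label: string;
  x: number;
  t: number;
}

// The Hebrew word is written right to left, then a gap is left and the
// digits are written left to right: first "8", then "2" to its right.
const written: WrittenChar[] = [
  { label: "H1", x: 100, t: 0 },
  { label: "H2", x: 80, t: 1 },
  { label: ":", x: 60, t: 2 },
  { label: "8", x: 20, t: 3 },
  { label: "2", x: 40, t: 4 },
];

// Logical order taken from writing time.
function byWritingOrder(chars: WrittenChar[]): string {
  return [...chars]
    .sort((a, b) => a.t - b.t)
    .map((c) => c.label)
    .join(" ");
}

// Logical order taken from position alone, reading right to left.
function byRightToLeftPosition(chars: WrittenChar[]): string {
  return [...chars]
    .sort((a, b) => b.x - a.x)
    .map((c) => c.label)
    .join(" ");
}

console.log(byWritingOrder(written)); // "H1 H2 : 8 2"
console.log(byRightToLeftPosition(written)); // "H1 H2 : 2 8"
```

The second result is the "Score: 28" outcome. The sketch also assumes each character is finished before the next one starts, which, as noted above, often isn't true.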

@torgo
Member

torgo commented Sep 16, 2021

Thanks for the very comprehensive privacy & security section in the explainer. We're basically fine with the design. Since this relies on the presence of a handwriting recognizer software component, it raises some concerns about implementability, especially across lower-spec devices and in open source efforts. There also seems to be an issue regarding multi-stakeholder support, as there's no documented support from other browser engines on Chrome Status. Can you provide any feedback there? What is the trajectory for this spec after incubation in WICG? Where do you see this going?

@tomayac

tomayac commented Sep 16, 2021

WebKit (https://lists.webkit.org/pipermail/webkit-dev/2021-March/031762.html) and Mozilla (mozilla/standards-positions#507) have been asked for their opinions, but without a response so far.

@atanassov atanassov added the Progress: propose closing we think it should be closed but are waiting on some feedback or consensus label Oct 26, 2021
@cynthia
Member

cynthia commented Dec 7, 2021

We think the feedback @r12a wrote above is important, but it is beyond the scope of this review and ideally should be discussed on the group's repository. As noted earlier, we are happy to see this move forward. Thank you for bringing this to our attention.

(And please ping other stakeholders again when you have time!)

@cynthia cynthia closed this as completed Dec 7, 2021

7 participants