Need hook for when a browsing context gains or loses focus #2711

Open
tobie opened this issue May 26, 2017 · 13 comments

@tobie
Contributor

tobie commented May 26, 2017

This is useful for the Generic Sensor API which needs to stop providing new sensor readings to a browsing context which has lost focus in order to prevent skimming attacks (e.g. inferring a password entered in a different browsing context from device movements captured by a gyroscope).

Reacting to lost focus might already be possible right now, but it wasn't immediately obvious how. Apologies if I missed something blatant.

My understanding is WebBluetooth and NFC have similar requirements. Pinging @jyasskin and @kenchris respectively.

@tobie tobie changed the title Need hook for when a browsing context gains or looses focus Need hook for when a browsing context gains or loses focus May 26, 2017
@jyasskin
Member

NFC tries to use this in https://w3c.github.io/web-nfc/#handling-window-visibility-and-focus.
WebAuthn has an issue for it in w3c/webauthn#316, which @jcjones is handling.

@domenic
Member

domenic commented Sep 6, 2017

We'd very much appreciate a pull request or a sketch of the spec modifications to be done here that would work for your use cases.

@jcjones
Contributor

jcjones commented Nov 15, 2017

Could we make the hook using language from the pointerlock spec's methods? They write:

Pointer lock must not succeed unless the target is in the active document of a browsing context which is (or has an ancestor browsing context which is) in focus by a window which is in focus by the operating system's window manager. The target element and its browsing context need not be in focus.

This could be used to define something beyond just active document to capture that it's the document being actively manipulated by the user.

That'd be the preferred situation for Web Authentication, I think - we don't want an active document in a background window to start an authentication session.

@jcjones
Contributor

jcjones commented Nov 15, 2017

Note @mikewest - the above might be a useful distinction for CredMan, too.

@mikewest
Member

This sounds like something credential management could indeed use.

@tobie
Contributor Author

tobie commented Nov 15, 2017

I'm no longer editing the Generic Sensor spec, so I'm not sure what the current requirements are.

@jcjones
Contributor

jcjones commented Nov 16, 2017

Taking a stab at @domenic's request:

We'd very much appreciate a pull request or a sketch of the spec modifications to be done here that would work for your use cases.

I'm working mostly from Page Visibility's visibility states...

====
Add to Document a readonly attribute foregroundState which is an enum ForegroundState:

enum ForegroundState {
    "foreground",
    "background"
};

To Document also add an EventHandler onforegroundstatechange that is an event handler for foregroundState:

partial interface Document {
    readonly attribute ForegroundState foregroundState;
             attribute EventHandler    onforegroundstatechange;
};

Upon getting foregroundState, we'd run the algorithm from Pointerlock to determine if the window manager has this window in focus.

====

WebAuthn would then check foregroundState on the way into its methods, failing if not "foreground", and would register for onforegroundstatechange and cancel if it changes to not be "foreground" during the execution of our parallel algorithms.
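
The check-on-entry and cancel-on-change behaviour described above could be sketched roughly as follows. This is only an illustration: `foregroundState` is the attribute proposed in this comment and is not a shipped API, and `MockForegroundDocument`, `SketchAuthSession`, and their methods are hypothetical names standing in for `Document` and a WebAuthn operation.

```typescript
// Hypothetical sketch of the proposal in this comment; none of these names
// exist in a real browser. MockForegroundDocument stands in for Document.
type ForegroundState = "foreground" | "background";
type SessionStatus = "pending" | "rejected" | "cancelled" | "completed";

class MockForegroundDocument {
  private state: ForegroundState;
  private listeners: Array<() => void> = [];

  constructor(initial: ForegroundState) {
    this.state = initial;
  }

  // Mirrors the proposed readonly attribute.
  get foregroundState(): ForegroundState {
    return this.state;
  }

  // Mirrors the proposed onforegroundstatechange handler.
  addForegroundStateListener(listener: () => void): void {
    this.listeners.push(listener);
  }

  // Simulates the user agent moving the document between states.
  setForegroundState(next: ForegroundState): void {
    if (next === this.state) return;
    this.state = next;
    for (const listener of [...this.listeners]) listener();
  }
}

class SketchAuthSession {
  private status: SessionStatus;

  constructor(doc: MockForegroundDocument) {
    // Fail on the way in if the document is not in the foreground.
    this.status = doc.foregroundState === "foreground" ? "pending" : "rejected";
    if (this.status === "rejected") return;

    // Cancel if the document leaves the foreground while still pending.
    doc.addForegroundStateListener(() => {
      if (this.status === "pending" && doc.foregroundState !== "foreground") {
        this.status = "cancelled";
      }
    });
  }

  complete(): void {
    if (this.status === "pending") this.status = "completed";
  }

  getStatus(): SessionStatus {
    return this.status;
  }
}
```

A session started against a background document ends up `"rejected"`, and a pending session becomes `"cancelled"` as soon as the mock document is moved to `"background"`.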

@domenic
Member

domenic commented Nov 17, 2017

Hmm, I thought the request was about a specification-level hook, not a public API that would require implementers to start exposing new stuff to JavaScript?

@jcjones
Contributor

jcjones commented Nov 17, 2017

That is true, I'm just not well versed on how to do that. Would it just be declaring a definition?

@jcjones
Contributor

jcjones commented Nov 17, 2017

Something like:

====
6.4.X Determining if the Document is in the Foreground of the Window Manager

To determine if a Document is in the Foreground of the Window Manager, run these steps:

{{ The algorithm from Pointerlock }}

====

In this case, Web Authentication would probably have some language like, "Monitor whether the Document is in the Foreground of the Window Manager and reject the Promise .... if not". Would that ... work?

@domenic
Member

domenic commented Nov 17, 2017

Thanks for putting in the effort, I think I can start to help from this. A couple issues with what you've got so far:

First, it would help if you outlined what part of what you linked to you were thinking of including. It's a lot of text, and I don't see a real algorithm in there. Is it just the sentence "the active document of a browsing context which is (or has an ancestor browsing context which is) in focus by a window which is in focus by the operating system's window manager."? That seems not great, given how it relies on undefined concepts like window manager. Also, it implies that background tabs are focused (since e.g. my Firefox window currently has focus from the OS's window manager, despite only one of 20 tabs being actually focused). I think we instead want to say something about how user agents can define the concept of the currently-focused top-level browsing context, and then give some explanation about how this ties into tabbed interfaces, window managers, popup windows, etc.

Second, we can't just use magic like "monitor whether something is true". Think about how you'd write this in software. You'd need to find the point at which something becomes true or false, and then invoke a function. That's the kind of hook we're talking about here. So we'd need to create steps like "When the user agent changes its choice of currently-focused top-level browsing context, run the following steps..." where those steps loop over all documents (browsing contexts? Windows? Which is more useful for your use cases?) that got un-focused and all documents that got newly-focused and run some hook. We'll need a good name for that hook that ties into the above concept naming.
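
To make the "find the point where it changes, then invoke a function" idea concrete, here is a minimal sketch in TypeScript. Everything in it is an assumption for illustration: `SketchContext`, `FocusTracker`, and `setFocusedTopLevel` are hypothetical names, not spec terms, and the sketch simply treats every nested browsing context as following its top-level one, which the eventual spec text might well refine.

```typescript
// Hypothetical model of a focus-change hook; names are illustrative only.
interface SketchContext {
  name: string;
  children: SketchContext[];
}

type FocusHook = (context: SketchContext, focused: boolean) => void;

// Collect a top-level browsing context together with all nested contexts.
function inclusiveDescendants(root: SketchContext): SketchContext[] {
  const out: SketchContext[] = [root];
  for (const child of root.children) {
    out.push(...inclusiveDescendants(child));
  }
  return out;
}

class FocusTracker {
  private current: SketchContext | null = null;
  private hooks: FocusHook[] = [];

  // Other specifications would register their steps here.
  registerHook(hook: FocusHook): void {
    this.hooks.push(hook);
  }

  // "When the user agent changes its choice of currently-focused
  // top-level browsing context, run the following steps..."
  setFocusedTopLevel(next: SketchContext | null): void {
    if (next === this.current) return; // no change, so no hooks run
    const previous = this.current;
    this.current = next;

    // First everything that just lost focus, then everything that gained it.
    if (previous !== null) {
      for (const context of inclusiveDescendants(previous)) {
        this.runHooks(context, false);
      }
    }
    if (next !== null) {
      for (const context of inclusiveDescendants(next)) {
        this.runHooks(context, true);
      }
    }
  }

  private runHooks(context: SketchContext, focused: boolean): void {
    for (const hook of this.hooks) hook(context, focused);
  }
}
```

The point of the shape is that callers never poll: a specification such as WebAuthn would register its steps once and be invoked only at the moment the user agent's choice changes.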

BTW I'm currently thinking of this as a new section underneath https://html.spec.whatwg.org/multipage/interaction.html#focus, probably at the bottom, defining and describing this new concept of "Top-level browsing context focus". Although I'm wondering if maybe we shouldn't overload the word "focus" and instead use some new word like "choice" or "foreground" or something.

@jcjones
Contributor

jcjones commented Nov 17, 2017

That all makes sense to me; spec-fu is still a foreign language to me, and I'm not well-versed on how other similar concepts work in HTML, so thanks!

So we'd probably want to iterate over all browsing contexts that change state and run a hook, I think? I'm obviously very green at this, but it seems more ambiguous to do this to all windows and then have to filter down to the actual documents.

Re: naming, while avoiding 'window manager' .... I don't have anything right now, but I'll ponder on it.

Thanks for dealing with my flailing with good humor, Domenic!

@tobie
Contributor Author

tobie commented Nov 17, 2017

Giving a bit more context around our generic-sensor use case. As mentioned, we want to be able to stop the sensor from reporting data as soon as the user is entering data on anything other than the web page of the active sensor. Again, this is because sensors like gyroscopes can be used relatively effectively to steal data entered elsewhere (e.g. passwords, credit card numbers, etc.). See for example this 2011 article on this topic. In practice, this means nested iframes (e.g. when using an embedded third party to carry out payment), other tabs or windows (again when using third party payment solutions), browser chrome or browser extensions (e.g. when relying on the user agent's password manager or a third party password manager), or other applications altogether.

It's worth noting that there's currently no consistency in how browsers report pages losing focus to other applications. That is, on some browsers the web page is considered to no longer have focus, while on other browsers this is not the case and the page continues to be considered as focused despite the browser having moved to the background. I'm also not sure that all browsers consistently unfocus pages when the user focuses on browser chrome and/or browser extensions. That prevents the sensor from being deactivated in such cases, which has security consequences.

So ideally, the spec should make explicit that focus requires both the page and the app to be in focus (but not privileged chrome).
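
As a rough illustration of that combined condition, the gating could look like the sketch below. The `SketchFocusSnapshot` fields, `mayDeliverReadings`, and `GatedSensor` are hypothetical names invented here, and the sketch assumes the user agent can answer each of these questions, which, as noted above, browsers do not do consistently today.

```typescript
// Hypothetical sketch: none of these names come from the Generic Sensor spec.
interface SketchFocusSnapshot {
  appHasOsFocus: boolean;        // the browser is the focused application
  tabIsSelected: boolean;        // the sensor's tab or window is the selected one
  focusInBrowserChrome: boolean; // e.g. URL bar, password manager, extension UI
  focusedFrameId: string | null; // the frame currently receiving user input
}

// Readings flow only if the page AND the app are in focus, and focus is
// neither in privileged chrome nor in some other (e.g. nested) frame.
function mayDeliverReadings(sensorFrameId: string, snapshot: SketchFocusSnapshot): boolean {
  return (
    snapshot.appHasOsFocus &&
    snapshot.tabIsSelected &&
    !snapshot.focusInBrowserChrome &&
    snapshot.focusedFrameId === sensorFrameId
  );
}

class GatedSensor {
  readonly delivered: number[] = [];
  private readonly frameId: string;

  constructor(frameId: string) {
    this.frameId = frameId;
  }

  // Called for every raw reading; readings are dropped while unfocused.
  onRawReading(value: number, snapshot: SketchFocusSnapshot): void {
    if (mayDeliverReadings(this.frameId, snapshot)) {
      this.delivered.push(value);
    }
  }
}
```

With this predicate, focus moving to an embedded payment iframe, to browser chrome, or to another application each independently stops delivery.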

8 participants
@mikewest @tobie @jyasskin @jcjones @domenic @annevk and others