Need a "browser window has focus" algorithm that's true even while user types in URL bar #6211

Closed
jan-ivar opened this issue Dec 10, 2020 · 16 comments · Fixed by #8466

Comments

@jan-ivar
Contributor

Several specs have attempted to roll their own "app is not in the background" criteria for features that would be creepy to activate (or in some cases resume) from background tabs, minimized windows/apps, maybe even from (partially) visible windows other than the one the user is currently engaging with:

  1. Broken foreground detection w3c/mediacapture-main#752 (my motivation)
  2. Ensure "Focused and Visible" addresses OS-level focus immersive-web/webxr#747
  3. Missing browsing-context focus/unfocus hooks in HTML spec. w3c/sensors#222

They've all crafted language that relies in some part on HTML's concept of keyboard focus, and they're all broken atm.

The has focus steps algorithm looked promising (once #6172 is merged), until I found that all browsers except Safari return false if I put a cursor in the URL bar. This makes some sense for document.hasFocus(), since the page temporarily loses keyboard focus to the URL bar.
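
To observe this, here is a minimal sketch that snapshots the two signals a page can read today, `document.hasFocus()` and `document.visibilityState`. The names `FocusProbe`, `FocusSnapshot` and `snapshotFocus` are hypothetical and exist only for illustration; nothing here is proposed API.

```typescript
// Structural subset of the DOM Document interface, so the sketch also
// type-checks and runs outside a browser. All names declared here are
// hypothetical and used only for illustration.
interface FocusProbe {
  hasFocus(): boolean;
  readonly visibilityState: string;
}

interface FocusSnapshot {
  hasFocus: boolean; // keyboard focus is inside the page
  visible: boolean;  // Page Visibility reports "visible"
}

// Read both signals at one point in time.
function snapshotFocus(doc: FocusProbe): FocusSnapshot {
  return {
    hasFocus: doc.hasFocus(),
    visible: doc.visibilityState === "visible",
  };
}

// In a page, one could poll it like this and then click into the URL bar:
//   setInterval(() => console.log(snapshotFocus(document)), 1000);
```

Per the observation above, with the cursor in the URL bar most browsers would be expected to report `hasFocus: false` while `visible` stays true, which is why neither signal alone answers the question the three specs are asking.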

But it doesn't make much sense to block camera/mic/sensor/vr access just because the cursor is in the URL bar. The 3 specs were instead probably looking for whether the user agent's presentation of a top-level document (including all related system widgets like the URL bar) has keyboard focus or at least user attention.

Do we want to attempt to add such an algorithm?

@jan-ivar
Contributor Author

Renaming this issue since the term "visible and focused" is ambiguous.

It would also be nice if APIs that already rely on "transient activation" would get this assurance for free. #6212

@jan-ivar jan-ivar changed the title Need a "visible and focused" algorithm that's true even while user types in URL bar Need a "foreground detection" algorithm that's true even while user types in URL bar Dec 10, 2020
@domenic
Member

domenic commented Dec 10, 2020

Unifying this seems like a good idea. I wonder to what extent implementations are unified.

It sounds like this is probably a property of top-level browsing contexts? (Although it might be convenient for specs to ask about individual Windows or Documents, e.g. to ensure that it returns false for non-fully-active Documents.)

I guess the main question is what invariants this has with regard to other already-specced primitives. For example:

@sideshowbarker
Contributor

Any relation with https://w3c.github.io/page-visibility/#visibility-states ? Heck, maybe visibility state is what you're looking for?

Yeah, I am wondering the same thing.

But given that we’ve got three different editors who have each independently/coincidentally created solutions for this — then even if visibility state is what would actually solve the same problem for all of them, there would still be an issue that we’re in a situation where editors are simply unaware of visibility state.

And if so, that’d suggest maybe we should have something more in the HTML spec to help raise awareness about visibility state. (And also really about the page lifecycle stuff in general.)

@jan-ivar
Contributor Author

Heck, maybe visibility state is what you're looking for?

On desktop, a page mostly obscured by other windows is "visible" if just a single pixel of its browser chrome shows. I don't think this would match a "When is a window in the foreground?" user poll.

In mediacapture, we tried to answer when users would be surprised by the camera or microphone being turned on, and we conservatively said: when it happens from any app other than the one they're currently using, i.e. the one in focus. This lines up well with transient activation (modulo #6212), which in hindsight we should have required for getUserMedia like we do for getDisplayMedia, as well as with permissions (which need someplace to hang prompts) and effective privacy indicators (which tend to reside in window chrome).

The same answer fit well with insertion of USB cameras & mics, which we treat like a key press (i.e. requires focus).

I notice other specs like https://w3c.github.io/sensors/#concepts-can-expose-sensor-readings list both visibilityState and focus as criteria, so I gather they arrived at a similar conclusion? @rwaldron

It sounds like this is probably a property of top-level browsing contexts?

I think so. Something like:

The only thing that gives me pause are mobile devices that display two apps side-by-side.

@annevk
Member

annevk commented Dec 14, 2020

I don't think you want to put this on the TLBC because then you have to check if your document is the TLBC's active document. It seems easier if you can just check if your document or settings object is "good to go".

@domenic
Member

domenic commented Dec 16, 2020

I don't think you want to put this on the TLBC because then you have to check if your document is the TLBC's active document. It seems easier if you can just check if your document or settings object is "good to go".

I think the state is on TLBC (like system focus is), but I agree from specs it should be easy to call it given a document or settings object.

On desktop, a page mostly obscured by other windows is "visible" if just a single pixel of its browser chrome shows. I don't think this would match a "When is a window in the foreground?" user poll.

Alright, I'm more or less convinced. So in terms of spec restrictions, I think we have:

  • If the TLBC has system focus, then the TLBC is in the foreground. (But not vice-versa; it could be that the URL bar has focus.)

  • If the TLBC is in the foreground, then its active document must be "visible". (But not vice-versa; it could be that the document is "visible" but mostly occluded.)

  • Only one TLBC can be in the foreground at a given time? (I think mobile devices displaying two apps side-by-side also have a focus model, so we can treat them like desktop.)

Putting these together, I'd suggest spec text something like:

At a given time, either zero or one of a user agent's top-level browsing contexts is in the foreground. Exactly which top-level browsing context is considered in the foreground is partially implementation-dependent, and generally relies on platform interaction paradigms. However, it must obey the following constraints:

  • If a top-level browsing context has system focus, then it is in the foreground.
  • A top-level browsing context can only be in the foreground if its active document's visibility state is visible.

Note: these implications do not go in the other direction. For example, it's possible for no top-level browsing context to have system focus, but for one still to be in the foreground; this happens in some implementations when the URL bar or other user agent UI is focused. Similarly, it's possible for a document to be visible, but for its top-level browsing context to not be in the foreground, for example if the contents of the document are occluded.

A document is in the foreground if its browsing context is a top-level browsing context, and that top-level browsing context is in the foreground.
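
The constraints in the suggested text can be sketched as an invariant check over a toy model. This is only an illustration under stated assumptions: `TopLevelContext`, its fields, and `foregroundViolations` are hypothetical names, and the `inForeground` flag stands in for the implementation-dependent choice the user agent makes.

```typescript
type VisibilityState = "visible" | "hidden";

// Hypothetical model of a top-level browsing context (not a real API).
interface TopLevelContext {
  id: string;
  hasSystemFocus: boolean;
  visibilityState: VisibilityState; // of its active document
  inForeground: boolean;            // implementation-dependent choice
}

// Returns a description of every constraint from the suggested text that
// the given assignment violates; an empty array means it is allowed.
function foregroundViolations(contexts: TopLevelContext[]): string[] {
  const problems: string[] = [];

  // "either zero or one ... is in the foreground"
  if (contexts.filter((c) => c.inForeground).length > 1) {
    problems.push("more than one context is in the foreground");
  }

  for (const c of contexts) {
    // System focus implies foreground (but not the other way around).
    if (c.hasSystemFocus && !c.inForeground) {
      problems.push(`${c.id}: has system focus but is not in the foreground`);
    }
    // Foreground implies visible (but not the other way around).
    if (c.inForeground && c.visibilityState !== "visible") {
      problems.push(`${c.id}: in the foreground but not visible`);
    }
  }
  return problems;
}
```

For example, the URL bar case from the note (no context has system focus, one visible context is in the foreground) produces no violations in this model, while a hidden context marked as foreground does.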

We should also update the definition of system focus to make it clear only zero or one TLBCs can have system focus.

@annevk
Member

annevk commented Dec 17, 2020

Only one TLBC can be in the foreground at a given time?

What if you put two windows side-by-side? I guess the note tries to account for this, but I think we need to make that a little bit more explicit. E.g., only consider user agent UI directly related to the TLBC in question as @jan-ivar did. I also think that might make "foreground" a bit fraught and something like "has indirect system focus" and "has direct system focus" might be better.

@domenic
Member

domenic commented Dec 17, 2020

What if you put two windows side-by-side?

Then, only one of them should be considered foreground still, at least according to the specs that currently exist, so I think it's fine?

only consider user agent UI directly related to the TLBC in question

I'm hesitant to write too much spec text about user agent UI, myself.

@annevk
Member

annevk commented Dec 18, 2020

What specification defines foreground? I thought you were trying to define it?

@jan-ivar
Contributor Author

jan-ivar commented Dec 18, 2020

@annevk I think he's just referring to the three specs in the OP and their needs. Maybe "foreground" is the wrong word, but so was "visible and focused" when applied to the document or its viewport. In lay terms, I think all three were trying to say "visible and focused" applied to the browser window, but we don't know what to call that in specs.

@q-alex-zhao

q-alex-zhao commented Jan 12, 2021

Would a new window.hasFocus() serve this need? Since document.hasFocus() seems to be only applicable to the document.

@annevk
Member

annevk commented Jan 12, 2021

We're not looking to introduce a new public API here, just a low-level primitive that specifications can hook into.

@jan-ivar
Contributor Author

jan-ivar commented Dec 17, 2021

I'm renaming this issue from "foreground detection" to "browser window has focus" which is the part not covered by #7238.

I'm not sure we'll be able to reconcile differences between desktop and mobile for all specs, so it may be up to individual specs whether they want a browser focus requirement, a browser visibility requirement or both.

@jan-ivar jan-ivar changed the title Need a "foreground detection" algorithm that's true even while user types in URL bar Need a "browser window has focus" algorithm that's true even while user types in URL bar Dec 17, 2021
@jan-ivar
Contributor Author

How about something like:

"A top-level browsing context has proximate system focus when it has system focus or when user agent widgets directly related to it can receive keyboard input channeled from the operating system.

Note: Proximate system focus is lost when a browser window loses focus."
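
Read as a predicate, the proposed definition is a simple disjunction. A minimal sketch, assuming a toy model in which the user agent already knows both inputs; `TopLevelContextFocus`, its fields, and `hasProximateSystemFocus` are hypothetical names, not spec or browser API.

```typescript
// Hypothetical model of the two inputs named in the proposed definition.
interface TopLevelContextFocus {
  // The top-level browsing context itself has system focus.
  hasSystemFocus: boolean;
  // A user agent widget directly related to it (e.g. its URL bar) can
  // receive keyboard input channeled from the operating system.
  relatedWidgetReceivesKeyboardInput: boolean;
}

// "has system focus OR a directly related widget can receive keyboard input"
function hasProximateSystemFocus(c: TopLevelContextFocus): boolean {
  return c.hasSystemFocus || c.relatedWidgetReceivesKeyboardInput;
}
```

In this model, typing in the URL bar keeps the result true, and it becomes false once the browser window loses focus and neither input holds, which matches the note.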

@jan-ivar
Contributor Author

jan-ivar commented Nov 2, 2022

No takers. How about s/proximate system focus/tentative system focus/ or s/proximate system focus/user attention/ ?

@annevk
Member

annevk commented Nov 2, 2022

I like the idea of "user attention" and then define what we expect to minimally be true for that to be the case (e.g., visible, foreground, focus).

annevk pushed a commit that referenced this issue Apr 28, 2023
As well as document's "fully active descendant of a top-level traversable with user attention" for callers.

Fixes #6211.