
dialog initial focus, a proposal

Scott O'Hara edited this page Mar 17, 2022 · 10 revisions

Introduction

This proposal attempts to build consensus regarding initial focus for the <dialog> element. It strives to address the requirements and concerns that community members have raised to date.

Two main approaches have been debated:

  • Keep status quo: set initial focus on the first focusable element inside the dialog.
  • Change the behavior to set initial focus directly on the <dialog> element. PR 4184 contains proposed spec text along these lines.

This document has been authored and reviewed by community members who were initially of contrasting opinions. Areas that need further work are marked with the text TODO.

Contributors:

  • Scott O'Hara (Microsoft) / @scottaohara
  • Aaron Leventhal (Google) / @aleventhal
  • Domenic Denicola (Google) / @domenic
  • James Teh (Mozilla) / @jcsteh
  • Joey Arhar (Google)
  • Mason Freed (Google) / @mfreed7
  • Matt King (Meta) / @mcking65

Status of <dialog> implementations

  • Chrome/Edge Stable — implemented
  • Safari 15.4 — implemented
  • Firefox 98 — implemented

All of the above implementations set the initial focus to the first focusable descendant inside the dialog. This is implemented as a depth-first search of elements within the dialog subtree. The first element that is focusable receives the initial focus.
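The depth-first search described above can be sketched as follows. This is a minimal sketch under stated assumptions, not engine code: El is a hypothetical, simplified element type, and its tabIndex field stands in for focusability (undefined means not focusable, -1 means focusable but not tabbable, 0 or more means tabbable). Real engines also account for details this model ignores, such as disabled, hidden, or inert elements.

```typescript
// Hypothetical, simplified element model; not the real DOM API.
// tabIndex: undefined = not focusable, -1 = focusable only, >= 0 = tabbable.
interface El {
  name: string;
  tabIndex?: number;
  children?: El[];
}

// Status quo: pre-order (tree-order) depth-first search of the dialog's
// descendants, returning the first element that is focusable at all.
function firstFocusableDescendant(dialog: El): El | null {
  // Seed the stack with the children reversed so the first child pops first.
  const stack: El[] = (dialog.children ?? []).slice().reverse();
  while (stack.length > 0) {
    const el = stack.pop()!;
    if (el.tabIndex !== undefined) return el;
    // Push this element's children in reverse to preserve document order.
    const kids = (el.children ?? []).slice().reverse();
    for (const kid of kids) stack.push(kid);
  }
  return null;
}

// Example: a link inside leading text comes before the dialog's button.
const tosDialog: El = {
  name: "dialog",
  children: [
    { name: "p", children: [{ name: "a", tabIndex: 0 }] },
    { name: "button", tabIndex: 0 },
  ],
};
const picked = firstFocusableDescendant(tosDialog); // picked?.name === "a"
```

Note that the dialog element itself is never a candidate here; only its descendants are searched.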

Problem Statement

Community members have raised legitimate concerns about every proposed approach. These concerns are described in the following sections.

Although there is no perfect solution for automatic initial focus, this document attempts to find a satisfactory resolution that addresses the key concerns raised.

Concerns raised regarding keeping the status quo

Community members have raised these concerns about setting initial focus to the first focusable descendant, as implementations currently do:

  1. The first focusable element could be anything, anywhere, which can lead to unexpected and less-than-optimal initial focus placement. Screen reader users can easily skip over and miss information if the first focusable element comes after other non-focusable content.
  2. Scrollable dialogs (e.g., terms of service) might contain a link in the middle of the long-form content, which would then be the dialog's first focusable element. The dialog would scroll to bring that link into view, leaving users in a confusing context. This is problematic for all users.
  3. Focusing the first focusable element (e.g., one with tabindex=-1), as opposed to the first tabbable element (tabindex >= 0), is problematic. For example, in a radio group or tab panel, the inactive controls are focusable but not tabbable, so the wrong item would receive focus. This is especially problematic for HTML radio buttons, as it could make it easy to accidentally change the selection by pressing the spacebar or an arrow key.
  4. It is unclear what should be focused in non-modal dialogs, such as the Facebook notifications dialog, or in other dialogs that contain multiple focusable elements where no single element clearly "should" receive focus.
  5. Zoom and screen magnifier users do not get good context when focus lands near the bottom of a dialog, because all of the preceding context is scrolled or positioned out of view.
  6. Screen reader users accustomed to pressing Esc to exit forms mode may accidentally dismiss a dialog if its first focusable element is a form control.
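Concern 3 can be made concrete with a small sketch comparing "first focusable" with "first tabbable" on a radio group. The El type and the helper functions are hypothetical; modeling the unchecked radio as tabIndex -1 is an assumption that approximates how browsers skip unchecked radios in a group during Tab navigation while still allowing them to be focused.

```typescript
// Hypothetical, simplified element model; not the real DOM API.
// tabIndex: undefined = not focusable, -1 = focusable only, >= 0 = tabbable.
interface El {
  name: string;
  tabIndex?: number;
  children?: El[];
}

// Collect the dialog's descendants in tree order (the dialog itself excluded).
function descendants(root: El, out: El[] = []): El[] {
  for (const child of root.children ?? []) {
    out.push(child);
    descendants(child, out);
  }
  return out;
}

// Return the first descendant satisfying the predicate, or null.
function firstMatching(dialog: El, pred: (el: El) => boolean): El | null {
  for (const el of descendants(dialog)) {
    if (pred(el)) return el;
  }
  return null;
}

const isFocusable = (el: El): boolean => el.tabIndex !== undefined;
const isTabbable = (el: El): boolean => el.tabIndex !== undefined && el.tabIndex >= 0;

// A radio group whose second option is checked. Modeling assumption: the
// unchecked radio is focusable but skipped by Tab, so it gets tabIndex -1.
const shippingDialog: El = {
  name: "dialog",
  children: [
    {
      name: "fieldset",
      children: [
        { name: "radio:standard", tabIndex: -1 },
        { name: "radio:express (checked)", tabIndex: 0 },
      ],
    },
  ],
};

const byFocusable = firstMatching(shippingDialog, isFocusable); // radio:standard
const byTabbable = firstMatching(shippingDialog, isTabbable); // radio:express (checked)
```

With a focusable-based search, the unchecked Standard option would receive focus, so a stray spacebar press could change the selection; a tabbable-based search lands on the checked option instead.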

Analysis of proposed change to set initial focus to <dialog>

The following lists give the pros and cons of changing the status quo to set initial focus on the dialog, as proposed in PR 4184 to address the concerns with the current approach of focusing the first focusable descendant.

These pros and cons have been collected via discussions with a number of active community members.

Pros of initial focus on <dialog>:

  • Better context for screen reader users, screen magnifier users, and users at high browser zoom (200% to 400%): sequential reading consistently starts at the top of the dialog. This helps ensure that screen reader users do not miss content that precedes the first focusable element.
  • Better handling of dialogs in which nothing is focusable.
  • No need to deal with focus landing in an awkward spot (subjective and situational), e.g., a link in the middle of a complex dialog, or a form control that follows a large amount of explanatory text.

Cons of initial focus on <dialog>:

  • Inconsistent with behavior that some users are accustomed to from native OS dialog windows, and from custom web dialogs, which web authors have implemented inconsistently.
  • Inconsistent with host environment behaviors (OS platform and browser dialogs).
  • Many authors are likely to neglect specifying initial focus, even when the spec recommends it, and a default behavior of focusing the dialog would be unexpected for most users. It is probably not realistic to change such a basic user expectation without receiving blowback.
  • Less convenient in some cases, e.g., when the first field of the dialog is a text field.

Proposal

There is no single solution that handles all dialogs nicely, acts as users expect, and provides consistency with host environments.

However, the following middle ground can provide a good experience in most cases, and addresses the concerns with the original approach of automatically focusing inside the dialog.

Initial dialog focus logic

  1. If the dialog is non-modal and the author has specified that initial focus should be prevented, exit the focus logic.
    The author can prevent initial focus for a non-modal dialog by opening it with dialog.show({ preventInitialFocus: true }).
    Explanation: authors may wish to prevent initial focus for a dialog where the user should not be interrupted from their current activities.
    • Use case #1: a customer support chat window that opens when the user is idle and stays open until explicitly closed.
    • Use case #2: a cookie consent dialog.
      Note: dialog.open = true can currently be used to open a modeless dialog without initial focus, but this should be considered unintentional and may change. See the issue "Dialog should be better behaved on misuse, probably", which discusses direct manipulation of the open attribute.
  2. Otherwise, focus any focusable element with the autofocus attribute in the dialog subtree (inclusive of the dialog element itself), if provided.
  3. Otherwise, if nothing inside the dialog is tabbable, focus the dialog itself, so that at least something gets focused.
  4. Otherwise, focus the first tabbable descendant of the <dialog>. Importantly, this is based on tabbability, not focusability: elements with tabindex=-1 do not count (otherwise unexpected items would get focus, e.g., the first radio button in a group, regardless of its checked state).
    • To make #4 satisfactory:
      • User-scrollable elements should be in the tab order. Firefox already does this. If Chrome and WebKit counted such elements in the first-tabbable-item algorithm, that would mitigate Jamie's concern about the scrollable terms of service. Chrome and WebKit currently do not do this, which means a mouse is required to scroll user-scrollable regions unless the author remembers to add tabindex. Chrome is looking at addressing this. There were some problems with certain custom components and backwards compatibility; if these continue to be a problem, we may need to change step 4 to "first tabbable or focusable scrollable area", though hopefully this change will not be needed.
      • Expose which elements are scrollable to ATs (HTML AAM).
    • Because the algorithm chooses the first tabbable descendant of the dialog, if the <dialog> itself is tabbable (because it has tabindex="0" or is scrollable), it would not gain initial focus, but would still be in the tab order of the page. Note: authors are advised not to make the <dialog> itself scrollable or give it tabindex="0".
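The four steps above can be sketched as a single function. This is a minimal sketch, not spec text: El is a hypothetical, simplified stand-in for DOM elements, ShowOptions is a hypothetical type in which modal distinguishes showModal() from show() and preventInitialFocus mirrors the proposed option, and where several elements carry autofocus the sketch assumes the first in tree order wins. Steps 3 and 4 are checked in the opposite order, which is equivalent because step 3 only applies when step 4 would find nothing.

```typescript
// Hypothetical, simplified element model; not the real DOM API.
// tabIndex: undefined = not focusable, -1 = focusable only, >= 0 = tabbable.
interface El {
  name: string;
  tabIndex?: number;
  autofocus?: boolean;
  children?: El[];
}

// Hypothetical options; preventInitialFocus is the proposed, non-standard option.
interface ShowOptions {
  modal: boolean;
  preventInitialFocus?: boolean;
}

// Tree-order list of the subtree, with the root (the dialog) first.
function inclusiveTreeOrder(root: El, out: El[] = []): El[] {
  out.push(root);
  for (const child of root.children ?? []) inclusiveTreeOrder(child, out);
  return out;
}

const isFocusable = (el: El): boolean => el.tabIndex !== undefined;
const isTabbable = (el: El): boolean => el.tabIndex !== undefined && el.tabIndex >= 0;

// Returns the element that should receive initial focus, or null for none.
function initialFocusTarget(dialog: El, opts: ShowOptions): El | null {
  // Step 1: a non-modal dialog whose author asked to prevent initial focus.
  if (!opts.modal && opts.preventInitialFocus) return null;

  const subtree = inclusiveTreeOrder(dialog);

  // Step 2: a focusable autofocus element, dialog included. The dialog itself
  // is treated as focusable here even without a tabIndex.
  for (const el of subtree) {
    if (el.autofocus && (el === dialog || isFocusable(el))) return el;
  }

  // Step 4: the first tabbable descendant. tabindex=-1 does not count, and the
  // dialog itself is skipped even if it is tabbable (e.g., tabindex="0").
  for (const el of subtree.slice(1)) {
    if (isTabbable(el)) return el;
  }

  // Step 3: nothing inside the dialog is tabbable, so focus the dialog itself.
  return dialog;
}

// Example: an edit dialog whose author marked the second field with autofocus.
const editDialog: El = {
  name: "dialog",
  children: [
    { name: "p" },
    { name: "input#number", tabIndex: 0 },
    { name: "input#name", tabIndex: 0, autofocus: true },
  ],
};
const modalPick = initialFocusTarget(editDialog, { modal: true }); // input#name
const quietPick = initialFocusTarget(editDialog, { modal: false, preventInitialFocus: true }); // null
```

In this sketch a modal dialog ignores preventInitialFocus, since step 1 applies only to non-modal dialogs.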

Improving keyboard accessibility of non-modal dialogs

TODO:

Improving AT Behavior

To ensure that ATs provide good context when a dialog is opened, and a descendant receives focus:

  1. Provide a better description of a heuristic that ATs (specifically screen readers and magnifiers) should use to provide good context when a dialog opens. For instance, there should be spec language indicating that if initial focus is not on the dialog itself, ATs could announce the content that precedes the focused element. Roughly, the heuristic is to read up to the point of initial focus. aria-describedby or aria-description on a <dialog> or role=dialog element should override this, allowing the author to manually specify the initial contextual announcement.
    • When autofocus goes to a descendant, should the same algorithm be applied? The algorithm probably needs to avoid being too verbose in that case, e.g., avoid reading the contents of a listbox that comes between the start of the dialog and the focused element.
  2. Develop a manual test suite to test ATs against.
  3. Consult with ATs to improve behavior where there are gaps found via the test suite.
  4. Investigate whether special platform API events indicating that a dialog has opened are necessary or helpful for ATs. Such events are likely necessary at least to help with awareness of modeless dialogs. For IA2, there is already a ready-made event for this, EVENT_SYSTEM_DIALOGSTART. For browser/AT combinations that do not implement this, the experience in an unfocused non-modal dialog should gracefully degrade to be like that of any other dynamically inserted, unfocused content in a page. Ensure that ATs provide a reasonable announcement when a dialog is already open at page load, e.g., a modeless cookie acceptance dialog.
  5. TODO: this area needs work. (Note that there are no AT requirements in the ARIA specs.) Because screen reader users will need a way to easily navigate back to non-modal dialogs, open non-modal dialogs could potentially be added to landmark navigation features. We can run this idea by the ARIA Working Group and, if it is found to be a reasonable request, add some non-normative language, e.g., "Assistive technologies may enable users to quickly navigate to elements with role dialog." Also, as a normative requirement for UAs: "User agents SHOULD expose elements with role dialog as navigational landmarks. User agents MAY enable users to quickly navigate to elements with role dialog." (This text may in fact belong in the ARIA spec under dialog, and it probably only applies to non-modal dialogs, since for modal dialogs everything else in the page should be inert anyway.) Alternatively, ATs could provide specific functionality to navigate to non-modal dialogs without including them as landmarks. However, this would likely be best solved by UAs deciding upon keyboard commands to cycle between the open non-modal dialogs and the primary web page. Note: this assumes dialogs do not automatically close when users navigate away; authors who wish to implement auto-close behavior may instead use the popup attribute, if this feature becomes standard, or implement this behavior with their own scripting.
  6. Is there any need to expose to ATs whether the dialog is modal? Currently, browsers just make content outside of a modal dialog unavailable to ATs (see HTML AAM).

Improving conformance requirements and examples

The current <dialog> specification gives authors little guidance on when and how to use <dialog>, which may have contributed to much of the pain and confusion here. Essentially, all it says is:

The dialog element represents a part of an application that a user interacts with to perform a task, for example a dialog box, inspector, or window.

For example, some people have used <dialog> for context menus in the past, which have completely different focus requirements.

Here is an early draft of some ideas for beefing up this text, which could help encourage the right behavior.

Dialog draft text

The dialog element represents a transitory part of an application, in the form of a small window ("dialog box"), which the user interacts with to perform a task or gather information. Once the user interaction is complete, the dialog can be automatically closed by the application, or manually closed by the user.

Especially for modal dialogs, which are a familiar pattern across all types of applications, authors should work to ensure that dialogs in their web applications behave in a way that is familiar to users of non-web applications.

Discussion topic: more information is needed about non-modal (modeless) dialogs, their expected behaviors on desktop, and how they may be represented on the mobile web. See GitHub issue 7707.

NOTE: As with all HTML elements, it is not conforming to use the dialog element when attempting to represent another type of control. For example, context menus, tooltips, and popup listboxes are not dialog boxes, so abusing the dialog element to implement these patterns is incorrect.

An important part of user-facing dialog behavior is the placement of initial focus. The [dialog focusing steps] attempt to pick a good candidate for initial focus when a dialog is shown, but might not be a substitute for authors carefully thinking through the correct choice to match user expectations for a specific dialog. As such, authors should use the autofocus attribute on the descendant element of the dialog that the user will expect to immediately interact with after the dialog opens. If there is no such element, then authors should use the autofocus attribute on the dialog element itself.

Discussion topic: Consider conformance guidance for authors to explicitly declare autofocus for the dialog element

EXAMPLE:

In the following example, a dialog is used for editing the details of a product in an inventory management web application.

<dialog>
  <label>Product Number <input type="text" readonly></label>
  <label>Product Name <input type="text" autofocus></label>
</dialog>

If the autofocus attribute were not present, the dialog focusing steps would have focused the Product Number field. Although that is reasonable behavior, the author determined that the Product Name field was the more relevant one to focus, as the Product Number field is readonly and expects no user input. So, the author used autofocus to override the default.

Even if the author wants to focus the Product Number field by default, they are best off specifying that explicitly by using autofocus on that input element. This makes the intent obvious to future readers of the code, and ensures the code stays robust in the face of future updates (for example, if another developer added a close button and positioned it in the DOM before the Product Number field).
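The robustness argument can be checked with a minimal sketch of only the default part of the proposed focusing logic (autofocus first, otherwise the first tabbable element). The El type and pickInitialFocus are hypothetical, the dialog's children are assumed to be a flat list, and the Close button is the hypothetical later addition from the parenthetical above.

```typescript
// Hypothetical, simplified element model; not the real DOM API.
interface El {
  name: string;
  tabbable?: boolean;
  autofocus?: boolean;
}

// Simplified version of the proposed default for a flat list of dialog
// children: the first autofocus element, otherwise the first tabbable one.
function pickInitialFocus(children: El[]): El | null {
  for (const el of children) if (el.autofocus) return el;
  for (const el of children) if (el.tabbable) return el;
  return null;
}

const numberField: El = { name: "Product Number", tabbable: true }; // readonly, still tabbable
const nameField: El = { name: "Product Name", tabbable: true };
const closeButton: El = { name: "Close", tabbable: true };

// Relying on the default: focus silently moves once Close is inserted first.
const before = pickInitialFocus([numberField, nameField]); // Product Number
const after = pickInitialFocus([closeButton, numberField, nameField]); // Close

// With explicit autofocus, the intended target survives the same change.
const explicitNumber: El = { ...numberField, autofocus: true };
const robust = pickInitialFocus([closeButton, explicitNumber, nameField]); // Product Number
```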

Another important aspect of user behavior is whether dialogs are scrollable. In some cases, overflow (and thus scrollability) cannot be avoided, e.g., when it is caused by the user's high text zoom settings. But in general, users do not expect scrollable dialogs, and authors should attempt to avoid them. In particular, authors must not include large blocks of text directly in dialog elements, as this is likely to cause the dialog element itself to overflow. Instead, authors should place such text in a scrollable container element inside the dialog.

Discussion topic: Consider conformance guidance for dialogs which contain large blocks of text

EXAMPLE:

The following terms of service dialog respects the above requirements.

<dialog style="height: 80vh;">
  <div style="overflow: auto; height: 60vh;" autofocus>
    <p>By placing an order via this Web site on the first day of the fourth month of the year 2010 Anno Domini, you agree to grant Us a non-transferable option to claim, for now and for ever more, your immortal soul.</p>
    <p>Should We wish to exercise this option, you agree to surrender your immortal soul, and any claim you may have on it, within 5 (five) working days of receiving written notification from this site or one of its duly authorized minions.</p>
    <!-- ... etc., with many more <p> elements ... -->
  </div>
  <form method="dialog">
    <button type="submit" value="agree">Agree</button>
    <button type="submit" value="disagree">Disagree</button>
  </form>
</dialog>
<!-- TODO (domenic): probably flexbox or grid would be better than 80vh/60vh... -->

Note how the dialog focusing steps would have picked the scrollable div element by default, but similarly to the previous example, we have placed autofocus on the div so as to be more explicit and robust against future changes.

In contrast, if the p elements expressing the terms of service did not have such a wrapper div element, then the dialog itself would become scrollable, violating the above conformance requirements. Furthermore, in the absence of any autofocus attribute, such a non-conformant markup pattern would have tripped up the dialog focusing steps' default behavior, and caused focus to jump to the Agree button, which is a bad user experience.
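The contrast between the two markup patterns can be reproduced with a sketch of the proposed default when no autofocus attribute is present. Two assumptions are labeled in the code: the El model is hypothetical, and user-scrollable elements are treated as tabbable, which the proposal asks engines to do (Firefox already does; Chrome and WebKit currently do not).

```typescript
// Hypothetical, simplified element model; not the real DOM API.
interface El {
  name: string;
  scrollable?: boolean; // overflows and can be scrolled by the user
  button?: boolean;
  children?: El[];
}

// Assumption from the proposal: user-scrollable elements are in the tab order.
const isTabbable = (el: El): boolean => Boolean(el.button || el.scrollable);

// Proposed default with no autofocus: the first tabbable descendant in tree
// order, never the dialog itself, falling back to the dialog if none exists.
function defaultFocus(dialog: El): El {
  const visit = (el: El): El | null => {
    for (const child of el.children ?? []) {
      if (isTabbable(child)) return child;
      const found = visit(child);
      if (found) return found;
    }
    return null;
  };
  return visit(dialog) ?? dialog;
}

const paragraphs: El[] = [{ name: "p" }, { name: "p" }];
const buttons: El[] = [
  { name: "Agree", button: true },
  { name: "Disagree", button: true },
];

// Conformant: the terms live in a scrollable wrapper div.
const conformant: El = {
  name: "dialog",
  children: [
    { name: "div", scrollable: true, children: paragraphs },
    { name: "form", children: buttons },
  ],
};

// Non-conformant: the <p>s overflow the dialog, which itself becomes scrollable.
const nonConformant: El = {
  name: "dialog",
  scrollable: true,
  children: [...paragraphs, { name: "form", children: buttons }],
};

const good = defaultFocus(conformant).name; // "div"
const bad = defaultFocus(nonConformant).name; // "Agree"
```

Because the scrollable dialog is itself skipped as a candidate, nothing in the non-conformant markup lets the default logic reach the terms, and focus falls through to the first button.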

TODO: discuss modeless dialogs, both with and without preventInitialFocus.

Use case analysis

The autofocus attribute allows total control over what is initially focused. Thus, ultimately, any behavior is possible. What the "initial dialog focus logic" section is trying to create is a good default for when authors do not explicitly think about the desired behavior, and do not heed the spec's advice to always specify autofocus.

To determine whether we've picked a good default, we need to analyze expected patterns and see how well the default serves them. Here is our analysis of the dialog patterns we expect to see: both good ones that conform to the spec's authoring requirements, and bad ones that do not conform (but for which we still want to give the best behavior possible).

Alert dialog

Example: like window.alert()

Desired result: focuses the OK button so that pressing Enter closes the dialog

Our algorithm: matches the desired result

Confirm dialog

Example: like window.confirm()

Desired result: focuses the "default button", which might be OK or might be Cancel, depending on the specific dialog

Our algorithm: focuses whichever of the two buttons comes first in the DOM, which is not necessarily the right one. The author needs to use autofocus="" to communicate the right default button.

Edit/prompt dialog

Example: in Chrome, click on the star icon in the URL bar, and click "Add bookmark"

Desired result: focuses the name <input type=text> field

Our algorithm: matches the desired result

Scrollable edit/prompt dialog

Example: any edit-type dialog, but with enough fields such that when the user turns text zoom up high, it causes vertical or horizontal overflow.

Desired result: focuses the first input field

Our algorithm: matches the desired result

Terms of service dialog

i.e., with a scrollable <div> for the terms

Example: see the example in the conformance requirements draft above

Desired result: focuses the scrollable terms <div>

Our algorithm: matches the desired result

Scrollable terms of service dialog (non-conformant)

i.e., the terms are embedded directly as <p>s into the <dialog>, with OK/Cancel as siblings of the <p>s.

Example: see the discussion in the conformance requirements draft above

Desired result: focuses the <dialog>

Our algorithm: focuses the OK button

Media dialog

Example: lightbox for viewing an image or carousel of images

Desired result: probably the same as our algorithm?

Our algorithm: focuses the first control (e.g., previous image, share image, <video>'s controls, etc.) if there is one, or the dialog itself if there is no tabbable control, only the media

Teaching UI

Example: a popup teaching UI that is represented as a box with an arrow pointing at the item it provides information for. It has a heading, some placeholder text, and two call-to-action buttons.

Some other examples can be found on Open UI.

Desired result and our result: similar to the confirm dialog, although some cases have only one button and so do not suffer from the ambiguity problem

So far we have not thought of a way to give the desired result for both "Scrollable edit dialog" and "Scrollable terms of service dialog". Our hunch (lacking data) is that "scrollable edit dialog" is more common than "scrollable terms of service dialog", especially given the prevalence of non-default text zoom levels. Combined with the fact that the conformant "terms of service dialog" is a nice alternative to the non-conformant "scrollable terms of service dialog", we believe that we should favor the "scrollable edit dialog" case when setting defaults.

Addendum: differences between web and native dialogs

  • Web dialogs tend to be more text-heavy or have more highly structured content (lists, tables, etc.). Web dialogs range from highly informative content to ads, image galleries, etc. They can and will also contain more utilitarian form-based content, but even this content can be coupled with text-heavy instructions or descriptions, which AT users can easily skip over, depending on the initially focused element.
  • Screen reader users are more likely to navigate with virtual buffer commands. (James Teh notes that NVDA users were very upset when NVDA tried to make focus mode the default inside a dialog).
  • Dialogs on the web have links, where native dialogs rarely do.