Potentially standardize window.find() #3539

annevk opened this issue Mar 7, 2018 · 19 comments
Labels
addition/proposal New features or enhancements i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. interop Implementations are not interoperable with each other

Comments

@annevk
Member

annevk commented Mar 7, 2018

See:

Related #2858.

@annevk annevk added addition/proposal New features or enhancements i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. interop Implementations are not interoperable with each other labels Mar 7, 2018
@js-choi

js-choi commented Mar 7, 2018

Good news.

Is the scope of this issue simply to standardize window.find’s existing behavior in Firefox, WebKit, and Chromium? How does its matching work? Are Unicode code points matched with any normalization? Does case folding occur; if so, how? Can paragraph breaks and line breaks be matched, and by what characters? Is there any fuzzy search (e.g. between straight quotes and curly quotes as some browsers do on some systems)?

The answer to all of these probably will be: “What do current browsers do? Let’s stick with that,” but it’d be good to be explicit about the goal. And there may be some platform inconsistencies, especially in fuzzy matching.

See also Charmod Norm, w3c/selection-api#37, whatwg/dom#431, tc39/proposal-intl-segmenter#17, #2424, the inactive String Search API, the inactive FindText API, and the inactive RangeFinder API.
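
To make the matching questions above concrete, here is a minimal sketch of one possible folding step applied before comparison. It is an assumption for illustration, not what any browser does: `foldForSearch` and `foldedIncludes` are hypothetical names, NFKC is just one candidate normalization form, and `toLowerCase()` only approximates real Unicode case folding.

```javascript
// Hypothetical folding step; none of this is taken from a browser implementation.
const QUOTE_FOLDS = new Map([
  ["\u2018", "'"], ["\u2019", "'"], // curly single quotes -> straight
  ["\u201C", '"'], ["\u201D", '"'], // curly double quotes -> straight
]);

function foldForSearch(text) {
  return text
    .normalize("NFKC") // assumption: compatibility composition before matching
    .replace(/[\u2018\u2019\u201C\u201D]/g, (ch) => QUOTE_FOLDS.get(ch))
    .toLowerCase(); // approximation of case folding
}

function foldedIncludes(haystack, needle) {
  return foldForSearch(haystack).includes(foldForSearch(needle));
}

// Curly quotes in the page text, straight quotes in the query.
console.log(foldedIncludes("She said \u201CHello\u201D", '"hello"')); // true
// Decomposed "e" + combining acute in the page, precomposed "é" in the query.
console.log(foldedIncludes("cafe\u0301", "caf\u00E9")); // true
```

Each line of `foldForSearch` corresponds to one of the open questions (normalization, fuzzy quote matching, case folding), so a spec would have to pick an answer for each.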

@fred-wang
Contributor

For the record some browsers also implement a document.execCommand("FindString", ...) command.
https://w3c.github.io/editing/execCommand.html

@grantcv1

I always find it distressing that counting the usage of public-facing websites is used to make decisions. In my experience, there are far more complex web applications with big companies and big governments that would not (should not) be included within these statistics.

window.find() is very much needed in editing style applications and there is a need for this feature (or a better alternative). Support for case folding, regular expressions, and other things that would help with a fuzzy search are really needed.

It seems that one possible effort to standardize this capability, the FindText API, (http://www.w3.org/TR/findtext/) has been discontinued :-(

@tilgovi

tilgovi commented Jun 11, 2018

One way to accommodate some of the goals of FindText without requiring standardization to take a stance on algorithm would be to specify how window.find interacts with Symbol.search (or other relevant, well-known symbols).
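
For readers unfamiliar with the hook, here is a small sketch of how `Symbol.search` already works with `String.prototype.search`. The `CaseInsensitiveMatcher` class is hypothetical, and `window.find` does not consult `Symbol.search` today; the idea is only that a spec could delegate the matching algorithm to an object like this.

```javascript
// Hypothetical matcher object; window.find does not consult Symbol.search today.
class CaseInsensitiveMatcher {
  constructor(query) {
    this.query = query.toLowerCase();
  }

  // String.prototype.search calls this method with the string being searched
  // and returns whatever index the method returns.
  [Symbol.search](text) {
    return text.toLowerCase().indexOf(this.query);
  }
}

const index = "Potentially standardize window.find()".search(
  new CaseInsensitiveMatcher("WINDOW.FIND")
);
console.log(index); // 24
```

The caller decides the algorithm (here a naive lowercase comparison), and the platform only defines where the result goes.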

@vmpstr
Member

vmpstr commented Jun 12, 2020

I'm not sure if this should be a separate issue, but my proposal is to start the process of standardizing window.find by standardizing some of the aspects of find-in-page commonly used first. For instance,

  1. Define terms like active match(?) vs potential match(?), meaning the thing that was found and highlighted vs the thing that could be found if the user or script continues searching for the same string
  2. Perhaps also define how find-in-page interacts with things like clipped out content, and opacity 0 content, etc.

By starting with definitions, I think we can start thinking about how to define the algorithm. However for some features, it might already be useful to reference definitions of find-in-page (e.g. https://drafts.csswg.org/css-scroll-anchoring/#anchor-priority-candidates 2nd candidate is "an element containing the current active selected match of the find-in-page user-agent algorithm" which could reference this)
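
As a rough illustration of the two terms, here is a sketch of a find session over a plain string. Everything in it is an assumption: `createFindSession`, `activeMatch` and `potentialMatches` are illustrative names, matching is a plain non-overlapping `indexOf`, and real find-in-page operates on rendered DOM content rather than a string.

```javascript
// Hypothetical model; "activeMatch" and "potentialMatches" are illustrative names.
function createFindSession(text, query) {
  const matches = [];
  if (query.length > 0) {
    let from = 0;
    let at;
    while ((at = text.indexOf(query, from)) !== -1) {
      matches.push({ start: at, end: at + query.length });
      from = at + query.length; // collect non-overlapping matches only
    }
  }

  let activeIndex = -1; // nothing is highlighted until the first findNext()

  return {
    // The match that was found and highlighted, or null.
    get activeMatch() {
      return activeIndex === -1 ? null : matches[activeIndex];
    },
    // Matches that could become active if the search continues.
    get potentialMatches() {
      return matches.filter((_, i) => i !== activeIndex);
    },
    findNext() {
      if (matches.length === 0) return null;
      activeIndex = (activeIndex + 1) % matches.length; // wrap around at the end
      return matches[activeIndex];
    },
  };
}

const session = createFindSession("find me, then find me again", "find");
session.findNext();
console.log(session.activeMatch, session.potentialMatches.length);
```

Other specs could then refer to "the active match" without caring how the list of matches was computed.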

As an aside, I put together a brief overview of behaviors of find-in-page in different browsers (Chrome, Firefox, Safari) to see the commonalities and differences in behaviors. The doc uses the find-in-page dialog, not window.find, though.

Does this seem like a good approach?

@domenic
Member

domenic commented Jun 12, 2020

my proposal to start the process of standardizing window.find by standardizing some of the aspects of find-in-page commonly used first.

Interesting.

This falls into a gray area of web specs, of specifying UI. Generally we try to shy away from that, and only specify things which are observable from JavaScript. I believe nothing about find-in-page is observable, so we normally wouldn't specify it.

However, sometimes we bend this rule, when it's especially beneficial, and all the browsers are interested.

I guess I would ask what is the goal here, and for who. Are you trying to make things more predictable for web page authors? In what way, since find-in-page is not observable? Are you trying to make things easier for implementers?

If the goal is purely to work on a better spec for window.find, then I would probably treat that orthogonally to find-in-page...

@vmpstr
Member

vmpstr commented Jun 12, 2020

I guess I would ask what is the goal here, and for who

Good question. The immediate benefit from having the definitions is for spec writers and implementers so that they can agree what is meant by terms like 'active match' (e.g. the scroll anchor spec I linked, and beforematch proposal; the latter would benefit from the algorithm specified as well since the timing of the event and timing of find-in-page scroll are dependent on each other).

I think the ultimate benefit of at least partially speccing the algorithm is for users to have a consistent experience across browsers (although I'm not sure how valuable it is, since I imagine users don't typically switch browsers very often). That is, you can see in the compat doc I linked that browsers tend to do different things in a number of situations. In some cases, none of the browsers seem to do "the right thing". For instance, content clipped by overflow hidden can be found on the three browsers I tested. It is conceivable that the spec here would dictate what should and should not be found, if that makes sense.

As an aside, I assume that window.find essentially hooks into the find-in-page algorithm (maybe this is a wrong assumption), so any kind of specification for it is likely to be very similar. To put it differently, I think if window.find is specified and browsers update their implementations to match the spec, I suspect that they will also have to change the find-in-page behavior to simplify the code.

@domenic
Member

domenic commented Jun 12, 2020

The immediate benefit from having the definitions is for spec writers and implementers so that they can agree what is meant by terms like 'active match' (e.g. the scroll anchor spec I linked, and beforematch proposal; the latter would benefit from the algorithm specified as well since the timing of the event and timing of find-in-page scroll are dependent on each other).

I definitely see the benefit there. That could probably be accomplished with a fairly minimal spec, that just hand-waves at how the feature works but builds around a skeleton of some <dfn>s like "active match" that other, more observable features can reference. I'm happy to support that much, at least.

I think the ultimate benefit of at least partially speccing the algorithm is for users to have a consistent experience across browsers (although I'm not sure how valuable it is, since I imagine users don't typically switch browsers very often). That is, you can see in the compat doc I linked that browsers tend to do different things in a number of situations. In some cases, none of the browsers seem to do "the right thing". For instance, content clipped by overflow hidden can be found on the three browsers I tested. It is conceivable that the spec here would dictate what should and should not be found, if that makes sense.

I think you're right that this would be valuable for users, in that it would guide browsers toward doing "the right thing", where "the right thing" is what domain experts (HTML spec editors, CSS WG, i18n folks, and browser engineers) can collectively get together and agree upon. Maybe we wouldn't get total agreement, e.g. maybe one browser representative has a very different philosophical stance on what a "word" means, but that's fine. Any discussion at all would likely be an opportunity to improve things in this way.

In other words, since this isn't JS-developer-observable, the goal isn't to get total interop, but instead to get the other values that the standards process brings. And I suspect that even if not all browser engines want to spend time on this, you'd be able to get good discussion from the rest of the web standards community, and from any interested web developers and users.

So, I'm sold that this is worth trying to specify.

As an aside, I assume that window.find essentially hooks into the find-in-page algorithm (maybe this is a wrong assumption), so any kind of specification for it is likely to be very similar. To put it differently, I think if window.find is specified and browsers update their implementations to match the spec, I suspect that they will also have to change the find-in-page behavior to simplify the code.

Well, but as long as the result of window.find is not observable from JS, it seems like the specification could just be "calling this function does something with the user interface generally related to finding things". Although, maybe it's observable from scroll offsets? I'm not sure.

@aphillips
Contributor

Text search is a complex topic for reasons such as those called out in @js-choi's comment. Past attempts to write a spec at W3C failed to consider I18N basics early on and have foundered on that. The I18N WG (perhaps wisely?) shelved any attempt to work on it directly as part of Charmod-Norm by creating a separate document. Any group starting to work on this might want to have a look at string-search and to the issues we filed against FindText.

I think this is worth taking a stab at--it is possible to overcomplicate the problem, and as long as judicious choices are made (and well-documented) I think it is possible to have a successful result in a finite amount of time.

@annevk
Member Author

annevk commented Jun 14, 2020

@domenic it's pretty observable, no?

console.log(window.getSelection());
window.find("test");
console.log(window.getSelection());
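
The same check can be wrapped in a function so it is easy to run in several browsers. This is only a sketch: `findChangesSelection` is a hypothetical helper, it takes a window-like object as a parameter so it can also be exercised outside a browser, and it compares the serialized selection text, which would miss a match that happens to equal the existing selection.

```javascript
// Returns true when calling find() changed the serialized selection.
// `win` is any window-like object; passing it in keeps the sketch testable.
function findChangesSelection(win, query) {
  // window.find() is non-standard, so guard against it being absent.
  if (typeof win.find !== "function") return false;

  const before = String(win.getSelection()); // Selection stringifies to its text
  const found = Boolean(win.find(query));
  const after = String(win.getSelection());

  return found && before !== after;
}
```

In a browser console this would be called as `findChangesSelection(window, "test")` on a page that contains the word "test".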

@domenic
Member

domenic commented Jun 15, 2020

Hmm, that appears to be a Firefox quirk where window.find() (and Ctrl+F!) actually affect window.getSelection(). That's not the case in other browsers.

@domenic
Member

domenic commented Jun 15, 2020

For the record, I was testing something wrong; window.getSelection() is impacted by window.find() in Chrome too. http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=8206

@domenic
Member

domenic commented Aug 7, 2020

For folks watching this thread, @vmpstr has put together an initial pull request describing find-in-page in #5770 (direct preview link). It's pretty basic and, I think, should be uncontroversial. But it might provide a good place to collect some of the notes or open issues here, e.g. we could expand it to link to https://w3c.github.io/string-search/#searching, and eventually try to define window.find() as triggering that feature.

domenic pushed a commit that referenced this issue Aug 11, 2020
This serves as helpful structure for work such as #3539, or potentially
integrating with https://github.com/WICG/scroll-to-text-fragment or
https://github.com/WICG/display-locking.
@xfq
Contributor

xfq commented Aug 15, 2020

FWIW, there's a CSS issue about controlling whether an element is findable/searchable: w3c/csswg-drafts#3460

mfreed7 pushed a commit to mfreed7/html that referenced this issue Sep 11, 2020
@petelomax

My gut instinct on this is that "find" is too generic and meaningless. Adding e.g. openFindWindow() or findTextOnPage() or highlight/selectTextOnPage() would be intuitively more distinct from querySelector() and friends, which "find" just isn't.

@domenic
Member

domenic commented Sep 4, 2021

We don't get to choose the name; it's already in all browsers. This issue is just about writing a spec for it.

@mantou132

mantou132 commented May 9, 2022

find just needs to return some Range that contains the specified text. Other processing, such as highlighting, should be left to the web developer, e.g. by using the CSS Custom Highlight API.
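
A rough sketch of that split, under stated assumptions: `locateMatches` and `highlightMatches` are hypothetical helpers, matching is a plain `indexOf` within each text node (matches spanning node boundaries are not handled), and the highlight step relies on `CSS.highlights` and the `Highlight` constructor from the CSS Custom Highlight API, so it is skipped where those are unavailable.

```javascript
// Hypothetical helper: locate `query` inside a list of text-node strings.
function locateMatches(textNodeValues, query) {
  const hits = [];
  if (query.length === 0) return hits;
  textNodeValues.forEach((value, nodeIndex) => {
    let at = value.indexOf(query);
    while (at !== -1) {
      hits.push({ nodeIndex, start: at, end: at + query.length });
      at = value.indexOf(query, at + query.length);
    }
  });
  return hits;
}

// Browser-only part: turn the hits into Range objects and register a highlight.
function highlightMatches(textNodes, hits, name) {
  // Bail out where the CSS Custom Highlight API is not available.
  if (typeof CSS === "undefined" || !CSS.highlights) return false;
  const ranges = hits.map(({ nodeIndex, start, end }) => {
    const range = new Range();
    range.setStart(textNodes[nodeIndex], start);
    range.setEnd(textNodes[nodeIndex], end);
    return range;
  });
  CSS.highlights.set(name, new Highlight(...ranges));
  return true;
}
```

The page would then style the result with a `::highlight(name)` rule, which leaves the selection and scroll position untouched.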

@hsivonen
Member

@domenic it's pretty observable, no?

console.log(window.getSelection());
window.find("test");
console.log(window.getSelection());

It's rather unfortunate that what window.find() finds is Web-exposed when Gecko implements the search technically in a very different way from WebKit (forked to Blink), and the WebKit/Blink behavior depends on the UI language of the browser.

Specifically, Firefox operates on the Unicode Database level (in a language-independent way), whereas WebKit and Blink use collator-based search (with primary-level matching only) such that the collation data that is used is the CLDR search collation for the browser UI language.

As a collator implementor, I'm very skeptical of the technical merit of collator-based search compared to search implemented directly over the Unicode Database layer (possibly with hard-coded exceptions to try to reproduce the main effects of collator-based search). (When operating on the Unicode Database, you transform characters to other characters and match on the transformed stream of characters. When operating on collations, you perform a complex mapping from characters to collation units and then ignore everything but the primary weight in the collation unit and match on the primary weights. Even with fast computers of today, you can experience a performance difference by using cmd/ctrl-f on the HTML spec in Firefox and Chrome.) I also don't want to bring collator-based search into scope for Gecko or ICU4X. See a URL text fragment issue.
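
The difference between the two approaches can be approximated from script. This sketch does not reproduce either engine's internals: `transformKey` and `primaryEqual` are hypothetical names, the transform (NFD, strip combining marks, lowercase) is a stand-in for Unicode-Database-level mapping, and `Intl.Collator` with `usage: "search"` and `sensitivity: "base"` is assumed to be a reasonable proxy for primary-level matching with a locale's search collation.

```javascript
// Approach 1: transform characters to other characters, then compare the
// transformed strings. Language-independent by construction.
function transformKey(text) {
  return text
    .normalize("NFD") // split base letters from combining marks
    .replace(/\p{M}/gu, "") // drop the combining marks
    .toLowerCase(); // approximation of case folding
}

// Approach 2: compare through a collator and ignore everything except
// base-letter differences. The result depends on the locale's collation data.
function primaryEqual(a, b, locale) {
  const collator = new Intl.Collator(locale, {
    usage: "search",
    sensitivity: "base",
  });
  return collator.compare(a, b) === 0;
}

console.log(transformKey("R\u00E9sum\u00E9") === transformKey("resume"));
console.log(primaryEqual("R\u00E9sum\u00E9", "resume", "en"));
// Locale dependence: the same pair may compare differently under another locale.
console.log(primaryEqual("\u00F6", "o", "en"), primaryEqual("\u00F6", "o", "sv"));
```

The last line is where the UI-language dependence shows up: the transform gives one answer everywhere, while the collator's answer is whatever the chosen locale's data says.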

@sideshowbarker
Contributor

Given that — along with the core “highlight the active match and scroll into view” behavior — browser UIs also expose a count of the total matches for the current query, it’s imaginable that it might be useful to developers (and for testing scenarios too) to have an API which programmatically exposes that total match count to JavaScript code.
