Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add browsingContext.locateNodes command to find nodes within a page #150

Closed
sadym-chromium opened this issue Nov 25, 2021 · 24 comments · Fixed by #547
Closed

Add browsingContext.locateNodes command to find nodes within a page #150

sadym-chromium opened this issue Nov 25, 2021 · 24 comments · Fixed by #547
Labels
enhancement New feature or request module-browsingContext Browsing Context module

Comments

@sadym-chromium
Copy link
Contributor

sadym-chromium commented Nov 25, 2021

  • Opinion 1 (lazy): browsingContext.findElement can be implemented using script.evaluate or script.callFunction and DOM API, as Puppeteer does. So there is no need in yet another syntax sugar command.
  • Opinion 2 (ergonomic): I assume finding element(s) is one of the most used features, and having something like DOM.querySelector can be at least convenient.
@sadym-chromium
Copy link
Contributor Author

I cannot add labels, but there should be: browsingContext and question

@foolip foolip added module-browsingContext Browsing Context module question Further information is requested labels Nov 26, 2021
@foolip

This comment was marked as off-topic.

@foolip
Copy link
Member

foolip commented Nov 26, 2021

What would it take to implement https://w3c.github.io/webdriver/#find-element and the other WebDriver classic endpoints with script.evaluate or script.callFunction? At a glance it looks trivial, but things rarely are.

@AutomatedTester

This comment was marked as off-topic.

@sadym-chromium

This comment was marked as off-topic.

@AutomatedTester
Copy link

What would it take to implement https://w3c.github.io/webdriver/#find-element and the other WebDriver classic endpoints with script.evaluate or script.callFunction? At a glance it looks trivial, but things rarely are.

We can probably do most of it as it was originally designed to use that. The things that we need to be aware of is keeping a map of elements found so we can handle StaleElementException well. We need to handle the case of finding elements from within a closed Shadow Root which I doubt script.evaluate will be able to do.

@jgraham
Copy link
Member

jgraham commented Nov 26, 2021

I agree this isn't a high priority; it's effectively a higher level API over script.evaluate and querySelector, and Puppeteer implements find element in terms of a script, so it's not like there's some existing API we need to replicate.

The shadow DOM stuff is different, it still makes sense to have a way to get inside the shadow dom of a Node. But that could perhaps just be part of the serialization of Element (i.e. if we return an Element give the serialization a shadowRoot property that is either null or a reference to the shadow root).

Getting clarity on what the model is for interacting with platform types, particuarly DOM nodes, seems like the high priority work item here.

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed Discussion: need in browsingContext.findElement command.

The full IRC log of that discussion <AutomatedTester> topic: Discussion: need in browsingContext.findElement command
<AutomatedTester> github: https://github.com//issues/150
<jgraham> If people have concerns about that model it would help to post them on the issue
<jgraham> I think we're likely to get a PR soon
<simonstewart> That looks like it's based on https://chromedevtools.github.io/devtools-protocol/tot/Runtime/#method-addBinding, right?
<AutomatedTester> sadym: apparently in Selenium, we have a number of ways to find element but these can be done via script eval
<AutomatedTester> .... do we really need a find element moving forward
<jgraham> simonstewart: it's a different approach to the same problem, you do callFunction(function, args=[callback]) and whenever the remote end calls callback it generates an event with the arguments
<simonstewart> q+
<jgraham> q+
<AutomatedTester> automatedtester: What would happen on things like closed shadow dom ?
<AutomatedTester> sadym: It probably doesn't cover the shadow dom when it's closed
<AutomatedTester> jgraham: I disagree, we can return a shadow root value to the element and then just go down that way
<jgraham> q-
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart: the thing that findelement gives us separation of concerns
<AutomatedTester> ... and in clients we can get people building their own locator types
<AutomatedTester> ... e.g. FindElementByARIA
<AutomatedTester> ... separating the interface is generally good design and we can have people return element ID to be passed into other things later on
<AutomatedTester> q?
<jgraham> I note again that closed shadow roots should work with the script execution approach :)
<simonstewart> I was looking for an example that might work :)
<simonstewart> Point noted, jgraham!

@whimboo
Copy link
Contributor

whimboo commented Feb 9, 2023

See also the discussion on #342 and the decision to leave the implementation up to the client by eg. calling evaluate.

@jgraham I think that we can close this issue?

@thiagowfx
Copy link
Member

+1 this can be closed. Client can do it with either script.evaluate or script.callFunction.

x-ref: GoogleChromeLabs/chromium-bidi#337

@thiagowfx thiagowfx closed this as not planned Won't fix, can't repro, duplicate, stale Jun 1, 2023
@jgraham jgraham reopened this Jun 26, 2023
@jgraham
Copy link
Member

jgraham commented Jun 26, 2023

Reopening this, since Selenium and WebdriverIO projects consider this a high priority for BiDi.

Do we have concrete examples of things that are blocked in those clients due to the absence of this feature? Generally it seems like one could implement most kinds of selectors using a preload script that runs in a client-specific sandbox e.g.

const locatorStrategies = new Map();

function addLocatorStrategy(name, fn) {
  locatorStrategies.set(name, fn)
}

function findElement(type, selector) {
  return _findElement(type, selector, false)
}

function findElements(type, selector) {
  return _findElement(type, selector, true)
}

function _findElement(type, selector, all) {
  if (locatorStrategies.has(type)) {
    return locatorStrategies.get(type)(selector, all);
  }
  throw new Error(`Unsupported type $(type)`)
}

addLocatorStrategy("css", (selector, all) => {
  let selectorFunction = all ? querySelector : querySelectorAll;
  return selectorFunction(selector);
})

and then to use it you just run script.callFunction("(type, selector) => findElement(type, selector)", args=[type, selector], sandbox="webdriver-impl-sandbox"). Obviously that's not quite as easy as running browsingContext.findElement(type, selector) but it does provide the kind of one-shot extensibility of locator strategies that people have previously requested for classic WebDriver (just add more locators types to your preload script). Are there locator strategies that can't be implemented in terms of js functions, or where such an implementation is prohibitively difficult?

@whimboo whimboo added the needs-discussion Issues to be discussed by the working group label Jun 26, 2023
@shs96c
Copy link

shs96c commented Jul 4, 2023

The WebDriver protocol is effectively a tree navigation and node interaction protocol. Find Element and Find Element From Element provide that first piece, and we need similar functionality in Bidi.

The primary reason for this is that the WebDriver protocol is used more widely than in just browser automation. It forms the basis of automation tools such as Appium, WinAppDriver, and numerous other products. Using pure JS for this would preclude this kind of usage, which would be to the detriment of testing tooling as a whole. While this may be out of scope for a standard aimed at the Web, deliberately harming progress in other tooling seems short-sighted.

Secondarily, intermediate nodes can take calls to Find Element and transform them as required. For example, the Selenium Server takes calls to finding elements by name and converts them to CSS selectors. Using pure JS for this would preclude this kind of decomposition and reformulation.

Without this command, there is a real risk that tools such as Appium or intermediate nodes will attempt to fingerprint incoming JS to Do The Right Thing. That makes that tooling brittle, and prevents local ends from improving or iterating their implementations.

Thirdly, without script pinning, using pure JS will cause slower test runs as the amount of network traffic would be substantially higher. With script pinning, sending the initial JS (even minimised) is inefficient and introduces more lag to the protocol.

In the view of the Selenium project, findElement and a related findElementFromElement (or find element in the elemtn scope) are cornerstones of the protocol, and are absolutely vital for adoption.

@shs96c
Copy link

shs96c commented Jul 4, 2023

Additionally, things like the accessibility tree aren't available to JS implementations, and may never be. Without a find element command, there's no way to extend Bidi to support this kind of element location.

@shs96c
Copy link

shs96c commented Jul 4, 2023

We also want visibility into shadow trees, even if the the shadow root is closed and can't be pierced by JS

@AutomatedTester
Copy link

If we move everything to be within script.callFunction we are running the risk of having to effectively moving all of the Selenium code into preloaded scripts. While this makes browser vendors lives easier it puts a lot of work onto client bindings to make things work and limits who can make clients. The nice thing about WebDriver classic is that it simplifies who can make clients and they can have their own opinions. In a everything is a callFunction world people need to build the correct things before they can even worry about their opinionated framework
The other issue is the scalability. In preloading every page with JS to do simple tasks we are going to be sending a lot to the browser on session load. This could make the session start times quite slow as they load things up before a navigation occurs. This will impact cloud providers but could really impact those trying to build their own grid infrastructure with underpowered machines which QA departments are likely to have.

@jgraham
Copy link
Member

jgraham commented Jul 5, 2023

We also want visibility into shadow trees, even if the the shadow root is closed and can't be pierced by JS

It's already possible to do that with script in BiDi (we serialize shadow roots in a way that allows them to be passed in to future scripts, so it's already possible to implement the equivalent of "find element[s] from shadow root").

@jimevans
Copy link
Collaborator

jimevans commented Aug 30, 2023

I am entirely against assuming everything can be shunted into script.callFunction or script.evaluate, and am unlikely to be convinced to change my mind on that subject. I note this for reasons of "reaching consensus" on spec matters. I've been playing with an idea, the explicit goals of which are as follows:

  • Minimize round-trips from the local end to the remote end for locating elements
  • Enable implementation of WebDriver Classic's element finding commands using this mechanism
  • Embrace the flexibility allowed by non-WebDriver-based frameworks location mechanisms natively (cf. Playwright) in the BiDi protocol

I'm in no way ready to formalize my ideas yet in a spec PR, and am entirely unqualified to write the spec prose that would be required to describe the following, but as a strawman argument, I'm envisioning a scheme something like the pseudo-CDDL spec definition below:

Command Type

browsingContext.locateNodes = {
  method: "browsingContext.locateNodes",
  params: browsingContext.LocateNodesParameters
}

browsingContext.LocateNodesParameters = {
  context: browsingContext.BrowsingContext,
  locators: [ + browsingContext.Locator ],
  ? startNode: script.RemoteNodeValue,
  ? waitTimeout: js-uint
}

browsingContext.Locator = (
  browsingContext.CssLocator /
  browsingContext.XPathLocator /
  browsingContext.TextLocator /
  browsingContext.JavaScriptLocator /
  browsingContext.FrameLocator /
  browsingContext.ShadowRootLocator /
  browsingContext.AccessibleRoleLocator /
  browsingContext.AccessibleNameLocator /
  browsingContext.NthLocator /
  browsingContext.AndLocator /
  browsingContext.OrLocator
)

Return Type

browsingContext.LocateNodesResult = {
  context: browsingContext.BrowsingContext,
  nodes: [ * script.RemoteNodeValue ].
}

If startNode is omitted or null, the documentElement node of the specified browsingContext is used as the starting node.

In general, a browsingContext.Locator would generally look like:

{
  "type": text,
  "value": *** <something goes here> ***
}

The type property identifies the type of locator. For the value property, I am unsure what would go in there. In some cases (CSS selectors, XPath), text would be sufficient, but in others, like for JavaScript, something more complex is likely to be required.

The idea would be that one would pass an array of locators which the remote end would use to successively locate elements. The implementation would loop through the list of locators. The first locator returns a list of nodes. The second locator is evaluated with the each of the nodes in the list returned from the first locator as the context node, which returns another list of nodes. The third locator is evaluated with each of the nodes in second as the context node, and so on, until a final list of nodes is reached, each of which meets the conditions of all of the locators iteratively from 0 to length - 1.

Each of the Locator types will obviously need to be defined, and the list above is not exhaustive. An extension mechanism might be nice, but I haven't figured out how to make that work conceptually. As for the ones listed above, my thoughts are as follows:

  • CssLocator and XPathLocator are self-explanatory
  • TextLocator would find an element by its text content; there may well need to be options for whether using visible text only, like InnerText or including all text, like textContent, and whether to do a full or substring match, but in any case, the definition should be defined by these web primitives
  • JavaScriptLocator takes a function definition that takes an element as an argument and returns an array of elements
  • FrameLocator returns the documentElement node of a frame/iframe element
  • ShadowRootLocator returns the equivalent of the shadowRoot property of the RemoteNodeValue
  • AccessibleRoleLocator and AccessibleNameLocator would query the accessibility tree by role or name, respectively
  • NthLocator returns the (0-based) nth value in the returned list of nodes
  • AndLocator and OrLocator accept another browserContext.Locator, applying either an "and" or "or" condition to the returned nodes are either an intersection or a union, respectively, of the nodes returned by each locator.

While any (most? all?) of the above locators may be implementable solely using script, I believe there is value in providing them via the protocol directly. Error handling for the command would, of course, need to explicitly point out at which point in the locator chain the location failed. Returning an empty array for no nodes found matching the criteria is perfectly normal and acceptable.

Additionally, it allows other users to extend the command by providing their own implementation of browsingContext.Locator for their own remote end interpretation. As an example, a remote end that allows automation of iOS applications could create a locator type to locate an element using the iOS class chain, passing a structure with { "type": "iosclasschain", "value": "<whatever an iOS class chain query looks like>" }.

Yes, this seems an incredibly complicated solution, but I can't think of an easier way to meet my three stated goals for having a "find elements" command in the protocol.

EDIT: Added waitTimeout parameter., representing the time, in milliseconds, to wait for the location attempt to find at least one element matching the specified locators.

@OrKoN
Copy link
Contributor

OrKoN commented Aug 31, 2023

Additionally, things like the accessibility tree aren't available to JS implementations, and may never be. Without a find element command, there's no way to extend Bidi to support this kind of element location.

In Puppeteer we use channels to get access to accessibility tree by other means if the selector is based on A11y tree information.

Additionally, it allows other users to extend the command by providing their own implementation of browsingContext.Locator for their own remote end interpretation. As an example, a remote end that allows automation of iOS applications could create a locator type to locate an element using the iOS class chain, passing a structure with { "type": "iosclasschain", "value": "" }.

I wonder how non-web-browser requirements have to be reflected in the protocol? E.g., would desktop apps support CSS locators? how should they deal with realms and serialization of DOM objects? Do they event have DOM? Is it one of the primary use cases for WebDriver BiDi?

@jimevans
Copy link
Collaborator

jimevans commented Aug 31, 2023

I wonder how non-web-browser requirements have to be reflected in the protocol? E.g., would desktop apps support CSS locators? how should they deal with realms and serialization of DOM objects? Do they event have DOM? Is it one of the primary use cases for WebDriver BiDi?

@OrKoN, The short answer is, “they don’t; they can ignore the parts of the protocol they don’t or can’t use,” but @shs96c said it far more eloquently than I could in a previous reply , but I’ll quote the relevant bit here:

The primary reason for this is that the WebDriver protocol is used more widely than in just browser automation. It forms the basis of automation tools such as Appium, WinAppDriver, and numerous other products. Using pure JS for this would preclude this kind of usage, which would be to the detriment of testing tooling as a whole. While this may be out of scope for a standard aimed at the Web, deliberately harming progress in other tooling seems short-sighted.

Make no mistake, shoving everything into JS would be doing deliberate harm to that progress. Making the WG aware of that now would just mean such a decision would make that harm malicious.

@jlipps
Copy link

jlipps commented Aug 31, 2023

👋 Appium maintainer here, just want to +1 what @shs96c, @jimevans, @AutomatedTester have said. While respecting the fat that the WebDriver BiDi working group's primary focus is and should be the web platform, the fact is that it's a good thing for there to be standardization in automation protocols more generally, even beyond the web. (I have a whole argument to this effect if you want to get into the guts of a larger doc I wrote several years ago).

UI automation of app platforms is going to be a thing whether we like it or not, because it is useful. The only question really is whether UI automation for various app platforms is going to be unified or fragmented. The success and longevity of Appium is testament to the power of unification vs fragmentation. The follow-on question is which standard should be the basis for that unification. What better candidate than the standard pioneered by the most truly open platform of all---the web?

There will of course always be web/browser-specific aspects of W3C standards that may not apply to other platforms (like mobile native app platforms). So far, it's been easy enough to simply ignore or slightly update the semantics of such aspects. But I would caution heavily against an approach which would key the protocol too tightly to the specifics of a given web technology (JavaScript). By the same token I would appreciate if the BiDi spec (which admittedly I have not looked at myself in great detail yet) had the same eye to extensibility as the WebDriver spec (allowing e.g. new types of locator strategies).

I'm happy to work more closely with this group to provide a voice from an external "downstream" ecosystem, if that would be at all useful.

@mathiasbynens mathiasbynens changed the title Discussion: need in browsingContext.findElement command Discussion: need for browsingContext.findElement command Sep 4, 2023
@jimevans
Copy link
Collaborator

@sadym-chromium I've added a waitTimeout parameter to the command parameters. Would that suffice for your purposes for brosingContext.waitElement?

@whimboo I believe adding such a parameter to this command would allow the "implicit wait" to be implemented client-side, and make the capability obsolete in the BiDi case.

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed Add support for �FindElement(s)� commands from WebDriver classic.

The full IRC log of that discussion <AutomatedTester> Topic: Add support for �FindElement(s)� commands from WebDriver classic
<AutomatedTester> github: https://github.com//issues/150
<JimEvans> q+
<orkon> q+
<AutomatedTester> ack next
<AutomatedTester> Jim Evans: in the github issue. In the comment I wrote is a strawman proposal for how we can do it
<AutomatedTester> ... there are some locator strategies that won't be able to be done via the javascript execution like the a11y discussion earlier
<jgraham> Agenda: https://www.w3.org/wiki/WebDriver/2023-TPAC#Agenda
<AutomatedTester> ... having the findElement command also helps with not web platform implementations of webdriver (appium et al)
<littledan> q+
<AutomatedTester> ... I think it is important that we do this. I know in puppeteer they move things down to css or psuedo css selectors
<shs96c> q+
<AutomatedTester> ... I would like to be able to do things without having to rely on JavaScript to make this happen
<AutomatedTester> ack next
<AutomatedTester> orkon: I wanted to comment on the proposal. We are in favour in having this command but we have some concerns
<AutomatedTester> ... e.g. how do combine commands for finding elements
<jgraham> q+ to note that the scope of the proposal is far larger than we have in classic
<JimEvans> q+
<AutomatedTester> ... we need to discuss the details. Also allow people to return an iterator etc and shadow roots
<AutomatedTester> ack next
<AutomatedTester> littledan: We in bloomberg we have a UI testing tool using a custom protocol
<AutomatedTester> ... the bloomberg terminal is based off chromium
<AutomatedTester> ... we have been using a CDP interface on top of that
<AutomatedTester> ... this is done via commands on the domain instead of JS
<AutomatedTester> ... we are in favour of a find element command
<AutomatedTester> ... and if we get this in then we can see about adopting this quicker
<AutomatedTester> ack next
<AutomatedTester> ... its very similar to the appium use case
<AutomatedTester> shs96c: from the selenium point of view, this is very crucial here
<littledan> I also wanted to mention: We don't have any particular concerns about CSS selectors--we actually already implement a CSS selector system for querying the UI
<AutomatedTester> ... we also need findElements but with the ability to limit the amount of elements returned
<AutomatedTester> ... the ability to handle compound selectors would be very useful
<AutomatedTester> ... from Jim Evans proposal we can do some really useful ideas looks really nice and I support it
<AutomatedTester> q?
<AutomatedTester> ack next
<Zakim> jgraham, you wanted to note that the scope of the proposal is far larger than we have in classic
<AutomatedTester> jgraham: looking at this comment I would want to start making this map onto classic
<AutomatedTester> ... as this proposal is larger than classic
<shs96c> q+
<AutomatedTester> ... the easiest approach is using a single locator and then making sure they map across to classic and then new items be a separate discussion e.g. shadowroot
<AutomatedTester> ... we need to make sure we work for the classic use cases and then move onto the longer discussion on compound as they can, not elegantly, be handled in the client at the moment
<AutomatedTester> ack next
<AutomatedTester> Jim Evans: one of the reasons for this proposal from puppeteer and looking at playwright notion of what a locator looks like and being able to chain locators
<AutomatedTester> ... the next reason for the complexity was to also minimise the amount of round trips
<AutomatedTester> ... unlike cdp based tools we have to be able to handle internet latency
<AutomatedTester> ... and minimising the round trips
<sadym> q+
<jgraham> q+
<AutomatedTester> ... I haven't been through all the edge cases but the command is definitely important
<orkon> q+
<jgraham> zakim, close the queue
<Zakim> ok, jgraham, the speaker queue is closed
<AutomatedTester> ack next
<AutomatedTester> shs96c: implement things to get classic to work on bidi and I suggest that we get the text to move to innerText whch could be a breaking change
<jgraham> +1 to that.
<AutomatedTester> ... clients could potentially solve that via javascript to make sure that we dont break backwards compat
<AutomatedTester> ... and we need a JS locator
<AutomatedTester> ack next
<AutomatedTester> sadym: Another scenario we want to want for an element to wait for it to appear/disappear
<AutomatedTester> ... this is why it's in JS at the moment so that it can wait for elements to reach a state required
<AutomatedTester> ... do we want this to happen too?
<AutomatedTester> shs96c: I think that is out of scope for right now and we should deal with it in a separate issue
<AutomatedTester> jgraham: people could poll or create a mutation observer
<AutomatedTester> shs96c: classic just says "it's here" or not where puppeteer/playwright can wait for a page load and an element loads
<AutomatedTester> whimboo: classic does have implicit waiting
<AutomatedTester> ack next
<AutomatedTester> jgraham: I agree the compound locators have real uses but we should focus on find element so we can get quickly get consensus
<AutomatedTester> ... i'm worried about adhoc compounding of commands will not work well across the whole spec even though they make sense on their own
<AutomatedTester> ack next
<orkon> an example of p-selectors (inspired by CSS extensions and deep selectors proposals) combining various strategy (css + text + aria + xpath + custom locator with light and shadow dom descendants):
<orkon> `div.my-cls ::-p-text(Test) >>> ::-p-aria(MyButton) >>>> ::-p-xpath(//div) >>> ::-p-vue(MyComponent)`
<AutomatedTester> orkon: I just wanted to point an out an example using the p selectors from puppeteer
<AutomatedTester> q?
<AutomatedTester> zakim, open the queue
<Zakim> ok, AutomatedTester, the speaker queue is open

@jimevans
Copy link
Collaborator

Consensus from discussion at TPAC: Implementation of something similar to the strawman proposal above, limited at first in scope to the location strategies used in WebDriver Classic. Other strategies may be added at a later time.

@pipobscure
Copy link

pipobscure commented Sep 15, 2023

I am entirely against assuming everything can be shunted into script.callFunction or script.evaluate, and am unlikely to be convinced to change my mind on that subject. I note this for reasons of "reaching consensus" on spec matters.

If we move everything to be within script.callFunction we are running the risk of having to effectively moving all of the Selenium code into preloaded scripts.

Sorry for butting in late. I'm just starting to get myself acquainted with WebDriver/BiDi and wanted to drop a quick note as I'm seeing a trend.

As you rightly noted above doing everything in script.callFunction or script.evaluate could present problems. One problem is that this leaves us without the ability to test for scenarios where script execution is not permitted for some reason. (The reasons vary and include I'm testing smooth degradation of capabilities and have JS turned off and This is a secure application that has everything frozen etc.)

So I'd like to encourage - for the purpose of designing WebDriver/BiDi - to always assume that script evaluation is not available. It's great to have it as an option, and there is nothing wrong with script.evaluate in principle, but its availability should never be used as an argument against a feature being needed. (Just my €0.02)

TL;DR: wholehearted agreement with @jimevans and @jlipps on script.evaluate and general automation

jimevans added a commit that referenced this issue Nov 2, 2023
This PR implements the browsingContext.locateNodes command, which allows the user to locate nodes in the DOM using means other than returning them via JavaScript. This also allows for future expansion of location strategies (e.g., via the accessibility tree, etc.) should they become available.

Fixes #150.
@whimboo whimboo changed the title Discussion: need for browsingContext.findElement command Add browsingContext.locateNodes command to find nodes within a page Nov 2, 2023
@whimboo whimboo added enhancement New feature or request and removed question Further information is requested needs-discussion Issues to be discussed by the working group labels Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request module-browsingContext Browsing Context module
Projects
None yet
Development

Successfully merging a pull request may close this issue.