-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add browsingContext.locateNodes
command to find nodes within a page
#150
Comments
I cannot add labels, but there should be: |
This comment was marked as off-topic.
This comment was marked as off-topic.
What would it take to implement https://w3c.github.io/webdriver/#find-element and the other WebDriver classic endpoints with |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
We can probably do most of it as it was originally designed to use that. The things that we need to be aware of is keeping a map of elements found so we can handle |
I agree this isn't a high priority; it's effectively a higher level API over The shadow DOM stuff is different, it still makes sense to have a way to get inside the shadow dom of a Node. But that could perhaps just be part of the serialization of Element (i.e. if we return an Element give the serialization a Getting clarity on what the model is for interacting with platform types, particuarly DOM nodes, seems like the high priority work item here. |
The Browser Testing and Tools Working Group just discussed The full IRC log of that discussion<AutomatedTester> topic: Discussion: need in browsingContext.findElement command<AutomatedTester> github: https://github.com//issues/150 <jgraham> If people have concerns about that model it would help to post them on the issue <jgraham> I think we're likely to get a PR soon <simonstewart> That looks like it's based on https://chromedevtools.github.io/devtools-protocol/tot/Runtime/#method-addBinding, right? <AutomatedTester> sadym: apparently in Selenium, we have a number of ways to find element but these can be done via script eval <AutomatedTester> .... do we really need a find element moving forward <jgraham> simonstewart: it's a different approach to the same problem, you do callFunction(function, args=[callback]) and whenever the remote end calls callback it generates an event with the arguments <simonstewart> q+ <jgraham> q+ <AutomatedTester> automatedtester: What would happen on things like closed shadow dom ? <AutomatedTester> sadym: It probably doesn't cover the shadow dom when it's closed <AutomatedTester> jgraham: I disagree, we can return a shadow root value to the element and then just go down that way <jgraham> q- <AutomatedTester> ack simonstewart <AutomatedTester> simonstewart: the thing that findelement gives us separation of concerns <AutomatedTester> ... and in clients we can get people building their own locator types <AutomatedTester> ... e.g. FindElementByARIA <AutomatedTester> ... separating the interface is generally good design and we can have people return element ID to be passed into other things later on <AutomatedTester> q? <jgraham> I note again that closed shadow roots should work with the script execution approach :) <simonstewart> I was looking for an example that might work :) <simonstewart> Point noted, jgraham! |
+1 this can be closed. Client can do it with either |
Reopening this, since Selenium and WebdriverIO projects consider this a high priority for BiDi. Do we have concrete examples of things that are blocked in those clients due to the absence of this feature? Generally it seems like one could implement most kinds of selectors using a preload script that runs in a client-specific sandbox e.g. const locatorStrategies = new Map();
function addLocatorStrategy(name, fn) {
locatorStrategies.set(name, fn)
}
function findElement(type, selector) {
return _findElement(type, selector, false)
}
function findElements(type, selector) {
return _findElement(type, selector, true)
}
function _findElement(type, selector, all) {
if (locatorStrategies.has(type)) {
return locatorStrategies.get(type)(selector, all);
}
throw new Error(`Unsupported type $(type)`)
}
addLocatorStrategy("css", (selector, all) => {
let selectorFunction = all ? querySelector : querySelectorAll;
return selectorFunction(selector);
}) and then to use it you just run |
The WebDriver protocol is effectively a tree navigation and node interaction protocol. The primary reason for this is that the WebDriver protocol is used more widely than in just browser automation. It forms the basis of automation tools such as Appium, WinAppDriver, and numerous other products. Using pure JS for this would preclude this kind of usage, which would be to the detriment of testing tooling as a whole. While this may be out of scope for a standard aimed at the Web, deliberately harming progress in other tooling seems short-sighted. Secondarily, intermediate nodes can take calls to Without this command, there is a real risk that tools such as Appium or intermediate nodes will attempt to fingerprint incoming JS to Do The Right Thing. That makes that tooling brittle, and prevents local ends from improving or iterating their implementations. Thirdly, without script pinning, using pure JS will cause slower test runs as the amount of network traffic would be substantially higher. With script pinning, sending the initial JS (even minimised) is inefficient and introduces more lag to the protocol. In the view of the Selenium project, |
Additionally, things like the accessibility tree aren't available to JS implementations, and may never be. Without a |
We also want visibility into shadow trees, even if the the shadow root is closed and can't be pierced by JS |
If we move everything to be within |
It's already possible to do that with script in BiDi (we serialize shadow roots in a way that allows them to be passed in to future scripts, so it's already possible to implement the equivalent of "find element[s] from shadow root"). |
I am entirely against assuming everything can be shunted into
I'm in no way ready to formalize my ideas yet in a spec PR, and am entirely unqualified to write the spec prose that would be required to describe the following, but as a strawman argument, I'm envisioning a scheme something like the pseudo-CDDL spec definition below: Command Type
Return Type
If In general, a
The The idea would be that one would pass an array of locators which the remote end would use to successively locate elements. The implementation would loop through the list of locators. The first locator returns a list of nodes. The second locator is evaluated with the each of the nodes in the list returned from the first locator as the context node, which returns another list of nodes. The third locator is evaluated with each of the nodes in second as the context node, and so on, until a final list of nodes is reached, each of which meets the conditions of all of the locators iteratively from 0 to length - 1. Each of the
While any (most? all?) of the above locators may be implementable solely using script, I believe there is value in providing them via the protocol directly. Error handling for the command would, of course, need to explicitly point out at which point in the locator chain the location failed. Returning an empty array for no nodes found matching the criteria is perfectly normal and acceptable. Additionally, it allows other users to extend the command by providing their own implementation of Yes, this seems an incredibly complicated solution, but I can't think of an easier way to meet my three stated goals for having a "find elements" command in the protocol. EDIT: Added |
In Puppeteer we use channels to get access to accessibility tree by other means if the selector is based on A11y tree information.
I wonder how non-web-browser requirements have to be reflected in the protocol? E.g., would desktop apps support CSS locators? how should they deal with realms and serialization of DOM objects? Do they event have DOM? Is it one of the primary use cases for WebDriver BiDi? |
@OrKoN, The short answer is, “they don’t; they can ignore the parts of the protocol they don’t or can’t use,” but @shs96c said it far more eloquently than I could in a previous reply , but I’ll quote the relevant bit here:
Make no mistake, shoving everything into JS would be doing deliberate harm to that progress. Making the WG aware of that now would just mean such a decision would make that harm malicious. |
👋 Appium maintainer here, just want to +1 what @shs96c, @jimevans, @AutomatedTester have said. While respecting the fat that the WebDriver BiDi working group's primary focus is and should be the web platform, the fact is that it's a good thing for there to be standardization in automation protocols more generally, even beyond the web. (I have a whole argument to this effect if you want to get into the guts of a larger doc I wrote several years ago). UI automation of app platforms is going to be a thing whether we like it or not, because it is useful. The only question really is whether UI automation for various app platforms is going to be unified or fragmented. The success and longevity of Appium is testament to the power of unification vs fragmentation. The follow-on question is which standard should be the basis for that unification. What better candidate than the standard pioneered by the most truly open platform of all---the web? There will of course always be web/browser-specific aspects of W3C standards that may not apply to other platforms (like mobile native app platforms). So far, it's been easy enough to simply ignore or slightly update the semantics of such aspects. But I would caution heavily against an approach which would key the protocol too tightly to the specifics of a given web technology (JavaScript). By the same token I would appreciate if the BiDi spec (which admittedly I have not looked at myself in great detail yet) had the same eye to extensibility as the WebDriver spec (allowing e.g. new types of locator strategies). I'm happy to work more closely with this group to provide a voice from an external "downstream" ecosystem, if that would be at all useful. |
browsingContext.findElement
commandbrowsingContext.findElement
command
@sadym-chromium I've added a @whimboo I believe adding such a parameter to this command would allow the "implicit wait" to be implemented client-side, and make the capability obsolete in the BiDi case. |
The Browser Testing and Tools Working Group just discussed The full IRC log of that discussion<AutomatedTester> Topic: Add support for �FindElement(s)� commands from WebDriver classic<AutomatedTester> github: https://github.com//issues/150 <JimEvans> q+ <orkon> q+ <AutomatedTester> ack next <AutomatedTester> Jim Evans: in the github issue. In the comment I wrote is a strawman proposal for how we can do it <AutomatedTester> ... there are some locator strategies that won't be able to be done via the javascript execution like the a11y discussion earlier <jgraham> Agenda: https://www.w3.org/wiki/WebDriver/2023-TPAC#Agenda <AutomatedTester> ... having the findElement command also helps with not web platform implementations of webdriver (appium et al) <littledan> q+ <AutomatedTester> ... I think it is important that we do this. I know in puppeteer they move things down to css or psuedo css selectors <shs96c> q+ <AutomatedTester> ... I would like to be able to do things without having to rely on JavaScript to make this happen <AutomatedTester> ack next <AutomatedTester> orkon: I wanted to comment on the proposal. We are in favour in having this command but we have some concerns <AutomatedTester> ... e.g. how do combine commands for finding elements <jgraham> q+ to note that the scope of the proposal is far larger than we have in classic <JimEvans> q+ <AutomatedTester> ... we need to discuss the details. Also allow people to return an iterator etc and shadow roots <AutomatedTester> ack next <AutomatedTester> littledan: We in bloomberg we have a UI testing tool using a custom protocol <AutomatedTester> ... the bloomberg terminal is based off chromium <AutomatedTester> ... we have been using a CDP interface on top of that <AutomatedTester> ... this is done via commands on the domain instead of JS <AutomatedTester> ... we are in favour of a find element command <AutomatedTester> ... and if we get this in then we can see about adopting this quicker <AutomatedTester> ack next <AutomatedTester> ... its very similar to the appium use case <AutomatedTester> shs96c: from the selenium point of view, this is very crucial here <littledan> I also wanted to mention: We don't have any particular concerns about CSS selectors--we actually already implement a CSS selector system for querying the UI <AutomatedTester> ... we also need findElements but with the ability to limit the amount of elements returned <AutomatedTester> ... the ability to handle compound selectors would be very useful <AutomatedTester> ... from Jim Evans proposal we can do some really useful ideas looks really nice and I support it <AutomatedTester> q? <AutomatedTester> ack next <Zakim> jgraham, you wanted to note that the scope of the proposal is far larger than we have in classic <AutomatedTester> jgraham: looking at this comment I would want to start making this map onto classic <AutomatedTester> ... as this proposal is larger than classic <shs96c> q+ <AutomatedTester> ... the easiest approach is using a single locator and then making sure they map across to classic and then new items be a separate discussion e.g. shadowroot <AutomatedTester> ... we need to make sure we work for the classic use cases and then move onto the longer discussion on compound as they can, not elegantly, be handled in the client at the moment <AutomatedTester> ack next <AutomatedTester> Jim Evans: one of the reasons for this proposal from puppeteer and looking at playwright notion of what a locator looks like and being able to chain locators <AutomatedTester> ... the next reason for the complexity was to also minimise the amount of round trips <AutomatedTester> ... unlike cdp based tools we have to be able to handle internet latency <AutomatedTester> ... and minimising the round trips <sadym> q+ <jgraham> q+ <AutomatedTester> ... I haven't been through all the edge cases but the command is definitely important <orkon> q+ <jgraham> zakim, close the queue <Zakim> ok, jgraham, the speaker queue is closed <AutomatedTester> ack next <AutomatedTester> shs96c: implement things to get classic to work on bidi and I suggest that we get the text to move to innerText whch could be a breaking change <jgraham> +1 to that. <AutomatedTester> ... clients could potentially solve that via javascript to make sure that we dont break backwards compat <AutomatedTester> ... and we need a JS locator <AutomatedTester> ack next <AutomatedTester> sadym: Another scenario we want to want for an element to wait for it to appear/disappear <AutomatedTester> ... this is why it's in JS at the moment so that it can wait for elements to reach a state required <AutomatedTester> ... do we want this to happen too? <AutomatedTester> shs96c: I think that is out of scope for right now and we should deal with it in a separate issue <AutomatedTester> jgraham: people could poll or create a mutation observer <AutomatedTester> shs96c: classic just says "it's here" or not where puppeteer/playwright can wait for a page load and an element loads <AutomatedTester> whimboo: classic does have implicit waiting <AutomatedTester> ack next <AutomatedTester> jgraham: I agree the compound locators have real uses but we should focus on find element so we can get quickly get consensus <AutomatedTester> ... i'm worried about adhoc compounding of commands will not work well across the whole spec even though they make sense on their own <AutomatedTester> ack next <orkon> an example of p-selectors (inspired by CSS extensions and deep selectors proposals) combining various strategy (css + text + aria + xpath + custom locator with light and shadow dom descendants): <orkon> `div.my-cls ::-p-text(Test) >>> ::-p-aria(MyButton) >>>> ::-p-xpath(//div) >>> ::-p-vue(MyComponent)` <AutomatedTester> orkon: I just wanted to point an out an example using the p selectors from puppeteer <AutomatedTester> q? <AutomatedTester> zakim, open the queue <Zakim> ok, AutomatedTester, the speaker queue is open |
Consensus from discussion at TPAC: Implementation of something similar to the strawman proposal above, limited at first in scope to the location strategies used in WebDriver Classic. Other strategies may be added at a later time. |
Sorry for butting in late. I'm just starting to get myself acquainted with WebDriver/BiDi and wanted to drop a quick note as I'm seeing a trend. As you rightly noted above doing everything in So I'd like to encourage - for the purpose of designing WebDriver/BiDi - to always assume that script evaluation is not available. It's great to have it as an option, and there is nothing wrong with TL;DR: wholehearted agreement with @jimevans and @jlipps on |
This PR implements the browsingContext.locateNodes command, which allows the user to locate nodes in the DOM using means other than returning them via JavaScript. This also allows for future expansion of location strategies (e.g., via the accessibility tree, etc.) should they become available. Fixes #150.
browsingContext.findElement
commandbrowsingContext.locateNodes
command to find nodes within a page
browsingContext.findElement
can be implemented usingscript.evaluate
orscript.callFunction
and DOM API, as Puppeteer does. So there is no need in yet another syntax sugar command.DOM.querySelector
can be at least convenient.The text was updated successfully, but these errors were encountered: