-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a module for executing script #63
Comments
Relevent use cases:
Best practice is that scripts which don't need to directly interact with the page should be put in a sandbox, which gives them access to the unmodified DOM (c.f.
I think using CDP as a touchstone, For communicating back from the script to the document, CDP seems to have two options:
By contrast WebExtensions content scripts work quite like sandboxed CDP scripts, but also have a Extension-specific API exposed in the global scope, including a |
The Browser Testing and Tools Working Group just discussed The full IRC log of that discussion<AutomatedTester> topic: Script execution<AutomatedTester> github: https://github.com//issues/63 <AutomatedTester> jgraham: After we have something for navigation the next item on the list is script execution <AutomatedTester> ... as it can open up the rest of what we can do <AutomatedTester> ... what are the requirements around this. <AutomatedTester> q+ <AutomatedTester> jgraham: I have written up in the issue some of the relevant things <AutomatedTester> ... e.g. selectors against the DOM <AutomatedTester> ... and being able to overload APIs on the page e.g. Random giving a response that isn't random <AutomatedTester> ... and so on <AutomatedTester> ... the next point is there are 2 points here on where we can execute script <AutomatedTester> ... this can be on the page/worker <AutomatedTester> ... and the 2nd one is sandboxing <AutomatedTester> ... so that clients can install there own helpers that doesn't pollute the page <simonstewart> q+ <AutomatedTester> ... we want it to be a lot like executeAsyncScript from webdriver but with more features <AutomatedTester> ... and do we want to have it like using `arguments` or just execute as is <brwalder> q+ <AutomatedTester> ... and the sandboxing seems really cool <AutomatedTester> ... and do we want to have something like webextensions where we limit the surface and allow people to inject using something like a `postMessage` <AutomatedTester> ack AutomatedTester <AutomatedTester> automatedtester: One of the things that might be good is that Jason Leyba, a few years ago, suggesting moving webdriver execute script to being more promised based <jgraham> s/limit the surface/have a dedicated WebDriver DOM API surface/ <AutomatedTester> jgraham: the way that CDP works now is that it has an `awaitPromise` type event <AutomatedTester> ... and we can do something like that <AutomatedTester> q? <AutomatedTester> ack simonstewart <AutomatedTester> simonstewart: re: arguments... as long as we can reformulate webdriver on top then its fine. Ideally we want to update things in the simplest ways <AutomatedTester> jgraham: [describes <AutomatedTester> ... how call function works in CDP with arguments] <AutomatedTester> ack brwalder <AutomatedTester> brwalder: re: CDP something like `addBinding` or webextensions <AutomatedTester> ... then we would have more network roundtrips <AutomatedTester> ... where if we wnt for webextensions with a more `postMessage` structure then we can save a round trip as it can post straight away <AutomatedTester> ... and if it were the `postMessage` then it would simplify sharing snippets of code <AutomatedTester> ... it might be more work but it would be easier for developers and offer more functionalit <AutomatedTester> q? |
#63 (comment) is a great writeup, thanks @jgraham! Parts of this seem potentially out of scope for a script execution module. In particular, finding elements by selector in principle doesn't involve scripts at all. But it is possible to implement using scripts, so maybe we start there, but without actually writing anything about selectors into the spec? I the meeting notes, @shs96c said:
This resonates with me. It sounds like the direction things are going is to have an The idea to have a communication channel I think there are probably lots of ways this could be achieved, and don't have any well thought out ideas :) |
For providing |
Is the idea that this Whatever the API shape, a stream of arbitrary data from the page to the client sounds good. |
Yeah, this would be bidirectional. So the script would have access to a MessagePort-like object with a postMessage method, and an onmessage event. The BiDi client would have a command to send a JS value to script, which invokes the MessagePort's onmessage event. There would also be a BiDi event that is invoked when the MessagePort.postMessage function is called. The command and the event would need to include the Realm ID or something similar to identify where the message is going or where it came from. |
One detail question for the purposes of discussion: do we have an agreed approach for sum types in the protocol? For script execution in an existing realm (i.e. ignoring sandboxes for now), I want it to be possible to specify either a context or a realm id. A use case here is "get a browsing context, and execute script in the realm associated with the active document of that context, without needing a seperate realm lookup command". So the proposal I have locally is something like:
So in this proposal we identify sum types by objects whose keys tell us the type name. But other options are possible e.g.
Or in theory one could just use the fact that realm ids and context ids are non-overlapping and not explicitly distinguish the types, but that doesn't strike me as very pleasant. Also, how do we feel about "target" as a name here? Other options, like |
Another question: errors. In WebDriver, if a script throws a exception we return an error response which has a code, but where the fields are mostly left up to the user agent (amusingly the spec tries to set an non-existent "data" property on the error to the value of the thrown exception). In CDP an exception is a normal response, but with an So we need to choose which model to follow here. Do we return a normal response but with details of the exception, or do we extend the error response to allow it to contain more type-specific error data? |
The Browser Testing and Tools Working Group just discussed The full IRC log of that discussion<AutomatedTester> Topic: Script execution<AutomatedTester> github: https://github.com//issues/63 <AutomatedTester> jgraham: I have started looking at the spec text here to do the minimum <AutomatedTester> ... I have 3 questions. 2 are in the last comment of the issue linked <AutomatedTester> ... 1. Do we want to keep the distiction between evaluate and callFunctionOn? <AutomatedTester> ... I think we should keep the distinctions <AutomatedTester> ... 2. We want to be able to specify the target of where the script could run. Like the realm or the browser context. There is a question in the PR on what should this look like. It could be a real or a browser context and I have some suggestions there <AutomatedTester> ... 3. How do we want errors to be handled? <AutomatedTester> ... There is a difference between webdriver http and bidi. Webdriver http exxpects the UA to populate it with info <AutomatedTester> ... in CDP it returns a success and then gives you a lot more detailed info <AutomatedTester> q? <simonstewart> q+ <AutomatedTester> ack simonstewart <AutomatedTester> simonstewart: are you saying in CDP you cant tell if there was a success or failing? <foolip> q+ <AutomatedTester> jgraham: I've only read the docs but it says that you get a success return but an exception field will be populated <AutomatedTester> simonstewart: as long as there as a mechanism to differentiate then it's fine since we can do that on the local end <AutomatedTester> ack foolip <AutomatedTester> foolip: I want to make sure I understnad the decision. Do we want to return an error at a protocol level or do we just populate a field? <simonstewart> q+ <AutomatedTester> jgraham: The question comes from CDP doing something very different to webdriver. If we want to follow webdriver we would want to have to a mechanism to create more richer error types <AutomatedTester> ... but CDP is there and we can follow that and allow clients to easily move over to it <foolip> q? <AutomatedTester> ... and not everything that happens in the client should be returned as a protocol error <AutomatedTester> ack simonstewart <foolip> sadym, do you know off hand what Puppeteer does here? Does it turn script exceptions into a rejected promise? <AutomatedTester> simonstewart: what ever we do we can always make http sit on top of bidi <AutomatedTester> ... so we want send error types and CDP allows us to send union types <AutomatedTester> ... I don't have a strong opinion here <AutomatedTester> q? <foolip> q+ <AutomatedTester> ack foolip <AutomatedTester> foolip: If it helps I can move to webdriver http and just do what happens there. <whimboo> foolip: https://github.com/puppeteer/puppeteer/blob/main/src/common/ExecutionContext.ts#L253-L275 <AutomatedTester> jgraham: I might look at what existing clients do now and then go from there <foolip> thanks whimboo! <AutomatedTester> github: end topic |
It seems logical not to mix the protocol and the script errors, like in CDP. But I'm not sure if we agreed on this or not: |
Script execution is a fundamental feature requirement for automation, and is a protocol feature that allows clients to implement many higher-level automation features before we have explicit support (i.e. anything that is accessible to content script).
In terms of precedent there are several examples to consider:
WebDriver has endpoints for executing a script both synchronously and asynchronously (although both calls are blokcing). In both cases the script is wrapped inside an implied function. In the async case one of the arguments to that function is a callback that is called with the return value. This allows use cases like adding a script that returns when the page recieves an event, but running the callback in the context of the event handler.
CDP (and, reportedly, the WebInspector protocol) has an Runtime.evaluate endpoint. This evaluates a script directly in the realm ("execution context") provided in the call, and returns the completion value. Async behaviour can be modelled by setting up a promise and using Runtime.awaitPromise to provide a callback once a condition is met.
WebExtensions have content injected scripts (e.g. gecko docs. These are run in the content process, but in a sandbox ("isolated world") so that they get access to the page without modifications to globals made by in-page scripts. They also provide unique DOM APIs to allow access to extension-related functionality and to postMessage data back to the main extension script.
The unique ability of extensions to return arbitary data at a time of their choosing seems very valuable for automation. For example it would give the ability to send an event every time a mutation observer is called. It also allows modelling the behaviour of WebDriver/HTTP and CDP; to model the behaviour of execute script, the provided script text would be wrapped in a function like:
and execute async script would be similar but with the
WebDriver.emit
function passed in as the final argument. To match the script execution to the returned value, we could return a token in the initial script response which would later be provided in any the output of emit calls originating in that script. I'm not sure how to model that in IDL, but in principle it's just a requirement that when script is injected the interface is instantiated with the token as internal data that the exposed API can access.Another question is whether injected script should run directly in the targeted realm, or should be sandboxed as extensions are. In gecko we already partially sandbox WebDriver scripts in a way that allows them to access functions on the page, but does not allow globals to be set by WebDriver and read by the page. Not having any access to page-set state seems problematic, particuarly for inspecting the state of objects. But some ability to sandbox scripts so they are able to access the initial values of DOM properties rather than page-set overrides does also seem useful.
The text was updated successfully, but these errors were encountered: