Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a module for executing script #63

Closed
jgraham opened this issue Oct 27, 2020 · 12 comments
Closed

Add a module for executing script #63

jgraham opened this issue Oct 27, 2020 · 12 comments

Comments

@jgraham
Copy link
Member

jgraham commented Oct 27, 2020

Script execution is a fundamental feature requirement for automation, and is a protocol feature that allows clients to implement many higher-level automation features before we have explicit support (i.e. anything that is accessible to content script).

In terms of precedent there are several examples to consider:

  • WebDriver has endpoints for executing a script both synchronously and asynchronously (although both calls are blokcing). In both cases the script is wrapped inside an implied function. In the async case one of the arguments to that function is a callback that is called with the return value. This allows use cases like adding a script that returns when the page recieves an event, but running the callback in the context of the event handler.

  • CDP (and, reportedly, the WebInspector protocol) has an Runtime.evaluate endpoint. This evaluates a script directly in the realm ("execution context") provided in the call, and returns the completion value. Async behaviour can be modelled by setting up a promise and using Runtime.awaitPromise to provide a callback once a condition is met.

  • WebExtensions have content injected scripts (e.g. gecko docs. These are run in the content process, but in a sandbox ("isolated world") so that they get access to the page without modifications to globals made by in-page scripts. They also provide unique DOM APIs to allow access to extension-related functionality and to postMessage data back to the main extension script.

The unique ability of extensions to return arbitary data at a time of their choosing seems very valuable for automation. For example it would give the ability to send an event every time a mutation observer is called. It also allows modelling the behaviour of WebDriver/HTTP and CDP; to model the behaviour of execute script, the provided script text would be wrapped in a function like:

let result = function (${args}) {
  $(script)
}
WebDriver.emit(result)

and execute async script would be similar but with the WebDriver.emit function passed in as the final argument. To match the script execution to the returned value, we could return a token in the initial script response which would later be provided in any the output of emit calls originating in that script. I'm not sure how to model that in IDL, but in principle it's just a requirement that when script is injected the interface is instantiated with the token as internal data that the exposed API can access.

Another question is whether injected script should run directly in the targeted realm, or should be sandboxed as extensions are. In gecko we already partially sandbox WebDriver scripts in a way that allows them to access functions on the page, but does not allow globals to be set by WebDriver and read by the page. Not having any access to page-set state seems problematic, particuarly for inspecting the state of objects. But some ability to sandbox scripts so they are able to access the initial values of DOM properties rather than page-set overrides does also seem useful.

@jgraham
Copy link
Member Author

jgraham commented May 12, 2021

Relevent use cases:

  • Evaluate a function against the page DOM e.g. execute a selector and return a handle for the matching objects, or get the text in an element.
  • Overwrite functions in the page DOM (e.g. to make random number generation deterministic for tests)
  • Wait for a specific event or set of events to occur in the page to be sure it's reached a stable state for running further tests (e.g. implementing wait-for-element-to-appear).

Best practice is that scripts which don't need to directly interact with the page should be put in a sandbox, which gives them access to the unmodified DOM (c.f. createIsolatedWorld from CDP, or Web Extensions content scripts). So the following seems like the minimum function set:

  • Ability to execute a script in the scope of the page (similar to a <script> element).
  • Ability to defer return of that script (needed for WebDriver-classic executeAsyncScript)
  • Ability to create a sandbox and execute script in that. The sandbox must give access to the unmodified DOM. Implementations have this, but the details are different. We probably don't have the ability to align on details here, so might need to document the shared properties of the sandbox that can safely be relied on.
  • Ability to run (sandboxed and unsandboxed)? scripts durning navigation, before any content scripts are run.

I think using CDP as a touchstone, Runtime.evaluate provides the ability to execute script in a context, and the awaitPromise argument is sufficient for defering the return; that would allow reimplementing the WebDriver executeAsyncScript semantics with the promise resolve function acting as the callback argument. Runtime.callFunctionOn seems to be required if you want to pass data from the client into the browser. Page.createIsolatedWorld creates a sandbox that can later be referenced with an execution context (realm) id. Page.addScriptToEvaluateOnNewDocument is used for adding scripts that are evaluated early in the load, before content scripts have run. This can use a sandbox if required, and returns a handle that can later be used to remove the scripts. This appears to apply to all pages in a specific target?

For communicating back from the script to the document, CDP seems to have two options:

  • The return value of the evaluated script/function
  • The Runtime.addBinding function which exposes a function in one/all execution contexts which, when called generates a Runtime.bindingCalled event with a name/(string) payload/context id. This means this can't easily be used to pass objects back.

By contrast WebExtensions content scripts work quite like sandboxed CDP scripts, but also have a Extension-specific API exposed in the global scope, including a postMessage API for communicating with the extension script. This kind of API for WebDriver scripts, including a way to emit events with custom data, seems like a more flexible approach, but it's not clear either a) how to specify such a thing or b) what implementation complexity is involved.

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed Script execution.

The full IRC log of that discussion <AutomatedTester> topic: Script execution
<AutomatedTester> github: https://github.com//issues/63
<AutomatedTester> jgraham: After we have something for navigation the next item on the list is script execution
<AutomatedTester> ... as it can open up the rest of what we can do
<AutomatedTester> ... what are the requirements around this.
<AutomatedTester> q+
<AutomatedTester> jgraham: I have written up in the issue some of the relevant things
<AutomatedTester> ... e.g. selectors against the DOM
<AutomatedTester> ... and being able to overload APIs on the page e.g. Random giving a response that isn't random
<AutomatedTester> ... and so on
<AutomatedTester> ... the next point is there are 2 points here on where we can execute script
<AutomatedTester> ... this can be on the page/worker
<AutomatedTester> ... and the 2nd one is sandboxing
<AutomatedTester> ... so that clients can install there own helpers that doesn't pollute the page
<simonstewart> q+
<AutomatedTester> ... we want it to be a lot like executeAsyncScript from webdriver but with more features
<AutomatedTester> ... and do we want to have it like using `arguments` or just execute as is
<brwalder> q+
<AutomatedTester> ... and the sandboxing seems really cool
<AutomatedTester> ... and do we want to have something like webextensions where we limit the surface and allow people to inject using something like a `postMessage`
<AutomatedTester> ack AutomatedTester
<AutomatedTester> automatedtester: One of the things that might be good is that Jason Leyba, a few years ago, suggesting moving webdriver execute script to being more promised based
<jgraham> s/limit the surface/have a dedicated WebDriver DOM API surface/
<AutomatedTester> jgraham: the way that CDP works now is that it has an `awaitPromise` type event
<AutomatedTester> ... and we can do something like that
<AutomatedTester> q?
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart: re: arguments... as long as we can reformulate webdriver on top then its fine. Ideally we want to update things in the simplest ways
<AutomatedTester> jgraham: [describes
<AutomatedTester> ... how call function works in CDP with arguments]
<AutomatedTester> ack brwalder
<AutomatedTester> brwalder: re: CDP something like `addBinding` or webextensions
<AutomatedTester> ... then we would have more network roundtrips
<AutomatedTester> ... where if we wnt for webextensions with a more `postMessage` structure then we can save a round trip as it can post straight away
<AutomatedTester> ... and if it were the `postMessage` then it would simplify sharing snippets of code
<AutomatedTester> ... it might be more work but it would be easier for developers and offer more functionalit
<AutomatedTester> q?

@foolip
Copy link
Member

foolip commented May 13, 2021

#63 (comment) is a great writeup, thanks @jgraham!

Parts of this seem potentially out of scope for a script execution module. In particular, finding elements by selector in principle doesn't involve scripts at all. But it is possible to implement using scripts, so maybe we start there, but without actually writing anything about selectors into the spec?

I the meeting notes, @shs96c said:

as long as we can reformulate webdriver on top then its fine. Ideally we want to update things in the simplest ways

This resonates with me. It sounds like the direction things are going is to have an awaitPromise argument, and https://w3c.github.io/webdriver/#execute-async-script is already defined in terms of promises, so it should end up being compatible.

The idea to have a communication channel postMessage between the evaluated script and the client also strongly resonates with me. When writing some things with Puppeteer, I've actually used console.log as a page→script message channel, in the absence of anything better.

I think there are probably lots of ways this could be achieved, and don't have any well thought out ideas :)

@bwalderman
Copy link
Contributor

For providing postMessage functionality, we had previously discussed using something with the same interface as a MessagePort, but serializing message data with the WebDriver BiDi serialization algorithm instead of structured clone.

@foolip
Copy link
Member

foolip commented May 13, 2021

Is the idea that this MessagePort would be bidirectional so that the client could also send things back in some way without invoking evaluateScript again?

Whatever the API shape, a stream of arbitrary data from the page to the client sounds good.

@bwalderman
Copy link
Contributor

Yeah, this would be bidirectional. So the script would have access to a MessagePort-like object with a postMessage method, and an onmessage event. The BiDi client would have a command to send a JS value to script, which invokes the MessagePort's onmessage event. There would also be a BiDi event that is invoked when the MessagePort.postMessage function is called. The command and the event would need to include the Realm ID or something similar to identify where the message is going or where it came from.

@jgraham
Copy link
Member Author

jgraham commented Sep 8, 2021

One detail question for the purposes of discussion: do we have an agreed approach for sum types in the protocol? For script execution in an existing realm (i.e. ignoring sandboxes for now), I want it to be possible to specify either a context or a realm id. A use case here is "get a browsing context, and execute script in the realm associated with the active document of that context, without needing a seperate realm lookup command". So the proposal I have locally is something like:

ScriptEvaluateParameters = {
  expression: text,
  awaitPromise: bool,
  target: Target
}

Target = (
  ContextTarget //
  RealmTarget
)

ContextTarget = {
  context: Context
}

RealmTarget = {
  realm: Realm
}

So in this proposal we identify sum types by objects whose keys tell us the type name. But other options are possible e.g.

ContextTarget = {
  type: "context",
  id: Context
}

RealmTarget = {
  type: "realm",
  id: Realm
}

Or in theory one could just use the fact that realm ids and context ids are non-overlapping and not explicitly distinguish the types, but that doesn't strike me as very pleasant.

Also, how do we feel about "target" as a name here? Other options, like context_or_realm are possible.

@jgraham
Copy link
Member Author

jgraham commented Sep 8, 2021

Another question: errors.

In WebDriver, if a script throws a exception we return an error response which has a code, but where the fields are mostly left up to the user agent (amusingly the spec tries to set an non-existent "data" property on the error to the value of the thrown exception). In CDP an exception is a normal response, but with an exceptionDetails field that contains an ExceptionDetails object.

So we need to choose which model to follow here. Do we return a normal response but with details of the exception, or do we extend the error response to allow it to contain more type-specific error data?

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed Script execution.

The full IRC log of that discussion <AutomatedTester> Topic: Script execution
<AutomatedTester> github: https://github.com//issues/63
<AutomatedTester> jgraham: I have started looking at the spec text here to do the minimum
<AutomatedTester> ... I have 3 questions. 2 are in the last comment of the issue linked
<AutomatedTester> ... 1. Do we want to keep the distiction between evaluate and callFunctionOn?
<AutomatedTester> ... I think we should keep the distinctions
<AutomatedTester> ... 2. We want to be able to specify the target of where the script could run. Like the realm or the browser context. There is a question in the PR on what should this look like. It could be a real or a browser context and I have some suggestions there
<AutomatedTester> ... 3. How do we want errors to be handled?
<AutomatedTester> ... There is a difference between webdriver http and bidi. Webdriver http exxpects the UA to populate it with info
<AutomatedTester> ... in CDP it returns a success and then gives you a lot more detailed info
<AutomatedTester> q?
<simonstewart> q+
<AutomatedTester> ack simonstewart
<AutomatedTester> simonstewart: are you saying in CDP you cant tell if there was a success or failing?
<foolip> q+
<AutomatedTester> jgraham: I've only read the docs but it says that you get a success return but an exception field will be populated
<AutomatedTester> simonstewart: as long as there as a mechanism to differentiate then it's fine since we can do that on the local end
<AutomatedTester> ack foolip
<AutomatedTester> foolip: I want to make sure I understnad the decision. Do we want to return an error at a protocol level or do we just populate a field?
<simonstewart> q+
<AutomatedTester> jgraham: The question comes from CDP doing something very different to webdriver. If we want to follow webdriver we would want to have to a mechanism to create more richer error types
<AutomatedTester> ... but CDP is there and we can follow that and allow clients to easily move over to it
<foolip> q?
<AutomatedTester> ... and not everything that happens in the client should be returned as a protocol error
<AutomatedTester> ack simonstewart
<foolip> sadym, do you know off hand what Puppeteer does here? Does it turn script exceptions into a rejected promise?
<AutomatedTester> simonstewart: what ever we do we can always make http sit on top of bidi
<AutomatedTester> ... so we want send error types and CDP allows us to send union types
<AutomatedTester> ... I don't have a strong opinion here
<AutomatedTester> q?
<foolip> q+
<AutomatedTester> ack foolip
<AutomatedTester> foolip: If it helps I can move to webdriver http and just do what happens there.
<whimboo> foolip: https://github.com/puppeteer/puppeteer/blob/main/src/common/ExecutionContext.ts#L253-L275
<AutomatedTester> jgraham: I might look at what existing clients do now and then go from there
<foolip> thanks whimboo!
<AutomatedTester> github: end topic

@sadym-chromium
Copy link
Contributor

It seems logical not to mix the protocol and the script errors, like in CDP. But I'm not sure if we agreed on this or not:
"Consider exception is a normal thing for scripts, and return a success result with error details, as from the protocol perspective all went well."

@jgraham
Copy link
Member Author

jgraham commented Nov 10, 2021

I've split off the sandbox-related stuff into #144. I think we should close this one once #140 lands.

@sadym-chromium
Copy link
Contributor

#140 landed in #142, so the issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants