Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify sandboxed script execution #144

Open
jgraham opened this issue Nov 9, 2021 · 17 comments
Open

Specify sandboxed script execution #144

jgraham opened this issue Nov 9, 2021 · 17 comments

Comments

@jgraham
Copy link
Member

jgraham commented Nov 9, 2021

Being able to run WebDriver scripts in an environment that's isolated from the effect of content scripts is a commonly used feature in existing frameworks. For example Puppeteer creates a named sandbox whenever a frame is created, and uses that for JS implementation of the client API. PlayWright has a simplar concept of a "utility context" used to evaluate JS as required for the client API.

Underneath, these are using the https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-createIsolatedWorld CDP method. However there's also https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument which allowed creating this sandboxed environment automatically when a page loads (it's unclear to me whether the same sandbox is reused if you later call createIsolatedWorld with the same worldName parameter).

Inside an isolated environment we should have access to the context's WindowProxy object via the window global, but unlike in a normal WindowGlobal:

  • window should not be the global object
  • The DOM accessible via window.document should be a view which shares the internal state (information stored in internal slots or otherwise in spec internal properties) with the DOM in the corresponding WindowGlobal, but should not share user-set state (e.g. properties only set via js).

I think this is broadly correct, but may have got some details wrong. In addition current browser disagree on details (I believe the Blink constructs a fresh JS wrapper for the DOM in each isolate whereas Gecko uses Xrays, which also affect access to JS objects). Even glossing browser differences, gettting the details right here is likely to be a significant challenge at the specification level, since it exposes some of the unspecified difference between the internal ("C++") state and the state as seen through ECMAScript. The good news is that Web Extensions is likely to have broadly the same requirements for Content Scripts.

As a result I hope we can start the specification process with an API that defines the relevant surface area in the protocol and fill in some of the details about exactly how the DOM access works later. Practically that creates some risk of people creating scripts that depend on implementation specifics. But practically some per-implementation support is likely to be required in clients and if we can create clear notes about what works in practice in different implementations that should allow us to progress on WebDriver without needing to hold up progress until we can specify and fully align on the underlying sandbox semantics.

Sandbox IDL

An initial strawman proposal for the environment inside the wrapper:

interface SandboxGlobalScope {
  readonly attribute SandboxGlobalScope self;
  readonly attribute WindowProxyWrapper window;
  postMessage(any message);

WindowProxyWrapper is the type that gives the access to the associated WindowProxy, but exposing the internal state rather than the modified DOM state.

postMessage will emit a script.message event with the remote value sereialization of message. This provides an API for creating arbitary events from JS.

Presumably object handles from the corresponding browsing context should be resolvable inside the SandboxGlobalScope (i.e. if you evaluate document.body in the browsing context realm, and get back an object like {type: "node", objectId: "1", value: {…}} it should be possble to script.callFunction() in a sandbox for that browsing context and pass {objectId: "1"} as an argument and have it resolve correctly).

External API

In CDP, one calls createIsolatedWorld to create a sandbox. This returns a realm id. The sandbox can also be given a "worldName" which seems to be used informatively, but is otherwise not taken as a parameter in other commands (execpt Runtime.addBinding The advantage of this model is that sandbox creation is explicit, and scripting contexts can always just be identified by a realm id. However it means that addScriptToEvaluateOnNewDocument ends up with a different model where the sandbox name is supplied for each script.

An alternative would be to just allow creating sandboxes by name as part of evaluate and callFunction and other commands. In particular we could change the definition of the script.Target type so that ContextTarget = {context: ContextId, ?sandbox: text} i.e. optionally allow providing a sandbox name when specifying a context to execute script in. This has the disadvantage that a typo in the name could unexpectedly cause a new sandbox to be created. We'd also want to change the ScriptEvaluateResult type to provide the id of the realm in which the script executed and the sandbox name (and probably the ReamCreated event to also include the sandbox name where it exists). This would allow event-listening code to identify events originating in a sandbox, which seems important for clients to isolate their internal script execution from the user-facing API.

@jgraham
Copy link
Member Author

jgraham commented Nov 10, 2021

So I think that implicit sandbox creation when specifying a context target with a sandbox name is probably a good idea. But I wonder if there's also transition value in providing an explicit creation method so that we'd have a ~drop in replacement for people using createIsolatedWorld.

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed sandboxed script execution.

The full IRC log of that discussion <AutomatedTester> topic: sandboxed script execution
<AutomatedTester> github: https://github.com//issues/144
<AutomatedTester> jgraham: With the script.eval and script.call nearly ready to ship
<AutomatedTester> ... however some clients don't want to do this
<AutomatedTester> ... because they can call things in the page
<AutomatedTester> ... what they do instead is use a script sandbox to do this
<AutomatedTester> ... they get a very filtered "window" object
<AutomatedTester> ... so looking at this we probably want to get this implemented as soon as possible to help the playwright and puppeteer folks
<AutomatedTester> ... there are few questions in the issue that I could use some help wit
<AutomatedTester> s/wit/with/
<AutomatedTester> ... the way this works in CDP is that you can use the `createIsolatedWorld` API
<AutomatedTester> ... and you get access to the dom via the `window` object
<AutomatedTester> ... we need to think about how `onLoadRunJS` type APIs would work
<AutomatedTester> ... we could use what CDP does here and have 2+ APIs or we could figure out how to do it implicitly
<AutomatedTester> q?
<AutomatedTester> ... There is a rough strawman in the issue and could be a bit of a deep question
<AutomatedTester> ... so for now can we discuss what the protocol level API would look like
<foolip> q+
<AutomatedTester> q?
<AutomatedTester> ack foolip
<AutomatedTester> foolip: are you saying that his is a readonly thing?
<AutomatedTester> ... can the script execution set properties?
<AutomatedTester> jgraham: I don't think it's readonly
<AutomatedTester> ... if you update the internal state of the DOM then it will be updated
<AutomatedTester> ... if you update a JS property then it won't necessarily be updated
<AutomatedTester> ... my understanding of how this works in blink is <describes technical section>
<AutomatedTester> ... however in gecko it's different
<AutomatedTester> ... I've not dug into the details that are observable differences
<AutomatedTester> ... I need to go back and look into it again
<AutomatedTester> foolip: are we trying to figure out what we should do or...
<AutomatedTester> jgraham: I am expecting that we will have implementation differences here
<AutomatedTester> ... and there are very few clients that we can hopefull solve things
<AutomatedTester> ... we are probably going to need to start with something sketchy and then build it up
<AutomatedTester> ... it's trying to work things that don't really exist to us
<AutomatedTester> ... fortunately there are some similarities with the webextensions groups here
<foolip> q+
<AutomatedTester> ack foolip
<AutomatedTester> foolip: that all makes sense to me
<AutomatedTester> ... and since webextensions already depend on it that we can converge here
<AutomatedTester> ... maybe there is something we can do
<AutomatedTester> q?
<whimboo> q+
<AutomatedTester> ack whimboo
<AutomatedTester> whimboo: when were implementing evaulate in the firefox cdp code we had issues around CSP
<AutomatedTester> ... has your issue taken that into account
<AutomatedTester> jgraham: I haven't looked into it just yet but I think we should be fine
<AutomatedTester> ... We should probably look at making a note that CSP should not block the execute script in this area
<AutomatedTester> ... I assume this is how CDP already works
<AutomatedTester> q?
<AutomatedTester> q?
<AutomatedTester> github: end topic
<jimevans> jgraham: not bidi related, but i've patched a hole in the webdriver html spec and would like a review (wpt test is forthcoming).

@jgraham
Copy link
Member Author

jgraham commented Nov 10, 2021

https://chat.mozilla.org/#/room/#fission:mozilla.org/$n-JVPWmHnODFMHH-csxoZRo5LZZErLgoGRQ-kHJp5Iw has some discussion of the differences between the Blink and Gecko models here.

@foolip
Copy link
Member

foolip commented Nov 18, 2021

Regarding postMessage and CDP's Runtime.addBinding, should we have something like Runtime.addBinding in BiDi that could also be used to create a message channel between the original script execution context and the BiDi client? postMessage is in a way a more specialized form of Runtime.addBinding, that only works in sandboxes.

Disclaimer: I'm a novice to some of this, high risk that I'm misunderstanding.

@sadym-chromium
Copy link
Contributor

I'm not sure yet, what is the SandboxGlobalScope for CDP Page.createIsolatedWorld is, but I'd assume it creates the new instance of Window wrapper. If so, it would be quite difficult to implement custom global for sandbox.

Regarding postMessage, we can add an optional parameter bindings to provide postMessage or any other bindings user likes.

@jgraham
Copy link
Member Author

jgraham commented Nov 19, 2021

Note that Web Extensions content scripts already use the same sandbox/isolated world concept and provide additional APIs on the global object, so I'm not convinced it's impossible to do that. But maybe the global is itself the SandboxWindowProxy or whatever we call it (i.e. the same object as window). I should check.

@jgraham
Copy link
Member Author

jgraham commented Nov 19, 2021

Regarding postMessage and CDP's Runtime.addBinding, should we have something like Runtime.addBinding in BiDi that could also be used to create a message channel between the original script execution context and the BiDi client? postMessage is in a way a more specialized form of Runtime.addBinding, that only works in sandboxes.

Well they're a bit different. addBinding creates a named function in one or more globals. Presumably the function is them visible to content scripts, so you have to be careful to avoid a name clash. It also only takes a string argument, so you don't get any protocol level serialization of argument values. In practice Puppeteer seems to use this in content scripts to implement an "evaluate in client" feature (i.e. to allow calling a client defined function which does something outside the browser sandbox and and passes the result back to the browser).

It's kind of unclear to me that setup where the code running in the page is directly calling a global binding is a good idea. For a similar use case I think I'd define something like

WebDriverMessagePort {
    postMessage(any message)
}

which wouldn't be by default exposed in any global, but would be seriazable over the protocol as something like {type: "webdriverport", value: {id: <someId>}}. Then you could do script.callFunction((port) => {/*do whatever with the port */}, args=[{type: "webdriverport", value: {id: 1}}])which would allow you to do whatever you want with the port, including create a function on the global that forwards its argument to the port, which would basically re-implement addBinding. We'd still need the ability to automatically execute scripts during the initial page load, but I think we want that anyway.

@jgraham
Copy link
Member Author

jgraham commented Nov 19, 2021

So, regarding the global, it does seem like the global should be the SandboxWindowProxy or whatever we're calling it. So perhaps it makes sense to put the WebDriver API in a webdriver object i.e. we'd have webdriver.postMessage since otherwise it conflicts with Window.postMessage. Or the other option would be to avoid having any special APIs in the global scope and only do the "pass in a WebDriverMessagePort" suggestion above. But that seems less convenient for users.

@sadym-chromium
Copy link
Contributor

I like the idea: instead of adding global bindings, pass bindings as deserializable arguments to script.callFunction. What I don't understand is why do we need the whole new interface for that? The argument can be something like:

{
  type: "binding", 
  value: "SOME_BINDING_NAME"
}

, which will be deserialized to a callback. Calling that callback with any arguments will cause the BiDi event with those arguments like:

{
  method: "script.bindingCalled",
  params: {
    arguments: [... remote values ...],
    name: "SOME_BINDING_NAME"
  }
}

@sadym-chromium
Copy link
Contributor

moved add binding discussion to another issue: #157

@jgraham
Copy link
Member Author

jgraham commented Dec 8, 2021

I've created an extremely draft PR with the work I have here so far at #158.

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed Sandboxed Script Execution.

The full IRC log of that discussion <jgraham> Topic: Sandboxed Script Execution
<jgraham> ScribeNick: foolip
<foolip> jgraham: turn real talk into text
<jgraham> RRSAgent: create minutes
<RRSAgent> I have made the request to generate https://www.w3.org/2022/01/12-webdriver-minutes.html jgraham
<foolip> github: https://github.com//issues/144
<foolip> jgraham: We've talked about sandboxed script execution before. If you haven't paged it in, we want some way to run scripts in a way that you can access the page DOM, but the JS properties you set aren't visible to content scripts, and you're not affected by what the content script has done.
<foolip> jgraham: This is relevant for the ability to implement "find element" with JS without worrying about querySelector being overridden.
<foolip> jgraham: Browsers already have a capability like this. The existing primitives aren't the same cross-browser, but getting convergence will be tricky, so we'll probably want some non-normative description on what authors can rely on.
<foolip> jgraham: I've put up a draft PR at https://github.com//pull/158
<foolip> jgraham: the model is we have a target (realm or context ID) and in the case of a browsing context you get to specify a sandbox name. If it exists, use it, otherwise create a new one.
<foolip> jgraham: or you can specify the realm ID if you already know it.
<foolip> jgraham: This is different to CDP in that there's no explicit "create sandbox" command, although you can recreate it by running an empty sandbox script. It's a superset of CDP.
<foolip> foolip: what about this is especially complicated or worth considering tradeoffs carefully?
<foolip> jgraham: first, do people think this is an acceptable API surface? or should we have explicit creation like in CDP? the tradeoff here is you can make mistakes by making typos and create a new sandbox.
<foolip> jgraham: the actual API surface feels fairly straightforward though.
<foolip> jgraham: in the PR, there's just issues where you'd expect to understand how the sandbox stuff works.
<foolip> jgraham: I've put up https://github.com//pull/169 with some of the support we might need for this.
<foolip> foolip: does one PR depend on another?
<foolip> jgraham: https://github.com//pull/169 is the one to review, as the actual thing to land. #158 isn't ready for review.
<jgraham> q?
<foolip> q+
<foolip> ack
<jgraham> ack foolip
<foolip> foolip: I guess the upside of the API surface is you save a roundtrip and it's easier to use?
<foolip> jgraham: Yes. What Puppeteer does is create a sandbox upfront and use that for everything. Given that, why not just have that hardcoded as a sandbox name?
<foolip> foolip: is there any notion of creating or garbage collecting a sandbox?
<foolip> jgraham: the lifetime is tied to the window.
<foolip> foolip: sounds good
<jgraham> q?
<foolip> foolip: I'd suggest requesting review from folks on the PR (Brandon isn't here, for example)

@whimboo
Copy link
Contributor

whimboo commented May 2, 2022

@jgraham we have sandbox script execution already in the WebDriver BiDi spec:
https://w3c.github.io/webdriver-bidi/#sandbox

Mind listing what's left for this issue or if it can be closed? Same for PR #158. Thanks.

@jgraham
Copy link
Member Author

jgraham commented May 3, 2022

The API is specified, the underlying model is still rather unclear. We could close this issue and use another one for the underlying model, or we could just keep using this issue. I don't really mind which, but this at least has some discussion of the model.

@whimboo
Copy link
Contributor

whimboo commented May 3, 2022

I see. So I'm fine with leaving this open. Thanks for clarifying.

@css-meeting-bot
Copy link
Member

The Browser Testing and Tools Working Group just discussed Sandboxed script execution.

The full IRC log of that discussion <jgraham> Topic: Sandboxed script execution
<jgraham> GitHub: https://github.com//issues/144
<jgraham> q+
<orkon> whimboo: I added this because I went over old issues that were left open. It is not clear what is missing in this issue. I wonder what specifically required here to get this feature finished?
<jgraham> ack jgraham
<orkon> jgraham (IRC): basically in the spec we agreed about the name, sandbox. Details are missing but no one is implementing it from scratch. Semantics are not quite the same but it is not a priority
<orkon> jgraham (IRC): we can close the issue and say we specified it and we can have another issue about getting an actual definition done later. Or as a issue to track that future work in the same issue.
<jgraham> q?
<orkon> whimboo: on my side, we can close it and if there are things to add in the future. And then file another issue for this.
<sadym> + for creating new issue for details
<orkon> q+
<jgraham> ack orkon
<jgraham> orkon: I think we found an issue in the Firefox sandbox; if you delete a global object it will deleted in all sandboxes. I'd like this to be specified more. We can't investigate in detail.
<jgraham> orkon: This will cause a failing test in CDP-backed puppeteer in Firefox

@whimboo
Copy link
Contributor

whimboo commented Feb 9, 2023

@jgraham as per the meeting yesterday would you mind filing a new issue with all the details and close this one? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants