Define session creation without HTTP driver #97

Closed
jgraham opened this issue Mar 30, 2021 · 5 comments
Labels
enhancement New feature or request session Session module

Comments

@jgraham
Member

jgraham commented Mar 30, 2021

Some implementations want to support connecting without requiring an initial HTTP request. This is required for feature parity with CDP-based clients, e.g. Puppeteer, that are able to establish a connection to the browser directly without going through a separate driver binary.

The following discussion will assume a WebSockets based transport, but the same issues would apply to an implementation that wanted to allow connections over e.g. a unix pipe without the initial HTTP handshake.

Currently once a session is created, the HTTP layer returns a websocket url of the form ws://localhost:<port>/session/<sessionid> for the client to connect to. Since this requires the session id to be known it's clear that this doesn't work well for establishing a session directly over WebSockets. An obvious implementation would be to allow connecting to ws://localhost:<port>/session and defining a command like

SessionNewCommand = {
  method: "session.new",
  params: SessionNewParameters
}

SessionNewParameters = {
  ? alwaysMatch: Capabilities,
  ? firstMatch: [*Capabilities],
}

Capabilities = {
  *text: any
}

Then, if you send this command when there's no existing session it would create a session and return a response with the matched capabilities and otherwise it would error.
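
The behaviour described above can be sketched roughly as follows. This is only a toy model of a single-session remote end, not spec text: the `RemoteEnd` class, the `sessionId` response field, the reuse of the "session not created" error code, and the simplified capability handling (merging `alwaysMatch` with the first `firstMatch` entry instead of real matching) are all assumptions made for illustration.

```python
import uuid


class SessionError(Exception):
    """Error response carrying a WebDriver-style error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class RemoteEnd:
    """Hypothetical remote end that supports at most one session."""

    def __init__(self):
        self.session_id = None

    def handle(self, command):
        if command["method"] != "session.new":
            raise SessionError("unknown command", command["method"])
        if self.session_id is not None:
            # Assumption: an existing session is reported as "session not created".
            raise SessionError("session not created", "a session already exists")
        params = command.get("params", {})
        always = params.get("alwaysMatch", {})
        # Simplification: take the first firstMatch entry instead of real matching.
        first = (params.get("firstMatch") or [{}])[0]
        self.session_id = str(uuid.uuid4())
        return {
            "sessionId": self.session_id,
            "capabilities": {**always, **first},
        }
```

Sending `session.new` a second time to the same `RemoteEnd` raises `SessionError`, which matches the "otherwise it would error" case.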

One question is whether the session itself should reuse the connection; it would be the "wrong" resource since the session id wouldn't be in the path. But I don't immediately see a practical problem with reusing it in this case, and the alternative would require the client to establish a new connection which adds latency.

One wart might be if you want to allow reconnecting to the session once it dropped; in this case it might be necessary to connect to the URL including the session id (to account for nodes accepting multiple sessions). That is also probably sufficient reason not to change the spec to make /session the only supported ws resource; in a node that supports multiple sessions that would make it hard to work out which session to reconnect to.

Another question with this setup is how to communicate the ws port to the local end. This is analogous to the problem of how to communicate the HTTP server address to local ends, and is usually solved either by putting the local end in control and allowing the client to select the address through remote-specific options (with some risk of races) or by communicating the address back through stdout of the client.

@whimboo
Contributor

whimboo commented Apr 8, 2021

> Some implementations want to support connecting without requiring an initial HTTP request. This is required for feature parity with CDP-based clients, e.g. Puppeteer, that are able to establish a connection to the browser directly without going through a separate driver binary.

> The following discussion will assume a WebSockets based transport, but the same issues would apply to an implementation that wanted to allow connections over e.g. a unix pipe without the initial HTTP handshake.

Reading Google's design document on how the BiDi handler will be implemented, there seems to be a strong preference for using pipes only instead of a websocket connection. I think it would be good to get some feedback from Google and Microsoft here. Maybe @foolip and @bwalderman could give some insights? Our current CDP implementation in Firefox does not use pipes, so we would have to add support for them, which might require additional platform work.

> One question is whether the session itself should reuse the connection; it would be the "wrong" resource since the session id wouldn't be in the path. But I don't immediately see a practical problem with reusing it in this case, and the alternative would require the client to establish a new connection which adds latency.

The connection that was created for the client would remain open and could automatically be attached to the WebDriver session created via the /session endpoint. My current proof of concept for Firefox works fine that way.

> One wart might be if you want to allow reconnecting to the session once it dropped; in this case it might be necessary to connect to the URL including the session id (to account for nodes accepting multiple sessions). That is also probably sufficient reason not to change the spec to make /session the only supported ws resource; in a node that supports multiple sessions that would make it hard to work out which session to reconnect to.

Yes, if a client wants to reconnect to a WebDriver session initiated this way, it would have to know the session id. Given that, as with the former way of creating sessions, the session has a listener running on /session/%uuid%, the new connection attempt could be correctly assigned to the existing session. Using an unknown session id should fail with an invalid session id error. Trying to connect to /session again should also fail, as already described above.
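
To make the routing concrete, here is a rough sketch of how a remote end might resolve an incoming WebSocket resource path. It assumes a remote end that allows only one session at a time, so a fresh connection to /session is rejected while a session exists. The function name `route` and the exception classes are made up for this example and do not come from the spec.

```python
class InvalidSessionId(Exception):
    """Raised when the path names a session the remote end does not know."""


class SessionNotCreated(Exception):
    """Raised when /session is requested while a session already exists."""


def route(path, sessions):
    """Resolve a WebSocket resource path against the set of known session ids.

    Returns None when the connection may be used to create a new session,
    or the session id when the connection re-attaches to an existing one.
    """
    parts = [p for p in path.split("/") if p]
    if parts == ["session"]:
        # Assumption: single-session remote end, so /session is only
        # usable while no session exists.
        if sessions:
            raise SessionNotCreated("a session already exists")
        return None
    if len(parts) == 2 and parts[0] == "session":
        if parts[1] not in sessions:
            raise InvalidSessionId(parts[1])
        return parts[1]
    raise ValueError(f"unsupported resource: {path}")
```

A node that accepts multiple sessions would drop the check in the /session branch, but reconnection would still need the id in the path.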

> Another question with this setup is how to communicate the ws port to the local end. This is analogous to the problem of how to communicate the HTTP server address to local ends, and is usually solved either by putting the local end in control and allowing the client to select the address through remote-specific options (with some risk of races) or by communicating the address back through stdout of the client.

I think this also depends on the above question regarding the usage of Pipes. Without them it might still have to be printed to stderr / stdout.

@jgraham
Member Author

jgraham commented Apr 8, 2021

I don't think the websockets-vs-pipes thing makes a big difference here. There will need to be an out-of-band way to communicate the entrypoint for communication irrespective of whether that's a file handle, a ws address or something else. The same problem exists for the HTTP drivers; you have to communicate the address of the HTTP server out of band.
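
As one possible shape for the out-of-band step when the entrypoint is a ws address, a local end could scan the output of the process it launched for a WebSocket URL. The sketch below assumes the URL appears somewhere in a line of output; the helper name `find_ws_endpoint` and the line format in the usage note are hypothetical, since nothing here defines what would actually be printed.

```python
import re

# Matches a ws:// or wss:// URL up to the next whitespace character.
WS_URL = re.compile(r"wss?://\S+")


def find_ws_endpoint(lines):
    """Return the first WebSocket URL found in the given output lines, or None."""
    for line in lines:
        match = WS_URL.search(line)
        if match:
            return match.group(0)
    return None
```

For example, given the made-up output line `listening on ws://127.0.0.1:9222/session`, the helper returns `ws://127.0.0.1:9222/session`. A pipe-based transport would need a different mechanism, such as passing file handles at launch.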

@whimboo
Contributor

whimboo commented Apr 8, 2021

I had a look at the Puppeteer source code and as of right now they have both the WebSocket and Pipe connections implemented, whereby Pipe isn't used by default.

https://github.com/puppeteer/puppeteer/blob/943477cc1eb4b129870142873b3554737d5ef252/src/node/LaunchOptions.ts#L99-L103

Also CC'ing @mathiasbynens.

@bwalderman
Contributor

The Chromium BiDi handler design document proposes using pipes but it could just as easily be implemented over websockets, or something else. The transport mechanism isn't particularly important. One thing I do want to call out though is that the Chromium in-browser BiDi handler in this design wouldn't support the notion of multiple sessions.

One of the benefits of establishing a direct connection to the browser, as previously mentioned, is that clients would not need an intermediate driver binary. However, in Chromium at least, for this idea to work, the browser process first needs to be launched out-of-band, and the connection would implicitly be associated with a single session. In Chromium, there is a 1-to-1 relationship between a WebDriver session and a browser process. So in Chromium, to support the creation of multiple sessions over some kind of connection (whether that's a websocket or a unix pipe), the thing servicing the connection would have to be something apart from the browser process(es) which can launch the browser processes, and associate them with session IDs. Which essentially means we're back to having a driver binary.

So the BiDi handler design leaves session management up to its client (i.e. ChromeDriver, or Puppeteer). The client establishes a direct pipe connection to a browser process and all traffic going over that connection implicitly belongs to a single WebDriver session. This design is not being proposed as a replacement for establishing a websocket connection as defined in the spec. It's meant more as an internal implementation detail for ChromeDriver and Puppeteer. ChromeDriver would still maintain a mapping of session IDs to browser connections and would serve a spec-compliant websocket resource for each session so that ChromeDriver users can establish a WebDriver session through HTTP and then connect to a websocket as described in the spec.
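
Purely as an illustration of that split, a driver-side mapping of session IDs to browser connections could look something like the sketch below. It is not ChromeDriver code: `SessionRegistry` and the `launch_browser` callable (which stands in for launching a browser process and opening its pipe) are assumptions made for the example.

```python
import uuid


class SessionRegistry:
    """Maps session ids to browser connections, one browser per session."""

    def __init__(self, launch_browser):
        # launch_browser is a hypothetical callable that starts a browser
        # process and returns an object representing the connection to it.
        self._launch_browser = launch_browser
        self._connections = {}

    def create_session(self):
        """Launch a dedicated browser and register it under a fresh session id."""
        session_id = str(uuid.uuid4())
        self._connections[session_id] = self._launch_browser()
        return session_id

    def connection_for(self, session_id):
        """Return the browser connection that belongs to the given session."""
        try:
            return self._connections[session_id]
        except KeyError:
            raise LookupError(f"invalid session id: {session_id}") from None

    def ws_resource(self, session_id):
        """Return the per-session websocket resource path for a known session."""
        self.connection_for(session_id)  # raises LookupError for unknown ids
        return f"/session/{session_id}"
```

Each call to `create_session` launches its own browser, which mirrors the 1-to-1 relationship between a session and a browser process described above.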

@whimboo
Contributor

whimboo commented Oct 6, 2021

This actually landed via PR #99.

@whimboo whimboo closed this as completed Oct 6, 2021
@whimboo whimboo added enhancement New feature or request session Session module labels Oct 6, 2021