Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize ScriptTransform constructor to allow main-thread processing #89

Open
alvestrand opened this issue Mar 19, 2021 · 39 comments
Open

Comments

@alvestrand
Copy link
Contributor

The RTCRtpScriptTransform constructor takes a Worker argument, limiting the usage of this form of the transform to Workers.

The older createEncodedStreams() function was agnostic as to where the processing was going to take place; a number of existing demos and apps have been written that do processing on the main thread; some have even prototyped both worker-based and main-thread-based processing and deliberately chosen main-thread-based processing.

The normal use case should be worker - but other use cases should be possible.

Proposal: Change the argument type of the constructor from Worker to (Worker or MessagePort). Dispatch the event (which could then be a message) on either the worker's implicit port or the explicit MessagePort.

This allows all the use cases that the older API allowed, but ensures that the simplest code will be the one invoking a Worker.

@alvestrand
Copy link
Contributor Author

Adopting this change would remove my objection to landing #64.

@youennf
Copy link
Collaborator

youennf commented Mar 19, 2021

This allows all the use cases that the older API allowed

Can you give details on the applications/JavaScript code that cannot be migrated to RTCRtpScriptTransform?

@youennf
Copy link
Collaborator

youennf commented Mar 19, 2021

Also, MessagePort can potentially be used for out-of-process message exchanges, for instance between window and service worker, or opener/opened-window.
DedicatedWorker was used here to limit the complexity of the implementation while retaining most of the functionality.

@alvestrand
Copy link
Contributor Author

I first wanted to suggest Window as the alternate target, but Window's PostMessage() seems to have a different signature than workers' PostMessage (which is derived from MessagePort); I wanted to have a single API to relate to.

@alvestrand
Copy link
Contributor Author

alvestrand commented May 12, 2021

Given that streams are transferable, once we let them out, we don't really have any way to limit how much they spread by PostMessage(), so I don't see much implementation simplification by this restriction.

It seems useful to have only one mechanism for interprocess communication (PostMessage), not two (PostMessage and the special-case internal processing that creates an event).

This variant was also proposed as an API in the January interim (slide 20, https://docs.google.com/presentation/d/1crumgYj4eHkjo04faLktPTg0QoYJhTFoosEBudfJBuw/edit#slide=id.g7957cd038b_9_116), but wasn't further explored in the February interim (slide 34, https://docs.google.com/presentation/d/1-ZB5pjq0hTl4iNxGhIc3polH9wjCps2Uzqg7D4v1n14/edit#slide=id.gbdce051084_0_3)

The discussion in the linked issue ( #48 ) does not give any pro/con discussion of event versus message.

@youennf
Copy link
Collaborator

youennf commented May 12, 2021

so I don't see much implementation simplification by this restriction.

By restricting to Worker, we do not open the door to out-of-process transforms, this simplify implementations a great deal.
This also makes using the API straightforward to understand and use (in-process, out-of-main-thread).

@alvestrand
Copy link
Contributor Author

Still don't understand: How, with transferable streams, does restricting to Worker close the door on out-of-process transforms?

I am not necessarily saying that closing the door on out-of-process transforms is a Bad Thing, but I don't understand how the stated tool ("restricting to Worker") accomplishes the task ("do not open the door to out-of-process transforms").

@alvestrand
Copy link
Contributor Author

Consider reimplementing https://webrtc.github.io/samples/src/content/insertable-streams/video-analyzer/ (the obvious example of an app where background processing doesn't matter) under the event-sending interface:

One can write a worker that receives the stream on an event, and then transfers the stream right back to the main thread, using an API that is part of the Streams API and not at all part of the webrtc API.
There is no simplification of implementation compared to a more generic, more common IPC API using PostMessage with a MessagePort target.

@youennf
Copy link
Collaborator

youennf commented May 19, 2021

The MessagePort code path is much more complex than Worker.postMessage code path.
We keep things simple for WebRTC encoded transform implementation by staying in process and we support all cases we actually care.

Additional API (transferable streams) can be used for more complex cases on top of webrtc encoded transform.
That is great, this is composability a la Unix.

This does not mean we should bake into the simple WebRTC encoded transform API everything that WebRTC encoded transform+transferable streams allow.

@youennf
Copy link
Collaborator

youennf commented May 19, 2021

If the goal is to be able to properly shim createEncodedTransform in JS, I think the following can be written:

  • Create a ReadableStream R1 and WritableStream W1 in window environment with JS sources SR1 and SW1
  • Create a worker and a script transform
  • Set the transform on the sender/receiver
  • In the worker, when receiving the readable/writable R2/W2, transfer them back to window (async operation).
  • When receiving R2 and W2 in window, use them to make SR1 and SW1 functional (async operation)
  • Return R1 and W1

@alvestrand
Copy link
Contributor Author

Re #89 (comment) about complexity: Why is the MessagePort code path more complex than Worker.postMessage? For the user or for the browser implementor?
After all, DedicatedWorker is defined as "act as if it has an internal MessagePort" (https://html.spec.whatwg.org/multipage/workers.html#dedicated-workers-and-the-dedicatedworkerglobalscope-interface), so I assumed that the underlying implementation was OOM the same, since the API is identical.

Re workaround: Yes, that's exactly what I was suggesting in #89 (comment), and it shows that:

  1. Requiring posting to a worker does not lead to decreased implementation complexity, since one still has to support processing the stream if it happens to be executing on the main thread (or some other thread)
  2. Requiring posting to a worker increases the complexity for the user in the (few) cases where main thread processing makes sense.

@youennf
Copy link
Collaborator

youennf commented May 19, 2021

Why is the MessagePort code path more complex than Worker.postMessage? For the user or for the browser implementor?

Both actually.
For the implementor, posting to a worker is easy as it is a destination that does not move (the implicit port is not transferred). Posting to a general MessagePort is harder since it might be transferred at will. In Safari, there is usually a central registry that does the routing and message management. Not sure about other implementations, let me know how Chrome is handling this.

As of user impact, using a MessagePort in the API will require the user to create a worker, create a MessageChannel, transfer one of the message port to the worker and use the other one in the script transform constructor.
For the user, it seems simpler and easier to understand to directly pass the Worker as a parameter, which covers all cases we know about.

  1. Requiring posting to a worker does not lead to decreased implementation complexity, since one still has to support processing the stream if it happens to be executing on the main thread

Only if the UA supports transferable streams, which can be implemented in a fully orthogonal way. This orthogonality is decreasing complexity.
The only case where it would not decrease complexity would be if UA would want to optimise the script transform+transfer stream code path. I haven't heard of use cases that would make this needed.

If there are such use cases, I would first try to see how we could update the API and resort on specific optimization if API change is not feasible.

2. Requiring posting to a worker increases the complexity for the user in the (few) cases where main thread processing makes sense.

I am not clear about these few cases.

@alvestrand
Copy link
Contributor Author

Replying in reverse order:

The use case I use as an example is https://webrtc.github.io/samples/src/content/insertable-streams/video-analyzer/, where the whole point of the processing is to get data that is presented on the page by code running on the main thread.

I mentioned this 21 hours ago in #89 (comment) too.

Using a (MessagePort or Worker) will not lead to any user increase in API user complexity for the worker case; that's what I suggested in #89 (comment) - I have not suggested removing Worker from that construct.

@alvestrand
Copy link
Contributor Author

FWIW - MessagePort in Blink is implemented as a Mojo IPC operation, and postMessage() in DedicatedWorkerGlobalScope is implemetned on top of MessagePort. So for Chrome, the underlying mechanism is identical for the two cases.
If I've read the code correctly.

@youennf
Copy link
Collaborator

youennf commented May 19, 2021

The use case I use as an example is https://webrtc.github.io/samples/src/content/insertable-streams/video-analyzer/, where the whole point of the processing is to get data that is presented on the page by code running on the main thread.

I have not looked precisely at what this page is doing but I think the usecase can be defined as:

  • Do the video rtc pipeline as efficiently as possible (as usual). In addition, analyse the data to gather stats/monitor the pipeline
  • Show the stats/analyze results in the web page
    I would implement it as:
  • Use script transform to efficiently do the processing out of main thread. Gather the stats in the worker.
  • Send the stats back to the page.
    Based on that, the current API supports that use case and would lead to more optimal results than doing the whole processing in main thread.

MessagePort in Blink is implemented as a Mojo IPC operation

In the previous message, I was referring to Worker.postMessage, not WorkerGlobalScope.postMessage (which might be the same for shared worker, service worker and dedicated worker in Chrome).
In any case, allowing a MessagePort would lead to require Mojo IPC operations. Currently, RTCRtpScriptTransform only requires posting a task on the worker thread, which is much simpler to implement.

@alvestrand alvestrand changed the title Generalize ScriptTransform constructor Generalize ScriptTransform constructor to allow main-thread processing Jul 22, 2021
@fippo
Copy link
Collaborator

fippo commented Oct 28, 2021

WPT question: who volunteers to rewrite all the tests to use workers?

@vitaly-castLabs
Copy link

Just my two cents. I'm working on this project involving feeding RTCEncodedVideo/AudioFrame to MSE, and it all works nicely with Chrome's createEncodedStreams. However porting it to Safari / RTCRtpScriptTransform is not exactly trivial because I don't have access to the video element and can't create a MediaSource object from a worker. So having a non-worker alternative would be really appreciated

@alvestrand
Copy link
Contributor Author

If Safari's implementation is the way they proposed in the spec, the RTCRtpScriptTransform is already executing in a worker (the constructor for RTCRtpScriptTransform takes a Worker argument). But I haven't tried coding on Safari.

https://w3c.github.io/webrtc-encoded-transform/#dom-rtcrtpscripttransform-rtcrtpscripttransform

@youennf
Copy link
Collaborator

youennf commented Sep 28, 2022

The way to go is to get MSE in a worker which is specified and has consensus from all browser vendors IIRC.
This seems a superior approach, technically speaking and timing wise. Until such support lands, I would recommend post messaging data to window manually.

@alvestrand
Copy link
Contributor Author

I had an acronym cache miss on MSE....

@aboba
Copy link
Contributor

aboba commented Sep 29, 2022

MSEv2 (like MSEv1) is a bit idiosyncratic with respect to "low latency" mode. So if the goal is low latency and there is no need to support DRM, it might be easier (and might save some containerization cycles) to use WebCodecs for decoding instead.

MSEv2 in Workers spec: https://w3c.github.io/media-source/
"Intent to Ship": https://groups.google.com/a/chromium.org/g/blink-dev/c/FRY3F1v6Two

@vitaly-castLabs
Copy link

The only reason I use MSE is DRM. Are there any benefits of going through all the trouble of extracting video frames and decoding and rendering them "manually" compared to doing nothing at all and letting webrtc/browser take care of it?

@aboba
Copy link
Contributor

aboba commented Sep 29, 2022

Currently, WebCodecs doesn't support DRM so MSE is your only alternative.

@alvestrand
Copy link
Contributor Author

Revisiting this issue .... we do have a fundamental disagreement about the need for main-thread support. But it seems to me that if we do it, we could at least agree on a reasonable interface.

Abandoning the messageport idea - how would this look?

interface RTCRtpScriptTransform : EventTarget {
constructor(Worker? worker, optional any options, optional sequence transfer);
attribute EventHandler onrtctransform;
}

With the obvioius semantics that if worker is not specified, the event is fired at the RTCRtpScriptTransform object.

Having this API shape available would (I think) make it possible to support the RTCRtpScriptTransform-based API for the use cases that are supported by the current Chrome interface, and may offer a path to converging the implementations on a single API.

@alvestrand
Copy link
Contributor Author

Seeking comment on #89 (comment) from @youennf and @jan-ivar

@jan-ivar
Copy link
Member

jan-ivar commented Dec 4, 2023

@alvestrand thanks I appreciate its illustrative power of our gap. But we can polyfill that.

Providing a polyfill shaped like you propose would take me a bit longer (making the transform a bona fide EventTarget), but here's a fiddle showing main-thread access in today's API (modulo bug 1868223):

const bouncer = new Worker(`data:text/javascript,(${bounce.toString()})()`);
function bounce() {
  onrtctransform = async ({transformer: {readable, writable}}) =>
    self.postMessage({transformer: {readable, writable}}, [readable, writable]);
}

To use this bouncer:

sender.transform = new RTCRtpScriptTransform(bouncer, {});
const event = await new Promise(r => bouncer.onmessage = r);

@jan-ivar
Copy link
Member

jan-ivar commented Dec 4, 2023

The issue is all browsers implement the WebRTC pipeline off main-thread, yet the RTCPeerConnection API that controls it is not exposed in workers, making main-thread the path of least resistance, which can lead to poor outcomes in interoperable performance. Exposing that pipeline in a worker-first API corrects this bias, by reversing the transfer burden.

@alvestrand
Copy link
Contributor Author

The issue is all browsers implement the WebRTC pipeline off main-thread, yet the RTCPeerConnection API that controls it is not exposed in workers, making main-thread the path of least resistance, which can lead to poor outcomes in interoperable performance. Exposing that pipeline in a worker-first API corrects this bias, by reversing the transfer burden.

I don't buy that argument; making something harder because some other thing is hard (the need for RTCPeerConnection in worker has been recognized for years, but isn't being acted on) isn't acting for the benefit of the developers (far less the users).

To be clear: I'm offering this option as a means of unifying our approaches so that we can get to a single API that's supported across browsers. Making life more difficult for existing developers (who are using this feature to the tune of hundreds of thousands of instances per day) is not a feature for us.

@jan-ivar
Copy link
Member

jan-ivar commented Dec 5, 2023

Making life more difficult for existing developers (who are using this feature to the tune of hundreds of thousands of instances per day) is not a feature for us.

Who are these developers with hundreds of thousands of instances per day, and why are they on main-thread?

Isn't this concerning, when even the Media WG acknowledged that: “There is a consensus and we agree that media processing in general should happen in a Worker context. Not all use cases require this, though, e.g., non-realtime transcoding.”?

Assuming these developers aren't doing non-realtime transcoding, would it be fair to ask why they haven't switched to workers?

A worker-first API doesn't have these problems.

@youennf
Copy link
Collaborator

youennf commented Dec 6, 2023

FWIW, the web sites that are using this API I know of are all using workers, where the standard API is slightly simpler than Chrome's version.

we do have a fundamental disagreement about the need for main-thread support

Both Chrome API and the standard API are supporting main-thread AFAIK, the standard API is just requiring a few more lines to do main thread processing.
Is the goal to allow web developers to remove those lines, or is it something more important?
If the former, the priority of this issue seems rather low to me.

@alvestrand
Copy link
Contributor Author

The reason given by the developers I and @guidou are in contact with is that the benefit of using a worker has been evaluated for their use case, and the performance / quality benefit is negligible or negative.

Naturally, our closest contacts are with the Google Meet folks.

@youennf
Copy link
Collaborator

youennf commented Dec 6, 2023

negligible or negative.

Negligible is fine, negative is very different.
I would be glad to further dive in any performance issue that Google Meet folks might be facing when adopting Safari WebRTC encoded transform support.

FWIW, WebKit used to have RTC networking hitting main thread and we got evidence this was causing issues to actual users. Using a WebRTC encoded transform on main thread would typically reintroduce these issues that Safari users encountered. Why should we be doing this?

WebRTC engines like libwebrtc typically try to stay off the main thread in their networking path.
It makes sense to apply the same design principles to WebRTC encoded transforms.

@jan-ivar
Copy link
Member

jan-ivar commented Dec 6, 2023

I would be glad to further dive in any performance issue that Google Meet folks might be facing when adopting Safari WebRTC encoded transform support.

I'd like to extend the same help to Google Meet folks with adopting Firefox's support of the same API (should they run into any differences from Safari).

Mozilla's findings remain that main-thread is a poor environment for realtime code due to the risk of jank. Workers leading to worse quality or worse performance contradict this and would be new information indeed.

@guidou
Copy link
Collaborator

guidou commented Dec 6, 2023

negligible or negative.

Negligible is fine, negative is very different. I would be glad to further dive in any performance issue that Google Meet folks might be facing when adopting Safari WebRTC encoded transform support.

The negligible part was about performance. The negative part was the extra complexity/maintenance burden of introducing a worker in their architecture for no visible gain in performance.

@jan-ivar
Copy link
Member

I'm glad to hear there was no negative performance impact.

The negative part was the extra complexity/maintenance burden of introducing a worker in their architecture ...

The spec API simplifies workers.

The use case I use as an example is https://webrtc.github.io/samples/src/content/insertable-streams/video-analyzer/, where the whole point of the processing is to get data that is presented on the page by code running on the main thread.

I've updated webrtc/samples#1646 to use the spec. It's only a few lines of difference (moving code to worker), and it's more performant, since there was no need to move processing to main thread in the first place.

Here's a working demo that works in all browsers. Hopefully this is helpful for those needing help with workers.

@lgrahl
Copy link

lgrahl commented Mar 16, 2024

Hi everyone, long time no see.

If I understand correctly, forwarding the stream from the Worker to some other MessagePort (where ever that ends up) is allowed, so where is the benefit of making it harder for developers to do just that? There are complex Worker scenarios that would benefit from not receiving the stream at a Worker-global entrypoint.

I can understand that you want to incentivise the use of Workers in this API (although I'm torn on the chosen method) but I believe allowing for MessagePort in the constructor could be a sweet spot and would also allow main thread processing in a "clunky but not so clunky" way.

@jan-ivar
Copy link
Member

If I understand correctly, forwarding the stream from the Worker to some other MessagePort (where ever that ends up) is allowed, so where is the benefit of making it harder for developers to do just that?

Things already being possible in JS generally counts against exposing new APIs, not for it.

Also, just because something is possible doesn't make it desirable or without cost. Transferable streams produce a "readable side in another realm" where the original realm feeding that readable in this case is the dedicated worker provided in the constructor.

There are complex Worker scenarios that would benefit from not receiving the stream at a Worker-global entrypoint.

What benefit? It seems hard to do anything in a worker without a Worker-global entrypoint e.g. self.onmessage.

I can understand that you want to incentivise the use of Workers in this API (although I'm torn on the chosen method) but I believe allowing for MessagePort in the constructor could be a sweet spot and would also allow main thread processing in a "clunky but not so clunky" way.

The MessagePort idea was abandoned up-thread so I won't rehash the arguments against it.

But a goal of the API is to avoid implicit main-thread, which requires a worker argument, and a MessagePort (another tunneling concept) is no substitute. whatwg/streams#1124 is selling this idea that it doesn't matter where readables are initially exposed, which is how we got into this mess, but it does, unless we want to have to rely on UA optimizations that break semantics.

@jan-ivar
Copy link
Member

There are complex Worker scenarios that would benefit from not receiving the stream at a Worker-global entrypoint.

Doesn't the combination of event listener architecture and the constructor's options argument help decentralize things?

// main.js
sender.transform = new RTCRtpScriptTransform(worker, {message: "myApp123"});

// worker.js
class MyApp123 {
  constructor() {
    self.addEventListener("rtctransform", ({transformer}) => {
      if (transformer.options.message != "myApp123") return;
      this.transformer = transformer;
    });
  }
}

@lgrahl
Copy link

lgrahl commented Mar 19, 2024

Okay, I think I understand the problem a bit better now you're tackling here. And your suggestion to attach and then immediately detach from the 'rtctransform' event when the matching transformer is received would work for the use cases I can think of. Thanks @jan-ivar!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants