
Should WebCodecs be exposed in Window environments? #211

Closed
youennf opened this issue Apr 30, 2021 · 121 comments · Fixed by #316
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).

Comments

@youennf (Contributor) commented Apr 30, 2021

As a preliminary to #199, let's discuss whether it is useful to expose WebCodecs in Window environments.
See the good discussions in #199 for context.

@youennf (Contributor, Author) commented Apr 30, 2021

@chcunningham, can you describe the use cases you think will benefit from using WebCodecs in Window?
It seems you are referring to MSE low latency but I would like to be sure.

@chcunningham (Collaborator):

In the previous issue I mentioned the use cases as:

... boutique use cases where using a worker is just extra hoops to jump through (e.g. pages that don't have much of a UI or pages that only care to encode/decode a handful of frames).

And I later clarified:

My argument is: for folks who do not need lots of codec io, or for which the main thread is doing little else besides codec io, the main thread is adequate. Requiring authors to use workers in this case just adds obstacles.

I used the word "boutique" to suggest that such uses need not fit into one of our well known categories. The web is a vibrant surprising place (and my own creativity is pretty limited). Can we agree that such uses will exist and they may meet one or both of the scenarios I gave above?

I didn't intend to say MSE low latency is in this camp. MSE low latency has lots of codec io and many sites in that category will have a busy main thread.

Let me grab a few other snips from that issue since we're moving here...

Keep in mind that WebCodecs explicitly does not do actual decoding/encoding work on the main (control) thread (be that window or worker). The work is merely queued there and is later performed on a "codec thread" (which is likely many threads in practice).

@aboba @padenot: opinions on the core issue? Is it reasonable to expose on Window for use cases where Window's main thread ability is sufficient?

@padenot (Collaborator) commented May 3, 2021

I see it as just a control thread, to me, it's fine. I don't expect folks to do processing on the media, but I do expect developers to use this API in conjunction with the Web Audio API, WebGL or Canvas.

@youennf (Contributor, Author) commented May 4, 2021

I do not have any issue with Window being the control thread. The current proposal, though, surfaces media data on the control thread, which somehow conflates the control and media threads.
If you look at low-level APIs like VideoToolbox (from which WebCodecs takes inspiration), the thread of the output data is not the thread where parameters (say, bitrate) are set or where the input data is provided.

With @chcunningham, we agreed that using main thread as the output data thread is a potential issue:

web pages should use WebCodecs from a DedicatedWorker and that the above snippet has a potential memory issue.

If we look at the WebCodecs API surface, it is easy to write shims that expose codec APIs to window environments.
The main advantage I see is that it tells authors what is probably best to do without forbidding them from doing what they want to do.
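Such a shim might look like the sketch below: a Window-side object that forwards decoder calls to a DedicatedWorker. The worker script name (`decoder-worker.js`), the message shapes, and the class name are illustrative assumptions, not part of any spec.

```javascript
// Hypothetical sketch of a Window-side shim over a worker-hosted decoder.

function makeDecoderMessage(type, payload) {
  // Pure helper so both sides of the postMessage channel agree on shape.
  return { type, payload };
}

class WorkerBackedVideoDecoder {
  constructor({ output, error }) {
    this.worker = new Worker('decoder-worker.js'); // hypothetical worker script
    this.worker.onmessage = (e) => {
      if (e.data.type === 'output') output(e.data.payload); // a transferred VideoFrame
      else if (e.data.type === 'error') error(e.data.payload);
    };
  }
  configure(config) {
    this.worker.postMessage(makeDecoderMessage('configure', config));
  }
  decode(chunk) {
    // EncodedVideoChunk is not transferable, so ship its bytes instead.
    const data = new Uint8Array(chunk.byteLength);
    chunk.copyTo(data);
    this.worker.postMessage(
      makeDecoderMessage('decode', {
        type: chunk.type,
        timestamp: chunk.timestamp,
        data,
      }),
      [data.buffer] // transfer the buffer rather than copying it again
    );
  }
  close() {
    this.worker.postMessage(makeDecoderMessage('close'));
    this.worker.terminate();
  }
}
```

The point of the sketch is that the Window-facing surface can mirror the codec API almost one-to-one; the cost is the extra messaging layer and the copy out of the chunk.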

@chcunningham (Collaborator) commented May 4, 2021

The goal of the control vs codec thread separation was to ensure that implementers don't perform actual encoding/decoding work on the main thread. We can maintain that separation while still receiving outputs on the main thread.

The actual threads used under the hood aren't what we intend to describe. For example, in Chromium the VideoToolbox APIs are invoked in a completely different sandboxed process from where the web page is rendered. And, in that separate process, we expect that actually many threads are used.

With @chcunningham, we agreed that using main thread as the output data thread is a potential issue:

I don't agree that the main thread inherently creates a memory issue. The nuance from the other issue is important. There, you wrote:

Based on this understanding, if the controlling thread is spinning, the frames might stay blocked in the controlling thread task queue, which is a potential memory problem.

I agree* with the above statement, irrespective of whether the thread is the main window thread or the main dedicated-worker thread.

* nit: depending on the implementation, it may be more of a performance problem than a memory problem. For Chromium, I think we have a finite pool of frames to allocate to a camera stream. If users fail to release (close()) the frames back to us, the stream simply stalls.
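The per-frame discipline discussed here can be sketched as follows; `paintAndRelease` and the commented wiring are illustrative, but the rule (every output frame must be closed) is the point:

```javascript
// Sketch: every VideoFrame delivered to the output callback must be
// released with close(), or the UA's finite frame pool drains and a
// camera-backed stream can stall.

function paintAndRelease(frame, ctx) {
  ctx.drawImage(frame, 0, 0); // consume the frame (e.g. paint to a canvas)
  frame.close();              // hand the underlying buffer back to the pool
}

// Browser-only wiring (VideoDecoder is the WebCodecs interface):
// const decoder = new VideoDecoder({
//   output: (frame) => paintAndRelease(frame, canvasCtx),
//   error: (e) => console.error(e),
// });
```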

@youennf (Contributor, Author) commented May 5, 2021

I don't agree that the main thread inherently creates a memory issue

Do you agree though that, on a worker thread, spinning the controlling thread is most certainly a bug that the web application can (and should) fix? But that this is not the case on the main thread (i.e. some code outside the control of the web application may randomly spin the web app's controlling thread).
If so, we can agree that it is safer for any application not to use the main thread as the 'media' thread.

So far, no use case has been brought forward that justifies taking this risk.

For Chromium, I think we have a finite pool of frames to allocate to a camera stream. If users fail to release (close()) the frames back to us, the stream simply stalls.

That is a very good point we should discuss.
The current API design makes surfacing to the web all those OS/implementation details, which is something we should think of very carefully.
This can lead to issues in terms of perf/memory/fingerprinting/interoperability.

@chcunningham chcunningham mentioned this issue May 6, 2021
@chcunningham (Collaborator):

Do you agree though that, on a worker thread, spinning the controlling thread is most certainly a bug that the web application can (and should) fix?

Generally yes.

But that this is not the case on the main thread (i.e. some code outside the control of the web application may randomly spin the web app's controlling thread).

I replied to this point in the previous issue.

I agree that some apps don't know what's running on their page. Those apps don't fit the use case I gave....

To emphasize, I expect some sites will offer very focused experiences, free of third-party scripts, ads, etc., and for which the window main thread is plenty available.

So far, no use case has been brought forward that justifies taking this risk.

I offered scenarios in the comments above. In these scenarios, there is no memory/perf risk. Why is this not sufficient?

Maybe a concrete example would help. Imagine a simple site that streams a feed from a single security camera. This need not be a flashy big-name site; it might even be someone's hobby project. The UI for this site is just a box with the video feed. The site has no ads and no third-party scripts. Its main thread is doing very little, so there is ample room for it to manage fetching, codec io, and rendering of the video.

@youennf (Contributor, Author) commented May 6, 2021

To emphasize, I expect some sites will offer very focused experiences, free of third-party scripts, ads, etc., and for which the window main thread is plenty available.

This is not really a use case for exposing to the main thread; it is more a scenario where the issues I am pointing out may not happen (but see below).

First, the JS shim works equally well and does not bring any major drawback AFAIK. Do you agree?

Also, if the web site wants frame access, this is probably to do some fancy processing on each frame.
The cost of a worker is probably negligible at that point, and the fancy processing is probably best done in the worker anyway.
If the page is not doing any per-frame processing and really wants to optimise things, I agree a worker might be overkill.
Such a website would be best served by MediaStreamTrack+HTMLVideoElement.

The UI for this site is just a box with the video feed.

Why not use a MediaStreamTrack created directly by the UA from the decoder output, then?

The site has no ads and no third-party scripts. Its main thread is doing very little

Are you suggesting to restrict exposure of WebCodecs on window environments to only those safe cases?

In any case, let's say that, as a user, I send an email containing that website's URL to a friend, who clicks the link from a web mail client (say Gmail). Depending on the web mail client, the website and the UA, the website may be opened in the same process as the web mail, or in a process shared with other pages. This might be especially true on low-end devices, where a UA may decide to share processes to save memory.
Let's also say the user opened two pages of the same website: a simple one, and a complex one that contains some third-party scripts. The website will not know whether the two pages are running in the same process or not.

To reliably select whether using a worker or not, a web developer will need to understand a lot of things and do extensive research. In practice, it will be difficult to get any guarantee across User Agents, OSes and devices.

Exposing WebCodecs solely to workers is a good hint to web developers that they should do their processing in workers.

@sandersdan (Contributor) commented May 6, 2021

I disagree with the premise that it's only correct to use WebCodecs in a worker, even in the case where the main thread is contended.

Offloading WebCodecs use can improve latency, but you can still get full WebCodecs throughput when controlling it from a contended thread. Low-latency use cases are not the only use cases that WebCodecs is intended for (if they were, we wouldn't have an input queue).

WebCodecs allows for low-latency use but I do not think we should require apps to use it that way.

First, the JS shim works equally well and does not bring any major drawback AFAIK. Do you agree?

Having to have a worker context around to be able to use the Chrome JS Console to experiment with WebCodecs would substantially frustrate my learning and debugging. And I'm an expert at this!

Depending on the web mail, the website and the UA, the website may be opened in the same process as the web mail, or in a process with other pages.

Main thread contention across sites is to me a quality of UA issue, and is substantially improved in recent history due to the widespread adoption of site isolation. Blink is currently experimenting with multiple render threads in a single process, which has the potential to resolve the remaining cases.

Exposing WebCodecs solely to workers is a good hint to web developers that they should do their processing in workers.

I think it's generally understood that the most direct solution to main thread contention is worker offload, so any sites that bother to collect performance metrics won't have any confusion here.

@bradisbell:

As a preliminary to #199, let's discuss whether it is useful to expose WebCodecs in Window environments.

For my projects, I can't think of a use case where I wouldn't use WebCodecs from Window.

I'm manipulating audio/video, with the video and canvas objects actually displayed on the page. Shuffling all that data off to a worker thread, just to then have it shuffled off to a codec on some other user-agent internal process or thread seems like unnecessary overhead, and is definitely a hassle for the developer.

I wholeheartedly agree with @chcunningham that there will be other use cases not imagined here.

@dalecurtis (Contributor):

Unless TAG or another standards group has concluded that certain classes of APIs must be limited to worker contexts, I think the phrasing of the initial question is inverted. I.e., we should instead be discussing why wouldn't we expose this on window. We shouldn't apply restrictions without reason.

The only reason I can think of is that we want to limit users' ability to shoot themselves in the foot under certain specific low latency scenarios, while the reasons against such a restriction seem numerous -- especially the pain it would cause for common use cases and the impact on first-frame latency.

@youennf (Contributor, Author) commented May 11, 2021

Low-latency use cases are not the only use cases that WebCodecs is intended for (if they were, we wouldn't have an input queue).

The driving use cases I have heard are low-latency MSE and WebRTC-like stacks.
I would be interested in hearing more about the other use cases, and whether they need fine-grained access to raw video frames...

Having to have a worker context around to be able to use the Chrome JS Console to experiment with WebCodecs would substantially frustrate my learning and debugging. And I'm an expert at this!

This seems like a usability issue. I fear that the same convenience will lead to pages using WebCodecs on the main thread when they should not. Most of the WebCodecs examples I have seen are main-thread only, even though they are dealing with real-time data.

Blink is currently experimenting with multiple render threads in a single process, which has the potential to resolve the remaining cases.

These are all good points that might solve the issues I described. It is great to see this coming.
Given the idea is to move very quickly with WebCodecs, I think it makes sense, for V1, to restrict the feature set to what is known to be needed, and to progressively extend the API after careful consideration.

It is always easy to extend an API to Window environment in the future.
I do not see WebCodecs in Window environment as a P1 but as a P2.

@youennf (Contributor, Author) commented May 11, 2021

For my projects, I can't think of a use case where I wouldn't use WebCodecs from Window.

That is interesting. Can you provide pointers to your applications?
Window APIs do not necessarily have to be the same as worker APIs.

In WebRTC, we went with a model where APIs are Window only/mostly but do not give the lowest granularity.
Things might change, and we are discussing exposing the lowest granularity in Worker/Worklet environments, not necessarily in Window environments.

@youennf (Contributor, Author) commented May 11, 2021

The only reason I can think is that we want to limit the ability of users to shoot themselves in the foot under certain specific low latency scenarios.

The low-latency scenario is one such example.
A video decoder that uses a buffer pool might end up without available buffers if they are all enqueued for delivery while there is thread contention, leading to potential decoding errors. Pages might end up in those cases more often on the main thread, and these issues might not be obvious to debug, given that buffer pools vary across devices.

Again, I am hopeful this can be solved.
It seems safer though to gradually build the API. Exposing to Worker at first covers major know usecases and allows early adopter through workers or JS shims to start doing work in Window environment. This might help validating the model is right also for window environments.

@dalecurtis (Contributor) commented May 11, 2021

A video decoder that uses a buffer pool might end up without available buffers if they are all enqueued for delivery while there is thread contention, leading to potential decoding errors. Pages might end up in those cases more often on the main thread, and these issues might not be obvious to debug, given that buffer pools vary across devices.

I agree buffer pools are an issue, but I feel that's orthogonal to window vs worker for a few reasons:

  • The window/worker distinction is meaningless on low-end devices.
  • Buffer pool starvation generally doesn't lead to decoding errors, just a slowdown in throughput - so again this is really only a concern for low latency cases. Is there a case that you're thinking of that causes errors?
  • Since starvation is not solved even in a worker, the right place to address buffer starvation is through tooling and API improvements. I.e., developer console messages upon starvation, or when scenarios that may lead to starvation are detected (e.g., not calling close() on VideoFrames). API-shape improvements could take the form of mechanisms indicating that hardware-capped buffer pools shouldn't be used (e.g., accelerationPreference = 'deny').

Again, I am hopeful this can be solved.
It seems safer, though, to build the API gradually. Exposing to workers at first covers the major known use cases and allows early adopters, through workers or JS shims, to start doing work in Window environments. This might also help validate that the model is right for Window environments.

I have trouble following this logic. I don't agree that limiting to a worker covers the main use cases. We'll certainly query all the developers in our OT for feedback, though. Limiting to a worker will be detrimental to high-frame-rate rendering (transferControlToOffscreen would help, but is Chromium-only) and to time to first frame. Especially for single-frame media, such a limit would dominate the total cost.

@aboba (Collaborator) commented May 11, 2021

Youenn said:

A video decoder that is using a buffer pool might end up without available buffers in case they are all enqueued for delivery while there is thread contention

Pool exhaustion seems most likely to be caused by a memory leak (e.g. VideoFrame.close() not being called), possibly in conjunction with use of a downstream API for rendering (e.g. Canvas, MSTGenerator, WebGL or WebGPU). It seems this problem can occur regardless of whether WebCodecs or the other related APIs run in a Worker. So to address pool-exhaustion concerns, you'd probably want to require related APIs to automatically free buffers.
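The "automatically free buffers" idea can be sketched as a wrapper that guarantees `close()` once the app's handler returns; this is an illustrative pattern under the assumption of synchronous frame use, not anything specified:

```javascript
// Sketch: make release automatic rather than relying on every app to
// remember close(). The frame is released even if the handler throws.
function withAutoClose(handler) {
  return (frame) => {
    try {
      handler(frame); // app code uses the frame synchronously
    } finally {
      frame.close();  // always hand the buffer back to the pool
    }
  };
}

// Usage sketch (browser-only):
// new VideoDecoder({ output: withAutoClose(paint), error: console.error });
```

The obvious limitation is that apps which keep frames alive across async boundaries would need an explicit opt-out, which is exactly the design tension the thread is circling.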

@chcunningham chcunningham added the breaking Interface changes that would break current usage (producing errors or undesired behavior). label May 12, 2021
@chcunningham chcunningham added this to the V1 Launch Blockers milestone May 12, 2021
@chcunningham (Collaborator) commented May 26, 2021

We sent an email to all participants in the Chrome origin trial. On the question of "should we keep WebCodecs in Window scope", the tally was 10 in favor, 6 neutral, 1 ambivalent, 1 opposed. Overall I think this makes a compelling case for maintaining the window-exposed interfaces. Breakdown below.


The opposed response argues for forcing developers into a pattern that frees the main thread.
@willmorgan of iproov.com wrote:

I am in favour of moving this to a DedicatedWorker scope and think that doing so would be a big tug in the right direction for freeing up the UI thread, and encouraging developers to adopt that pattern for many other things.

But many apps found no performance reason to use workers, and highlighted that workers create additional complexity.
@koush of vysor.io wrote:

I am using web codec in the window scope just fine. For low latency as well (vysor)....
I actually tried a worker implementation and saw no appreciable performance gains, and the code became more complicated: had to use workers, off screen canvas, detached array buffer issues, etc.

@etiennealbert of jitter.video wrote:

Our use case is to convert sequences of canvas generated images into a video. When a user triggers this kind of job it becomes the one and only thing in our interface, and we do nothing else until the final video file is ready. We don't do this for performance reasons but because this "export" job is an important job to the eyes of our users.

@AshleyScirra of scirra.com wrote:

We also intend to use WebCodecs in future for transcoding media files. Again providing the API is async it seems to only be a hindrance to make the API worker-only, as this use case is not latency sensitive, and should already be doing the heavy lifting off-main-thread with an async API.

@BenV of bash.video wrote:

We also have a smaller internal application for prototyping camera rendering effects that currently exclusively uses WebCodecs in the main thread, mostly just out of convenience. We're using TypeScript/Babel/Webpack and the process of setting up a worker causes enough friction that it was just easier to just do everything in the Window scope. All this application does is capture frames from the camera, manipulate them, and render the output so contention with UI updates is not really an issue so simplicity won out for this case.

@bonmotbot of Google Docs wrote:

We are using the ImageDecoder API to draw animated GIF frames to canvas. We're using it in the Window scope currently, so this change would be breaking. From our perspective, adding a Worker to this use case would add complexity without much benefit.

Also, some apps desire to use WC in combination with other APIs that are only Window-exposed.

Mentioned examples include: Canvas, RTCDataChannel, WebAudio, input (e.g touch) events. Forcing apps to use DedicatedWorkers adds complexity to code that needs other Window-only APIs.

From the performance angle, Canvas is of particular note. It is the most common path for apps to paint VideoFrames. OffscreenCanvas is not yet shipped in Safari and Firefox, which means no way to paint directly from a worker. OffscreenCanvas may eventually ship everywhere, but its absence now adds complexity to using WebCodecs from workers and removes the theoretical performance benefit.

@surma of squoosh.app (Google) wrote:

The decoding code path of Squoosh runs on the main thread. If we need to use Wasm, we invoke that in a worker, but for native decoding we need to rely on Canvas (as OffscreenCanvas isn’t widely supported), so we have no choice. So the WebCodecs code in the PR runs on main thread as well. In my opinion, at least in the context of ImageDecoder, I am convinced that the API should be offered on the main thread. The API is async anyway so decoding can (and should) happen on a different thread analogous to Image.decode().

Aside: the use of Canvas above is not unique to ImageDecoder.

@AshleyScirra of scirra.com wrote:

Further it's already a pain how many APIs work in the Window scope but are not available in workers - adding APIs that are only available in workers and not in the window scope seems like it would just be making this even more of a headache. In general we want to write context-agnostic code that works in both worker and window mode.

@jamespearce2006 of grassvalley.com wrote:

We are (or will be) using WebCodecs in a DedicatedWorker scope, and I think it makes sense to encourage this as the normal approach. However, given that WebAudio contexts are only available in the Window scope, I think requiring use of a dedicated worker may be too burdensome for some simpler audio-only applications.

Finally, even for apps that use WC in workers, Window-interfaces are useful for synchronous feature detection.
@BenV of bash.video wrote:

We do check for the existence of WebCodecs in the Window scope so that we can pick the appropriate rendering path synchronously based on browser features rather than having to spin up a DedicatedWorker and query it asynchronously.

@youennf (Contributor, Author) commented Jun 1, 2021

There was some interesting feedback, in particular on the 'many apps found no performance reason' point, at the last W3C WebRTC WG meeting. @aboba, would it be possible to have the web developer feedback here?

@youennf (Contributor, Author) commented Jun 1, 2021

Also, some apps desire to use WC in combination with other APIs that are only Window-exposed.

This is a fair concern. Do we know which APIs are missing in workers?

@youennf (Contributor, Author) commented Jun 1, 2021

Finally, even for apps that use WC in workers, Window-interfaces are useful for synchronous feature detection.

More and more feature detection is done asynchronously, for instance listing available codec capabilities.
It is very easy to write a small shim doing that as an async function.
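Such a shim might look like the sketch below. The synchronous form covers the Window-exposed case; the async form covers a hypothetical worker-only world (the inline worker source and the use of a Blob URL are illustrative assumptions):

```javascript
// Synchronous detection, usable wherever WebCodecs is actually exposed.
function detectWebCodecsSync(scope) {
  return typeof scope.VideoDecoder === 'function';
}

// Async detection for a worker-only world: spin up a throwaway worker
// and ask it. Browser-only; the inline worker source is an assumption.
function detectWebCodecsInWorker() {
  return new Promise((resolve) => {
    const src = `postMessage(typeof VideoDecoder === 'function');`;
    const url = URL.createObjectURL(new Blob([src], { type: 'text/javascript' }));
    const worker = new Worker(url);
    worker.onmessage = (e) => {
      worker.terminate();
      URL.revokeObjectURL(url);
      resolve(e.data);
    };
  });
}
```

This is the trade-off @BenV's feedback points at: the sync form answers immediately, while the async form costs a worker spin-up before the app can pick a rendering path.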

@youennf (Contributor, Author) commented Jun 1, 2021

Do we know which APIs are missing in workers?

Oh, I now see the list.
For Canvas, there is OffscreenCanvas; is the issue learning it, or are there missing features?

About RTCDataChannel, it is now exposed in workers in Safari and there is a PR for it.
As for WebAudio, this might be feasible once MediaStream/MediaStreamTrack can be exposed in workers, which is ongoing work.

@surma (Member) commented Jun 1, 2021

For Canvas, there is OffscreenCanvas; is the issue learning it, or are there missing features?

OffscreenCanvas is not widely supported, so it can only be used as a progressive enhancement. But I envision Web Codecs to also be useful with other, main-thread-only APIs like CSS Paint API or WebUSB.

@chcunningham (Collaborator):

There was some interesting feedback, in particular on the 'many apps found no performance reason' point, at the last W3C WebRTC WG meeting. @aboba, would it be possible to have the web developer feedback here?

@youennf, each of those statements in my above comment is a clickable zippy that expands to show the developer feedback. LMK if this isn't what you meant.

@chcunningham (Collaborator) commented Jun 1, 2021

@youennf, does the web developer feedback above persuade you to maintain window exposure for WebCodecs?

@koush commented Jun 1, 2021

For Canvas, there is OffscreenCanvas; is the issue learning it, or are there missing features?

OffscreenCanvas is not widely supported, so it can only be used as a progressive enhancement. But I envision Web Codecs to also be useful with other, main-thread-only APIs like CSS Paint API or WebUSB.

@surma That's how Vysor already works today (https://app.vysor.io/).
As you mentioned, WebUSB is main-thread only, so removing WebCodecs from the main thread would increase complexity a good amount. As noted in @chcunningham's comment, a worker-based WebCodecs path was an avenue I tried and reverted, since it just made things complicated without any noticeable performance benefit. It likely actually increased latency, since I needed to copy the WebUSB array buffers before passing them along as detached buffers; this is something I could probably refactor around, but I didn't want to bother.

@jan-ivar (Member):

Media application developers tend to do the right thing when given appropriate guidance, documentation and code samples.

"sheesh this is a pain, why are there all these hurdles here?"

The above two descriptions of Media application developers appear in opposition. I think the latter is correct.

Yes, media libraries are going to have to think about threading. Yes, the developer experience is going to be harder with workers than with exposure on main thread. The problem is none of those arguments seem contained to non-realtime use cases. Instead, they highlight the path of least resistance.

This makes me more concerned, not less, that if we expose to both main thread and worker, then some realtime applications may never be written correctly (on a worker). And a chorus of end users will blame individual browsers for the sub-par experience.

Even well-read and well-guided media application developers have bosses. Designing an app's media threading model to use workers, may not be something a few such devs will succeed at pushing for on their own (because of short-term costs). We have an opportunity to help them push for doing it right, and help end-users have better experiences, by making the right option the default option. This is what we're here to do.

@chrisn (Member) commented Jul 20, 2021

One of the main arguments I've heard for deferring the decision is a lack of use cases that require Window exposure. As somebody in the discussion said, it's likely any use case could be made to work in a Worker context - even if that means transferring data between Window and Worker. This would mean that a decision to defer at this time leaves us without good criteria to later decide to expose in Window. This makes me concerned that a decision to defer becomes a decision we cannot re-evaluate later. What new information would you be looking for?

@koush commented Jul 20, 2021

One of the main arguments I've heard for deferring the decision is a lack of use cases that require Window exposure.

As a consumer of WebCodecs, if it becomes worker only, I'll need to write a shim to expose it to window (or pass the data to worker), because WebUSB (the data source) is only available in window.

@dalecurtis (Contributor) commented Jul 20, 2021

+1 to @chrisn. I was just typing out the same point.

@jan-ivar it's hard to reconcile:

Wanting to defer a decision doesn't imply that. We've stated we're open to revisit this down the line. Why rush this?

Given the copious amount of feedback you've received from both inside and outside the working group indicating that a worker restriction is problematic for performance and developer-experience reasons, combined with statements like:

We see this as critical to begin exposing realtime media to JS responsibly, and also as the best API surface. An API that helps inform what to do and how to do it right, is worth a thousand footgun warnings in documentation.

We are unsure what criteria would ever satisfy you. I.e., you seem hyper-focused on real-time use cases to a point that precludes discussion of other use cases and the performance costs of the shim (~2.68x memory usage for a toy example; 51mb -> 135mb!). Can you provide any criteria which would ever change your mind?

Even well-read and well-guided media application developers have bosses. Designing an app's media threading model to use workers, may not be something a few such devs will succeed at pushing for on their own (because of short-term costs). We have an opportunity to help them push for doing it right, and help end-users have better experiences, by making the right option the default option. This is what we're here to do.

Why is this outcome any more likely than those same developers just using a shim and resulting in an even worse experience?

@jan-ivar (Member):

This makes me concerned that a decision to defer becomes a decision we cannot re-evaluate later. What new information would you be looking for?

I believe the decision to reevaluate rests with the chairs. A deferral (unlike a no decision) might not technically even require new information to reopen (but check that). Would people feel better if we scheduled to revisit it, say a year from now?

A year from now, I'd expect there would be production sites to look at and measure, and even more widespread support for media sources and sinks in workers across browsers. If we find key use cases that are hurting, we can weigh the pros and cons of exposure then. We'll be in a better position to decide at that time than now.

In contrast, if we expose to main thread now, and a year from now we find this was a mistake, we won't be able to change it.

@jan-ivar (Member):

As a consumer of WebCodecs, if it becomes worker only, I'll need to write a shim to expose it to window (or pass the data to worker), because WebUSB (the data source) is only available in window.

Mozilla considers WebUSB harmful, so this use case is not compelling to us.

@surma
Member

surma commented Jul 21, 2021

On the topic of deferring the decision: I can see that it’s an attractive decision from a standards point of view, but — in my opinion — it comes at the cost of developers. Developers are already struggling with a lumpy web platform where some UAs support an API and others don’t. The resulting feature detection techniques and progressive enhancements are rather complex. If we now have to extend feature detection from yes/no to yes/no/yes-but-not-on-this-thread, I worry about the impact.

In contrast, if we expose to main thread now, and a year from now we find this was a mistake, we won't be able to change it.

Is there precedent that exposing an API in more places (in scope A in addition to scope B) has ever been considered a mistake? How did the mistake manifest? (Note that this is different from exposing an API in scope A instead of scope B.)
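To make the worry about three-state detection concrete, it might end up looking something like the sketch below. This is a hypothetical helper, not anything from the spec: the `document` check as a main-thread heuristic and the return values are illustrative assumptions.

```javascript
// Hypothetical three-state check: "supported-here", "unsupported",
// or "maybe-in-worker-only". `scope` is the global object to probe
// (window, a worker scope, or a stand-in object for testing).
function detectVideoDecoder(scope) {
  if (typeof scope.VideoDecoder !== "undefined") {
    return "supported-here";
  }
  // Heuristic (assumption): only the main thread has a `document`;
  // absence of the API there still leaves open that a worker scope
  // might expose it.
  return typeof scope.document !== "undefined"
    ? "maybe-in-worker-only"
    : "unsupported";
}
```

Progressive enhancement then needs three branches rather than two, which is exactly the added complexity being described.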

@AshleyScirra

AshleyScirra commented Jul 21, 2021

The WebCodecs argument seems to apply equally to WebSockets:

  • Some web apps do real-time-critical networking with WebSockets
  • Those web apps ought to use WebSockets in a worker to avoid main thread jank
  • People might choose wrong, so we should force the decision for them, and only allow WebSockets in a worker
  • Non-real-time cases can still just post data back and forth from a Worker, maybe with a shim
  • So WebSockets should not be exposed on window

So was it a mistake to expose WebSockets on window? Surely not. Yes, some people will choose wrong, and do real-time-critical WebSockets on the main thread. That's a shame. But some people will jump through hoops to choose wrong, no matter what the API design is. To me, this applies equally to WebCodecs. It's not the place of API design to force decisions on developers who might have good reasons to choose differently.

@jan-ivar
Member

If we now have to extend feature detection from yes/no to yes/no/yes-but-not-on-this-thread, I worry about the impact

I sense agreement against Balkanizing WebCodecs support, and I hope it doesn't come to that.

I'm going to ignore the WebSockets strawman and other lateral comparisons of pattern, since they ignore context.

The context is that:

  • canvas.drawImage aside, realtime video pipelines in browsers today are kept away from main thread for a reason.

  • The main thread is overworked already. For instance, we see people unable to type in gmail when meet is open in another tab in Firefox. This is because Firefox supports more tabs than Chrome by isolating sites by process, whereas Chrome (at least on desktop) appears to mint one process per tab regardless of site. This skews results of measuring one user agent in one environment. There's more than one user agent (and more than one environment besides desktop) to consider.

In this context, the cautious and sensible approach to preserve the quality of end-user experiences across user agents and devices is to consider JS exposure to workers first, because this is closest to the environment we have today. Hopefully, no-one is surprised to hear that all browsers fire up background threads to handle media today.

There's precedent here with ScriptProcessorNode, and we'd never have AudioWorklet today if we'd applied some of the criteria cited here about things being "unprecedented". This API is unprecedented.

It's not the place of API design to force decisions on developers who might have good reasons to choose differently.

Sorry, but that's not so. There are many examples of this, like permissions. Never mind that malicious web developers may also think they have good reasons. End-user experiences trump developer convenience in the priority of constituencies.

I'm also having trouble reconciling the opposition with the few use cases that would seem to genuinely be affected. If so many people can't get excited about this wonderful new API in workers, it makes me wonder how many were planning on using workers in the first place.

@dalecurtis
Contributor

@jan-ivar you are conflating rendering and buffering. WebCodecs is not a rendering API; its decoder outputs may be used in rendering, but it is not a rendering API itself. As you know, the WebCodecs encoding and decoding pipelines are already detached from the main thread.

What you're trying to control by forcing worker only exposure for WebCodecs is where developers create their rendering pipelines. I.e., you're indirectly trying to use WebCodecs to force developers to operate their rendering pipelines in an OffscreenCanvas. It seems like you would be better served arguing for a worker restriction on WebGPU and OffscreenCanvas.

@surma
Member

surma commented Jul 26, 2021

Hopefully, no-one is surprised to hear that all browsers fire up background threads to handle media today.

I thought that was the case, too! If it is, what is the reason to force developers to create yet another thread? (Apologies if this has been answered before; this thread has become... long 😅 )

@willmorgan

For what it's worth, given the above discussion and points raised, I'd like to revise my original position of being anti-main-thread. I think WebCodecs should be exposed on the main thread.

@jan-ivar
Member

@dalecurtis this API isn't limited to Canvas and OffscreenCanvas.

... where developers create their rendering pipelines.

It's not just rendering. Critical realtime use case pipelines are going to be capture + send, and receive + playback:

  • worker: camera capture → MST processor → (remove background in JS) → WC encode → encrypt → WebTransport
  • worker: WebTransport → decrypt → WC decode → (ML in JS?) → MST generator → (transfer to) video.srcObject

Chrome intends to ship APIs to do this, so this isn't theoretical. These nonstandard APIs are riding along in origin trials of WebCodecs.

Touching main thread from these pipelines would block on main thread, a significant regression from status quo. Just because they may work on high-end devices in user agents affording one process per tab doesn't mean our concerns over web compat and real-world jank aren't valid.

Looking at WebCodecs in a vacuum doesn't get it off the hook, because in all use cases, it will be part of some media pipeline, with inherent source/sink problems that are transitive, such as buffer exhaustion. Lower-end devices with smaller CPU caches / smaller number of buffers they can keep around in GPU memory, may suffer if we expose to unsuitable environments.
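For concreteness, the first pipeline above might run inside a DedicatedWorker roughly as follows. This is a hypothetical sketch, not code from the spec: it assumes MediaStreamTrackProcessor, WebTransport, and VideoEncoder are all exposed in the worker, elides the encrypt step and the JS background-removal step, and uses illustrative codec settings; the key-frame policy is an invented example.

```javascript
// Pure helper (illustrative policy): request a key frame every
// `interval` frames so a receiver can join mid-stream.
function keyFrameEvery(frameIndex, interval = 60) {
  return frameIndex % interval === 0;
}

// Hypothetical worker-side pipeline:
// camera track -> MediaStreamTrackProcessor -> VideoEncoder -> WebTransport.
async function runCapturePipeline(track, transportUrl) {
  const transport = new WebTransport(transportUrl);
  await transport.ready;
  const writer = (await transport.createUnidirectionalStream()).getWriter();

  let frameIndex = 0;
  const encoder = new VideoEncoder({
    output: (chunk) => {
      const bytes = new Uint8Array(chunk.byteLength);
      chunk.copyTo(bytes); // encrypt step elided
      writer.write(bytes);
    },
    error: (e) => console.error("encode error", e),
  });
  encoder.configure({ codec: "vp8", width: 640, height: 480 });

  const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    encoder.encode(frame, { keyFrame: keyFrameEvery(frameIndex++) });
    frame.close(); // release the frame promptly to avoid buffer exhaustion
  }
  await encoder.flush();
}
```

Note that nothing in this loop ever touches the main thread, which is the property being argued for.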

@dalecurtis
Contributor

You can s/rendering/processing/ in my comment and my point remains the same: you're indirectly using WebCodecs to force an outcome upon developers. You could just as well be arguing that WebTransport, WebML, WebGPU, OffscreenCanvas, etc. should be limited to worker only, and your arguments are interchangeable. WebCodecs is just a convenient scapegoat. E.g., in both of your examples, limiting the MST processor and generator to a worker would achieve the same results.

Your last point also ignores CFC feedback and test data showing that low end devices sometimes suffer more with workers due to their constrained core counts.

@aboba
Collaborator

aboba commented Jul 28, 2021

@dalecurtis It seems to me that restrictions (if they are to be imposed at all) should be imposed on the APIs where there is a demonstrated problem. Restricting WebCodecs or MST processor/generator to a worker in order to address a problem in WebML does not make sense to me. That's like looking for your keys under a lamp post because the light is better there.

@jan-ivar
Member

jan-ivar commented Jul 28, 2021

In both of your examples, limiting the MST processor and generator to a worker would achieve the same results.

@dalecurtis That's w3c/mediacapture-transform#23, but as you can see, we're seeing similar resistance there.

But one can also compose near-realtime media pipelines around it. E.g.

  • worker: WebTransport → WC decode → composite participants together in JS → WC encode → MSE
  • worker: OffscreenCanvas → WC encode → WebTransport

Both WebCodecs and MST+ expose raw video frames for manipulation in JS, and should be limited to workers.

Restricting WebCodecs or MST processor/generator to a worker in order to address a problem in WebML does not make sense to me

@aboba WebML?? Oh, because I said "ML in JS?" Sorry, I meant that as a stand-in for any kind of processing or analysis including plain JS/WASM bit manipulation, be it face-tracking, compositing participants together, or lip/mood reading for accessibility.

But this confusion proves my point: APIs like WebTransport or WebML can be used on all sorts of data, and make no sense to restrict. It's specifically chewing on raw video data on the main-thread we want to restrict, so we restrict the access APIs (WebCodecs & MST+) that expose that data.

@dalecurtis
Contributor

dalecurtis commented Jul 28, 2021

But one can also compose near-realtime media pipelines around it. E.g.

  • worker: WebTransport → WC decode → composite participants together in JS → WC encode → MSE
  • worker: OffscreenCanvas → WC encode → WebTransport

Both WebCodecs and MST+ expose raw video frames for manipulation in JS, and should be limited to workers.

You've included WebCodecs to make your point, but your point is the same even if we remove WebCodecs. By your arguments you wouldn't want "WebTransport -> Processing -> MSE" or "OffscreenCanvas -> WebTransport" to be done on window. I.e., you're against any high bandwidth / expensive processing on window that could lead to a poor user experience.

@jan-ivar
Member

jan-ivar commented Jul 28, 2021

Specifically (near) realtime raw video manipulation and framerate drop.

E.g. I'm not opposed to WebTransport -> decrypt -> MSE which operates on (much less) encoded data.

@alvestrand

alvestrand commented Jul 28, 2021 via email

@jan-ivar
Member

Yes, I mentioned that already. I was contriving a case without MSTP/MSTG to counter the claim we could just limit MSTP/MSTG.

@dalecurtis
Contributor

Specifically (near) realtime raw video manipulation and framerate drop.

WebGL and canvas regularly operate at >= 60fps on the main thread without any queuing, so I assume they meet your definition for (near) real-time "video" manipulation. If your position is that those APIs shouldn't have shipped on window, I'll drop this line of reasoning. If it isn't, then I don't think your position on a worker-only restriction for WebCodecs is fully considered.

My point with the prior dialog is that WebCodecs isn't a source of main thread contention - it's fully detached from the main thread. Performance loss can only occur based on how the API is given input and where its outputs are processed. E.g., a user on a canvas drawing site with a busy main thread feeding a WebCodecs encoder in a worker will suffer in experience regardless of a worker restriction - possibly more.

I recognize that your goal with a worker restriction is to force developers to move their entire pipelines (potentially at the cost of web compatibility for the next N years) into a worker to avoid these issues, but let's be clear that it's an indirect mechanism, without precedent, for achieving the effect you're after. It's certainly not clear to me why you think developers won't just work around this limitation instead of embracing it (indeed, developers in this very thread have said they will do just that).

@kdashg

kdashg commented Jul 29, 2021

Given the lack of support for even OffscreenCanvas, I think requiring Workers here is over-ambitious.
Maybe that'll change in the next year, but I think that it's over-ambitious for now.
Just as our browser implementation bandwidth is limited, so too is application development bandwidth.

@cynthia
Member

cynthia commented Jul 29, 2021

It is not clear to me that the TAG has landed on a precise recommendation yet.

Group feedback:

The precise recommendation is to optimize for developer experience - which in this case is to allow on the main thread.

We did discuss this in a call and the overall consensus (albeit not unanimous) was to allow usage on Window.

We took a similar position for the autoplay API, where our recommendation was to do sync for developer convenience.

Personal feedback:

I believe that, for scenarios where main-thread jank is an issue, developers would exercise good judgment and move it to a worker. I don't think it is up to us to dictate their choice.

Finally, poorly optimized apps that jank will jank regardless of whether this API is present, since poor development practices are rarely isolated to media-specific calls.

If one wants to protect the main thread from compute intensive APIs, maybe we should look into heavy compute as a thing we should gate behind a permission?

@youennf
Contributor Author

youennf commented Jul 29, 2021

If we step back a little, maybe we can get consensus on a few principles that might guide the design and shipping of the API.
Do we agree on the following?

  1. Applications processing realtime raw video (or realtime raw audio) should do processing (including WebCodecs) in a worker context
  2. Applications handling non-realtime raw video (or audio) may use WebCodecs in a window context
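Principle 2 could be quite simple in practice on window. The sketch below is hypothetical: `chunks` and `config` would come from a demuxer (not shown), and the decoder constructor is injectable purely so the helper can be shimmed or tested outside a browser.

```javascript
// Hypothetical non-realtime use: decode a handful of pre-demuxed
// chunks (e.g. for thumbnails), collecting the output frames.
// `DecoderCtor` defaults to the global VideoDecoder; injecting it
// keeps the helper testable without a real decoder.
function decodeAll(chunks, config, DecoderCtor = globalThis.VideoDecoder) {
  return new Promise((resolve, reject) => {
    const frames = [];
    const decoder = new DecoderCtor({
      output: (frame) => frames.push(frame),
      error: reject,
    });
    decoder.configure(config);
    for (const chunk of chunks) decoder.decode(chunk);
    decoder.flush().then(() => resolve(frames), reject);
  });
}
```

For a handful of frames like this, posting everything to a worker arguably adds hoops without a user-visible benefit.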

@chrisn
Member

chrisn commented Jul 29, 2021

I have just posted a response from the Media WG chairs to the mailing list.

@dalecurtis
Contributor

dalecurtis commented Jul 29, 2021

Thanks @chrisn, @jernoble, and @cynthia (on behalf of TAG) for your consideration on this long discussion.

From the response:

To encourage good practice, we strongly recommend that all code examples in the WebCodecs specification and supporting code samples in GitHub show use of DedicatedWorker, and for developer articles (e.g., on https://web.dev) and MDN documentation to do similar, and explain the reasons for this.

I'll begin working on a PR to add some non-normative text around this (in line with the suggestion from #211 (comment)) to the specification and instruct the team to update the documentation and examples we've produced so far.

@alvestrand

alvestrand commented Jul 29, 2021 via email

@chrisn chrisn removed the agenda Add to Media WG call agenda label Sep 8, 2021