
Proposal: Implement OfflineMediaContext #2824

Open
guest271314 opened this issue Jul 9, 2017 · 16 comments

@guest271314 (Contributor) commented Jul 9, 2017

Implement an OfflineMediaContext, modeled on OfflineAudioContext, to fetch a media resource and create independent media fragments from a given range of bytes or time slices as fast as possible, each capable of being played individually.

Ideally the necessary parts can be implemented internally, without having to use MediaRecorder.

Proof of concept

    class OfflineMediaContext {
      constructor({
        url = "", timeSlice = 1, from = 0, to = 15
      }) {
        this.duration = 0;
        this.blobURL = void 0;
        this.blob = void 0;
        this.url = url;
        this.timeSlice = timeSlice;
        this.from = from;
        this.to = to;
      }
      async getMedia() {
        this.request = new Request(this.url); // fix: url is an instance property
        this.mediaRequest = await fetch(this.request);
        this.blob = await this.mediaRequest.blob();
        this.blobURL = URL.createObjectURL(this.blob);
        this.media = document.createElement("video");
        return new Promise(resolve => {
          this.media.onloadedmetadata = () => {
            this.duration = Math.ceil(this.media.duration);
            console.log(this.media.duration);
            resolve(this)
          }
          this.media.src = this.blobURL;
        })
      }
      processMedia(blob, index) {

        console.log(blob, index);

        return new Promise(resolve => {

          let recorder; // `data` and `chunks` removed: they were unused

          const media = document.createElement("video");

          media.onpause = e => {

            console.log(e);
            recorder.stop();
          }

          media.oncanplay = () => {
            media.oncanplay = null;
            media.play();

            let stream = media.captureStream();

            recorder = new MediaRecorder(stream);

            recorder.ondataavailable = e => {
              console.log("data event", recorder.state);
              resolve(e.data);
            }

            recorder.onstop = e => {
              console.log(e);
            }


            recorder.start();
          }

          if (index + 1 < this.duration)
            media.src = `${blob}#t=${index},${index + this.timeSlice}`;
          else
            media.src = `${blob}#t=${index}`;
        })

      }
      startRendering() {
        return Promise.all(
          Array.from({
              length: this.to
            }, () =>
            this.processMedia(this.blobURL, this.from++)
          )
        )
      }
    }
    const video = document.querySelector("video");
    
    video.oncanplaythrough = () => {
      console.log(video.duration)
    }

    const url = "https://nickdesaulniers.github.io/netfix/demo/frag_bunny.mp4";

    let mediaContext = new OfflineMediaContext({
      url: url
    });
    let mediaResponse = mediaContext.getMedia();
    let mediaChunks = mediaResponse.then(() => mediaContext.startRendering())
      .then(chunks => {
        console.log(chunks);
        
        let select = document.createElement("select");
        document.body.appendChild(select);
        let option = new Option("select a segment");
        select.appendChild(option);
        for (let chunk of chunks) {
          let index = chunks.indexOf(chunk);
          let option = new Option(`Play ${index}-${index + mediaContext.timeSlice} seconds of media`, index);
          select.appendChild(option)
        }
        select.onchange = () => {
          video.src = URL.createObjectURL( chunks[select.value] )
        }
      })

https://github.com/guest271314/OfflineMediaContext

@jakearchibald (Collaborator) commented Jul 10, 2017

@guest271314 this reads a bit "I've made a thing now pls add it to the platform". If your library above does the job, why does it need to go in a standard? If your library can't quite do what it intends to do, what's missing?

The extensible web means we'd rather solve the lower level limitations than check a high-level library into browsers (if that's the choice).

@guest271314 (Contributor, Author) commented Jul 10, 2017

@jakearchibald Did not ask "please". The approach does not do the job adequately, as described at the original post. What is missing:

  1. the tab crashes when N number of `Blob`s have been created;
  2. two different approaches are needed for Chromium and Firefox;
  3. the time slices of the `Blob`s within the chunks array are imprecise; we want granular control of the result;
  4. `MediaRecorder` is a helpful technology - particularly as a media container - though it is difficult to get a precise `.size` for the resulting `Blob`, as `dataavailable` is dispatched once more after `.stop()` is called;
  5. why do we need `MediaRecorder` to achieve this, anyway?;
  6. we could incorporate `ReadableStream` to avoid the codec limitations of `MediaSource` and render each media chunk as the stream occurs;
  7. the media still plays - we do not need it to play while the `Blob`s are being generated;
  8. the ability to specify a chunk of media from any valid position of the original media, instead of simply using `Array.from()`'s callback; again, granular control of the result.

Yes, am asking for low-level help for the missing parts - or parts which could be improved at a low level. Posting here due to lack of experience with C++ for adjusting the code at the source; otherwise would just do it myself and create a custom build of Chromium.

The code is a proof of concept. We have OfflineAudioContext; why not an offline media context for video as well?

If the interest or need is not there, the interest is not there.

@guest271314 (Contributor, Author) commented Jul 10, 2017

@jakearchibald Have previously used response.body.getReader() to read and generate a 189MB file for download. Not sure why the tab crashes when 30 Blobs totaling less than 10MB are generated?

Another way to view the issue is as a question: how can we generate a discrete file, capable of being played independently, from a range request, having the content length of the requested range?

@jakearchibald (Collaborator) commented Jul 10, 2017

How to generate a discrete file capable of being played independently from a range request

Does w3c/ServiceWorker#913 solve this?

@guest271314 (Contributor, Author) commented Jul 11, 2017

@jakearchibald It may. Are media headers or media codecs necessary to be included within the response at the ServiceWorker to render the media fragment in HTML? How can we show and prove the result for certain? Tried to create a plnkr of your work related to w3c/ServiceWorker#913, though not sure if the changes made are consistent with your aim at that thread. You streamed the entire video as a requirement, yes?

Can you put together a piece of code or plnkr where fetch() - or fetch() and ServiceWorker - are used, and the response to a request is a range other than bytes 0-N; e.g., from 20 to 25 seconds as a time measurement of the media, or bytes 20000 to 100000 relative to the Content-Length of the original media resource, using Media Fragment URI or other means; where the response is only the requested range, is a distinct resource itself, and can be immediately played at a media element?

@jakearchibald (Collaborator) commented Jul 11, 2017

I can't create a live demo of this since it isn't implemented, but it would work like this:

  1. Media element generates range request (due to fragment or user seeking).
  2. Request lands in service worker.
  3. Either event.respondWith(fetch(event.request)) or event.respondWith(caches.match(event.request)) returns the appropriate partial response.
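For step 3, whichever source responds has to do the range arithmetic correctly. As a rough sketch of what that entails (the helper names below are illustrative, not spec APIs), a handler slicing a fully-buffered resource might look like:

```javascript
// Hypothetical sketch: satisfy a single-range "bytes=start-end" Range header
// against an in-memory copy of the resource, the way a service worker
// handler could before wrapping the result in a Response.
function parseRange(rangeHeader, totalLength) {
  const m = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader || "");
  if (!m || (m[1] === "" && m[2] === "")) return null;
  let start, end;
  if (m[1] === "") {
    // suffix form "bytes=-500": the last 500 bytes
    start = totalLength - Number(m[2]);
    end = totalLength - 1;
  } else {
    start = Number(m[1]);
    end = m[2] === "" ? totalLength - 1 : Math.min(Number(m[2]), totalLength - 1);
  }
  return start < 0 || start > end ? null : { start, end };
}

function buildPartialResponse(bytes, rangeHeader) {
  const range = parseRange(rangeHeader, bytes.length);
  if (range === null) {
    // unsatisfiable or malformed range: 416 with the total length
    return { status: 416, headers: { "Content-Range": `bytes */${bytes.length}` } };
  }
  const body = bytes.slice(range.start, range.end + 1);
  return {
    status: 206,
    headers: {
      "Content-Range": `bytes ${range.start}-${range.end}/${bytes.length}`,
      "Content-Length": String(body.length),
    },
    body,
  };
}
```

Note that the sliced bytes are only what the media element asked for; they are not themselves a standalone playable file, which is the distinction much of this thread turns on.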
@guest271314 (Contributor, Author) commented Jul 11, 2017

@jakearchibald Then evidently w3c/ServiceWorker#913 does not currently solve the present issue.

Do we need a boundary string (multipart/byteranges, per HTTP/1.1 Range Requests) within the response to play 20 through 25 seconds?

Perhaps a viewer with experience as to the essential parts of necessary media resource headers will chime in.

@jakearchibald (Collaborator) commented Jul 11, 2017

evidently

I'm struggling with this because I don't feel like you've stated the actual problem. All I've got so far is you'd like to fetch a temporal range. Can you describe:

  • When you'd want to do this.
  • The format of the result.
  • What you'd like to do with the result.
@guest271314 (Contributor, Author) commented Jul 11, 2017

@jakearchibald 1) Anytime; 2) any format which can be immediately played at a media element without the other portion of the original resource - for example, if the original file is .mp4 though needs to be converted to .ogg to get 5 seconds of media as a discrete file, that is ok; 3) any number of uses for the media: a) play the media fragment immediately; b) send the media fragment as a Blob or ArrayBuffer to another user or service; c) concatenate the media fragment with another, unrelated media fragment to create a mix of media.

The primary issue is that a range request returns the expected result for the range bytes=0-1024*1024, though not for bytes=1024*1024-1024*1024*32. And with the current workaround, more than 30 generated Blobs crash the browser tab.

The actual problem is: How to request and get any segment of a media resource as a distinct and discrete resource, capable of being played back without reliance on the other portion of the media resource.
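Of the two range flavors mentioned above, only the byte form has first-class request support today, via the Range header. A minimal sketch of issuing that request (assumes a server that honors range requests; the helper names are illustrative):

```javascript
// Illustrative sketch: fetch a byte range of a resource. The returned bytes
// are a raw slice of the file; for most containers a slice that does not
// start at byte 0 is NOT independently playable, because the container
// header lives at the start of the file.
function rangeHeader(fromByte, toByte) {
  return { Range: `bytes=${fromByte}-${toByte}` };
}

async function fetchByteRange(url, fromByte, toByte) {
  const response = await fetch(url, { headers: rangeHeader(fromByte, toByte) });
  if (response.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return response.arrayBuffer();
}
```

This is consistent with the observation above: a bytes=0-1024*1024 range "works" because it includes the file header, while a mid-file range does not.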

@jakearchibald (Collaborator) commented Jul 11, 2017

It feels like you're deliberately concealing details, but we could just be talking past each other.

  1. Anytime

I think it's clear that I'm looking for something specific. For instance, if the question was "When would the browser want to make a ranged request", the answer could be "To obtain metadata at the end of the file without downloading the whole file". Can you follow that example and answer with specifics? When I say "when" I'm not looking for a time like "early evening" or "just after breakfast" 😄 .

  1. Any format which can be immediately played at a media element

Ok, so you must mean an HTTP response that represents a fully-contained, but spliced media resource? If the resource is cut outside of a keyframe, I assume the browser will have to re-encode the media? What encoder settings should be used for this re-encode (quality etc)? Should this be standard across browsers?

c) concatenate the media fragment with another, unrelated media fragment to create a mix of media.

If the HTTP response you've generated is a full container, are you sure you can concatenate two to produce a media resource which is one after the other? Is this true of all container and track formats the browser could use?

@guest271314 (Contributor, Author) commented Jul 12, 2017

@jakearchibald

It feels like you're deliberately concealing details

No, not deliberately concealing details. Created a repo and demo to illustrate the concept being conveyed.

but we could just be talking past each other

That could be possible. Though since we are aware of the possibility we should be able to move beyond that stage of communication.

When I say "when"

Not sure what you mean by "when", here.

Ok, so you must mean an HTTP response that represents a fully-contained, but spliced media resource?

Yes. Though if that is not possible, once the full media resource is returned as a response, create a fully-contained resource reflecting the spliced media sought, whether the parameters are a time slice or a byte range.

I assume the browser will have to re-encode the media?

Yes.

What encoder settings should be used for this re-encode (quality etc)?

To begin with, whichever encoding returns expected result. Then we can work on quality.

Should this be standard across browsers?

Ideally, yes.

If the HTTP response you've generated is a full container, are you sure you can concatenate two to produce a media resource which is one after the other?

Well, not entirely sure as to exacting time slices using the approach at https://github.com/guest271314/OfflineMediaContext, as MediaRecorder events can be challenging to capture. Again, a dataavailable event is dispatched once more when .stop() is called. Tried two different approaches using the JavaScript above. The first approach, which is what the code above uses, passes event.data at the initial dataavailable event. If we use .start(1000) to get a 1 second time slice, dataavailable will be called again when .stop() is called; we would need to push two event.data Blobs to an array, then pass the array to resolve(), which, when tried, results in a media fragment over 1 second. We want granular control of the resulting media fragment.

An attempt to concatenate individual Blobs, which does not return the expected result: http://plnkr.co/edit/KnEXCBizDkkHgkTB6jyt?p=preview. Yes, we want the ability to concatenate media fragments such that, when played at an HTMLMediaElement, the rendering is the same as if .src were set to the URL of the full media resource.
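The plnkr result is expected, for what it's worth: new Blob([a, b]) concatenates bytes only, and two complete container files joined byte-for-byte are not one valid container, since the second file's header lands mid-stream. A small sketch of that byte-level behavior, using stand-in bytes rather than real media (container-aware remuxing, e.g. rewriting EBML/WebM clusters, is what is actually required):

```javascript
// Sketch: Blob concatenation is purely byte-level (Node 18+ and browsers).
// The joined Blob has the combined size and type, but a media element will
// generally stop playback at the first file's end-of-container marker.
function concatBlobs(blobA, blobB) {
  return new Blob([blobA, blobB], { type: blobA.type });
}

// stand-in bytes; "video/webm" here is only a placeholder type
const a = new Blob([new Uint8Array([1, 2, 3])], { type: "video/webm" });
const b = new Blob([new Uint8Array([4, 5])], { type: "video/webm" });
const joined = concatBlobs(a, b);
```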

@jakearchibald (Collaborator) commented Jul 12, 2017

An attempt to concatenate individual Blobs, which does not return expected result

Ok, so your proposal that the browser should return response objects that represent media files doesn't support one of the use-cases you've laid out, so I guess you're looking for something else.

You still haven't given a full use-case, so I'm going to come up with one, and you can correct it if need be.


The aim is to build a web-based video editor that takes multiple sources and outputs a single resource.

  • Sources may be multiple different formats.
  • Sources may be local or network.
  • A source may be a temporal slice of a larger source, and it should be possible to represent this slice without downloading the full source.
  • The result can be output as a fully encoded and packaged media resource.

If the use-case is correct (and again, I'm guessing from the bits of information you've given me), the low level features seem to be:

  • A representation of a media resource by url, that can:
    • Seek to precise points.
    • Provide raw image & audio data in a way that can be iterated.
  • A streaming media encoder that:
    • Takes metadata for the overall format.
    • Takes frame by frame image and audio data & outputs an encoded media resource.

The idea is you'd be able to read from multiple media sources in raw formats, modify the image data using canvas, audio data using web audio, then feed them to the encoder for final output.

This system avoids the CPU overhead and generational quality loss you'd suffer in your proposal, as slicing doesn't automatically incur encoding.

The "representation of a media resource by url" sounds like <audio> or <video>, so it could be an extension of that. Although maybe it should be possible in a worker.

The "streaming media encoder" seems a little similar to MediaRecorder, but MediaRecorder seems to be built for realtime recording, whereas a video editor output should be able to be generated faster or slower than realtime.
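To make the "faster or slower than realtime" distinction concrete, here is a purely hypothetical API shape for the streaming encoder described above. None of these names exist in any browser or spec; the stub merely records metadata and counts frames so the sketch runs:

```javascript
// Hypothetical, invented API shape: an encoder fed frames at whatever rate
// the producer can manage, unlike MediaRecorder's realtime capture model.
class OfflineMediaEncoder {
  constructor({ mimeType }) {
    this.mimeType = mimeType; // metadata for the overall output format
    this.frameCount = 0;
  }
  encodeFrame(imageData, audioData) {
    // a real implementation would mux the frame into the output container
    this.frameCount++;
  }
  finish() {
    // a real implementation would return the packaged media resource;
    // the stub returns a summary object
    return { mimeType: this.mimeType, frameCount: this.frameCount };
  }
}

// Frames can be pushed as fast as they can be produced - e.g. rendered
// from canvas - with no realtime clock involved.
const encoder = new OfflineMediaEncoder({ mimeType: "video/webm" });
for (let i = 0; i < 3; i++) encoder.encodeFrame({ frame: i }, null);
const output = encoder.finish();
```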

@guest271314 (Contributor, Author) commented Jul 12, 2017

@jakearchibald Impressive.

It is challenging for two or more observers to interpret the same phenomenon in the same manner from different vantage points. Your description is very close, if not equal in practical application, to what this issue was attempting to describe. That is a cohesive write-up which had not been achievable here, at this stage of own development in technical writing.

Probably do not want to add or subtract from your composition, for concern of not being as clear as you have been. Though it should now also be included that the implementation should be possible using either an HTMLMediaElement, fetch(), or a Worker. And, if possible, we would want the ability to stream the output, or to pass the encoded media fragments or the completed output to a ReadableStream or WritableStream to produce streaming media.

@guest271314 changed the title Jul 15, 2017 (labels: proposal/addition, needs implementer interest)


@guest271314 (Contributor, Author) commented Aug 1, 2017

@jakearchibald A state-of-the-art working example using existing browser technologies to merge discrete media fragments into a single media file, courtesy of Kaiido.

@guest271314 (Contributor, Author) commented Sep 5, 2017

@jakearchibald FWIW, what has been composed so far: https://github.com/guest271314/recordMediaFragments, with much of the credit going to https://github.com/legokichi/ts-ebml. The Firefox implementation has several issues.
