
Proposal: DecodeVideoData API, exposed at Window and Worker #3593

guest271314 opened this issue Mar 27, 2018 · 1 comment



commented Mar 27, 2018

Motivation: Science Commons, A Shared Culture

Related: #2824

MediaSource provides the functionality to stream media to an HTMLMediaElement. MediaSource does not expose the SourceBuffer where media is appended as ArrayBuffers; see w3c/media-source#209.

<canvas> provides the ability to draw images, for example frames from a <video>, to a display. Images drawn onto a <canvas> can be captured using HTMLCanvasElement.captureStream(), which returns a MediaStream containing a single MediaStreamTrack of the live capture of the <canvas>; see Record almost everything in the browser with MediaRecorder.

MediaRecorder does not currently provide a default means of recording media fragments from more than one MediaStreamTrack; see w3c/mediacapture-record#147 and Real-time front-end alchemy by Soledad Penadés.

While it is possible to record multiple media fragments to a single .webm file using canvas.captureStream() and AudioContext.createMediaStreamDestination(), as demonstrated by the author of the Mozilla article and video linked above, and by @Kaiido in their answer to the Question How to use Blob URL, MediaSource or other methods to play concatenated Blobs of media fragments?, reduced to the bare essentials describing the expected result of using MediaSource with MediaRecorder at recordArbitraryMediaFragmentsToASingleVideoWithCanvasCaptureStreamAndAudioContextMediaStreamDestination.html.

This proposal essentially has these goals, as noted at w3c/media-source#190 (comment):

  1. Request only media fragments (using, for example, Media Fragment URI);
  2. Server responds with only media fragments (this does not currently occur and is, perhaps, outside the scope of this issue);
  3. We should now have the options to
    a. render playback of media fragments individually;
    b. render playback of media fragments as a single ("seamless") media stream ("media stream" should be clearly defined here; are we referring to MediaSource, or MediaStream, or rendering at <canvas>, etc.);
    c. create a single file of the media fragments as a single media file (for example, for download);
    which incorporates both immediate usage of the media and later usage of the media;
  4. The codecs, etc. of the original files are arbitrary; we cannot guarantee that the original individual media files having the content intended to be rendered as a single "stream" have the same file type; as long as the media is served with appropriate CORS headers and the <video> element at the browser can play the file type, we should be able to re-arrange the content in any conceivable manner at the browser - without using third-party polyfills or libraries;
  5. We should be able to get, re-arrange and concatenate the media fragments in any order for 3.c. into a single file - without having to first play the files - and possibly faster than the time that would be necessary to play the media fragments individually;
  6. The functionality of 1-5 should be able to be processed outside of a document, for example at a Worker thread, as we want to splice, re-arrange, concatenate and merge the media content as raw data, then either offer the file for download, stream the media as the normalization is occurring, or post the resulting media to a document for rendering.
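
The intended input/output shape of the goals above might be sketched as follows. `decodeVideoData` and its argument shape are assumptions of this proposal, not a specified interface; a real implementation would demux, splice and re-encode each fragment off the main thread, while this stub only concatenates already-fetched bytes to illustrate the shape: fragment descriptors in, one buffer out.

```javascript
// Hypothetical sketch: fragment descriptors ({ buffer, from, to }) in,
// a single ArrayBuffer out. A conforming implementation would produce a
// playable .webm; this stand-in merely concatenates the raw input bytes.
async function decodeVideoData(fragments) {
  const total = fragments.reduce((n, { buffer }) => n + buffer.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const { buffer } of fragments) {
    out.set(new Uint8Array(buffer), offset);
    offset += buffer.byteLength;
  }
  return out.buffer;
}
```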

The code at creates a .webm file by using a webp image. The code at is capable of creating seekable files (which Chromium does not currently achieve).

AudioContext provides a means to create an AudioBuffer from, for example, .mp4, .wav, .ogv or .webm files containing only the media fragments (time slices) set at AudioBufferSourceNode.start, using BaseAudioContext.decodeAudioData and OfflineAudioContext.startRendering, without rendering the media to an output device:

1.3. The OfflineAudioContext Interface
OfflineAudioContext is a particular type of BaseAudioContext for rendering/mixing-down (potentially) faster than real-time. It does not render to the audio hardware, but instead renders as quickly as possible, fulfilling the returned promise with the rendered result as an AudioBuffer.

The AudioBuffers created can be concatenated into a single AudioBuffer or WAV file, without playing the media.

A DecodeVideoData method should be usable at either a Window or a Worker to process specific time slices (media fragments) of one or more videos, possibly having different media containers and codecs, and produce an ArrayBuffer or, preferably, a .webm or other file, as fast as possible, without rendering the media to an output device (or playing the file at an internal media player).

For example, we should be able to perform this procedure with video, resulting in an ArrayBuffer or .webm file of the concatenated video data of the requested media fragments:

  const urls = [{
    src: "video.ogv",
    from: 0,
    to: 4
  }, {
    src: "video.webm#t=10,20"
  }, {
    from: 55,
    to: 60,
    src: "video.mp4"
  }];

  const audioContext = new AudioContext();

  // resolve media fragment offsets and fetch each resource as an ArrayBuffer
  const media = await Promise.all( ({ src, from, to }) => {
    const url = new URL(src, location.href);
    // get media fragment hash from `src`, for example "#t=10,20"
    if (url.hash.length) {
      [from, to] = url.hash.match(/\d+/g).map(Number);
    }
    return {
      from,
      to,
      buffer: await fetch(src).then(response => response.arrayBuffer())
    };
  }));

  // concatenate two AudioBuffers into a new, longer AudioBuffer
  function appendBuffer(buffer1, buffer2) {
    const numberOfChannels = Math.min(buffer1.numberOfChannels, buffer2.numberOfChannels);
    const tmp = audioContext.createBuffer(numberOfChannels, buffer1.length + buffer2.length, buffer1.sampleRate);
    for (let i = 0; i < numberOfChannels; i++) {
      const channel = tmp.getChannelData(i);
      channel.set(buffer1.getChannelData(i), 0);
      channel.set(buffer2.getChannelData(i), buffer1.length);
    }
    return tmp;
  }

  // decode, then render only the requested time slice, faster than real time
  const offlineBuffer = async ({ from, to, buffer }) => {
    const context = new OfflineAudioContext(2, (to - from) * 44100, 44100);
    const ab = await audioContext.decodeAudioData(buffer);
    const source = context.createBufferSource();
    source.buffer = ab;
    source.connect(context.destination);
    // .start(when, offset, duration): play `to - from` seconds starting at `from`
    source.start(0, from, to - from);
    return context.startRendering();
  };

  // audio buffer of concatenated media fragments
  const audioBuffer = (await Promise.all(;

We can use lines 102, 106, 142, 157, 164 and 170 of to create a WAV file from the AudioBuffer.
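
A minimal sketch of such a WAV encoder, standing in for the linked helper (`audioBufferToWav` is an assumed name, not a platform API), writes the 44-byte RIFF/PCM header and then the interleaved, clamped 16-bit samples:

```javascript
// Encode an AudioBuffer-like object (numberOfChannels, sampleRate, length,
// getChannelData) as a 16-bit PCM WAV file, returned as an ArrayBuffer.
function audioBufferToWav(audioBuffer) {
  const numChannels = audioBuffer.numberOfChannels;
  const sampleRate = audioBuffer.sampleRate;
  const frames = audioBuffer.length;
  const blockAlign = numChannels * 2; // 2 bytes per 16-bit sample
  const dataSize = frames * blockAlign;
  const view = new DataView(new ArrayBuffer(44 + dataSize));
  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);                      // fmt chunk size
  view.setUint16(20, 1, true);                       // audio format: PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);                      // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);
  // interleave channels, clamping float samples into the 16-bit range
  let offset = 44;
  for (let i = 0; i < frames; i++) {
    for (let ch = 0; ch < numChannels; ch++) {
      const sample = Math.max(-1, Math.min(1, audioBuffer.getChannelData(ch)[i]));
      view.setInt16(offset, sample < 0 ? sample * 0x8000 : sample * 0x7fff, true);
      offset += 2;
    }
  }
  return view.buffer;
}
```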

I am not sure how to compose EBML from scratch, nor how difficult it will be to achieve the requirement, though we should be able to achieve the same procedure with a DecodeVideoData API: create a video collage from multiple media fragments having dissimilar media containers and/or codecs, without the need to draw the video to a <canvas>, play the video at a <video> element, or use MediaSource, MediaStream and MediaRecorder, as those APIs were not intended for the purpose of concatenating multiple media fragments into a single file without first playing the media.
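
For reference, the basic building block of EBML (the format underlying Matroska/WebM) is the variable-length integer ("VINT") used for element sizes: the number of leading zero bits before a marker "1" bit in the first byte gives the total byte length. A minimal sketch of that encoding, for size values up to 8 bytes:

```javascript
// Encode a non-negative integer as an EBML VINT (size field).
// Width grows until the payload fits; all-ones payloads are avoided,
// as EBML reserves them for "unknown size".
function encodeVint(value) {
  let width = 1;
  while (value >= 2 ** (7 * width) - 1 && width < 8) width++;
  const bytes = new Uint8Array(width);
  for (let i = width - 1; i >= 0; i--) {
    bytes[i] = value & 0xff;
    value = Math.floor(value / 256);
  }
  bytes[0] |= 1 << (8 - width); // set the length marker bit
  return bytes;
}
```

An EBML element is then its element ID bytes, followed by the VINT-encoded payload size, followed by the payload; composing a .webm from scratch means nesting such elements per the Matroska document type.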



commented Mar 27, 2018

Given all the other efforts already ongoing in this area, I don't think we should pursue this here. There's no particular need to tie this to HTML. It seems you already raised issues on those other repositories, so let's see how those play out first.

@annevk annevk closed this Mar 27, 2018
