How would Web Codecs support extracting PCM data for a specific time range? #28
Comments
Is the requirement to extract a specific time slice from the file without reading the file?
@guest271314
@JohnWeisz Have considered and proposed similar functionality for video, in brief see w3c/mediacapture-record#166, et al. If the structure of the file is the same throughout, it should be possible to estimate where in the file 5 seconds is and where 10 seconds is. That is, given an
@guest271314 Since you currently have no other option than decodeAudioData, you have to load the entire file into memory to process it through OfflineAudioContext. Technically it's a solution to play the file back through
Another option is to perform the task of creating time slices (in seconds or bytes) exactly once; then you will have the ability to serve and merge any portion of the media thereafter.
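A minimal sketch of that "slice exactly once, serve later" idea, with an arbitrary chunk size and an invented helper name; byte slices of compressed media are not independently decodable, so this only illustrates the bookkeeping:

```js
// Partition a Blob into fixed-size byte slices exactly once; Blob.slice is
// lazy, so nothing is read into memory here.
function sliceOnce(blob, chunkBytes = 1024 * 1024) {
  const chunks = [];
  for (let offset = 0; offset < blob.size; offset += chunkBytes) {
    chunks.push(blob.slice(offset, offset + chunkBytes));
  }
  return chunks;
}

// Later: serve or merge any contiguous range of the stored slices.
// const merged = new Blob(sliceOnce(file).slice(3, 7));
```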
The entire file does not need to be loaded into memory. The simplest approach (though it does require loading the entire file) would be to use
Yes, you can read the entire file using
How do you propose to read the file to determine where a specific time slice is without reading the entire file? Only read the metadata parts, which could be anywhere in the file, depending on which application encoded the media? Encoder parameters are not necessarily consistent between applications, e.g., Chromium and Firefox implementations of
Yes, we precisely want to avoid reading the entire file, and instead read only some chunks from it. This is currently only possible (somewhat) using an
This example demonstrates a theoretical API for extracting audio PCM samples from 5 seconds to 10 seconds from a source Blob, even if it's a 30-hour-long audio file behind that Blob. Now, I understand many formats don't support fully random-access behavior and have to be at least scanned through before knowing what byte offset to even start looking at. However, that doesn't mean the entire file has to be read into memory, and especially not entirely at once. The implementation of my proposal could handle all of this behind the scenes.
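Purely as a hypothetical sketch of the kind of API being asked for here; every name (`AudioSampleReader`, `getInfo`, `readSamples`) is invented for illustration and is not part of any spec or proposal:

```js
// Hypothetical API sketch: none of these interfaces exist.
// The idea is to hand the browser a Blob and ask for decoded PCM for a time
// range, without pulling the whole file into JS memory.
async function readPcmRange(blob, startSeconds, endSeconds) {
  const reader = new AudioSampleReader(blob);                       // hypothetical
  const { sampleRate, numberOfChannels } = await reader.getInfo();  // hypothetical
  console.log(sampleRate, numberOfChannels);
  // One planar Float32Array per channel covering [startSeconds, endSeconds).
  return reader.readSamples(startSeconds, endSeconds);              // hypothetical
}

// e.g. extract 5s..10s of PCM from a potentially 30-hour file:
// const channels = await readPcmRange(blob, 5, 10);
```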
That said though, there is definitely going to be a solution if this proposal ends up supporting non-compressed output formats. You could simply convert to, say, a WAV, and then easily read chunks from that.
In that case you can still utilize the Media Fragments URI specification: fetch the Blob URL, convert to
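For the playback side, a minimal sketch assuming `blob` already holds the media and the browser honors the temporal fragment's end time:

```js
// Play back only the 5s-10s range via a Media Fragments temporal fragment.
// This addresses playback, not access to the decoded PCM samples.
const url = URL.createObjectURL(blob);
const audio = new Audio(`${url}#t=5,10`);
audio.play(); // starts at 5s; playback stops at 10s where #t is honored
```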
How do you get
Either from an
Then the application would not know the container beforehand. The entire file would need to be read to get the metadata and/or extract the underlying audio and/or video from specific time slices of the media. Unless WebCodecs develops a parser for each container that could potentially be used to extract and re-encode the required time slices of media, e.g.,
(or Media Fragments URI) could be used to play back (and re-record if necessary using
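A rough sketch of that playback-and-re-record route, again assuming `blob` holds the source media; note it runs at real-time speed, which is the drawback raised further down:

```js
// Play only the #t=5,10 fragment and record the element's output with
// MediaRecorder. captureStream() is prefixed as mozCaptureStream() in Firefox.
const audio = new Audio(`${URL.createObjectURL(blob)}#t=5,10`);
const stream = (audio.captureStream || audio.mozCaptureStream).call(audio);
const recorder = new MediaRecorder(stream);
const parts = [];
recorder.ondataavailable = (e) => parts.push(e.data);
recorder.onstop = () => {
  // A re-encoded copy of the requested 5s-10s slice.
  const fragment = new Blob(parts, { type: recorder.mimeType });
  console.log('recorded fragment:', fragment.size, 'bytes');
};
audio.onplaying = () => recorder.start();
audio.ontimeupdate = () => { if (audio.currentTime >= 10) audio.pause(); };
audio.onpause = () => { if (recorder.state === 'recording') recorder.stop(); };
audio.play();
```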
@guest271314 If the Web Codecs proposal could read the file in its native implementation efficiently, by streaming through the file, we could get proper streaming access to PCM sample data without having to keep the entire file in memory.
So, what I propose, implementation-wise, is that:
How would that be possible? Is the requirement to only play back the media fragments? Or to also offer the individual media fragments extracted for download?
This is an interesting question. For me, personally, simply having access to sample data in a Float32Array would be sufficient, but I can see other possible uses here. For the record, my use case is accessing waveform data for navigable visualization purposes (i.e. drawing parts of a waveform).
The exact same way as, e.g., playing back audio from a specific timestamp is possible; only in this case, the decoded audio data is offered in JS-compatible container objects instead of being written to the output buffer directly.
Though how do you know where you are in the file without metadata?
The file metadata is required for the ability to seek. You could use range requests, though metadata is still needed, and this is client-side.
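A minimal sketch of the range-request part, with a placeholder URL; the server must support HTTP Range requests, and mapping a time range to byte offsets still depends on the container metadata:

```js
// Fetch only the first 64 KiB of the resource (e.g. to parse container
// metadata) instead of downloading the whole file.
const response = await fetch('https://example.com/audio.webm', {
  headers: { Range: 'bytes=0-65535' },
});
console.log(response.status);              // 206 Partial Content if honored
const head = await response.arrayBuffer(); // only this slice is in memory
```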
You could test and estimate the total duration and where you want to extract from if you are only dealing with
Yes, however, the metadata can be mapped without requiring the entire file to be loaded in memory. That said though, I think we kind of misunderstand each other; the problem here is a missing web API. We want to get random access to audio sample data, and we don't want to load the entire audio into memory. Currently:
You can use
The problem is that
In case it isn't clear why this is a problem: imagine you have a 20-hour-long audio file, and you want to get 2 minutes of sample data from it somewhere in the middle. Currently, you can either:
None of these are convenient, or even feasible in most cases.
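For concreteness, the only route really available today looks roughly like this (the helper name is invented); it decodes the entire file into memory just to reach a 2-minute window:

```js
// Status quo: decodeAudioData has no notion of a time range, so the whole
// file is read and decoded before slicing, even for a 20-hour source.
async function getPcmWindowToday(file, startSeconds, endSeconds) {
  const ctx = new AudioContext();
  const audioBuffer = await ctx.decodeAudioData(await file.arrayBuffer());
  const { sampleRate } = audioBuffer;
  return audioBuffer
    .getChannelData(0) // first channel only, for brevity
    .slice(Math.floor(startSeconds * sampleRate), Math.floor(endSeconds * sampleRate));
}
```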
The restriction of
is not technically possible, particularly with a
If the media is finite, which a 20-hour media file is, and the encoded media is consistent, then the exact points which need to be accessed can be calculated mathematically. The rudimentary algorithm would be to divide the total duration by the total number of frames. Once the partitions are known you can extract any given part of the set mathematically. If the encoded media is variable, the exact portions can still be extracted by compensating for the differences between the discrete variable-rate-encoded media. Thus, the concept of
is not viable in the first instance as a
Yes, the simple solution would still be to use Media Fragments URI with an
The alternative solution would be to calculate the total number of samples or frames within the finite amount of samples or frames, then extract the required time slices mathematically. For example,
https://plnkr.co/edit/Inb676?p=info
Alternatively, the frame duration can be calculated dynamically, instead of storing the frames separately; concept courtesy of @thenickdude
https://plnkr.co/edit/ThXd9MKYvEYq2kKyh8oc?p=preview
Each approach outputs similar results. One reads all the frames and calculates the variable frame duration first, then writes the file. One reads and writes the frames at the same time. Given the current requirement, you can calculate where
mathematically, set that estimated index as the start, and use the previously calculated frame or sample duration to determine the ending index. Set the included values to a separate
Using either example it is possible to get the total number of frames in the media, or
If what you are proposing is an API to parse any file potentially containing any content and any possible variable sample rate or frame rate, the program has no way to determine what the file contains without reading the entire file. The question then becomes what the most efficient means is of extracting specific time slices of media from a media file containing unknown content. "Efficient" is a difficult term to substantiate. All tests need to be performed on the same machine to have any significance. Online tests of "efficiency" are useless without the complete hardware and software listed. Even then the results could vary substantially due to technical limitations wholly unrelated to the program. "Efficient" would need to be clearly defined for the proposal, along with exactly how "efficiency" is evaluated.
Unless you are suggesting that there is a means to extract specific time slices from a file containing unknown media content, though having a finite content length, without reading the entire file? If so, can you describe the algorithm that you are suggesting achieves that requirement?
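A minimal sketch of the rudimentary algorithm described above, assuming a roughly constant frame duration (the function name and the example numbers are illustrative):

```js
// Estimate which frames cover a time range from the total duration and the
// total frame count; variable-rate media would need per-frame compensation.
function estimateFrameRange(totalDurationSeconds, totalFrames, startSeconds, endSeconds) {
  const frameDuration = totalDurationSeconds / totalFrames; // seconds per frame
  return {
    startFrame: Math.floor(startSeconds / frameDuration),
    endFrame: Math.ceil(endSeconds / frameDuration),
  };
}

// A 20-hour file at 50 frames per second, extracting 5s..10s:
// estimateFrameRange(72000, 3600000, 5, 10) -> { startFrame: 250, endFrame: 500 }
```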
I appreciate your thoughtful response, but I think you are missing the point. A native implementation could easily stream through the file chunk-by-chunk to do whatever needs to be done with the file, instead of first reading the entire file into memory and then operating on the in-memory data. And since decoding/demuxing is already available in native implementations, this could be used to stream sample data through a JS-enabled API, precisely how the same audio data is currently streamed to the output buffer to play back audio using an
Or am I missing something here?
That is already possible using implemented JavaScript APIs. If you prefer, you can use
That is already possible. Have you actually tried using
That statement needs absolute clarity. Kindly define what you mean by "instead of reading the file into memory". If the file is a
It appears that you are expecting a JavaScript API to be able to extract specific time slices from a media file without reading or relying on the file metadata, though you have not provided any formal algorithm reflecting how that will occur. As described above, it is mathematically possible to extract any part of a "file" by determining the total number of samples, or frames, and the total duration.
Yes, you are correct that this can be entirely done with JS, and it's not even particularly challenging with, e.g., uncompressed WAV. You can slice up Blobs/Files and read the slices with the async FileReader, then operate on the individual chunks, without requiring the entire file to be kept in memory. In this case, however, especially with more complex formats, you have to re-create complex decoders and demuxers in JS code, while they are already available natively in virtually every single browser; only they are unusable for the task at hand, because there are no JS-enabled APIs to use them.
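A minimal sketch of that WAV case, assuming a canonical 44-byte header and 16-bit interleaved PCM; a robust version would parse the 'fmt ' and 'data' chunks instead of hard-coding offsets, and could equally use the async FileReader mentioned above:

```js
// Read only the bytes covering [startSeconds, endSeconds) from a PCM WAV Blob
// and convert them to Float32 samples; the rest of the file is never loaded.
async function readWavWindow(file, startSeconds, endSeconds, sampleRate = 44100, channels = 2) {
  const bytesPerFrame = 2 * channels; // 16-bit samples, interleaved
  const headerBytes = 44;             // canonical WAV header (assumed)
  const start = headerBytes + Math.floor(startSeconds * sampleRate) * bytesPerFrame;
  const end = headerBytes + Math.floor(endSeconds * sampleRate) * bytesPerFrame;
  const bytes = await file.slice(start, end).arrayBuffer();
  const int16 = new Int16Array(bytes);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 32768;
  return float32; // interleaved samples for the requested window
}
```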
It does, but you cannot access the decoded sample data (PCM) without recording the stream at real-time speed, which is way too slow for many applications.
But I still feel we are going in circles here, so let me try to put the original title question into an alternative phrasing: does WebCodecs plan to offer a way to convert a part of a source file? Like
If so, given sufficient performance, it could be used to perform a conversion of a chunk to an uncompressed format, and then acquire PCM data from that in a relatively easy way.
Am not certain what you mean by "access the decoded sample data". That goes back to whether the requirement is to play back the media or to offer it for download or another purpose.
If the claim is that
There are various other uses for having access to sample data, including offline audio analysis and static audio visualization.
Mind sharing what these ways are? Obviously, you can make a request to a server with the possibly several-GB-sized audio file from a web browser (not really efficient), or execute a command in elevated environments, such as Electron (limited to web-based desktop applications only). Or alternatively, you can compile ffmpeg to JS and use that, but that's again inefficient, and without sufficient changes you are again operating on in-memory contents. So I'm really interested in how this is possible.
I'm afraid this is beyond my current knowledge without actually digging into ffmpeg myself. I'm requesting/proposing an API, not an exact implementation. What I know from experience is that ffmpeg can convert very long chunks of audio without taking up several dozen gigabytes of RAM like virtually every single JS-based solution. |
In reverse order of the points addressed in your previous comment
is your claim. It is your responsibility to demonstrate in a minimal verifiable example, or provide the primary source basis for the claim if you have not reproduced the output yourself, that your claim is true and accurate and can be reproduced. At least attempts to produce the expected output - utilizing any approaches - are necessary, for your own edification and substantiation. Else the basis for the claim itself must be pure speculation as to FFmpeg being capable of performing a specific operation - until proven otherwise.
You essentially covered two of the possibilities.
Another solution is to use Native Messaging to execute a native command, optionally passing values to the command. An example of such an approach is described and implemented at https://github.com/guest271314/native-messaging-mkvmerge. Am currently considering experimenting with an approach using Native File System and
which should provide the same result at the main thread, substituting the use of some form of native script that observes changes to a file or directory for the use of Native Messaging.
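A hedged sketch of the Native Messaging route, from an extension page holding the "nativeMessaging" permission; the host name and message format below are invented for illustration:

```js
// Ask a locally installed native host to cut a time slice with a native tool
// (e.g. mkvmerge or ffmpeg) and report back when it is done.
chrome.runtime.sendNativeMessage(
  'com.example.media_slicer',                            // illustrative host name
  { input: '/path/to/audio.webm', start: 5, end: 10 },   // illustrative message shape
  (response) => console.log('native host replied:', response)
);
```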
The Native Messaging approach is very interesting. In the meantime, I looked briefly into the ffmpeg matter, and while I didn't yet dig into the part of the implementation where streaming-based conversion is done, see https://stackoverflow.com/questions/7945747/how-can-you-only-extract-30-seconds-of-audio-using-ffmpeg which at least explains how partial conversion is used. The way ffmpeg accomplishes this is two-fold:
See https://ffmpeg.org/ffmpeg.html
This is very similar to how HTMLAudioElement does playback, with the major notable difference that playback is done faster than realtime, and the decoded audio stream is written to a file. About
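For reference, ffmpeg's usual way of doing such a partial conversion is its seek (-ss) and duration (-t) options; a sketch of invoking it from a Node-based native host, with placeholder file names:

```js
// Decode only the 5s-10s window of the input and write it out as WAV; only
// that window is processed, not the whole file.
const { execFile } = require('child_process');

execFile('ffmpeg', ['-ss', '5', '-t', '5', '-i', 'input.webm', 'slice.wav'], (err) => {
  if (err) throw err;
  console.log('wrote slice.wav containing the 5s-10s window as uncompressed PCM');
});
```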
Indeed, as I said, this is technically possible, but either rather inefficient or extremely limiting. Compiling FFmpeg to JS duplicates logic that's already available natively, and requiring a local server does not allow a public web-based application to access this functionality (as I said, you can do this on your public server, but then you have to upload/download the conversion source and result, which is a huge payload).
The FFmpeg approach at the SO answer does not indicate that the program achieves the requirement
The procedure resembles using
and
e.g.,
at Mixing two audio buffers, put one on background of another by using web Audio Api.
To answer the original question, this would be achieved by decoding an EncodedAudioChunk corresponding to the time range you're interested in. The decoder would output an AudioFrame, which contains a timestamp and an AudioBuffer (type from Web Audio) containing planar float32.
Some handy spec links: https://wicg.github.io/web-codecs/#audiodecoder-interface
This discussion covered a lot of topics, but I think this answers the main question. I'll go ahead and close this. Feel free to file new issues with specific follow-up questions.
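A rough sketch of that answer using the decoder shape from the draft spec linked above; interface and field names were still in flux at the time, and `chunks` stands in for demuxed EncodedAudioChunks covering the requested range:

```js
// Decode only the chunks that cover the 5s-10s window and collect the output.
const frames = [];
const decoder = new AudioDecoder({
  output: (frame) => frames.push(frame), // draft-era AudioFrame: { timestamp, buffer }
  error: (e) => console.error(e),
});
decoder.configure({ codec: 'opus', sampleRate: 48000, numberOfChannels: 2 });
for (const chunk of chunks) decoder.decode(chunk);
await decoder.flush();

// Each frame.buffer holds planar float32 samples; keep the frames whose
// timestamps fall inside the requested window.
```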
Hey,
As you are likely aware, there is a huge and painful limitation in the Web Audio API, and accessing only a specific time range of audio sample data is not possible in any remotely feasible fashion without jamming the entire audio file into memory.
We are looking forward to finally putting this limitation behind us using Web Codecs. Are there any plans for somehow supporting extracting raw PCM audio data from a specific time range, say from 5 seconds to 10 seconds? (Obviously given an audio file not shorter than 10 seconds, for this specific example.)