
Allow request/response.blob() to resolve before reading the full stream #556

Closed · jakearchibald opened this issue Jun 20, 2017 · 12 comments

@jakearchibald
Collaborator

From #554 (comment) by @asutherland:

The blob() case is more efficient by far in terms of what the browser can and does optimize. Speaking for Gecko, the blob() call only creates a handle to the underlying file on disk. No reads of the file's contents need to be performed before returning the blob. This handle can be given to the new Response and passed across processes, allowing the target process to perform just the 1024-byte read directly, and without needing to involve the ServiceWorker or its global.

This is against the current spec, which requires reading the full stream, but it seems like a nice optimisation.

This should be limited to bodies with a known size, which means items with a Content-Length header, or items backed by disk (cache API).

The blob should enter an errored state if the body doesn't eventually match the size of the blob.
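The motivating pattern from #554 can be sketched as follows (the helper name is illustrative; per the current spec, blob() still resolves only after the full body has been read):

```js
// Sketch of the motivating pattern: hand back only the first n bytes of a
// response body as a Blob. Runs in Node 18+, where Response and Blob are
// globals; in a service worker the same calls apply to a cached Response.
async function firstBytes(response, n) {
  const blob = await response.blob(); // current spec: full body is read first
  // Under this proposal, blob() could resolve as soon as the size is known
  // (Content-Length or disk-backed), and slice() would let the consumer
  // read just these bytes directly.
  return blob.slice(0, n);
}
```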

@wanderview
Member

FWIW, while we do this optimization for blobs in IDB etc., I don't think we do it for the Cache API in Gecko yet.

@jakearchibald
Collaborator Author

I guess there's less need for this if no one does it already. w3c/ServiceWorker#913 is maybe still the best fix.

@wanderview
Member

The problem with the blob optimization here is that it has different semantics from a ReadableStream. A blob you can read with random access, in any order, etc.

What if we just added some kind of "skip" operation to ReadableStream? This would let you efficiently seek forwards to the part you want without necessarily loading it all into memory.
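A userland approximation of such a skip, assuming the stream yields Uint8Array chunks (the helper name is hypothetical):

```js
// Read and discard bytes until `n` have been consumed; return any leftover
// bytes from the chunk that crossed the skip boundary. A native skip hook
// could instead let the underlying byte source seek without producing the
// skipped data at all.
async function skipBytes(reader, n) {
  let remaining = n;
  while (remaining > 0) {
    const { value, done } = await reader.read();
    if (done) break;
    if (value.byteLength > remaining) {
      return value.subarray(remaining); // bytes just past the skip point
    }
    remaining -= value.byteLength;
  }
  return new Uint8Array(0);
}
```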


@domenic
Member

domenic commented Jun 20, 2017

Although upon seeing this issue I was initially sympathetic, thinking that the spec should not prohibit interesting optimizations, now I am not so sure. It really comes down to the question: what does response.blob() fulfilling mean? Or, from another angle, what does having a Blob object in hand mean?

I see two possibilities:

  • It means you have a handle to some data that could in theory be read, perhaps even successfully, by the browser/the FileReader API/etc.
  • It means that you have some assurance that the data is a known, readable quantity, that (modulo transient filesystem errors) can be read successfully at any time.

If we say that having a Blob/a fulfilled response.blob() promise means the latter, then we really do need to wait to read the response stream to the end first.

What if we just added some kind of "skip" operation to ReadableStream? This would let you efficiently seek forwards to the part you want without necessarily loading it all into memory.

I feel like we discussed this before. I am amenable. IIRC last time we talked the idea was the default implementation would throw away data but the underlying byte source could provide a specialized skip operation to be more efficient.

@jakearchibald
Collaborator Author

Yeah, I think optimising blobs here is pretty complicated; I was only driving it when I thought it was something Firefox already did.

The right solution is fixing range requests, and maybe a skip/advance method on streams.

@wanderview
Member

I feel like we discussed this before. I am amenable. IIRC last time we talked the idea was the default implementation would throw away data but the underlying byte source could provide a specialized skip operation to be more efficient.

It would be nice to expose a hook to JS ReadableStreams as well. They may be able to do something smart depending on what their underlying source really is. If it's computed, they can just compute ahead, etc.

@domenic
Member

domenic commented Jun 20, 2017

It would be nice to expose a hook to JS ReadableStreams as well. They may be able to do something smart depending on what their underlying source really is. If it's computed, they can just compute ahead, etc.

Yeah, that's definitely what I meant; a new method you pass to the constructor.
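As a rough sketch of that shape (the `skip` member below is hypothetical, not part of the Streams standard; today's constructor simply ignores unknown members):

```js
// Hypothetical underlying-source hook: a computed source advances its
// internal counter instead of generating and then discarding bytes.
let position = 0;
const stream = new ReadableStream({
  pull(controller) {
    // Produce the next computed chunk from `position`.
    controller.enqueue(new Uint8Array([position++ & 0xff]));
  },
  skip(n) {
    // Proposed: jump ahead without materializing the skipped bytes.
    position += n;
  }
});
```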

@asutherland

Apologies for the confusion. I oversimplified and oversold the awesome powers of Blobs in that issue. Gecko only ever hands out Blob instances when the entire contents are available. There are currently no plans to speculatively return Blobs before all the data is received. Having said that, there is the edge-case w3c/FileAPI#47 wherein Gecko will throw an error on reads if the backing file selected from input[type=file] has had its file size change since selection.

Clarifying Gecko optimizations: As @wanderview points out, Gecko's Blob optimizations do not currently apply to Response.blob(). Its implementation is naive and treats all responses like they came over the network, consuming them in their entirety. Its returned Blob will be file-backed if it was large enough to spill to disk (and privacy/security settings allow), but it will be a new file on disk that is distinct from the one stored by the Cache API. Having said that, we're not far from being able to implement such an optimization, but it definitely will not return the Blob until the response has been received in its entirety (making them fully-written "incumbent records" instead of in-progress "fetching records").

@yutakahirano
Member

A ReadableStream-based solution sounds good.

@guest271314

guest271314 commented Jun 24, 2017

@jakearchibald I was able to set the Range request header to bytes=0-<number less than or equal to Content-Length> on a Request() object. The media plays back to the specified range when set at the .src of an HTMLMediaElement. However, when the first portion of the range is greater than 0, no media plays back.

When 0 is set as the first portion of the Range header value, the media plays back:

```js
let headers = new Headers();
let range = 1024 * 8 * 128;
headers.append("Range", "bytes=0-" + range);
let request = new Request(url, { headers: headers });
fetch(request)
  .then(response => response.blob())
  .then(data => audio.src = URL.createObjectURL(data));
```

When a number greater than 0 is set as the first portion of the Range header value, the media does not play back:

```js
headers.append("Range", "bytes=1-" + range);
```

http://plnkr.co/edit/vci20DGSjX1fAjrHOcOY?p=preview

Are you aware of the reason for this result?

@beaufortfrancois

audio.error.message says "DEMUXER_ERROR_COULD_NOT_OPEN: FFmpegDemuxer: open context failed", which doesn't help much...
However, I guess that the (missing) first byte is needed to read the metadata from the audio file properly, which is why playback fails.

@jakearchibald
Collaborator Author

Seems ok for this to be an implementation detail.
