Adding generic seek or read/write-at-offset abilities to readable/writable streams #1128

Open · domenic opened this issue May 26, 2021 · 4 comments
Labels: addition/proposal (New features or enhancements)

Comments

domenic (Member) commented May 26, 2021

A lot of the streams ecosystem so far has focused on network and device streams, where data is inherently sequential and read in order. However, File System Access and the Storage Foundation API (which will hopefully soon become an extension of File System Access) operate on file system streams, which are slightly different. In particular, a common operation on file system streams is random access, i.e. reading from or writing to a specified offset.

One could imagine using a separate API for random access, and leaving streams only for the kind of streaming sequential reads/writes that they're already good at. But this feels like a bad outcome: it would result in two similar APIs side by side, e.g. read/cancel on stream readers and read/cancel/seek on random-access readers.

Instead, we could imagine augmenting stream readers/writers to support this use case. If the underlying source/sink advertises seeking or random-access support, then the corresponding reader could expose that capability. Most streams on the web platform today would not support random access. (E.g., seeking an HTTP response doesn't make much sense. Except maybe seeking forward?) But file system streams, and maybe blob.stream(), could support it.

There are a few API details that come to mind:

  • Is reader.seek(offset) the right API, or should it be something like reader.read(view, { offset }) or reader.readAt(offset, view)? (That's the BYOB case; omit the view for the default reader case. A sketch contrasting the two shapes follows this list.) This seems like a big fork in the road that affects other parts of the API. E.g., if it's a seek-type API, then we need to consider how to queue up the seeks vs. the reads/writes/read requests, and how reads/writes advance the "current position".

  • Should this be done via adding a seek() or readAt() method to ReadableStreamDefaultReader / ReadableStreamBYOBReader / WritableStreamDefaultWriter, which throws if the underlying source/sink doesn't support it? Or should we create dedicated "seekable reader/writer" classes or subclasses? The former is a good deal simpler on the spec and implementation side, and is perhaps a better precedent for any future such expansions. But then feature detection would need some kind of canSeek or supportsOffset getter, which is a bit annoying.

  • What are the "units" for the seeking offset? They could be totally opaque: just a value you pass through to the underlying source/sink. (This starts feeling like some of the generic message-passing mechanisms discussed in #960, "Idiomatic way to explicitly tell an underlying transform stream to flush to the sink," and #1026, "Reset for TransformStream (and WritableStream?)".) Or there could be some minimal validation: e.g., it has to be a number (an integer?), nonnegative, and finite.

  • Relatedly, should there be a convention for whether seeking past the end throws an exception vs. clamps to the end? I don't know if we can enforce this in the streams infrastructure, but if we could that'd be cool.
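
For concreteness, here is roughly how the two shapes from the first bullet might look to a consumer. Everything in this sketch is hypothetical: seek(), readAt(), and the { at } read option are not specified anywhere, and stream is assumed to be a readable byte stream whose underlying source supports random access.

    // Hypothetical usage only; none of these options or methods exist today.
    const reader = stream.getReader({ mode: "byob" });

    // Seek-style: an explicit method moves a current position, which
    // subsequent reads then advance.
    await reader.seek(1024);
    let result = await reader.read(new Uint8Array(512)); // bytes 1024-1535

    // Offset-style: the position rides along with each individual read.
    result = await reader.read(new Uint8Array(512), { at: 1024 });
    // ...or as a dedicated method:
    result = await reader.readAt(1024, new Uint8Array(512));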

domenic (Member, Author) commented May 26, 2021

Here is the start of a proposal for a seek-based API:

  • Underlying sources/sinks can supply a promise-returning seek method. If supplied, then the stream's readers/writers support seeking; otherwise they don't.

  • We add seek() methods and canSeek getters to ReadableStreamDefaultReader / ReadableStreamBYOBReader / WritableStreamDefaultWriter. The seek() method forwards to the underlying source/sink after doing some basic argument validation (nonnegative, finite).

  • Seeks are queued up (i.e. not yet forwarded to the underlying source/sink) if there are any outstanding read requests or write requests. So e.g. even without awaits, writer.write(c1); writer.seek(10); writer.write(c2) writes c2 at position 10.

  • If you seek while a readable stream's queue is non-empty, the queue gets emptied; all the buffered-up chunks are lost since they're no longer relevant.

In this model, the underlying source/sink is responsible for knowing what seek means, and how it interacts with reads/writes. The expectation is that they implement things so that reads/writes advance the current position, e.g. writer.seek(10); writer.write(size5Chunk); writer.write(chunk) writes chunk at position 15. But this is not enforced by the streams mechanisms.
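
A minimal sketch of this model, under the proposal's assumptions: the seek() hook on the underlying sink and seek()/canSeek on the writer are hypothetical, and writeToFileAt() is a made-up helper standing in for the actual I/O.

    // Sketch only: the seek() sink hook, writer.seek(), and
    // writer.canSeek are all hypothetical.
    let position = 0;
    const stream = new WritableStream({
      async seek(offset) {
        // The sink decides what seeking means; here it just moves a cursor.
        position = offset;
      },
      async write(chunk) {
        await writeToFileAt(position, chunk); // made-up file helper
        position += chunk.byteLength;         // writes advance the position
      },
    });

    const writer = stream.getWriter();
    const c1 = new Uint8Array(5);
    const c2 = new Uint8Array(5);
    if (writer.canSeek) {
      writer.write(c1); // written at position 0
      writer.seek(10);  // queued behind the pending write, per the rule above
      writer.write(c2); // written at position 10
    }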

Here is the start of a proposal for an offset-based API:

  • Underlying sources/sinks can set supportsRandomAccess: true.

  • For such streams, defaultReader.read({ at }), byobReader.read(view, { at }), and defaultWriter.write(chunk, { at }) work. (For streams without that boolean set, supplying any value for at rejects the promise.) They perform basic validation on at.

  • We add reader.supportsAt and writer.supportsAt booleans.

  • For writable streams, the underlying sink's write() method gets forwarded the at value from the writer.write() call, which it can use as it sees fit. The existing queuing mechanisms for writes ensure that the stream is never asked to write to two different locations concurrently. (A sketch of this shape follows the list.)

    • If no at is supplied, we can either omit it from the call to the underlying sink, or we can auto-compute it based on the size of the chunks. Not sure which is best.
  • For readable streams, the situation is similar, except with the underlying source's pull() instead of the underlying sink's write(). The automatic calls to pull() which occur based on highWaterMark would take place at an auto-computed or omitted at, and would not be able to fulfill read requests with mismatching ats. The simplest thing to do here might be to empty the queue if a read request comes in with an at mismatching what was expected; otherwise the "queue" starts becoming a non-queue.
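
A comparable sketch for the writable half of this shape, again entirely hypothetical: supportsRandomAccess, the forwarded at argument to the sink's write(), and writer.supportsAt are not specified, and writeToFileAt() is a made-up helper.

    // Sketch only: none of supportsRandomAccess, the extra `at`
    // argument, or writer.supportsAt is specified API.
    const stream = new WritableStream({
      supportsRandomAccess: true,
      async write(chunk, controller, at) {
        // `at` is forwarded from writer.write(chunk, { at }); the sink
        // does no position bookkeeping of its own.
        await writeToFileAt(at ?? 0, chunk); // made-up file helper
      },
    });

    const writer = stream.getWriter();
    if (writer.supportsAt) {
      await writer.write(new Uint8Array(5), { at: 10 }); // lands at byte 10
    }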

On balance the offset-based API seems a bit cleaner.

domenic added the addition/proposal (New features or enhancements) label on May 26, 2021
taralx commented May 30, 2021

In my zip file reader, I used the Blob.prototype.slice API to get readAt functionality on files.
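
That approach needs nothing beyond the standard Blob API; a minimal sketch of such a readAt() helper:

    // readAt() built on standard Blob methods: slice() delimits the
    // region and arrayBuffer() resolves with just those bytes.
    async function readAt(blob, offset, length) {
      const region = blob.slice(offset, offset + length);
      return new Uint8Array(await region.arrayBuffer());
    }

    // e.g. probing a zip's end-of-central-directory record:
    // const tail = await readAt(zipFile, zipFile.size - 22, 22);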

jimmywarting commented

What if an HTTP request supports byte range requests? It would be kind of nice to be able to "resume" a broken request:

  1. Make a normal request and see that the server accepts range requests.
  2. The stream can then advertise that it supports ranges, so you can abort the request and make a new one whenever seek() is called.

Or how about HTTP/2? Would you be able to do some seeking with that?
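
That resume flow is already expressible with the standard fetch API; a sketch, where the URL and the seekTo() helper are made up and seek() itself remains hypothetical:

    // Step 1: probe whether the server accepts byte ranges.
    const url = "https://example.com/big-file"; // made-up URL
    const head = await fetch(url, { method: "HEAD" });
    const seekable = head.headers.get("Accept-Ranges") === "bytes";

    // Step 2: on each (hypothetical) seek, abort the in-flight request
    // and start a new one at the requested byte offset.
    let controller = new AbortController();
    async function seekTo(offset) {
      controller.abort();
      controller = new AbortController();
      const res = await fetch(url, {
        headers: { Range: `bytes=${offset}-` },
        signal: controller.signal,
      });
      return res.body; // a fresh ReadableStream starting at `offset`
    }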

taralx commented Mar 18, 2022

FWIW, there is a difference between a streaming read, where you probably want aggressive readahead etc., and a delimited random-access read, where you probably don't and might even want discard-type behavior. So I feel like the solution here might need to be at a higher level, where a File can have a read(offset, length) that returns a ReadableStream.
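
The read(offset, length) method suggested here is hypothetical, but it can already be approximated for File objects with the standard Blob methods:

    // slice() delimits the region and stream() returns a ReadableStream
    // over just those bytes, so nothing is read past offset + length.
    function readRange(file, offset, length) {
      return file.slice(offset, offset + length).stream();
    }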
