New proposal for byte stream uses cases #289

domenic · 2015-02-26T19:58:35Z

As recently discussed in #253 and #288 there are several open issues with our current design for readable byte streams. The biggest ones are:

async ready + sync readInto model does not work very well for read(2)-in-a-threadpool scenarios, like file handles, which require providing the buffer at the async time instead of at the sync read (Support reading bytes into buffers allocated by user code on platforms where only async read is available #253)
async readInto models have the potential to cause observable data races, as another thread fills up the buffer (Support reading bytes into buffers allocated by user code on platforms where only async read is available #253 (comment) and elsewhere).

At the same time I think it is important that, as a guiding principle, we try to make readable byte streams deviate from other readable streams as little as possible, in author-facing interface. The entire point of streams as a primitive is to have a shared abstraction useful across lots of code, that many people build libraries on top of. Those libraries should ideally work just as well with readable byte streams as they do readable streams, without any extra work needed. That is why I have continually insisted that ReadableByteStream support the ReadableStream interface. This also is what motivated #288---at the time I was convinced that async readInto was a good solution to our problems for ReadableByteStream, which made me want to align ReadableStream with its own async read.

I think I have a new idea that satisfies our constraints. I go into more detail in https://gist.github.com/domenic/e251e37a300e51c5321f where you can see some evolution going on. The idea is that we revert from .ready to .wait(), and add an optional buffer parameter to .wait(). Here is some sample code from the gist:

async function chunkwise(rbs, processChunk) {
  let current = new ArrayBuffer(ONE_MIB);

  for (const i = 0; i < 10; ++i) {
    // Detaches current, transfering it to newAB
    // Begins the fread(newAB, 0, newAB.byteLength /* === ONE_MIB */) call
    await rbs.wait(current); // fulfills when the first fread(newAB, 0, ONE_MIB) call finishes
    current = rbs.read();    // return value is newAB, i.e. not === ab

    // Right now nothing is happening since even though the stream is empty, there is no buffer it can use to read.

    await processChunk(current);
  }
}

I think this idea is pretty nice, for a few reasons:

If I omit the optional parameter to wait(), then wait() can auto-allocate a buffer for me. (The underlying source passed to the ReadableByteStream constructor could determine how, perhaps with an allocator function. Or we could let authors do it directly?) Thus, generic stream consumers that don't know they are dealing with a byte stream are in good shape.
Given that we have to have transfer semantics of some sort to avoid observable data races, I think this is the most intuitive possibility. Unlike alternatives such as read(sourceAB, offset, bytesDesired) -> Promise<{ result, bytesRead }>, this clearly separates supplying a buffer to the stream from getting a chunk from the stream. This kind of cognitive distance between the two operations helps make the transfer process feel less unnatural.
In general, wait() being a function is a bit more future-proof, so we can add semantics like this to other streams or stream-alike APIs. For example, I wonder what will happen when we start designing WritableByteStream.

It has one downside I want to highlight:

Unlike readInto designs, this does not let you do multiple reads into the same buffer. Until/unless we get segment-wise detachment for array buffers, I don't think this is possible without observable data races. However there is an API that would allow multiple reads into the same pre-allocated backing memory: the read(sourceAB, offset, bytesDesired) -> Promise<{ result, bytesRead }> API. This API is very awkward though and I can't see a clear way to redeem it. So I think we'd want a very compelling use case before doing so.

What do you guys think? I'd like to get this decided soon since I know there are implementations of ReadableStream under way and, per my long paragraph, our decisions on ReadableByteStream will impact ReadableStream.

@tyoshino @yutakahirano @wanderview @calvaris

The text was updated successfully, but these errors were encountered:

tyoshino · 2015-02-27T03:50:38Z

I have been investigating an idea similar to yours. That's exposing WritableStream (stream for feeding) and ReadableStream (output emitter) on the ReadableByteStream. Your wait(current) resembles write(value) of the WritableStream. You've hidden weirdness of calling write() for non-byte-stream by naming it wait() and made its argument optional. This looks good point but I'm still not sure. There could be some pitfall.

read() has the same signature as ReadableStream (object stream), but how piping between object stream and byte stream would be like? I think we should list what we mean by "compatibility with object stream" concretely and prototype how it'll be like. Does wait() call required for each read() even for object stream or it'll be just like the old ReadableStream where it's just a kicking switch of pulling.

Just a naming preference, but I prefer pull() to wait().

domenic · 2015-02-27T04:45:19Z

read() has the same signature as ReadableStream (object stream), but how piping between object stream and byte stream would be like? I think we should list what we mean by "compatibility with object stream" concretely and prototype how it'll be like.

Definitely agree. We don't even have a design for WritableByteStream yet :-/. We need to get on this. I will start trying to draw up the landscape tomorrow.

Does wait() call required for each read() even for object stream or it'll be just like the old ReadableStream where it's just a kicking switch of pulling.

I think more like the latter. In this way a ReadableByteStream is like a ReadableStream with HWM = 0 that always respects backpressure and doesn't proactively fill up the queue.

Just a naming preference, but I prefer pull() to wait().

Yes, I think that is better. wait() was originally there for symmetry with writable stream but I think it was largely a false symmetry from when we were more naive.

OK. I have a lot of work to do tomorrow. I will try to write up examples of the entire ecosystem: readable streams, writable streams, exclusive readers, exclusive writers, byte versions of each of these four, and with both pull and push underlying sources (i.e. fread-style and epoll-style). If we don't discover a fatal flaw based on the examples we can start prototyping the readable side to make sure that exclusive readers are not too complex. If they are too complex, we can consider making pipeTo magical, as per original #241.

Actually, maybe we should consider just making pipeTo magical to start with. Then we could introduce exclusive readers later to explain pipeTo. It might be tricky to do backward-compatibly (e.g. introducing a new state value seems possibly hostile). But it's worth considering.

yutakahirano · 2015-02-27T07:18:53Z

Does wait() call required for each read() even for object stream or it'll be just like the old ReadableStream where it's just a kicking switch of pulling.

I think more like the latter. In this way a ReadableByteStream is like a ReadableStream with HWM = 0 that always respects backpressure and doesn't proactively fill up the queue.

Calling wait() many times has virtually no performance penalty on ReadableStream.
Calling wait() allocates a buffer on ReadableByteStream.

Are these correct? If so, I'm not sure if a user code expecting ReadableStream can work with ReadableByteStream - It may consume unnecessarily large memory.

domenic · 2015-02-27T09:35:42Z

@yutakahirano that's a good point :(. Maybe it is OK, since it is not "normal" to call wait() many times. And we would probably store any extra buffers so that if you do rbs.wait(); rbs.wait(); await rbs.wait(); then you could do rbs.read() three times before state goes from "readable" to "waiting". But it is indeed a big semantic difference.

In that case there's another possibility, which is the feed design. We could even make the feed design a bit more ergonomic by, instead of returning undefined from the feed method, returning this.ready. But then it's still impossible to use a ReadableByteStream with generic readable stream code. I would rather not give up on that.

domenic · 2015-02-27T23:51:52Z

I wrote out a pretty long exploration of readable byte streams today. I didn't get to the other parts I planned quite yet. But it does cover, in a lot more detail than done previously, how both file- and socket-backed readable byte streams would work, and how generic readable stream processors can transparently use readable byte streams. It is based on the proposal in the OP, although I adopted @tyoshino's suggestion of naming the method pull.

https://gist.github.com/domenic/65921459ef7a31ec2839

I'd really appreciate a read-through. The issues noted in "Reading a byte stream into ArrayBuffers" are interesting, and I am curious how bad they are in practice.

tyoshino · 2015-03-02T10:13:25Z

Domenic, what's the practical reason to shrink the given ArrayBuffer to correctlySizedAB? This means that at least when reading the last bytes of the resource, we'll often have to return a shrunk buffer to the pool. How about transferring the given ArrayBuffer for detaching but keep the size as-is and create a view to represent the region?

domenic · 2015-03-02T20:35:52Z

@tyoshino hmm that is an interesting idea. I forgot that we had that trick up our sleeve, of packing up a byte length + a buffer into a single typed array. So people would access the underlying buffer with result.buffer? I guess that is OK.

domenic · 2015-03-02T21:10:06Z

@tyoshino I am now convinced your idea is 100% necessary since the example socket_chunkwise is very buggy as-is. The buffer that we mean to be re-using will be shrinking all the time, which sucks.

I will update the gist with your idea instead.

Also I had another idea which is to combine wait() and read() into a single async read with an optional argument. It seems OK but not great.

tyoshino · 2015-03-03T05:55:08Z

So people would access the underlying buffer with result.buffer? I guess that is OK.

Right. ArrayBufferView can be just used as a tuple of offset, size and buffer. Simple and powerful.

I summarized scattered discussion into some comments at #253. Feel free to fix/add.

domenic · 2015-03-11T16:14:07Z

Rolling this into the larger meta-issue at #253.

domenic added the byte streams label Feb 26, 2015

domenic mentioned this issue Feb 27, 2015

Change the model for ReadableStream to have async read() #288

Closed

domenic mentioned this issue Feb 28, 2015

Maybe giving async readInto (and read?) another shot #290

Closed

tyoshino mentioned this issue Mar 3, 2015

Support reading bytes into buffers allocated by user code on platforms where only async read is available #253

Closed

domenic closed this as completed Mar 11, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New proposal for byte stream uses cases #289

New proposal for byte stream uses cases #289

domenic commented Feb 26, 2015

tyoshino commented Feb 27, 2015

domenic commented Feb 27, 2015

yutakahirano commented Feb 27, 2015

domenic commented Feb 27, 2015

domenic commented Feb 27, 2015

tyoshino commented Mar 2, 2015

domenic commented Mar 2, 2015

domenic commented Mar 2, 2015

tyoshino commented Mar 3, 2015

domenic commented Mar 11, 2015

New proposal for byte stream uses cases #289

New proposal for byte stream uses cases #289

Comments

domenic commented Feb 26, 2015

tyoshino commented Feb 27, 2015

domenic commented Feb 27, 2015

yutakahirano commented Feb 27, 2015

domenic commented Feb 27, 2015

domenic commented Feb 27, 2015

tyoshino commented Mar 2, 2015

domenic commented Mar 2, 2015

domenic commented Mar 2, 2015

tyoshino commented Mar 3, 2015

domenic commented Mar 11, 2015