First draft at tee algorithms, for critique #302
Conversation
Wait, upon re-reading #271 I realized this is contradictory. What I meant to say is that if both …
Glad to see this part getting spec'd, but I have a few questions:
I guess I'm thinking of how our native C++ stream does tee'ing when I read this. Our nsPipe class internally maintains a single buffer. Every tee'd stream then has its own cursor into that shared buffer. The buffer is split up into segments, and as the slowest reader finishes a segment it is freed back to the allocator. I think this is more efficient than duplicating buffers in two queues, etc. Do you anticipate us being able to implement streams returned from DOM APIs using this kind of native code?
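The segmented-buffer-plus-cursors scheme described above can be sketched in a few lines. This is a hypothetical illustration (the class and method names are made up, not nsPipe's actual API): all branches share one queue of segments, each branch is just a cursor, and a segment is freed once the slowest reader has passed it.

```javascript
// Hypothetical sketch of the nsPipe-style tee: one shared segmented
// buffer, one cursor per branch, segments freed behind the slowest reader.
class SharedBufferTee {
  constructor() {
    this.segments = []; // queue of not-yet-fully-consumed chunks
    this.base = 0;      // absolute index of segments[0] in the stream
    this.cursors = [];  // one absolute read position per branch
  }
  addBranch() {
    const id = this.cursors.length;
    this.cursors.push(this.base);
    return id;
  }
  enqueue(chunk) {
    this.segments.push(chunk);
  }
  read(branchId) {
    const pos = this.cursors[branchId];
    if (pos - this.base >= this.segments.length) return undefined; // nothing buffered
    const chunk = this.segments[pos - this.base];
    this.cursors[branchId] = pos + 1;
    this.release();
    return chunk;
  }
  // free segments that every branch has consumed, as nsPipe does
  release() {
    const slowest = Math.min(...this.cursors);
    while (this.base < slowest) {
      this.segments.shift();
      this.base++;
    }
  }
}
```

Note there is no duplication at all here; the cost is that memory is only reclaimed at the pace of the slowest branch.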
@wanderview thanks for jumping into this!!
```js
const [branch1, branch2] = SpeculativeTeeReadableByteStream(rbs);

const reader1 = branch1.getReader({ feedBuffers: true });
const reader2 = branch2.getReader({ feedBuffers: true });

const view1 = new Uint8Array(512);
const view2 = new Uint8Array(1024);

reader1.read(view1).then(({ value, done }) => {
  // value uses same backing memory as view1, via transference
  // we were able to pass view1 directly to the `reader` for `rbs`
});

reader2.read(view2).then(({ value, done }) => {
  // value uses same backing memory as view2
  // however, how the data gets there was a bit different, and less efficient:
  // once rbs was finished with view1, we cloned it into a new buffer that we enqueued
  // into branch2's internal queue. then, the call to reader2.read(view2) caused us to
  // copy the queued buffer into the backing memory used by view2.
  // hmm, there's a redundant copy here :-/ ... might be able to avoid ...
});
```
Hmm, very interesting. It sounds like the tee'd streams are not really of the same "type" as the original stream? That is, they are not very general purpose, but instead there's a cooperation between the branches and the original, where the branches largely consist of a cursor and not much else? The duplication is not really avoidable when dealing with the no-data-races-in-JS mandate. But I'll try to think on this more to see if there's something we can learn.
Do you mean "ReadableByteStream.prototype.tee() then returns SpeculativeTeeReadableByteStream(this)"? So SpeculativeTeeReadableByteStream() conforms to the same semantics as TeeReadableStream()? I think it's the word "speculative" that's throwing me off here. Defining the contract that all tee() functions must conform to separately might help me. I like to see interface separate from implementation, etc.
Well, if you do … The one case I see where the clone argument makes sense for a single JS context like this is if the ReadableStream chunks are mutable. In that case branch1 could read chunks, modify them, and then branch2 would see the modifications when it later reads. Doing the clone would avoid that mutation from being observed in branch2.
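The mutation hazard described here is easy to demonstrate with plain objects. This is a standalone illustration (the queues are made up for the sketch, not part of any tee implementation): without a clone, both branches' queues hold the same chunk object, so branch1's mutation is visible to branch2.

```javascript
// Illustrating the hazard: a naive tee shares the chunk object itself.
const chunk = { data: [1, 2, 3] };

const queue1 = [chunk];                        // branch1's internal queue
const queue2NoClone = [chunk];                 // branch2, no clone: same object
const queue2Cloned = [structuredClone(chunk)]; // branch2 with the clone option

// branch1's consumer mutates the chunk after reading it
queue1[0].data.push(4);

console.log(queue2NoClone[0].data.length); // 4: mutation observed in branch2
console.log(queue2Cloned[0].data.length);  // 3: the clone shielded branch2
```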
Well, no. From the consumer's point of view they are exactly the same as the original. The consumer just sees a stream interface with a read() method. For this nsPipe case the original stream was originally just a cursor with an underlying buffer; the buffer just wasn't shared yet. Adding another branch to the nsPipe doesn't affect the original stream at all. Of course, this is a purely byte stream, so maybe it only applies to ReadableByteStream.
Yes, sorry. My "speculative" adjective is indeed confusing things; it's only meant there as "if we actually had ReadableByteStream in the spec/reference implementation, I think this is what it would look like." It's not about the semantics of the tee. So, implicitly, if we actually had a ReadableByteStream to add a ReadableByteStream.prototype.tee to, I would probably have removed the "Speculative" prefix at that point. And yes, the semantics should be the same.
Definitely. Roughly, I think it would be: …
I think I see. So, I think postMessaging a stream actually doesn't use the tee functionality. Instead it just grabs a reader and sends the bytes over the wire to the counterpart stream. So similar to how when you … So in particular, …
If you do this, …
Right, that's pretty much all objects in JS, including …
Yeah, I meant, in terms of them being different implementations of the same interface (and thus in JS, different classes). This might manifest as creating a new stream class of some sort, TeeBranchReadableByteStream or something, which reaches into the innards of its parent as arranged by the tee algorithm. I am unsure this makes that much sense given that we need to clone anyway for pretty much all cases, so it's not like we gain efficiency by having multiple pointers into the same buffer. But, maybe it can help me avoid that extra allocation I noted in the example...
Ok, this is where I had different expectations. I was thinking … Is it consumed because of the way this is written as an external function right now? It seems like particular stream implementations could be smart enough to copy their internal details to produce a new branch. In Gecko we actually do something similar:
The constraints that force us to use the external function in native code don't seem to apply to streams exposed in JS, though. So I'm trying to understand why we can't require stream implementations to produce a tee without any observable state change in the original stream handle.
In my opinion it is more natural from the Streams POV, though it looks a bit strange from the Fetch API POV (…).
```js
function maybeCancelSource() {
  if (canceled1 && canceled2) {
    reader.cancel([cancelReason1, cancelReason2]);
```
Should we return the promise obtained from reader.cancel(...) here?
Fixed. It turns out to be a bit more complicated since we need to return a promise even to the first person to cancel, not just the second.
Unless we think the first should fulfill ASAP and the second should signal? I will mention below so it doesn't get lost in this old diff.
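The fix being described (returning a promise even to the first canceller) can be sketched roughly as follows. This is a hedged sketch, not the actual reference-implementation code: `makeTeeCancel` and the shape of `reader` are assumptions; only the `canceled1`/`canceled2`/`cancelReason1`/`cancelReason2` names come from the diff excerpt above.

```javascript
// Sketch: both branches' cancel() calls return one shared promise that
// settles only when (and how) the underlying source's cancel settles.
function makeTeeCancel(reader) {
  let canceled1 = false, canceled2 = false;
  let cancelReason1, cancelReason2;
  let resolvePromise, rejectPromise;
  const cancelPromise = new Promise((resolve, reject) => {
    resolvePromise = resolve;
    rejectPromise = reject;
  });

  function maybeCancelSource() {
    // only once BOTH branches cancel do we cancel the source,
    // and we forward the result to everyone who is waiting
    if (canceled1 && canceled2) {
      reader.cancel([cancelReason1, cancelReason2])
            .then(resolvePromise, rejectPromise);
    }
  }

  return [
    reason => { canceled1 = true; cancelReason1 = reason; maybeCancelSource(); return cancelPromise; },
    reason => { canceled2 = true; cancelReason2 = reason; maybeCancelSource(); return cancelPromise; },
  ];
}
```

The key point is that the first canceller also gets `cancelPromise`, which stays pending until the second cancel arrives.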
Sort of. It's consumed because it's meant to be general functionality that can work with any stream. It's true that if you created a stream implementation with a focus on being clone-able, you could make it work without consumption. For example, you could create a stream implementation that keeps track of all clones that have been produced, and every time a chunk is enqueued into the stream by the underlying source, it also enqueues it into the clones. (Or, for byte streams, each read---from either the original stream or the clone---also copies the data into an internal buffer. You then ref-count which memory regions have been consumed by which subset of [original stream, ...clones], and only release a region when all clones have consumed it.)

This seems a lot more complex, though. In particular, we either have certain streams being complex and cloneable and others not, or we add complexity to all streams to track their clonability. Compared to a simple operation that can work with any stream through its preexisting interface, I'm not sure it gives much. What is the advantage?
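The "stream that tracks its clones" idea can be made concrete with a minimal sketch. This is a hypothetical class (not part of the Streams spec or reference implementation): every enqueue is mirrored into each clone's queue, with a structured clone to avoid shared mutable state.

```javascript
// Minimal sketch of a clone-tracking stream: clones are first-class
// streams, and enqueues fan out to all of them.
class CloneTrackingStream {
  constructor() {
    this.queue = [];
    this.clones = [];
  }
  clone() {
    const c = new CloneTrackingStream();
    // a new clone starts with a copy of whatever is still unread here
    c.queue = this.queue.map(chunk => structuredClone(chunk));
    this.clones.push(c);
    return c;
  }
  enqueue(chunk) {
    this.queue.push(chunk);
    // mirror into every clone (recursively covers clones of clones)
    for (const c of this.clones) c.enqueue(structuredClone(chunk));
  }
  read() {
    return this.queue.shift();
  }
}
```

This illustrates the complexity trade-off in the comment above: either only streams like this one are cloneable, or every stream carries this extra machinery.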
To keep the original stream available and its state unchanged even after teeing, we need to add one more layer in addition to the stream and the reader. Introduction of the new layer doesn't necessarily require definition of a new class, but we need to at least clearly define a concept to explain the new layer. Like … For … But once there's a stream on it, the stream doesn't fit this approach well. It's unfortunate, but we shouldn't let that fact drag us into introducing unnecessary complexity.
Right. To be clear, the impact on … is:

```js
var s = res.body;
var res2 = res.clone();
var s1 = res.body;
var s2 = res2.body;

assert(s !== s1);
assert(s !== s2);
assert(s1 !== s2);

// s is locked (cannot be really used)
// s1, s2 are unlocked and usable
```

I think it is totally OK for …
An even more aggressive approach is making …
By "leave it unchanged" in the last post, I meant keep …
I basically agree with the idea in Domenic's #302 (comment) so far.
@annevk agreed. One thing I forgot to mention: I think people who are "response focused" will likely not touch body, and will do …
Anne: it's just a thought experiment. It's not realistic, right. Sorry for the confusion; I wanted to clarify where this mismatch came from. Domenic: nice justification. It's not hard to expect that users of …
New revision up that is starting to get to be what we might put in the real spec. Including tests, taking care of many edge cases, and using internal methods to avoid accessing public APIs. (I was tempted to try to use only public APIs and make this generically applicable to any ReadableStream. But I think we can save that for later if there is a good use case. I have proven that you can write it using only public APIs, and that is enough to satisfy my no-magic impulse for now.)
One question @yutakahirano brought up in #302 (comment) is a good one. Consider the following situation:

```js
const rs = new ReadableStream({
  cancel() {
    throw new Error('wheee');
  }
});

const [branch1, branch2] = rs.tee();

branch1.cancel().then(/* what should happen here? (1) */);
branch2.cancel().catch(/* what should happen here? (2) */);
```

The semantics with regard to cancel are that only if you cancel both branches will we communicate back to the original stream that it should be cancelled, and perform an action. That action might fail, as shown here.

**First cancel fulfills, second rejects**

Maybe …

**Both cancels reject**

Maybe they should both reject: that is, maybe we should not resolve or reject either promise until the ref count has decreased to zero, and we can tell whether the original underlying source cancel succeeded or failed. One possibly-unintuitive consequence of this is that if you only ever cancel branch1, then …

**Both cancels fulfill**

Maybe they should both fulfill: maybe we treat the success or failure of the actual underlying source cancellation as irrelevant, and say that the promise returned by … The downside of this is that you lose any information about errors canceling the original underlying source.

It's important to keep in mind that this is a pretty small point. Many consumers will not care if canceling fails (or succeeds). The whole point of cancel() is to communicate "I don't care about this stream anymore," so it's kind of rare that you care about how well your not-caring went.

But, we do need to pick one. I think I have a slight preference for both reject, but could also go with one fulfills, second one rejects. Both fulfills seems bad.
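As a data point from today's perspective: the both-cancels-reject option is what can be observed from the built-in `tee()` in modern runtimes (ReadableStream is a global in Node 18+ and in browsers). This is a check of current behavior, not part of the draft under discussion:

```javascript
// Running the same scenario against the built-in tee(): neither promise
// settles until BOTH branches have canceled, and the failing underlying
// cancel() is then reported through both returned promises.
const rs = new ReadableStream({
  cancel() {
    throw new Error('wheee');
  }
});
const [branch1, branch2] = rs.tee();

const p1 = branch1.cancel();
const p2 = branch2.cancel();
p1.catch(err => console.log('branch1 cancel rejected:', err.message));
p2.catch(err => console.log('branch2 cancel rejected:', err.message));
```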
I think this is really bad and unexpected. Why can't body be a wrapper around a concrete stream? Then clone() would swap out the underlying stream, but the external users of .body would not be able to observe it. |
To clarify, I don't think anyone expects …
It seems to me we should be able to wrap the underlying source in a TeeSource that does this work. The original stream just replaces its source with the new TeeSource and hands the TeeSource to the new stream. The new stream then attaches itself to the TeeSource as well. Any new tee() requests do the same thing (without creating a new TeeSource wrapper again, ideally). But I haven't read the spec recently enough to write this code, so maybe I am missing something. This would keep the mutation non-observable to the immediate user of the original stream.
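The TeeSource idea can be sketched very roughly as follows. This is a hypothetical shape (the class, `attach`, and the synchronous `pull` are simplifications for illustration, not the spec's underlying source interface): the real source is wrapped once, and each pull fans the chunk out to every attached stream's queue.

```javascript
// Rough sketch of a TeeSource wrapper: one pull from the inner source
// feeds the queues of all streams attached to the wrapper.
class TeeSource {
  constructor(innerSource) {
    this.innerSource = innerSource;
    this.sinks = []; // one queue per attached stream
  }
  attach() {
    const queue = [];
    this.sinks.push(queue);
    return queue;
  }
  pull() {
    const chunk = this.innerSource.pull();
    for (const queue of this.sinks) queue.push(chunk);
  }
}
```

In the proposal above, the original stream would swap its source for the `TeeSource` and keep reading through its own attached queue, so the swap is not observable to its consumer.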
It's an interesting idea. The wrapper has … methods but doesn't have …
I don't want to create a whole new type of stream just to satisfy an esoteric equality test that nobody should be doing in practice. What is the real hazard here?
Here is a broader take on the issue, which maybe will be helpful. I think we have simply run into an ordering issue with our design process. While designing … Now that we have streams, it becomes clear that clone-while-leaving-unconsumed is not a very natural thing to do with streams as specified. You can probably make it work, but doing so would be invasive to the stream implementation. Whereas teeing is perfectly natural, and falls out of basic usage patterns of the public stream APIs.

The question at hand, I think, is whether we believe clone-while-leaving-unconsumed is a core use case for streams as a primitive. If it is, we should do appropriate re-design of stream internals and algorithms to support it. There are several possibilities here mentioned already: @tyoshino's three-tiered approach; the stream-wrapper idea; the hooking-and-unhooking underlying source path; or the all-streams-keep-track-of-clones-internally path.

But my perspective is that, given how natural teeing is (and not just here, but in other streaming or reactive or iterator APIs), and how unnatural clone-without-consuming is, it really isn't worth the added complexity. Basically, we goofed a bit by choosing …
Ok. I think you convinced me. The zero-copy clone of the stream is an optimization that can be added later if there is a perf need for it.
In regards to Response.body changing in Response.clone(), I have a question: does reading .body set the bodyUsed flag to prevent Response.clone()? If "yes", then I'm ok with .body being swapped to a different stream. If "no", then how do we prevent something like this? …
@wanderview great question. Fortunately I think we took care of it with the reader design :). To read from a stream, you need to acquire a reader. If anyone else wants to read from the stream---say, the clone procedure---they will need to also get a reader. But, they cannot get a reader at the same time as you. So, you need to release your reader first. AND! You're not allowed to release your reader until all of the read-promises you've asked the reader to create have settled. So, the example is more like:

```js
var reader = resp.body.getReader();
var readPromise = reader.read();

// this will throw, since it needs to get a reader, but the stream is already locked
try {
  var resp2 = resp.clone();
} catch (e) { }

// this will also throw, since we haven't waited for readPromise to settle
try {
  reader.releaseLock();
} catch (e) { }

// this will work:
readPromise.then(_ => {
  reader.releaseLock();
  var resp2 = resp.clone();
});
```

Make sense?
And if they try to use …?

Sounds good. Thanks!
Once you release the reader it acts like a closed stream, is the idea. (We could make it act like an errored stream, but in the rare case that you care, closed seems more likely to go down the right path.) |
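The locking half of this exchange can be demonstrated with a bare ReadableStream in today's runtimes (no Response needed); note that the "can't release while reads are pending" rule discussed above was later relaxed in the shipped spec, so only the lock-exclusivity part is shown here.

```javascript
// Only one reader may hold the lock at a time; releasing it lets a
// new reader be acquired (ReadableStream is global in Node 18+).
const rs = new ReadableStream({
  start(c) { c.enqueue('hi'); c.close(); }
});

const reader = rs.getReader();
console.log(rs.locked); // true: no one else can get a reader now

let secondReaderThrew = false;
try {
  rs.getReader(); // throws TypeError: the stream is already locked
} catch (e) {
  secondReaderThrew = true;
}

reader.releaseLock();
console.log(rs.locked); // false: the lock is available again
const reader2 = rs.getReader();
```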
In regard to this: …

I think it's important to make it work such that a library consuming a stream: …

So, from an interface point of view, I think this should work exactly how normal stream cancellation works. If cancel() normally rejects, then it should reject. If cancel() normally fulfills, then it should fulfill.
Hmm, yeah. I think those are good principles. They point me to both-cancels-fulfill or both-cancels-reject. Since both-cancels-fulfill loses information, I think that means we want both-cancels-reject. Although, even one-fulfills-one-rejects wouldn't necessarily violate any of those constraints. It would tell you whether you were the first to cancel, but not whether you were the "left" side of the tee or the right side. But, even that is a kind of information leakage between the two branches, so yeah, let's not do that. (BTW my mismatched .then + .catch was not intentional, both should be .then) |
If the consumer of either branch really wants to interact with the original stream via …, I basically think such needs should be taken care of by more complex custom tee-ing code. But if we're to build some helper for convenience (so that no change on the consumer code is necessary), I'd propose something like the following: …
I think that is the right approach, especially for now. I do want to decide on a default, though---both for other specs to use (like …). Reading your …
For the tee algorithm being drafted in #302, it's important to have an abstract operation CloseReadableStream that does all the things that ReadableStreamController.prototype.close does. So, we factor that out. The old CloseReadableStream is renamed to "FinishClosingReadableStream".
Again, needed for formalizing the tee algorithm from #302.
Superseded by #311!
First shot at addressing #271. Want to get these up early for review before formalizing them.

For `TeeReadableStream`: as chunks are pulled from `branch1` or `branch2` (triggering their underlying source's `pull` method), they in turn "pull" from `stream` (using `reader.read()`). Once a chunk has been pulled from `stream`, it gets enqueued in both branches---no matter which branch initiated the pull. This should preserve backpressure, and also has the "OOM" property discussed in #271 ("Define 'tee'ing a stream"), such that if the consumer for `branch1` is slow and the consumer for `branch2` is fast, then `branch1` ends up with a lot of unconsumed chunks in its queue.

For `SpeculativeTeeReadableByteStream`: … `read()` will happen. The underlying sources for `branch1` and `branch2` both have `pull` methods like in `TeeReadableStream`, which might be used if someone goes for an auto-reader on either of them. But they also have `read` methods, such that if someone does e.g. `reader1.read(view1)`, it will call through to `us1.read(view1)`, which does two things: (1) `reader.read(view1)`, pulling a chunk from `stream` and using it to fulfill the direct read call; (2) enqueue the view into the queue for `branch2`. Thus, later, if someone does `reader2.read(view2)`, there will be a copy/transfer from `branch2`'s queue into `view2`.

All of these include a `clone` boolean argument which, if present, will result in structured clones happening for anything enqueued. This is important if we envision the two branches being consumed on different threads, e.g. as in #244 or #276.

Would love a review to validate!
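The `TeeReadableStream` shape sketched above can be condensed into a few lines against today's ReadableStream API. This is a hedged sketch, not the PR's actual algorithm: `teeReadableStream` and its `shouldClone` option are names invented here, and error/cancel handling is omitted. It shows the core property that a pull from either branch reads once from the source and enqueues into both.

```javascript
// Condensed sketch of tee via the public reader API (Node 18+ globals):
// whichever branch pulls, the chunk lands in both branches' queues.
function teeReadableStream(stream, { shouldClone = false } = {}) {
  const reader = stream.getReader();
  let pulling = null; // dedupes concurrent pulls from the two branches

  function pullFromSource(c1, c2) {
    if (pulling) return pulling;
    pulling = reader.read().then(({ value, done }) => {
      pulling = null;
      if (done) {
        c1.close();
        c2.close();
        return;
      }
      // enqueue into BOTH branches, no matter who initiated the pull
      c1.enqueue(value);
      c2.enqueue(shouldClone ? structuredClone(value) : value);
    });
    return pulling;
  }

  let ctrl1, ctrl2;
  const branch1 = new ReadableStream({
    start(c) { ctrl1 = c; },
    pull() { return pullFromSource(ctrl1, ctrl2); }
  });
  const branch2 = new ReadableStream({
    start(c) { ctrl2 = c; },
    pull() { return pullFromSource(ctrl1, ctrl2); }
  });
  return [branch1, branch2];
}
```

If only `branch2`'s consumer reads, chunks accumulate in `branch1`'s queue, which is exactly the "OOM" property from #271.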