
Specify ReadableStream.[[Transfer]] #623

Closed · isonmad wants to merge 6 commits

Conversation

@isonmad (Contributor) commented Dec 4, 2016

See also #276.

As @domenic mentioned in #244 (comment), this effort seems more or less blocked on whatwg/html#935, but this at least gives a general idea of the parts it would need to touch.

In particular, there's an apparent problem with transferring errored streams (namely [[storedError]]), as well as the open question of what even ends up in [[storedError]] if the underlyingSource throws, or calls controller.error() from the original realm with something uncloneable.

StructuredClone doesn't seem to be possible to truly polyfill, as it iterates over objects in a different order than for-in does, and there doesn't seem to be an obvious way to replicate that. Is there any reason StructuredClone isn't/shouldn't be exposed on its own by the JS engine, for admittedly really niche use cases like this?



As [[storedError]] is observable, can be an arbitrary object,
and is very likely an uncloneable Error, it can't be sent to
a new realm reliably. So just forbid errored streams.

Still needs clearer semantics of when structured cloning occurs
and how DataCloneErrors are reported.

Cloning needs polyfilling somehow too.

Related to: whatwg#244, whatwg#276
Lacking an obvious way to actually postMessage a stream
in this polyfill, this adds the 'cloning' stream type
for testing purposes.

Byte streams to come.
@ricea (Collaborator) commented Dec 6, 2016

I want to apologise for not reviewing this yet. I'm very interested in it, but I've just been busy finishing up stuff for the end of the year. Hopefully someone else will get to it soon.

@domenic (Member) left a comment


Overall I'm surprised how reasonable this all is even without whatwg/html#935. If you ignore the question of "so how does this work for Indexed DB's use of structured clone", it looks pretty great. I have some questions of strategy, which I've included inline.

In particular, there's an apparent problem with transferring errored streams (namely [[storedError]]), as well as the open question of what even ends up in [[storedError]] if the underlyingSource throws, or calls controller.error() from the original realm with something uncloneable.

This seems similar to the question of what happens if the underlyingSource calls controller.enqueue(somethingUncloneable). The answer is to error the destination stream, right? With a TypeError of some kind I guess.

StructuredClone doesn't seem to be actually possible to truly polyfill as it iterates over objects in a different order than for-in does

Doesn't it iterate over objects in the same order that Reflect.ownKeys does?
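
As an illustration (modern JavaScript, post-dating this thread): Reflect.ownKeys follows the spec's [[OwnPropertyKeys]] order, which is the order the HTML structured-serialization steps walk, while for-in visits only enumerable string keys:

```javascript
// Illustrative object: integer-like, string, and symbol keys mixed together.
const sym = Symbol('s');
const obj = { b: 1, 2: 'two', a: 3, 0: 'zero', [sym]: 4 };

// [[OwnPropertyKeys]] order: integer-like keys ascending, then string keys in
// insertion order, then symbol keys last.
const ownKeys = Reflect.ownKeys(obj);
console.log(ownKeys); // [ '0', '2', 'b', 'a', Symbol(s) ]

// for-in yields only enumerable string-keyed properties (own and inherited)
// and skips symbols entirely.
const forInKeys = [];
for (const key in obj) forInKeys.push(key);
console.log(forInKeys); // [ '0', '2', 'b', 'a' ]
```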

is there any reason StructuredClone isn't/ shouldn't be exposed on its own by the JS engine, for admittedly really niche use cases like this?

That's whatwg/html#793. Basically lack of implementer interest, presumably driven by lack of use cases.

1. Set _that_.[[readableStreamController]] to _controller_.
1. Let _queue_ be _controller_.[[queue]].
1. Repeat for each Record {[[value]], [[size]]} _pair_ that is an element of _queue_,
1. Set _pair_.[[value]] to ! <a abstract-op>StructuredClone</a>(_pair_.[[value]], _targetRealm_).

@domenic (Member):
This should use ?, not !, as StructuredClone could throw

@isonmad (Contributor, Author):

It actually can't at this point, because every [[value]] here is already the result of a previous StructuredClone call. The only call to StructuredClone that could throw is at enqueue time.
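
To illustrate that invariant (with the modern structuredClone global, which didn't exist in 2016): the output of a structured clone is itself always cloneable, so cloning it a second time cannot throw.

```javascript
// Cloning uncloneable input throws a "DataCloneError" DOMException...
let threw = false;
try {
  structuredClone({ f: () => {} }); // functions are not serializable
} catch (e) {
  threw = e.name === 'DataCloneError';
}

// ...but a value that already survived one clone is clone-safe by
// construction, so a second clone can never throw; hence `!` rather than `?`.
const once = structuredClone({ n: 1, when: new Date(0) });
const twice = structuredClone(once);
console.log(threw, twice.n, twice.when instanceof Date); // true 1 true
```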

@@ -3589,11 +3634,13 @@ throughout the rest of this standard.
</emu-alg>

<h4 id="enqueue-value-with-size" aoid="EnqueueValueWithSize" throws>EnqueueValueWithSize ( <var>queue</var>,
<var>value</var>, <var>size</var> )</h4>
<var>value</var>, <var>size</var>, <var>targetRealm</var> )</h4>
@domenic (Member):
It seems nicer to me if EnqueueValueWithSize stays as a naive implementation of the queue-with-sizes data structure, and the structured cloning happens elsewhere. Is doing that much uglier?

@isonmad (Contributor, Author), Dec 9, 2016:

Well for one, it would mean reimplementing the same code for WritableStream eventually, so I guess there could be a wrapper both RS and WS use to enqueue? But then there would be...a single user of EnqueueValueWithSize that didn't use the wrapper, WritableStreamDefaultControllerClose.

@isonmad (Contributor, Author):

Oh, right, I also had a preference for throwing the RangeError for invalid strategy return values before potentially throwing the DataCloneError.

@@ -12,6 +14,7 @@ const { AcquireWritableStreamDefaultWriter, IsWritableStream, IsWritableStreamLo

const InternalCancel = Symbol('[[Cancel]]');
const InternalPull = Symbol('[[Pull]]');
const InternalTranfer = Symbol('[[Transfer]]');
@domenic (Member):
typo Tran[s]fer

@@ -251,6 +257,33 @@ class ReadableStream {
const branches = ReadableStreamTee(this, false);
return createArrayFromList(branches);
}

[InternalTranfer](/* targetRealm */) {
@domenic (Member):

I think in code I would reify the realms as their globals.

@domenic (Member):
This would allow you to write web platform tests using iframes (although they might not do so great in our test runner... but maybe they would!)

throw new TypeError('Only cloning streams are transferable');
}
/* can't exactly polyfill realm-transfer */
const that = new ReadableStream();
@domenic (Member):

So here this would be new targetRealm.ReadableStream()

@@ -1447,6 +1478,12 @@ Instances of {{ReadableStreamDefaultController}} are created with the internal s
</tr>
</thead>
<tr>
<td>\[[targetRealm]]
@domenic (Member):

This approach, of using the ReadableStreamDefaultController in one realm instead of just creating a new stream + controller pair, is very interesting, but not what I expected. Why did you choose that?

To be specific, I would have expected the algorithm to be something like

const reader = this.getReader();

const that = new ReadableStream({
  pull(c) {
    return reader.read().then(
      ({ value, done }) => {
        if (done) {
          c.close();
          return;
        }

        c.enqueue(StructuredClone(value));
      },
      e => c.error(e)
    );
  },
  // probably more
});

@isonmad (Contributor, Author), Dec 9, 2016:

This approach just seemed intuitive to me.

  • For one, I wanted to make throw-on-enqueue possible, so the clone call has to occur when a chunk first enters the controller's queue, not when it leaves it, as would happen in your algorithm.
  • The underlyingSource still lives in the original realm, and has a reference to the controller object. (Conceptually, 'controller' is sort of a term for two different things that are conflated, a public API object provided to the underlyingSource and the hidden implementation details of the stream. The API object is stuck in the original realm of the underlyingSource, but the 'implementation details' of the stream are in charge of moving things between realms, and I was conceiving that as synonymous with a single controller, I guess, not a controller+a second, entirely invisible controller with no corresponding public API object.)
  • I conceived of a transferred stream as a communications channel between two realms, but the destination realm could change when it's transferred repeatedly. You could also represent that as an ever-increasing chain of readablestreams that move messages between realms, and the implementations just do invisible optimizations as if those in between steps never happen, but this seemed a more accurate model of reality. Plus, what if those realms go away, or should go away, except for this one readablestream ferrying messages between two other realms keeping its event loop alive?
  • I thought it would also be nice to have a stream that clones its chunks without having to transfer it to a worker and back again, and there's no need for a second readablestream at all in that case.
  • It just seems like less code.
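
The throw-on-enqueue behaviour from the first bullet can be sketched roughly like this (a hypothetical helper, using the modern structuredClone global rather than the spec's StructuredClone abstract operation):

```javascript
// Hypothetical sketch: clone at enqueue time, so an uncloneable chunk throws
// a DataCloneError synchronously into the underlyingSource instead of
// surfacing only when the chunk later leaves the queue.
function makeCloningStream(underlyingSource) {
  return new ReadableStream({
    start(controller) {
      const cloningController = {
        // The clone happens as the chunk enters the queue; this can throw.
        enqueue(chunk) { controller.enqueue(structuredClone(chunk)); },
        close() { controller.close(); },
        error(e) { controller.error(e); },
        get desiredSize() { return controller.desiredSize; },
      };
      return underlyingSource.start?.(cloningController);
    },
    // pull()/cancel() forwarding omitted from this sketch.
  });
}
```

If the underlyingSource doesn't catch the DataCloneError, the throw propagates out of start() or pull() and errors the stream anyway.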

@isonmad (Contributor, Author) commented Dec 9, 2016

If you ignore the question of "so how does this work for Indexed DB's use of structured clone"

Can you elaborate? I'm not familiar with IndexedDB. This PR doesn't make streams cloneable anyway, trying to clone them is still an error as long as ReadableStream lacks a [[Clone]] internal method. They're still only transferable.

This seems similar to the question of what happens if the underlyingSource calls controller.enqueue(somethingUncloneable). The answer is to error the destination stream, right?

Not necessarily. Why should a cloning stream be errored, when byte streams don't error the stream when you provide an unacceptable chunk either? You can just have enqueue throw if cloning/serializing to an Agent Record fails. If the underlyingSource doesn't catch the thrown DataCloneError, then the stream obviously gets errored.

And 'erroring the stream' still leaves the original question of what to actually put in that storedError slot:

With a TypeError of some kind I guess.

Right, that. Or a generic DataCloneError? We could attempt a StructuredClone of the error object actually, since it may or may not be uncloneable, and just place a DataCloneError in the slot if the clone attempt throws. ...Or we could do something weird and set it to the results of running String(err) in the original realm? Or make that available somehow?

Reflect.ownKeys

That I wasn't aware of, neat.

@domenic (Member) commented Dec 9, 2016

Can you elaborate? I'm not familiar with IndexedDB.

IndexedDB uses structured clone, kind of. But then it writes to disk, instead of recreating the object in another realm. So the current structure of StructuredClone is kind of broken for it.

This PR doesn't make streams cloneable anyway, trying to clone them is still an error as long as ReadableStream lacks a [[Clone]] internal method. They're still only transferable.

Good point. Maybe it is even more fine then.

Not necessarily. Why should a cloning stream be errored, when byte streams don't error the stream when you provide an unacceptable chunk either? You can just have enqueue throw if cloning/serializing to an Agent Record fails. If the underlyingSource doesn't catch the thrown DataCloneError, then the stream obviously gets errored.

I said error the destination stream, not the original being-cloned stream.

We could attempt a StructuredClone of the error object actually, since it may or may not be uncloneable, and just place a DataCloneError in the slot if the clone attempt throws.

Yes, we'd definitely try to clone it at first. I guess "DataCloneError" DOMException is better if that fails.
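
A sketch of that try-clone-then-fall-back idea (a hypothetical helper; note that in today's engines most built-in Error objects are serializable, so the fallback only fires for genuinely uncloneable values such as functions):

```javascript
// Hypothetical helper: clone the stored error for the destination realm,
// substituting a "DataCloneError" DOMException when the clone itself throws.
function cloneStoredError(storedError) {
  try {
    return structuredClone(storedError);
  } catch {
    return new DOMException('The stream errored with an uncloneable value.',
                            'DataCloneError');
  }
}

console.log(cloneStoredError(new TypeError('boom')).message); // 'boom'
console.log(cloneStoredError(() => {}).name);                 // 'DataCloneError'
```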

@isonmad (Contributor, Author) commented Dec 9, 2016

I said error the destination stream, not the original being-cloned stream.

Okay I'm semantically confused. I'm considering it conceptually the same stream, 'transferred' to a new realm. Same underlyingSource and everything. The original ReadableStream object is 'detached' and an empty shell of its former self.

So we're both talking about the exact same stream here and I don't really understand your reply.

@domenic (Member) commented Dec 13, 2016

Okay I'm semantically confused. I'm considering it conceptually the same stream, 'transferred' to a new realm. Same underlyingSource and everything. The original ReadableStream object is 'detached' and an empty shell of its former self.

No, they're definitely not the same stream. The underlying data is what gets transferred, but there are two different ReadableStream objects, and thus two different streams.

@domenic (Member) commented Feb 14, 2017

Although I'm still unsure what model we want to pursue here, and I should spend more time thinking about it, I was talking with a coworker today who was very interested in this use case. To demonstrate to him exactly how it would work, I produced two gists:

I thought they'd be worth dropping in the thread for the community to see.

@ricea (Collaborator) commented Feb 15, 2017

I've been thinking about it a bit. The transferred stream ends up with a different strategy. This is weird. But I think it's unavoidable.

For byte streams it's not too bad. The HWM may be different from the original stream. It's attractive to me to let the browser set that to some "optimal" value which depends on the transfer overhead.

For other streams I guess we just end up with the equivalent of CountQueuingStrategy(1). Not great.

We could recognise when the original stream's size function is CountQueuingStrategy.prototype.size or ByteLengthQueuingStrategy.prototype.size and treat those specially. That seems like such a terrible idea that I don't want to pursue it.

@domenic (Member) commented Feb 17, 2017

I think that might not be so bad... For example, consider a readable stream with HWM = 5 in a worker that gets transferred to the main thread, where it ends up with HWM = 1. The HWM isn't observable from the main thread, since all you have is a reader. Meanwhile, the stream creator over in the worker thread is still getting appropriate backpressure signals, because they still see HWM = 5.

A writable stream is a bit more troubling. Assuming the same setup, the producer in the main thread will always see desiredSize = 1. I wonder if we should consider some kind of asynchronous proxying of desiredSize over to the main thread... that makes things two-directional, which seems weird. This reminds me of when we were discussing whether the caller of getWriter() should determine the HWM, instead of the writable stream creator.

The fact that readables and writables are so different here implies to me I might be missing something. Thoughts welcome.
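
The desiredSize arithmetic behind this (illustrative only, using today's WHATWG stream globals): desiredSize is simply the high-water mark minus the total size of what's queued, so a writer over a HWM = 1 stream sits at 0 as soon as a single chunk is queued.

```javascript
// Illustrative sink whose write() never settles, so chunks stay queued and
// backpressure becomes visible immediately.
const sink = { write() { return new Promise(() => {}); } };

const ws = new WritableStream(sink, new CountQueuingStrategy({ highWaterMark: 5 }));
const writer = ws.getWriter();
console.log(writer.desiredSize); // 5: empty queue

writer.write('a'); // the chunk enters the queue synchronously
console.log(writer.desiredSize); // 4 = highWaterMark - queued size
```

With highWaterMark: 1, the same writer would read 0 after a single write, which is the main-thread producer situation described above.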

@isonmad (Contributor, Author) commented Apr 1, 2017

the producer in the main thread will always see desiredSize = 1.

The underlyingSource of the transferred stream could only see backpressure if its queue isn't being dequeued from, and it fills up to the HWM=5. So for the second case, after writing one chunk, why wouldn't the main thread continue to see writer.desiredSize as 0 for as long as its internal queue wasn't dequeued from, either?

The producer realm has to somehow know when not to dequeue a chunk and send it off to the other side. It can't just send chunks off without some kind of ready signal. Isn't it the same in both cases? When the underlyingSink's queue is full, the extra (transferred) single-chunk queue will fill up with one chunk and stay full until it gets the ready signal.

@ricea ricea added the do not merge yet Pull request must not be merged per rationale in comment label Sep 6, 2018
Base automatically changed from master to main January 15, 2021 07:25
@ricea (Collaborator) commented Mar 3, 2021

We've implemented transferable streams in a different way, so I'm closing this.

@ricea ricea closed this Mar 3, 2021