can streams be transferred via postMessage()? #276

Open
wanderview opened this Issue Feb 4, 2015 · 26 comments

Comments

8 participants
@wanderview
Member

wanderview commented Feb 4, 2015

Are there plans to support transferring streams between workers or iframes using postMessage?

This question came up in https://bugzilla.mozilla.org/show_bug.cgi?id=1128959#c2

Does this allow receiving a ReadableStream from, for example, a filesystem
API or a network API, and then transferring that ReadableStream to a Worker
thread using postMessage({ readThis: myStream }, [myStream])?

Likewise, can a page instantiate a ReadableStream (through a ReadableStream
constructor or through some other means) that can then be transferred to a
Worker thread?

This doesn't mean that all ReadableStreams have to be transferable. For example, a stream of plain JS objects can obviously not be transferred. So maybe we need a notion of transferable and non-transferable streams.
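
For illustration, the usage being asked about would look something like this (the worker script name and fetch URL are hypothetical; whether the stream can go in the transfer list is exactly the open question):

const worker = new Worker('consumer.js');               // hypothetical worker script

fetch('/big-file').then(response => {
  const myStream = response.body;                        // a UA-created ReadableStream
  // the open question: can the stream itself go in the transfer list?
  worker.postMessage({ readThis: myStream }, [myStream]);
});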

@wanderview

Member

wanderview commented Feb 5, 2015

I don't know if there are plans for this, but one thought I had:

  1. The transferred stream is locked and drained by the UA. The values are effectively piped to the new environment.
  2. If a stream value is not a transferable DOM object or a structured-cloneable JS object, then the stream aborts.
  3. If content wants to use the stream in two environments, it can call clone() before transferring (sketched below).
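
A rough sketch of point 3, assuming clone() maps to something like tee(); worker and myStream are carried over from the hypothetical example in the opening comment:

const [localBranch, transferBranch] = myStream.tee();
worker.postMessage({ readThis: transferBranch }, [transferBranch]);

// the page keeps reading localBranch while the worker drains transferBranch
localBranch.getReader().read().then(({ value, done }) => { /* ... */ });
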
@domenic

Member

domenic commented Feb 5, 2015

Yeah, that sounds about right. I was hoping to work out the details, and at some point I had an ambitious plan to first figure out how we clone/transfer promises and then do the same for streams (see dslomov/ecmascript-structured-clone#5). But maybe we can just jump to the end if this is important for a particular use case that an implementer wants to implement.

@wanderview

Member

wanderview commented Mar 24, 2015

I was talking with @sicking today and he raised an interesting point. What do we do in this situation:

  1. Script gets a ReadableStream for a native resource like a file or something.
  2. Script overrides ReadableStream.read() with stream.read = function() {...}
  3. Script tries to transfer the stream to a worker with postMessage()

Ideally the worker would have access to the stream data without all the buffers going through the original thread, but that seems impossible if .read() is JS implemented in the original context.

Can we make the .read() and .write() methods unforgeable? Or should postMessage() just throw in this situation?
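
The scenario in question, sketched under the stream-level read() API of the time (getNativeFileStream and worker are hypothetical):

const stream = getNativeFileStream('/tmp/big.bin');    // hypothetical native-backed stream

// step 2: a JS override, necessarily bound to the original thread
stream.read = function () {
  console.log('observed a read');
  return ReadableStream.prototype.read.call(this);
};

// step 3: the transfer must either skip the override (making it unobservable),
// give up on the off-main-thread fast path, or simply throw
worker.postMessage({ stream }, [stream]);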

@domenic

Member

domenic commented Mar 24, 2015

I would imagine we just do what we do for all other platform objects (including e.g. promises), and define the behavior of how other code interacts with them not in terms of their public API but in terms of abstract operations that apply to them. So e.g. the postMessage algorithm references TeeReadableStream(s) instead of s.tee().

@domenic

Member

domenic commented Mar 24, 2015

(Sometimes this is also stated in specs as "using the original value of X.prototype.method," if the appropriate abstract-operation factoring hasn't happened to the spec in question.)
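
In code terms, the distinction is roughly the following, using tee() as the example method:

// public-API path: an own-property override of tee on s would be observed
const branches = s.tee();

// "original value of X.prototype.method" path: an override on s is skipped,
// though a patched ReadableStream.prototype.tee would still be picked up
const branches2 = ReadableStream.prototype.tee.call(s);

// a spec-level abstract operation such as TeeReadableStream(s) goes further:
// it operates on internal state and is not observable to author code at all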

@wanderview

Member

wanderview commented Mar 24, 2015

And because the tee concept operates on the source, it effectively bypasses any overrides on the outer object?

@domenic

Member

domenic commented Mar 24, 2015

I mean, I'd say that it operates on the stream, but does so by reaching into its innards instead of through the public API, in order to be more tamper-proof.

@sicking

sicking commented Mar 25, 2015

So that's true even for piping? I.e. if I do

// readable here is assumed to be some UA-created ReadableStream
var writable = nativeFileSystemAPI.appendToFile(filename);
writable.write = function(buffer) {
  console.log(buffer);
  WritableStream.prototype.write.call(this, buffer);
};
readable.pipeTo(writable);

Then nothing will get logged since pipeTo is reaching into the innards of the argument that's passed to it, rather than using the public API?

@domenic

Member

domenic commented Mar 25, 2015

@sicking That's a good question. My initial answer was no: pipeTo operates polymorphically on its argument, which must take the structural type of a writable stream, but could be any implementation of the writable stream contract. (E.g., queue-backed WritableStream, or specialized WritableByteStream, or any author-created writable stream types.) So necessarily it operates through the public interface of its arguments, and thus the override will be consulted. The motivation behind this kind of operation is discussed at some length in a new section of the spec; feedback welcome on that. In the end I think this is key to a thriving stream ecosystem with both specialization and generality built in.

However I of course recognize the impulse behind it; you want to have fast off-main-thread piping be the norm for UA-created streams talking to each other. I see a few perspectives on your question in that light:

  • Modifying a UA-created writable stream in such a way "deopts" you to the normal pipe algorithm, since you have made it observable. This honestly seems fine to me; it's similar to how most JS objects work. (E.g. if you start adding numeric properties to Array.prototype/string-valued properties to an array, all arrays/that array will now be slow.)
  • Alternately, we could say that it depends on the implementation of readable. If readable is a type of stream that knows how to recognize NativeFileSystemAPIFileAppendWritableStreams (or whatever), then its pipeTo implementation might be different than ReadableStream.prototype.pipeTo. E.g., as its first line it could do a brand-check to see if its argument is a NativeFileSystemAPIFileAppendWritableStream, and if so reach into its internals. This would of course mean readable is not a ReadableStream, but instead a more specialized type that has such additional logic. But that seems expected; if it has a different implementation, then it needs to be a different type.
  • Going even further, we could try to make ReadableStream.prototype.pipeTo itself extensible in this fashion, thus avoiding the proliferation of types. That is, in the same way you want to have specialized piping algorithms for your UA-created streams, we should be able to explain that specialization mechanism and give authors access to it, since of course it's not only UA streams which could benefit from this specialization. There are a few approaches to this, including a double-dispatch one I spent some time typing up before realizing it was probably too complicated for this bug at this stage. But the simplest model is probably just a registry of (rsBrandCheck, wsBrandCheck) -> pipeAlgorithm which pipeTo can consult (sketched below).

I'd be quite interested in an implementer perspective on which of these seems more attractive (from everyone, @tyoshino and @yutakahirano included!).
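
A hypothetical sketch of the third option, the (rsBrandCheck, wsBrandCheck) -> pipeAlgorithm registry; every name here is made up for illustration, including genericPipe:

const pipeRegistry = [];

function registerPipeAlgorithm(rsBrandCheck, wsBrandCheck, pipeAlgorithm) {
  pipeRegistry.push({ rsBrandCheck, wsBrandCheck, pipeAlgorithm });
}

// what ReadableStream.prototype.pipeTo could conceptually do first:
function pipeTo(dest, options) {
  for (const entry of pipeRegistry) {
    if (entry.rsBrandCheck(this) && entry.wsBrandCheck(dest)) {
      return entry.pipeAlgorithm(this, dest, options);  // specialized (possibly off-thread) pipe
    }
  }
  return genericPipe(this, dest, options);              // fall back to the public-API algorithm
}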

@wanderview

Member

wanderview commented Mar 26, 2015

  • Modifying a UA-created writable stream in such a way "deopts" you to the normal pipe algorithm, since you have made it observable. This honestly seems fine to me; it's similar to how most JS objects work. (E.g. if you start adding numeric properties to Array.prototype/string-valued properties to an array, all arrays/that array will now be slow.)

I asked @bzbarsky and this option is possible, but it would probably not be our first choice. It would require adding back features that were removed from SpiderMonkey for being a bit hacky. It seems we should go with other options if possible.

@bzbarsky

bzbarsky commented Mar 26, 2015

I should note that option 1 requires carefully defining exactly what happens if the deopt happens in the middle of the copy operation somewhere; you have to synchronize somehow so that the writes that need to be observable after the deopt are actually observable or something.

@tyoshino

Member

tyoshino commented Mar 26, 2015

We've redesigned the readable stream to have the reader+stream structure and are going to also introduce the writer+stream structure to the writable stream. Given that change, the example by Jonas (#276 (comment)) doesn't work, since pipeTo() obtains a writer by itself. There's no point at which we can substitute the writer.write that pipeTo() uses.
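
In today's terms, the pipe loop under the reader/writer design looks roughly like the sketch below (pseudocode, not the spec algorithm), which is why an own-property write override on the destination stream is never consulted:

async function pipeSketch(readable, writable) {
  const reader = readable.getReader();     // pipeTo acquires the reader itself
  const writer = writable.getWriter();     // ...and the writer itself
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    await writer.write(value);             // goes through the writer, never writable.write
  }
  await writer.close();
}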

@tyoshino

Member

tyoshino commented Mar 26, 2015

So, to implement the first option in Domenic's comment (#276 (comment)), for example, we should make overriding getReader() deopt the special piping algorithm for the readable stream / writable stream pair, I guess.

@tyoshino

Member

tyoshino commented Mar 26, 2015

But getReader() is very special. It requires interaction with internal states of streams. Some API support is needed.

@yutakahirano

Member

yutakahirano commented Mar 26, 2015

@tyoshino, sorry, I don't understand what you meant at #276 (comment). Can you explain?

@yutakahirano

Member

yutakahirano commented Mar 26, 2015

It's not the streams but the reader and the writer that matter for pipeTo. If the reader and the writer obtained via getReader() and getWriter() are authentic, we can enable the optimization.

var rs = ...; // rs is a UA-created ReadableByteStream.
var ws = ...; // ws is a UA-created WritableByteStream.

var reader = rs.getReader();
var writer = ws.getWriter();

var rsx = {
  getReader: () => reader,
  pipeTo: rs.pipeTo
};

var wsx = {
  getWriter: () => writer
};

rsx.pipeTo(wsx); // This works with the optimization.

@bzbarsky

bzbarsky commented Mar 26, 2015

OK, that seems to push the issue off to how pipeTo interacts with the reader and writer. That is, all of #276 (comment) applies but now to the reader and writer, right?

@domenic

Member

domenic commented Mar 26, 2015

I think it helps reduce the surface area a little bit. In particular one strategy would be that you only have to check if dest.getWriter is the same as the original getWriter, and as long as that can be done unobservably, you're then good to go. (I think it can, too: you insert the equality check in between the "get" and the "call" steps of the invoke-a-method algorithm.)
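
That first strategy, sketched; the fast path is illustrative, not spec text:

const getWriterMethod = dest.getWriter;                        // the [[Get]] step
if (getWriterMethod === WritableStream.prototype.getWriter) {
  // identity check passed: safe to reach into dest's internals unobservably
  // (fast, off-main-thread path)
} else {
  const writer = getWriterMethod.call(dest);                   // the [[Call]] step: generic path
}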

Except... what if someone overwrites WritableStreamWriter.prototype.write or similar -_-.


Another strategy would be that the first thing the algorithm does is grab all the methods off the writer, and then use those from then on. E.g.

const [write, close] = [writer.write, writer.close];

// we can unobservably insert a check here that write and close are the expected ones...

// later:
write.call(writer) // instead of writer.write()

Except... this doesn't work for our ready getter without hacks. And it seems pretty JS-unnatural anyway...


I'm starting to feel that we need to program what we want more directly into the spec, instead of trying to make it an unobservable optimization. That implies one of the other two strategies. (Or perhaps less-extensible versions of them, at least in the short term...)

I hope to think on this more productively next Monday, when my brain has had a chance to un-fry itself after a TC39 week. I feel like I need to take a step back, and say what the high-level goals and constraints are here. (I've left a lot implicit, I think.) But the above is kind of where I'm at right now; hope it helps people understand that we're taking this seriously at least :)

@bzbarsky

bzbarsky commented Mar 26, 2015

You don't have to check dest.getWriter, right? You have to check something about the object it returns, as you note. I assume getWriter is only called once, up front.

But yeah, my point was that we have the same problem with the writer. :(

@domenic

Member

domenic commented Mar 26, 2015

Well, you could do either, but yeah, both fail.

@wanderview

Member

wanderview commented Apr 6, 2015

It seems we can avoid most of the issues if we just make it so you can postMessage() the stream, but not the reader. When you postMessage() the stream, it locks the stream and transfers the underlying source. This doesn't give third-party JS any surface area to override, as far as I can tell.
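
The receiving side of that model might look like this sketch, assuming the underlying source is re-homed in the worker (readThis matches the hypothetical example in the opening comment):

// consumer.js (worker side): the stream arrives already usable; reads never
// bounce back to the original thread because the source moved with it
self.onmessage = async event => {
  const stream = event.data.readThis;
  const reader = stream.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // process value entirely on the worker side
  }
};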

@domenic

Member

domenic commented Apr 6, 2015

Right, that does address the OP, although we did get into the question of how to make pipeTo unobservable (independent of postMessage).

@domenic domenic added the question label Apr 7, 2015

@tyoshino

Member

tyoshino commented Apr 8, 2015

Maybe we should first discuss the overwrite detection issue at #321.

@isonmad

Contributor

isonmad commented Jun 2, 2016

This issue is really old now, but what happened to the alternative from #97 (comment) that just uses an explicit 'socketpair' connecting to the other page? Issue #244 never had any discussion about it, and it sidesteps any transfer-of-state issues.

@tyoshino

Member

tyoshino commented Jun 23, 2016

Wrote an update on #244. I have less objection to postMessage()-ing a ReadableStream now.

@jimmywarting

jimmywarting commented Aug 19, 2016

This would be very useful for my StreamSaver.js, to be able to post a stream to the service worker. Behind the scenes it uses a MessageChannel.

But this lib would be obsolete if we ever get an API to write to the filesystem, so it wouldn't have to go through a SW 😄
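
For reference, the MessageChannel-based "socketpair" pattern looks roughly like this hand-rolled sketch (not StreamSaver.js's actual code; someReadable is hypothetical):

// page side: send chunks over a MessageChannel port instead of the stream itself
const { port1, port2 } = new MessageChannel();
navigator.serviceWorker.controller.postMessage({ port: port2 }, [port2]);

someReadable.pipeTo(new WritableStream({
  write(chunk) { port1.postMessage(chunk); },
  close()      { port1.postMessage('end'); }
}));

// service worker side: rebuild a ReadableStream from the incoming messages
self.onmessage = event => {
  const port = event.data.port;
  const readable = new ReadableStream({
    start(controller) {
      port.onmessage = e =>
        e.data === 'end' ? controller.close() : controller.enqueue(e.data);
    }
  });
  // e.g. hand `readable` to a fetch event via new Response(readable)
};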

isonmad pushed a commit to isonmad/streams that referenced this issue Dec 3, 2016

isonmad
naive attempt at transferring readable streams
As [[storedError]] is observable, can be an arbitrary object,
and is very likely an uncloneable Error, it can't be sent to
a new realm reliably. So just forbid errored streams.

Still needs clearer semantics of when structured cloning occurs
and how DataCloneErrors are reported.

Cloning needs polyfilling somehow too.

Related to: whatwg#244, whatwg#276
