stop/resume #33
I admit I have a hard time following that thread. Could you help me understand a few things?
If you are curious how we will be doing crypto streams, see https://gist.github.com/domenic/7337821.
It's on readable streams; this is valid for any operation that does something specific on close (hash, crypto) or that has "bytes floating around". It would apply to TextDecoder too, for example, without the need for an explicit stream option. I have also proposed a pause/unpause, which is different (I see you do not recommend it, and I would like to understand why); stop would generate something like a STOP-EOF.

Let's retake the example from the thread, a progressive hash computing 4 bytes by 4 bytes:

incoming stream: ABCDE, FGHI, stop --> you get the digest for ABCDEFGHI; resume, JKL, stop --> you get the hash for ABCDEFGHIJKL; etc.

This is related to WebCrypto ISSUE22: http://www.w3.org/2012/webcrypto/track/issues/22. But here it's different: the advantage is that there is no public clone required; the operation would clone its state internally. Up to now this was deemed very specific to something like WebCrypto, but as I mentioned above this applies to TextDecoder too.

One of the questions is: do we really need 'resume', or does the operation resume implicitly when new data come in?

Maybe this is an example that shows that whatever good concepts are behind a Streams API, it will fail to address everything if not properly coordinated (i.e. with the W3C/WHATWG groups that have invented their own streams), ideally in one place only.

Regarding crypto streams, I have proposed something like (idea coming from node): crypto.subtle.encrypt(xxx).createStream, so for WebCrypto or other APIs you don't have to modify the spec and can just say that it supports the Streams API (and its createStream method).
I see. As with most parts of a streams API, this needs to be handled in two different ways depending on whether you're dealing with a pull or push data source. But the examples at https://github.com/whatwg/streams#example-creation should show you how you would handle this with the streams API as it stands. E.g., insert the appropriate code before calling

I think this answers the question, so I'll close this issue. I'll try to address your other points, but open other issues for them if you feel they deserve separate discussion, and feel free to continue the discussion here if you think your original point has not been addressed.
It sounds like you need to split the destination stream into two, one of which will be closed, and one of which will stay open for further writing. For example:

```js
var hashStream1 = createProgressiveHashWritableStream();
var hashStream2 = createProgressiveHashWritableStream();

incomingStream.pipe(hashStream1, { close: false });
incomingStream.pipe(hashStream2); // close: true is default
```

(BTW, this assumes the automatic multi-pipe implementation via a tee stream described in https://github.com/whatwg/streams#readable-streams.)

An alternative API design that might make sense for specific stream subclasses that have an efficient clone functionality would be:

```js
var hashStream1 = createProgressiveHashWritableStream();
incomingStream.pipe(hashStream1, { close: false });

// later, once `ABCDE` + `FGHI<EOF>` have come in from `incomingStream`:
// `clone` is specific to this type of stream; it is not a general stream interface
var hashStream2 = hashStream1.clone();
hashStream2.close();
```

In this case the result would be the same in terms of what
These kinds of questions are why I am so insistent on us learning the lessons from the JavaScript streaming APIs that came before us. The answer is very clearly the latter, if you look at history.
I don't think coordination has anything to do with how good the streams API is; I think the quality is independent of that. There is already a lot of work showing how to make a good streams API, and this repo's streams are an attempt at capturing it, whereas the W3C streams seem like an ad-hoc attempt to scenario-solve problems as they come up by bolting on new APIs or mutating APIs in strange ways. I claim that previous work has lit a fairly clear path toward a coherent solution that addresses all use cases outlined in the "Requirements" section of this readme. Ignoring that previous work in favor of trying to reinvent it from scratch is not the path I want to see streams go down.
I do not think these factory functions are a very good interface. Nobody is going to be generically testing to see "does this random thing, which might be crypto-related or might be XHR or might be filesystem, have a

Letting each API provide what is natural seems better than trying to create a uniform interface that is not useful to people anyway. And anyway,

seems like a pipe dream, as you would of course have to give details on how the data streams. E.g., what is the default high water mark for a given readable or writable stream? Is it customizable? What type of objects stream?

As an analogy, it would be very silly for the promises spec to propose a

There are so many questions to answer when designing a streaming interface, and trying to abdicate responsibility for them under a single "you have

Thanks for engaging. As I said, happy to continue discussion, either here or in more focused issues.
I am not sure your first solution works for the hash example; it's not a one-shot operation but a repeated one for a given stream, which is why it's not a real EOF but a STOP-EOF.

Clone: it seems to work better, but the idea is not to expose the clone method; that's why I find it interesting if it's an internal one linked to a stream status.

Resume: that's what I think too, but I prefer to double-check.

I agree that the purpose is not to reinvent things and to take lessons from previous APIs; now I am wondering how you can be so confident that your API covers all the cases without getting feedback from the related groups using streams inside browsers.

I'll open a different ticket for createStream.
Could you explain in more detail how this works with the existing C/C++ APIs we would be basing this on? I am speaking only from my experience with the crypto bindings available in my languages and environments, namely Node.js and .NET, where what I presented works well.
This is fairly simple. There are no new streams in the world, really. TCP sockets and file descriptors cover most of them (and nicely encapsulate the push-vs-pull source model). Throw in TLS and GZIP and you have even greater coverage. That is, if you can encapsulate TCP, filesystem, TLS, and GZIP streams in one streaming model, you will be able to cover any conceivable streaming data source.

Browser streams are generally just specializations of this. E.g. a readable stream for an HTTP response is a TCP socket. A duplex stream for a websocket is a TCP socket. A duplex stream for a web worker is a readable-writable file descriptor (since this is how OSs generally model processes).

A crypto stream is not an immediate fit for the streaming model, since it does not represent I/O, is often synchronous, etc. But it might as well implement the writable stream interface in order to integrate with the rest of the stream ecosystem, and then add its own methods for crypto-specific things. (That is, it is not primarily a stream, but can implement the stream interface so that e.g. you can pipe things to it.)

To be sure, it's extremely important to make sure streams solve the problems that other specs need solved. But I am fairly confident, from our experience solving these problems outside of standards-land, that this design does. There is important work to be done helping other spec writers understand how to use the streaming mechanisms here for their specs, but to be honest, I would rather provide a solid and well-thought-out primitive that maps to the underlying I/O abstractions than scenario-solve various specs' use cases. Then we can let users demonstrate how to use the flexible streams we provide to create pleasant and well-designed streaming interfaces to existing APIs, as per the extensible web philosophy. Once those are ready, only then do we need to incorporate them into the platform officially.
Streams do not have to roll out uniformly to all APIs at once, if they are designed well enough to be rolled out gradually.
The Tor protocol, for example (https://www.github.com/Ayms/node-Tor), hashes/encrypts/decrypts the payload of 512 KB cells the way I have described in this post: the payload would be multi-piped to the hash/encrypt or hash/decrypt depending on the direction of the flow, with a STOP-EOF message inserted for each payload.

Indeed, streams have existed for a long time (so what are we talking about here? :-) ). Now browsers have some specifics; if you are interested, take a look at http://www.peersm.com where I am streaming just about everything that exists on earth, currently trying to optimize flow control/backpressure.
I think this is where I get lost. Can you describe more concretely what that looks like in terms of the Tor C++ APIs and the data? I am starting to think STOP-EOF is an in-protocol signal, not a stream-is-closed signal, similar to the discussion of in-protocol vs. stream-ending errors at #25 (comment). But I definitely have more to understand here.
I don't know exactly what to show you; you have the C Tor repo here: https://gitweb.torproject.org/tor.git/tree/HEAD:/src/or. Graphically it could be: Nodes 1, 2 and 3 in the path

Where --> is a normal pipe and ==> is a pipe where you need to insert STOP-EOF for each block of 500 KB of messages
I don't think you need a special EOF-stop as part of the core streams protocol. I don't think the source needs to insert an "EOF-stop" symbol in its stream. Maybe you could just have the destinations keep track of how much they've read, and if it hits 512KB they manually execute the EOF-stop code? This would work unless it's important for the source to control the place of the EOF-stop symbol.

This might be the pseudo code of your diagram (with sync calls because async calls in while loops are verbose):

```js
var torMessages = source
var hasher = new HashStream()
var encoder = new EncodeStream()

var hashesEncrypts = source.pipe(transform(function (chunk, push, cb) {
  this.chunks = this.chunks || []
  this.chunks.push(chunk)
  while (size(this.chunks) > 512 * 1024) {
    var message = getMessage(this.chunks) // get 512KB chunk and mutate array
    var hash = hasher.hashMessage(message) // should be async
    var enc = encoder.encodeMessage(message) // should be async
    // push a tuple into the readable buffer
    push({ hash: hash, enc: enc })
  }
  // done handling this chunk
  cb()
}))

// hashesEncrypts is a `Stream<{ hash: ArrayBuffer, enc: ArrayBuffer }>`
// torCell should know how to handle those tuples?
hashesEncrypts.pipe(torCell).pipe(tls).pipe(ws)
```

I guess the main problem here would be whatever efficiency concern you have about the implementation of
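For concreteness, the `size` and `getMessage` helpers assumed in the pseudo code above could look like the following: `size` sums the buffered chunks, and `getMessage` splices a fixed-size message off the front, mutating the array in place and keeping any remainder buffered. Written against Node Buffers; the names match the pseudo code, everything else is an illustrative guess.

```javascript
// Helpers for the 512KB message framing sketched above.
const MESSAGE_SIZE = 512 * 1024;

// Total number of bytes currently buffered across all chunks.
function size(chunks) {
  return chunks.reduce((n, c) => n + c.length, 0);
}

// Pull one fixed-size message off the front of the chunk array,
// mutating the array so any leftover bytes stay buffered.
function getMessage(chunks, messageSize = MESSAGE_SIZE) {
  const all = Buffer.concat(chunks);
  const message = all.slice(0, messageSize);
  chunks.length = 0; // mutate the array in place
  if (all.length > messageSize) {
    chunks.push(all.slice(messageSize)); // keep the remainder
  }
  return message;
}

// Small demonstration with a 6-byte message size instead of 512KB:
const chunks = [Buffer.from('ABCDE'), Buffer.from('FGHI')];
const msg = getMessage(chunks, 6); // "ABCDEF"; "GHI" remains buffered
```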
This would imply that I clone the state of the hash and crypto operations each time, which I am doing today and don't find very efficient. The advantage of stop-eof would be:

The clone would be internal to the operation, not explicit and not exposed. This would work for other operations like TextDecoder:

chunked data --> http parser ==> TextDecoder (stop-eof at the end of each chunk) --> handle chunk

So you would not need the stream option of TextDecoder. I find it a good way to handle operations that can have unresolved bytes without modifying the operation API, and better than cloning or using options.
@Ayms why do you need to clone? Why can't
Same for the text decoder. It doesn't need to clone data. You just store the remaining undecoded bytes on the text decoder itself (you need one text decoder per http parser, though), and then when the text decoder gets the next chunk written to it, it just concatenates them and continues. No cloning involved.
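This "hold the undecoded remainder on the decoder itself" behavior is exactly what the standard `TextDecoder` API does when you pass `{ stream: true }`: incomplete byte sequences are buffered until the next call instead of being emitted as replacement characters. (`TextDecoder` is global in browsers and in Node 11+.)

```javascript
// Stateful decoding across chunk boundaries with TextDecoder.
const decoder = new TextDecoder('utf-8');

// "€" is E2 82 AC in UTF-8; split it across two chunks.
const part1 = decoder.decode(new Uint8Array([0xe2, 0x82]), { stream: true }); // ""
const part2 = decoder.decode(new Uint8Array([0xac]), { stream: true });       // "€"
```

The first call returns an empty string because the decoder keeps the two pending bytes internally; the second call completes the character. No cloning or external state is involved.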
Yes (of course!!!), for TextDecoder the clone operation would be: do nothing, just keep the unresolved bytes. So the "clone" operation depends on the Operation (TextDecoder, crypto, hash, etc); it "clones" the state of the operation to restart from it, so it's not even a real "clone"; maybe another name can be found. Each Operation supporting STOP-EOF would know what it has to do, and again I find it nicer than explicit, potentially leaking clones and options.

For crypto/hash, you need to clone the state for operations that have a specific finalization process, or/and if unluckily the size of the chunks defined by the protocol is not a multiple of the number of bytes processed by the algorithm.
@Ayms I don't quite understand how STOP-EOF helps here. How does STOP-EOF help avoid the clone operation?
@Raynos, you wrote "You would have to generate a new local hasher and encoder for each". Let's imagine that's feasible: I would have to keep the state of the operation. TextDecoder: I would have to use the option {stream: true}. For both, why should I do this if a stop-eof can take care of it for me?

Let's take the WebCrypto hash; the interface is something like crypto.subtle.digest(xxx). For security reasons I have no way to interact with the crypto operation, clone its state, and reuse it after digest. Now if WebCrypto supports streams and stop-eof, it will handle stop-eof internally (i.e. clone the state making sure it does not leak, return the finalization, resume on next data).

How would you handle this another way? It would imply that you can manipulate the crypto operation, so it would be insecure.
For streaming crypto to work, the crypto primitive needs to have a streaming-friendly interface. Node's crypto api has http://nodejs.org/api/crypto.html#crypto_class_hash which is

This allows for an efficient implementation. The same applies to

If your TextDecoder and Crypto primitives just have a single

I recommend that the decoder & crypto apis are changed to be stream friendly, i.e. allow you to build up a hash piece by piece efficiently. I'm not sure how stop-EOF helps avoid rewriting decoders & crypto apis to support the fundamental use case of computing the operation on the arraybuffer piece by piece.

Imagine you want to md5 hash a 4gb file. You need some kind of api that does not load a 4gb array buffer into memory.
It seems like you don't get my point, or I am not explaining it correctly; I had a hard time explaining this to WebCrypto too before they recognized the issue (ISSUE22, http://www.w3.org/2012/webcrypto/track/issues/22).

The purpose of stop-eof is of course to cut things into pieces and process them as you describe (that's basically the purpose of streams), but unlike EOF, it allows closing the operation and restarting it while you continue streaming.

For example, as the Tor protocol is specified, there is a unique hash associated to a circuit at circuit creation, and this hash will hash all data streamed within this circuit by chunks of +/- 500 KB, closing the hash for each chunk and then restarting it, so each time you close it you get the hash for all data since the beginning (which of course you are not going to recompute entirely for each chunk).

This API is for browsers, not for node, so it should be compatible with Web APIs (TextDecoder, WebCrypto, etc). And node does not allow restarting the operation after digest; that's exactly my point here, node does not support stop-eof either, at least that's what I still read in the doc. I did propose something to node to allow this for the hash some time ago but finally removed it.
@Ayms node allows you to restart after digest. You call

We want to support the ability to "cut things into pieces and process it as you describe". I think the current whatwg streams api can do so if it has stream-friendly crypto primitives like node's primitives. If the WebCrypto primitives are not stream friendly then this is a WebCrypto issue, not a whatwg/streams issue.
@Raynos So node's doc is not up to date, because we can read: "Note: hash object can not be used after digest() method has been called." Where is this documented? Because this would mean that node does support a stop-eof for hash (where stop-eof is implicitly inserted when calling digest).

But the issue is not only for hash; current APIs are based on the assumption that there would be an EOF in the stream. See WebCrypto: unlike node there is no update/digest, only digest, so you can not make the difference between the two; the hash will close on EOF, same for encryption. I don't really see these APIs changing...

Again, if you see another solution than a stop-eof, please advise; I don't see any other that does not imply being able to access the state of the operation and clone it. I think you should reopen this issue so the discussion is public, because if node indeed supports multiple digests for the same hash operation, this just means that I am correct in requesting a stop-eof mechanism.
@Ayms can you redirect me to documentation on stop-eof for web crypto? If we know what the stream-friendly web crypto api is, then we can make sure whatwg/streams can wrap it cleanly.

@isaacs the node crypto api implies you can only call
@Raynos The only documentation available for this in WebCrypto is ISSUE22. It was recognized to be a problem but was closed due to the lack of good/secure proposals to solve it, and was considered (wrongly) a marginal case. But finally I think the group was right to close it because there were no good solutions at that time; now, with streams, we get a chance to solve it in a clean way, and again the subject is not limited to crypto.

I am updating the group with what is happening on streams, but that's not its priority for now, even if the group knows streams are important. So basically I am trying to propose things so Web APIs can support streams without changing everything, learning from the mistakes/limitations of the past. The W3C Streams API is at least considering the case; a bug has been opened to track it.
So if a stream wants to wrap an async operation piece-wise but read intermediate states, it would need to:

Would it not be simpler if the operation had a
I think it's about the same; .value would need to keep references to the state of the Operation before computing the final state and restarting from the previous state. 'Clone' is probably not the right word: you don't clone the Operation but just its state, and you don't close it but just compute the final state. 'stop' should be a property of the stream (not the Operation), inserting a stop-eof signal; most of the time you are supposed to know when/where to insert stop-eof, but sometimes you might not. Then, receiving stop-eof, the Operation would proceed as discussed here. Could you please reopen the issue so it's visible to others?
So if we add

I think the important part is having

Note a crypto operation does not need to be a duplex stream that receives a stop-eof signal on the writable side and produces a current final state on the readable side. That seems rather heavy; it would be leaner to just have a
I think, as @Raynos has helped explain, this is not a streams issue, but a CryptoStream issue. As such, it's not for the streams spec to include such crypto-specific additions. Your issue 22 on WebCrypto is still the best place to have such discussions on how CryptoStreams should extend WritableStream and/or ReadableStream, whether with your (IMO very overcomplicated and prescriptive) STOP-EOF proposal, or something more usable like compute/value/clone or similar. |
No, it's not a crypto-specific feature; this is valid for any Operation that needs to keep the previous state to compute the next one. compute/value/clone is of no use; the stream should handle the signal and the Operation should handle it. Then the unanswered question: does node allow multiple hash digests while streaming data, or not?
stop/resume methods are discussed here: http://lists.w3.org/Archives/Public/public-webapps/2013OctDec/0401.html
As you can see, that's different from a potential pause/unpause. Did you foresee this case, and how would you handle it?