Backpressure on fetch integrated with Streams #452

Closed
yutakahirano opened this issue Sep 10, 2014 · 121 comments

Comments

@yutakahirano
Contributor

commented Sep 10, 2014

Imagine ReadableStream is integrated with fetch. The most naive way is to add a stream() method to the Body interface.

interface Body {
    ...
  Promise<ReadableStream> stream();
};

Streams is an API that lets the data producer know that the consumer doesn't want the data, so that it can stop producing it. The other data types (text, blob, ...) have no such feature.
My question is: should the loader start loading the body while the Body representation is undetermined?

fetch(request).then(function(res) {
  return wait(longtime).then(function() {
    return res.stream();
  });
}).then(function(stream) {
  // Here we have a readable stream.
});

The above code waits a long time and only then indicates that it wants the body data via the ReadableStream interface. Should the loader load the body data while waiting and buffer it internally without limit, or stop loading while the representation is undetermined?

We can provide that information explicitly when calling fetch, but I don't know if it's a good API.

@yutakahirano

Contributor Author

commented Sep 10, 2014

I'm not sure this is the right place to discuss this issue, but I chose here because it could affect the fetch interface.

@yutakahirano

Contributor Author

commented Sep 10, 2014

@annevk

Member

commented Sep 10, 2014

It would be good to get input from @domenic.

Note that I expect we would not expose a Stream through a promise. I would expect it to be returned from response.body synchronously. And then if you use one of the methods such as json() it would no longer work or not work predictably.

@domenic

Contributor

commented Sep 14, 2014

To explain @annevk's comment a bit more, I believe our thinking was that body would be an instance of RequestBodyStream, which is a ReadableStream subclass that also has methods like json() etc. The .stream() idea doesn't seem bad at first glance, but I haven't thought about it too hard; in what follows I'll assume we're going with the RequestBodyStream design.

(Conceptually, .json()'s implementation would look roughly like the readableStreamToArray example, except it would decode and concat the chunks into a string instead of putting them in an array, and it would do JSON.parse before returning the result. In reality, it would probably be done much more efficiently, e.g. pre-allocate an ArrayBuffer, read all the data into it, and then decode all at once.)

My expectation would then be that upon fetch() being called, the body is created and data is buffered internally (or left in the kernel buffer, perhaps?) up to the stream's high water mark (HWM). Thus, if the user never calls .json() or .read() or .pipeTo(...) or anything else that would read the stream, the maximum memory consumed is equal to max(HWM, Content-Length). (Or perhaps slightly above the HWM if backpressure cannot be exerted fast enough.) If the user later calls .json(), it un-pauses the stream until all the data is read and converted into JSON. Similarly, if they call .read(), they consume buffered chunks until none are left, at which point the stream becomes "waiting" and they need to call .wait().

The choice of HWM could in theory be given to users (e.g. as an option to fetch()), but in practice in Node.js we see that nobody uses such a capability and they just accept the default. Node.js chooses a default of 16 KiB, but implementations could use heuristics (e.g. use a lower number on mobile phones). A HWM of 0 would be fine too.
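
The buffering limit described above can be observed with the ReadableStream constructor that later shipped in browsers and Node.js, whose shape differs from the 2014 drafts discussed in this thread. The following is a minimal sketch under that assumption, not what any loader actually does: `countPulls` is a hypothetical helper, and the queue is measured in chunks (via CountQueuingStrategy) rather than bytes to keep the numbers small.

```javascript
// Sketch only: counts how often the producer is asked for data.
// Assumes a runtime with global ReadableStream and CountQueuingStrategy
// (modern browsers, Node.js 18+). countPulls is a hypothetical name.
async function countPulls(highWaterMark) {
  let pulls = 0;
  const stream = new ReadableStream(
    {
      pull(controller) {
        // Invoked only while the queue is below the high water mark
        // or while a read is waiting for data.
        pulls += 1;
        controller.enqueue(`chunk ${pulls}`);
      },
    },
    new CountQueuingStrategy({ highWaterMark })
  );

  // Nobody reads yet; give the stream time to fill its queue.
  await new Promise((resolve) => setTimeout(resolve, 20));
  const pullsWhileIdle = pulls;

  // Take one chunk, then give the stream time to react.
  const reader = stream.getReader();
  await reader.read();
  await new Promise((resolve) => setTimeout(resolve, 20));
  const pullsAfterOneRead = pulls;

  reader.releaseLock();
  return { pullsWhileIdle, pullsAfterOneRead };
}
```

With a high water mark of 3 the producer is asked for three chunks and then left alone until a read frees a slot; with a high water mark of 0 it is not asked for anything until the first read.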

/cc @tyoshino in case there's something I am forgetting

@annevk

Member

commented Sep 15, 2014

@domenic json() and friends moved directly to Response/Request. So body would be a "pure" stream object I think.

@yutakahirano

Contributor Author

commented Sep 18, 2014

Thank you. My understanding is as follows. Is it right?

  • Request / Response have 'body' property of type ReadableStream.
  • When json(), text(), etc are called the stream gets closed.
  • When read() is called on the stream bodyUsed becomes true.
@annevk

Member

commented Sep 18, 2014

Yeah, although I'm not sure if we can make the third bullet point work if we keep the stream object pure. Perhaps through some temporary observer? But even that would have to be synchronous for bodyUsed to return the correct value...

@domenic

Contributor

commented Sep 18, 2014

The third bullet point is definitely achievable since we'd control the creation of the stream. E.g.

var that = this;
this.body = new ReadableByteStream({
  readInto(/* ... */) {
    // actual work
    that.bodyUsed = true;
  }
});
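
For readers on current runtimes, the same idea can be sketched with the ReadableStream constructor that eventually shipped, since ReadableByteStream and readInto() as written above belong to the 2014 draft. This is an illustration under stated assumptions, not how Fetch ended up defining bodyUsed: `HypotheticalBody` is an invented name, and the high water mark is set to 0 so that the underlying source is only asked for data once someone actually reads.

```javascript
// Sketch only. HypotheticalBody is an illustrative class, not part of any spec.
// Assumes a runtime with a global ReadableStream (modern browsers, Node.js 18+).
class HypotheticalBody {
  constructor(chunks) {
    this.bodyUsed = false;
    const pending = [...chunks];
    this.body = new ReadableStream(
      {
        // Arrow function so that `this` is the HypotheticalBody instance.
        pull: (controller) => {
          // First real demand for data: record that the body was used.
          this.bodyUsed = true;
          if (pending.length > 0) {
            controller.enqueue(pending.shift());
          } else {
            controller.close();
          }
        },
      },
      // With a high water mark of 0, pull() runs only when a read is pending,
      // so creating the stream does not flip the flag.
      { highWaterMark: 0 }
    );
  }
}
```

Because the stream is created by the object that owns the flag, no observer on the "pure" stream is needed; the flag changes as a side effect of the first read.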
@tyoshino

Contributor

commented Sep 19, 2014

@domenic #452 (comment)

max(HWM, Content-Length)

min?

The proposed back pressure mechanism looks correct. Regardless of where json() etc. are placed, we want to just start receiving and buffer some amount of data to reduce latency. The amount would be limited by some "high water mark". The buffering could be implemented not by ReadableByteStream but some platform (e.g. Blink) specific buffering library.

third bullet point

I agree that it works.

To simplify implementation, we would want to use ReadableByteStream not only for the body property but also as a backend for json(), etc. as @domenic proposed.

But even if we adopt this approach, it seems we can still choose to fix the body reading method offered to the user when the body property is accessed, rather than when body.read() is called.

  1. The body attribute's getter must run the steps below:
    1. If the used flag is set, return null.
    2. Set used flag.
    3. Return a ReadableByteStream representing the associated body.

Unless we want to allow the user to investigate body before deciding what body reading method to use. I guess there's no such need.

@tyoshino

Contributor

commented Sep 19, 2014

Fixed to return the ReadableByteStream for the second and later evaluation on the body property once we choose to use the stream.

Objects implementing the Body ... an associated byteStream (initially set to null), ...

  1. If byteStream is not null, return byteStream.
  2. If the used flag is set, return null.
  3. Set used flag.
  4. Set byteStream to a ReadableByteStream representing the associated body.
  5. Return byteStream.
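
The five getter steps above can be sketched as a plain class. This is only an illustration of the proposed algorithm: `HypotheticalBodyHolder`, `makeStream` and `consumeWholeBody` are invented names, a present-day ReadableStream stands in for ReadableByteStream, and the assumption that json(), text(), etc. set the used flag is not spelled out in the steps themselves.

```javascript
// Sketch only: mirrors the proposed getter steps with hypothetical names.
class HypotheticalBodyHolder {
  constructor(makeStream) {
    this.used = false;            // the "used flag"
    this.byteStream = null;       // the associated byteStream, initially null
    this.makeStream = makeStream; // creates the stream representing the body
  }

  get body() {
    // 1. If byteStream is not null, return byteStream.
    if (this.byteStream !== null) return this.byteStream;
    // 2. If the used flag is set, return null.
    if (this.used) return null;
    // 3. Set used flag.
    this.used = true;
    // 4. Set byteStream to a stream representing the associated body.
    this.byteStream = this.makeStream();
    // 5. Return byteStream.
    return this.byteStream;
  }

  // Stand-in for json(), text(), etc. (assumption: they set the used flag
  // without ever exposing the stream).
  consumeWholeBody() {
    this.used = true;
  }
}
```

Accessing `body` twice yields the same stream object, while accessing it after `consumeWholeBody()` yields null, which is the asymmetry debated in the following comments.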
@annevk

Member

commented Sep 19, 2014

@tyoshino I guess that could work, but perhaps we should make it a method then given all the side effects. But instead of returning a promise it would be synchronous. Not sure though.

@tyoshino

Contributor

commented Sep 19, 2014

make it a method

@annevk Sounds good

@domenic

Contributor

commented Sep 19, 2014

I don't really like the side-effecting method. I'd rather just go with the approach from #452 (comment). It also makes the name bodyUsed a lie as it's more someoneOnceAccessedTheBodyPropertyForUnknownPurposes ;)

@yutakahirano

Contributor Author

commented Oct 21, 2014

Thanks, I will write a draft for the integration. @tyoshino's proposal looks fine. @domenic, can you tell me why you don't like it?

@domenic

Contributor

commented Oct 21, 2014

@yutakahirano because it's more complex (both in internal code and user-facing code/concepts) than #452 (comment) and doesn't gain anything.

@tyoshino

Contributor

commented Oct 27, 2014

@domenic

I see. json() and text() are only-once methods, so we don't have to introduce a somewhat complicated algorithm like #452 (comment).

In your idea, bodyUsed is set when json(), text() or any ReadableStream method is called, and is used to prohibit json() and text(), but it must not disable the ReadableStream methods (we'll call read() many times). Right? Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first? If so, we need something other than bodyUsed. Otherwise, we don't (bodyUsed suffices).

@domenic

Contributor

commented Oct 27, 2014

Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first?

Nah, if you do that you'll get what you asked for. (A mess.)

@tyoshino

Contributor

commented Oct 28, 2014

Right. But if we choose not to do that, I'd like a justification for guarding against json()-after-read() but not against read()-after-json(). Both look like unexpected usage. json() would be pumping the stream, so it sounds good to give json() exclusive access to it.

@yutakahirano said that it might be useful to keep cancel() available even after json() call as it allows us to abort json(). Is this included in your motivation to keep ReadableStream methods available after json() call?

Regarding the guard against read()-after-json(), the alternative proposed by @yutakahirano is to make stream() an only-once method. It's cumbersome that you need to save the returned stream in a variable, but it reduces the complexity of the internal algorithm.

@domenic

Contributor

commented Oct 28, 2014

We don't really have any concept of "exclusive access to a stream." If you have a stream object, you can read from it; this is similar to a file handle object. It would be possible to introduce a library or subclass that adds this concept, but I would prefer to let libraries prove the worth of that idea, and if we see everyone building such tooling, then consider standardizing it.

.cancel() is indeed a good reason to keep ReadableStream methods available.


That said, this kind of situation where a "C++ reader" and a "JS reader" might share a stream seems very reminiscent of the off-main-thread piping question, whatwg/streams#97. In fact you can view it as a subset of that if you define .json() using a concat-stream equivalent:

Response.prototype.json = function () {
  var concatenator = new ConcatStream();
  this.body.pipeTo(concatenator);
  return concatenator.asArrayBuffer.then(convertToString).then(JSON.parse);
};

(npm concat-stream is for Node streams and uses a callback; our hypothetical version uses a promise-returning .asArrayBuffer property.)
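
The same idea can be written without the hypothetical ConcatStream, using the reader API that later shipped. This is a minimal sketch, assuming a runtime with global ReadableStream and TextDecoder and a stream whose chunks are Uint8Array; `readAllAsJson` is an invented name, and it decodes incrementally rather than concatenating into one ArrayBuffer first.

```javascript
// Sketch only: drains a stream of Uint8Array chunks and parses the result as JSON.
// readAllAsJson is a hypothetical helper, not an API from Fetch or Streams.
async function readAllAsJson(stream) {
  const reader = stream.getReader(); // locks the stream to this reader
  const decoder = new TextDecoder(); // UTF-8, malformed bytes become U+FFFD
  let text = "";
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // { stream: true } keeps partial multi-byte sequences for the next chunk.
      text += decoder.decode(value, { stream: true });
    }
    text += decoder.decode(); // flush any buffered bytes
  } finally {
    reader.releaseLock();
  }
  return JSON.parse(text);
}
```

While the loop runs, the reader holds a lock on the stream, so no other consumer can read from it, which is the locking behavior raised in this comment.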

I will open a new issue on whatwg/streams and consider if maybe we want to lock the stream during piping.

@tyoshino

Contributor

commented Oct 29, 2014

We don't really have any concept of "exclusive access to a stream."

By "exclusive", I meant exclusive between arrayBuffer(), blob(), ... text() and direct access on the ReadableStream interface. I didn't mean exclusiveness between consumers who see the ReadableStream interface.

My idea in #452 (comment) realizes this by hiding the stream object itself once json(), etc. is called. The script can then never touch any method of the ReadableStream interface; it can read the data only by waiting for the promise returned by the json(), etc. call to be fulfilled.

.cancel() is ...

OK

... new issue ...

Good

@yutakahirano

Contributor Author

commented Oct 29, 2014

It seems that whatwg/streams#241 allows "read-and-then-pipeTo", but you don't want to allow "read-and-then-call-json" in this issue, right?
The code in #452 (comment) allows that, I think.

With stream() method, we can solve the problem, though.

@domenic

Contributor

commented Oct 29, 2014

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

@tyoshino

Contributor

commented Oct 29, 2014

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Yeah, it's useful. But right now there's no way to tell the stream how big the header is. Using ReadableByteStream.prototype.readInto()? Then, don't we want to provide json(num_bytes) rather than just the jsonUntilEndOfStream() behavior we have now? Or readOneJsonThenStopReading()?

@yutakahirano

Contributor Author

commented Oct 29, 2014

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Oh, sorry, I mistook your intention, but it is a bit different from what I said at #452 (comment) .

And for your use case, we need read(size) which we don't have now.

@domenic

Contributor

commented Oct 29, 2014

You would use readInto, yes. Not sure about the utility of json(num_bytes). In my view json() is just a stopgap until we have real streams, and/or a convenience for the 80% case.

@yutakahirano

Contributor Author

commented Oct 29, 2014

What is "readInto"? Is it part of the Streams API?

@annevk

Member

commented Jan 31, 2015

I'm not sure how multipart/form-data is normally processed. I guess we don't parse it ourselves normally so making it a fatal error is fine if we keep doing that consistently.

@yutakahirano

Contributor Author

commented Feb 3, 2015

Just a nitpick, but you don't have to have an entry for fetch(req) in https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md. It is included in new Request(req) case (at least in the current spec, and it seems nobody in this thread wants to change that).

@tyoshino

Contributor

commented Feb 3, 2015

@tyoshino

Contributor

commented Feb 3, 2015

Thanks, Anne, Domenic. Let's have json(), etc. cancel() the stream on any fatal error.


Re: Domenic (#452 (comment)),

OK. Let's proceed with that plan. I've moved (A)'' to the top in https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md.

@annevk

Member

commented Feb 3, 2015

Except text(), right? That is, it should not cancel() on incorrect bytes.

@tyoshino

Contributor

commented Feb 3, 2015

Yes. As you said, text() just replaces incorrect bytes, so it never fails and therefore is never required to do cancel().
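
The difference between the two cases can be seen with TextDecoder and JSON.parse directly. This is a small sketch with invented helper names (`decodeLeniently`, `decodeStrictly`, `parseJsonBytes`), assuming UTF-8 bodies; it shows only the decoding and parsing step, not the cancel() call on the stream.

```javascript
// Sketch only: helper names are hypothetical.

// text()-style decoding: malformed bytes are replaced with U+FFFD, no error.
function decodeLeniently(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// For contrast: a fatal decoder throws a TypeError on malformed input.
function decodeStrictly(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// json()-style parsing: decoding is lenient, but JSON.parse can still throw
// a SyntaxError, which is the kind of fatal error that would trigger cancel().
function parseJsonBytes(bytes) {
  return JSON.parse(decodeLeniently(bytes));
}
```

For example, the bytes 0x41 0xFF 0x42 decode leniently to "A", U+FFFD, "B", whereas both the strict decoder and the JSON parser reject them.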

@tyoshino

Contributor

commented Feb 3, 2015

s/never fails/never fails as long as body doesn't become "errored"/

@yutakahirano

Contributor Author

commented Feb 4, 2015

Thank you @tyoshino and others, I will update the draft.

@wanderview

Member

commented Feb 5, 2015

What is the type of the "chunk" provided by Fetch body ReadableStream objects? Is it an ArrayBuffer? The streams spec says a chunk can be of any type, but we should probably define what Fetch body streams return explicitly. (Sorry if this is already defined and I missed it. It's a bit hard to follow the overall effort.)

@yutakahirano

Contributor Author

commented Feb 5, 2015

Yes, read() returns an ArrayBuffer. I will describe that in the draft.

yutakahirano added a commit to yutakahirano/fetch-with-streams that referenced this issue Feb 12, 2015
Specify Body locking with ExclusiveStreamReader.
Used 'passed flag' instead of 'used flag' as discussed
at w3c/ServiceWorker#452.
dstockwell pushed a commit to dstockwell/blink that referenced this issue Mar 20, 2015
[Fetch] Support various operations after reading data partially.
As we agreed to support various operations such as cache.put after reading
data partially through the body stream[1], let's support it.

1: w3c/ServiceWorker#452
BUG=435393
R=horo@chromium.org

Review URL: https://codereview.chromium.org/1018243002

git-svn-id: svn://svn.chromium.org/blink/trunk@192228 bbb929c8-8fbe-4397-9dbb-9b2b20218538
yutakahirano added a commit to yutakahirano/fetch-with-streams that referenced this issue Mar 25, 2015
Specify Body locking with ExclusiveStreamReader.
Used 'passed flag' instead of 'used flag' as discussed
at w3c/ServiceWorker#452.
@yutakahirano

Contributor Author

commented Mar 26, 2015

Hi!

Sorry that it took so long, but I updated the draft.
Please let me know if it has any errors regarding bodyUsed, passed flag, locked flag, etc.

dstockwell pushed a commit to dstockwell/blink that referenced this issue Apr 2, 2015
[Fetch] Body consume function should not set bodyUsed flag.
As discussed at [1], functions such as Response.text() should not set
'body passed' flag and the body will be accessible again when the consumption
is done.

1: w3c/ServiceWorker#452

BUG=472931

Review URL: https://codereview.chromium.org/1049983003

git-svn-id: svn://svn.chromium.org/blink/trunk@193054 bbb929c8-8fbe-4397-9dbb-9b2b20218538
dstockwell pushed a commit to dstockwell/blink that referenced this issue Apr 3, 2015
[Fetch] Request constructor should reflect body consumption.
As discussed at [1], new Request(req) should generate a Request object
having body that was unconsumed at the calling time.

1: w3c/ServiceWorker#452

BUG=472931

Review URL: https://codereview.chromium.org/1056783002

git-svn-id: svn://svn.chromium.org/blink/trunk@193069 bbb929c8-8fbe-4397-9dbb-9b2b20218538
dstockwell pushed a commit to dstockwell/blink that referenced this issue Apr 3, 2015
[Fetch] Request.clone() should reflect body consumption.
As discussed at [1], req.clone() should generate a Request object
having body that was unconsumed at the calling time.

1: w3c/ServiceWorker#452

This CL also adds similar tests for Response.
BUG=472931

Review URL: https://codereview.chromium.org/1056813002

git-svn-id: svn://svn.chromium.org/blink/trunk@193073 bbb929c8-8fbe-4397-9dbb-9b2b20218538
@annevk

Member

commented Apr 5, 2015

I suggest any further review is done in that repository until the merge happens (at which point we should switch to https://github.com/whatwg/fetch I think for anything new).

@annevk annevk closed this Apr 5, 2015

dstockwell pushed a commit to dstockwell/chromium that referenced this issue Sep 23, 2015
[Fetch] Support various operations after reading data partially.
As we agreed to support various operations such as cache.put after reading
data partially through the body stream[1], let's support it.

1: w3c/ServiceWorker#452
BUG=435393
R=horo@chromium.org

Review URL: https://codereview.chromium.org/1018243002

git-svn-id: svn://svn.chromium.org/blink/trunk@192228 bbb929c8-8fbe-4397-9dbb-9b2b20218538
dstockwell pushed a commit to dstockwell/chromium that referenced this issue Sep 23, 2015
[Fetch] Body consume function should not set bodyUsed flag.
As discussed at [1], functions such as Response.text() should not set
'body passed' flag and the body will be accessible again when the consumption
is done.

1: w3c/ServiceWorker#452

BUG=472931

Review URL: https://codereview.chromium.org/1049983003

git-svn-id: svn://svn.chromium.org/blink/trunk@193054 bbb929c8-8fbe-4397-9dbb-9b2b20218538
dstockwell pushed a commit to dstockwell/chromium that referenced this issue Sep 23, 2015
[Fetch] Request constructor should reflect body consumption.
As discussed at [1], new Request(req) should generate a Request object
having body that was unconsumed at the calling time.

1: w3c/ServiceWorker#452

BUG=472931

Review URL: https://codereview.chromium.org/1056783002

git-svn-id: svn://svn.chromium.org/blink/trunk@193069 bbb929c8-8fbe-4397-9dbb-9b2b20218538
dstockwell pushed a commit to dstockwell/chromium that referenced this issue Sep 23, 2015
[Fetch] Request.clone() should reflect body consumption.
As discussed at [1], req.clone() should generate a Request object
having body that was unconsumed at the calling time.

1: w3c/ServiceWorker#452

This CL also adds similar tests for Response.
BUG=472931

Review URL: https://codereview.chromium.org/1056813002

git-svn-id: svn://svn.chromium.org/blink/trunk@193073 bbb929c8-8fbe-4397-9dbb-9b2b20218538
@xgqfrms-GitHub


commented Nov 8, 2017

TypeError: body stream already read at FetchDatas.fetch.then ???

What's wrong with this? Can anybody help?
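
An error like this most likely means the same body was consumed twice, for example by calling json() or text() a second time on one Response. The sketch below reproduces that and shows one common way around it; `readTwice` and `readTwiceWithClone` are invented names, the Response is built locally instead of coming from a real fetch, and the exact error message differs between runtimes.

```javascript
// Sketch only: assumes a runtime with a global Response (browsers, Node.js 18+).

// Consuming the same body twice: the second call rejects with a TypeError.
async function readTwice() {
  const res = new Response('{"ok":true}');
  const first = await res.json();
  let secondError = null;
  try {
    await res.json();
  } catch (err) {
    secondError = err;
  }
  return { first, bodyUsed: res.bodyUsed, secondError };
}

// Cloning before the first read gives each consumer its own body.
async function readTwiceWithClone() {
  const res = new Response('{"ok":true}');
  const copy = res.clone(); // must happen before the body is read
  const asJson = await res.json();
  const asText = await copy.text();
  return { asJson, asText };
}
```

Alternatively, read the body once, keep the result in a variable, and reuse that value instead of calling a body method again.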

