Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd format for fetch callbacks #536

Closed
jyasskin opened this issue May 2, 2017 · 32 comments · Fixed by #1165
Closed

Odd format for fetch callbacks #536

jyasskin opened this issue May 2, 2017 · 32 comments · Fixed by #1165

Comments

@jyasskin
Copy link
Member

@jyasskin jyasskin commented May 2, 2017

https://fetch.spec.whatwg.org/#fetch-method has:

  1. Run the following in parallel:

"process response" and "process response done" are defined as "Tasks that are queued by this standard are annotated as one of: ...", but they actually appear to be customization points within the Fetch algorithm for which the caller can provide an implementation. Assuming that's right, then:

  1. It doesn't make sense to say to run the fetch and its callbacks in parallel. Instead, we should run the fetch in parallel with the callbacks passed in.
  2. The callbacks need their defaults defined for when callers don't pass them in. I think the defaults are to be no-ops?
  3. The callbacks should be defined as customization points, not just annotations.
@annevk
Copy link
Member

@annevk annevk commented May 3, 2017

but they actually appear to be customization points within the Fetch algorithm for which the caller can provide an implementation

I don't think that's the case. What makes you think that?

@jyasskin
Copy link
Member Author

@jyasskin jyasskin commented May 3, 2017

  1. In https://fetch.spec.whatwg.org/#fetch-method, there's no introduction for response other than the phrase "To process response for", so it's probably a parameter, making that block a function definition.
  2. The uses of "process response", "process response end-of-body", and "process response done" in https://fetch.spec.whatwg.org/#main-fetch say to queue a task to do them, but they don't say what steps to do in them, leaving that up to someone else.
  3. The definitions at https://fetch.spec.whatwg.org/#process-response don't provide steps either, making me think the intended steps are actually the ones provided by #fetch-method.
  4. https://w3c.github.io/ServiceWorker/#cache-addAll and several algorithms in HTML provide other implementations of "process response", and if that's correct, then these implementations must be parameters to Fetch.

@annevk
Copy link
Member

@annevk annevk commented May 3, 2017

I guess you're asking for a more explicit link between the call to fetch and the tasks that end up being queued as a result. It seems somewhat reasonable to link them directly in the way you suggest.

They're not customization points however as these steps happen on different threads/processes and are not accessible from each other.

@jyasskin
Copy link
Member Author

@jyasskin jyasskin commented May 3, 2017

Ah, you have a more precise definition of "customization point" than I do. I just meant that they appear to be parameters that can be selected by the caller to customize how Fetch works, like other kinds of arguments.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Sep 18, 2017

I have a somewhat similar problem. In background fetch I make fetches in parallel, but I want to read the response body as it arrives from the network.

The synchronous flag is closest to what I want, but it waits for the response body before returning.

"process response" is second closest, but it wants to queue a task on an event loop. I just want to process the response in prose. I may not have an event loop at this point.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Sep 19, 2017

I think the callbacks such as "process response" should be algorithmic callbacks, like we did with signal abort.

  1. Add the following process response steps to the ongoing fetch:
    1. Then specs could optionally…
    2. Queue a fetch task on request to run the following steps:

The "ongoing fetch" part is a bit meh. This came up with aborting too. Ideally it should be easier to get a reference to the fetch record, and operations can be called against that.

@annevk
Copy link
Member

@annevk annevk commented Sep 19, 2017

Maybe we should make fetch return a fetch record? And then have "synchronous fetch" wrap that or some such to just return a response?

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Sep 19, 2017

Yeah, I think that would make it much easier to talk about actions relating to a particular fetch.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented May 4, 2018

In retrospect, I wish fetching sync didn't wait for the body, and spec authors had to explicitly wait if needed. Is it too late to make a change like this?

@annevk
Copy link
Member

@annevk annevk commented May 7, 2018

@jakearchibald you mean we would simply not offer a synchronous mode in Fetch? I think refactoring synchronous callers is reasonable still. I actually think refactoring all callers is still reasonable, as there's not that many and not all have even integrated yet to begin with.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented May 7, 2018

you mean we would simply not offer a synchronous mode in Fetch?

I think sync fetching is useful in in-parallel steps, but I'm running into cases where I want to do something like (excuse the hand-waving):

  1. Let response be the result of fetching request.
  2. If response is opaque, abort the fetch and abort these steps.
  3. Stream response into something.

I can't do the above, as step one involves waiting for the whole body, so the aborting happens too late, and the streaming is kinda pointless.

If a sync fetch returned before waiting for the body, it'd work fine. In the cases where APIs needed the whole body (like sync XHR), they could do:

  1. Let response be the result of fetching request.
  2. Wait for response's body.

@annevk
Copy link
Member

@annevk annevk commented May 7, 2018

Why can't you use the process response callback? In any event, I think if we improve the basics first, as suggested in OP, we can then build better higher-level functionality on top.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented May 7, 2018

Why can't you use the process response callback?

It queues a task, and I have no reason to get back onto the main thread. In the background fetch case I may not have an event loop to queue into.

@annevk
Copy link
Member

@annevk annevk commented May 8, 2018

I see. I wonder if we can untangle the task queueing somehow so that there's a lower-level hook you can use if you're already in parallel.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented May 8, 2018

Yeah, I guess you could remove the task queuing from fetch and add it to all APIs that need to be back on the main thread.

@gterzian
Copy link
Member

@gterzian gterzian commented Jun 6, 2020

Have you though of defining a parallel queue, and then instead of queuing tasks, queue steps on the parallel queue, and define that those steps would call into callbacks(still "in-parallel"), such as process_response.

So for example the current definition of process_response as used in fetch, could be reworded as:

  1. If locallyAborted is true, terminate these substeps. (note: this happens fully "in-parallel", with locallyAborted being a local variable of the parallel steps).
  2. Queue a task using the networking task-source to:
    2.1 /// Add current steps 2 -> 6 here, running inside the queued task.

Whereas in the background-fetch use case, the callbacks simply operate on the "local" data of the in-parallel algorithm that started the fetch, so there is no need to queue a task(so the spec could simply remove the note to this issue, and it would work "as-is" really).

It would be somewhat a weird use of a parallel-queue, although it could make sense if one thinks about it as having parallel steps "register callbacks" with this parallel queue, and then fetch queuing the steps on it that would invoke those callbacks.

Since data like the "request" are passed around those callbacks, and I think multiple specs could register callbacks, we'd have to be somewhat careful not to introduce race condition(for example the "transmitted bytes" member of body is written to by fetch, then read by background fetch in the callback, which is fine and could get interesting if multiple callbacks were to be registered by different specs, and some would for some reason write to the variable as well).

@annevk
Copy link
Member

@annevk annevk commented Jun 8, 2020

It's still not entirely obvious to me what abstraction would work well here. Some thoughts:

  • It's useful that fetch can be invoked from the main thread as a bunch of state (e.g., a mutable CSP policy) is passed in that doesn't really work from "in parallel" land (because it can be mutated). Perhaps though that is something that needs sorting out so that such state can be copied at the right time and passed across the in parallel boundary with ease.
  • If we want to support full duplex the model where you go in parallel, fetch, and then wait for the response, the body, and the trailers, doesn't work, as you wouldn't be able to send the request body and request trailers at the same time. (No browser supports full duplex though and even with streams nobody seems interested in supporting it. That is, the full request body would have to be transmitted before you start seeing response bytes.)

@gterzian the reason to use a parallel queue I guess is to ensure that you do not process the response body before the response headers? It might work I suppose, but it would have to be flushed out a bit to ensure the right information is passed around to be able to queue tasks from there and such. (Tasks need a bit more information these days.)

And I guess to support synchronous fetching we could use "Wait" (which we should probably define somewhere) in the main thread that looks for some variable shared with in parallel to flip.

@annevk
Copy link
Member

@annevk annevk commented Jun 8, 2020

Actually the DOM fetch method calls the second one from "in-parallel" steps(see step 9).

To be clear, I think that's problematic, as it means the CSP policy gets enforced from in parallel and it isn't clear at what point it is copied to make that deterministic. (In part this is a separate issue, but I find it useful to think about as it influences what kind of algorithms we should be offering to standards.)


I guess since the callbacks take a set of steps you could indeed give them any kind of variables they might need to take hold of. It does seem like that approach might work quite well.

I'd love it if one or more of @yutakahirano, @jyasskin, @jakearchibald, @domenic, and maybe @jungkees could take a look as well.

(I suspect that for most standards we want to provide a wrapper algorithm that preserves the status quo more or less, whereby you invoke "fetch with tasks" or some such and it continues queuing tasks for the callback steps.)

@gterzian
Copy link
Member

@gterzian gterzian commented Jun 8, 2020

It's useful that fetch can be invoked from the main thread as a bunch of state (e.g., a mutable CSP policy) is passed in that doesn't really work from "in parallel" land (because it can be mutated). Perhaps though that is something that needs sorting out so that such state can be copied at the right time and passed across the in parallel boundary with ease.

Yes I think the background-fetch provides a nice example of that, since it defines a DOM available method, which when called creates a bunch of "non-DOM" data, like a https://wicg.github.io/background-fetch/#background-fetch-record, and then this data holds enough information to later queue tasks back on the event-loop to communicate various progress.

Also, the DOM method is called from the event-loop, and later the actual fetch is started from parallel steps started by the DOM method(similar to how DOM global fetch calls fetch from parallel steps, see step 9)

@gterzian the reason to use a parallel queue I guess is to ensure that you do not process the response body before the response headers? It might work I suppose, but it would have to be flushed out a bit to ensure the right information is passed around to be able to queue tasks from there and such. (Tasks need a bit more information these days.)

Yes, the ordering of the callbacks is something I hadn't even thought of, and I was mostly thinking about the fact that those callbacks are asynchronous also from the perspective of fetch(currently as tasks are queued, and this would be preserved if steps were enqueued on a parallel queue).

And with regards to "the right information is passed around to be able to queue tasks from there", I think the bgfetch spec again provides a nice example, for example see https://wicg.github.io/background-fetch/#update-background-fetch-instances, which is called into from the process_response callback defined by that spec, and which operate on local "in-parallel data", and then queues tasks back on the relevant event-loop to "continue" the algorithm there.

So I essentially think that by defining the "fetch callbacks" as tasks, they are tied to running on an event-loop, whereas the background-fetch spec is a good example where those are rather callbacks running on other in-parallel steps(which then subsequently queue tasks, based on their own locally owned data).

@gterzian
Copy link
Member

@gterzian gterzian commented Jun 8, 2020

To be clear, I think that's problematic, as it means the CSP policy gets enforced from in parallel and it isn't clear at what point it is copied to make that deterministic. (In part this is a separate issue, but I find it useful to think about as it influences what kind of algorithms we should be offering to standards.)

Ok, thanks for pointing that out (and sorry I deleted my comment, because I wanted to actually edit it and re-post, but your response beat me to it).

@gterzian
Copy link
Member

@gterzian gterzian commented Jun 14, 2020

I think this could provide a concrete example of a problem with the current approach around "process response" and similar tasks(if the issue is correct, that is): w3c/ServiceWorker#1517

@annevk
Copy link
Member

@annevk annevk commented Feb 8, 2021

I keep coming back to this as it's such a fundamental issue:

  1. fetch() should not go in parallel itself, that's just wrong: #1163.
  2. In principle most of fetch is designed to be invoked from the main thread as global state is available there. For those wanting to invoke fetch while in parallel, more work needs to be done to adequately capture that global state at the right point in time. I do think we should make that work, to be clear, and policy containers will hopefully help with this.
  3. I do think it makes sense for the fetch algorithm to explicitly take these "callbacks" as parameters. Currently we have five callbacks listed at https://fetch.spec.whatwg.org/#process-request-body. Apart from making these explicit parameters I don't think there is much that needs to change here for main thread usage of fetch.
    1. We need to have an easy way to get at the bytes of a response body. (We could just expose a specification-only byte sequence on responses. We could allow specifications to treat ReadableStream of responses for which end-of-file has run as equivalent to a byte sequence (perhaps with some magic conversion algorithm that does the relevant asserts). Thoughts appreciated.)
  4. I don't have a good solution for "in parallel" callers, especially those that want to process a response incrementally. Perhaps a return value is still a good fit here and the API would be that you "wait" for things to become non-null.
  5. We might also want a return value for main thread callers that is equivalent to the above except that it would be filled in from tasks.
  6. The return value could also expose terminate, soundly coupling it to the fetch.

@jakearchibald @domenic @jyasskin @ricea @gterzian thoughts?

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Feb 8, 2021

Where will those callbacks run? In parallel or on the main thread?

@annevk
Copy link
Member

@annevk annevk commented Feb 8, 2021

The callbacks mentioned in 3 would run on the main thread in tasks queued on the networking task source. That's basically what happens today, except that this provides a clearer coupling with the fetch algorithm (which also means it would not have to queue tasks if they are not provided and such).

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Feb 8, 2021

Hm, it's the task queuing that's caused issues for me. Eg step 11 of https://wicg.github.io/background-fetch/#complete-a-record

@annevk
Copy link
Member

@annevk annevk commented Feb 8, 2021

Right, for in parallel callers (which I think Background Fetch is, otherwise main thread should be fine) 4 applies. Though looking at what you wrote down there you might have to go in parallel again for the fetch itself so the overall abort can still come through. In parallel doesn't support anything like callbacks as in JavaScript so you have to build that up yourself.

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Feb 8, 2021

Fair enough, but…

In parallel doesn't support anything like callbacks

I don't see why it couldn't. It kinda feels like parallel queues already do.

@annevk
Copy link
Member

@annevk annevk commented Feb 8, 2021

I'm not sure I understand what the suggestion is. Would the caller supply a parallel queue? Would fetch return one somehow when asked? Wouldn't that still have to run in parallel from the thread dealing with the "abort all flag" to ensure that's not blocked? Is that fundamentally different from waiting for some bits of a response to become non-null or is it just better ergonomics?

@jakearchibald
Copy link
Collaborator

@jakearchibald jakearchibald commented Feb 8, 2021

I mean just allow prose like step 7 of https://wicg.github.io/background-fetch/#complete-a-record, but in a way that doesn't bounce back to the main thread. I think it's fine to go in parallel from parallel steps.

@annevk
Copy link
Member

@annevk annevk commented Feb 8, 2021

That's certainly feasible, but it means that fetch itself doesn't block. That seems fine though and in fact it might be a good idea to remove the synchronous flag and have callers "wait" instead when they need something like that (and synchronous XMLHttpRequest could request the parallel queue style callbacks and wait on them on the main thread). I guess I'll first work on these "callback" parameters and figure out the possible return value with terminate operation later on.

@domenic
Copy link
Member

@domenic domenic commented Feb 8, 2021

Everything in #536 (comment) makes general sense to me. In particular, I'd love to be able to write spec text that's like the following:

  1. Fetch request, using the following steps to [process the response] response:
    1. ... stuff with response ...

I think it'd be somewhat tempting to be able to write text like the following too:

  1. Fetch request. When the fetch completes with response, continue onto the following steps:
  2. ... stuff with response, but not indented ...

but on balance I think this is inadvisable, as then you can easily get into trouble if you accidentally use "return" or "throw" from those non-idented steps.

I'm thinking how this (indented) pattern would look if we carried it over to https://html.spec.whatwg.org/#fetching-scripts, and I think it would be reasonable. We'd define an explicit callback parameter (e.g. "process the module script") for those, and thread everything through with some extra indenting, instead of using the current "Asynchronously complete this algorithm with X". The most complex it would get would be in https://html.spec.whatwg.org/#fetch-the-descendants-of-a-module-script step 8, which could be formalized nicely with infrastructure like this, at the cost of some complexity. (It could also allow us to explicitly terminate ongoing fetches if one of the fetches fail.)

On an editorial side, for some reason callbacks namedLikeThis seem ickier, so if we made them real optional paramters I'm not sure I'd want to update their names, but that doesn't matter much.

We need to have an easy way to get at the bytes of a response body. (We could just expose a specification-only byte sequence on responses. We could allow specifications to treat ReadableStream of responses for which end-of-file has run as equivalent to a byte sequence (perhaps with some magic conversion algorithm that does the relevant asserts). Thoughts appreciated.)

Exposing a byte sequence seems critical. I think getting the byte sequence should consume the body though, so this would be a byte sequence variant of the spec's current "consume body" algorithm. It could still operate on streams, with relevant asserts. I guess it would need to be called from the main thread, so it'd be called from "process response" hooks?

We might also want a return value for main thread callers that is equivalent to the above except that it would be filled in from tasks.

What would this do besides expose terminate? How would one use it?

@annevk
Copy link
Member

@annevk annevk commented Feb 9, 2021

I haven't made progress on the body question yet.

I don't have real uses for the return value at this point other than exposing terminate, which I think is a plus as it's a shared handle between the fetch algorithm and the caller, which is a much clearer interface than what we have now. (I was thinking we could put other things there too, such as the response or response body as byte sequence, but having worked on this refactor I think I'd like to wait a little bit before adding multiple ways to get at things.)

annevk added a commit that referenced this issue Feb 11, 2021
This makes some rather big changes:

* Requests no longer have a synchronous flag. Blocking a thread is now up to the caller.
* Fetch therefore no longer returns a response directly. In parallel callers will need to pass in "callbacks" that are either invoked on a parallel queue or on an event loop.
* To hold onto these callbacks as well as some other information a fetch params struct is now passed around the various fetch algorithms. This will allow for cleanup around termination and aborting in the future. Potentially some bookkeeping state from request can move there going forward.
* There's a dedicated navigate-redirect fetch algorithm for HTML as the HTTP-redirect fetch algorithm now wants a fetch params instance.
* Some allowance for aborting early on in fetch was removed as all that is run synchronously with the code that invoked fetch to begin with. Closes #1164.
* Algorithms that needed to be adjusted were changed to use the new Infra patterns for parameters. I also tried to improve naming, e.g., makeCORSPreflight rather than CORS-preflight flag.
* "process response done" was removed as it seemed redundant with "response response end-of-body"; possible leftover from trailers.
* Removed duplicate task queueing at the end of main fetch for request's body.

Fixes #536.
@annevk annevk closed this in #1165 Feb 12, 2021
annevk added a commit that referenced this issue Feb 12, 2021
This makes some rather big changes:

* Requests no longer have a synchronous flag. Blocking a thread is now up to the caller.
* Fetch therefore no longer returns a response directly. In parallel callers will need to pass in "callbacks" that are either invoked on a parallel queue or on an event loop.
* To hold onto these callbacks as well as some other information a fetch params struct is now passed around the various fetch algorithms. This will allow for cleanup around termination and aborting in the future. Potentially some bookkeeping state from request can move there going forward.
* There's a dedicated navigate-redirect fetch algorithm for HTML as the HTTP-redirect fetch algorithm now wants a fetch params instance. This also returns a response which fixes a problem where HTTP-redirect fetch would return a network error that was then ignored.
* Some allowance for aborting early on in fetch was removed as all that is run synchronously with the code that invoked fetch to begin with. Closes #1164.
* Algorithms that needed to be adjusted were changed to use the new Infra patterns for parameters. I also tried to improve naming, e.g., makeCORSPreflight rather than CORS-preflight flag.
* "process response done" was removed as it seemed redundant with "response response end-of-body"; possible leftover from trailers.
* Removed duplicate task queuing at the end of main fetch for request's body.

Fixes #536.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked pull requests

Successfully merging a pull request may close this issue.

6 participants
@jyasskin @jakearchibald @domenic @annevk @gterzian and others