
Remove incumbent/fetching record from Cache behavior #1190

Merged
merged 6 commits into master from remove-incumbent-fetching-concept Oct 13, 2017

Conversation

@jungkees (Collaborator) commented Aug 22, 2017

This change removes the incumbent/fetching record concept, which allowed
committing a still-fetching resource to the cache while match/matchAll
returned a fully written existing resource, if any, in preference to a
fetching one. After this change, add/addAll/put promises resolve when the
resources are fully fetched and committed to the cache, and match/matchAll
will not return any in-flight fetching resources.

This changes the specification type of the underlying cache objects from a
map (request to response map) to a list (request response list). The
change makes the arguments and the return values of the Cache
methods and algorithms (Batch Cache Operations and Query Cache) conform
to one another. The list type seems fine, as the algorithms tend to
iterate through the list to find items matching the search options.
After looking at the details of the actual implementations, I plan to
update it further if needed.

Fixes #884.
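The new observable behavior can be sketched with a toy model (invented names such as ModelCache; this is not spec text): put() commits only once the response has fully settled, so match() can never observe an in-flight entry.

```javascript
// Toy model of the post-change semantics: the request response list
// holds only fully committed entries, so match() never returns a
// response that is still being fetched.
class ModelCache {
  constructor() {
    this.requestResponseList = []; // pairs of { request, response }
  }

  async put(request, responsePromise) {
    // Settle the "fetch" completely before touching the list.
    const response = await responsePromise;
    // Remove any prior entries for the same request, then append.
    this.requestResponseList = this.requestResponseList.filter(
      (pair) => pair.request !== request
    );
    this.requestResponseList.push({ request, response });
  }

  match(request) {
    const pair = this.requestResponseList.find((p) => p.request === request);
    return pair ? pair.response : undefined;
  }
}
```

Under the old incumbent/fetching model, an in-flight record could already sit in the map and match() had to prefer a fully written incumbent over it; committing only on completion makes that priority rule unnecessary.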


@jungkees jungkees requested a review from jakearchibald Aug 22, 2017
@jungkees (Collaborator, Author) commented Aug 22, 2017

@jakearchibald (Contributor) commented Aug 25, 2017

Whoop! We now have PR previews!

@jakearchibald (Contributor) left a comment

It isn't a problem caused by this PR, but I think we're using JS objects too much in the cache spec. Ideally we should only create new Request and Response objects when we're about to return them, on the main thread.

@@ -1861,13 +1861,9 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe
<section>
<h3 id="cache-constructs">Constructs</h3>

A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}.
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of |request| (a {{Request}} object) and |response| (a {{Response}} object).

@jakearchibald (Contributor) commented Aug 25, 2017

Does it make sense to be storing JS objects here? Would it be better to store the concepts and create the objects just as we return them? (I realise that wasn't changed in this PR)

@wanderview (Member) commented Aug 25, 2017

Yeah, what Jake recommends matches implementations. We explicitly don't always return the same Response JS object from match(), even if it's stored in the same entry.

@jungkees (Author, Collaborator) commented Sep 28, 2017

Absolutely. I changed it to store the request and response structs instead of the JS objects and to create the JS objects in matchAll() and keys() when requested.
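A minimal sketch of why the flat list suits this style of lookup (invented names like queryCache and ignoreSearch loosely mirror Query Cache and CacheQueryOptions; this is not spec text): the algorithm simply filters the pairs, and options relax the URL comparison.

```javascript
// Toy Query Cache over a request response list: filter the pairs,
// optionally ignoring the URL's search (query) component.
function stripSearch(url) {
  return url.split('?')[0];
}

function queryCache(requestResponseList, requestUrl, options = {}) {
  return requestResponseList.filter((pair) => {
    const stored = options.ignoreSearch ? stripSearch(pair.requestUrl) : pair.requestUrl;
    const wanted = options.ignoreSearch ? stripSearch(requestUrl) : requestUrl;
    return stored === wanted;
  });
}
```

Several stored pairs can match one request (for example when ignoreSearch is set), which is exactly why the algorithms return a list rather than a single entry.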

1. For each <a>fetching record</a> |entry| of its <a>request to response map</a>, in key insertion order:
1. Add a copy of |entry|.\[[value]] to |responseArray|.
1. [=list/For each=] |item| of the [=context object=]'s [=request response list=]:
1. Add a copy of |item|'s |response| to |responseArray|.

@jakearchibald (Contributor) commented Aug 25, 2017

Given that it's a JS object, we can't really just say "copy", but I think the right answer here is to use concepts rather than objects, where "copy" appears to be fine.

@jakearchibald (Contributor) commented Aug 25, 2017

This could be part of an additional PR though, as this PR doesn't introduce this problem.

@jungkees (Author, Collaborator) commented Sep 28, 2017

I changed it to use the internal structs instead of the JS objects, so it seems fine. But as Fetch provides a cloning algorithm, I'll look further at whether using that would make it more precise.

1. And then, if an exception was <a lt="throw">thrown</a>, then:
1. Set the <a>context object</a>'s <a>request to response map</a> to |itemsCopy|.
1. Set |cache| to |itemsCopy|.

@jakearchibald (Contributor) commented Aug 25, 2017

Isn't this just setting a local variable? As in, it no longer reverts the operation.

Again, this isn't a new problem, but isn't there a bit of a race condition here? Since we're replacing the whole cache with an older copy, there may be concurrent operations of this resulting in data loss.

@jungkees (Author, Collaborator) commented Sep 28, 2017

Isn't this just setting a local variable?

I think |cache| becomes a reference to the request response list. It's confusing though.

isn't there a bit of a race condition here?

I hoped that making the cache write atomic (step 3.3) would do some magic, but it's absolutely a part that I should audit and improve.
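The hazard can be demonstrated with a toy model (invented names; not spec text): restoring a whole-cache snapshot after a failure silently discards any write that landed between the snapshot and the restore.

```javascript
// Toy snapshot-rollback: on failure, the whole list is replaced with a
// copy taken before the batch started, clobbering concurrent writes.
function runBatchWithSnapshotRollback(cacheRef, batchSteps) {
  const itemsCopy = [...cacheRef.list]; // snapshot
  try {
    batchSteps(cacheRef);
  } catch (e) {
    cacheRef.list = itemsCopy; // whole-cache restore loses interleaved writes
  }
}
```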

</section>

<section algorithm>
<h3 id="batch-cache-operations-algorithm"><dfn>Batch Cache Operations</dfn></h3>

: Input
:: |operations|, an array of {{CacheBatchOperation}} dictionary objects
:: |operations|, a [=list=] of {{CacheBatchOperation}} dictionary objects

@jakearchibald (Contributor) commented Aug 25, 2017

We never expose this, so it doesn't need to be a dictionary right?

@jungkees (Author, Collaborator) commented Sep 28, 2017

Yes. I replaced it with a struct.

1. Let |resultArray| be an empty array.
1. For each |operation| in |operations|:
1. Let |resultList| be an empty [=list=].
1. [=list/For each=] |operation| in |operations|:
1. If |operation|.{{CacheBatchOperation/type}} matches neither "delete" nor "put", <a>throw</a> a <code>TypeError</code>.

@jakearchibald (Contributor) commented Aug 25, 2017

We should probably make this an enum or list of possible values.

@jungkees (Author, Collaborator) commented Sep 28, 2017

For now, I made them the possible values for the newly defined cache batch operation's type item.

Note: The cache commit is allowed as long as the response's headers are available.

1. Set |requestResponseList| to the result of running [=Query Cache=] with |operation|.{{CacheBatchOperation/request}}.
1. If |requestResponseList| [=list/is not empty=], [=list/replace=] the [=list/item=] of |cache| that matches |requestResponseList|[0] with |operation|.{{CacheBatchOperation/request}}/|operation|.{{CacheBatchOperation/response}}.

@jakearchibald (Contributor) commented Aug 25, 2017

What if there are multiple matches, shouldn't we be removing those? It might be easiest to remove all matches then just append the new entry.

@jakearchibald (Contributor) commented Aug 25, 2017

Oh this is now done in addAll. We should probably just append here in that case.

@jungkees (Author, Collaborator) commented Sep 28, 2017

I addressed it as you suggested. I think we can still do better for the return value of Batch Cache Operations somehow. I'll look into it as separate work.
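A rough model of the revised shape (invented names; not spec text): operations are plain structs, the type is restricted to "delete" and "put", and "put" simply appends because addAll()/put() have already removed matching entries.

```javascript
// Toy Batch Cache Operations: validate the operation type, remove
// entries for "delete", and append for "put".
function batchCacheOperations(cache, operations) {
  const resultList = [];
  for (const operation of operations) {
    if (operation.type !== 'delete' && operation.type !== 'put') {
      throw new TypeError('operation type must be "delete" or "put"');
    }
    if (operation.type === 'delete') {
      cache.list = cache.list.filter(
        (pair) => pair.requestUrl !== operation.requestUrl
      );
    } else {
      // "put" appends; duplicates were removed by the caller.
      cache.list.push({ requestUrl: operation.requestUrl, response: operation.response });
      resultList.push(operation);
    }
  }
  return resultList;
}
```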

@wanderview (Member) commented Aug 25, 2017

Since this PR touches the case where the body stream errors during a put(), it would be nice to add WPT tests to cover that. I think it should be somewhat easy to do now that we have ReadableStream bodies.

@jungkees (Collaborator, Author) commented Aug 29, 2017

@jakearchibald, @wanderview, thanks for reviewing. I couldn't finish this before leaving for vacation. I'm off until the 6th of September and will follow up when I'm back.

Jungkee Song
The changes include:
 - Replace CacheBatchOperations dictionary which isn't exposed to
   JavaScript surface with cache batch operations struct.
 - Do not store JS objects in the storage but store request and response
   structs instead.
 - Create and return JS objects in the target realm when requested (from
   matchAll() and keys()).
 - Simplify "put" operation related steps by moving/refactoring the
   post-Batch Cache Operation steps, which clear the invalid items, from
   addAll() and put() into Batch Cache Operations.
 - Move the argument validation steps of cache.keys() out of the
   parallel thread to main thread.
 - Fix cacheStorage.keys() to run the steps async. (For now, it still
   runs just in parallel, but later I plan to use the parallel queue
   concept: https://html.spec.whatwg.org/#parallel-queue).
@jungkees (Collaborator, Author) commented Sep 28, 2017

Sorry for coming back to this late. PTAL.

@jakearchibald (Contributor) left a comment

This is coming along nicely!


A <a>fetching record</a> has an associated <dfn id="dfn-incumbent-record">incumbent record</dfn> (a <a>fetching record</a>). It is initially set to null.
When a [=request response list=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance that the [=context object=] represents, unless specified otherwise.

@jakearchibald (Contributor) commented Sep 29, 2017

Feels like this would be better as a new definition. As in:

The relevant request response list is the instance that the context object represents.

Then it can be linked to when used.

@jungkees (Author, Collaborator) commented Oct 11, 2017

Good idea. Addressed.


Each [=/origin=] has an associated <a>name to cache map</a>.
When a [=name to cache map=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance of the [=context object=]'s associated [=CacheStorage/global object=]'s [=environment settings object=]'s [=environment settings object/origin=], unless specified otherwise.

@jakearchibald (Contributor) commented Sep 29, 2017

As above.

@jungkees (Author, Collaborator) commented Oct 11, 2017

Addressed.

1. Run these substeps <a>in parallel</a>:
1. Let |responseArray| be an empty array.
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception.
1. Let |realm| be the [=current Realm Record=].

@jungkees (Author, Collaborator) commented Oct 11, 2017

I referenced it from the example in https://html.spec.whatwg.org/#event-loop-for-spec-authors. But I think the relevant realm of the context object is indeed correct. Addressed as such.

@@ -1861,15 +1862,15 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe
<section>
<h3 id="cache-constructs">Constructs</h3>

A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}.
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of a request (a [=/request=]) and a response (a [=/response=]).

@jakearchibald (Contributor) commented Sep 29, 2017

Should request and response here be defined as for="request response list"? Then they can be linked to when used.

@jungkees (Author, Collaborator) commented Oct 11, 2017

I think the request and the response definitions would then belong to something like a request response pair concept rather than to the list itself. I don't see any other particular need for defining the pair, though. It seems okay as-is, as the request and the response can be identified as the items of the pairs in the list.

1. [=list/For each=] |requestResponse| of |requestResponses|:
1. Add a copy of |requestResponse|'s response to |responses|.
1. [=Queue a task=], on |promise|'s [=relevant settings object=]'s [=responsible event loop=] using the [=DOM manipulation task source=], to perform the following steps:
1. Let |responseArray| be an empty JavaScript array, in |realm|.

@jakearchibald (Contributor) commented Sep 29, 2017

I think we could create a sequence, then use https://heycam.github.io/webidl/#dfn-create-frozen-array to turn it into an array.

@jakearchibald (Contributor) commented Sep 29, 2017

We need to update the IDL to return a FrozenArray rather than a sequence too.

@jungkees (Author, Collaborator) commented Oct 11, 2017

Addressed. But I want to make sure I used "in realm" correctly in this change. I suppose the created frozen array is actually converted to a JavaScript array by Web IDL. If so, does designating the realm when creating a frozen array, as I did here, make sense?


Note: The cache commit is allowed when the response's body is fully received.

* To [=process response done=] for |response|, do nothing.

@jakearchibald (Contributor) commented Sep 29, 2017

I don't think we need this line.

@jungkees (Author, Collaborator) commented Oct 11, 2017

Removed.

1. If |r|'s [=request/method=] is not \`<code>GET</code>\` and |options|.ignoreMethod is false, return [=a promise resolved with=] an empty array.
1. Else if |request| is a string, then:
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception.
1. Let |realm| be the [=current Realm Record=].

@jakearchibald (Contributor) commented Sep 29, 2017

As above, we might not need this if we use frozenarray.

@jungkees (Author, Collaborator) commented Oct 11, 2017

In the above case, I changed it to use a frozen array but still left "in realm" when creating the frozen array. If we use a frozen array here, don't we need to specify a realm?

1. [=map/For each=] <var ignore>cacheName</var> → |cache| of the [=name to cache map=]:
1. Set |promise| to the result of [=transforming=] itself with a fulfillment handler that, when called with argument |response|, performs the following substeps [=in parallel=]:
1. If |response| is not undefined, return |response|.
1. Return the result of running the algorithm specified in {{Cache/match(request, options)}} method of {{Cache}} interface with |request| and |options| as the arguments (providing |cache| as thisArgument to the `\[[Call]]` internal method of {{Cache/match(request, options)}}.)

@jakearchibald (Contributor) commented Sep 29, 2017

I don't think you need the slash in [[Call]]

@jungkees (Author, Collaborator) commented Oct 11, 2017

Addressed.
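The fallthrough in the quoted steps can be sketched as follows (a model with invented names, assuming each cache exposes a promise-returning match): each fulfillment handler passes a defined response through and otherwise tries the next cache.

```javascript
// Toy CacheStorage.match(): try each cache in order and settle with
// the first defined response, or undefined if none match.
function storageMatch(caches, requestUrl) {
  let promise = Promise.resolve(undefined);
  for (const cache of caches) {
    promise = promise.then((response) =>
      response !== undefined ? response : cache.match(requestUrl)
    );
  }
  return promise;
}
```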

1. Return true.
1. Return false.
1. Resolve |promise| with true.
1. Abort these steps.

@jakearchibald (Contributor) commented Sep 29, 2017

Could probably roll this into the line above. "Resolve promise with true and abort these steps".

@jungkees (Author, Collaborator) commented Oct 11, 2017

Addressed.

1. If |cacheExists| is true, then:
1. Delete a <a>Record</a> {\[[key]], \[[value]]} <var ignore>entry</var> from its <a>name to cache map</a> where |cacheName| matches entry.\[[key]].
1. [=map/Remove=] the [=name to cache map=][|cacheName|].
1. Return true.

@jakearchibald (Contributor) commented Sep 29, 2017

I'm not sure we can "return" from in parallel steps.

@jungkees (Author, Collaborator) commented Oct 11, 2017

I'm not sure we can. For this particular case, I removed "in parallel". I think that's okay, as the fulfillment handler is already scheduled asynchronously in the microtask queue. But for other similar cases, I didn't change them in this PR, as I wasn't sure whether they can all run on the main thread. I'll look at them as separate work.

@jakearchibald (Contributor) commented Oct 11, 2017

Although it's on the microtask queue, it's still blocking the event loop. I think we need to create a promise and return it.

@jungkees (Author, Collaborator) commented Oct 13, 2017

Yes, actually the fulfillment handler steps were quite incorrect. I made them run in the event loop, create a promise there, and resolve/reject that promise from the parallel steps. While changing them, I also changed the interface of Batch Cache Operations so that it runs synchronously without returning a promise, and the call sites invoke it from a promise job.
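The resulting pattern can be sketched like this (a model with invented names; setTimeout stands in for "in parallel" and queueMicrotask for "queue a task"): the method creates the promise on the event loop, the work runs off it, and resolution happens from a task queued back onto the loop rather than from the parallel steps themselves.

```javascript
// Toy model of: create a promise, run steps "in parallel", then queue
// a task back on the event loop to resolve the promise.
function deleteCacheModel(nameToCacheMap, cacheName) {
  return new Promise((resolve) => {
    setTimeout(() => { // stand-in for the parallel steps
      const existed = nameToCacheMap.delete(cacheName);
      queueMicrotask(() => resolve(existed)); // stand-in for "queue a task"
    }, 0);
  });
}
```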

@jungkees (Collaborator, Author) commented Oct 11, 2017

@jakearchibald, thanks for reviewing. I addressed your comments. PTAL.

@wanderview (Member) commented Oct 11, 2017

I'm sorry, but I don't think I will have time to review this. Just FYI, so you don't wait for me. Thanks for working on this.

@jakearchibald (Contributor) commented Oct 11, 2017

@annevk @domenic

We've got a couple of instances in the service worker spec that follow this pattern:

  1. Let realm be the context object's relevant realm.
  2. Then later, in parallel:
  3. Resolve promise with a frozen array created from someList, in realm.

…where someList is an infra list.

Do we need to do this? I thought it would be enough to resolve the promise with someList, and IDL takes care of the FrozenArray creation in the correct realm, but I can't find a direct spec reference for that.

@domenic (Contributor) commented Oct 11, 2017

You need to convert lists into FrozenArrays while specifying the realm; there's no way IDL could automatically figure out what kind of object you want to convert it to, or what global you want to create it in.
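In single-realm terms, "create a frozen array from a list" amounts to copying the list into a fresh Array and freezing it; the realm argument in Web IDL selects which realm's Array constructor is used, which plain single-realm JavaScript cannot show (createFrozenArrayFrom is an invented name).

```javascript
// Toy version of Web IDL's "create a frozen array": copy the list
// into a new Array and freeze it so script cannot mutate it.
function createFrozenArrayFrom(list) {
  return Object.freeze(Array.from(list));
}
```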

@jungkees (Collaborator, Author) commented Oct 12, 2017

Okay. So my attempt seems to be okay here.

@annevk (Member) commented Oct 12, 2017

@jakearchibald you don't resolve from "in parallel". You need to queue a task on an event loop. That event loop will have a realm that you can use to create the frozen array and resolve the promise. (You don't get in parallel access to resolve either.)

@jungkees (Collaborator, Author) commented Oct 12, 2017

@annevk, that queue a task step is missing from @jakearchibald's example above.

Resolve promise with a frozen array created from someList, in realm.

is indeed run in a queued task, as you pointed out.

@jakearchibald (Contributor) commented Oct 12, 2017

@annevk "resolve" automatically queues a task on the promise's event loop https://www.w3.org/2001/tag/doc/promises-guide#shorthand-manipulating.

@jakearchibald (Contributor) commented Oct 12, 2017

@domenic The return type is defined in IDL, so I thought it would be capable of some casting. A promise created in one realm could resolve with an object from another, but that feels like the exception.

I guess I could write my own helper to do this.

@annevk (Member) commented Oct 12, 2017

@jakearchibald that hook is broken. It doesn't allow you to specify the task source.

@jungkees (Collaborator, Author) commented Oct 12, 2017

Does it make sense to define some sort of default task source (used when not specified otherwise) for promise jobs? That hook is actually handy and keeps the steps simple to read in many cases.

@annevk (Member) commented Oct 12, 2017

Action-at-a-distance leads to harder-to-understand algorithms, I think. If someone fixed the hook to ensure it takes all the arguments it needs, it would be much clearer.

- Adjust variable scope
- Change to early-exit in fetch abort cases
- Fix async steps of fulfillment handlers
- Change the interface of Batch Cache Operations algorithm
 . Change to not return a promise
 . Remove the in parallel steps and make it work synchronously
 . Change the call sites to call it in a created promise's in parallel
   steps
@jungkees (Collaborator, Author) commented Oct 13, 2017

If someone fixed the hook to ensure it takes all the arguments it needs it would be much clearer.

Yes, I think this would help simplify the caller-side steps. I'll take a look when I find time for it.

@jungkees (Collaborator, Author) commented Oct 13, 2017

@jakearchibald, I uploaded another snapshot addressing your additional comments. PTAL.

@jakearchibald (Contributor) left a comment

LGTM! A huge step forward

@jungkees jungkees merged commit c8ab714 into master Oct 13, 2017
@jungkees jungkees deleted the remove-incumbent-fetching-concept branch Oct 13, 2017
@jungkees (Collaborator, Author) commented Oct 13, 2017

@jakearchibald, thanks a lot for your review!
