Reading lists: add batch endpoints #944

tgr · 2018-01-15T11:47:12Z

Adds a set of new endpoints which are identical to the existing
list / lisr entry create / update / delete endpoints, except
they take multiple lists / entries as input and return multiple
outputs. This is a departure from REST semantics but was deemed
important for performance reasons, and does not cause problems
since the endpoints are not cacheable anyway.

Also fix a small documentation error with relative domain names.

Bug: T182052

tgr · 2018-01-15T11:50:12Z

Depends on https://gerrit.wikimedia.org/r/#/c/403361/ which is getting deployed this week. The RL endpoint is marked as experimental though, so it shouldn't be a problem if that patch is not deployed everywhere yet.

Pchelolo · 2018-01-17T19:53:44Z

I didn't go into the details just yet, first a couple of generic questions.

Are the apps already out and using this in the wild? If no, why are we introducing completely new endpoints for batch operations. What's the distinction between creating a single list and creating a batch of lists containing 1 element. I think if we want to support batches we should change the API so that there's only batch endpoints and no duplication between single/batch operations. If yes - we can still support old-style API for backwards compatibility, switch to new batch-only API and then remove the fallback later. This will allow to eliminate all the new endpoints except DELETE, however if we make DELETE accept ids in a BODY (which's totally legal in HTTP spec) we can usde this approach for everything.
As a more drastic approach, there are some patterns that might allow us to remove all of these endpoints entirely - support HTTP multipart batched requests.

Here's some examples:

TLDR of the idea is to have one endpoint /lists/batch or something like it where you can just stash all requests you need. The service then will be smart enough to parse the requests, group them into appropriate batches and exdcute. I'm not saying it's definitely better, just beginning a discussion.

tgr · 2018-01-17T21:01:02Z

Thanks for reviewing!

The apps already have code for the non-batch endpoints, but it is not exposed to users yet. They asked for the existing endpoints to be preserved; I'll ask about removing them in the longer term. It would certainly make the service more manageable.
According to RFC 7231 A payload within a DELETE request message has no defined semantics; sending a payload body on a DELETE request might cause some existing implementations to reject the request. preq does not return the body; also apparently not well-supported in some clients. Arguably not really in the spirit of the specification as well since DELETE should remove the resource referred to by the URI (of course that goes for PUT as well).
Multipart requests sound very interesting, but also a bit scary: apart from the question of client / library support (not really supported by browsers for example) it seems tricky to decide how to batch (on the DB level we want to put operations of the same type in a single query, OTOH trying to optimize create list 1 ; delete list 1 ; create list 1 into create list 1 ; create list 1 ; delete list 1 would end badly.

Pchelolo · 2018-01-17T21:06:48Z

The apps already have code for the non-batch endpoints, but it is not exposed to users yet. They asked for the existing endpoints to be preserved; I'll ask about removing them in the longer term. It would certainly make the service more manageable.

I think long-term it's better to stick with a single set of endpoints, so it would be better if the apps change the code. Short term we can still support single endpoint under the hood, just not advertise them to the clients ( remove from the documentation )

Multipart requests sound very interesting, but also a bit scary: apart from the question of client / library support (not really supported by browsers for example) it seems tricky to decide how to batch (on the DB level we want to put operations of the same type in a single query, OTOH trying to optimize create list 1 ; delete list 1 ; create list 1 into create list 1 ; create list 1 ; delete list 1 would end badly.

That's what's I'm worried about as well, that's why I've said it's a drastic approach :) We can think about specifying that batching of only a single type of requests is allowed and return 400 if that requirement is violated. However even given that not all list-related request types are batchable this might create more confusion than benefit.

I'll read up a bit more on how people around are doing this, cause it's always better to follow some standard that others use then to invent yet another bicycle.

tgr · 2018-01-18T03:07:59Z

Short term we can still support single endpoint under the hood, just not advertise them to the clients ( remove from the documentation )

That seems pretty inconvenient for developers/testers. How about just adding a note that these endpoints are deprecated?

We can think about specifying that batching of only a single type of requests is allowed and return 400 if that requirement is violated. However even given that not all list-related request types are batchable this might create more confusion than benefit.

That sounds simple enough, assuming client support for multipart requests is OK. For read requests batching doesn't really make sense, and for writes all request types are batchable (except setup/teardown but it wouldn't make any sense there). List entry operations can right now only be batched for the same list, but that could probably be relaxed.

coreyfloyd · 2018-02-08T17:43:00Z

The apps already have code for the non-batch endpoints, but it is not exposed to users yet. They asked for the existing endpoints to be preserved; I'll ask about removing them in the longer term. It would certainly make the service more manageable.

I think long-term it's better to stick with a single set of endpoints, so it would be better if the apps change the code. Short term we can still support single endpoint under the hood, just not advertise them to the clients ( remove from the documentation )

@tgr @Pchelolo Just an FYI, they can change their code - it was preferable to not do this, but this is extremely minor change. So I agree lets get this down to one set of APIs. I'll let them know that a breaking change is coming.

Batching only needs to work for creating new lists. It does not need to work for deleting lists. The use case for batching is first sync - when they have a lot of uploading to do. So lets not worry about adding batching in places that we don't need specifically for the first sync use case. So as long as we can POST multiple lists/items, we are good.

Is this helpful to move this forward?

Pchelolo · 2018-02-08T19:53:41Z

Given the comments from @coreyfloyd I think the plan I'd support would be the following:

Remove the batching for deletes cause it's not needed and it doesn't really comply with the endpoint merging plan.
Transform all the needed endpoints SPEC into batching.
Make non-batch requests still accepted under the hood.

For this particular PR I'd like to see point 1 and 2, and then I will make a subsequent PR to make point number 3 happen in (more-or-less) a generic way.

I agree that multipart requests idea was a very long shot, I was not advocating for it, just proposed it to make sure we discuss all the possibilities.

What do you think @tgr ?

tgr · 2018-02-08T23:09:32Z

I would still keep the old APIs for at least a few weeks, it's a lot easier to write new code when the old one is working and can be used for comparing. After the apps finalize what do they need batching for, we can remove the unneeded endpoints. The whole reading list service is marked as experimental so that should not be problematic.

(Also if we want to experiment with multipart HTTP we should keep the the non-batch endpoint for that.)

Pchelolo · 2018-02-09T00:23:02Z

The idea was that we don't advertise the single endpoints in the spec anymore but still support them. But I'm ok with the plan to merge this now and then gradually remove the single endpoints.

@coreyfloyd mentioned Android team wants this to be merged by Monday if I'm not mistaken, but today is too late for a deploy and tomorrow is Friday, do I can merge-deploy early Monday. Sounds good @tgr ?

berndsi · 2018-02-09T04:19:58Z

If we want to hide some endpoints couldn't we just use some flag in the spec file to do so or if that doesn't work remove the entries from the spec file altogether?

Pchelolo · 2018-02-09T06:24:17Z

We've never tried to hide anything before so we don't have a magic config param, but I can easily add it

berndsi · 2018-02-09T06:38:43Z

Removing the entries from the spec file might be ok, too. Up to you and @tgr, I guess. I don't want to make things more complicated than necessary.

tgr · 2018-02-09T07:47:26Z

I just don't see the benefit of not advertising things. Code is always the best information source as it's unlikely to get out of sync; documentation is best kept in the code, and if an endpoint is exposed, it should be documented somewhere, even if we only use it internally. Also the Swagger-generated API sandbox is much more convenient for testing than manually composing curl requests. The endpoint is marked as experimental (and we can more clearly mark it as deprecated) so I don't think we'd mislead anyone by exposing the documentation.

I'll remove the DELETE batch endpoint tomorrow (although IMO it would make more sense to either keep that or remove the PUT endpoints as well - they server the same use case, to batch-upload changes made offline when the app is able to connect again).

coreyfloyd · 2018-02-09T15:17:13Z

I'll remove the DELETE batch endpoint tomorrow (although IMO it would make more sense to either keep that or remove the PUT endpoints as well - they server the same use case, to batch-upload changes made offline when the app is able to connect again).

Removing the PUT batching seems fine to me as well

Adds a set of new endpoints which are identical to the existing list / lisr entry create / update / delete endpoints, except they take multiple lists / entries as input and return multiple outputs. This is a departure from REST semantics but was deemed important for performance reasons, and does not cause problems since the endpoints are not cacheable anyway. Also fix a small documentation error with relative domain names. Bug: T182052

tgr · 2018-02-11T09:49:53Z

Updated.

tgr force-pushed the readinglists-batch branch from 04a254a to 3aef475 Compare January 15, 2018 12:03

tgr force-pushed the readinglists-batch branch from 3aef475 to 5a698dd Compare February 11, 2018 09:49

tgr mentioned this pull request Feb 11, 2018

Adjust max list size and expose in Swagger response #952

Merged

Pchelolo merged commit ac9f91b into wikimedia:master Feb 12, 2018

tgr mentioned this pull request Feb 12, 2018

Provide timestamp to continue sync from #953

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reading lists: add batch endpoints #944

Reading lists: add batch endpoints #944

tgr commented Jan 15, 2018

tgr commented Jan 15, 2018

Pchelolo commented Jan 17, 2018

tgr commented Jan 17, 2018

Pchelolo commented Jan 17, 2018

tgr commented Jan 18, 2018

coreyfloyd commented Feb 8, 2018

Pchelolo commented Feb 8, 2018

tgr commented Feb 8, 2018

Pchelolo commented Feb 9, 2018

berndsi commented Feb 9, 2018

Pchelolo commented Feb 9, 2018

berndsi commented Feb 9, 2018

tgr commented Feb 9, 2018

coreyfloyd commented Feb 9, 2018

tgr commented Feb 11, 2018

Reading lists: add batch endpoints #944

Reading lists: add batch endpoints #944

Conversation

tgr commented Jan 15, 2018

tgr commented Jan 15, 2018

Pchelolo commented Jan 17, 2018

tgr commented Jan 17, 2018

Pchelolo commented Jan 17, 2018

tgr commented Jan 18, 2018

coreyfloyd commented Feb 8, 2018

Pchelolo commented Feb 8, 2018

tgr commented Feb 8, 2018

Pchelolo commented Feb 9, 2018

berndsi commented Feb 9, 2018

Pchelolo commented Feb 9, 2018

berndsi commented Feb 9, 2018

tgr commented Feb 9, 2018

coreyfloyd commented Feb 9, 2018

tgr commented Feb 11, 2018