Decision: Leave this meta information out of the response bodies for the first iteration.
lastrevid & modified
- We already get an ETag with the revision ID, and a usable timestamp, that may cover the usage of this meta data
title & pageid
- Used to jump from Wikibase entities to titles for use with the MediaWiki core APIs (deletion & protection etc)
- Deletion is now provided in our REST API
- We could provide further APIs for other functionality such as protection, undeletion etc.
- Could be used for linking, but Special:EntityPage is also an option there
ns
- Will be easily available via the WikibaseManifest REST API in the future.
- The ID itself is probably not that useful anyway
Research and decision record from the July 2020 prototyping week.
Decision: /entities/items/Q42
This allows us to:
- Easily separate the payload and response examples and schema for different entity types in the spec.
- Allow a world in which Entity IDs may not have their entity types coded into them in some way.
- Allow for clear separation between entity endpoints and other endpoints (such as statements, badges and anything to come in the future).
Decision: New edit endpoints should all equally implement certain “default edit parameters” (e.g. bot, baserevid, summary, tags).
In the past, various editing interfaces were introduced without a full parameter set. The missing parameters were often requested immediately, but not added promptly due to other commitments. To avoid this, the basic editing parameter set will always be added to edit endpoints.
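One way to keep the parameter set identical across endpoints is to define it once and reuse it. The following is a minimal sketch, not an agreed design: only the parameter names (bot, baserevid, summary, tags) come from the decision above, while the class name, the helper and the defaults are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DefaultEditParams:
    """Shared edit parameters; names from the decision, defaults assumed."""
    bot: bool = False
    baserevid: Optional[int] = None
    summary: str = ""
    tags: Tuple[str, ...] = ()


def parse_default_edit_params(body: dict) -> DefaultEditParams:
    """Hypothetical helper: pull the shared parameters out of a decoded request body."""
    return DefaultEditParams(
        bot=bool(body.get("bot", False)),
        baserevid=body.get("baserevid"),
        summary=str(body.get("summary", "")),
        tags=tuple(body.get("tags", ())),
    )


# Every edit endpoint would call the same helper, so none can ship a partial set.
params = parse_default_edit_params({"summary": "fix label", "tags": ["demo"]})
```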
Decision: Offset parameters (in the WordPress style) will be used (page, per_page, offset).
- ?page=: page number of results to return
- ?per_page=: number of records to return in one request, Int(1-100)
- ?offset=: arbitrary offset at which to start retrieving posts
- ?limit and ?continue
- Different words for combined queries: ?XXlimit, ?XXcontinue
- The word is formed of an abbreviation of the property + ‘limit’ or ‘continue’,
- i.e. prop=duplicatefiles (df) params: dflimit, dfcontinue
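As a rough illustration of the page, per_page and offset parameters from the decision, the sketch below turns them into a start index and a row count. The function name, the default per_page of 10 and the rule that an explicit offset takes precedence over page are assumptions; only the 1-100 range for per_page is taken from the notes above.

```python
from typing import Optional, Tuple


def resolve_window(page: int = 1, per_page: int = 10,
                   offset: Optional[int] = None) -> Tuple[int, int]:
    """Translate page/per_page/offset into a (start, count) window."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if offset is not None:
        if offset < 0:
            raise ValueError("offset must not be negative")
        start = offset  # assumption: an explicit offset wins over page
    else:
        start = (page - 1) * per_page
    return start, per_page


rows = list(range(1, 251))  # stand-in for 250 stored records
start, count = resolve_window(page=3, per_page=20)
third_page = rows[start:start + count]
```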
Retrieves the 20 rows starting with the 100th row
GET /items?limit=20&offset=100
Simplest form of paging. Became popular with apps using SQL DBs (LIMIT and OFFSET are part of the SQL SELECT syntax).
- Easiest to implement, almost no coding required other than passing parameters directly to SQL query.
- Stateless on the server.
- Works regardless of custom sort_by parameters
- Not performant for large offset values. The DB needs to scan and count rows starting from row 0, and then discards the first $offset rows.
- Not consistent when new items are inserted to the table (i.e. Page drift) especially noticeable when ordering items by newest first
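The page drift mentioned above can be reproduced with a plain list standing in for the table. This is a toy sketch under that assumption; the fetch helper and the item names are made up for the example.

```python
def fetch(items, limit, offset):
    """Offset paging over an in-memory list (stand-in for LIMIT/OFFSET in SQL)."""
    return items[offset:offset + limit]


# Newest first: item-10, item-9, ..., item-1
items = [f"item-{n}" for n in range(10, 0, -1)]

page1 = fetch(items, limit=3, offset=0)

# A new item arrives between the two requests and shifts every row down by one.
items.insert(0, "item-11")

page2 = fetch(items, limit=3, offset=3)

# The last row of page 1 shows up again as the first row of page 2.
repeated = set(page1) & set(page2)
```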
Retrieves the 20 most recent items
GET /items?limit=20
Next page, client finds the minimum created date of 2018-01-20T00:00:00 from previously returned results.
GET /items?limit=20&created:lte:2018-01-20T00:00:00
- Works with existing filters without additional backend logic.
- Only additional limit parameters needed
- Consistent ordering even when newer items are inserted into the table. Works well when sorting by most recent first.
- Consistent performance even with large offsets.
- Tight coupling of paging mechanism to filters and sorting. Forces API users to add filters even if no filters are intended.
- Does not work for low cardinality fields such as enum strings.
- Complicated for API users when using custom sort_by fields as the client needs to adjust the filter based on the field used for sorting.
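A minimal keyset paging sketch using an in-memory SQLite table follows. The table layout and the fetch_page helper are assumptions for illustration. Note that it uses a strict "older than" comparison and assumes unique created values; the lte filter in the request above would return the boundary row a second time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO items (id, created) VALUES (?, ?)",
    [(n, f"2018-01-{n:02d}T00:00:00") for n in range(1, 11)],
)


def fetch_page(limit, created_before=None):
    """Newest first; optionally only rows strictly older than created_before."""
    if created_before is None:
        sql = "SELECT id, created FROM items ORDER BY created DESC LIMIT ?"
        args = (limit,)
    else:
        sql = ("SELECT id, created FROM items WHERE created < ? "
               "ORDER BY created DESC LIMIT ?")
        args = (created_before, limit)
    return conn.execute(sql, args).fetchall()


page1 = fetch_page(3)

# A newer row inserted between requests does not disturb the next page.
conn.execute("INSERT INTO items (id, created) VALUES (11, '2018-01-11T00:00:00')")

# The client passes the minimum created value it has seen so far.
page2 = fetch_page(3, created_before=page1[-1][1])
```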
Status Quo: This fell out of the scope of the spec, thus no decision was really made.
- URL parameters
- For exact matches: /items?state=active
- For non-exact matches:
- Filters are composed by:
- The property or field name
- The operator such as eq, lte, gte
- The filter value
Sources:
- https://www.moesif.com/blog/technical/api-design/REST-API-Design-Filtering-Sorting-and-Pagination/
- https://docs.microsoft.com/en-us/graph/query-parameters#filter-parameter
Retrieves all the items where the price is greater than or equal to 10, but less than or equal to 100
GET /items?price[gte]=10&price[lte]=100
- Ease of use for clients. Many query string parsing libraries available to encode nested JSON objects into square brackets
- Simple to parse on server side. The URL parameter key contains both the field name and operator. Easy to GROUP BY (property name, operator) without looking at the URL parameter values.
- No need to escape special characters in the filter value when operator is taken as a literal filter term.
- More work on server side to parse and group the filters. Custom URL parameter binder or parser to split the query string key into two components (field name, operator) may be needed
- Special characters in variable names can be awkward. Custom binder to split the query string key into two components may be needed
- Hard to manage custom combinational filters since there is an implicit AND among filters. How to manage OR filters?
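To make the "custom binder" point concrete, here is a small sketch of splitting bracketed keys into field name and operator with the standard library. The operator whitelist, the function name and the fallback to eq for plain keys are assumptions, not part of any agreed spec.

```python
import re
from urllib.parse import parse_qsl

# Matches keys such as "price[gte]" and captures the field and the operator.
KEY_PATTERN = re.compile(r"^(?P<field>\w+)\[(?P<op>\w+)\]$")
ALLOWED_OPERATORS = {"eq", "lte", "gte"}  # assumed whitelist


def parse_bracket_filters(query):
    """Return (field, operator, value) triples; all filters are implicitly ANDed."""
    filters = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        match = KEY_PATTERN.match(key)
        if match:
            field, op = match["field"], match["op"]
        else:
            field, op = key, "eq"  # plain key means exact match
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        filters.append((field, op, value))
    return filters


filters = parse_bracket_filters("price[gte]=10&price[lte]=100&state=active")
```

The filter value is never inspected for operators here, which is why literal values need no escaping in this style.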
Retrieves all the items where the price is greater than or equal to 10, but less than or equal to 100
GET /items?price=gte:10&price=lte:100
- Easiest to parse on server side especially if duplicate filters are not supported.
- No custom binders needed.
- Many API frameworks already handle URL parameter arrays.
- Multiple price filters will be under the same variable ‘price’ which may be a Sequence or Map.
Literal values need special handling.
Retrieve all items where the user_id is greater than 100.
GET /items?user_id=gt:100
However, if gt:100 was a valid id, we could not find the items where the user_id equals gt:100
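The ambiguity can be shown with a short parser sketch. The operator list and the convention of writing an explicit eq: prefix to force a literal match are assumptions introduced for this example, not something the notes above settle.

```python
from urllib.parse import parse_qsl

OPERATORS = {"eq", "gt", "gte", "lt", "lte"}  # assumed operator set


def parse_colon_filters(query):
    """Split "field=op:value" pairs; values without a known operator mean eq."""
    filters = []
    for field, raw in parse_qsl(query):
        op, sep, value = raw.partition(":")
        if sep and op in OPERATORS:
            filters.append((field, op, value))
        else:
            filters.append((field, "eq", raw))
    return filters


ranged = parse_colon_filters("price=gte:10&price=lte:100")

# Read as "greater than 100", so a literal id "gt:100" cannot be expressed ...
ambiguous = parse_colon_filters("user_id=gt:100")

# ... unless the client spells out the operator (hypothetical escape convention).
literal = parse_colon_filters("user_id=eq:gt:100")
```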
Retrieves all the items containing the terms red chair and the price is greater than or equal to 10 and less than or equal to 100
GET /items?q=title:red chair AND price:[10 TO 100]
- Adding filters and ranges with the search parameter
- Lucene syntax or ElasticSearch Simple Query Strings would be easy to support directly - if needed.
- Most flexible queries for API users
- Almost no parsing required on backend, can pass directly to search engine or database
- Harder for beginners (Lucene syntax)
- Full-text search doesn’t make sense for all resources. E.g. Fuzziness and term boosting doesn’t make sense for time series metric data.
- Requires URL percent encoding i.e. use of cURL or Postman becomes more complicated.
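The percent-encoding cost is easy to see with the query from the example above. This sketch uses only the Python standard library; the /items path and the q parameter are taken from that example.

```python
from urllib.parse import parse_qs, quote, urlencode

query = "title:red chair AND price:[10 TO 100]"

# quote_via=quote encodes spaces as %20 instead of "+".
encoded = urlencode({"q": query}, quote_via=quote)
url = "/items?" + encoded

# The server side decodes back to the original search string.
decoded = parse_qs(encoded)["q"][0]
```

Colons, spaces and square brackets all have to be escaped, which is what makes hand-written cURL or Postman requests more awkward.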
Decision: Use PATCH and json-patch
When deciding on the update strategy for resources in our API, we considered two different proposals for sending updates.
The key reasons were:
- Wikidata works with large JSON objects
- json-patch allows appending to arrays
Payload comparison: https://www.notion.so/Payload-Examples-1cbe38a24f8044fba18d107d390b97fb
- More PHP tooling and support (not extremely popular or regularly updated but it is an old RFC):
- github.com/mikemccabe/json-patch-php
- github.com/swaggest/json-diff
- Explicit approach with predefined available actions
- Allows appending to arrays
- Multiple operations are performed by sending an array of “operation” objects
- Straightforward, minimal verbosity
- Removal is “implicit”
- i.e. by sending a key with a null value
- No support for extending array values
- i.e. arrays must be fully replaced
- PHP tooling seems to be scarce and mostly abandoned
- Multiple operations are performed by sending one “update” object
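The difference between the two proposals can be sketched with hand-written, deliberately incomplete implementations. Assumptions: the second proposal is read here as JSON Merge Patch (RFC 7386), the JSON Patch function covers only add, remove and replace from RFC 6902, and the item structure is simplified and not the real Wikibase JSON.

```python
import copy


def apply_json_patch(doc, operations):
    """Tiny subset of JSON Patch: add/remove/replace, with "-" appending to arrays."""
    result = copy.deepcopy(doc)
    for op in operations:
        tokens = [t.replace("~1", "/").replace("~0", "~")
                  for t in op["path"].split("/")[1:]]
        target = result
        for token in tokens[:-1]:
            target = target[int(token)] if isinstance(target, list) else target[token]
        last = tokens[-1]
        if op["op"] == "add" and isinstance(target, list):
            if last == "-":
                target.append(op["value"])
            else:
                target.insert(int(last), op["value"])
        elif op["op"] in ("add", "replace"):
            key = int(last) if isinstance(target, list) else last
            target[key] = op["value"]
        elif op["op"] == "remove":
            del target[int(last) if isinstance(target, list) else last]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result


def apply_merge_patch(doc, patch):
    """Merge-style update: null removes a key, arrays are replaced wholesale."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(doc) if isinstance(doc, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


item = {"labels": {"en": "Douglas Adams", "de": "Douglas Adams"},
        "aliases": {"en": ["Douglas Noel Adams"]}}

# Explicit operations; the alias is appended without resending the array.
patched = apply_json_patch(item, [
    {"op": "add", "path": "/aliases/en/-", "value": "DNA"},
    {"op": "remove", "path": "/labels/de"},
])

# One update object; the whole alias array must be sent, removal is a null.
merged = apply_merge_patch(item, {
    "aliases": {"en": ["Douglas Noel Adams", "DNA"]},
    "labels": {"de": None},
})
```

Both calls produce the same document; with large arrays, only the first avoids resending existing values.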
Decision: Statements would be exposed both within an entity and also at the top level
Considered options:
- For the statements with a GUID:
- Should we address those always through entities?
- longer path
- the entity id is already in the path and the prefix is redundant
- Or individually, through a /statements path?
Usefulness (end users) Vs. cost of ownership (devs)
The current Action API allows you to get a statement with only the GUID (without the entity id prefix) with no extra work
Not having a top level statement endpoint would mean that “extra work” would be needed by the client in order to access a statement (splitting the statement ID)
Taking this into account we can cover all of our existing APIs with REST APIs only if we do have statements as a top level api module.
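The "extra work" for clients amounts to something like the sketch below. It assumes the statement ID has the form entity ID, a "$" separator, then a GUID; the GUID shown is a placeholder, and both paths are hypothetical shapes for illustration rather than agreed routes.

```python
def entity_id_from_statement_id(statement_id: str) -> str:
    """Split "<entity id>$<guid>" and return the entity ID part (assumed format)."""
    entity_id, separator, guid = statement_id.partition("$")
    if not separator or not entity_id or not guid:
        raise ValueError(f"not a statement ID: {statement_id!r}")
    return entity_id


def statement_paths(statement_id: str) -> dict:
    """Hypothetical paths for the two addressing options discussed above."""
    entity_id = entity_id_from_statement_id(statement_id)
    return {
        # Top level: the client passes the ID through untouched.
        "top_level": f"/statements/{statement_id}",
        # Nested: the client must first extract the entity ID (item assumed).
        "nested": f"/entities/items/{entity_id}/statements/{statement_id}",
    }


example_id = "Q42$00000000-0000-0000-0000-000000000000"  # placeholder GUID
paths = statement_paths(example_id)
```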
Decision: We will not worry about revisionids in paths, as the current API doesn't expose them and per core behaviour this should probably be left for the revisions endpoint
Both choices seem to be okay according to Stack Overflow https://stackoverflow.com/a/12737540/4746236
MediaWiki core REST API only has revisionid when looking specifically at revisions (not pages) https://www.mediawiki.org/wiki/API:REST_API
- In the query:
- In the Path:
Decision: We will initially implement conditional requests for GETs (consistent with MediaWiki core). We are undecided about conditional requests for edits. For now we have aligned with MediaWiki core, which does not use headers for edit requests and instead returns a revision id in the payload.
- TL;DR use HTTP spec to prevent the “lost update problem”
- MediaWiki core REST API makes use of conditional requests for GETs.
- ETag + HTTP 409 CONFLICT
- Preconditions in headers
- If-Match
- If-Unmodified-Since
- Should happen for all sub-paths of entities (which all rely on entity state)
- (Possibly even require it via HTTP 428 PRECONDITION REQUIRED)
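The precondition checks listed above can be sketched as pure functions that return a status code. Assumptions: the ETag is the quoted revision ID, and a failed If-Match answers with 412 Precondition Failed as the HTTP conditional request rules describe; the notes above also mention 409 as an option, and nothing here reflects a final decision for edits.

```python
from typing import Optional


def make_etag(revision_id: int) -> str:
    """Assumed ETag format: the revision ID as a quoted string."""
    return f'"{revision_id}"'


def conditional_get(current_revision: int, if_none_match: Optional[str]) -> int:
    """304 when the client's cached copy is still current, otherwise 200."""
    if if_none_match == make_etag(current_revision):
        return 304
    return 200


def conditional_edit(current_revision: int, if_match: Optional[str],
                     require_precondition: bool = False) -> int:
    """Guard an edit against the lost update problem."""
    if if_match is None:
        # Optionally insist that clients always send a precondition.
        return 428 if require_precondition else 200
    if if_match != make_etag(current_revision):
        return 412  # someone else edited since the client last read
    return 200


# The client read revision 5, but the entity has moved on to revision 6.
stale_edit = conditional_edit(current_revision=6, if_match=make_etag(5))
fresh_edit = conditional_edit(current_revision=6, if_match=make_etag(6))
```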
Sources:
- https://api.wikimedia.org/wiki/Documentation/Best_practices/Conditional_requests
- https://dzone.com/articles/concurrency-control-in-rest-api-with-spring-framew
- https://www.novatec-gmbh.de/en/blog/managing-concurrency-in-a-distributed-restful-environment-with-spring-boot-and-angular2/
Decision: Plural for everything. For consistency's sake.
We want to prevent a dispute where there does not seem to be a "right" or "wrong". Energy would be better invested in the linking between resources in responses. That topic is related, in that it would mostly absolve clients from "thinking about" and processing the URIs.
- “Plurals denote collection resources.”
- “Neither way is right or wrong, go with what you like best.”
- MediaWiki core REST API doesn’t appear to use plurals everywhere,
/page/{title}/links/language
- Wikimedia (“restbase”) API has a mix
/page/html/{title}/{revision}
/page/segments/{title}/{revision}
Sources:
- https://restfulapi.net/rest-api-design-tutorial-with-example/
- https://restapitutorial.com/lessons/restquicktips.html
- https://stackoverflow.com/questions/6845772/rest-uri-convention-singular-or-plural-name-of-resource-while-creating-it/6846057#6846057
In the Wikibase action API, e.g. wbgetentities, the returned entity collection is structured as a map, indexed by entity ids. This can speed up retrieval of specific, known entities.
We considered this behavior for the entity collection endpoints of this API as well. However, the argument was brought forward that this could conflict with the need for ordered results on these endpoints (see Filtering rows). We deferred further discussion at this point, noting that requests for specific entities should best be performed against the individual resource. Any resulting performance concerns (a notorious reason to group requests for certain known entities into one request) could be mitigated on other layers.