Setting standards for basic querying/filtering #18

SiBell · 2019-10-17T10:40:12Z

Key parameters are as follows:

Time Window
Time of day/week/month/year
Spatial bounding box
Point and radius
Equality filter
Thresholds
Pagination
Response format

Two more we didn't discuss in the meeting, but might be worthy of adding:

Limit (e.g. limit=1 to get just the first/last reading)
Sort (i.e. ascending or descending; can be used with limit, e.g. to get either the first n or last n observations)

Please provide suggestions for how to do each and we'll pick a favourite for each.

SiBell · 2019-10-18T16:40:34Z

Ok here's my stab at this. I've done everything but Pagination and Response Format. Let me know what you think.

General rules

Use camelCase, because the JSON response also uses camelCase. For example the JSON response for a Platform might have an inDeployment property, and thus to filter by deployment when requesting a list of Platforms you could use inDeployment=weather-stations.
A double underscore __ prefixes a modifier, e.g. dateTime__gt.

Modifiers include:

gt - greater than
lt - less than
gte - greater than or equal to
lte - less than or equal to

Time window

Used to filter the data temporally.

Keys

dateTime__gt
dateTime__lt
dateTime__gte
dateTime__lte

Usage

The dateTime must be in ISO8601 format.

Defaults to UTC unless specified otherwise.

None, 1 or 2 of the keys can be provided.

Required validation

Returns error response if:

Dates are not in ISO8601 format.
dateTime__gt is after dateTime__lt (and other similar scenarios).

Examples

?dateTime__gte=2019-10-18

?dateTime__gt=2019-10-18T15:03:34.614Z

?dateTime__gt=2019-10-18T15:03:34.614+04

?dateTime__gte=2019-10-18&dateTime__lt=2019-10-24
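
The validation rules above could be sketched roughly as follows. This is only an illustration in Python, not an agreed implementation: the function names `parse_iso8601` and `validate_time_window` are hypothetical, and it assumes the query string has already been parsed into a dictionary.

```python
from datetime import datetime, timezone

MODIFIERS = ("gt", "gte", "lt", "lte")


def parse_iso8601(text):
    """Parse an ISO 8601 string, defaulting to UTC when no offset is given."""
    # Older Python versions do not accept a trailing "Z", so normalise it first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)  # raises ValueError if malformed
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def validate_time_window(params):
    """Return the parsed bounds, or raise ValueError for an invalid window."""
    bounds = {}
    for modifier in MODIFIERS:
        key = f"dateTime__{modifier}"
        if key in params:
            bounds[modifier] = parse_iso8601(params[key])
    if len(bounds) > 2:
        raise ValueError("at most two dateTime keys may be provided")
    lower = bounds.get("gt", bounds.get("gte"))
    upper = bounds.get("lt", bounds.get("lte"))
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound is after upper bound")
    return bounds
```

A server would call `validate_time_window` before building its database query and turn any `ValueError` into an error response.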

Filter by component of date/time

Keys

minuteOfHour
hourOfDay
dayOfWeek
dayOfMonth
dayOfYear
monthOfYear
year

Usage

Assumes UTC is being used.

hourOfDay uses 24H clock

Several of these can be used together

Required validation

Check values fall within expected range, e.g.

minuteOfHour between 0 and 59
dayOfWeek should be between 1 and 7
dayOfMonth between 1 and 31

Examples

?minuteOfHour=30

?hourOfDay=22

?dayOfWeek=2 (i.e. for Tuesday)

?dayOfMonth=2

?dayOfYear=301

?monthOfYear=11

?year=2019

Spatial window

Keys

latitude__gt
latitude__lt
latitude__gte
latitude__lte
longitude__gt
longitude__lt
longitude__gte
longitude__lte
height__gt
height__lt
height__gte
height__lte

Usage

Latitudes and longitudes are given in WGS84 datum.

Height is in metres above or below the WGS 84 reference ellipsoid (same as GeoJSON).

None, 1 or 2 height keys can be used in a request.

None, 1 or 2 latitude keys can be used in a request.

None or 2 longitude keys can be used in a request.

Required validation

Values fall within expected range, e.g. latitude between -90 and +90.
If there's only 1 longitude key then return an error. This is to avoid 180th meridian issues.
Can't use a latitude__lt with latitude__lte, and likewise for longitude, and for gt with gte.

Examples

?latitude__gt=52

?longitude__gt=-8.5&longitude__lte=2

?height__lt=10
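
To make the spatial validation rules concrete, here is a minimal Python sketch. The function name `validate_spatial_window` is hypothetical, the parameters are assumed to arrive as a dictionary of strings, and the longitude range of -180 to +180 is an assumption (only the latitude range is stated above).

```python
def validate_spatial_window(params):
    """Raise ValueError if latitude/longitude keys break the rules above."""
    ranges = (("latitude", -90.0, 90.0), ("longitude", -180.0, 180.0))
    for axis, low, high in ranges:
        used = [m for m in ("gt", "gte", "lt", "lte") if f"{axis}__{m}" in params]
        # gt and gte (or lt and lte) on the same axis contradict each other.
        if "gt" in used and "gte" in used:
            raise ValueError(f"{axis}: cannot combine gt with gte")
        if "lt" in used and "lte" in used:
            raise ValueError(f"{axis}: cannot combine lt with lte")
        # A single longitude bound is ambiguous around the 180th meridian.
        if axis == "longitude" and len(used) == 1:
            raise ValueError("longitude needs both a lower and an upper bound")
        for modifier in used:
            value = float(params[f"{axis}__{modifier}"])
            if not low <= value <= high:
                raise ValueError(f"{axis} value {value} is out of range")
```

Height keys are left unchecked here because no range for them is given above.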

Point and radius

Keys

proximityCentre
proximityRadius

Usage

Allows the user to find all resources (e.g. Platforms, observations, etc) within a given distance of a point.

The proximityCentre is the centre given in the form longitude,latitude, height. The height is optional. When height is given the filtered region turns from a circle into a sphere.

Longitude and latitude are in WGS84. Height is in metres.

proximityRadius is the distance from the proximityCentre in metres.

Required validation

Return error response if:

Only one of the keys is provided. Both proximityCentre and proximityRadius are required together.
The longitude and latitude used in the proximityCentre aren't valid coordinates.
The longitude and latitude aren't specified for proximityCentre. At least the longitude and latitude are required, with the height being optional.

Examples

?proximityCentre=-1.9,52.2&proximityRadius=1000

?proximityCentre=-1.9,52.2,10&proximityRadius=1000
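
A rough Python sketch of how the proximity filter might be evaluated is shown below. It is an illustration only: the function names are hypothetical, the Earth is approximated as a sphere (haversine formula), and the way height is combined with the surface distance is an assumption rather than anything agreed above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def parse_proximity_centre(text):
    """Split "longitude,latitude[,height]" into a (lon, lat, height) tuple."""
    parts = [float(part) for part in text.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError("expected longitude,latitude with an optional height")
    longitude, latitude = parts[0], parts[1]
    if not (-180 <= longitude <= 180 and -90 <= latitude <= 90):
        raise ValueError("invalid coordinates")
    height = parts[2] if len(parts) == 3 else None
    return longitude, latitude, height


def is_within_radius(centre, point, radius_m):
    """True if point lies within radius_m metres of centre."""
    lon1, lat1, h1 = centre
    lon2, lat2, h2 = point
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    surface = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    if h1 is not None and h2 is not None:
        # With heights on both sides, treat the region as a sphere.
        return math.hypot(surface, h2 - h1) <= radius_m
    return surface <= radius_m
```

In practice a spatial database would normally do this work; the sketch just shows the intended semantics of the two keys.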

Equality filter

Keys

Depends on the resource.

Usage

Certain resources will be filterable by certain attributes.

Required validation

Only certain keys will be valid (depending on the resource). e.g. hairColour wouldn't be a valid parameter when querying an endpoint that serves a list of sensors.

Examples

If, for example, you needed to find people whose hair colour is brown then your request might look like this:

https://api.example.com/people?hairColour=brown

More examples:

?inDeployment=weather-stations

?isHostedBy=lamppost-101

?age=18

Thresholds

Keys

Depends on the resource.

Usage

Applies modifiers to keys that are specific to the resource being queried.

Required validation

Only certain keys and modifiers are valid for certain resources.

Examples

/people?age__lt=18

/observations?value__gte=30.5&observedProperty=air-temperature

Limit

Keys

limit

Usage

For endpoints returning a collection of resources this parameter will limit the number of resources returned.

Can be used in combination with the sortBy and sortOrder keys to get just the "last n" or "first n" resources in the collection.

Required validation

Value can't be less than or equal to 0.

Examples

?limit=100

?limit=1&sortOrder=asc&sortBy=age

Sort

Keys

sortBy
sortOrder

Usage

For endpoints returning a collection of resources this parameter will sort the resources returned.

Sorts both numerical fields and also strings (i.e. alphabetically).

Can be used in combination with the limit key to get just the "last n" or "first n" resources in the collection.

sortOrder defaults to asc if the sortBy key is provided without sortOrder.

Required validation

sortOrder values can only be asc or desc.
Returns error if sortOrder is provided without sortBy.

Examples

?sortBy=age&sortOrder=desc

?limit=1&sortOrder=asc&sortBy=age
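
As an illustration of how sortBy, sortOrder and limit interact, here is a minimal Python sketch operating on an in-memory list. The function name `apply_sort_and_limit` is hypothetical, and a real service would more likely push the sort and limit down into its database query.

```python
def apply_sort_and_limit(resources, params):
    """Sort and truncate a list of dict resources according to the query params."""
    sort_by = params.get("sortBy")
    sort_order = params.get("sortOrder")
    if sort_order is not None and sort_by is None:
        raise ValueError("sortOrder requires sortBy")
    if sort_order is None:
        sort_order = "asc"  # default when only sortBy is given
    if sort_order not in ("asc", "desc"):
        raise ValueError("sortOrder must be asc or desc")
    result = list(resources)
    if sort_by is not None:
        result.sort(key=lambda r: r[sort_by], reverse=(sort_order == "desc"))
    if "limit" in params:
        limit = int(params["limit"])
        if limit <= 0:
            raise ValueError("limit must be greater than 0")
        result = result[:limit]
    return result
```

For example, `limit=1` with `sortOrder=desc` returns the single resource with the largest value of the sortBy property.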

aarepuu · 2019-10-21T08:40:17Z

Good work Simon. I agree with most of this. There are a couple of things I would add/specify.

Filter by component of date/time

dayOfWeek should be one of MO, TU, WE, TH, FR, SA, SU to be explicit on the days, some countries start the week with Sunday.
we should also support comma separated list for filters
- ?dayOfWeek=MO,WE,FR
we should also support ranges for filters
- ?dayOfWeek=MO-FR

Additionally, ranges and comma-separated lists should also apply to other filters that are numeric.
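
The list and range forms for dayOfWeek could be parsed along these lines. This is a Python sketch under a few assumptions: the function name `parse_day_of_week` is hypothetical, values are treated as case-insensitive, and a range that wraps around the week (e.g. FR-MO) is rejected.

```python
DAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def parse_day_of_week(value):
    """Expand e.g. "MO,WE,FR" or "MO-FR" into an ordered list of day codes."""
    selected = set()
    for item in value.upper().split(","):
        if "-" in item:
            start, end = item.split("-")  # ValueError if more than one "-"
            i, j = DAYS.index(start), DAYS.index(end)  # ValueError if unknown
            if i > j:
                raise ValueError(f"range {item} runs backwards")
            selected.update(DAYS[i:j + 1])
        elif item in DAYS:
            selected.add(item)
        else:
            raise ValueError(f"unknown day {item!r}")
    return [day for day in DAYS if day in selected]
```

The same split-on-comma, split-on-hyphen approach would carry over to numeric filters, with integer comparison in place of the `DAYS` lookup.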

aarepuu · 2019-10-21T14:25:53Z

Here's what I think for pagination.

Pagination

Keys

limit
offset

Usage

Used for pagination of results.
Both are represented as integers and are not required parameters.
If not specified it defaults to limit=10 and offset=0.

Can be used in combination with the sortBy and sortOrder keys.

Required validation

limit has to be integer greater than or equal to 1.
offset has to be integer greater than or equal to 0.
~~limit must be specified when using offset~~

Examples

?limit=100
?limit=10&offset=10
?limit=10&offset=10&sortOrder=asc&sortBy=age

Not completely sure if we should make limit compulsory when using offset or just use default limit=10 when not specified?

lukeshope · 2019-10-25T14:26:12Z

I agree with all of this, and also with Aare's comments on days of the week and ranges. Thanks both.

In the interests of making this more widely applicable, I think it would be worth nailing down what the general case is. Specifically:

how are we dereferencing when using filters that correspond to IDs such as your inDeployment example; do we require full IRIs for example, inDeployment=https://birmingham.uo.ac.uk/api/deployment/weather-stations, or do we allow relative paths following RFC3986 as in JSON-LD? Personal preference is allow either, but this does add complexity
do we allow filtering on properties in nested objects? Personal preference is this optional and implementation dependent
if so, what is the general form for the query parameter in that case? See below, it's complicated

As an example:

{
  "@type": "Sensor",
  [...],
  "madeObservation": {
    "@type": "ObservationCollection",
    "member": [{
      "@type": "Observation",
      "hasResult": {
        "@type": "Temperature",
        "value": 12.0
      },
      "resultTime": "2019-10-25T15:14:00Z"
    }]
  }
}

In the above example:

would the query parameter be value to apply a threshold, e.g. ?value__gte=10.0?
would we only allow the value filter when querying against the IRI of the ObservationCollection?
if not, what would the behaviour be if I used a query filter on a Sensor or SensorCollection, which contained ObservationCollections and Observations as nested objects?
what would happen if two query parameters with the same name both existed within different nested objects (e.g. a name on a platform, and also a name on a sensor attached to it); which would the name filter apply to? My preference would be that the filter applies to the highest level name only

lukeshope · 2019-10-25T14:36:00Z

I think we might also have to consider, for the sake of search functionality...

Wildcard matching

Keys

name__contains
name__containsAny
name__containsAll
Depends on the instance

Usage

An optional filter to be supported by some implementations
Functions as an OR or an AND filter, depending on whether __containsAny or __containsAll is used
The contains variant only allows one word to be specified, or a phrase if surrounded by double quotes
The containsAny and containsAll variant allows multiple words or phrases to be specified, joined by a plus symbol

Validation

Double quotes must be in complete pairs
Double quotes may not be used as part of the filter itself
Only one word or phrase passed when using contains

Examples

http://example.com/platforms?name__contains="Room 2.048"
http://example.com/platforms?name__contains=2.048
http://example.com/platforms?name__containsAll=Room+2.048
http://example.com/platforms?name__containsAny=2.060+2.048
http://example.com/platforms?name__containsAny="Room 2.060"+"Room 2.048"
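
To show how the quoting and `+` rules might work in practice, here is a small Python sketch. The names `split_terms` and `matches` are hypothetical, matching is assumed to be case-insensitive, and the value is assumed to be the raw parameter text (many query string parsers decode `+` to a space, which would need handling separately).

```python
import re

# A term is either a double-quoted phrase or a run of characters without + or ".
TOKEN = re.compile(r'"([^"]*)"|([^+"]+)')


def split_terms(value):
    """Split a filter value into words and quoted phrases."""
    if value.count('"') % 2:
        raise ValueError("double quotes must be in complete pairs")
    terms = [quoted or bare for quoted, bare in TOKEN.findall(value)]
    return [term for term in terms if term]


def matches(name, modifier, value):
    """Apply contains / containsAny / containsAll to a single name."""
    terms = [term.lower() for term in split_terms(value)]
    haystack = name.lower()
    if modifier == "contains":
        if len(terms) != 1:
            raise ValueError("contains accepts exactly one word or phrase")
        return terms[0] in haystack
    if modifier == "containsAny":
        return any(term in haystack for term in terms)
    if modifier == "containsAll":
        return all(term in haystack for term in terms)
    raise ValueError(f"unknown modifier {modifier!r}")
```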

Disclaimer: The above is similar to how it's already been implemented in some Newcastle APIs, happy to look at alternatives

nharris172 · 2019-10-25T14:38:59Z

Will case be considered? can we have icontains

lukeshope · 2019-10-28T15:09:49Z

Will case be considered? can we have icontains

Personally, I don't think case sensitivity is necessary. Would be good to know if people feel strongly the other way.

SiBell · 2019-10-28T22:40:35Z

Oh boy, this gets "fun" quickly!

My thoughts:

Happy to go case insensitive with the query string parameters. I'm struggling to think of a situation where it would cause us any issues. I'll probably use camelCase in any documentation, but keep the api itself insensitive.
Let's allow full OR relative URIs. Surely in most cases our code will be constructing the full URI from the base and relative parts on the fly anyway.
I'd say ?value__gte=10.0 in your nested example is fine. If we start doing member.hasResult.value__gte=10.0 things are going to get pretty gnarly for the end user. I personally can't see myself allowing all that many filterable properties on a given endpoint so the risk of collisions is fairly low. I'd agree that the filter should apply to the highest level for Luke's name example.
I agree with Aare's points on dayOfWeek and MO,WE,FR and MO-FR. Presumably these are case insensitive too, i.e. MO-FR is the same as mo-fr.
Aare's Pagination approach looks good to me. Although I don't have much experience with Pagination.
For containsAll and containsAny could you use a comma separated approach, e.g. name__containsAny=2.060,2.048? Feels slightly odd seeing double quotes in a URL. Are these for looking for substrings in a longer string, or for searching for elements in a property that's an array, or both? Either way it's different to Aare's dayOfWeek=MO,TU example which is more about filtering discrete values right?

lukeshope · 2019-12-03T18:22:27Z

Personally I'm happy with all of the above. I think enough time has passed, and we should now consider writing this up into the standards doc, and schematising the query parameters etc.

Any objections?

geoanorak · 2019-12-03T18:34:26Z

None from me!

SiBell · 2019-12-03T18:52:35Z

Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document.

SiBell · 2019-12-04T11:12:31Z

One more to add to this list before we get this written up, which follows on from my previous issue.

Can we have an exists condition?

E.g.

To get a list of all sensors not yet hosted on a platform:

GET /sensors?isHostedBy__exists=false

Or perhaps all the observations for which the featureOfInterest is defined:

GET /observations?hasFeatureOfInterest__exists=true

Just reads a bit nicer than our previous isDefined suggestion.

lukeshope · 2019-12-04T11:16:11Z

No inherent problem with __exists, but we need to clarify how we would handle null values in that case. A null value would exist, but wouldn't be defined.

SiBell · 2019-12-04T11:33:50Z

Good point. Guess this depends on whether we're showing null values to the user or not.

E.g.

{
  "id": "sensor-123",
  "observes": "air-temperature",
  "inDeployment": null
}

vs.

{
  "id": "sensor-123",
  "observes": "air-temperature"
}

I was veering towards the latter, in which case any null values that may exist in the backend database don't "exist" to the end user and therefore __exists on its own would suffice.

However, if there's merit in showing null values then my preference would actually be to stick with just __isDefined and drop __exists.

Interested to hear people's thoughts on this.

lukeshope · 2019-12-04T11:49:59Z

The only place I can see a clear rationale for having null values is in the observation value itself, where for example we might have an 'alarm' timeseries, and null means no alarm.

I admit I can't think of any other places where null would be useful. We can certainly discourage the use of null values in serialisations in favour of omission.

Maybe there isn't a problem in that case, and we should just allow __exists and __isDefined, but neither are mandatory and implementations could be free to implement none, one or both combinations in their filters.

Does that work?

lukeshope · 2019-12-04T11:52:04Z

Actually, I suppose it should be __exists and __defined for consistency?

SiBell · 2019-12-04T11:54:52Z

Yep works for me, and yes __defined is better than __isDefined.

EttoreHector · 2019-12-04T11:56:39Z

Even in the case of an alarm, wouldn't boolean values work without the need to introduce null values?

lukeshope · 2019-12-04T11:58:43Z

Perhaps, if the alarm were binary. But there might be an enumeration of alarms presented as an array for example:

[
  "https://example.org/alarm/low-temperature",
  "https://example.org/alarm/no-signal"
]

In the above case, either an empty array or null would be appropriate for when there are no alarms.

I'm not suggesting any of this is the right way to do it, just trying to retain flexibility as much as possible. Open to being convinced otherwise :-)

lukeshope · 2020-01-07T11:40:21Z

Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document.

I've made some progress implementing this into a JS library, but not quite ready to share yet.

If you get a chance to look at transforming the above mess into a simple HTML table for the actual document, I'd really appreciate it.

SiBell · 2020-01-07T12:20:01Z

Sure, I can whack these in an HTML table. Might have to be 2 tables then a few specific examples:

Table 1: Special keys

e.g.

| key | description | example |
| --- | --- | --- |
| limit | limit the number of records returned | ?limit=1 |
| proximityradius | the distance from the proximitycentre in metres | ?proximityradius=1000 |

Table 2: Modifiers

e.g.

| modifier | description | example |
| --- | --- | --- |
| gt | greater than | ?datetime__gt=2019-01-01 |
| contains | For wildcard matching. Only allows one word to be specified, or a phrase if surrounded by double quotes | ?name__contains=west |

Special Examples

More detailed description of how to query with spatial windows, time windows, pagination, by proximity, etc.

SiBell · 2020-01-09T17:44:10Z

I'm in the process of putting all parameters in a table now, just wondering if the time based parameters, i.e. minuteofhour, hourofday, etc should actually be "modifiers".

e.g. change:

monthofyear=10

for:

resultTime__monthofyear=10

Reason:

It's more consistent with how we apply other time-based filters, e.g. resultTime__gte=2019.
If you have a resource, e.g. observations, that has more than one time-based property, e.g. an observation might have a timestamp for the time of measurement (resultTime), and one for when it arrived at the server (arrivalTime), then this approach lets you choose which one to filter by.

The upshot is that the only special parameter keys we're left with are those for pagination, i.e. limit, offset, sortorder, sortby, and those for circular bounding area: proximitycentre and proximityradius. This is no bad thing.

Any objections?

EttoreHector · 2020-01-09T21:17:34Z

It makes sense. Just one (likely silly) question. Are we considering applying multiple filters to the same parameter in the same query? If we are, what would the syntax be, e.g. resultTime__gte=2017&resultTime__monthofyear=10 ?

lukeshope · 2020-01-10T09:20:38Z

No objections from me, this sounds like a really sensible idea, and would provide options for granular filtering on other date-time based data too in its generic form.

I think Ettore's suggestion is right, that would be a valid query for results in October 2017, 2018, 2019 etc., with query constraints being additive.

The other aspect to consider is multiple 'modifiers', which I suggest we allow but don't mandate as a minimum. I'm thinking for example resultTime__monthOfYear__gte=10 for October through December.

Perhaps based on the above we need to clarify the terminology slightly. resultTime is a selector (picking a specific value within the JSON response), monthOfYear is a sub-selector (picking a part of that value), and gte is a modifier? That way the order would always be selector, sub-selector, modifier, and resultTime__gte__monthOfYear would be invalid. Does that make sense?

EttoreHector · 2020-01-10T10:11:03Z

It makes sense.

SiBell · 2020-01-10T13:28:53Z

I agree that Ettore's suggestion is right.

Happy to allow sub-selectors too. I'll add it to the docs. Although I'm struggling to think of a use-case other than time-based selectors, but definitely worth having it as an option.

SiBell · 2020-01-10T17:27:31Z

Query string Parameters

Query string parameters allow greater control over the resources returned when making a request.

They come in particularly useful when making GET requests.

For example the following request doesn't have any query string parameters:

GET https://api.urbanobservatory.com/observations

Whereas the following does:

GET https://api.urbanobservatory.com/observations?madeBySensor=thermometer-abc123

The latter lets us filter the observations returned to just those made by the sensor with id: thermometer-abc123.

This simple example has the form:

selector=value

i.e. madeBySensor is the selector, and thermometer-abc123 is the value.

We also accept more complex query string parameters of the form:

selector__modifier=value

The modifier allows you to perform more than just an equality filter.

The following example has no modifier:

GET .../observations?resultTime=2020-01-09T18:05:24.969Z

This would only get observations recorded at that exact millisecond. But what if you wanted all observations since the start of 2020? That's where a modifier (in this case gte) can help. Here's how the request would look:

GET .../observations?resultTime__gte=2020-01-01T00:00:00.000Z

The modifier always follows a double underscore: __. It acts upon the selector listed before the __.

We also have sub-selectors, that focus on a specific component of a selector. They're particularly useful for dealing with timestamps. The format is as follows:

selector__subselector=value

For example:

resultTime__monthOfYear=10

Where resultTime is the selector, and monthOfYear is the sub-selector. In this example it allows you to only retrieve observations with a resultTime within the month of October.

You can also use a subselector in combination with a modifier using the following format:

selector__subselector__modifier=value

e.g.

resultTime__monthOfYear__gte=10

This would retrieve any observations with a resultTime in October, November or December.
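
The selector, sub-selector, modifier ordering described above could be enforced with a parser along these lines. This is only a Python sketch: `parse_key` is a hypothetical name, and the sets of sub-selectors and modifiers are taken from those proposed in this thread, so an individual observatory may support fewer.

```python
SUB_SELECTORS = {
    "minuteofhour", "hourofday", "dayofweek", "dayofmonth",
    "dayofyear", "monthofyear", "year",
}
MODIFIERS = {
    "gt", "gte", "lt", "lte", "contains", "containsany",
    "containsall", "exists", "defined",
}


def parse_key(key):
    """Split a query key into (selector, sub_selector, modifier)."""
    parts = key.lower().split("__")  # keys are treated as case-insensitive
    selector, rest = parts[0], parts[1:]
    sub_selector = modifier = None
    if rest and rest[0] in SUB_SELECTORS:
        sub_selector = rest.pop(0)
    if rest and rest[0] in MODIFIERS:
        modifier = rest.pop(0)
    # Anything left over is unknown or in the wrong order.
    if not selector or rest:
        raise ValueError(f"invalid query key {key!r}")
    return selector, sub_selector, modifier
```

With this approach `resultTime__monthOfYear__gte` is accepted, while `resultTime__gte__monthOfYear` is rejected because the modifier comes before the sub-selector.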

N.B. Not every observatory will support all of these formats, and each endpoint may only have a small number of query string parameters it accepts. However, when available, this is the format each observatory will abide by.

N.B. Query string selectors, sub-selectors, modifiers and values are case-insensitive unless specifically defined otherwise.

Special selectors

Typically the selector is a property of the resource being returned, e.g. resultTime or madeBySensor. However, there are some special selectors that provide further functionality. They are listed in the following table.

| key | description | examples |
| --- | --- | --- |
| limit | Limits the number of records returned. Commonly used with the offset, sortorder and sortby parameters. MUST be an integer value ≥ 1. | `limit=1` or `limit=100&offset=200&sortorder=asc&sortby=resultTime` |
| offset | Commonly used for pagination in combination with the limit parameter to skip the first n resources. MUST be an integer value ≥ 0. | `limit=10&offset=30` |
| sortorder | Used in combination with sortby to sort the returned resources by the property provided. Use `asc` for ascending and `desc` for descending. | `sortorder=desc&sortby=resultTime` |
| sortby | Used in combination with sortorder to sort the returned resources by the property provided. | `sortorder=asc&sortby=madeBySensor` |
| proximitycentre | MUST be used in combination with proximityradius. Sets the centre of a circular or spherical (if height is given) bounding area, i.e. only resources within the spatial area are returned. Uses the format: longitude,latitude,height. Height is optional. Longitude and latitude use WGS84. Height is in metres. | `proximitycentre=-1.9,52.2&proximityradius=1000` or `proximitycentre=-1.9,52.2,200&proximityradius=100` |
| proximityradius | MUST be used in combination with proximitycentre. Sets the distance from the proximitycentre in metres. | `proximityradius=1000` |

Sub-selectors

N.B. sub-selectors that deal with times and dates assume that the timezone is UTC.

| sub-selector | description | examples |
| --- | --- | --- |
| minuteofhour | Filters by the minute of the hour. Integer values between 0 and 59. | `resultTime__minuteOfHour=30` |
| hourofday | Filters by the hour of day. An integer value between 0 and 23, i.e. a 24 hour clock. | `resultTime__hourOfDay=22` |
| dayofweek | Filters by the day of the week. Valid values are: mo, tu, we, th, fr, sa, su. For multiple days use a comma-separated list, e.g. `mo,we,fr`, or a range, e.g. `mo-fr`. | `resultTime__dayofweek=mo` or `resultTime__dayofweek=mo,we,fr` or `resultTime__dayofweek=mo-fr` |
| dayofmonth | Filters by the day of the month. Integer values between 1 and 31. | `resultTime__dayofmonth=2` |
| dayofyear | Filters by the day of the year. | `resultTime__dayofyear=301` |
| monthofyear | Filters by the month of year. Integer values between 1 and 12. | `resultTime__monthofyear=11` |
| year | Filters resources to just a single year. | `resultTime__year=2019` |

Modifiers

| modifier | description | examples |
| --- | --- | --- |
| none | Format: `key=value`. When no modifier is present, and assuming the parameter key isn't listed in the special selectors table (e.g. it's not limit, offset, etc), then the key is a property that exists on the resources being requested. Only those resources that have a matching value for this property will be returned. | `inDeployment=weather-stations-in-schools` or `isHostedBy=lamppost-32` |
| gt | greater than | `resultTime__gt=2019-01-01` |
| gte | greater than or equal to | `height__gte=10` |
| lt | less than | `latitude__lt=60` |
| lte | less than or equal to | `value__lte=20` |
| contains | For wildcard matching. Only allows one word to be specified, or a phrase if surrounded by double quotes. | `name__contains=west` or `name__contains="Room 2.048"` |
| containsany | Matches if any of the given words or phrases is present. Multiple words or phrases are joined by a `+` symbol. | `name__containsAny=2.060+2.048` or `name__containsAny="Room 2.060"+"Room 2.048"` |
| containsall | Matches only if all of the given words or phrases are present. Multiple words or phrases are joined by a `+` symbol. | `name__containsAll=Room+2.048` |
| exists | Used to check if a resource property exists or not. | `isHostedBy__exists=false` |
| defined | Used to check if a resource property has been defined or not. In most cases it will behave the same as exists; the only time it may differ is if resources can have properties with a value of null, in which case that property would exist, but would not be defined. | `value__defined=false` |
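
The difference between exists and defined only shows up for null values, which the following Python sketch tries to make concrete. The function name `passes_presence_filter` is hypothetical, and resources are assumed to be plain dictionaries decoded from JSON (so a JSON null becomes `None`).

```python
def passes_presence_filter(resource, prop, modifier, wanted):
    """Evaluate prop__exists=wanted or prop__defined=wanted for one resource."""
    if modifier == "exists":
        # The key is present, even if its value is null.
        actual = prop in resource
    elif modifier == "defined":
        # The key is present and its value is not null.
        actual = resource.get(prop) is not None
    else:
        raise ValueError(f"unknown modifier {modifier!r}")
    return actual == wanted


with_null = {"id": "sensor-123", "observes": "air-temperature", "inDeployment": None}
omitted = {"id": "sensor-123", "observes": "air-temperature"}
```

Here `with_null` passes `inDeployment__exists=true` but fails `inDeployment__defined=true`, whereas `omitted` fails both.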

Specific Examples

Time window

The following gets observations with a resultTime between two dates.

GET .../observations?resultTime__gte=2020-01-01T00:00:00.000Z&resultTime__lte=2020-01-01T12:00:00.000Z

The following only gets observations in the year 2020 on weekdays.

GET .../observations?resultTime__dayofweek=mo-fr&resultTime__year=2020

Spatial Bounding box

The following retrieves any observations within a bounding box (in this case around Birmingham city centre).

GET .../observations?latitude__lte=52.495768&latitude__gte=52.464492&longitude__lte=-1.875352&longitude__gte=-1.928481

And now for only observations above 1 m.

GET .../observations?latitude__lte=52.495768&latitude__gte=52.464492&longitude__lte=-1.875352&longitude__gte=-1.928481&height__gt=1

Proximity

The following retrieves any observations within 1000 m from the centre of Birmingham:

GET .../observations?proximitycentre=-1.895007,52.477096&proximityradius=1000

Pagination

Let's say you want all air-temperature observations from a platform called mobile-sensing-van in the year 2019. There are potentially thousands of observations available, so we want to get them in chunks. Our initial request looks like this.

GET .../observations?observedProperty=air-temperature&platform=mobile-sensing-van&resultTime__year=2019&limit=100&offset=0&sortby=resultTime&sortorder=asc

This returns exactly 100 observations, so there may still be more observations to retrieve. We adjust the offset and make the following request:

GET .../observations?observedProperty=air-temperature&platform=mobile-sensing-van&resultTime__year=2019&limit=100&offset=100&sortby=resultTime&sortorder=asc

In this scenario we'd keep incrementing the offset by 100 until we no longer received 100 observations back.
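
A client-side loop for this could look something like the Python sketch below. It is illustrative only: `fetch_all` and `make_fake_endpoint` are hypothetical names, and the fake endpoint simply slices a list in place of a real HTTP request with `limit` and `offset` query string parameters.

```python
def fetch_all(fetch_page, limit=100):
    """Collect every resource by advancing offset until a short page arrives."""
    results = []
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results.extend(page)
        if len(page) < limit:
            # Fewer than `limit` items means there is nothing left to fetch.
            return results
        offset += limit


def make_fake_endpoint(items):
    """Stand-in for an API call; returns the requested slice of items."""
    def fetch_page(limit, offset):
        return items[offset:offset + limit]
    return fetch_page
```

Note that when the total is an exact multiple of the limit, the loop makes one extra request that returns an empty page before stopping.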

EttoreHector · 2020-01-24T12:23:47Z

List of Endpoints

The endpoints suggested by Simon in his examples are:

Deployments

<base_url>/deployments/<deployment_name>
where the response contains the deployment specified by <deployment_name>

<base_url>/deployments
where the response contains a list of all the deployments

Platforms

<base_url>/deployments/<deployment_name>/platforms
where the response contains the list of all the platforms in the specified deployment <deployment_name>

Observations

<base_url>/observations?startDate=<start_date>&endDate=<end_date>
where the response contains a list of ALL the observations recorded between <start_date> and <end_date>

Comments and proposals

1. Would it make sense to query for all the platforms belonging to any deployment, hence having an endpoint such as

<base_url>/platforms ?

Or is it preferable to always specify the deployment as in:

<base_url>/deployments/<deployment_name>/platforms/<platform_name> ?

(either one or both endpoints would have to be added to the list proposed by Simon).

2. If someone wants to retrieve a single platform, should they use

<base_url>/platforms/<platform_name>

or

<base_url>/deployments/<deployment_name>/platforms/<platform_name> ?

(the second endpoint being possibly redundant if the platform name is kept unique across all the deployments)

3. Does it make sense to ask for ALL observations within a specified time window regardless of the sensor that made it, or at least the ObservedProperty it refers to, as Simon suggested in his example?

Should we consider instead a more specific query when it comes to retrieving observations, like:

<base_url>/sensors/<sensor_name>/observations?startDate=<start_date>&endDate=<end_date>

and / or

<base_url>/observedproperty/<property_name>/observations?startDate=<start_date>&endDate=<end_date> ?

4. What would the endpoint for querying a given sensor look like?

<base_url>/sensors/<sensor_name> (assuming unique sensor names across all platforms / systems / deployments)

or

<base_url>/platform/<platform_name>/sensors/<sensor_name> (assuming unique sensor names only within a platform)

or

<base_url>/deployments/<deployment_name>/sensors/<sensor_name> (assuming unique sensor names within an entire deployment)

or else...?

SiBell · 2020-01-24T12:49:04Z

My preference would be that all your suggestions are valid, because some endpoint structures will be better suited to particular clients/frontends than others.

For example, I will be handling much of my authorisation at the deployment level, e.g. only certain users will have admin rights to a particular private deployment. Therefore, when I create a front-end that allows admin users to, for example, remove a sensor from a deployment I will want the deployment ID in the URL. For any URL starting <base_url>/deployments/<deployment_id>/ my API server will verify that the user actually has access rights to this deployment.
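
As a rough sketch of that deployment-level check (not my actual server code): the function names and the `rights` mapping are made up for illustration, and paths are assumed to be relative to <base_url>.

```python
from typing import Dict, Optional, Set


def deployment_id_from_path(path: str) -> Optional[str]:
    """Return the deployment ID if the path starts with /deployments/<deployment_id>."""
    parts = [p for p in path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "deployments":
        return parts[1]
    return None


def has_deployment_access(path: str, user: str, rights: Dict[str, Set[str]]) -> bool:
    """Check a user against a hypothetical mapping of deployment ID -> allowed users."""
    deployment_id = deployment_id_from_path(path)
    if deployment_id is None:
        # Not a deployment-scoped URL; this sketch leaves such paths to other rules.
        return True
    return user in rights.get(deployment_id, set())


# Hypothetical access-rights table.
rights = {"private-deployment": {"alice"}}
```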

Alternatively, the web app that we'll build for the general public to use will be better off using endpoints such as <base_url>/observations or <base_url>/platforms. I just make sure that the observations or platforms returned haven't come from private deployments.

I think we need to pick a small handful of endpoints that we MUST support. The obvious one being <base_url>/observations. Then if we want to support more then we can do so, trying our best to be consistent so that we don't end up with one observatory using /deployments/ and another /deployment/.

EttoreHector · 2020-01-25T22:19:48Z

Thank you, Simon. Everything you say makes sense to me. I only have some reservations about the <base_url>/observations endpoint. If we end up with a lot of sensors (which is surely what we're aiming for), such an endpoint could be hit with huge data retrievals, for example if some of the sensors have a very high reading frequency and the time window is, even mistakenly, too wide.

I see why, for consistency, we may want to include <base_url>/observations in the list of agreed endpoints. However, I think it would be better to agree on something a little more limiting when it comes to observations, like

<base_url>/sensors/<sensor_name>/observation?startDate=<start_date>&endDate=<end_date>

or

<base_url>/observedProperty/<property_name>/observations?startDate=<start_date>&endDate=<end_date>

What do you think?

SiBell · 2020-01-25T23:02:19Z

This is where pagination should come to the rescue. So that even if they make a request that would match millions of observations, we only return a maximum of 1000 (for example). Most databases should have some limit, offset and sort functionality to help with this.
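
A minimal sketch of how a server might cap the requested limit, assuming the example maximum of 1000 mentioned above. The function name and the choice to fall back to the maximum when no limit is given are assumptions, not something we've agreed.

```python
from typing import Optional

MAX_LIMIT = 1000  # example value only; each observatory could choose its own


def effective_limit(requested: Optional[int], max_limit: int = MAX_LIMIT) -> int:
    """Clamp the client's requested limit to the server-side maximum."""
    if requested is None:
        # Assumption: with no limit parameter, serve the maximum page size.
        return max_limit
    if requested < 1:
        raise ValueError("limit must be a positive integer")
    return min(requested, max_limit)
```

The resulting value would then be passed to the database's own limit/offset functionality.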

What we haven't decided on yet is how we tell the user they have hit our maximum limit and presumably provide a URL for them to get the next 1000.

It would be very easy for me to support endpoints such as <base_url>/sensors/<sensor_name>/observation and <base_url>/observedProperty/<property_name>/observations as well as <base_url>/observations, so more than happy to add these to the MUST list.

It's worth saying that a single sensor could in theory upload thousands of observations every day by itself, e.g. if it sampled every second, therefore we'd almost certainly need pagination on these additional endpoints too.

lukeshope · 2020-01-26T12:46:55Z

I worry we're going down the wrong path with a list of endpoints. A REST API shouldn't have a list of endpoints, because it's driven by hypermedia, meaning it doesn't matter what the web addresses are because you follow the links to get there.

There's absolutely nothing wrong with the endpoints you've suggested, it looks sensible as a way of implementation. But if I wanted to file my Platforms under https://api.example.com/silly-sausages then I should be able to.

What we do need is some agreement on how we manage Collections (if we call them that... this might for example be a collection of platforms, thus paginated, as Si refers to) that aren't ObservationCollections. Examples of how we do that would be either hydra:Collection or rdf:Bag.

It's also entirely possible that you might have all your platforms in one API (a lamp post API, say) and all your sensors in another (an air quality API, say) and all your historic observations in another (an observation collection API, say) and they would all just link to each other.

We also need an entrypoint that directs clients to these collections as a starting point. In other words, when I hit https://api.example.com it gives me links to a collection of sensors, a collection of platforms, a collection of observations, etc. It wouldn't need to give me all of those necessarily, you might not have a collection of all observations from all sensors (which could be huge, but might be useful), you might only have collections of observations under each sensor.

In theory, this would/could look something like...

GET https://api.example.com/

{
  "@context": {
    "@base": "https://api.example.com/",
    "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
    "title": "http://purl.org/dc/terms/title",
    "collections": {
      "@id": "uo:EntrypointCollections",
      "@container": "@id"
    }
  },
  "collections": {
    "/sensors": {
      "@type": ["@id", "uo:Collection", "uo:SensorCollection"],
      "title": "All sensors available in Newcastle upon Tyne"
    }
  }
}
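
To illustrate how a client might discover collections from such an entrypoint rather than relying on fixed addresses, here is a small sketch. It does only plain dictionary handling, not full JSON-LD processing, and `collection_urls` is a hypothetical helper name.

```python
from typing import Dict, Optional
from urllib.parse import urljoin

# The entrypoint response from the example above, as a Python dict.
ENTRYPOINT = {
    "@context": {
        "@base": "https://api.example.com/",
        "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
        "title": "http://purl.org/dc/terms/title",
        "collections": {"@id": "uo:EntrypointCollections", "@container": "@id"},
    },
    "collections": {
        "/sensors": {
            "@type": ["@id", "uo:Collection", "uo:SensorCollection"],
            "title": "All sensors available in Newcastle upon Tyne",
        }
    },
}


def collection_urls(doc: Dict) -> Dict[str, Optional[str]]:
    """Map each collection's absolute URL to its title, resolving keys against @base."""
    base = doc["@context"]["@base"]
    return {
        urljoin(base, path): body.get("title")
        for path, body in doc["collections"].items()
    }


links = collection_urls(ENTRYPOINT)
```

A client written this way would keep working if the sensors collection were served from a different path, as long as the entrypoint links to it.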

Is this discussion best split into a new issue? Not sure we're talking about filtering anymore...

EttoreHector · 2020-01-27T09:42:24Z

Thank you, Luke. I agree we need an entry point and then just follow the links to get the resources we want. I guess I was assuming that the structure of the tree that stems from the entry point would be the same for all observatories. This is what I meant by "an agreed list of endpoints".

SiBell · 2020-01-27T13:35:12Z

@LukeSSmith I've created a new issue on Collections and Pagination, as I agree it makes sense to start a new thread for this.

Being able to reach all the endpoints by following links makes perfect sense, but surely there's benefit to keeping some consistency between observatories? E.g. so that any scripts or front-ends that use an observatory's API would work just as well with other observatories' without having to change much more than the base url.

SiBell · 2020-02-03T19:36:00Z

At the risk of getting carried away, I have another two modifiers that would be useful:

__in, e.g. ?inDeployments__in=weather-stations,aq-sensors&observedProperty__in=air-temperature,water-temperature.
__begins, e.g. ?name__begins=michae. So a bit like __contains, but the substring must be at the start. Comes in handy for autocomplete form fields.
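
A rough sketch of how these two modifiers could be evaluated against a single resource. The `matches` helper is hypothetical, and case-sensitive matching for `__begins` is an assumption rather than anything agreed.

```python
from typing import Any, Dict


def matches(resource: Dict[str, Any], key: str, value: str) -> bool:
    """Evaluate one query parameter (e.g. name__begins=michae) against a resource."""
    field, _, modifier = key.partition("__")
    actual = resource.get(field)
    if modifier == "":
        return actual == value
    if modifier == "in":
        # The value is a comma-separated list of acceptable values.
        return actual in value.split(",")
    if modifier == "begins":
        # Assumption: the match is case-sensitive.
        return isinstance(actual, str) and actual.startswith(value)
    raise ValueError(f"unsupported modifier: {modifier}")


observation = {"observedProperty": "air-temperature"}
person = {"name": "michael"}
```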

SiBell · 2020-03-31T09:37:59Z

An __includes modifier would come in handy for selecting resources for which the provided item occurs within an array property.

For example an observation might have a flag property {flag: ['persistence', 'upperbound']}.

Then to query for all observations that have been flagged as breaching a climatic upper bound you can use:

/observations?flag__includes=upperbound
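
In code, the check is just a membership test on the array. The sketch below uses a hypothetical `includes` helper and made-up observations, and treats a missing property as an empty array (an assumption).

```python
from typing import Any, Dict, List


def includes(resource: Dict[str, Any], field: str, item: str) -> bool:
    """True if `item` occurs in the array stored under `field`."""
    values = resource.get(field) or []  # assumption: missing property counts as empty
    return item in values


observations: List[Dict[str, Any]] = [
    {"id": "obs-1", "flag": ["persistence", "upperbound"]},
    {"id": "obs-2", "flag": ["persistence"]},
    {"id": "obs-3"},
]

# Equivalent of /observations?flag__includes=upperbound
flagged = [o["id"] for o in observations if includes(o, "flag", "upperbound")]
```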

SiBell · 2020-04-27T14:42:19Z

I've found myself using a query parameter called search. E.g.

/platforms?search=lamppost

It behaves a little bit like the __contains except it searches across more than one field. In my case it will typically search both the id and the name for any keyword matches. Mentioning it in case it's something others see themselves using and therefore worthy of adding to the docs.
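
Roughly speaking, the behaviour is something like the sketch below. The choice of fields follows what I described (id and name); the case-insensitive substring match and the sample platforms are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List


def search(resources: Iterable[Dict[str, Any]], keyword: str,
           fields: Iterable[str] = ("id", "name")) -> List[Dict[str, Any]]:
    """Return resources where the keyword appears in any of the given fields."""
    needle = keyword.lower()  # assumption: matching is case-insensitive
    fields = tuple(fields)
    return [
        r for r in resources
        if any(needle in str(r.get(f, "")).lower() for f in fields)
    ]


platforms = [
    {"id": "lamppost-12", "name": "Grey Street"},
    {"id": "van-1", "name": "Mobile sensing van"},
    {"id": "p-3", "name": "Lamppost on Quayside"},
]

# Equivalent of /platforms?search=lamppost
hits = [p["id"] for p in search(platforms, "lamppost")]
```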

SiBell · 2020-06-02T11:22:11Z

Another addition, as discussed on the technical call today: not. For when we want to exclude something, or perform the opposite of a filter.

For example:

/observations?unit__not=uo:kelvin

Will exclude observations given in the unit Kelvin.

Another example:

/observations?resultTime__not__gte=2020-01-01

This would be the opposite of resultTime__gte. Although this is a bad example as we could just use resultTime__lt.

We'd also want to be able to provide a comma-separated list e.g:

/observations?unit__not=uo:kelvin,uo:fahrenheit

Although thinking about it, the right way to do this might be in combination with the __in modifier mentioned above, i.e.

/observations?unit__not__in=uo:kelvin,uo:fahrenheit

Because the __in modifier basically implies that the query parameter value will be an array.
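
Here is a rough sketch of how chained modifiers such as __not__in might be parsed. It only handles plain equality and __in, `matches` is a hypothetical helper, and it treats a bare __not as negated equality against the whole value (so the comma-separated case has to go through __not__in).

```python
from typing import Any, Dict


def matches(resource: Dict[str, Any], key: str, value: str) -> bool:
    """Evaluate a parameter such as unit__not or unit__not__in against a resource."""
    field, *modifiers = key.split("__")
    negate = bool(modifiers) and modifiers[0] == "not"
    if negate:
        modifiers = modifiers[1:]
    if len(modifiers) > 1:
        raise ValueError(f"unsupported modifier chain: {key}")
    op = modifiers[0] if modifiers else "eq"
    actual = resource.get(field)
    if op == "eq":
        result = actual == value
    elif op == "in":
        result = actual in value.split(",")
    else:
        raise ValueError(f"unsupported modifier: {op}")
    # __not simply inverts whatever the rest of the filter decided.
    return not result if negate else result


observations = [
    {"id": "a", "unit": "uo:kelvin"},
    {"id": "b", "unit": "uo:celsius"},
    {"id": "c", "unit": "uo:fahrenheit"},
]

not_kelvin = [o["id"] for o in observations if matches(o, "unit__not", "uo:kelvin")]
not_in = [o["id"] for o in observations
          if matches(o, "unit__not__in", "uo:kelvin,uo:fahrenheit")]
```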

aarepuu added the help wanted Extra attention is needed label Oct 18, 2019

aarepuu added this to the MVP functionality milestone Oct 21, 2019

aarepuu pinned this issue Oct 21, 2019

lukeshope added the needs-documenting Need to document the outcome label Dec 3, 2019

EttoreHector unpinned this issue Jan 24, 2020

SiBell mentioned this issue Jan 27, 2020

Collections and Pagination #20

Open

This was referenced Feb 29, 2020

Query string values that mean "not set yet". #15

Closed

URL query strings #10

Closed
