Setting standards for basic querying/filtering #18
Ok here's my stab at this. I've done everything but Pagination and Response Format. Let me know what you think. General rules
Modifiers include:
Time window
Used to filter the data temporally.
Keys
Usage
The dateTime must be in ISO 8601 format. Defaults to UTC unless specified otherwise. None, 1 or 2 of the keys can be provided.
Required validation
Returns error response if:
Examples
Filter by component of date/time
Keys
Usage
Assumes UTC is being used. hourOfDay uses the 24-hour clock. Several of these can be used together.
Required validation
Check values fall within expected range, e.g.
Examples
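As a rough sketch of the range check described above: the key names minuteOfHour, hourOfDay and monthOfYear come from this thread, but the exact bounds below (other than the 24-hour clock) are assumptions, as is the function name.

```python
# Assumed bounds. The thread only states that hourOfDay uses the 24-hour clock.
COMPONENT_RANGES = {
    "minuteofhour": (0, 59),
    "hourofday": (0, 23),
    "monthofyear": (1, 12),
}


def validate_component(key: str, raw_value: str) -> int:
    """Return the value as an int, or raise ValueError if it is not acceptable."""
    bounds = COMPONENT_RANGES.get(key.lower())  # keys are matched case-insensitively
    if bounds is None:
        raise ValueError(f"unknown date/time component: {key}")
    try:
        value = int(raw_value)
    except ValueError:
        raise ValueError(f"{key} must be an integer, got {raw_value!r}") from None
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{key} must be between {low} and {high}, got {value}")
    return value
```

An API server could call this once per date/time component in the query string and turn any ValueError into an error response.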
Spatial window
Keys
Usage
Latitudes and longitudes are given in the WGS84 datum. Height is in metres above or below the WGS84 reference ellipsoid (same as GeoJSON). None, 1 or 2 height keys can be used in a request. None, 1 or 2 latitude keys can be used in a request. None or 2 longitude keys can be used in a request.
Required validation
Examples
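A minimal sketch of how the "none or 2 longitude keys" rule and the coordinate ranges might be validated. The key names used here (minLatitude, maxLatitude, minLongitude, maxLongitude) are hypothetical placeholders, not the keys proposed in this thread.

```python
LAT_KEYS = ("minLatitude", "maxLatitude")    # hypothetical key names
LON_KEYS = ("minLongitude", "maxLongitude")  # hypothetical key names


def _in_range(raw, low, high):
    """True if raw parses as a number within [low, high]."""
    try:
        return low <= float(raw) <= high
    except ValueError:
        return False


def validate_spatial_window(params):
    """Return a list of error messages; an empty list means the window is valid."""
    errors = []
    # Latitude keys may appear singly, but longitude keys must come as a pair.
    if sum(key in params for key in LON_KEYS) == 1:
        errors.append("provide both longitude keys or neither")
    for key in LAT_KEYS:
        if key in params and not _in_range(params[key], -90, 90):
            errors.append(f"{key} must be a number between -90 and 90")
    for key in LON_KEYS:
        if key in params and not _in_range(params[key], -180, 180):
            errors.append(f"{key} must be a number between -180 and 180")
    return errors
```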
Point and radius
Keys
Usage
Allows the user to find all resources (e.g. Platforms, observations, etc.) within a given distance of a point. The proximityCentre is the centre given in the form Longitude and latitude are in WGS84. Height is in metres. proximityRadius is the distance from the
Required validation
Return error response if:
Examples
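To illustrate what a point-and-radius filter has to compute, here is a sketch using the haversine formula. It treats the Earth as a sphere, which is an approximation rather than a true WGS84 ellipsoid calculation, and it ignores height. The "lon" and "lat" field names on each resource are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # min() guards against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def within_radius(resources, centre, radius_m):
    """Keep resources whose position lies within radius_m of centre (lon, lat)."""
    lon, lat = centre
    return [r for r in resources if haversine_m(lon, lat, r["lon"], r["lat"]) <= radius_m]
```

In practice a spatial database index would usually do this work; the sketch only shows the semantics of the filter.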
Equality filter
Keys
Depends on the resource.
Usage
Certain resources will be filterable by certain attributes.
Required validation
Examples
If, for example, you needed to find people whose hair colour is brown then your request might look like this:
More examples:
Thresholds
Keys
Depends on the resource.
Usage
Applies modifiers to keys that are specific to the resource being queried.
Required validation
Examples
Limit
Keys
Usage
For endpoints returning a collection of resources this parameter will limit the number of resources returned. Can be used in combination with the sortBy and sortOrder keys to get just the "last n" or "first n" resources in the collection.
Required validation
Examples
Sort
Keys
Usage
For endpoints returning a collection of resources this parameter will sort the resources returned. Sorts both numerical fields and strings (i.e. alphabetically). Can be used in combination with the limit key to get just the "last n" or "first n" resources in the collection. sortOrder defaults to asc if the sortBy key is provided without sortOrder.
Required validation
Examples
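A small in-memory sketch of how sortBy, sortOrder and limit could combine to give the "last n" resources. The function name and the dictionary-shaped resources are assumptions; a real implementation would normally push this down to the database.

```python
def sort_and_limit(resources, sort_by=None, sort_order="asc", limit=None):
    """Sort by a field (asc by default), then keep at most `limit` resources."""
    if sort_order not in ("asc", "desc"):
        raise ValueError("sortOrder must be 'asc' or 'desc'")
    if limit is not None and limit < 0:
        raise ValueError("limit must not be negative")
    result = list(resources)  # copy so the caller's list is left untouched
    if sort_by is not None:
        result.sort(key=lambda r: r[sort_by], reverse=(sort_order == "desc"))
    if limit is not None:
        result = result[:limit]
    return result
```

For example, `sort_and_limit(observations, sort_by="resultTime", sort_order="desc", limit=2)` would return the two most recent observations, provided the timestamps share one sortable format.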
|
Good work Simon. I agree with most of it. There are a couple of things I would add/specify.
Filter by component of date/time
Additionally, ranges and comma-separated lists should also apply to other filters that are numeric. |
Here's what I think for pagination.
Pagination
Keys
Usage
Used for pagination of results. Can be used in combination with the sortBy and sortOrder keys.
Required validation
Examples
Not completely sure if we should make limit compulsory when using offset or just use default limit=10 when not specified? |
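To make the two options in that question concrete, here is a sketch of the "default limit" behaviour. The default of 10 is the value floated above, not an agreed standard, and the function name is an assumption.

```python
DEFAULT_LIMIT = 10  # the default suggested in the comment above, not an agreed value


def paginate(resources, offset=0, limit=None):
    """Return one page of resources, falling back to DEFAULT_LIMIT when limit is omitted."""
    if limit is None:
        limit = DEFAULT_LIMIT
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    return resources[offset:offset + limit]
```

Making limit compulsory would instead mean raising an error in the `limit is None` branch whenever an offset is supplied.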
I agree with all of this, and also with Aare's comments on days of the week and ranges. Thanks both. In the interests of making this more widely applicable, I think it would be worth nailing down what the general case is. Specifically:
As an example:
{
"@type": "Sensor",
[...],
"madeObservation": {
"@type": "ObservationCollection",
"member": [{
"@type": "Observation",
"hasResult": {
"@type": "Temperature",
"value": 12.0
},
"resultTime": "2019-10-25T15:14:00Z"
}]
}
}
In the above example:
|
I think we might also have to consider, for the sake of search functionality...
Wildcard matching
Keys
Usage
Validation
Examples
Disclaimer: The above is similar to how it's already been implemented in some Newcastle APIs, happy to look at alternatives |
Will case be considered? can we have |
Personally, I don't think case sensitivity is necessary. Would be good to know if people feel strongly the other way. |
Oh boy, this gets "fun" quickly! My thoughts:
|
Personally I'm happy with all of the above. I think enough time has passed, and we should now consider writing this up into the standards doc, and schematising the query parameters etc. Any objections? |
None from me!
|
Aye let's get it written up. Let me know if you want me to do any of it. Looks like we can click "edit" on any of the posts above, so should be relatively quick to copy the markdown over into the working document. |
One more to add to this list before we get this written up, which follows on from my previous issue. Can we have an exists condition? E.g. To get a list of all sensors not yet hosted on a platform:
Or perhaps all the observations for which the featureOfInterest is defined:
Just reads a bit nicer than our previous isDefined suggestion. |
No inherent problem with |
Good point. Guess this depends on whether we're showing null values to the user or not. E.g.
vs.
I was veering towards the latter, in which case any null values that may exist in the backend database don't "exist" to the end user and therefore __exists on its own would suffice. However, if there's merit in showing null values then my preference would actually be to stick with just __isDefined and drop __exists. Interested to hear people's thoughts on this. |
The only place I can see a clear rationale for having I admit I can't think of any other places where Maybe there isn't a problem in that case, and we should just allow Does that work? |
Actually, I suppose it should be __exists and __defined for consistency? |
Yep works for me, and yes |
Even in the case of an alarm, wouldn't boolean values work without the need
to introduce null values?
|
Perhaps, if the alarm were binary. But there might be an enumeration of alarms presented as an array, for example:
[
"https://example.org/alarm/low-temperature",
"https://example.org/alarm/no-signal"
]
In the above case, either an empty array or I'm not suggesting any of this is the right way to do it, just trying to retain flexibility as much as possible. Open to being convinced otherwise :-) |
I've made some progress implementing this into a JS library, but it's not quite ready to share yet. If you get a chance to look at transforming the above mess into a simple HTML table for the actual document, I'd really appreciate it. |
Sure, I can wack these in a HTML table. Might have to be 2 tables then a few specific examples: Table 1: Special keys e.g.
Table 2: Modifiers e.g.
Special Examples More detailed description of how to query with spatial windows, time windows, pagination, by proximity, etc. |
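I'm in the process of putting all parameters in a table now, and I'm wondering if the time-based parameters, i.e. minuteofhour, hourofday, etc., should actually be "modifiers". E.g. change:
monthofyear=10
for:
resultTime__monthofyear=10
Reason:
1. It's more consistent with how we apply other time-based filters, e.g. resultTime__gte=2019.
2. If you have a resource, e.g. observations, that has more than one time-based property, e.g. an observation might have a timestamp for the time of measurement (resultTime) and one for when it arrived at the server (arrivalTime), then this approach lets you choose which one to filter by.
The upshot is that the only special parameter keys we're left with are those for pagination, i.e. limit, offset, sortorder, sortby, and those for a circular bounding area: proximitycentre and proximityradius. This is no bad thing. Any objections? |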
It makes sense.
Just one (likely silly) question. Are we considering applying multiple filters to the same parameter in the same query? If we are, what would the syntax be, e.g. resultTime__gte=2017&resultTime__monthofyear=10 ?
|
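No objections from me, this sounds like a really sensible idea, and would provide options for granular filtering on other date-time based data too in its generic form.
I think Ettore's suggestion is right, that would be a valid query for results in October 2017, 2018, 2019 etc., with query constraints being additive.
The other aspect to consider is multiple 'modifiers', which I suggest we allow but don't mandate as a minimum. I'm thinking for example resultTime__monthOfYear__gte=10 for October through December.
Perhaps based on the above we need to clarify the terminology slightly. resultTime is a selector (picking a specific value within the JSON response), monthOfYear is a sub-selector (picking a part of that value), and gte is a modifier? That way the order would always be selector, sub-selector, modifier, and resultTime__gte__monthOfYear would be invalid. Does that make sense? |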
It makes sense.
|
I agree that Ettore's suggestion is right. Happy to allow sub-selectors too. I'll add it to the docs. I'm struggling to think of a use-case other than time-based selectors, but it's definitely worth having as an option. |
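A sketch of how a server might parse keys in the selector, sub-selector, modifier order discussed above, so that resultTime__monthOfYear__gte is accepted and resultTime__gte__monthOfYear is rejected. The sets of recognised sub-selectors and modifiers are assumptions for illustration: only gte and exists are modifiers named in this thread, and gt, lt and lte are guesses.

```python
from typing import NamedTuple, Optional

SUB_SELECTORS = {"minuteofhour", "hourofday", "monthofyear"}
MODIFIERS = {"gt", "gte", "lt", "lte", "exists"}  # gt/lt/lte are assumed, not from the thread


class ParsedKey(NamedTuple):
    selector: str
    sub_selector: Optional[str]
    modifier: Optional[str]


def parse_key(key: str) -> ParsedKey:
    """Split 'selector[__subselector][__modifier]' and enforce that ordering."""
    parts = key.split("__")
    selector = parts[0]
    rest = [p.lower() for p in parts[1:]]  # matched case-insensitively
    if not selector or len(rest) > 2:
        raise ValueError(f"malformed query key: {key}")
    sub_selector = modifier = None
    if len(rest) == 2:
        sub_selector, modifier = rest
    elif len(rest) == 1:
        if rest[0] in SUB_SELECTORS:
            sub_selector = rest[0]
        else:
            modifier = rest[0]
    if sub_selector is not None and sub_selector not in SUB_SELECTORS:
        raise ValueError(f"unknown sub-selector in {key}")
    if modifier is not None and modifier not in MODIFIERS:
        raise ValueError(f"unknown modifier in {key}")
    return ParsedKey(selector, sub_selector, modifier)
```

Because constraints are additive, a query such as resultTime__gte=2017&resultTime__monthofyear=10 would simply be parsed into two ParsedKey entries that must both hold.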
Query string Parameters
Query string parameters allow greater control over the resources returned when making a request. They are particularly useful when making GET requests. For example, the following request doesn't have any query string parameters:
Whereas the following does:
The latter lets us filter the observations returned to just those made by the sensor with id: thermometer-abc123. This simple example has the form:
i.e. We also accept more complex query string parameters of the form:
The modifier allows you to perform more than just an equality filter. The following example has no modifier:
This would only get observations recorded at that exact millisecond, but what if you wanted all observations since the start of 2020, well that's where a modifier (in this case
The modifier always follows a double underscore:
We also have sub-selectors, which focus on a specific component of a selector. They're particularly useful for dealing with timestamps. The format is as follows:
For example:
Where You can also use a subselector in combination with a modifier using the following format:
e.g.
This would retrieve any observations with a resultTime in October, November or December.
N.B. Not every observatory will support all of these formats, and each endpoint may only have a small number of query string parameters it accepts. However, when available, this is the format each observatory will abide by.
N.B. Query string selectors, sub-selectors, modifiers and values are case-insensitive unless specifically defined otherwise.
Special selectors
Typically the selector is a property of the resource being returned, e.g. resultTime or madeBySensor. However, there are some special selectors that provide further functionality. They are listed in the following table.
Sub-selectors
N.B. Sub-selectors that deal with times and dates assume that the timezone is UTC.
Modifiers
Specific Examples
Time window
The following gets observations with a resultTime between two dates.
The following only gets observations in the year 2020 on weekdays.
Spatial Bounding box
The following retrieves any observations within a bounding box (in this case around Birmingham city centre).
And now for only observations above 1 m.
Proximity
The following retrieves any observations within 1000 m of the centre of Birmingham:
Pagination
Let's say you want all air-temperature observations from a platform called mobile-sensing-van in the year 2019. There are potentially thousands of observations available, so we want to get them in chunks. Our initial request looks like this.
This returns exactly 100 observations, which suggests there may be more to retrieve, so we adjust the offset and make the following request:
In this scenario we'd keep incrementing the offset by 100 until we no longer received 100 observations back. |
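The offset loop described above can be sketched as follows. `fetch_page` is a hypothetical callable standing in for the HTTP request (it takes an offset and a limit and returns a list of observations), so the example runs without a network.

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int, int], list], page_size: int = 100) -> list:
    """Keep requesting pages until one comes back with fewer than page_size items."""
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
    return results


# Stand-in for a real API call: 250 fake observations served in slices.
_fake_observations = list(range(250))


def fake_fetch(offset: int, limit: int) -> list:
    return _fake_observations[offset:offset + limit]
```

Note that when the total is an exact multiple of the page size, this approach costs one extra request that returns an empty page.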
List of Endpoints
The endpoints suggested by Simon in his examples are:
Deployments
Platforms
Observations
Comments and proposals
1. Would it make sense to query for all the platforms belonging to any deployment, hence having an endpoint such as
Or is it preferable to always specify the deployment as in:
(either one or both endpoints would have to be added to the list proposed by Simon).
2. If someone wants to retrieve a single platform, should they use
or
(the second endpoint being possibly redundant if the platform name is kept unique across all the deployments)
3. Does it make sense to ask for ALL observations within a specified time window regardless of the sensor that made them, or at least the ObservedProperty they refer to, as Simon suggested in his example? Should we consider instead a more specific query when it comes to retrieving observations, like:
and / or
4. What would the endpoint for querying a given sensor look like?
or
or
or else...? |
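My preference would be that all your suggestions are valid, because some endpoint structures will be better suited to particular clients/frontends than others.
For example, I will be handling much of my authorisation at the deployment level, e.g. only certain users will have admin rights to a particular private deployment. Therefore, when I create a front-end that allows admin users to, for example, remove a sensor from a deployment I will want the deployment ID in the URL. For any URL starting <base_url>/deployments/<deployment_id>/ my API server will verify that the user actually has access rights to this deployment.
Alternatively, the web app that we'll build for the general public to use will be better off using endpoints such as <base_url>/observations or <base_url>/platforms. I just make sure that the observations or platforms returned haven't come from private deployments.
I think we need to pick a small handful of endpoints that we MUST support. The obvious one being <base_url>/observations. Then if we want to support more then we can do so, trying our best to be consistent so that we don't end up with one observatory using /deployments/ and another /deployment/. |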
Thank you, Simon. All that you say makes sense to me. I only have some reservations about the <base_url>/observations endpoint. If we end up having a lot of sensors (which is something we surely aim for), such an endpoint could potentially be loaded with huge data retrievals (if some of the sensors have a very high reading frequency and the time window is, even mistakenly, too wide).
I see why, for consistency, we may want to include <base_url>/observations among the list of endpoints agreed upon. However, I think it would be better to agree on something that is a little bit more limiting when it comes to observations, like
<base_url>/sensors/<sensor_name>/observation?startDate=<start_date>&endDate=<end_date>
or
<base_url>/observedProperty/<property_name>/observations?startDate=<start_date>&endDate=<end_date>
What do you think?
|
This is where pagination should come to the rescue, so that even if they make a request that would match millions of observations, we only return a maximum of 1000 (for example). Most databases should have some limit, offset and sort functionality to help with this. What we haven't decided on yet is how we tell the user they have hit our maximum limit and presumably provide a URL for them to get the next 1000. It would be very easy for me to support endpoints such as <base_url>/sensors/<sensor_name>/observation and <base_url>/observedProperty/<property_name>/observations as well as <base_url>/observations, so I'm more than happy to add these to the MUST list. It's worth saying that a single sensor could in theory upload thousands of observations every day by itself, e.g. if it sampled every second, therefore we'd almost certainly need pagination on these additional endpoints too. |
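Since how to signal the maximum limit is still undecided, the following is only one possible sketch: cap the requested limit server-side and compute a URL for the next page from the limit and offset keys. The cap of 1000 is the example figure above, and the function names are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MAX_LIMIT = 1000  # example cap from the comment above, not an agreed value


def effective_limit(requested):
    """Apply the server-side cap; a missing limit falls back to the cap itself."""
    return MAX_LIMIT if requested is None else min(requested, MAX_LIMIT)


def next_page_url(url, returned_count):
    """Return the URL for the next page, or None if this page was not full."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # note: repeated keys would be collapsed here
    limit = effective_limit(int(params["limit"]) if "limit" in params else None)
    if returned_count < limit:
        return None
    offset = int(params.get("offset", 0))
    params["limit"] = str(limit)
    params["offset"] = str(offset + limit)
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Where that URL would be placed in the response (a header, or a property of the collection) is a separate question that this sketch does not address.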
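I worry we're going down the wrong path with a list of endpoints. A REST API shouldn't have a list of endpoints, because it's driven by hypermedia, meaning it doesn't matter what the web addresses are because you follow the links to get there.
There's absolutely nothing wrong with the endpoints you've suggested; they look sensible as a way of implementation. But if I wanted to file my Platforms under https://api.example.com/silly-sausages then I should be able to.
What we do need is some agreement on how we manage Collections (if we call them that... this might for example be a collection of platforms, thus paginated, as Si refers to) that aren't ObservationCollections. Examples of how we do that would be either hydra:Collection (https://www.hydra-cg.com/spec/latest/core/#collections) or rdf:Bag (https://www.w3.org/TR/rdf-schema/#ch_bag).
It's also entirely possible that you might have all your platforms in one API (a lamp post API, say) and all your sensors in another (an air quality API, say) and all your historic observations in another (an observation collection API, say) and they would all just link to each other.
We also need an entrypoint that directs clients to these collections as a starting point. In other words, when I hit https://api.example.com it gives me links to a collection of sensors, a collection of platforms, a collection of observations, etc. It wouldn't need to give me all of those necessarily: you might not have a collection of all observations from all sensors (which could be huge, but might be useful), you might only have collections of observations under each sensor.
In theory, this would/could look something like...
GET https://api.example.com/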
{
"@context": {
"@base": "https://api.example.com/",
"uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
"title": "http://purl.org/dc/terms/title",
"collections": {
"@id": "uo:EntrypointCollections",
"@container": "@id"
}
},
"collections": {
"/sensors": {
"@type": ["@id", "uo:Collection", "uo:SensorCollection"],
"title": "All sensors available in Newcastle upon Tyne"
}
}
}
Is this discussion best split into a new issue? Not sure we're talking about filtering anymore... |
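To show what "following the links" from such an entrypoint could mean for a client, here is a sketch that resolves each collection path against @base. It treats the response as plain JSON rather than running a full JSON-LD processor, which is a simplification, and the function name is an assumption.

```python
from urllib.parse import urljoin

# The entrypoint response from the example above, as a Python dict.
ENTRYPOINT = {
    "@context": {
        "@base": "https://api.example.com/",
        "uo": "https://urbanobservatory.github.io/standards/vocabulary/latest/",
        "title": "http://purl.org/dc/terms/title",
        "collections": {"@id": "uo:EntrypointCollections", "@container": "@id"},
    },
    "collections": {
        "/sensors": {
            "@type": ["@id", "uo:Collection", "uo:SensorCollection"],
            "title": "All sensors available in Newcastle upon Tyne",
        }
    },
}


def collection_urls(entrypoint: dict, wanted_type: str) -> list:
    """Return absolute URLs of collections whose @type includes wanted_type."""
    base = entrypoint["@context"]["@base"]
    return [
        urljoin(base, path)
        for path, body in entrypoint["collections"].items()
        if wanted_type in body["@type"]
    ]
```

A client written this way looks up the sensor collection by its type, so it would keep working if an observatory chose a different path for it.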
Thank you, Luke.
I agree we need an entry point and then just follow the links to get the resources we want. I guess I was assuming that the structure of the tree that stems from the entry point would be the same for all observatories. This is what I meant by "an agreed list of endpoints".
|
@LukeSSmith I've created a new issue on Collections and Pagination, as I agree it makes sense to start a new thread for this. Being able to reach all the endpoints by following links makes perfect sense, but surely there's benefit to keeping some consistency between observatories? E.g. so that any scripts or front-ends that use an observatory's API would work just as well with other observatories' without having to change much more than the base url. |
At the risk of getting carried away, I have another two modifiers that would be useful:
|
An For example an observation might have a flag property Then to query for all observations that have been flagged as breaching a climatic upper bound you can use:
|
I've found myself using a query parameter called
It behaves a little bit like the |
Another addition, as discussed on the technical call today: For example:
Will exclude observations given in the unit Kelvin. Another example:
This would be the opposite of We'd also want to be able to provide a comma-separated list e.g:
Although thinking about it, the right way to do this might be in combination with the
Because the |
Key parameters are as follows:
Two more we didn't discuss in the meeting, but might be worthy of adding:
Please provide suggestions for how to do each and we'll pick a favourite for each.