Dataset aspects [RDSAT] #60

Closed
jpullmann opened this issue Jan 18, 2018 · 9 comments

@jpullmann

Dataset aspects [RDSAT]

Provide recommendations and mechanisms for data providers to describe datasets with a formal description of aspects (e.g. instrument/sensor used, spatial feature, observable property, quantity kind).

Finer-grained semantics will also allow dataset dimensions to be described, and distributions to be described using these semantics: for example, how a dataset is composed of multiple subsets, such as a set of image bands or tiles, or parameterised filtering/subsetting services.

This requirement applies to catalogues of DCAT records, and is thus related to the concept of profiles, which are expected to define classification dimensions (use of controlled vocabularies in mandatory properties).


Related requirements: Profiles listing [RPFL]; Distribution schema [RDIS]
Related use cases: Support associating fine-grained semantics for datasets and resources within a dataset [ID7]; Profile support for input functions [ID46]; Europeana profile ecosystem: representing, publishing and consuming application profiles of the Europeana Data Model (EDM) [ID37]; Summarization/Characterization of datasets [ID33]
@makxdekkers
Contributor

I am wondering how 'general' these aspects are. Some of the ones mentioned in the issue seem to be relevant only to particular types of datasets. A lot of datasets are not the result of sensing. As to dimensions etc., the European StatDCAT-AP (for the description of statistical data) includes properties for dimensions, attributes and measures.

@dr-shorthair
Contributor

dr-shorthair commented Mar 11, 2018

I expect that QB DataStructureDefinition might provide some suitable predicates or patterns for this.
If we can figure out how to use them.

Details of the internal structure and semantics are the essence of QB's DSD capability.
While the overall structure description might go too far for a pure 'discovery' application, the methods that QB uses to describe dimensions or axes of a dataset certainly are relevant, particularly the semantic classification.
This part of QB for example: https://www.w3.org/TR/vocab-data-cube/#dsd-dimensions
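
As a rough illustration of the QB pattern referred to here, the following Python sketch builds the triples that tie a dataset to a data structure definition and classify each dimension with a concept. The `ex:` names and the helper functions are hypothetical; the `qb:` terms are taken from the Data Cube vocabulary. Note that `qb:structure` has domain `qb:DataSet`, so using it directly on a `dcat:Dataset` carries an entailment that a lighter-weight DCAT pattern might want to avoid.

```python
# Sketch only: ex: identifiers and function names are made up for illustration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "qb": "http://purl.org/linked/data/cube#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "ex": "http://example.org/",
}


def describe_dimensions(dataset, dsd, dimensions):
    """Return (subject, predicate, object) triples linking a dataset to a
    QB data structure definition with one component per dimension."""
    triples = [
        (dataset, "rdf:type", "dcat:Dataset"),
        # qb:structure has domain qb:DataSet, so this also implies qb:DataSet.
        (dataset, "qb:structure", dsd),
        (dsd, "rdf:type", "qb:DataStructureDefinition"),
    ]
    for i, (dimension, concept) in enumerate(dimensions):
        spec = f"_:spec{i}"  # blank node for the component specification
        triples += [
            (dsd, "qb:component", spec),
            (spec, "qb:dimension", dimension),
            (dimension, "rdf:type", "qb:DimensionProperty"),
            # qb:concept carries the semantic classification of the dimension.
            (dimension, "qb:concept", concept),
        ]
    return triples


def to_turtle(triples):
    """Serialise the triples as simple one-statement-per-line Turtle."""
    header = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


triples = describe_dimensions(
    "ex:airQuality",
    "ex:airQualityDsd",
    [("ex:refPeriod", "ex:timeConcept"), ("ex:refArea", "ex:areaConcept")],
)
print(to_turtle(triples))
```

For discovery purposes, the `qb:concept` statements are probably the most useful part, since they expose what each axis means without requiring the rest of the cube description.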

@dr-shorthair
Contributor

dr-shorthair commented Mar 15, 2018

Makx comments above that sensor data is not ubiquitous. The SSN Vocabulary anticipated that issue, and defines the term 'Observation' as

Observation - Act of carrying out an (Observation) Procedure to estimate or calculate a value of a property of a FeatureOfInterest.

The definitions of Procedure and Sensor are similarly general:

Procedure - A workflow, protocol, plan, algorithm, or computational method specifying how to make an Observation, create a Sample, or make a change to the state of the world.

Sensor - Device, agent (including humans), or software (simulation) involved in, or implementing, a Procedure.

This means that there would be no inconsistency in using some terms from the SSN/SOSA vocabulary to describe some dataset aspects even when classic physical sensors are not involved. In particular the following might be useful

There are no general domain/range constraints associated with these properties, so no entailment risks AFAICT. These SSN/SOSA properties would likely match DCAT's goal of providing a 'basic framework for describing datasets' - useful for discovery, though well short of the details needed for actual use, which is the spot that QB aims for.
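
To make the idea concrete, here is a minimal Python sketch that emits a JSON-LD description of a non-sensor dataset (a household survey) annotated with SOSA terms. The choice of `sosa:observedProperty`, `sosa:hasFeatureOfInterest` and `sosa:usedProcedure` is an assumption made for this example, since the thread does not preserve which properties were listed; the function name and all `example.org` IRIs are hypothetical.

```python
import json


def dataset_with_aspects(dataset_iri, title, observed_properties,
                         features_of_interest, procedures=()):
    """Build a JSON-LD record for a dcat:Dataset carrying SOSA-based aspects.

    Which SOSA properties to use is an assumption of this sketch.
    """
    record = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "sosa": "http://www.w3.org/ns/sosa/",
        },
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "sosa:observedProperty": [{"@id": iri} for iri in observed_properties],
        "sosa:hasFeatureOfInterest": [{"@id": iri} for iri in features_of_interest],
    }
    if procedures:
        # A Procedure may be a survey protocol or algorithm, not only a device.
        record["sosa:usedProcedure"] = [{"@id": iri} for iri in procedures]
    return record


record = dataset_with_aspects(
    "http://example.org/dataset/household-income",
    "Household income survey",
    observed_properties=["http://example.org/property/disposable-income"],
    features_of_interest=["http://example.org/feature/household"],
    procedures=["http://example.org/procedure/household-survey"],
)
print(json.dumps(record, indent=2))
```

A catalogue could then filter on the observed property or feature of interest without opening the data itself, which is the discovery-level use suggested above.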

@dr-shorthair dr-shorthair added this to the Data aspects - semantics milestone Mar 16, 2018
@dr-shorthair
Contributor

I've begun an inventory of predicates from some well-known vocabularies that might be reusable, or at least might guide us here - see https://github.com/w3c/dxwg/wiki/Data-aspects---semantics

@jpullmann
Author

Comment on behalf of Øystein Åsnes on "Fine-grained semantics for datasets and resources within a dataset [ID7]":

  • use URI references to "main concepts of the dataset"
  • dedicated property in DCAT-AP-NO (URI=dct:subject, domain = dcat:Dataset, range=skos:Concept, cardinality=0..n)
  • SKOS schemata will be used for concept vocabularies
  • for fine-grained semantics (of both, a dataset and distribution) consider Project Open Metadata Schema properties:
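
The DCAT-AP-NO property in the second bullet above (`dct:subject` from a dataset to zero or more SKOS concepts) could be sketched in Python roughly as follows. The catalogue content, the `example.org` IRIs and the function names are all hypothetical; only the property IRI and the 0..n cardinality come from the comment.

```python
from collections import defaultdict

DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Hypothetical catalogue: dataset IRI -> list of concept IRIs (0..n each).
CATALOGUE = {
    "http://example.org/dataset/road-traffic": [
        "http://example.org/concept/vehicle",
        "http://example.org/concept/road-segment",
    ],
    "http://example.org/dataset/population": [
        "http://example.org/concept/person",
    ],
    "http://example.org/dataset/undescribed": [],  # cardinality 0 is allowed
}


def subject_triples(catalogue):
    """Yield one (dataset, dct:subject, concept) triple per main concept."""
    for dataset, concepts in catalogue.items():
        for concept in concepts:
            yield (dataset, DCT_SUBJECT, concept)


def index_by_concept(catalogue):
    """Map each concept IRI to the set of datasets that declare it."""
    index = defaultdict(set)
    for dataset, _, concept in subject_triples(catalogue):
        index[concept].add(dataset)
    return index


index = index_by_concept(CATALOGUE)
print(sorted(index["http://example.org/concept/person"]))
```

Because the range is `skos:Concept`, the same index could in principle be widened along `skos:broader` links, although that is not shown here.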

@dr-shorthair
Contributor

dr-shorthair commented Apr 19, 2018

describedBy - reference to (JSON) schema
describedByType - media/mime type

These are certainly useful, but they do not provide direct access to the dataset semantics. You would have to look inside another artefact (the schema), which you may or may not be able to process.

My goal here is to enable some fine-grained semantics to be directly visible in the discovery metadata, particularly things like variables and target features. The predicates mentioned above might be seen as merely specializations of dcat:theme, but since these names are already standardized in vocabularies from W3C (and elsewhere) they could be adopted straight away. This level of detail wouldn't suit everyone, but can be done in a common way for those who want it.

@andrea-perego
Contributor

Just to note that describedBy is actually coming from POWDER-S (wdrs:describedBy):

https://www.w3.org/2007/05/powder-s#describedby

This property has also been registered in the IANA Link Relations registry, together with its inverse describes, defined in RFC6892.
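
As a small illustration of the link-relation form, the Python sketch below formats an HTTP `Link` header value using the registered `describedby` and `describes` relation names. The helper function, the `example.org` URLs and the media type are assumptions made for the example, not something prescribed in this thread.

```python
def link_header(target, rel, media_type=None):
    """Format a single Link header value for the given target and relation.

    Hypothetical helper; it does no escaping, so it assumes plain ASCII input.
    """
    value = f'<{target}>; rel="{rel}"'
    if media_type:
        value += f'; type="{media_type}"'
    return value


# A dataset pointing at the schema that describes it.
print(link_header("https://example.org/dataset/42/schema.json",
                  "describedby", "application/json"))
# <https://example.org/dataset/42/schema.json>; rel="describedby"; type="application/json"

# The inverse direction: the schema pointing back at the dataset.
print(link_header("https://example.org/dataset/42", "describes"))
# <https://example.org/dataset/42>; rel="describes"
```

The target and `type` parameter play roughly the same roles as the `describedBy` and `describedByType` properties discussed above.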

@dr-shorthair dr-shorthair removed this from the Data aspects - semantics milestone Aug 21, 2018
@davebrowning davebrowning modified the milestones: DCAT CR, DCAT Backlog Mar 14, 2019
@davebrowning davebrowning removed this from To do in DCAT revision Apr 9, 2019
@andrea-perego andrea-perego added this to To do in DCAT revision via automation Sep 26, 2019
@andrea-perego
Contributor

As there has been no further discussion on this issue, I propose to close it.

@andrea-perego andrea-perego added the due for closing Issue that has been addressed and it is going to be closed if there are no objection within 6 days label Oct 29, 2020
@andrea-perego andrea-perego moved this from To do to In progress in DCAT Sprint: Requirements Nov 3, 2020
@andrea-perego
Contributor

As there has been no further discussion on this issue, I propose to close it.

Noting no objections, I am closing this issue.

DCAT revision automation moved this from To do to Done Mar 13, 2021
DCAT Sprint: Requirements automation moved this from In progress to Done Mar 13, 2021
DCAT Sprint: Space and Time automation moved this from In progress to Done Mar 13, 2021
Labels
dcat:Catalog, dcat:CatalogRecord, dcat:Dataset, dcat:Distribution, dcat:theme, dcat, due for closing, provenance, requirement

7 participants