Datasets vs. Catalog relation [RDSCR] #62

Closed
jpullmann opened this issue Jan 18, 2018 · 14 comments
Labels
dcat:Catalog, dcat:CatalogRecord, dcat:Dataset, dcat, due for closing (issue that is going to be closed if there are no objections within 6 days), requirement

Comments

@jpullmann

Datasets vs. Catalog relation [RDSCR]

Clarify the relationships between Datasets and zero, one or multiple Catalogs, e.g. in scenarios of copying, harvesting and aggregation of Dataset descriptions among Catalogs.


Related use cases: Datasets and catalogues [ID35] 
@chris-sepa

Surely this issue only relates to Datasets with a relationship to multiple Catalogs.
Zero relationships - not in a Catalog so not a DCAT Dataset and out of scope.
One relationship - this is what the current DCAT vocabulary describes.
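To make the three cases concrete, here is a minimal sketch. It assumes descriptions are held as plain (subject, predicate, object) tuples and that catalog membership is expressed with `dcat:dataset`; the `ex:` identifiers and the helpers `catalogs_of` and `membership` are hypothetical and not part of DCAT.

```python
# Triples as plain tuples; every ex: name is made up for illustration.
TRIPLES = {
    ("ex:catalogA", "rdf:type", "dcat:Catalog"),
    ("ex:catalogB", "rdf:type", "dcat:Catalog"),
    ("ex:standalone", "rdf:type", "dcat:Dataset"),  # listed in zero catalogs
    ("ex:single", "rdf:type", "dcat:Dataset"),      # listed in one catalog
    ("ex:harvested", "rdf:type", "dcat:Dataset"),   # listed in two catalogs
    ("ex:catalogA", "dcat:dataset", "ex:single"),
    ("ex:catalogA", "dcat:dataset", "ex:harvested"),
    ("ex:catalogB", "dcat:dataset", "ex:harvested"),
}


def catalogs_of(dataset, triples):
    """Return the sorted catalogs that list `dataset` via dcat:dataset."""
    return sorted(s for (s, p, o) in triples if p == "dcat:dataset" and o == dataset)


def membership(triples):
    """Map every declared dcat:Dataset to the catalogs that list it."""
    datasets = {s for (s, p, o) in triples if p == "rdf:type" and o == "dcat:Dataset"}
    return {d: catalogs_of(d, triples) for d in datasets}
```

Calling `catalogs_of("ex:harvested", TRIPLES)` returns both catalogs, which is the multiple-catalog case this issue is mainly about.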

@makxdekkers
Contributor

@chris-sepa In my opinion, it should be perfectly legal to declare something to be a dcat:Dataset without requiring that it is in a dcat:Catalog.

@chris-sepa

@makxdekkers True, anybody can say anything about any thing. Perhaps I should have just said - not in a Catalog so out of scope.
In thinking about this I've realised I've been assuming that an Application Profile is associated with a Catalog and that all Dataset records in that catalog would conform to the same profile. Therefore a standalone Dataset record wouldn't describe the profile it conformed to, so not very useful. However, where data set metadata is harvested from multiple Catalogs they may well have been created to conform to differing profiles. This would lead to a need to associate profile information with a Dataset (or a CatalogRecord?). Alternatively, a Catalog constructed from harvested records from other catalogs could be partitioned into sub-catalogs for each different profile.
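A rough sketch of the partitioning idea, assuming each harvested CatalogRecord points at its dataset with `foaf:primaryTopic` and states its profile with `dct:conformsTo`. The record data, the profile identifiers and the `partition_by_profile` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical harvested catalog records: each one describes a dataset
# (foaf:primaryTopic) and may name the profile it follows (dct:conformsTo).
RECORDS = [
    {"record": "ex:rec1", "foaf:primaryTopic": "ex:ds1", "dct:conformsTo": "ex:profileEU"},
    {"record": "ex:rec2", "foaf:primaryTopic": "ex:ds2", "dct:conformsTo": "ex:profileGeo"},
    {"record": "ex:rec3", "foaf:primaryTopic": "ex:ds3", "dct:conformsTo": "ex:profileEU"},
    {"record": "ex:rec4", "foaf:primaryTopic": "ex:ds4"},  # no profile stated
]


def partition_by_profile(records):
    """Group dataset identifiers into one sub-catalog per declared profile."""
    sub_catalogs = defaultdict(list)
    for rec in records:
        # Records without a stated profile land in an "unknown" bucket.
        profile = rec.get("dct:conformsTo", "unknown")
        sub_catalogs[profile].append(rec["foaf:primaryTopic"])
    return dict(sub_catalogs)
```

Keeping the profile on the record rather than on the dataset means the same dataset could be described under different profiles in different catalogs, which seems to fit the harvesting scenario.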

@agbeltran
Member

@makxdekkers we were looking at this issue and the associated use case that you had proposed, and we were not clear on how DCAT restricts a dataset to belong to a catalog - while the text in the spec is very catalog-centric, the RDF doesn't seem to impose any restrictions; were we missing something? And if there is currently a restriction on a dataset having to belong to a catalog, what did you have in mind for addressing this? Thanks

@makxdekkers
Contributor

@agbeltran No, you didn't miss anything. The point came up in a discussion around the EU DCAT-AP and it was noted that the vocabulary is called the Data Catalog Vocabulary, not the Dataset Description Vocabulary, which seems to imply that a Dataset in the sense of DCAT must be in a Catalog. @chris-sepa made that point in #62 (comment).
In the case of the EU DCAT-AP, it is made clear in the conformance statement that the description of the catalogue is mandatory for conforming implementations, so this issue might be for profiles to address.
It's not a major issue for me, and it would be fine for me to say nothing about it.

@davebrowning added this to the DCAT Backlog milestone Mar 14, 2019

@fellahst

The subclassing of dcat:Catalog from dcat:Dataset is problematic and a mistake in my opinion. It reduces its reusability. While the Catalog class can reuse many properties of Dataset, that does not mean it should be a subclass of Dataset. In Geoplatform, a Catalog can refer to different types of assets (closely matching dcat:Resource), including maps, layers, services, reports etc. We define the term Portfoli as a subclass of dcat:Catalog. Catalog could also be specialized as a Collection, Series or Aggregation of assets, e.g. map series, dataset series, image collection etc.

@smrgeoinfo
Contributor

I think the idea is that a catalog is essentially a dataset in which each item is a resource description. Definition of different subclasses of dcat:Resource would enable catalogs describing different kinds of resources.
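As a small illustration of what the subclassing implies, the sketch below walks the `rdfs:subClassOf` links as they stand in DCAT 2. The single-parent dictionary and the `inferred_types` helper are simplifying assumptions, not a real RDFS reasoner.

```python
# Subclass axioms as stated in the DCAT 2 vocabulary (one parent per class,
# which is a simplification made for this sketch).
SUBCLASS_OF = {
    "dcat:Catalog": "dcat:Dataset",
    "dcat:Dataset": "dcat:Resource",
    "dcat:DataService": "dcat:Resource",
}


def inferred_types(declared_type, subclass_of=SUBCLASS_OF):
    """Follow rdfs:subClassOf links upward and return every implied class."""
    types = [declared_type]
    while types[-1] in subclass_of:
        types.append(subclass_of[types[-1]])
    return types
```

Anything typed as `dcat:Catalog` is therefore also a `dcat:Dataset` and a `dcat:Resource`, so a subclass of `dcat:Catalog` inherits both.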

@fellahst

I am sorry, I have a hard time conceptualizing a Catalog as a dataset. For me, it is more a collection of resources (its parts). A collection of maps (map series) or a collection of services does not have the attributes of a dataset, or collection. I would suggest making dcat:Catalog a subclass of Resource to avoid future problems.

@kcoyle
Contributor

kcoyle commented Oct 29, 2019

However, the definition in the document is: "A collection of data, published or curated by a single agent, and available for access or download in one or more representations." which would include a collection of maps. This would also make a digital library or a digital archive a dataset. This is definitely distinct from the computer science use of the term, which is for a set of files of data points (and would not include a set of documents or digital images).

@dr-shorthair
Contributor

The more general notion of 'dataset' was one of the very first things that we resolved when we started work two years ago. See #64, #98, #351. I have no problem in conceiving of a map collection, or digital library, as a 'dataset', since both items and datasets can be members of datasets.

The special thing about a 'catalog' is that it is primarily a collection of metadata records, which are a kind of data item.

@fellahst

fellahst commented Oct 29, 2019

What about a catalog of physical assets, e.g. trucks or equipment for emergency management? Physical assets could be modeled as a subclass of Resource; they are not datasets (even if they describe metadata about these physical assets). The catalog would be a logical grouping of these assets. It is not obvious to me that this catalog can be seen as a dataset. Based on the current definition of dcat:Dataset, everything seems to fall under this category (including services, because they are metadata about the service). I think we need more clarity about its scope.

@dr-shorthair I conceptualize a digital Map and layer as a representation of a dataset. You can have multiple maps derived from the same dataset. I think Map and Dataset are disjoint concepts. Map and Layer are typically accessible through a service. Is a Map service a subclass of data service? Is every web service a data service?

@dr-shorthair
Contributor

dr-shorthair commented Oct 29, 2019

We intentionally left that door open. While DCAT is about datasets and data-services, the introduction of the dcat:Resource class intentionally provides a potential extension point for the kind of applications you describe. The goal was to reflect the potential for the DCAT model for catalogs to be generalized beyond data, though actually doing any extensions was beyond the scope of DCAT2.

@andrea-perego added the "due for closing" label Oct 29, 2020

@riccardoAlbertoni
Contributor

+1 to close this.

@andrea-perego
Contributor

Closing.
