Dataset type [RDST] #64

Closed
jpullmann opened this issue Jan 18, 2018 · 17 comments

Comments

@jpullmann
Contributor

commented Jan 18, 2018

Dataset type [RDST]

Provide a mechanism to indicate the type of data being described and recommend vocabularies to use given the dataset type indicated.

Providing examples of scope will provide guidance without being unnecessarily restrictive. The key requirement is interoperability, achieved by using standardised vocabulary terms. It is unclear whether a canonical registry is required or whether communities should constrain the choice via DCAT profiles.


Related requirements: Dataset aspects [RDSAT]
Related use cases: Scope or type of dataset with a DCAT description [ID8]; Modelling resources different from datasets [ID20]
@dr-shorthair

Contributor

commented Jan 30, 2018

Some existing type vocabularies:

DCMI Type vocabulary - http://dublincore.org/documents/dcmi-terms/#section-7

ISO 19115 Scope Code vocabulary - http://registry.it.csiro.au/def/isotc211/MD_ScopeCode (see also https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries#MD_ScopeCode)

PARSE.Insight content-types recommended for Re3data - http://www.re3data.org/schema (see also http://gfzpublic.gfz-potsdam.de/pubman/item/escidoc:1397899, item 15, contentType)

The last one takes some burrowing to find inside a PDF, so I've copied the contents of the key table here:

15 contentType

PARSE.Insight type | Examples of File Formats

Standard office documents | text documents, spreadsheets, presentations
Networkbased data | websites, email, chat history, etc.
Databases | DBASE, MS Access, Oracle, MySQL, etc.
Images | JPEG, JPEG2000, GIF, TIF, PNG, SVG, etc.
Structured graphics | CAD, CAM, 3D, VRML, etc.
Audiovisual data | WAVE, MP3, MP4, Flash, etc.
Scientific and statistical data formats | SPSS, FITS, GIS, etc.
Raw data | device specific output
Plain text | TXT in various encodings
Structured text | XML, SGML, etc.
Archived data | ZIP, RAR, JAR, etc.
Software applications | modelling tools, editors, IDE, compilers, etc.
Source code | scripting, Java, C, C++, Fortran, etc.
Configuration data | parameter settings, logs, library files
Other | -

@dr-shorthair

Contributor

commented Jan 30, 2018

Interesting that GeoDCAT-AP models all services as types of dcat:Catalog [1].
In DCAT 1.0 it looks like it was expected that dcat:Distribution would be used -
from https://w3c.github.io/dxwg/dcat/#vocabulary-overview

  • dcat:Distribution represents an accessible form of a dataset as for example a downloadable file, an RSS feed or a web service that provides the data.

from https://w3c.github.io/dxwg/dcat/#class-distribution

  • a specific available form of a dataset. Each dataset might be available in different forms, these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed

[1] - https://www.w3.org/TR/dcat-ucr/#ID20

@andrea-perego

Contributor

commented Jan 30, 2018

@dr-shorthair said:

Interesting that GeoDCAT-AP models all services as types of dcat:Catalog [1].
In DCAT 1.0 it looks like it was expected that dcat:Distribution would be used -

I think these two approaches concern two different use cases, and they are not mutually exclusive (GeoDCAT-AP uses both).

In DCAT, services (other than dcat:Catalog) are not first-class citizens; they are considered only indirectly (via dcat:Distribution), and only if they give access to data concerning a given dataset record.

This is also how GeoDCAT-AP models services if they are linked to from distributions, as shown in UC-18.

The other use case is about having records on services as standalone entities (which in any case give access to data). This includes information such as who the service publisher is and which type of service it is (discovery, view, download, transformation, etc.).

A clarification about how GeoDCAT-AP models this, @dr-shorthair: GeoDCAT-AP uses dcat:Catalog only for catalogue/discovery services; all the other ones are modelled with dctype:Service.

@dr-shorthair

Contributor

commented Jan 30, 2018

@andrea-perego said:

GeoDCAT-AP uses dcat:Catalog only for catalogue/discovery services;

OK, good.

@andrea-perego

Contributor

commented Feb 1, 2018

@dr-shorthair said:

Some existing type vocabularies: [...]

I am also including here the resource types used in DataCite, along with the corresponding Dublin Core and DCAT classes (where available), as documented in the DataCite-to-DCAT-AP mapping exercise we carried out at JRC:

DataCite | Dublin Core | DCAT

Audiovisual | dctype:MovingImage | dcat:Dataset
Collection | dctype:Collection | dcat:Dataset
Dataset | dctype:Dataset | dcat:Dataset
Event | dctype:Event | -
Image | dctype:Image | dcat:Dataset
InteractiveResource | dctype:InteractiveResource | dcat:Dataset
Model | - | dcat:Dataset
PhysicalObject | dctype:PhysicalObject | -
Service | dctype:Service | -
Software | dctype:Software | dcat:Dataset
Sound | dctype:Sound | dcat:Dataset
Text | dctype:Text | dcat:Dataset
Workflow | - | dcat:Dataset

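For anyone wanting to apply the DataCite mapping above in code, a minimal Python sketch (standard library only) can encode the table as a lookup dictionary and expand the prefixed names to full URIs. The names `DATACITE_MAPPING` and `map_datacite_type` are hypothetical; only the DCMI Type (`http://purl.org/dc/dcmitype/`) and DCAT (`http://www.w3.org/ns/dcat#`) namespace URIs are standard.

```python
from typing import Dict, Optional, Tuple

DCTYPE = "http://purl.org/dc/dcmitype/"  # DCMI Type vocabulary namespace (dctype:)
DCAT = "http://www.w3.org/ns/dcat#"      # DCAT namespace (dcat:)

# Hypothetical lookup table: DataCite resource type ->
# (DCMI Type local name, DCAT class local name), transcribed from the
# DataCite / Dublin Core / DCAT table; None marks an empty cell.
DATACITE_MAPPING: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "Audiovisual": ("MovingImage", "Dataset"),
    "Collection": ("Collection", "Dataset"),
    "Dataset": ("Dataset", "Dataset"),
    "Event": ("Event", None),
    "Image": ("Image", "Dataset"),
    "InteractiveResource": ("InteractiveResource", "Dataset"),
    "Model": (None, "Dataset"),
    "PhysicalObject": ("PhysicalObject", None),
    "Service": ("Service", None),
    "Software": ("Software", "Dataset"),
    "Sound": ("Sound", "Dataset"),
    "Text": ("Text", "Dataset"),
    "Workflow": (None, "Dataset"),
}


def map_datacite_type(resource_type: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (DCMI Type URI, DCAT class URI) for a DataCite resource type.

    Either element is None when the table has no counterpart; raises
    KeyError for resource types not listed in the table.
    """
    dc_name, dcat_name = DATACITE_MAPPING[resource_type]
    dc_uri = DCTYPE + dc_name if dc_name else None
    dcat_uri = DCAT + dcat_name if dcat_name else None
    return dc_uri, dcat_uri


if __name__ == "__main__":
    print(map_datacite_type("Audiovisual"))
```

Returning `None` rather than guessing keeps the gaps in the table (e.g. Event, Service) visible to the caller.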
@kcoyle

Contributor

commented Feb 1, 2018

If you need others, here's a list of about 100 different intellectual resource types:

http://id.loc.gov/vocabulary/marcgt.html

Many are oriented towards physical documents, but some could be useful, such as:

  • article
  • map
  • remote sensing image

There's probably a need to reach out to other lists as well, depending on how precise one wishes to be.

@dr-shorthair

Contributor

commented Feb 1, 2018

dct:type is the standard predicate for 'soft-typing' a resource, i.e. providing a link to a classifier using a predicate other than rdf:type. The existential restriction owl:someValuesFrom may be used to associate dct:type with dcat:Dataset.
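To make the hard-typing vs soft-typing distinction concrete, here is a minimal sketch that emits N-Triples using only the Python standard library (no RDF library assumed). The resource is hard-typed with rdf:type dcat:Dataset and soft-typed with dct:type pointing at a DCMI Type term. The dataset IRI and the function names are hypothetical; the OWL restriction itself is not shown.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # rdf:type
DCT_TYPE = "http://purl.org/dc/terms/type"                      # dct:type
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"              # dcat:Dataset


def triple(s: str, p: str, o: str) -> str:
    """Serialize one triple whose three terms are all IRIs as an N-Triples line."""
    return f"<{s}> <{p}> <{o}> ."


def soft_typed_dataset(dataset_iri: str, classifier_iri: str) -> List[str]:
    """Hypothetical helper: describe a dataset with both kinds of typing."""
    return [
        # Hard typing: membership of the RDF class dcat:Dataset.
        triple(dataset_iri, RDF_TYPE, DCAT_DATASET),
        # Soft typing: a link to a term in an external classifier vocabulary.
        triple(dataset_iri, DCT_TYPE, classifier_iri),
    ]


if __name__ == "__main__":
    for line in soft_typed_dataset(
        "http://example.org/dataset/1",          # hypothetical dataset IRI
        "http://purl.org/dc/dcmitype/Dataset",   # DCMI Type term as classifier
    ):
        print(line)
```

Because the classifier is just an object of dct:type, any of the vocabularies listed in this thread can be swapped in without changing the DCAT class hierarchy.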

@dr-shorthair

Contributor

commented Feb 5, 2018

PR #103 provides a hook for a 'soft-typing' classifier.
It needs to be accompanied by a description of how to use it and which vocabularies to use.

@makxdekkers

Contributor

commented Feb 7, 2018

I don't think DCAT needs to prescribe which vocabularies to use. The most appropriate typology may be domain-dependent. I agree that some form of guidance might be useful.

@nicholascar

Contributor

commented Feb 7, 2018

The MD_ScopeCode code list from ISO 19115 includes:

dataset-like things:
  • aggregate
  • application
  • collection
  • coverage
  • dataset
  • document
  • model
  • nonGeographicDataset
  • product
  • repository
  • sample
  • series
  • software
  • tile

Other, non-dataset-like things:
  • attribute
  • attributeType
  • collectionHardware
  • collectionSession
  • dimensionGroup
  • feature
  • featureType
  • propertyType
  • fieldSession
  • service
  • metadata
  • initiative

This listing is clearly similar to the DataCite one, but adds properties, features, etc. In the agencies I have previously worked for, we tended to use only the dataset-like types.

@dr-shorthair

Contributor

commented Feb 8, 2018

Resolved in meeting https://www.w3.org/2018/02/07-dxwgdcat-minutes
Partially implemented in #108
More documentation and additional guidance need to be added to the document, particularly regarding the use of controlled lists that do not use URIs to identify their members.

@dr-shorthair

Contributor

commented Jul 19, 2018

Individual (Linked Data) URIs for DataCite resource types are available here:
http://registry.it.csiro.au/def/datacite/resourceType
e.g. http://registry.it.csiro.au/def/datacite/resourceType/DataPaper

Individual (Linked Data) URIs for Re3data PARSE.Insight content-types are available here: http://registry.it.csiro.au/def/re3data/contentType
e.g. http://registry.it.csiro.au/def/re3data/contentType/doc

Individual (Linked Data) URIs for ISO 19115 MD_ScopeCode values are available here: http://registry.it.csiro.au/def/isotc211/MD_ScopeCode
e.g. http://registry.it.csiro.au/def/isotc211/MD_ScopeCode/document

Individual (Linked Data) URIs for MARC Genre/Terms are available here:
https://id.loc.gov/vocabulary/marcgt.html
e.g. https://id.loc.gov/vocabulary/marcgt/doc
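Assuming each member URI is simply the registry base followed by `/` and the term code (this matches the single example given for each vocabulary above, but has not been checked for every term), a small Python sketch for building term URIs might look like this. The short vocabulary keys (`datacite`, `re3data`, `iso19115`, `marcgt`) and the function name `term_uri` are hypothetical.

```python
from typing import Dict
from urllib.parse import quote

# Registry base URIs listed in the comment above. Assumption: every member
# URI is "<base>/<code>", inferred from one example per vocabulary.
VOCAB_BASES: Dict[str, str] = {
    "datacite": "http://registry.it.csiro.au/def/datacite/resourceType",
    "re3data": "http://registry.it.csiro.au/def/re3data/contentType",
    "iso19115": "http://registry.it.csiro.au/def/isotc211/MD_ScopeCode",
    "marcgt": "https://id.loc.gov/vocabulary/marcgt",
}


def term_uri(vocab: str, code: str) -> str:
    """Build the Linked Data URI for one term of a known type vocabulary.

    Raises KeyError for an unknown vocabulary key and ValueError for an
    empty code or one containing '/', which would not be a single path segment.
    """
    if not code or "/" in code:
        raise ValueError(f"invalid term code: {code!r}")
    # Percent-encode the code so unusual characters cannot break the URI.
    return f"{VOCAB_BASES[vocab]}/{quote(code)}"


if __name__ == "__main__":
    print(term_uri("iso19115", "document"))
    print(term_uri("marcgt", "doc"))
```

Note that the codes are the registry identifiers (e.g. `doc` for re3data), not the human-readable labels from the PARSE.Insight table, so a label-to-code lookup would still be needed.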

@dr-shorthair

Contributor

commented Jul 20, 2018

The example in the document Overview has been expanded to cover the other well-known resource-type vocabularies:
PR #308

@dr-shorthair

Contributor

commented Jan 9, 2019

Questions on the public list from Clemens Portele [1] and Luca Trani [2] both raise (in part) the question of DCAT scope. The addition of the dcat:Resource superclass provides an extension point for further types to be added to a catalogue following the DCAT catalogue pattern. However, the DCAT vocabulary will be limited to the two sub-classes dcat:Dataset and dcat:DataService. Further specialization can and should be done in DCAT profiles.

This philosophy needs to be made explicit in the DCAT document.

[1] https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Oct/0002.html
[2] https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Dec/0000.html

@smrgeoinfo

Contributor

commented Jan 9, 2019

If one modeled a new sub class of dcat:Resource to catalog software, reusing dcat: classes and properties as appropriate, wouldn't that be an extension, not a profile?

@dr-shorthair

Contributor

commented Jan 10, 2019

The distinction between 'extension' and 'profile' is not particularly useful under the open-world assumption. The fact that only two sub-classes of dcat:Resource are defined in DCAT (i.e. dcat:Dataset and dcat:DataService) does not imply that these exhaust the scope of dcat:Resource. So if additional subclasses are defined in a derived vocabulary, they do not necessarily extend the absolute scope; they may merely identify additional subsets of dcat:Resource that may or may not have members in common with the subclasses already identified.

@dr-shorthair

Contributor

commented Jan 10, 2019

If #650 is merged, then I think we can close this issue (woo-hoo!).

@agbeltran closed this Jan 31, 2019

DCAT revision automation moved this from In progress to Done Jan 31, 2019
