DCAT Identifiers

Riccardo Albertoni edited this page Nov 28, 2018 · 22 revisions

[Still in progress]

Requirements

From Use Case 5.11, Modeling identifiers and making them actionable [ID11]:

  • Effective use across platforms
  • Actionable independently of the platform
    • identifiers can be encoded as HTTP URIs, which seems to be the most effective way of making them actionable
    • otherwise, an identifier type can help, provided that a common identifier type registry ensures interoperability
  • Distinguishing primary and secondary identifiers

Issues

Proposal 2 (revises and reassembles Proposal 1)

DCAT should rely on HTTP URIs, which is an effective way of making identifiers actionable.

Primary and secondary identifiers can be specified following the DCAT-AP guidelines, which recommend:

  • Assign a stable identifier to the dataset in the catalogue where the dataset is first published. This should be the primary identifier of the dataset. Include this identifier as the value of dct:identifier.
  • In the case of duplicates, other locally minted identifiers or external identifiers, like Datacite, DOI, ELI etc., will be assigned to the dataset. As long as they are globally unique and stable, these identifiers should be included as values to adms:identifier.
  • Harvesting systems should not delete or change the value of adms:identifier and only use it to compare harvested metadata to detect duplicates.
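The duplicate-detection step in the last recommendation can be sketched in Python. This is a minimal illustration, not part of the guidelines: the record layout (a dict with the hypothetical keys "uri", "identifier" and "other_identifiers") is an assumption, and comparing dct:identifier values in addition to adms:identifier values is a choice made here for the sake of the example.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group catalogue records that share at least one identifier value.

    Each record is a hypothetical dict with the keys "uri" (the record's URI
    in the harvesting catalogue), "identifier" (the dct:identifier value) and
    "other_identifiers" (a list of adms:identifier notations).
    The records are only read, never modified.
    """
    by_identifier = defaultdict(set)
    for record in records:
        values = [record["identifier"], *record.get("other_identifiers", [])]
        for value in values:
            by_identifier[value].add(record["uri"])
    # Keep only identifier values that appear on more than one record.
    return {
        value: sorted(uris)
        for value, uris in by_identifier.items()
        if len(uris) > 1
    }
```

Calling find_duplicates on the original record and a harvested copy that kept the same dct:identifier would report both record URIs under that identifier value, while leaving the identifier values themselves untouched.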

When identifiers are not HTTP dereferenceable, a common identifier type can be specified for the sake of interoperability.

Representing the primary identifier for a dataset

A stable primary identifier is set by using dct:identifier.

Example 1: An example of HTTP dereferenceable ID used in the catalogue where the dataset is first published

<https://example.org/id> a dcat:Dataset;      
	dct:identifier "https://example.org/id"^^xsd:anyURI
 ...
.

Harvesting systems should not delete or change the value of adms:identifier.

Example 2: An example in the catalogue that has harvested the dataset

<https://othercatalog.org/id> a dcat:Dataset;
        # dct:identifier shouldn't be changed by harvesters
	dct:identifier "https://example.org/id"^^xsd:anyURI
 ...
.

Representing secondary identifiers for a dataset

DCAT specifies secondary identifiers using adms:identifier.

Example 3:

<https://example.org/id> a dcat:Dataset;
	...
        # Secondary ID
	adms:identifier <https://example.org/iddoi> .

<https://example.org/iddoi> rdf:type adms:Identifier ;
   skos:notation "https://doi.org/10.1109/5.771073"^^xsd:anyURI ;
   # SKOS allows more than one skos:notation per resource, see https://www.w3.org/TR/skos-reference/#notations
   skos:notation "info:doi/10.1109/5.771073"^^xsd:anyURI ;
   # the authority/agency defining the identifier scheme, used if the agency has no URI
   adms:schemaAgency "International DOI Foundation" ;
   # the authority/agency defining the identifier scheme, used if the agency has a URI
   dct:creator ex:InternationalDOIFoundation .

ex:InternationalDOIFoundation a dct:Agent ;
    rdfs:label "International DOI Foundation" ;
    foaf:homepage <https://www.doi.org/> .

DCAT uses adms:schemaAgency and dct:creator to represent the authority that defines the identifier scheme (e.g., the International DOI Foundation in the example); adms:schemaAgency is used when the authority has no associated URI. DCAT does not represent the authority responsible for assigning and maintaining identifiers under that scheme (e.g., IEEE). Naming the registrant goes against the philosophy of DOI, where sub-spaces are abstracted from the organisation that registers them, so that DOIs do not change when the organisation changes or when responsibility for a sub-space is handed over to someone else.
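The choice between the two properties can be sketched as a small Python helper that renders an adms:Identifier description as Turtle text. The function name and parameters are hypothetical, prefix declarations are omitted, and the helper assumes that the notation and the agency name contain no characters that need escaping in a Turtle string. When both a name and a URI are supplied, this sketch prefers the URI, which is only one possible policy.

```python
def identifier_turtle(node_uri, notation, agency_name=None, agency_uri=None):
    """Render an adms:Identifier description as Turtle text (prefixes omitted)."""
    lines = [
        f"<{node_uri}> a adms:Identifier ;",
        f'    skos:notation "{notation}"^^xsd:anyURI ;',
    ]
    if agency_uri is not None:
        # The scheme authority has a URI: point to it with dct:creator.
        lines.append(f"    dct:creator <{agency_uri}> .")
    elif agency_name is not None:
        # No URI available: fall back to the plain-text agency name.
        lines.append(f'    adms:schemaAgency "{agency_name}" .')
    else:
        # No agency information at all: close the statement after the notation.
        lines[-1] = lines[-1].rstrip(" ;") + " ."
    return "\n".join(lines)
```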

When the HTTP dereferenceable ID returns RDF/OWL, the use of owl:sameAs might be considered:

<https://example.org/id> a dcat:Dataset;
	...
	owl:sameAs <https://doi.org/10.1109/5.771073> .

Indicating common identifier types

If identifiers are not HTTP dereferenceable, common identifier types can be expressed as an RDF datatype or a custom OWL datatype; see 'ex:type' in the following example.

Example 4:

<https://example.org/id> a dcat:Dataset;
	...
	adms:identifier <https://example.org/sid> .

<https://example.org/sid> rdf:type adms:Identifier ;
	 # the actual id
	 skos:notation "PA 1-060-815"^^ex:type ;
	 # human-readable scheme agency
	 adms:schemaAgency "US Copyright Office" ;
	 dcterms:issued "2001-09-12"^^xsd:date .

If a registered URI scheme is used (following RFC 3986), the identifier scheme is part of the URI, so indicating a separate identifier scheme in 'type' is redundant. For example, DOI is registered as a namespace in the 'info' URI scheme (see faq #11), so as an RFC 3986 URI it should be encoded as in the following example.

Example 5:

<https://example.org/id> a dcat:Dataset;
	dct:identifier  "info:doi/10.1109/5.771073"^^xsd:anyURI
...
.

or

<https://example.org/sid> rdf:type adms:Identifier ; 
 # the actual id
	 skos:notation "info:doi/10.1109/5.771073"^^xsd:anyURI ;
 ...
 .
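The two URI forms used for DOIs in the examples above can be derived from a bare DOI with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and percent-encoding everything except "/" is a simplification of the encoding rules that apply to DOIs in URIs.

```python
from urllib.parse import quote


def doi_uris(doi):
    """Return the 'info' URI and the HTTPS resolver URI for a bare DOI."""
    # Percent-encode characters that are not allowed in a URI path,
    # keeping the "/" that separates the DOI prefix from the suffix.
    encoded = quote(doi.strip(), safe="/")
    return {
        "info": f"info:doi/{encoded}",
        "https": f"https://doi.org/{encoded}",
    }
```

For the DOI used throughout this page, doi_uris("10.1109/5.771073") yields the same two values that appear as skos:notation in the secondary identifier example.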

Otherwise, examples of common identifier scheme types (arXiv, etc.) are defined in the DataCite schema and the FAIRsharing Registry.

Proposal 1

This proposal relies on dct:identifier and adms:identifier as the former is included in DCAT 1, and the latter is already included in different DCAT APs.

Part of this proposal is inspired by the DCAT-AP guidelines, which recommend:

  • Assign a stable identifier to the dataset in the catalogue where the dataset is first published. This should be the primary identifier of the dataset. Include this identifier as the value of dct:identifier.
  • In the case of duplicates, other locally minted identifiers or external identifiers, like Datacite, DOI, ELI etc., will be assigned to the dataset. As long as they are globally unique and stable, these identifiers should be included as values to adms:identifier.
  • Harvesting systems should not delete or change the value of adms:identifier and only use it to compare harvested metadata to detect duplicates.

Compatibility with the Google Schema identifier can be addressed by proposing suitable mappings.

Representing the primary identifier for a dataset

A stable primary identifier is set by using dct:identifier.

Example 1: An example of HTTP dereferenceable ID used in the catalogue where the dataset is first published

<https://example.org/id> a dcat:Dataset;      
	dct:identifier "https://example.org/id"^^xsd:anyURI
 ...
.

If the dataset does not have an HTTP dereferenceable ID, it must have at least a dereferenceable proxy URI, as in the following example.

Example 2:

<https://example.org/proxyid> a dcat:Dataset;
	dct:identifier  "id"^^type .

'type' can be any recognized RDF datatype IRI, or a custom OWL datatype defined to indicate the identifier scheme.

If a registered URI scheme is used (following RFC 3986), the identifier scheme is part of the URI, so indicating a separate identifier scheme in 'type' is redundant. For example, DOI is registered as a namespace in the 'info' URI scheme (see faq #11), so it would appear that a DOI can be formally encoded as an RFC 3986 URI as shown below.

Example 2.1:

<https://example.org/proxyid> a dcat:Dataset;
	dct:identifier  "info:doi/10.1109/5.771073"^^xsd:anyURI .

Otherwise, common identifier scheme types (arXiv, etc.) are defined in the DataCite schema and the FAIRsharing Registry.

Question 2: should we find a way to associate a resolution mechanism for the ID with the type?

Example 3: An example in the catalogue that has harvested the dataset

<https://othercatalog.org/id> a dcat:Dataset;
        # dct:identifier shouldn't be changed by harvesters
	dct:identifier "https://example.org/id"^^xsd:anyURI
 ...
.

Representing a secondary (not HTTP dereferenceable) identifier

DCAT specifies secondary identifiers using adms:identifier.

Example 4:

<https://example.org/id> a dcat:Dataset;
	...
	adms:identifier <https://example.org/sid> .

<https://example.org/sid> rdf:type adms:Identifier ;
	 # the actual id
	 skos:notation "PA 1-060-815"^^ex:USCO ;
	 # human-readable scheme agency
	 adms:schemaAgency "US Copyright Office" ;
	 # machine-readable scheme agency
	 dct:creator <https://www.copyright.gov/> ;
	 dcterms:issued "2001-09-12"^^xsd:date .

Guidelines for the most common identifier schemes should be suggested, in order to avoid needlessly different representations of the same IDs. Specific datatypes can be considered to foster harmonization.

For example, real RDF fragments show that DOIs are represented quite differently; see the examples below.

Example 5: from DCAT-AP-IT guidelines

<http://dati.gov.it/resource/AltroIdentificativo/altroidentificativoDataset1>
     a               adms:Identifier ;
     skos:notation   "doi:10.1109/5.771073";
     ...
 .
  

or

Example 6: from csiro-dap-examples.ttl

dap:doi-P366-2003SEPT
  rdf:type adms:Identifier ;
  dct:creator <https://researchdata.ands.org.au/> ;
  skos:notation "10.4225/08/598dc08d07bb7" ;
  adms:schemeAgency "International DOI Foundation" ;
.
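To compare DOI notations that follow different conventions, such as the two above, a harvester could reduce them to a bare DOI first. The following Python sketch is only an illustration: the function name and the list of recognized prefixes are assumptions, not an agreed convention. It relies on the fact that DOI names are case-insensitive.

```python
# Prefixes under which a DOI may appear in skos:notation or dct:identifier values.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "info:doi/",
    "doi:",
)


def bare_doi(notation):
    """Strip a known DOI prefix and return the bare DOI in lower case."""
    value = notation.strip()
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    # DOI names are case-insensitive, so lower-casing gives a stable comparison key.
    return value.lower()
```

With this helper, "doi:10.1109/5.771073", "info:doi/10.1109/5.771073" and "https://doi.org/10.1109/5.771073" all reduce to the same key, while a notation that is already bare is returned unchanged apart from case.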

Representing HTTP dereferenceable secondary identifier

Option 1: Using ADMS Identifiers

Example 7:

<https://example.org/id> a dcat:Dataset;
	...
        # Secondary ID
	adms:identifier <https://example.org/iddoi> .

<https://example.org/iddoi> rdf:type adms:Identifier ;
   skos:notation "https://doi.org/10.1109/5.771073"^^xsd:anyURI ;
   # SKOS allows more than one skos:notation per resource, see https://www.w3.org/TR/skos-reference/#notations
   skos:notation "info:doi/10.1109/5.771073"^^xsd:anyURI ;
   # the authority/agency defining the identifier scheme, used if the agency has no URI
   adms:schemaAgency "International DOI Foundation" ;
   # the authority/agency defining the identifier scheme, used if the agency has a URI
   dct:creator ex:InternationalDOIFoundation .

ex:InternationalDOIFoundation a dct:Agent ;
    rdfs:label "International DOI Foundation" ;
    foaf:homepage <https://www.doi.org/> .

DCAT uses adms:schemaAgency and dct:creator to represent the authority that defines the identifier scheme (e.g., the International DOI Foundation in the example); adms:schemaAgency is used when the authority has no associated URI. DCAT does not represent the authority responsible for assigning and maintaining identifiers under that scheme (e.g., IEEE). Naming the registrant goes against the philosophy of DOI, where sub-spaces are abstracted from the organisation that registers them, so that DOIs do not change when the organisation changes or when responsibility for a sub-space is handed over to someone else.

When the HTTP dereferenceable ID returns RDF/OWL, the use of owl:sameAs might be considered:

<https://example.org/id> a dcat:Dataset;
	...
	owl:sameAs <https://doi.org/10.1109/5.771073> .

  • Does it make sense to use owl:sameAs when the ID does not return RDF/OWL? I am not sure we should recommend it.

Further material

Definition and third parties examples with ADMS

In ADMS this is expressed using the adms:Identifier class with the following properties:

  • the identifier is represented as skos:notation, datatyped with the identifier scheme (including the version number if appropriate);
  • the agency that manages the identifier is set using
  • date on which the identifier was issued is represented with further properties such as dcterms:issued .

An important point to note is that properties of adms:Identifier are properties of the Identifier, not the resource that it identifies or the agency that issued it.

<http://business.data.gov.uk/id/company/04285910>
	a rov:RegisteredOrganization ;
		... ....
		adms:identifier <http://example.com/id/oc04285910> ;
		org:registeredSite <http://example.com/id/rs04285910> .

# The actual registration
<http://example.com/id/li04285910> a adms:Identifier ;
		# textual identifier
		skos:notation "04285910"^^ex:idType ;
		adms:schemaAgency "UK Companies House" ;
		dcterms:issued "2001-09-12"^^xsd:date .

# A supplementary identifier (Open Corporates)
<http://example.com/id/oc04285910> a adms:Identifier ;
		skos:notation "http://opencorporates.com/companies/gb/04285910"^^ex:OCid ;
		dcterms:issued "2010-10-21T15:09:59Z"^^xsd:dateTime ;
		dcterms:modified "2012-04-26T15:16:44Z"^^xsd:dateTime ;
		dcterms:creator <http://opencorporates.com/companies/gb/07444723> .

Example from DCAT-AP-IT

<http://dati.gov.it/resource/Dataset/ContrattiSPC_agid>
         a                dcatapit:Dataset , dcat:Dataset ;
         dct:identifier   "agid:D.1" ;
         # Secondary identifier
         adms:identifier  <http://dati.gov.it/resource/Identifier/ContrattiSPC_agid_altroID> ;

Example in DXWG GitHub space csiro-dap-examples.ttl

dap:atnf-P366-2003SEPT
  rdf:type dcat:Dataset ;
 ...
  dct:description "Parkes multibeam high-latitude pulsar survey" ;
  dct:identifier "https://doi.org/10.4225/08/598dc08d07bb7"^^xsd:anyURI ;
  dct:identifier "ivo://au.csiro.atnf/P366-2003SEPT"^^xsd:anyURI ;
  dct:license <https://creativecommons.org/licenses/by/4.0/> ;
  dct:modified "2017-07-30T08:55:55Z"^^xsd:dateTime ;
  dct:relation [
      dct:identifier "PH0090_0011.sf" ;
    ] ;
  dct:relation [
      dct:identifier "PH0090_0021.sf" ;
    ] ;
  dct:relation [
      dct:identifier "PH0090_0031.sf" ;
    ] ;
  dct:rights [
      rdf:type dct:RightsStatement ;
      rdfs:comment "All Rights (including copyright) CSIRO 2017." ;
    ] ;
  dct:temporal [
      rdf:type dct:PeriodOfTime ;
      rdf:type time:ProperInterval ;
      time:hasBeginning [
          rdf:type time:Instant ;
          time:inXSDDate "2003-09-01"^^xsd:date ;
        ] ;
      time:hasEnd [
          rdf:type time:Instant ;
          time:inXSDDate "2003-12-31"^^xsd:date ;
        ] ;
    ] ;
  dct:title "Parkes observations for project P366 semester 2003SEPT" ;
  dcat:contactPoint dap:MartaBurgay-vcard ;
  dcat:keyword "pulsar" ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:P366-2003SEPT> ;
  dcat:theme <http://registry.it.csiro.au/def/keyword/anzsrc/astronomical-and-space-sciences-not-elsewhere-classified> ;
  prov:wasGeneratedBy dap:P366 ;
.

dap:doi-P366-2003SEPT
  rdf:type adms:Identifier ;
  dct:creator <https://researchdata.ands.org.au/> ;
  skos:notation "10.4225/08/598dc08d07bb7" ;
  adms:schemeAgency "International DOI Foundation" ;
.

... To do ?!...
