collectionCode - Darwin Core Hour Input Form 2/14/2017 11:48:24 #40

Open
iDigBioBot opened this issue Feb 14, 2017 · 7 comments
Labels
answered, form submission, term - record-level (Pertaining to a term not organized in any specific Darwin Core class.)

Comments

@iDigBioBot
Collaborator

A user submitted this information via the Darwin Core Hour webform:
Timestamp: 2/14/2017 11:48:24
Please provide a topic of interest: How is the term "collectionCode" supposed to be used? Are there any existing standards recommendations?
Are you capable of and interested in participating: No
Who else would you recommend to participate in the presentation:
What resources can you point to:
Your name:
Your email:

@pzermoglio added the labels form submission, new, and term - record-level (Pertaining to a term not organized in any specific Darwin Core class.) Feb 14, 2017
@tucotuco self-assigned this Feb 25, 2017
@tucotuco added the label answered and removed the label new Feb 25, 2017
@tucotuco changed the title from "Darwin Core Hour Input Form 2/14/2017 11:48:24" to "collectionCode - Darwin Core Hour Input Form 2/14/2017 11:48:24" Feb 25, 2017
@tucotuco
Member

Darwin Core provides several terms to help people distinguish data sets, namely institutionCode, collectionCode, datasetName, and their related identifiers institutionID, collectionID, and datasetID. The institutionCode is meant to hold the official acronym for an organization, such as "MVZ" for the institution "Museum of Vertebrate Zoology". This acronym, along with a catalog number, is commonly used to identify cataloged material in scientific publications.

Practices for cataloging and identifying specimens vary within and among institutions. In one institution, the catalog number might contain information designating the collection the specimen belongs to, for example "Herp 2371", while in another the catalog number might not contain this information, for example "2371". The collectionCode is meant to let institutions that follow the latter practice distinguish specimens from different collections within the institution when sharing with the rest of the world. Thus, institutionCode = "MVZ", collectionCode = "Herp", catalogNumber = "2371" is sufficient to pick out the specimen of interest from among the many at the Museum of Vertebrate Zoology that carry catalog number "2371".
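
As a rough illustration of the point above, here is a minimal Python sketch that treats the three terms together as a lookup key. The `SpecimenKey` type, the `specimen_key` helper, and the second record with collectionCode = "Bird" are hypothetical and exist only to show a catalog number collision; they are not taken from any real MVZ data.

```python
from typing import NamedTuple


class SpecimenKey(NamedTuple):
    """Hypothetical helper type: the three terms that together identify a specimen."""
    institutionCode: str
    collectionCode: str
    catalogNumber: str


def specimen_key(record):
    """Build the combined key from a Darwin Core style record (a plain dict)."""
    return SpecimenKey(
        record["institutionCode"],
        record["collectionCode"],
        record["catalogNumber"],
    )


# Two records sharing a catalog number. The "Bird" code is invented for illustration.
records = [
    {"institutionCode": "MVZ", "collectionCode": "Herp", "catalogNumber": "2371"},
    {"institutionCode": "MVZ", "collectionCode": "Bird", "catalogNumber": "2371"},
]

# Index the records by the combined key; the catalog number alone would collide.
by_key = {specimen_key(r): r for r in records}
wanted = by_key[SpecimenKey("MVZ", "Herp", "2371")]
```

Keying on catalogNumber alone would leave only one entry in the index, which is the ambiguity collectionCode is meant to remove.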

The datasetName allows institutions to further separate subsets of data, or to name them explicitly. For example, the University of British Columbia Beaty Biodiversity Museum (institutionCode = "UBCBBM") has the Cowan Tetrapod Collection (collectionCode = "CTC"), within which are several distinct data sets, including one with datasetName = "Cowan Tetrapod Collection - Avian". As another example, the University of Kansas (institutionCode = "KU") has a herpetological collection (collectionCode = "KUH") as a single data set, the name of which is spelled out in datasetName = "University of Kansas Biodiversity Institute Herpetology Collection".

The corresponding identifier fields institutionID, collectionID, and datasetID are meant to contain globally unique and persistent identifiers for the three corresponding concepts. The first two of these terms, institutionID and collectionID, would best be populated with references to entries in a registry of institutions and collections, such as the Global Registry of Biodiversity Repositories (http://grbio.org). For example, institutionCode = "NHMO", institutionID = "http://grbio.org/cool/2knt-7f1r", collectionCode = "BI", collectionID = "http://grbio.org/cool/wes0-t2ie".
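
One way to picture the pairing of codes and identifiers is the sketch below, which flags records that fill in a code or name term but leave the companion identifier empty. The `CODE_TO_ID` mapping and the `codes_without_ids` function are hypothetical names for illustration, not part of Darwin Core or of any library.

```python
# Each code/name term and its companion identifier term, as described above.
CODE_TO_ID = {
    "institutionCode": "institutionID",
    "collectionCode": "collectionID",
    "datasetName": "datasetID",
}


def codes_without_ids(record):
    """Return the code terms that are populated but lack a companion identifier."""
    return [
        code
        for code, id_term in CODE_TO_ID.items()
        if record.get(code) and not record.get(id_term)
    ]


# The NHMO example from the comment above, with both identifiers present.
complete = {
    "institutionCode": "NHMO",
    "institutionID": "http://grbio.org/cool/2knt-7f1r",
    "collectionCode": "BI",
    "collectionID": "http://grbio.org/cool/wes0-t2ie",
}

# The same record with the identifiers stripped, to show what gets flagged.
codes_only = {"institutionCode": "NHMO", "collectionCode": "BI"}

missing_in_complete = codes_without_ids(complete)
missing_in_codes_only = codes_without_ids(codes_only)
```

The check only looks at whether the fields are filled in; it does not confirm that an identifier actually refers to the institution or collection named by the code.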

The datasetID is best populated with an identifier for a published data set in which the record can be found. As such, a publication reference such as a Digital Object Identifier (DOI) is a good candidate, for example datasetID = "https://doi.org/10.15468/aomfnb" for records in the 2015 eBird Observation Dataset (see http://www.gbif.org/dataset/4fa7b334-ce0d-4e88-aaae-2e0c138d049e).
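
A small sketch of a purely syntactic check on identifier values such as the datasetID above, using Python's standard `urllib.parse`. The function name `looks_like_http_uri` is hypothetical. The check assumes identifiers are expressed as HTTP(S) URIs, and it makes no network request, so it says nothing about whether the identifier resolves or persists.

```python
from urllib.parse import urlsplit


def looks_like_http_uri(value):
    """Return True if value parses as an absolute http(s) URI with a host.

    This is a syntax check only; it does not try to resolve the identifier.
    """
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The DOI form used in the example above passes the check.
doi_as_uri = looks_like_http_uri("https://doi.org/10.15468/aomfnb")

# A bare DOI string without the resolver prefix does not.
bare_doi = looks_like_http_uri("10.15468/aomfnb")
```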

@debpaul
Contributor

debpaul commented Aug 31, 2017

Documentation page added. See https://github.com/tdwg/dwc-qa/wiki/Institutions-and-Collections. @tucotuco this is assigned to you so I will leave it up to you when you would like to close.

@garymotz
Collaborator

garymotz commented Sep 1, 2017

Is it generally best practice to use a shortened URL for institutionID = "http://grbio.org/cool/2knt-7f1r" or is institutionID = "http://grbio.org/institution/natural-history-museum-university-oslo" acceptable as well?

Is the main intent to ensure that the value is a resolvable URI, regardless of whether it is a shortened URL or a more or less human-readable one?

@dagendresen

I am not concerned with the human-readability of the identifier (dwc:institutionID or any dwc:nnn-ID term). I would choose the short cooluri form from GRBio rather than the longer URL form. I value a long-term persistent resolvable identifier much more than human-readability!

Using VIAF numbers as institution identifiers might perhaps also be useful:
institutionID = http://viaf.org/viaf/113146937739813830943/

VIAF comes from the library community. VIAF numbers are permanent, but one institution (or person) might end up with more than one VIAF code. ISNI numbers are curated to be persistent and to ensure that one institution or person has only one ISNI code. ORCIDs are a subset of the ISNI codes.

Might it be possible to aspire to assigning ISNI numbers for all institutions in GRBio that do not yet have such a number...? And later on to aspire to recommend "older" identifier systems used for biodiversity institutions and people to be linked (and possibly resolved) to the corresponding unique ISNI number...?

http://www.isni.org/
http://www.gbif.no/news/2016/bibsys-november-2016.html

@tucotuco
Member

tucotuco commented Sep 1, 2017 via email

@godfoder

godfoder commented Sep 1, 2017

TDWG NCD Co-Convener (w/ @debpaul) here.

For NCD (Standards track), our work is likely to directly borrow the terms, definitions, and examples from Darwin Core where there are existing elements, so there should be no duplication of effort or conflicts here.

For NCD (Implementation track), I like the idea of promoting the use of ISNI style identifiers. I was hoping to promote the use of ORCIDs for identifying people, so having an equivalent identifier for the collection and institution seems like a natural fit.

At least for institutions, it seems like the libraries may well have already done our work for us and issued institution identifiers for many places. Issuance of collection identifiers might be more problematic, but it is possibly also something that could be done with less curatorial control (URIs, ARKs, Handles, UUIDs) where there is already a strong institution identifier in place to provide context.

@tucotuco
Member

tucotuco commented Sep 1, 2017 via email
