New term

Daphnisd · 2018-02-19T10:02:45Z

Submitter: John Wieczorek, following request initiated by Daphnis de Pooter (@Daphnisd)
Justification (why is this change necessary?): There is currently no simple way to capture the verbatim scientificName given in an identification/determination - it has to be separated out in parts and corrected.
Proponents (who needs this change): OBIS, Global Names, SANBI

Proposed new attributes of the term:

Term name (in lowerCamelCase): verbatimIdentification
Organized in Class (e.g. Location, Taxon): Identification
Definition of the term: A string representing the taxonomic identification as it appeared in the original record.
Usage comments (recommendations regarding content, etc.): This term is meant to allow the capture of an unaltered original identification/determination, including identification qualifiers, hybrid formulas, uncertainties, etc. This term is meant to be used in addition to scientificName (and identificationQualifier etc.), not instead of it.
Examples: `Peromyscus sp.`, `Ministrymon sp. nov. 1`, `Anser anser X Branta canadensis`, `Pachyporidae?`
Refines (identifier of the broader term this term refines, if applicable): None
Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable): None
ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG, if applicable): not in ABCD
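
To make the usage comment concrete, here is a minimal sketch of a record that carries verbatimIdentification in addition to scientificName and identificationQualifier. The term names are Darwin Core terms; the `occ-1`/`occ-2` identifiers, the helper names, and the way each verbatim string is split into interpreted values are assumptions made for illustration only, not an official mapping.

```python
import csv
import io

FIELDS = [
    "occurrenceID",
    "verbatimIdentification",
    "scientificName",
    "identificationQualifier",
]


def build_row(occurrence_id, verbatim, scientific_name, qualifier=""):
    """Keep the original string untouched next to the interpreted values."""
    return {
        "occurrenceID": occurrence_id,
        "verbatimIdentification": verbatim,  # never cleaned or corrected
        "scientificName": scientific_name,   # assumed interpretation
        "identificationQualifier": qualifier,
    }


def to_csv(rows):
    """Serialize rows as a comma-separated table with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    build_row("occ-1", "Pachyporidae?", "Pachyporidae", "?"),
    build_row("occ-2", "Peromyscus sp.", "Peromyscus"),
]
print(to_csv(rows))
```

The point of the layout is that a consumer can always fall back to the verbatim column if it disagrees with the interpretation.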

From Daphnis de Pooter (@Daphnisd)
Term needed to record how a taxon was originally recorded in an unprocessed dataset.
Motivation: tdwg/dwc-qa#109

mdoering · 2018-02-19T11:52:52Z

I would suggest reviewing all "verbatim" terms and coming up with a general strategy.
In theory all terms can be interpreted and there is a need to deal with both verbatim and interpreted/cleaned values. GBIF or ALA for example have a long list of interpreted terms. We decided against new terms though and rather use the same term in a different context.

I can't find the issue now, but it was also suggested to have a new rowType that could indicate verbatim values.

cgendreau · 2018-02-19T14:24:52Z

@mdoering it overlaps with discussion in gbif/occurrence#24

mdoering · 2018-02-19T14:25:57Z

Thanks @cgendreau, that's exactly what I was looking for!

ansell · 2018-02-20T05:24:03Z

A new convention is my preference so far, per the discussion in gbif/occurrence#24

claudenozeres · 2018-02-20T11:06:15Z

I agree with Daphnis that a new term and/or strategy is needed to make this more explicit. In the past, the practice for marine datasets was to publish using a valid name interpreted from the original, despite the recommendation to submit using the original (because it gets too messy; WoRMS can't always suggest matches for obvious names). The original (verbatim) information is then lost. Currently this is a challenge for me with specimen label names (although I imagine it is similar for observation records). So the matter of original names sometimes gets mixed in with issues of identification. What I need is to record the verbatim name. Displaying a valid scientificName comes after, because, as it appears in the ALA/GBIF discussion, this can be open to interpretation. Having verbatim as part of a history extension (rather than core) does seem fragile to me.

mdoering · 2018-02-20T12:17:51Z

I would argue DwC should not have any specific verbatim terms but rather recommend other ways of dealing with data provenance. Often we also have longer lineages with multiple steps that alter the content, so a single verbatim term is difficult to apply. For example, W3C offers a rather complete PROV Ontology, although we should probably look for something far simpler.

qgroom · 2018-02-20T13:34:36Z

I tend to agree with Markus. Essentially every field could have a verbatim term, and it would be better if we could chain versions of an observation together. My only doubt is that Darwin Core has to be kept reasonably simple, otherwise people will not use it. Therefore, I'm OK with maintaining some verbatim fields as long as there is a gold standard way to handle these data.

mdoering · 2018-02-20T13:45:01Z

Well, we could also create a verbatim term for each dwc term. There is not really a restriction on the number of terms, just increased complexity. But if verbatim terms always have the prefix "verbatim", it's not adding much to the confusion. It might even help, because we could get the existing ones out of the way when presenting terms.

qgroom · 2018-02-20T13:57:59Z

As Markus points out the problem is provenance. As data associated with an observation/specimen get amended the chain of provenance is lost if you only have one verbatim field. If I understand Markus correctly you could have no verbatim fields because every field would be verbatim and you would link versions together to determine provenance.
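
A minimal sketch of the "link versions together" idea, assuming each amendment produces a new immutable version that points back to the one it was derived from. The `RecordVersion` class and the `amend` and `history` helpers are hypothetical names used only for illustration; nothing like them is defined by Darwin Core.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordVersion:
    values: dict                                  # term name -> value
    note: str                                     # why this version exists
    previous: Optional["RecordVersion"] = None    # link to the parent version


def amend(version, note, **changes):
    """Create a new version with some terms changed, linked to its parent."""
    return RecordVersion({**version.values, **changes}, note, previous=version)


def history(version, term):
    """Return every value a term has had, oldest first."""
    chain = []
    node = version
    while node is not None:
        chain.append(node.values.get(term))
        node = node.previous
    return list(reversed(chain))


original = RecordVersion(
    {"scientificName": "Peromyscus sp."}, "as transcribed from the label"
)
cleaned = amend(original, "qualifier separated out", scientificName="Peromyscus")
print(history(cleaned, "scientificName"))
```

In this model no term needs a verbatim counterpart: the first element of the history is the verbatim value, and every later step stays traceable.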

debpaul · 2018-05-07T17:35:22Z

From a data-mining standpoint, we need verbatim data to do things like automatically find matching references between a dwc record and an old publication in BHL. If, for example, the verbatim locality and verbatim taxon name are not shared, then it will be much more difficult for computer algorithms to make the connections between the two datasets. I'm not sure it matters what we call it as long as it's clear that it's the "original text" in this case, as found in or on the label / field notebook / ledger. So it seems you are all saying we could / ought to use Identification History (and other such extensions) to share this type of information? What about verbatimCountry? This comes up all the time.

Daphnisd · 2018-05-07T19:11:40Z

I don't think using a separate extension for this is an option for us, as it would not be compatible with event core in IPT.

ansell · 2018-05-08T23:52:32Z

Adding a single "verbatim" extension to a Darwin Core Archive isn't going to satisfy every use case if provenance over time is required, but those use cases also won't be satisfied by a single verbatimScientificName field.

In a possibly more serious provenance case, the ALA created an issue for itself, GBIF, and the community, a number of years ago with its choice to overwrite the original occurrenceID obtained from scientists with an internal opaque GUID when sending this data to GBIF, but still shows the original occurrenceID on ALA websites/downloads and stores that in the ALA datasets. I have been told by the person who made that decision that it should/must not be fixed (for various reasons). However, without a standard way to express the verbatimOccurrenceID, I also can't provide any workarounds to enable the original data to pass through unhindered.

Having a standardised way of providing one or more verbatim or historical Darwin Core Archive extension files would allow users to optionally read what the original author provided, or read what other evolutions of the record contained. The current GBIF-only convention only allows for a single verbatim extension based on a static file name, which won't work for historical contexts where you want to track evolution of a dataset over time. Having an accepted convention that uses metadata rather than file names, whether it is based on the (overly complex) W3C PROV vocabulary, or another system, is essential to me for providing a workaround for the ALA occurrenceID mistake in future, which will (likely already has) hit some users just as badly as rewrites of scientificName to use the taxonomy or merged taxonomies which are currently accepted by a particular organisation.

I don't agree that we should add more verbatim terms to Darwin Core Terms solely to satisfy existing systems that aren't designed for a "verbatim extensions" model that we haven't developed yet. However, given the verbatim prefix already exists in Darwin Core Terms, it wouldn't be creating a new convention, just continuing an old convention, to create verbatimOccurrenceID and/or verbatimScientificName.

mdoering · 2018-05-09T07:16:47Z

If the old convention is continued, how bad would it be to create a verbatim term for every term in Darwin Core? At least we would have a consistent model then.

peterdesmet · 2018-05-09T08:06:58Z

Couldn't this be done with a dwcverbatim: namespace?

baskaufs · 2019-01-03T17:34:55Z

I have suggested an approach for recording verbatim information involving the W3C SKOS-XL standard in the issue tdwg/tag#22. The actual process of getting from a provided verbatim string to full metadata associated with SKOS-XL instances is fleshed out more in my comment on TNC Issue 24.

ianengelbrecht · 2019-03-12T07:53:29Z

Could I suggest that a strategy for verbatim terms be created as a separate GitHub issue? Returning to the request for dwc:verbatimScientificName in itself, this would be useful. The documentation for dwc:scientificName says 'This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term' (although the example does include a case that includes the identification qualifier). The BDQ TG2 tests and assertions include TG2-VALIDATION_POLYNOMIAL_NOTSTANDARD, which as currently defined will return NOT_COMPLIANT for any dwc:scientificName values that include a qualifier. We should also be able to represent identifications such as 'Harpactira sp.' in our datasets, and we also have the case of informal names for undescribed species, such as Harpactira sp. 'blue', manuscript names, etc.
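
As a rough illustration of why such values do not fit dwc:scientificName, here is a small heuristic that flags strings containing a qualifier token. This is not the BDQ TG2 test; the token list and the `looks_qualified` name are assumptions, and real data would need a much richer pattern (hybrid formulas, for instance, are not covered).

```python
import re

# Hypothetical, deliberately short list of qualifier tokens.
QUALIFIER_PATTERN = re.compile(r"\b(?:cf|aff|sp|spp)\.|\?")


def looks_qualified(name: str) -> bool:
    """True if the string contains an identification qualifier token."""
    return QUALIFIER_PATTERN.search(name) is not None


for candidate in ["Harpactira sp.", "Harpactira sp. 'blue'", "Pachyporidae?", "Anser anser"]:
    print(candidate, looks_qualified(candidate))
```

Strings flagged this way are the ones that would be kept whole in a verbatim field while a cleaned name goes into dwc:scientificName.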

ianengelbrecht · 2019-03-12T08:00:24Z

> In a possibly more serious provenance case, the ALA created an issue for itself, GBIF, and the community, a number of years ago with its choice to overwrite the original occurrenceID obtained from scientists with an internal opaque GUID when sending this data to GBIF, but still shows the original occurrenceID on ALA websites/downloads and stores that in the ALA datasets. I have been told by the person who made that decision that it should/must not be fixed (for various reasons). However, without a standard way to express the verbatimOccurrenceID, I also can't provide any workarounds to enable the original data to pass through unhindered.

@ansell it seems that the practice of creating or overwriting GUIDs is a pervasive problem, probably resulting from a misunderstanding of the purpose of GUIDs in the first place. IMO overwriting dwc:occurrenceID is a misapplication of the standard. Should we modify the standard to cope with its misapplication? Not a route I would advocate for.

ianengelbrecht · 2020-02-04T10:10:00Z

I see there is a verbatimScientificName field, and an accompanying verbatimScientificNameAuthorship field, in a dataset I just downloaded from GBIF.

tucotuco · 2020-09-05T19:00:45Z

> I see there is a verbatimScientificName field, and an accompanying verbatimScientificNameAuthorship field, in a dataset I just downloaded from GBIF.

Those must be the dwc:scientificName and dwc:scientificNameAuthorship data from the originally published source.

I am reviewing all existing Darwin Core issues to try to move them forward or abandon them as the Vocabulary Maintenance Specification demands. This particular issue had a lot of activity, and in the meantime the community has apparently arrived at practical solutions.

I would like to establish if there is still demand for a new term dwc:verbatimScientificName. If there is, someone please follow the process and provide evidence of demand from at least two independent parties and a term definition following the template provided in Guidelines for contributing.

Observation: I think this term would be best organized in the Identification class and have a name that explicitly makes the role of the name apparent, such as "verbatimIdentification".

qgroom · 2020-09-06T07:10:30Z

> Observation: I think this term would be best organized in the Identification class and have a name that explicitly makes the role of the name apparent, such as "verbatimIdentification".

I agree.
This issue was part of the inspiration for the discussion on verbatim data we wrote in the publication below. We concluded that versioning was a much better approach.

Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129

nielsklazenga · 2020-09-24T07:58:24Z

> Observation: I think this term would be best organized in the Identification class and have a name that explicitly makes the role of the name apparent, such as "verbatimIdentification".

tdwg/tnc#24 (comment)

tucotuco · 2021-04-19T01:55:43Z

I have changed the title of the issue and prepended a templated term change request to the original comment so as not to have to make a separate issue and relate it to the discussion in this one. Help is needed to know what the equivalent XPATH is in ABCD, if any.

nielsklazenga · 2021-04-19T02:12:07Z

@tucotuco, there is no equivalent for this term in ABCD 2.06.

tucotuco · 2021-04-19T02:57:13Z

Thank you @nielsklazenga. Term definition updated and ready to be prepared for public comment.

afuchs1 · 2021-05-26T06:08:25Z

The Australasian Herbarium Information Systems Committee (HISCOM) endorses the addition of this term to Darwin Core, but proposes to add to the usage notes that verbatimIdentification is best used in addition to scientificName (and identificationQualifier etc.), not instead of it.

tucotuco · 2021-05-26T17:13:16Z

@afuchs1 That seems a perfectly reasonable amendment to me. If there is no conflicting view, I will add it to the final usage comment. In the meantime, I have put a link to your suggestion in the usage section of the first comment.

hollyel · 2021-05-28T17:41:29Z

This term will be useful to the paleo collections community for expressing original IDs and the full extent of our knowledge despite nomenclatural uncertainty (e.g., "Genus sp. nov. 1" as illustrated by one of the existing examples). At least with our current systems, this kind of uncertainty and complexity can lead to unexpected results when our data go to aggregators and get matched to taxonomic backbones. - Holly Little, Erica Krimmel (@ekrimmel), and Talia Karim (@tkarim) (on behalf of the Paleo Data Working Group)

EstebanMH-SiB · 2021-05-28T19:07:55Z

We endorse this proposal on behalf of @SiBColombia

tucotuco · 2021-08-25T03:24:40Z

Done.

tucotuco added Term - add Class - Taxon labels May 7, 2018

mdoering mentioned this issue Dec 8, 2018

Should we have parsed authorship properties? tdwg/tnc#24

Closed

nielsklazenga mentioned this issue Jan 5, 2019

Consider use of SKOS-XL for labels across TDWG vocabularies tdwg/tag#22

Closed

ianengelbrecht mentioned this issue Apr 13, 2020

Change term - identificationQualifier #244

Open

tucotuco added Class - Identification Process - need evidence for demand Process - need templated change request and removed Class - Taxon labels Sep 5, 2020

tucotuco changed the title ~~New term verbatimScientificName~~ New term - verbatimIdentification Apr 19, 2021

tucotuco added Process - ready for public comment and removed Process - need templated change request labels Apr 19, 2021

edwbaker mentioned this issue Apr 20, 2021

New term - verbatimLabel #32

Closed

tucotuco added the normative label Apr 30, 2021

tucotuco added this to the The Rush of the April Fools milestone Apr 30, 2021

claudenozeres mentioned this issue May 26, 2021

Dealing with unnamed/cryptic species iobis/Project-team-Genetic-Data#7

Open

tucotuco added Process - under public review and removed Process - ready for public comment labels May 26, 2021

tucotuco added Process - ready for public comment and removed Process - ready for public comment labels May 26, 2021

tucotuco added Process - prepare for Executive review and removed Process - under public review labels Jun 2, 2021

tucotuco added Process - in Executive review and removed Process - prepare for Executive review labels Jul 1, 2021

tucotuco added Process - implement and removed Process - in Executive review labels Jul 23, 2021

tucotuco added Process - complete and removed Process - implement labels Aug 25, 2021

tucotuco closed this as completed Aug 25, 2021

LocoDelAssembly mentioned this issue Jul 8, 2024

IdentificationQualifier is not mapped as DwC import field SpeciesFileGroup/taxonworks#2430

Open

sformel-usgs mentioned this issue Jul 10, 2024

New Term - verbatimMeasurementType #518

Open

New term - verbatimIdentification #181
