New term - verbatimIdentification #181
Comments
I would suggest reviewing all "verbatim" terms and coming up with a general strategy. I can't find the issue now, but it was also suggested to have a new rowType that could indicate verbatim values. |
@mdoering it overlaps with discussion in gbif/occurrence#24 |
Thanks @cgendreau, that's exactly what I was looking for! |
A new convention is my preference so far, per the discussion in gbif/occurrence#24 |
I agree with Daphnis that a new term and/or strategy is needed to make this more explicit. In the past, the practice for marine datasets was to publish using a valid name interpreted from the original, despite the recommendation to submit the original (because it gets too messy; WoRMS can't always suggest matches for obvious names). The original (verbatim) information is then lost. Currently this is a challenge for me with specimen label names (although I imagine it is similar for observation records). So the matter of original names sometimes gets mixed in with issues of identification. What I need is to record the verbatim name. Displaying a valid scientificName comes after, because, as it appears in the ALA/GBIF discussion, this can be open to interpretation. Having verbatim as part of a history extension (rather than core) does seem fragile to me. |
I would argue DwC should not have any specific verbatim terms but rather recommend other ways of dealing with data provenance. Often we also have longer lineages with multiple steps that alter the content, so a single verbatim term is difficult to apply. For example, W3C offers a rather complete PROV Ontology, although we should probably look for something far simpler. |
I tend to agree with Markus. Essentially every field could have a verbatim term, and it would be better if we could chain versions of an observation together. My only doubt is that Darwin Core has to be kept reasonably simple, otherwise people will not use it. Therefore, I'm OK with maintaining some verbatim fields as long as there is a gold standard way to handle these data. |
Well, we could also create a verbatim term for each dwc term. There is not really a restriction on the number of terms, just increased complexity. But if verbatim terms always have the prefix "verbatim", it's not adding much to the confusion. It might even help, because we could get the existing ones out of the way when presenting terms. |
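A minimal sketch of the naming convention described in this comment, assuming a verbatim counterpart is formed by prefixing "verbatim" and capitalizing the first letter of the term name. The helper names `verbatim_term` and `split_terms` are hypothetical and not part of any Darwin Core tooling.

```python
def verbatim_term(term: str) -> str:
    """Return the hypothetical verbatim counterpart of a term name."""
    return "verbatim" + term[0].upper() + term[1:]


def split_terms(terms):
    """Separate verbatim-prefixed terms from the rest, e.g. for presentation."""
    verbatim = [t for t in terms if t.startswith("verbatim")]
    regular = [t for t in terms if not t.startswith("verbatim")]
    return regular, verbatim


# Illustrative term list only; it is not a complete set of Darwin Core terms.
terms = ["scientificName", "locality", "verbatimLocality", "country"]
regular, verbatim = split_terms(terms)

print(verbatim_term("scientificName"))
print(regular)
print(verbatim)
```

Because the prefix is fixed, a presentation layer could hide or group the verbatim terms with a single string test, which is the "get them out of the way" idea in the comment.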
As Markus points out, the problem is provenance. As data associated with an observation/specimen get amended, the chain of provenance is lost if you only have one verbatim field. If I understand Markus correctly, you could have no verbatim fields, because every field would be verbatim and you would link versions together to determine provenance. |
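The idea of linking versions together can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed Darwin Core or W3C PROV structure: the `RecordVersion` class, its fields, and the `history` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordVersion:
    """One version of a record; `previous` links to the version it was derived from."""
    values: dict
    note: str
    previous: Optional["RecordVersion"] = None


def history(version: Optional[RecordVersion], term: str) -> list:
    """Return the values of `term` from the oldest version to the newest."""
    chain = []
    while version is not None:
        chain.append(version.values.get(term))
        version = version.previous
    return list(reversed(chain))


# Made-up example data: the first version holds the text as transcribed.
original = RecordVersion({"scientificName": "Harpactira sp."}, "as transcribed from label")
cleaned = RecordVersion({"scientificName": "Harpactira"}, "qualifier removed", previous=original)

print(history(cleaned, "scientificName"))
```

In this model no field is marked as verbatim; the oldest version in the chain plays that role, and every later amendment stays traceable.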
From a data-mining standpoint, we need verbatim data to do things like automatically find matching references between a dwc record and an old publication in BHL. If, for example, the verbatim locality and verbatim taxon name are not shared, then it will be much more difficult for computer algorithms to make the connections between the two datasets. I'm not sure it matters what we call it, as long as it's clear that it's the "original text" in this case, as found in or on the label / field notebook / ledger. So it seems you are all saying we could / ought to use Identification History (and other such extensions) to share this type of information? What about verbatimCountry? This comes up all the time. |
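To illustrate why the original text matters for this kind of matching, here is a small sketch that looks for a record's locality string inside a passage of publication text. The passage, both locality strings, and the helper names `normalize` and `appears_in` are made up for illustration; real matching against BHL would need something far more tolerant than substring search.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not block a match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def appears_in(value: str, passage: str) -> bool:
    """True if the normalized value occurs in the normalized passage."""
    return normalize(value) in normalize(passage)


# Made-up publication text and record values.
passage = "Taken at Sta. Cruz Mts.,  nr. Boulder Ck., on 3 May 1902."
verbatim_locality = "Sta. Cruz Mts., nr. Boulder Ck."
interpreted_locality = "Santa Cruz Mountains, near Boulder Creek"

print(appears_in(verbatim_locality, passage))
print(appears_in(interpreted_locality, passage))
```

The verbatim string is found in the passage while the interpreted one is not, even though both describe the same place.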
I don't think using a separate extension for this is an option for us, as it would not be compatible with event core in IPT. |
Adding a single "verbatim" extension to a Darwin Core Archive isn't going to satisfy every use case if provenance over time is required, but those use cases also won't be satisfied by a single In a possibly more serious provenance case, the ALA created an issue for itself, GBIF, and the community, a number of years ago with its choice to overwrite the original Having a standardised way of providing one or more verbatim or historical Darwin Core Archive extension files would allow users to optionally read what the original author provided, or read what other evolutions of the record contained. The current GBIF-only convention only allows for a single verbatim extension based on a static file name, which won't work for historical contexts where you want to track evolution of a dataset over time. Having an accepted convention that uses metadata rather than file names, whether it is based on the (overly complex) W3C PROV vocabulary, or another system, is essential to me for providing a workaround for the ALA I don't agree that we should add more verbatim terms to Darwin Core Terms solely to satisfy existing systems that aren't designed for a "verbatim extensions" model that we haven't developed yet. However, given the |
If the old convention is continued, how bad would it be to create a verbatim term for every term in Darwin Core? At least we would have a consistent model then. |
Couldn't this be done with a |
I have suggested an approach for recording verbatim information involving the W3C SKOS-XL standard in the issue tdwg/tag#22. The actual process of getting from a provided verbatim string to full metadata associated with SKOS-XL instances is fleshed out more in my comment on TNC Issue 24. |
Could I suggest that a strategy for verbatim terms be created as a separate GitHub issue? Returning to the request for dwc:verbatimScientificName in itself, this would be useful. The documentation for dwc:scientificName says 'This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term' (although the example does include a case that includes the identification qualifier). The BDQ TG2 tests and assertions include TG2-VALIDATION_POLYNOMIAL_NOTSTANDARD, which, as currently defined, will return NOT_COMPLIANT for any dwc:scientificName values that include a qualifier. We should also be able to represent identifications such as 'Harpactira sp.' in our datasets, and we also have the case of informal names for undescribed species, such as Harpactira sp. 'blue', manuscript names, etc. |
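A rough sketch of the situation described in this comment: a name string that carries a qualifier is kept as-is in a verbatim field, and scientificName is only filled when no qualifier is detected. This is not the BDQ test implementation. The qualifier pattern is a hypothetical, non-exhaustive list, and the helper names `has_qualifier` and `to_record` are assumptions made for the example.

```python
import re

# Hypothetical, non-exhaustive qualifier markers; not taken from the BDQ tests.
QUALIFIER_PATTERN = re.compile(r"(?:^|\s)(?:sp|spp|cf|aff|nr)\.(?=\s|$)|\?")


def has_qualifier(name: str) -> bool:
    """True if the name string contains one of the assumed qualifier markers."""
    return QUALIFIER_PATTERN.search(name) is not None


def to_record(original: str) -> dict:
    """Always keep the original string; fill scientificName only if no qualifier is found."""
    record = {"verbatimIdentification": original}
    if not has_qualifier(original):
        record["scientificName"] = original
    return record


for name in ["Harpactira sp.", "Harpactira sp. 'blue'", "Pachyporidae?", "Anser anser (Linnaeus, 1758)"]:
    print(name, "->", to_record(name))
```

Names with a detected qualifier are left for a person or a later step to interpret, so the original identification is never lost to an automatic clean-up.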
@ansell it seems that the practice of creating or overwriting GUIDs is a pervasive problem, probably resulting from a misunderstanding of the purpose of GUIDs in the first place. IMO overwriting dwc:occurrenceID is a misapplication of the standard. Should we modify the standard to cope with its misapplication? Not a route I would advocate for. |
I see there is a verbatimScientificName field, and an accompanying verbatimScientificNameAuthorship field, in a dataset I just downloaded from GBIF. |
Those must be the dwc:scientificName and dwc:scientificNameAuthorship data from the originally published source. I am reviewing all existing Darwin Core issues to try to move them forward or abandon them, as the Vocabulary Maintenance Specification demands. This particular issue had a lot of activity, and in the meantime the community has apparently arrived at practical solutions. I would like to establish if there is still demand for a new term dwc:verbatimScientificName. If there is, someone please follow the process and provide evidence of demand from at least two independent parties and a term definition following the template provided in Guidelines for contributing. Observation: I think this term would be best organized in the Identification class and have a name that explicitly makes the role of the name apparent, such as "verbatimIdentification". |
I agree. Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129 |
I have changed the title of the issue and prepended a templated term change request to the original comment so as not to have to make a separate issue and relate it to the discussion in this one. Help is needed to know what the equivalent XPATH is in ABCD, if any. |
@tucotuco , there is no equivalent for this term in ABCD 2.06. |
Thank you @nielsklazenga. Term definition updated and ready to be prepared for public comment. |
The Australasian Herbarium Information Systems Committee (HISCOM) endorses the addition of this term to Darwin Core, but proposes to add to the usage notes that |
@afuchs1 That seems a perfectly reasonable amendment to me. If there is no conflicting view, I will add it to the final usage comment. In the meantime, I have put a link to your suggestion in the usage section of the first comment. |
This term will be useful to the paleo collections community for expressing original IDs and the full extent of our knowledge despite nomenclatural uncertainty (e.g., "Genus sp. nov. 1" as illustrated by one of the existing examples). At least with our current systems, this kind of uncertainty and complexity can lead to unexpected results when our data go to aggregators and get matched to taxonomic backbones. - Holly Little, Erica Krimmel (@ekrimmel), and Talia Karim (@tkarim) (on behalf of the Paleo Data Working Group) |
We endorse this proposal on behalf of @SiBColombia |
Done. |
New term
Proposed new attributes of the term:
[...] scientificName (and identificationQualifier etc.), not instead of it.
Examples: Peromyscus sp., Ministrymon sp. nov. 1, Anser anser X Branta canadensis, Pachyporidae?
From Daphnis de Pooter (@Daphnisd)
Term needed to record how a taxon was originally recorded in an unprocessed dataset.
Motivation: tdwg/dwc-qa#109