TG2-VALIDATION_CLASSIFICATION_CONSISTENT #123

Open
ArthurChapman opened this issue Jan 18, 2018 · 36 comments
Labels
Consistency; CORE (TG2 CORE tests); NAME; Parameterized (Test requires a parameter); Test (Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT); TG2; Validation; VOCABULARY

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Jan 18, 2018

| TestField | Value |
|---|---|
| GUID | 2750c040-1d4a-4149-99fe-0512785f2d5f |
| Label | VALIDATION_CLASSIFICATION_CONSISTENT |
| Description | Is the combination of higher classification taxonomic terms consistent using bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:superfamily, dwc:family, dwc:subfamily, dwc:tribe, dwc:subtribe, dwc:genus |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if all of the fields dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:superfamily, dwc:family, dwc:subfamily, dwc:tribe, dwc:subtribe, dwc:genus are EMPTY; COMPLIANT if the combination of values of higher classification taxonomic terms (dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:superfamily, dwc:family, dwc:subfamily, dwc:tribe, dwc:subtribe, dwc:genus) is consistent with the lowest ranking matched element in the bdq:sourceAuthority; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Consistency |
| Term-Actions | CLASSIFICATION_CONSISTENT |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
| Specification Last Updated | 2023-09-18 |
| Examples | [dwc:kingdom="", dwc:phylum="", dwc:class="", dwc:order="Myrtales", dwc:family="Myrtaceae": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="The combination of values of higher classification taxonomic terms (dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family) can be unambiguously resolved by the bdq:sourceAuthority"]<br>[dwc:kingdom="", dwc:phylum="Chordata", dwc:class="", dwc:order="Rhopalocera", dwc:family="Muricidae": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="The combination of values of higher classification taxonomic terms (dwc:kingdom, dwc:phylum, dwc:class, dwc:order, dwc:family) cannot be unambiguously resolved by the bdq:sourceAuthority"] |
| Source | TG2-Gainesville |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | A fail condition may arise either from the taxon terms being internally inconsistent (not all of the information can be true at the same time), or from the vocabulary being incapable of resolving the combination of classification values. Additional tests could be devised against a taxonomic authority to report the distinct failure conditions. This test specifically does not consider the content of dwc:higherClassification. |
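As a sketch of how the Expected Response might map onto response values, the Python below checks the prerequisites in the order the specification lists them and delegates the actual comparison to a hypothetical `is_consistent` callback. The `Response` class, the `authority_available` flag and the function names are illustrative assumptions, not taken from any bdq implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Higher classification terms acted upon by this test (from the specification).
ACTED_UPON = [
    "dwc:kingdom", "dwc:phylum", "dwc:class", "dwc:order", "dwc:superfamily",
    "dwc:family", "dwc:subfamily", "dwc:tribe", "dwc:subtribe", "dwc:genus",
]


@dataclass
class Response:
    status: str
    result: Optional[str] = None
    comment: str = ""


def classification_consistent(
    record: Dict[str, str],
    authority_available: bool,
    is_consistent: Callable[[Dict[str, str]], bool],
) -> Response:
    """Map the Expected Response wording onto status/result values (sketch)."""
    if not authority_available:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET",
                        comment="bdq:sourceAuthority is not available")
    # Missing keys and whitespace-only values are treated as EMPTY.
    present = {t: record.get(t, "").strip() for t in ACTED_UPON}
    present = {t: v for t, v in present.items() if v}
    if not present:
        return Response("INTERNAL_PREREQUISITES_NOT_MET",
                        comment="all higher classification terms are EMPTY")
    # Hypothetical callback: compares the non-empty terms with the authority.
    if is_consistent(present):
        return Response("RUN_HAS_RESULT", "COMPLIANT")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT")
```

Passing the comparison in as a callback keeps the status logic testable without network access to the authority.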
ArthurChapman added the TG2, Validation, NAME, VOCABULARY and Test (Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT) labels Jan 18, 2018
@tucotuco
Member

tucotuco commented Aug 25, 2018

Discussed at TDWG 2018 DQIG meeting that there are two distinct potential causes for ambiguity. One potential cause is as in the original example, where an incorrect name somewhere in the higher classification terms throws doubt on what is correct and what is not. The other potential cause is a name (or combination) in the given values and ranks that matches more than one combination in the target authority. We are looking for an example of this. Perhaps a family-level homonym?
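To illustrate the second cause of ambiguity, the hypothetical sketch below keeps only the authority entries that agree with every supplied term. Two invented entries share a family name under different higher taxa, so the family alone matches twice, while adding an order resolves it. The names `Examplidae`, `Alphales`, `Betales` and `Gammales` are placeholders, not real taxa, and the matching rule is an assumption.

```python
from typing import Dict, List

# Invented authority entries: the same family-level name under different parents.
AUTHORITY: List[Dict[str, str]] = [
    {"kingdom": "Animalia", "order": "Alphales", "family": "Examplidae"},
    {"kingdom": "Plantae", "order": "Betales", "family": "Examplidae"},
]


def candidate_matches(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Authority entries that agree with every non-empty term in the record."""
    return [
        entry for entry in AUTHORITY
        if all(entry.get(rank) == value for rank, value in record.items() if value)
    ]


def resolution(record: Dict[str, str]) -> str:
    """Classify the outcome by the number of matching authority entries."""
    n = len(candidate_matches(record))
    if n == 1:
        return "resolved"
    return "ambiguous" if n > 1 else "no match"
```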

@ArthurChapman
Collaborator Author

Should we incorporate some of the @tucotuco comment above into the Notes?

@Tasilee
Collaborator

Tasilee commented Sep 5, 2018

Most definitely, once we have an example of a multiple match.

@ArthurChapman
Collaborator Author

Someone at TDWG mentioned that there was only one homonym at the family level or higher. I'm not sure what it is, but it would surprise me if there were only one.

@Tasilee
Collaborator

Tasilee commented Sep 5, 2018

@ArthurChapman it would certainly surprise me as well! Taxonomists are devious.

@tucotuco
Member

tucotuco commented Sep 5, 2018 via email

@ArthurChapman
Collaborator Author

ArthurChapman commented Sep 5, 2018 via email

@Tasilee
Collaborator

Tasilee commented Sep 5, 2018

Thanks @tucotuco. I had forgotten how useful Tony Rees' IRMNG is.

ArthurChapman added a commit that referenced this issue Oct 5, 2020
In accord with #189 removed blank lines at end of file for CLASSIFICATION AMBIGUOUS #123
@mjy

mjy commented Feb 18, 2021

Perhaps not the place for this, but it's the first example I looked at. Are tests that depend on an external authority and thus not computable without reference to that authority grouped in some way or identified as such?

For example, given the test data testdata_VALIDATION_CLASSIFICATION_AMBIGUOUS_#123.csv, I cannot write code to perform (all) the tests unless I resolve a request against bdq:sourceAuthority. This seems to represent a class of tests whose results may change when the authority changes (as opposed to when the data in the CSV change), and which are thus much more difficult to implement consistently?

@tucotuco
Member

@mjy Your observation is correct. There isn't currently a label for the tests that require a source authority, but that seems like a useful label to add. All of the tests that do require a source authority should have the label "Parameterized", but not all of the tests with the label "Parameterized" necessarily require a source authority.

I would think that the best way forward on tests of this nature is to use values from the source authority expected to remain highly stable.
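One way to keep results reproducible, sketched here under assumptions, is to make the authority an injected dependency and run validation data against a frozen snapshot, so that only deliberate snapshot updates can change outcomes. `SourceAuthority`, `SnapshotAuthority` and `lineage` are hypothetical names; the snapshot reuses the Poritia, Poritiini, Poritiinae, Lycaenidae lineage that appears in the implementation tests in this thread purely as sample data.

```python
from typing import Dict, List, Optional, Protocol


class SourceAuthority(Protocol):
    """Hypothetical interface: return the parent name of a taxon, or None."""

    def parent_of(self, name: str) -> Optional[str]: ...


class SnapshotAuthority:
    """Frozen in-memory copy of parent links, so outcomes cannot drift."""

    def __init__(self, parents: Dict[str, str]) -> None:
        self._parents = dict(parents)

    def parent_of(self, name: str) -> Optional[str]:
        return self._parents.get(name)


def lineage(authority: SourceAuthority, name: str) -> List[str]:
    """Follow parent links upward from name, stopping at a root or a cycle.

    The starting name is always included, even if the authority lacks it.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = name
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = authority.parent_of(current)
    return chain


snapshot = SnapshotAuthority({
    "Poritia": "Poritiini",
    "Poritiini": "Poritiinae",
    "Poritiinae": "Lycaenidae",
})
```

A live client for the real authority could implement the same `parent_of` method, letting the test logic run unchanged against either.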

@ArthurChapman
Collaborator Author

I think we have a "Vocabulary" label - not sure we have been consistent with it, I'd have to check.

@Tasilee
Collaborator

Tasilee commented Feb 22, 2021

@ArthurChapman is mostly right: when bdq:sourceAuthority is a Parameter, the VOCABULARY tag is usually present. There are 25 tests that have the Parameter "bdq:sourceAuthority", and all but four have a VOCABULARY tag. The four are:

#50
#62
#73
#76

All of these have the tag "ISO/DCMI standard" (and there are 13 tests that have that tag). In reviewing the tests, maybe we do have an anomaly or two.

Take #48: it has the "ISO/DCMI STANDARD" tag but no "bdq:sourceAuthority", as there is only ONE; so it has neither "Parameterized" nor "VOCABULARY", even though there is a vocabulary.

Is @mjy's view, from a developer's perspective, less subtle than our reasoning? Is there a case for a) removing the "ISO/DCMI STANDARD" tag, b) including a "bdq:sourceAuthority" and c) if relevant, including a "VOCABULARY" tag when there is one? Or maybe just adding a new tag "EXTERNAL SOURCE" or equivalent wherever there is a need to refer to external sources?

Thoughts?

@tucotuco
Member

tucotuco commented Feb 26, 2021 via email

@Tasilee
Collaborator

Tasilee commented Feb 28, 2021

Thanks @tucotuco. I agree with you that ISO 3166 'tests' should have the VOCABULARY tag, and I have added those tags where missing.

#50 has a Parameter "bdq:sourceAuthority", and the Notes (as per our way of documenting) have "[bdq:sourceAuthority default for country shapes = spatial UNION of terrestrial boundaries from gadm.org and EEZs from marineregions.org]", but only a Reference to the codes (ISO 3166...). This does seem anomalous. Do we add a second Parameter "bdq:sourceAuthority2" and then assign its default in the Notes?

Ditto #73.

#76: I agree that it doesn't use a vocabulary, but it seems this is where a Reference is appropriate rather than bdq:sourceAuthority, as the test doesn't specifically look anything up?

#46 - I agree and have added "bdq:sourceAuthority" to Parameters. This looks like an omission. We keep finding such things :|

@tucotuco
Member

tucotuco commented Feb 28, 2021 via email

@tucotuco
Member

I believe this test requires the elements dwc:subfamily and dwc:genericName. These are new since the test was first formulated and no update included them.

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Aug 23, 2022
…ASSIFICATION_CONSISTENT with GBIF and WoRMS authorities. Includes minimal integration test.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Aug 23, 2022
…g/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT.  Needs more work, not passing all validation cases.  Added line number to log4j test configuration.
@Tasilee
Collaborator

Tasilee commented Aug 23, 2022

Thanks @tucotuco - added but is dwc:genericName classed as a "higher classification taxonomic term"?

@chicoreus
Collaborator

78640f09-8353-411a-800e-9b6d498fb1c9 duplicates #95 replacing with 2750c040-1d4a-4149-99fe-0512785f2d5f

chicoreus added a commit that referenced this issue Aug 24, 2022
…95 in the data for validating tests.  Will need update upstream in @Tasilee's spreadsheet.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Aug 24, 2022
@chicoreus
Collaborator

@Tasilee good catch. The information elements should include dwc:genus, but not dwc:genericName, as dwc:genericName is a parse of the generic name portion of dwc:scientificName, not the placement of the taxon in the classification.
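A minimal illustration of the distinction, assuming a naive first-token parse (real name parsers handle far more cases): the generic name parsed from dwc:scientificName can differ from the genus in which the taxon is currently placed, and only dwc:genus is an input to this test. The record below is an illustrative example, not validation data, and `generic_name_of` is a hypothetical helper.

```python
def generic_name_of(scientific_name: str) -> str:
    """Naive parse: take the first token of dwc:scientificName."""
    tokens = scientific_name.split()
    return tokens[0] if tokens else ""


# Illustrative record: a name in its original combination, placed in its current genus.
record = {
    "dwc:scientificName": "Felis concolor Linnaeus, 1771",
    "dwc:genus": "Puma",
}

parsed = generic_name_of(record["dwc:scientificName"])  # what dwc:genericName would hold
placed = record["dwc:genus"]  # the genus-level input this test acts upon
print(parsed, placed)  # prints "Felis Puma"
```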

@tucotuco
Member

Yes, good catch, my bad.

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Sep 13, 2022
…STENT to match parentage of higher taxa in the source authority with their parentage in the presented data, including matching on synonyms. Added SciNameUtils.isSameClassificationInAuthority() to check parentage against authority, along with BooleanWithComment to carry both the result and a comment from this check. Modified SciNameUtils.sameOrSynonym to check name as synonym of otherName and otherName as synonym of name.
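The commit describes `SciNameUtils.sameOrSynonym` as checking the synonym relation in both directions. A Python sketch of that symmetric idea follows; it is not the FilteredPush Java code, and the synonymy table and the names in it are invented for illustration.

```python
from typing import Dict, Set

# Invented synonymy: each key is recorded as a synonym of the names in its set.
SYNONYM_OF: Dict[str, Set[str]] = {
    "Oldnameidae": {"Newnameidae"},
}


def same_or_synonym(name: str, other: str) -> bool:
    """True if the names are equal or either is a recorded synonym of the other."""
    if name == other:
        return True
    return (other in SYNONYM_OF.get(name, set())
            or name in SYNONYM_OF.get(other, set()))
```

Checking both directions means the result does not depend on whether the record or the authority holds the synonym.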
@Tasilee
Collaborator

Tasilee commented Jun 13, 2023

Restructured Parameter(s) and Source authority

@chicoreus
Collaborator

Will need to include the new terms dwc:superfamily, dwc:tribe and dwc:subtribe (tdwg/dwc#65, tdwg/dwc#45, tdwg/dwc#46).

@Tasilee
Collaborator

Tasilee commented Jul 4, 2023

Added the terms dwc:superfamily, dwc:tribe, dwc:subtribe to the Information elements and Expected response, and updated Specification Last Updated.

Tasilee removed the NEEDS WORK label Jul 4, 2023
@Tasilee
Collaborator

Tasilee commented Jul 4, 2023

Amended Source Authority values to align with @chicoreus's syntax

From

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" [https://doi.org/10.15468/39omei] | API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]

to

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}
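For orientation, here is a sketch that builds a lookup URL from the default endpoint by appending a percent-encoded name to the trailing `name=` parameter. Only the endpoint string comes from the specification; `lookup_url` is a hypothetical helper, and the sketch neither sends the request nor assumes anything about the response format.

```python
from urllib.parse import quote

# Default bdq:sourceAuthority API endpoint, copied from the specification.
GBIF_SPECIES_ENDPOINT = (
    "https://api.gbif.org/v1/species"
    "?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name="
)


def lookup_url(name: str) -> str:
    """Append a percent-encoded taxon name to the endpoint's trailing name= parameter."""
    return GBIF_SPECIES_ENDPOINT + quote(name.strip(), safe="")
```

Encoding with `safe=""` keeps spaces and reserved characters in a name from breaking the query string.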

@chicoreus
Collaborator

Minor update to specification, changed one instance of genericName to be the expected classification term genus.

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 13, 2023
…ifications. Addressed tdwg/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT.  Adding superfamily, tribe, subtribe as parameters.  Adding support for checking these along with subfamily.  Updating GBIF api to current version to obtain support for superfamily, tribe, subtribe, adding these to local NameUsage class.  Updating GBIF name parser to current version, adding handling for new threading exception thrown from parse methods.  Removed checked stub method.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 15, 2023
…aining tests tdwg/bdq#70 VALIDATION_TAXON_UNAMBIGUOUS and tdwg/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT.  Metadata, including source authority values, updated.  Some cleanup of other comments, and consistency of comments in defaults class.
@ArthurChapman
Collaborator Author

Added to the Notes

"Note that for this test to work, the lowest ranking element must be present and the higher ranking elements must be consistent with it."

Do we need to reword the Expected Response?

This follows implementation tests by @chicoreus where:

kingdom="Animalia";
phylum="Arthropoda";
phylclass="Insecta";
order="Lepidoptera";
superfamily="Papilionoidea";
family="Lycaenidae";
subfamily="Poritiinae";
tribe = "Poritiini";
subtribe = "";
genus="Poritia";

is COMPLIANT, but

kingdom="Animalia";
phylum="Arthropoda";
phylclass="Insecta";
order="Lepidoptera";
superfamily="Papilionoidea";
family="Lycaenidae";
subfamily="Poritiinae";
tribe = "Poritiini";
subtribe = "";
genus="";

is NOT_COMPLIANT

It was agreed through email discussion that this is what we want to happen.

@ArthurChapman
Collaborator Author

Expected Response changed (following ZOOM of 2023-08-29) and Specification Date updated

"..... are consistent with the lowest ranking matched element in the bdq:sourceAuthority"

The most recently added part of the Notes was also deleted.
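Here is a sketch of one reading of "consistent with the lowest ranking matched element". It scans from the lowest rank upward for the first supplied value that the authority recognizes at that rank, then requires every supplied higher term to agree with that taxon's classification. The in-memory `AUTHORITY` is built from the Poritia example above and is not a real authority extract. Unmatched lower terms are simply skipped, which is an assumption about intent. Under the deleted note, the record with an empty genus was NOT_COMPLIANT. Under this reading it matches at tribe and comes out consistent, so treat the outcome as an interpretation to check against the specification.

```python
from typing import Dict, Optional

# Ranks acted upon by this test, highest to lowest.
RANKS = ["kingdom", "phylum", "class", "order", "superfamily",
         "family", "subfamily", "tribe", "subtribe", "genus"]

# The COMPLIANT example record from this thread (subtribe empty).
EXAMPLE: Dict[str, str] = dict(zip(RANKS, [
    "Animalia", "Arthropoda", "Insecta", "Lepidoptera", "Papilionoidea",
    "Lycaenidae", "Poritiinae", "Poritiini", "", "Poritia"]))

# Stand-in authority: taxon name -> its own rank/name plus all non-empty higher ranks.
AUTHORITY: Dict[str, Dict[str, str]] = {}
for i, rank in enumerate(RANKS):
    if EXAMPLE[rank]:
        AUTHORITY[EXAMPLE[rank]] = {
            r: EXAMPLE[r] for r in RANKS[: i + 1] if EXAMPLE[r]}


def lowest_matched_rank(record: Dict[str, str]) -> Optional[str]:
    """Lowest rank whose supplied value the authority knows at that same rank."""
    for rank in reversed(RANKS):
        value = record.get(rank, "")
        if value and AUTHORITY.get(value, {}).get(rank) == value:
            return rank
    return None


def is_consistent(record: Dict[str, str]) -> bool:
    rank = lowest_matched_rank(record)
    if rank is None:
        return False  # nothing matched in the authority
    expected = AUTHORITY[record[rank]]
    higher = RANKS[: RANKS.index(rank)]
    # Every supplied higher term must agree with the matched taxon's classification.
    return all(record[r] == expected.get(r) for r in higher if record.get(r))
```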

@Tasilee
Collaborator

Tasilee commented Sep 7, 2023

This test should have Data Quality Dimension "Consistency" rather than "Conformance". Edited.

@chicoreus
Collaborator

chicoreus commented Sep 7, 2023 via email

@Tasilee
Collaborator

Tasilee commented Sep 10, 2023

Thanks @chicoreus. However, this would be the only Test with a Warning Type of "Inconsistent" that had a Data Quality Dimension of "Consistency". Given that the one-to-one mappings of Data Quality Dimension to Warning Type strongly suggest removing Warning Type, this would be the one outlier.

Retaining Warning Type under the circumstances would seem highly inefficient, at best.

@Tasilee
Collaborator

Tasilee commented Sep 18, 2023

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

chicoreus added the CORE (TG2 CORE tests) label Sep 18, 2023
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 18, 2024
…adding to unit test to confirm that 'consistent with the lowest ranking matched element' is handled as specified, and fixing some cases where superfamily from previous test case was passed forward to next.

6 participants