TG2-VALIDATION_LICENSE_STANDARD #38

Open
iDigBioBot opened this issue Jan 5, 2018 · 19 comments
Labels
Conformance, CORE (TG2 CORE tests), OTHER, Parameterized (Test requires a parameter), Test (Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT), TG2, Validation, VOCABULARY

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

GUID: 3136236e-04b6-49ea-8b34-a65f25e3aba1
Label: VALIDATION_LICENSE_STANDARD
Description: Does the value of dcterms:license occur in bdq:sourceAuthority?
TestType: Validation
Darwin Core Class: Record-level
Information Elements ActedUpon: dcterms:license
Information Elements Consulted:
Expected Response: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dcterms:license is EMPTY; COMPLIANT if the value of the term dcterms:license is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT
Data Quality Dimension: Conformance
Term-Actions: LICENSE_STANDARD
Parameter(s): bdq:sourceAuthority
Source Authority: bdq:sourceAuthority default = "Creative Commons 4.0 Licenses or CC0 {[https://creativecommons.org/]} { Regular Expression [((http(s){0,1}://creativecommons.org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})|(http(s){0,1}://creativecommons.org/publicdomain/zero/1.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})))$
Specification Last Updated: 2023-09-17
Examples: [dcterms:license="https://creativecommons.org/licenses/by/4.0/": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dcterms:license matches a term in bdq:sourceAuthority"]
[dcterms:license="GPL": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dcterms:license does not match a term in the bdq:sourceAuthority"]
Source: John Wieczorek
References:
Example Implementations (Mechanisms):
Link to Specification Source Code:
Notes: The license at the record level might be derived from the license of the data set from which the record is retrieved. This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. The canonical form of the Creative Commons license IRI has nothing after the version (e.g. https://creativecommons.org/licenses/by/4.0/), but it may be followed by deed or legalcode (e.g. https://creativecommons.org/licenses/by/4.0/deed), which in turn may be followed by a language code. However, only some two-letter language codes have translations, and some translations are identified by a longer string than the two-letter language code. Errors in the language code, or a language code for which no translation exists, return a 404 error instead of redirecting to the more general license IRI. As of 2024-02-28, deed.mi doesn't exist yet, but legalcode.mi does.
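The Expected Response and Notes above can be sketched as a small function. This is a minimal Python sketch, not the FilteredPush rec_occur_qc implementation: the names `validation_license_standard`, `Response` and `DEFAULT_SOURCE_AUTHORITY` are hypothetical, the regex is our own transcription of the default Source Authority with literal dots escaped and the deed/legalcode suffix made optional (so the canonical IRI in the first example passes), and treating a whitespace-only value as EMPTY is an assumption. Matching the whole value with `re.fullmatch` is what makes leading or trailing whitespace yield NOT_COMPLIANT.

```python
import re
from typing import NamedTuple, Optional

# Language suffixes transcribed from the default Source Authority regex.
LICENSE_LANGS = ("id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv"
                 "|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko")
CC0_LANGS = ("id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv"
             "|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko")

# Hypothetical default bdq:sourceAuthority: CC 4.0 licenses or CC0 1.0, each with an
# optional deed/legalcode suffix and optional language code. Dots are escaped.
DEFAULT_SOURCE_AUTHORITY = re.compile(
    r"https?://creativecommons\.org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4\.0/"
    rf"((deed|legalcode)(\.({LICENSE_LANGS}))?)?"
    r"|https?://creativecommons\.org/publicdomain/zero/1\.0/"
    rf"((deed|legalcode)(\.({CC0_LANGS}))?)?"
)


class Response(NamedTuple):
    status: str
    result: Optional[str]
    comment: str


def validation_license_standard(
        license_value: Optional[str],
        source_authority: Optional[re.Pattern] = DEFAULT_SOURCE_AUTHORITY) -> Response:
    # No usable source authority: the test cannot run.
    if source_authority is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    # Assumption: None, "" and whitespace-only values count as EMPTY.
    if license_value is None or license_value.strip() == "":
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dcterms:license is EMPTY")
    # fullmatch: the entire value must match, so surrounding whitespace fails.
    if source_authority.fullmatch(license_value):
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dcterms:license matches a term in bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dcterms:license does not match a term in the bdq:sourceAuthority")
```

Passing a different compiled pattern as `source_authority` is one way to model the parameterized bdq:sourceAuthority. Like the regex it is built from, this sketch does not detect translations that do not exist yet, such as deed.mi.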
@iDigBioBot
Collaborator Author

Comment by Christian Gendreau (@cgendreau) migrated from spreadsheet:
I would say "parsable" rather than "valid", since validity depends on the context.

pzermoglio changed the title TG2-VALIDATION_DCTERMSLICENSE_NOTSTANDARD to TG2-VALIDATION_DCLICENSE_NOTSTANDARD Jan 19, 2018
ArthurChapman added the Test label (Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT) Jan 19, 2018
@chicoreus
Collaborator

Correcting the namespace for license in the information element (s/dc/dcterms/).

This may require renaming the test. See the Darwin Core RDF guide for a discussion of the use of dcterms:license for non-literals (IRIs to resources) and xmpRights:usageTerms for literals.

See also #99 and #133 which may also need renaming.

@ArthurChapman
Collaborator

Corrected dc:license to dcterms:license throughout

@ArthurChapman
Collaborator

@chicoreus Perhaps we could just change the names of the tests to ...LICENSE... rather than ...DCLICENSE...

Whatever we do, it will be synonymised in Vocabulary.

@chicoreus
Collaborator

@ArthurChapman renaming to LICENSE makes sense to me.

ArthurChapman changed the title TG2-VALIDATION_DCLICENSE_NOTSTANDARD to TG2-VALIDATION_LICENSE_NOTSTANDARD Sep 7, 2018
@ArthurChapman
Collaborator

Closed by mistake

ArthurChapman reopened this Sep 8, 2018
tucotuco added the Parameterized label (Test requires a parameter) Nov 5, 2018
Tasilee added a commit that referenced this issue Oct 7, 2020
In accordance with #189, added testdata_VALIDATION_LICENSE_NOTSTANDARD_#38.csv for #38
Tasilee changed the title TG2-VALIDATION_LICENSE_NOTSTANDARD to TG2-VALIDATION_LICENSE_STANDARD Mar 22, 2022
@Tasilee
Collaborator

Tasilee commented Sep 12, 2022

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

@Tasilee
Collaborator

Tasilee commented Sep 16, 2023

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted"

chicoreus added the CORE label (TG2 CORE tests) Sep 18, 2023
@chicoreus
Collaborator

Updated the Notes from "fail" to the more specific "This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters."

@chicoreus
Collaborator

Per both the current examples in Darwin Core for dcterms:license and section 3.3. of the Darwin Core RDF guide, dcterms:license should only take IRI values such as https://creativecommons.org/licenses/by-sa/4.0/ , not string literals such as CC BY-SA.

See: https://dwc.tdwg.org/rdf/#33-imported-dublin-core-terms-that-have-non-literal-objects-and-corresponding-terms-that-have-literal-objects-normative
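A crude check for whether a value is an IRI rather than a string literal such as CC BY-SA could look like the following. `looks_like_iri` is a hypothetical helper; restricting it to absolute http(s) IRIs with a host is our simplification, and `urllib.parse.urlparse` only splits a string into components, it does not validate it as an IRI.

```python
from urllib.parse import urlparse


def looks_like_iri(value: str) -> bool:
    """Heuristic: an absolute http(s) IRI with a host and no spaces, not free text."""
    parsed = urlparse(value)
    return (parsed.scheme in ("http", "https")
            and bool(parsed.netloc)
            and " " not in value)
```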

The compliant example in this test incorrectly uses string literal values.

The source authority specified only allows for the version 4 CC licenses. This may be desirable, but it may be too narrow a scope. It might be desirable to specify the full list of CC licenses at https://creativecommons.org/licenses/list.en as the source authority. If not, we should be explicit about why it is limited to the 4.0 versions.

We should also be explicit about whether forms other than the canonical IRI are acceptable. Creative Commons specifies the form https://creativecommons.org/licenses/by-sa/4.0/ as canonical, with additional variants including https://creativecommons.org/licenses/by-sa/4.0/legalcode, translations such as https://creativecommons.org/licenses/by-sa/4.0/legalcode.en, plain text https://creativecommons.org/licenses/by-sa/4.0/legalcode.txt, and RDF forms https://creativecommons.org/licenses/by-sa/4.0/rdf

The examples in Darwin Core include a non-canonical form ending with legalcode.

Translations do not exist in all languages. For example, https://creativecommons.org/licenses/by-sa/4.0/legalcode.cy currently returns a 404 error rather than the Welsh translation or a redirect up to /legalcode for that license.

The specified source authority doesn't provide a list of all variants of the IRIs. Either it would need to change to point to the list of licenses at https://creativecommons.org/licenses/list.en (which links to each extant translation, but has no links to the canonical IRIs), or implementations would need to decide how to evaluate whether a specified IRI for a translation is compliant, by first checking the pattern http[s]{0,1}://creativecommons.org/licenses/(by|by-nc|by-nc-nd|by-nc-sa|by-nd|by-sa)/4.0/(legalcode(.[a-z]{2}){0,1}) and then looking up whether the requested IRI returns a 404 error.

In either case, specifying the source authority isn't sufficient information to determine whether the value of the term dcterms:license is in the bdq:sourceAuthority.
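The pattern-then-lookup approach described above might be sketched as follows, under stated assumptions: the pattern is adapted from the comment (dots escaped, and the legalcode segment and its two-letter language code made optional so the canonical IRI also passes), a HEAD request is one possible way to do the lookup, and `CC4_LICENSE_PATTERN`, `http_status` and `is_resolvable_cc_iri` are hypothetical names. The status lookup is passed in as a function so it can be stubbed in tests or cached.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Adapted pattern (assumption): dots escaped; legalcode and its two-letter
# language code are optional, so the canonical IRI also passes.
CC4_LICENSE_PATTERN = re.compile(
    r"https?://creativecommons\.org/licenses/"
    r"(by|by-nc|by-nc-nd|by-nc-sa|by-nd|by-sa)/4\.0/"
    r"(legalcode(\.[a-z]{2})?)?"
)


def http_status(iri: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to iri (needs network access)."""
    request = urllib.request.Request(iri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 404 are raised as HTTPError; report their code.
        return err.code


def is_resolvable_cc_iri(iri: str,
                         status_of: Callable[[str], int] = http_status) -> bool:
    # Step 1: the IRI must have the expected shape.
    if CC4_LICENSE_PATTERN.fullmatch(iri) is None:
        return False
    # Step 2: treat anything other than a 404 as an existing resource.
    return status_of(iri) != 404
```

Network failures other than HTTP error responses (for example `urllib.error.URLError` or a timeout) propagate out of `http_status`; an implementation might map them to EXTERNAL_PREREQUISITES_NOT_MET.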

@chicoreus
Collaborator

Propose we change the source authority to:

bdq:sourceAuthority default = "Creative Commons 4.0 Licenses or CC0 " {[https://creativecommons.org/]} { Regular Expression [
(http(s){0,1}://creativecommons.org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})|(http(s){0,1}://creativecommons.org/publicdomain/zero/1.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})
]}

I haven't confirmed that this regex syntax is correct, but it should be close; I also need to double-check the language list for the public domain dedication. It will need to be more complex to be robust, as deed.mi doesn't exist yet, but legalcode.mi does.

All three endings (deed, legalcode, or nothing) are valid, with the canonical form of the license IRI having nothing after the version number.

Only some two-letter language codes have translations, and some translations are identified by a longer string than the two-letter language code. Errors in the language code, or specifying a language code for which no translation exists, return a 404 error instead of redirecting to the more general license IRI.
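Because the canonical IRI has nothing after the version number, a matching variant can be reduced to its canonical form by dropping the deed/legalcode suffix. In this sketch `canonical_cc_iri` is hypothetical, the generic language-code pattern (two letters plus an optional suffix such as -hans) is our assumption rather than the list of extant translations, and normalizing http to https is our choice. The .txt and /rdf forms are deliberately not handled, and nothing here checks that a translation actually exists.

```python
import re
from typing import Optional

# Group 1 captures the path up to and including the version number.
_CC_BASE = re.compile(
    r"https?://creativecommons\.org/"
    r"(licenses/(?:by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4\.0/"
    r"|publicdomain/zero/1\.0/)"
    r"(?:(?:deed|legalcode)(?:\.[a-z]{2}(?:-[a-z]+)?)?)?"
)


def canonical_cc_iri(iri: str) -> Optional[str]:
    """Return the canonical https IRI for a recognized CC 4.0 / CC0 variant, else None."""
    match = _CC_BASE.fullmatch(iri)
    if match is None:
        return None
    return "https://creativecommons.org/" + match.group(1)
```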

@chicoreus
Collaborator

Syntax-corrected regex:

^(http(s){0,1}://creativecommons.org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})|(http(s){0,1}://creativecommons.org/publicdomain/zero/1.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})))$
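Checking this syntax-corrected regex against the canonical IRI shows likely reasons it was not quite right yet: the ((deed|legalcode)...) group after the version has no {0,1} quantifier, so the canonical form with nothing after the version is rejected, and the unescaped dots match any character. The pattern below is copied verbatim, only split across lines; `SYNTAX_CORRECTED_PATTERN` and `accepts` are hypothetical names, and `accepts` matches the whole string.

```python
import re

# The syntax-corrected regex, copied verbatim and split across lines.
SYNTAX_CORRECTED_PATTERN = re.compile(
    r"^(http(s){0,1}://creativecommons.org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)"
    r"/4.0/((deed|legalcode)(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|mi|ni|no|pl|pt|ro|si|fi|sv"
    r"|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})"
    r"|(http(s){0,1}://creativecommons.org/publicdomain/zero/1.0/((deed|legalcode)"
    r"(.(id|eu|da|de|en|es|fr|fy|hr|it|lv|lt|ni|no|pl|pt|ro|si|fi|sv"
    r"|tr|cs|el|ru|uk|ar|jp|zh-hans|zh-hant|ko)){0,1})))$"
)


def accepts(iri: str) -> bool:
    # Whole-string match against the pattern above.
    return SYNTAX_CORRECTED_PATTERN.fullmatch(iri) is not None


# The canonical IRI is rejected: deed or legalcode is mandatory here.
print(accepts("https://creativecommons.org/licenses/by/4.0/"))
# The unescaped "." lets "_" stand in for the dot before the language code.
print(accepts("https://creativecommons.org/licenses/by/4.0/legalcode_en"))
```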

@ArthurChapman
Collaborator

That looks like a solution!

@ArthurChapman
Collaborator

Your other explanation would be good to put in the Notes.

@chicoreus
Collaborator

The regex is still not quite right. This one has been tested and works:

^(http(s){0,1}://creativecommons[.]org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/4[.]0/((deed|legalcode)(.){0,1}){0,1})|(http(s){0,1}://creativecommons[.]org/publicdomain/zero/1[.]0/((deed|legalcode)(.){0,1}){0,1})$
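How this tested regex is applied matters. Its top-level alternation means ^ binds only to the license branch and $ only to the CC0 branch, so a search-style match (Python `re.search`, Java `Matcher.find()`) accepts a license IRI followed by trailing whitespace, which the Notes require to be NOT_COMPLIANT; a whole-string match (Python `re.fullmatch`, Java `Matcher.matches()`) does not. The pattern is copied verbatim; the variable name `TESTED_PATTERN` is ours.

```python
import re

# The tested regex, copied verbatim and split across lines.
TESTED_PATTERN = re.compile(
    r"^(http(s){0,1}://creativecommons[.]org/licenses/(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)"
    r"/4[.]0/((deed|legalcode)(.){0,1}){0,1})"
    r"|(http(s){0,1}://creativecommons[.]org/publicdomain/zero/1[.]0/((deed|legalcode)(.){0,1}){0,1})$"
)

trailing = "https://creativecommons.org/licenses/by/4.0/ "

# search: "^(license branch)" matches a prefix, so the trailing space slips through.
print(TESTED_PATTERN.search(trailing) is not None)
# fullmatch: the whole value must match one branch, so the trailing space fails.
print(TESTED_PATTERN.fullmatch(trailing) is not None)
```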

@chicoreus
Collaborator

@ArthurChapman added the substance of the comment above to the notes.

chicoreus added a commit to FilteredPush/rec_occur_qc that referenced this issue Feb 27, 2024
…ng with unit test and default method. Uses the proposed regex for the test to identify CC 4.0 licenses and CC0. Expect the sourceAuthority for the test to be changed to accommodate this, but discussion is ongoing.
@Tasilee
Collaborator

Tasilee commented Mar 22, 2024

Thanks @ArthurChapman. @chicoreus : How can you render the "|"s in the regular expression into a form acceptable to the github table format? I've tried to add the Source Authority as specified.

@ArthurChapman
Collaborator

ArthurChapman commented Mar 22, 2024

To include a pipe in Markdown table text, use "\|".
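The escaping can be automated when generating table rows. This is a minimal sketch with hypothetical helper names; it handles only the pipe, so other backslash sequences in a regex placed outside a code span may still be interpreted by Markdown.

```python
def markdown_cell(text: str) -> str:
    """Escape pipes so text can sit inside one GitHub-flavored Markdown table cell."""
    return text.replace("|", "\\|")


def markdown_row(field: str, value: str) -> str:
    # One "| field | value |" row with both cells escaped.
    return f"| {markdown_cell(field)} | {markdown_cell(value)} |"


print(markdown_row("Source Authority", "(by|by-sa)"))
```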

@Tasilee
Collaborator

Tasilee commented Mar 25, 2024

Thanks @ArthurChapman - done I hope.

chicoreus added a commit to FilteredPush/rec_occur_qc that referenced this issue Jul 29, 2024
…sulting from review of validation results.

5 participants