Definition expressed machine readible #1255

Open
bertvannuffelen opened this issue Sep 21, 2020 · 4 comments
Labels
dcat:Dataset; dcat; DesignPrinciples (Somehow related to the design principles, e.g. levels of machine readability, ontological commitment); feedback (Issues stemming from external feedback to the WG); future-work (issue deferred to the next standardization round)

Comments

@bertvannuffelen

The definition of Dataset

A collection of data, published or curated by a single agent, and available for access or download in one or more representations.

expresses a cardinality constraint between Dataset and Agent.
Is that cardinality constraint expressed anywhere in machine-readable form (e.g. as OWL or SHACL)?

Or is it left to the implementers to determine on which property the constraint is imposed?
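
To make the question concrete, here is a minimal sketch of what such a check could look like. It assumes the "single agent" is read as "exactly one dcterms:publisher per dcat:Dataset"; that property choice is an assumption, since the definition does not name a property. The triples and the function name `publisher_violations` are hypothetical, and plain Python tuples stand in for an RDF graph.

```python
from collections import defaultdict

# Assumption: "a single agent" is interpreted as exactly one dcterms:publisher.
# The DCAT definition does not name the property; this is one possible reading.
RDF_TYPE = "rdf:type"
DATASET = "dcat:Dataset"
PUBLISHER = "dcterms:publisher"


def publisher_violations(triples, prop=PUBLISHER):
    """Return {dataset: sorted agents} for datasets without exactly one value of prop."""
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == DATASET}
    agents = defaultdict(set)
    for s, p, o in triples:
        if p == prop and s in datasets:
            agents[s].add(o)
    return {d: sorted(agents[d]) for d in sorted(datasets) if len(agents[d]) != 1}


# Hypothetical example data: ds1 conforms, ds2 has two publishers, ds3 has none.
triples = [
    ("ex:ds1", RDF_TYPE, DATASET),
    ("ex:ds1", PUBLISHER, "ex:agencyA"),
    ("ex:ds2", RDF_TYPE, DATASET),
    ("ex:ds2", PUBLISHER, "ex:agencyA"),
    ("ex:ds2", PUBLISHER, "ex:agencyB"),
    ("ex:ds3", RDF_TYPE, DATASET),
]

print(publisher_violations(triples))
```

Passing a different `prop` (for example dcterms:creator) changes which datasets are flagged, which is exactly the ambiguity the question is about.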

@dr-shorthair
Contributor

We did not use much OWL axiomatization. That was a deliberate choice, since there are many usage patterns.

SHACL was only in its infancy when we started on DCATv2, so it was not one of the planned deliverables.
However, for the same reason that we avoided OWL axioms, I think we would be reluctant to add much in the way of cardinality constraints.

It is highly likely that specific community implementations would want to tighten this up, but these would be 'community profiles' of DCAT.

On the specific case: if you have an example where multiple agents are responsible for a dataset, then I would suggest encapsulating those into a (virtual) compound agent.
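
One possible reading of the compound-agent suggestion, as a rough sketch only: the several agents are replaced by one group node that lists them as members. Modelling the group with foaf:Group and foaf:member, using dcterms:publisher as the property, and the `-publishers` naming scheme are all assumptions made for illustration; the function name `encapsulate_publishers` is hypothetical.

```python
def encapsulate_publishers(triples, prop="dcterms:publisher"):
    """Replace multiple prop values on one subject with a single group node."""
    # Collect the values of prop per subject, preserving input order.
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, []).append(o)

    # Keep every triple except the prop statements of subjects with several agents.
    result = [t for t in triples if not (t[1] == prop and len(values[t[0]]) > 1)]

    for subject, agents in values.items():
        if len(agents) > 1:
            group = f"{subject}-publishers"  # hypothetical naming scheme
            result.append((subject, prop, group))
            result.append((group, "rdf:type", "foaf:Group"))
            result.extend((group, "foaf:member", a) for a in agents)
    return result


# Hypothetical example data: one dataset with two publishers.
triples = [
    ("ex:ds2", "dcterms:publisher", "ex:agencyA"),
    ("ex:ds2", "dcterms:publisher", "ex:agencyB"),
]

for triple in encapsulate_publishers(triples):
    print(triple)
```

After the rewrite the dataset has exactly one publisher value, so a "single agent" check on that property would pass while the individual agents remain reachable through the group.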

@bertvannuffelen
Author

I understand.

Now I become philosophical: it is interesting to see that there is a very strict statement in the human-readable definition, but that it cannot be translated into a machine-readable form without the feeling that it would constrain the usage of the term more than the human-readable definition intends.

It seems that we encounter a knowledge representation challenge here: we agree on an intention, but we cannot agree on the formal representation. So how can I prove conformance in this case?

As I wrote, this reply is purely for the joy of the discussion. The original question has been answered.

@kcoyle
Contributor

kcoyle commented Sep 21, 2020

The word "generally" inserted in that sentence can resolve the question.

In the Dublin Core community we have learned to prefer that vocabulary definitions, especially of vocabularies that are likely to be re-used in various contexts, should adhere to the "minimal semantic commitment" principle, so that application profiles (APs) that use those vocabulary terms can apply their own constraints without creating incompatibilities. If you plan for a vocabulary with few constraints that serves as the base for APs, I think you get the most out of your vocabulary.

@aidig
Contributor

aidig commented Sep 22, 2020

Great discussion! It would be fantastic if there were greater alignment between the human-readable definitions and the formal machine-readable expressions. It would force knowledge engineers to consider the potential for reuse of a new element and not to add restrictions unless deemed necessary for its interpretation. It would also do the opposite: make sure that the defining characteristics are indeed present in the definition, narrowing the number of instances down to exactly the set that was intended.

In the case of the DCAT definition, it seems the minimal semantic commitment would more or less be the abstract "a collection of data", possibly adding "curated or published" in the scope of DCAT, or even the definition proposed in this issue.
Indeed, inserting "generally" or "typically" in the sentence would give users an idea of common use cases and of further profiling. However, I would argue strongly against making such notes part of the definition, as they are supplementary and should not be part of the formal machine-readable definition. Rather, such notes should be added as (usage) notes, comments, or examples, keeping the human-readable definitions concise, precise, consistent, and ready for formal machine-readable expression.

A definition is a single phrase that can replace the term wherever used. It does not start with an article (e.g. “a”, “the”) or end with a full-stop. It does not take the form of, or contain, a requirement or recommendation. Additional information can be included in a Note to entry or an Example.

Ref: ISO on definitions: https://www.iso.org/files/live/sites/isoorg/files/archive/pdf/en/how-to-write-standards.pdf

@riccardoAlbertoni added this to "To do" in "DCAT revision" via automation, Sep 23, 2020
@riccardoAlbertoni added this to the "DCAT3 FPWD" milestone, Sep 23, 2020
@riccardoAlbertoni added the feedback, dcat:Dataset, and DesignPrinciples labels, Sep 23, 2020
@andrea-perego added this to "In progress" in "DCAT Sprint: Feedback", Oct 30, 2020
@andrea-perego moved this from "To do" to "In progress" in "DCAT Sprint: Design Principles", Oct 30, 2020
@andrea-perego modified the milestones: DCAT3 2PWD, DCAT3 3PWD, May 4, 2021
@andrea-perego modified the milestones: DCAT3 3PWD, DCAT3 4PWD, Jan 26, 2022
@andrea-perego changed the title from "definition expressed machine readible" to "Definition expressed machine readible", Mar 8, 2022
@davebrowning added this to "To analyse" in "DCAT: Potential new requirements" via automation, Feb 13, 2023
@davebrowning added the future-work label, Feb 13, 2023
7 participants