Defining validation #449

Open
kcoyle opened this issue Oct 8, 2018 · 14 comments

Comments

@kcoyle
Contributor

kcoyle commented Oct 8, 2018

The profile guidance document will include some statements about the validation of profiles. Digging into this deeper, it is going to be necessary to say what we mean by validation.

A set of validation rules can themselves represent a view, and thus a profile. Validation can be limited to a few "necessary" rules, or it can be strict and validate every constraint expressed in the profile.

There is also the question of whether validation of RDF instance data performs any inferencing as part of the validation.

And the big question is whether validation of a profile makes use of any constraints from the base vocabulary that the profile is based on.

The questions I see are:

  1. Do we have guidance to give on the types of validation that can be performed?
  2. Is there a need to restrict the guidance to certain validation actions, leaving the rest outside of our scope?
  3. Assuming we make decisions, how should those be reflected in the profileDesc vocabulary?
@nicholascar
Contributor

Currently the ProfilesOntology contains a ResourceRole class to indicate the role played by various profile implementation documents. There is also a demo vocabulary of Role instances (and here in RDF), an updated version of which will shortly be rolled into the Profile Ont doc. The demo vocab contains "Full Constraints", "Part Constraints" & "Conformance Test" roles, which address some of the roles discussed here.

The expectation is that the vocab would be extended within the WG and would also remain extensible.

@kcoyle
Contributor Author

kcoyle commented Oct 9, 2018

Thanks, @nicholascar . That brings up another question for me: do we talk about validation (the act of validating) or do we talk about constraints (the definition of rules), or both? They are different, but are sometimes realized in a single piece of code - that is, a SHACL document defines constraints but it also can be used to implement the act of validation (checking the instance data against the defined constraints and returning the results). In a sense SHACL can be seen as both documentation and implementation, although you could argue that the latter function is its primary function. The DCAT-AP PDF document defines constraints that are then re-defined in the SHACL file. The latter can be used as an implementation of validation; the former is only documentation.

(The use of OWL "constraints" complicates things, as open world OWL makes use of those constraints quite differently from SHACL or ShEx.)

@larsgsvensson
Contributor

This might be splitting hairs, but we probably need to differentiate between "a set of validation rules" and "validation", where the latter is the act of applying the former to some kind of data. In PROV-O terms the "validation" would be a prov:Activity, whereas the "set of validation rules", the data to be validated and the validation report are all instances of prov:Entity. The "validation" is performed by some kind of prov:Agent (a human or software application).

> that is, a SHACL document defines constraints but it also can be used to implement the act of validation (checking the instance data against the defined constraints and returning the results). In a sense SHACL can be seen as both documentation and implementation although you could argue that the latter function is its primary function.

To me a SHACL document is a "set of validation rules" but does not implement the act of validation itself. It's documentation using a specific machine-readable syntax, but it's not validation. The act of validation is performed by the validation software.

Does that make sense?
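
The distinction drawn in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Rule`, `Report`, `validate` and `example-validator` are hypothetical and are not part of SHACL, PROV-O or any profile tooling. The rules are plain data (the "set of validation rules"), while the function that applies them is the act of validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A declarative rule: pure data, comparable to a prov:Entity."""
    prop: str        # hypothetical property name the rule talks about
    min_count: int   # minimum number of values required


@dataclass(frozen=True)
class Report:
    """The validation report, also an entity in the PROV-O analogy."""
    agent: str
    violations: tuple


def validate(rules, record, agent="example-validator"):
    """The act of validation (the prov:Activity), run by some agent."""
    violations = []
    for rule in rules:
        values = record.get(rule.prop, [])
        if len(values) < rule.min_count:
            violations.append(rule.prop)
    return Report(agent=agent, violations=tuple(violations))


# The rules exist independently of any validation run.
RULES = (Rule("title", 1), Rule("license", 1))

report = validate(RULES, {"title": ["Example dataset"]})
print(report.violations)
```

Nothing happens until `validate` is called: the same `RULES` tuple could equally be printed as documentation or handed to a different validator.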

@kcoyle
Contributor Author

kcoyle commented Oct 9, 2018

@larsgsvensson I agree that validation is an activity, whereas the rules are what feeds into that activity. However, where do you stand on the question of constraints vs validation rules? Do you see them as the same thing, or different? And do we include written rules in non-actionable documents, as in the DCAT-AP PDF, or must constraints be implementable? (This is essentially where I am stuck, but I think we need to get it clear or our text will mix them up and be incoherent.)

@rob-metalinkage
Contributor

We definitely include constraints expressed in non-actionable documents, but we should recommend that as far as possible constraints are written in a formalism that allows validation activities to be performed.
E.g. DCAT-AP has a partial SHACL implementation (non-normative, I believe). All constraints should be testable - even if it is by manual inspection. So "constraints SHOULD be expressed as validation rules" is perhaps as much as we can say.

@larsgsvensson
Contributor

@kcoyle scripsit:

> where do you stand on the question of constraints vs validation rules? Do you see them as the same thing, or different?

In this context I'd say that validation rules and constraints are the same thing (it's about constraining the set of possibilities). In order not to confuse them with OWL constraints (a different kind of beast) we'd probably better just use the term "validation rules" and not talk about "constraints".

> And do we include written rules in non-actionable documents, as in the DCAT-AP PDF, or must constraints be implementable?

Here I'm with @rob-metalinkage that we "include constraints expressed in non-actionable documents".

> All constraints should be testable - even if it is manual inspection.

Here I agree, too (when we do requirements review I usually keep asking how to test that the requirement is fulfilled), although that can be difficult without deeper knowledge of the data being tested. Karen's example of "mandatory if applicable" is such a case: whether something is applicable usually differs from institution to institution (or from application to application).

@agreiner
Contributor

agreiner commented Oct 9, 2018

There are two kinds of validation relevant here. One is validation of profiles and another is validation of datasets against particular profiles. To meet the W3C publishing requirements for a rec, I think we need to enable the former. To help people understand how to write good profiles, we probably ought to provide at least some guidance about the latter as well.

@kcoyle
Contributor Author

kcoyle commented Oct 9, 2018

Great points, @agreiner , thanks. This is definitely something we should cover.

@aisaac
Contributor

aisaac commented Oct 9, 2018

For the sake of our mental health, now that @agreiner has identified the dichotomy, I would prefer not to address the issue of validating profiles. Or just say that we expect profile expressions/representations to conform to whatever format/language/best practices they're supposed to follow.
First, we don't have a concrete requirement about it, I think. Second, it's quite obvious that a profile, being itself a chunk of (meta)data or even (meta)(meta)data, must adhere to whatever rules apply to the means of expression chosen for it (this would also apply to human-readable expressions of the profile, of course). I believe that the validation of profiles is already covered by the general case of validation against profiles/standards.

@aisaac
Contributor

aisaac commented Oct 9, 2018

On the more general thread: I agree it's appropriate to distinguish validation actions vs validation rules. It may actually be splitting hairs too, but it may bring some useful modeling for the profile ontology.

I'm less sure about constraints vs validation rules. In fact OWL constraints do "constrain the set of possibilities"; it's just that they do it in an open world, and that is quite different from constraining data in a closed world. Maybe we can refer to 'validation' for checking rules in a closed world, and 'constraint checking' for the action of checking rules without any assumption about the openness. But it will probably conflict with other terminology.
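
The open-world versus closed-world difference can be illustrated with a deliberately simplified Python sketch. It assumes a single "at least one value" rule and uses hypothetical function names; real OWL reasoners and SHACL or ShEx engines do far more than this. The point is only that the same missing value is a failure in one reading and merely unknown in the other.

```python
def closed_world_check(record, prop):
    """Closed world: anything not stated in the data is treated as absent."""
    return "satisfied" if record.get(prop) else "violation"


def open_world_check(record, prop):
    """Open world: a missing value may exist elsewhere, so nothing is contradicted."""
    return "satisfied" if record.get(prop) else "unknown"


record = {"title": ["Example dataset"]}

print(closed_world_check(record, "license"))
print(open_world_check(record, "license"))
```

Both functions agree when the value is present; they only diverge on what absence means.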

@rob-metalinkage
Contributor

I think we do not need to say anything about validation processes - neither DCAT nor Profiles model the results of validation checks, so there is no explicit process we need to reference, hence no need to get too deep into defining them.

@aisaac
Contributor

aisaac commented Oct 9, 2018

+1 for not venturing into the validation process, after we've identified that there's some process.

Ultimately, all this is probably an area where we should stick very closely to the requirements we have: not get trapped in long discussions without having the list in front of us, and not venture beyond what we are asked to do to meet these requirements.

@kcoyle added this to To do in Profile Guidance via automation Oct 23, 2018
@nicholascar
Contributor

De-tagging this from PROF as it's not a profiles vocabulary issue.

@nicholascar removed the profiles-vocabulary label (For discussion of profile description vocabulary) Aug 18, 2019
@kcoyle
Contributor Author

kcoyle commented Aug 19, 2019

Nick, if PROF will have a role that is some version of the term "validation" then it needs a definition, doesn't it? It may not be as extensive as the one in the guidance document, but one would expect each role to have a short definition.

7 participants