A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation [ID37] (5.37) #273

Open
nicholascar opened this issue Jun 27, 2018 · 31 comments

Comments

@nicholascar
Contributor

Entered from Google Doc

@nicholascar nicholascar added requirement profile-guidance requires discussion Issue to be discussed in a telecon (group or plenary) labels Jun 27, 2018
@nicholascar nicholascar changed the title Requirement: a profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation Sep 1, 2018
@aisaac aisaac changed the title A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation [ID37] (5.37) Nov 12, 2018
@aisaac aisaac added plenary-approved and removed requires discussion Issue to be discussed in a telecon (group or plenary) labels Nov 12, 2018
@jpullmann

If this requirement is concerned with validation, I'd merge with implementation/definition - there are other ways to implement a validating profile than schema languages. The latter seem covered by #279.
What about: Profiles should allow for different types of data validation (structure, values, conventions).

@smrgeoinfo
Contributor

This seems like a statement related to the modular specification approach, wherein a conformance class defines a subset of requirements and tests to determine (validate) that they are met by instance documents. Can ID37 be met by pointing out that profiles (or any standard) can define one or more conformance classes indicating various levels or types of (partial) conformance?

@kcoyle
Contributor

kcoyle commented Nov 15, 2018

@smrgeoinfo I don't actually see the modularity here. The Guidance for Profile Publication document needs to say that there may be more than one file or resource that describes a profile, with different roles or functionality. Current DCAT profiles are often a PDF (for humans) plus a SHACL file (for validation). This requirement recognizes that situation.

@smrgeoinfo
Contributor

I was focusing on the idea of 'partial implementation' and 'different levels of data validation'. The SHACL file, or an XSD, or maybe some text in the PDF define conformance classes. There may be guidance documents with different functionality; if different guidance documents provide different validation rules (whether testable with tools, or stated in text), they define different conformance classes. In that case a profile should be able to point at the specific guidance it is following. But perhaps I'm misunderstanding the issue.

@kcoyle
Contributor

kcoyle commented Nov 15, 2018

My gut is that if there are different validation rules or different guidance rules then you have a different profile. That is definitely how it would be judged in the library/archive world.

What gets tricky is the whole "equivalence" issue - and the need for a "center". If you have a profile with a SHACL validation file and a separately developed ShEx validation file, what is the requirement for equivalence? If you have a profile with a PDF document and a SHACL file, which one is the "center"? That is, which one provides the authoritative set of rules for the profile? Which one would you use to develop yet another validation file?

@smrgeoinfo
Contributor

Good questions! Perhaps the only solution is for the provider to declare the authoritative validation rules (the 'center'); as far as DCAT is concerned, there would need to be some way to identify these artifacts.

@nicholascar
Contributor Author

One could posit an additional Role for a central validator resource in the Resource Roles vocab for this purpose: https://w3c.github.io/dxwg/profilesont/resource_roles.html. I think focusing on the various Roles within a vocab of roles would be a good step forward for clarity.

@aisaac
Contributor

aisaac commented Nov 18, 2018

@smrgeoinfo your interpretations are interesting, but they don't apply here. The requirement here is more basic. "(partially)" was just added because we needed to accommodate cases where the "implementation" of a profile according to a schema language would fail to express all the validation rules, just because the schema language is not expressive enough. For example, XML Schema can't express everything that SHACL allows one to express. And SHACL itself may fail to express what a profile wants to express. So the "partial" coverage would happen more by accident than by design.

That said, what you suggest is quite certainly a valid concern. But it should be raised as a different requirement then, because our use case 37 does not support it.

@kcoyle @smrgeoinfo @nicholascar I like the idea of a 'center' but I'd advise not spending time on it, as it's a risky notion. Take a basic, real-world counter-example to the idea that every profile has such a 'center': in the Europeana Data Model, one could see the 'center' to be the human-readable spec. But when we embarked on creating an XML Schema for our model, we realized this schema would have to impose a sequential order on the elements in the data, which we didn't want! So in some respects the XML Schema (in line with my remark above) has failed to implement everything we wanted to implement. But in other respects it actually forced us into an implementation that has more than what an "ideal" implementation would have.
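
To make the "fails to express some rules, but imposes others" situation concrete, here is a minimal sketch in Python. The rule names and the rule sets are invented for illustration; they are not taken from the EDM specification or from any real schema. It only shows that "partial" coverage can be read as a set difference in both directions.

```python
# Hypothetical rule names; none of these come from an actual specification.
profile_rules = {"title-required", "creator-is-iri", "rights-from-controlled-list"}

# Rules that each (hypothetical) schema ends up enforcing.
xml_schema_rules = {"title-required", "elements-in-fixed-order"}
shacl_rules = {"title-required", "creator-is-iri"}


def coverage(profile, schema):
    """Return (profile rules the schema misses, rules the schema adds on its own)."""
    return profile - schema, schema - profile


missing_xsd, extra_xsd = coverage(profile_rules, xml_schema_rules)
missing_shacl, extra_shacl = coverage(profile_rules, shacl_rules)
```

In this toy setup the XML Schema both misses profile rules and adds an ordering rule the profile never asked for, while the SHACL file only misses one.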

@nicholascar
Contributor Author

@aisaac: I agree with "advise to not spend time on it", hence the suggestion to create a Role for this and then leave it there. A vocab of roles, as per ISO codelists, can include some commonly used and other less commonly used codes. If some people wanted a centre, they could use the corresponding role. Others need not. This moves the discussion out of the core ontology and provides for such a possibility but doesn't require it, and also ensures that if someone does call on this role, it is understood since it's in the list of known roles.

@kcoyle
Contributor

kcoyle commented Nov 19, 2018

I also agree that the concept of "center" is huge and beyond our abilities at this moment and for this deliverable. It's one of those things that we can put into a note that talks about "what else needs to be done to make profiles more useful" and hope that it becomes someone's future charter.

@aisaac
Contributor

aisaac commented Nov 19, 2018

@nicholascar @kcoyle if it's a role, then actually I wouldn't object if it's in the core spec, as long as (1) it's not for the version being released now (and that would allow us to see what happens with the W3C registration process); (2) it carefully avoids any semantic over-commitment, possibly leaving to another group the option to make it more precise. Some of the 'motivations' in Web Annotations are probably not super precisely defined, either, so I guess nobody would blame us for doing this.

@aisaac
Contributor

aisaac commented Nov 19, 2018

@nicholascar @kcoyle in the end, whether it's in the core or not, what would be needed is a documented case and requirement!

@kcoyle
Contributor

kcoyle commented Nov 19, 2018

The lack of a use case could indicate that it isn't core. At the same time, I think the profiles ontology brings with it issues that we didn't create use cases for at the beginning, and maybe we should consider scoping our work based on the use cases we have. The UCR document doesn't include profiles ontology as a filter so unfortunately we would have to dig through both the profiles and conneg requirements to see what requirements there really are. It would be reasonable for someone to ask which requirements the profiles ontology responds to. @jpullmann

@aisaac
Contributor

aisaac commented Nov 19, 2018

@kcoyle considering the number of requirements we currently have for profiles and conneg, it won't be very hard to identify those that could be relevant for the profile ontology :-)

@nicholascar
Contributor Author

I think we can generate a cogent Use Case in time for a 2PWD easily enough. @aisaac, would you like to create an Issue for this? I'm interested to assist on this one as I want to test out the thoughts here.

@aisaac
Contributor

aisaac commented Nov 19, 2018

@nicholascar ok I've created #597 thanks for the suggestion.

I think now we can close the parenthesis and come back to @jpullmann 's original comment:

If this requirement is concerned with validation, I'd merge with implementation/definition - there are other ways to implement a validating profile than schema languages. The latter seem covered by #279. What about: Profiles should allow for different types of data validation (structure, values, conventions).

@kcoyle
Contributor

kcoyle commented Nov 19, 2018

I didn't think this was about TYPES of validation, just that validation schemas themselves can be considered aspects/objects of a profile. And it recognizes that a profile may provide more than one schema, and the schemas may not necessarily cover the same ground. I definitely would not use "should" here because we are not requiring data validation.

@smrgeoinfo
Contributor

If a "profile may provide more than one schema, and the schemas may not necessarily cover the same ground", these various validation procedures must be clearly identifiable, so that e.g. a dcat:Distribution can point, in its dct:conformsTo property, to the identifier of the profile validation that has been run on the dataset.

@kcoyle
Contributor

kcoyle commented Nov 19, 2018

@smrgeoinfo I agree but I'm not sure that what we're working on will solve that - I don't know of any vocabulary that would tell you, of the various documents, vocabularies, validation files in a profile exactly WHAT they represent in terms of content. The only useful file relationship that I'm aware of is the dct:hasFormat that I suggested in this comment, which basically means "same exact data, different format" and which often means that the data is a different output format from the same software. My gut feeling is that if you have any validation files that are not output from the same process you cannot guarantee that they contain exactly the same data unless you run a test suite that compares outputs from the same instance data.

BTW, there is such a test suite applied to SHACL and ShEx, and there are a very small number of differences. There also is a way to translate between SHACL and ShEx, AFAIK. But if you develop the validation files independently, I wouldn't place a bet on the results being the same.
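
A minimal sketch of that kind of comparison, in Python. The two validator functions are hypothetical stand-ins for independently written validation files (they are not SHACL or ShEx engines), and the instance data is invented. The idea is only to run both over the same instance data and list where their verdicts differ.

```python
def validate_a(record):
    """Stand-in for one validator: requires a non-empty title."""
    return bool(record.get("title"))


def validate_b(record):
    """Stand-in for an independently written validator: requires a title key,
    but (unlike validate_a) accepts an empty string."""
    return "title" in record


def disagreements(records, first, second):
    """Return the records on which the two validators give different verdicts."""
    return [r for r in records if first(r) != second(r)]


# Invented instance data shared by both validators.
instance_data = [
    {"title": "Dataset one"},
    {"title": ""},
    {"creator": "someone"},
]

diff = disagreements(instance_data, validate_a, validate_b)
```

Here the two stand-ins agree on the first and third records and disagree on the empty title, which is the sort of gap that only shows up when both are run against the same data.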

@smrgeoinfo
Contributor

Seems like what is needed is an identifier for a resourceDescriptor (whatever that ends up being named) that you can use as the value for dct:conformsTo.

@nicholascar
Contributor Author

Careful all: a starting assumption here is that it is the Profile that is being conformed to, not a part thereof. This is so a Profile can have guidance docs etc, as well as validator files. If we ID the validator file and people conformsTo to that, I think we've breached a conceptual contract, like identifying a Dataset by one of its Distribution URIs.

Unless we work out specialised axioms to chain a specialised conformsTo to a multi-step thing (X conformsTo Y, Y part of some Profile Z with some Role A).
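
A rough sketch of what such a chaining rule could look like, in Python over plain tuples rather than a real RDF library. The predicates `ex:partOfProfile` and `ex:hasRole`, the role `ex:validationRole`, and all resource names are hypothetical and are not terms from the Profiles Ontology. The rule: if X conformsTo Y, and Y is part of Profile Z with an accepted Role, then infer X conformsTo Z.

```python
# Triples as (subject, predicate, object). All names are illustrative only.
triples = {
    ("ex:dataX", "dct:conformsTo", "ex:shapesFileY"),
    ("ex:shapesFileY", "ex:partOfProfile", "ex:profileZ"),
    ("ex:shapesFileY", "ex:hasRole", "ex:validationRole"),
}


def infer_profile_conformance(graph, accepted_roles):
    """Infer X conformsTo Z when X conformsTo Y, Y is part of profile Z,
    and Y carries at least one of the accepted roles."""
    inferred = set()
    for x, p, y in graph:
        if p != "dct:conformsTo":
            continue
        # Only chain through resources that play an accepted role.
        if not any((y, "ex:hasRole", role) in graph for role in accepted_roles):
            continue
        for y2, p2, z in graph:
            if y2 == y and p2 == "ex:partOfProfile":
                inferred.add((x, "dct:conformsTo", z))
    # Report only statements that were not already asserted.
    return inferred - graph


new_statements = infer_profile_conformance(triples, {"ex:validationRole"})
```

With the toy graph above, the only inferred statement is that `ex:dataX` conforms to `ex:profileZ`; with no accepted roles, nothing is inferred.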

@kcoyle
Contributor

kcoyle commented Nov 20, 2018

@nicholascar Remember that anyone can say anything about anything (and they probably will). I hope that profile resources/objects will have identifiers, because if they don't they'll be blank nodes (/me making the 'go away vampire' sign). Also, I want people to be able to say things about them, like who created them, what version they are, etc.

dct:conformsTo has no domain and therefore the subject is of type rdfs:Resource. If the desire is to limit conformance to profiles, such that

profileA  
      rdf:type prof:Profile ;  
      ex:conformsTo standardX .

then you'll need a prof:conformsTo that has a domain of prof:Profile.

That said, you can write rules for your required usage within your community and you can test for following those rules. But the vocabulary allows all defined uses of dct:conformsTo.
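
As an illustration of such a community rule, here is a small Python sketch over plain tuples (not a real RDF library, and not SHACL). The graph and resource names are invented. The hypothetical rule is "every subject of dct:conformsTo must be typed prof:Profile", and the check lists the subjects that break it.

```python
# Triples as (subject, predicate, object). Resource names are invented.
triples = {
    ("ex:profileA", "rdf:type", "prof:Profile"),
    ("ex:profileA", "dct:conformsTo", "ex:standardX"),
    ("ex:datasetB", "dct:conformsTo", "ex:standardX"),
}


def untyped_conformers(graph):
    """Return subjects of dct:conformsTo that are not declared as prof:Profile."""
    profiles = {s for s, p, o in graph if p == "rdf:type" and o == "prof:Profile"}
    return {s for s, p, o in graph if p == "dct:conformsTo" and s not in profiles}


violations = untyped_conformers(triples)
```

The statement about `ex:datasetB` is still legal as far as the vocabulary goes; it is only flagged because this particular community rule says so.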

@kcoyle
Contributor

kcoyle commented Nov 20, 2018

@nicholascar I just noticed that "dct:conformsTo" is listed in the Profiles Ontology document under 7.5 Class Resource Descriptor. Isn't that the opposite of what you say above?

@aisaac
Contributor

aisaac commented Nov 20, 2018

I'm not sure I understand the link between identifiers, conformsTo and the original wording of the requirement, or @jpullmann 's rewording. This goes in solution space, no?

My take is that the requirement is about expressing that a profile can serve the validation of data. That the profile contributes to such validation goal by providing specifications for validation rules, expressed in various (computer or human-readable) languages.
The requirement then observes that this validation may happen at different levels and to different extents (this is what @jpullmann calls 'types' if I understand it right), considering that the languages to express validation rules have different scopes and expressivity levels.

Is it something we can agree on?

@smrgeoinfo
Contributor

smrgeoinfo commented Nov 20, 2018

Careful all: a starting assumption here is that it is the Profile that is being conformed to, not a part thereof. This is so a Profile can have guidance docs etc, as well as validator files.

Given that the validator files provide the only concrete way to test conformance, if

this validation may happen at different levels and to different extents

Then either these different validators represent different profiles, or there needs to be some concrete way to declare that a representation has passed some particular validation test. It is certainly possible that validators might operate on some canonical version of the dataset or on a particular representation; from the point of view of a user, I think it's most important to know how the representation a particular distribution offers has been validated. As has been pointed out above, an XML distribution can be validated via schema and Schematron, whereas an RDF distribution might be validated with a SHACL script that has different expressive capabilities. Perhaps this could best be represented using the Data Quality vocabularies that allow specifying the test that was run and the test result.

@rob-metalinkage
Contributor

@smrgeoinfo the issue of validation is no different from that of any other declarative statement. The semantics of conformsTo is that all constraints are met and all normative validation tests pass.

People can still lie, or configure machines to do it for them by accident...

at this stage there is no "a little bit like" or "partially conformsTo" mechanism - "aspiresTo" ?

We lack a motivating Use Case for explicit DQ statements about conformance evidence.

@smrgeoinfo
Contributor

smrgeoinfo commented Nov 20, 2018 via email

@rob-metalinkage
Contributor

https://www.w3.org/TR/dcat-ucr/#ID21 doesn't seem to have these matters called out, and they don't appear to be readily found in the linked white paper http://usgin.github.io/usginspecs/ContentModelForLinks.htm

@smrgeoinfo
Contributor

Sorry if the use scenarios in the white paper aren't spelled out in detail--they leave a bit to the imagination. The gist of ID21 is that there needs to be something in the metadata so a client can parse the distributions for the dataset and pick the one it can work with-- I thought that's one of the possible purposes for a conformsTo profile declaration (correct me if I'm wrong and I'll shut up, don't want to bark up the wrong tree). At the level of making stuff work, it includes the information model and serialization scheme, ideally the vocabularies used. In the XML world this is conveyed by asserting that the representation validates according to some schema and maybe Schematron rules. In other serialization schemes it might be JSON Schema or some SHACL, but they don't tell me my XML app is going to work.
Telling a client that a metadata catalog provides ISO19115 metadata (a dataset/information model level declaration; conformsTo could point to the ISO spec) only gets one a short step towards interoperability; what you really need to know is that metadata representations according to e.g. the Energistics profile, or the INSPIRE profile or the ANZLIC profile are available. (Metadata examples are used here in the hope that more people are familiar with them than with the GeoSciML/OGC services world.) Each of those profiles might have different validation tests; other profiles might not have executable validation tests (conformance by inspection and trust).

@rob-metalinkage
Contributor

The sense is correct - it's declarative, but there is still no explicit statement about validation test history. I think it's a great use case for DQ that perhaps ought to have a section in the guidance document - and in DCAT, with a suggestion for how to attach such info.

@aisaac
Contributor

aisaac commented Nov 25, 2018

@smrgeoinfo the more you discuss this, the more I think your concern is about something related but not equal to this requirement.
I can indeed see a need to represent that a certain validation test to check the conformance of data to the profile was done using a specific distribution (XMLSchema, SHACL).
But that's quite different from the requirement here, which just says that there should be some representation that allows some validation according to the profile.
Would it be possible for you to create a new candidate requirement via a github issue?
Note that it is possible to beef up your case if you think it's currently captured at too abstract a level. We've done this for other use cases, some months ago.
I'm concerned that if you insist on piggybacking on the requirement here, we'll end up having a discussion that is too complex to handle for the starting point, and in which what you're interested in will also be forgotten.


7 participants