A profile may be (partially) "implemented" by "schemas" (in OWL, SHACL, XML Schema...) that allow different levels of data validation [ID37] (5.37) #273

nicholascar · 2018-06-27T03:51:16Z

Entered from Google Doc

jpullmann · 2018-11-15T10:37:18Z

If this requirement is concerned with validation, I'd merge it with implementation/definition - there are other ways to implement a validating profile than schema languages. The latter seem covered by #279.
What about: Profiles should allow for different types of data validation (structure, values, conventions).

smrgeoinfo · 2018-11-15T16:39:39Z

This seems like a statement related to the modular specification approach, wherein a conformance class defines a subset of requirements and tests to determine (validate) that they are met by instance documents. Can ID37 be met by pointing out that profiles (or any standard) can define one or more conformance classes indicating various levels or types of (partial) conformance?

kcoyle · 2018-11-15T19:15:35Z

@smrgeoinfo I don't actually see the modularity here. The Guidance for Profile Publication document needs to say that there may be more than one file or resource that describes a profile, with different roles or functionality. Current DCAT profiles are often a PDF (for humans) plus a SHACL file (for validation). This requirement recognizes that situation.

smrgeoinfo · 2018-11-15T22:01:51Z

I was focusing on the idea of 'partial implementation' and 'different levels of data validation'. The SHACL file, or an XSD, or maybe some text in the PDF define conformance classes. There may be guidance documents with different functionality; if different guidance documents provide different validation rules (whether testable with tools, or stated in text), they define different conformance classes. In that case a profile should be able to point at the specific guidance it is following. But perhaps I'm misunderstanding the issue.

kcoyle · 2018-11-15T22:50:38Z

My gut is that if there are different validation rules or different guidance rules then you have a different profile. That is definitely how it would be judged in the library/archive world.

What gets tricky is the whole "equivalence" issue - and the need for a "center". If you have a profile with a SHACL validation file and a separately developed ShEx validation file, what is the requirement for equivalence? If you have a profile with a PDF document and a SHACL file, which one is the "center"? That is, which one provides the authoritative set of rules for the profile? Which one would you use to develop yet another validation file?

smrgeoinfo · 2018-11-15T23:59:10Z

Good questions! Perhaps the only solution is for the provider to declare the authoritative validation rules (the 'center'); as far as DCAT is concerned, there would need to be some way to identify these artifacts.

nicholascar · 2018-11-16T00:20:47Z

One could posit an additional Role for a central validator resource in the Resource Roles vocab for this purpose: https://w3c.github.io/dxwg/profilesont/resource_roles.html. I think focusing on the various Roles within a vocab of roles would be a good step forward for clarity.
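
A minimal sketch of how roles could distinguish the resources of a single profile, such as the PDF-plus-SHACL case mentioned earlier in the thread. The `prof:` and `role:` terms are written as in the published Profiles Vocabulary and should be read as assumptions in the context of this discussion; every `ex:` identifier, the artifact URLs, and the media-type literals are hypothetical.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# One profile with two resources that play different roles.
ex:my-dcat-profile a prof:Profile ;
    prof:hasResource ex:guidance-pdf , ex:validation-shacl .

# Human-readable guidance document.
ex:guidance-pdf a prof:ResourceDescriptor ;
    prof:hasRole role:guidance ;
    dct:format "application/pdf" ;
    prof:hasArtifact <http://example.org/profile/guide.pdf> .

# Machine-readable validation rules.
ex:validation-shacl a prof:ResourceDescriptor ;
    prof:hasRole role:validation ;
    dct:format "text/turtle" ;
    prof:hasArtifact <http://example.org/profile/shapes.ttl> .
```

Under this reading, a "central validator" role, if one were added to the roles vocabulary, would simply be another value of `prof:hasRole`.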

aisaac · 2018-11-18T22:49:24Z

@smrgeoinfo your interpretations are interesting, but they don't apply here. The requirement here is more basic. "(partially)" was just added because we needed to accommodate cases where the "implementation" of a profile according to a schema language would fail to express all the validation rules, just because the schema language is not expressive enough. For example, XML Schema can't express everything that SHACL can express. And SHACL itself may fail to express what a profile wants to express. So the "partial" coverage would happen more by accident than by design.

That said, what you suggest is certainly a valid concern. But it should be raised as a different requirement then, because our use case 37 does not support it.

@kcoyle @smrgeoinfo @nicholascar I like the idea of a 'center' but I'd advise against spending time on it, as it's a risky notion. Take a basic, real-world counter-example to the idea that every profile has such a 'center': in the Europeana Data Model, one could see the 'center' as the human-readable spec. But when we embarked on creating an XML Schema for our model, we realized this schema would have to impose a sequential order on the elements in the data, which we didn't want! So in some respects the XML Schema (in line with my remark above) has failed to implement everything we wanted to implement. But in other respects it actually forced us into an implementation that has more than what an "ideal" implementation would have.

nicholascar · 2018-11-19T01:58:09Z

@aisaac: I agree with "advise to not spend time on it", hence the suggestion to create a Role for this and then leave it there. A vocab of roles, as per ISO codelists, can include some commonly used and other less commonly used codes. If some people wanted a centre, they could use the corresponding role. Others need not. This moves the discussion out of the core ontology and provides for such a possibility but doesn't require it, and also ensures that if someone does call on this role, it is understood since it's in the list of known roles.

kcoyle · 2018-11-19T04:25:59Z

I also agree that the concept of "center" is huge and beyond our abilities at this moment and for this deliverable. It's one of those things that we can put into a note that talks about "what else needs to be done to make profiles more useful" and hope that it becomes someone's future charter.

aisaac · 2018-11-19T17:28:06Z

@nicholascar @kcoyle if it's a role, then actually I wouldn't object if it's in the core spec, as long as (1) it's not for the version being released now (and that would allow us to see what happens with the W3C registration process); (2) it carefully avoids any semantic over-commitment, possibly leaving to another group the option to make it more precise. Some of the 'motivations' in Web Annotations are probably not super precisely defined, either, so I guess nobody would blame us for doing this.

aisaac · 2018-11-19T17:28:51Z

@nicholascar @kcoyle in the end, whether it's in the core or not, what would be needed is a documented case and requirement!

kcoyle · 2018-11-19T19:05:28Z

The lack of a use case could indicate that it isn't core. At the same time, I think the profiles ontology brings with it issues that we didn't create use cases for at the beginning, and maybe we should consider scoping our work based on the use cases we have. The UCR document doesn't include profiles ontology as a filter so unfortunately we would have to dig through both the profiles and conneg requirements to see what requirements there really are. It would be reasonable for someone to ask which requirements the profiles ontology responds to. @jpullmann

aisaac · 2018-11-19T21:09:29Z

@kcoyle considering the number of requirements we currently have for profiles and conneg, it won't be very hard to identify those that could be relevant for the profile ontology :-)

nicholascar · 2018-11-19T22:13:14Z

I think we can generate a cogent Use Case in time for a 2PWD easily enough. @aisaac, would you like to create an Issue for this? I'm interested to assist on this one as I want to test out the thoughts here.

aisaac · 2018-11-19T22:24:19Z

@nicholascar ok I've created #597 thanks for the suggestion.

I think now we can close the parenthesis and come back to @jpullmann 's original comment:

If this requirement is concerned with validation, I'd merge with implementation/definition - there are other ways to implement a validating profile than schema languages. The latter seem covered by #279. What about: Profiles should allow for different types of data validation (structure, values, conventions).

kcoyle · 2018-11-19T23:09:25Z

I didn't think this was about TYPES of validation, just that validation schemas themselves can be considered aspects/objects of a profile. And it recognizes that a profile may provide more than one schema, and the schemas may not necessarily cover the same ground. I definitely would not use "should" here because we are not requiring data validation.

smrgeoinfo · 2018-11-19T23:15:11Z

If a "profile may provide more than one schema, and the schemas may not necessarily cover the same ground", these various validation procedures must be clearly identifiable, such that e.g. a dcat:Distribution can point, in its dct:conformsTo property, to the identifier for the profile validation that has been run on the dataset.

kcoyle · 2018-11-19T23:34:26Z

@smrgeoinfo I agree but I'm not sure that what we're working on will solve that - I don't know of any vocabulary that would tell you, of the various documents, vocabularies, validation files in a profile exactly WHAT they represent in terms of content. The only useful file relationship that I'm aware of is the dct:hasFormat that I suggested in this comment, which basically means "same exact data, different format" and which often means that the data is a different output format from the same software. My gut feeling is that if you have any validation files that are not output from the same process you cannot guarantee that they contain exactly the same data unless you run a test suite that compares outputs from the same instance data.

BTW, there is such a test suite applied to SHACL and ShEx, and there are a very small number of differences. There also is a way to translate between SHACL and ShEx, AFAIK. But if you develop the validation files independently, I wouldn't place a bet on the results being the same.

smrgeoinfo · 2018-11-19T23:47:15Z

Seems like what is needed is an identifier for a resourceDescriptor (whatever that ends up being named) that you can use as the value for dct:conformsTo.
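
As a sketch of this suggestion, a distribution could point at the descriptor of the specific validation resource rather than at the profile as a whole. All `ex:` identifiers are hypothetical, and whether the second pattern is desirable is exactly what is debated in this thread.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# Coarse-grained: the distribution claims conformance to the profile as a whole.
ex:dataset-1-xml a dcat:Distribution ;
    dct:conformsTo ex:my-profile .

# Fine-grained: the distribution names the specific validation resource
# (a resource descriptor of the profile) that it was checked against.
ex:dataset-1-rdf a dcat:Distribution ;
    dct:conformsTo ex:my-profile-shacl-descriptor .
```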

nicholascar · 2018-11-20T00:37:08Z

Careful all: a starting assumption here is that it is the Profile that is being conformed to, not a part thereof. This is so a Profile can have guidance docs etc, as well as validator files. If we ID the validator file and people conformsTo that, I think we've breached a conceptual contract, like identifying a Dataset by one of its Distribution URIs.

Unless we work out specialised axioms to chain a specialised conformsTo to a multi-step thing (X conformsTo Y, Y part of some Profile Z with some Role A).
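
One way such a chain might be sketched is with an OWL 2 property chain. This is only an illustration: `ex:conformsToProfile`, `ex:isArtifactOf` and `ex:isResourceOf` are hypothetical names, and the sketch assumes that `prof:hasArtifact` and `prof:hasResource` are object properties linking a profile to its resource descriptors and those descriptors to their files.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <http://example.org/> .

# Hypothetical inverses, so the chain can walk from a file up to its profile.
ex:isArtifactOf a owl:ObjectProperty ;
    owl:inverseOf prof:hasArtifact .
ex:isResourceOf a owl:ObjectProperty ;
    owl:inverseOf prof:hasResource .

# X dct:conformsTo Y, Y is the artifact of descriptor D, D is a resource of
# profile Z  =>  X ex:conformsToProfile Z.
ex:conformsToProfile a owl:ObjectProperty ;
    owl:propertyChainAxiom ( dct:conformsTo ex:isArtifactOf ex:isResourceOf ) .
```

A plain property chain cannot test which Role the intermediate resource has; restricting the inference to, say, validation resources would need a rule language or a query on top of this.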

kcoyle · 2018-11-20T01:34:38Z

@nicholascar Remember that anyone can say anything about anything (and they probably will). I hope that profile resources/objects will have identifiers, because if they don't they'll be blank nodes (/me making the 'go away vampire' sign). Also, I want people to be able to say things about them, like who created them, what version they are, etc.

dct:conformsTo has no domain and therefore the subject is of type rdfs:Resource. If the desire is to limit conformance to profiles, such that

profileA  
      rdf:type prof:Profile ;  
      ex:conformsTo standardX .

then you'll need a prof:conformsTo that has a domain of prof:Profile.

That said, you can write rules for your required usage within your community and you can test for following those rules. But the vocabulary allows all defined uses of dct:conformsTo.
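
A minimal sketch of what such a domain-restricted property could look like. `ex:conformsTo` is a hypothetical property used only for illustration and is not a term of any published vocabulary; `ex:profileA` and `ex:standardX` stand in for the names used in the snippet above.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <http://example.org/> .

# Hypothetical specialisation of dct:conformsTo whose subject is a profile.
ex:conformsTo a rdf:Property ;
    rdfs:subPropertyOf dct:conformsTo ;
    rdfs:domain prof:Profile .

ex:profileA a prof:Profile ;
    ex:conformsTo ex:standardX .
```

Under RDFS semantics the domain declaration lets a reasoner infer that any subject of `ex:conformsTo` is a `prof:Profile`; it does not by itself reject other subjects, which is why community rules and tests would still be needed.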

kcoyle · 2018-11-20T02:35:02Z

@nicholascar I just noticed that "dct:conformsTo" is listed in the Profiles Ontology document under 7.5 Class Resource Descriptor. Isn't that the opposite of what you say above?

aisaac · 2018-11-20T09:10:53Z

I'm not sure I understand the link between identifiers, conformsTo and the original wording of the requirement, or @jpullmann 's rewording. This goes in solution space, no?

My take is that the requirement is about expressing that a profile can serve the validation of data. That the profile contributes to such validation goal by providing specifications for validation rules, expressed in various (computer or human-readable) languages.
The requirement then observes that this validation may happen at different levels and to different extents (this is what @jpullmann calls 'types' if I understand it right), considering that the languages to express validation rules have different scopes and expressivity levels.

Is it something we can agree on?

smrgeoinfo · 2018-11-20T22:20:14Z

Careful all: a starting assumption here is that it is the Profile that is being conformed to, not a part thereof. This is so a Profile can have guidance docs etc, as well as validator files.

Given that the validator files provide the only concrete way to test conformance, if

this validation may happen at different levels and to different extents

Then either these different validators represent different profiles, or there needs to be some concrete way to declare that a representation has passed some particular validation test. It is certainly possible that validators might operate on some canonical version of the dataset or on a particular representation; from the point of view of a user, I think it's most important to know how the representation a particular distribution offers has been validated. As has been pointed out above, an XML distribution can be validated via schema and Schematron, whereas an RDF distribution might be validated with a SHACL script that has different expressive capabilities. Perhaps this could best be represented using the Data Quality vocabularies that allow specifying the test that was run and the test result.
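
A rough sketch of how a test and its result might be recorded with the W3C Data Quality Vocabulary (DQV). The `dqv:` terms are used as they are understood here; the metric, the measurement, the distribution and the `ex:validatedWith` link to the validator file are all hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

# Hypothetical metric: "passes the profile's SHACL validation".
ex:passes-shacl-validation a dqv:Metric ;
    dqv:expectedDataType xsd:boolean ;
    ex:validatedWith <http://example.org/profile/shapes.ttl> .  # hypothetical property

ex:dataset-1-rdf a dcat:Distribution .

# The result of running that test on one particular distribution.
ex:measurement-1 a dqv:QualityMeasurement ;
    dqv:computedOn ex:dataset-1-rdf ;
    dqv:isMeasurementOf ex:passes-shacl-validation ;
    dqv:value true .
```

This keeps the statement about validation evidence separate from the `dct:conformsTo` declaration itself.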

rob-metalinkage · 2018-11-20T22:42:57Z

@smrgeoinfo the issue of validation is no different for any other declarative statement. The semantics of conformsTo is that all constraints are met and all normative validation tests pass.

People can still lie, or configure machines to do it for them by accident...

At this stage there is no "a little bit like" or "partially conformsTo" mechanism - "aspiresTo"?

We lack a motivating Use Case for explicit DQ statements about conformance evidence.

smrgeoinfo · 2018-11-20T22:43:52Z

How about ID21?

rob-metalinkage · 2018-11-20T22:53:37Z

https://www.w3.org/TR/dcat-ucr/#ID21 doesn't seem to have these matters called out, and they don't appear to be readily found in the linked white paper http://usgin.github.io/usginspecs/ContentModelForLinks.htm

smrgeoinfo · 2018-11-20T23:20:18Z

Sorry if the use scenarios in the white paper aren't spelled out in detail--they leave a bit to the imagination. The gist of ID21 is that there needs to be something in the metadata so a client can parse the distributions for the dataset and pick the one it can work with-- I thought that's one of the possible purposes for a conformsTo profile declaration (correct me if I'm wrong and I'll shut up, don't want to bark up the wrong tree). At the level of making stuff work, it includes the information model and serialization scheme, ideally the vocabularies used. In the XML world this is conveyed by asserting that the representation validates according to some schema and maybe Schematron rules. In other serialization schemes it might be JSON Schema or some SHACL, but they don't tell me my XML app is going to work.
Telling a client that a metadata catalog provides ISO 19115 metadata (a dataset/information model level declaration; conformsTo could point to the ISO spec) only gets one a short step towards interoperability; what you really need to know is that metadata representations according to e.g. the Energistics profile, or the INSPIRE profile or the ANZLIC profile are available. (Metadata examples are used here in the hope that more people are familiar with them than with the GeoSciML/OGC services world.) Each of those profiles might have different validation tests; other profiles might not have executable validation tests (conformance by inspection and trust).

rob-metalinkage · 2018-11-21T02:38:30Z

The sense is correct - it's declarative, but there is still no explicit statement about validation test history. I think it's a great use case for DQ that perhaps ought to have a section in the guidance document - and in DCAT, with a suggestion for how to attach such info.

aisaac · 2018-11-25T20:28:52Z

@smrgeoinfo the more you discuss this, the more I think your concern is about something related but not equal to this requirement.
I can indeed see a need to represent that a certain validation test to check the conformance of data to the profile was done using a specific distribution (XML Schema, SHACL).
But that's quite different from the requirement here, where all that is said is that there should be some representation that allows some validation according to the profile.
Would it be possible for you to create a new candidate requirement via a GitHub issue?
Note that it is possible to beef up your case if you think it's currently captured at too abstract a level. We've done this for other use cases, some months ago.
I'm concerned that if you insist on piggybacking on the requirement here, we'll end up having a discussion that is too complex to handle for the starting point, and which will also be forgotten for what you're interested in.

nicholascar added the requirement, profile-guidance and requires discussion labels Jun 27, 2018

aisaac added the plenary-approved label and removed the requires discussion label Nov 12, 2018

nicholascar mentioned this issue Nov 23, 2018

Explore whether list_profiles operation could be provided by an external component #602

Closed

aisaac mentioned this issue Feb 25, 2020

Create a use case and requirement for "central" authoritative validation rules w3c/dx-prof#7

Open

andrea-perego added this to To do in Profile Guidance Mar 11, 2021

andrea-perego added this to the Profile Guidance Document - FPWD milestone Mar 13, 2021
