Data quality model [RDQM] #58

jpullmann · 2018-01-18T21:11:52Z

Data quality model [RDQM]

Identify common modeling patterns for different aspects of data quality based on frequently referenced data quality attributes found in existing standards and practices.

This includes the potential use and revision of DQV.

Aspects include:

the degree of a dataset's precision (i.e. measure of resolution or variability).

the degree of a dataset's accuracy (i.e. measure of correctness).

the degree to which a dataset conforms to a stated quality standard.

details of data quality conformance test results.

Related use cases: Modeling data precision and accuracy [ID15]; Data quality modeling patterns [ID14]; Modeling conformance test results on data quality [ID16]; Machine actionable link for a mapping client [ID21]; Template link in metadata [ID22]; Data Quality Vocabulary (DQV) Wish List left by the DWBP WG [ID23].

riccardoAlbertoni · 2018-01-24T14:12:49Z

Some remarks to start discussing this requirement.

Remark 1:
The Data Quality Vocabulary (DQV)[1] already offers common modelling patterns for different aspects of Data Quality.

For those who are not familiar with DQV, it relates DCAT datasets and distributions to different types of quality statements, including:

dqv:QualityAnnotation, which represents feedback and quality certificates given about the dataset or its distribution.
dcterms:Standard, which represents a standard the dataset or its distribution conforms to.
dqv:QualityPolicy, which represents a policy or agreement that is chiefly governed by data quality concerns.
dqv:QualityMeasurement, which represents a metric value providing quantitative or qualitative information about the dataset or distribution.

Each type of quality statement can be related to one or more quality dimensions, namely, quality characteristics relevant to the consumer. Viewing quality as a multi-dimensional space is established practice in quality management; it is a kind of pattern that helps split quality management into addressable chunks.
DQV does not define a normative list of quality dimensions. Starting from the use cases in the Use Cases & Requirements document [2], DQV offers the quality dimensions proposed in ISO 25012 [3] and Zaveri et al. [4] as two possible starting points. Ultimately, implementers need to choose for themselves the set of quality dimensions that best fits their needs.
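To make the measurement pattern concrete, here is a minimal sketch that serializes one dqv:QualityMeasurement as N-Triples: the dataset points to the measurement, the measurement records a value for a dqv:Metric, and the metric is placed in a dqv:Dimension. Since DQV leaves the choice of dimensions to implementers, the dimension, metric, dataset, and measurement IRIs (all under a hypothetical ex: namespace) are assumptions for illustration only; only the dqv:, rdf: and xsd: terms come from the published vocabularies.

```python
from typing import List

# Published namespaces; every EX-based IRI below is hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DQV = "http://www.w3.org/ns/dqv#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def typed_literal(value: object, datatype: str) -> str:
    """Serialize a value as a typed N-Triples literal."""
    return f'"{value}"^^<{datatype}>'


def measurement_triples(dataset: str, measurement: str, metric: str,
                        dimension: str, value: object, datatype: str) -> List[str]:
    """Return N-Triples lines describing a single quality measurement."""
    triples = [
        # The metric is grouped under an implementer-chosen dimension.
        (iri(dimension), iri(RDF_TYPE), iri(DQV + "Dimension")),
        (iri(metric), iri(RDF_TYPE), iri(DQV + "Metric")),
        (iri(metric), iri(DQV + "inDimension"), iri(dimension)),
        (iri(metric), iri(DQV + "expectedDataType"), iri(datatype)),
        # The measurement records a value for that metric on the dataset.
        (iri(measurement), iri(RDF_TYPE), iri(DQV + "QualityMeasurement")),
        (iri(measurement), iri(DQV + "computedOn"), iri(dataset)),
        (iri(measurement), iri(DQV + "isMeasurementOf"), iri(metric)),
        (iri(measurement), iri(DQV + "value"), typed_literal(value, datatype)),
        (iri(dataset), iri(DQV + "hasQualityMeasurement"), iri(measurement)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = measurement_triples(
    dataset=EX + "dataset1",
    measurement=EX + "measurement1",
    metric=EX + "completenessRatio",   # hypothetical metric
    dimension=EX + "completeness",     # hypothetical dimension
    value=0.92,
    datatype=XSD + "double",
)
print("\n".join(lines))
```

Swapping in ISO 25012 or Zaveri et al. dimensions would only change the dimension IRI; the shape of the pattern stays the same.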

Remark 2:
As far as I understand, all four aspects included in this requirement can be managed in some way by building upon DQV.

The example “Express dataset precision and accuracy” [5] shows how to model the degree of a dataset’s precision and accuracy, which are the first two aspects mentioned.
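The two aspects are easy to conflate, so here is a small sketch that follows the definitions in the original issue: precision as variability (here, the sample standard deviation of repeated readings) and accuracy as correctness (here, the mean absolute error against reference values). These particular formulas, the data, and the ex: metric IRIs are assumptions for illustration; DQV does not prescribe them, and the DQV example [5] may use different metrics.

```python
import statistics
from typing import Sequence

EX = "http://example.org/"  # hypothetical namespace for the metric IRIs


def precision_as_std_dev(repeated_readings: Sequence[float]) -> float:
    """Precision as variability: sample standard deviation of repeated readings.

    Needs at least two readings (statistics.stdev raises otherwise).
    """
    return statistics.stdev(repeated_readings)


def accuracy_as_mae(observed: Sequence[float], reference: Sequence[float]) -> float:
    """Accuracy as correctness: mean absolute error against reference values."""
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same length")
    return statistics.mean(abs(o - r) for o, r in zip(observed, reference))


# Hypothetical data: four readings of a control point whose true value is 10.0.
readings = [12.0, 12.1, 11.9, 12.0]
reference = [10.0] * len(readings)

# Each entry would become the dqv:value of a dqv:QualityMeasurement whose
# dqv:isMeasurementOf points at the (hypothetical) metric IRI used as the key.
measurements = {
    EX + "precisionStdDev": precision_as_std_dev(readings),
    EX + "accuracyMeanAbsError": accuracy_as_mae(readings, reference),
}
for metric, value in measurements.items():
    print(metric, round(value, 3))
```

The readings are tightly clustered (high precision) yet about 2 units off (poor accuracy), which is why the two aspects are worth modelling as separate measurements.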

The third and fourth aspects perhaps need a little more discussion.
For example, they could be expressed as quality measurements by defining appropriate metrics (e.g., RDFUnit suggests documenting Test Execution by combining the "Test-Driven Data Validation Ontology" [6] with DQV).
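One possible reading of this suggestion, sketched under stated assumptions: individual conformance test results are kept as details (the fourth aspect), and their aggregate becomes the value of a quality measurement (the third aspect). The metric "share of tests passed", the test identifiers, and all ex: names are hypothetical, and the RDFUnit vocabulary [6] is deliberately not reproduced here. A ratio like this only makes sense for a generic test suite, not for standards whose conformance levels are purely categorical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class ConformanceTestResult:
    test_id: str   # hypothetical identifier of one test in a conformance suite
    passed: bool


def share_of_tests_passed(results: List[ConformanceTestResult]) -> float:
    """Aggregate individual results into a single value in [0, 1]."""
    if not results:
        raise ValueError("no results: conformance was not evaluated")
    return sum(r.passed for r in results) / len(results)


# Hypothetical results of running a four-test suite against ex:dataset1.
results = [
    ConformanceTestResult("ex:test1", True),
    ConformanceTestResult("ex:test2", True),
    ConformanceTestResult("ex:test3", False),
    ConformanceTestResult("ex:test4", True),
]

# The aggregate answers "to what degree does the dataset conform?" ...
measurement: Dict[str, Union[str, float]] = {
    "dqv:computedOn": "ex:dataset1",
    "dqv:isMeasurementOf": "ex:shareOfTestsPassed",  # hypothetical metric
    "dqv:value": share_of_tests_passed(results),
}
# ... while the per-test list keeps the details of the test results.
failed = [r.test_id for r in results if not r.passed]
print(measurement["dqv:value"], failed)  # 0.75 ['ex:test3']
```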

To advance the discussion, we probably need to come up with new examples, which might later be included in the DCAT document or its primer (if the group decides to have a primer for DCAT).
Might this "example-driven" strategy work for the group?

Am I misunderstanding the ultimate goal of this requirement?

[1] R. Albertoni, A. Isaac, J. Debattista, M. Dekkers, C. Guéret, D. Lee, N. Mihindukulasooriya, and A. Zaveri, “Data on the Web Best Practices: Data Quality Vocabulary,” W3C Working Group Note, Dec. 2016. https://www.w3.org/TR/vocab-dqv/
[2] B. F. Loscio, D. Lee, and P. Archer, “Data on the Web Best Practices Use Cases & Requirements,” W3C Working Group Note, Feb. 2015. http://www.w3.org/TR/2015/NOTE-dwbp-ucr-20150224/
[3] ISO/IEC 25012, http://iso25000.com/index.php/en/iso-25000-standards/iso-25012
[4] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer, “Quality Assessment for Linked Data: A Survey,” Semant. Web J., vol. 7, no. 1, pp. 63–93, 2016.
[5] https://www.w3.org/TR/vocab-dqv/#ExpressDatasetAccuracyPrecision
[6] http://rdfunit.aksw.org/ns/core#TestExecution

riccardoAlbertoni · 2018-07-04T15:27:34Z

I have added some new examples on how to deal with the intertwined sub-requirements:

the degree a dataset conforms to a stated quality standard;
details of data quality conformance test results.

Please feel free to suggest corrections or further examples based on W3C vocabularies.

dr-shorthair · 2018-07-10T00:10:24Z

@riccardoAlbertoni - Could you roll these into the main document and create a new PR?

riccardoAlbertoni · 2018-07-10T14:56:17Z

@riccardoAlbertoni - Could you roll these into the main document and create a new PR?

Yes, sure, I will draft a proposal.


Dcat issue #58 riccardo: as per the discussion yesterday (https://www.w3.org/2018/09/06-dxwgdcat-minutes), this contribution moves the story forward and may stimulate some broader conversation in the next WD.

davebrowning · 2018-09-07T13:28:06Z

Text added to draft for 2PWD in the hope of eliciting input and comment.

davebrowning · 2018-12-13T15:55:58Z

As per the discussion at the DCAT weekly meeting, the examples incorporated in the draft provide some guidance and discussion and have been publicly available in the working draft since October. However, more examples and guidance may be needed; while desirable, this work is not going to be achievable for the next PWD and may not even end up in the REC-track deliverable. Rather than close this issue, we agreed to move it to a later milestone where it can be reviewed against other commitments.

andrea-perego · 2018-12-13T22:04:43Z

Thanks for the great work, @riccardoAlbertoni !

I would, however, suggest a revision to the examples concerning INSPIRE. In INSPIRE, there are only three levels of conformance: conformant, not conformant, and not evaluated. Some of the examples go further by showing how to express "how much" (e.g., as a percentage) a resource conforms to INSPIRE.
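The mismatch can be made concrete with a small sketch: if conformance is one of three categorical levels, there is simply no slot for a percentage. The enum below mirrors the three levels named above; its string values echo the code names used in the examples (e.g. notConformant) but are not claimed to be the official registry IRIs. The rule for collapsing test outcomes into a level (conformant only if every test passes) is an assumption for illustration, not INSPIRE's normative text.

```python
from enum import Enum
from typing import Optional, Sequence


class DegreeOfConformity(Enum):
    # The three levels listed for INSPIRE; values are illustrative code names.
    CONFORMANT = "conformant"
    NOT_CONFORMANT = "notConformant"
    NOT_EVALUATED = "notEvaluated"


def degree_of_conformity(outcomes: Optional[Sequence[bool]]) -> DegreeOfConformity:
    """Collapse test outcomes into one of three categorical levels.

    Assumption for illustration: conformant only if every test passed;
    None or an empty sequence means no evaluation took place.
    """
    if not outcomes:
        return DegreeOfConformity.NOT_EVALUATED
    if all(outcomes):
        return DegreeOfConformity.CONFORMANT
    return DegreeOfConformity.NOT_CONFORMANT


# Two out of three tests passed, yet there is no partial level to report.
print(degree_of_conformity([True, True, False]).value)  # notConformant
print(degree_of_conformity(None).value)                 # notEvaluated
```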

This may lead to misunderstandings and confusion. I strongly recommend revising these examples before going for PWD, referring to another reference specification / standard. A possible option is to refer to the FAIR principles.

I can take care of preparing this revision.

riccardoAlbertoni · 2018-12-14T15:28:16Z

Thanks for your comment @andrea-perego

I like your proposal of referring to the FAIR Principles, and I am willing to help implement it.

Are you planning to switch all the examples?
Do you want to refer to the FAIRmetrics metrics, or something else?

davebrowning · 2018-12-18T15:45:49Z

Since we have a holiday coming up, I'll move this back into the Third Public Draft milestone, so we don't forget to make the changes suggested by @andrea-perego.

andrea-perego · 2019-01-09T22:27:44Z

@riccardoAlbertoni wrote:

Thanks for your comment @andrea-perego

I like your proposal of referring to the FAIR Principles, and I am willing to help implement it.

Thanks, @riccardoAlbertoni . Happy to work this out together!

Are you planning to switch all the examples?

I was thinking of revising only the examples that express conformance in a way not compliant with INSPIRE.

Do you want to refer to the FAIRmetrics metrics, or something else?

Good idea to use FAIRmetrics. And it would be nice to have URIs for each of the principles / metrics.

@agbeltran , do you have any suggestion?

andrea-perego · 2019-01-09T23:24:42Z

@riccardoAlbertoni, what about also adding examples using the DWBPs (and maybe the SDWBPs as well)? BTW, many of the FAIR metrics can be mapped to DWBPs...

riccardoAlbertoni · 2019-01-10T17:12:26Z

I have just uploaded a new branch, ChangingConformanceExamplesToDWBP-riccardo, in which I changed the examples about the degree of conformance.

I used the DWBPs instead of INSPIRE.
@andrea-perego Could you take a look? It is a quick tweak: reading your previous comments, I perceived an urgent need to change the examples. I hope the changes go in the right direction. If you want to chime in and change things, be my guest ;)

The option of having examples with FAIR metrics is still open and is extremely interesting to me. For the moment, I used the DWBPs to avoid confusion and critical issues, as I have noticed that the FAIR community is offering a technological layer (YAML using a smartAPI interface annotation) for the metrics. I suspect this layer might burden our examples with details that would be extremely interesting for a new paper but less so for a non-normative section of DCAT... @agbeltran, what do you think about this?

andrea-perego · 2019-01-10T21:45:09Z

Thanks, @riccardoAlbertoni . Just checked, and a big +1 from me.

If I may, I would just propose some editorial changes via a separate PR. One of them (though this applies to the whole DCAT spec) is to turn all the code snippets into numbered examples, so that they can be linked to. I'll take care of that with a separate issue / PR.

About FAIR metrics, I saw that too. IMO, it would still be useful to include an example about FAIR. My concern is that we should be aligned, as far as possible, with what is returned by the FAIR service you refer to. I tried it yesterday, but it was not working, so I was not able to check how they specify conformance test results (if they do).


andrea-perego · 2019-01-10T23:39:03Z

@riccardoAlbertoni , I've just created the relevant PR: #652

Please check if you're happy with it.

Thanks!

aisaac · 2019-01-11T11:14:51Z

Hi, it's a bit of a late review, but I finally managed to have a quick look at it. It's very good overall, a great answer to the requirements. I've made some editorial suggestions.
I haven't reviewed everything yet, but I already have some questions about the patterns for tests, for @riccardoAlbertoni and @andrea-perego:

Does PROV need such a complex pattern to represent test results, i.e., with prov:qualifiedAssociation? Can't prov:agent and prov:hadPlan be attached directly to a:TestingActivity?
Is a:notConformant from the INSPIRE vocabulary really expected to be used as the dcterms:type of a test result?

A final editorial comment: in general the convention for instance names is to start them with lower case.

andrea-perego · 2019-01-11T21:23:37Z

@aisaac wrote:

Does PROV need such a complex pattern to represent test results, i.e., with prov:qualifiedAssociation? Can't prov:agent and prov:hadPlan be attached directly to a:TestingActivity?

Is a:notConformant from the INSPIRE vocabulary really expected to be used as the dcterms:type of a test result?

The example is taken from UC16, which reflects how this is done in GeoDCAT-AP - see:

https://semiceu.github.io/GeoDCAT-AP/drafts/latest/#conformity-and-data-quality---not-in-iso19115-core

This pattern is indeed complex, but this is what the PROV WG suggested - see email thread starting here:

https://lists.w3.org/Archives/Public/public-prov-comments/2015May/0001.html

riccardoAlbertoni · 2019-01-11T22:25:40Z

Thanks, @andrea-perego and @aisaac, for the feedback and the contributions; they are now merged in #654.

@aisaac wrote:

A final editorial comment: in general the convention for instance names is to start them with lower case.

I have corrected the instance names, now they should stick to the convention.

aisaac · 2019-01-13T21:42:47Z

@andrea-perego, OK, I get it. I guess I'm a bit frustrated because QualifiedAssociation is especially useful for agent roles (in a typical role or even n-ary relationship pattern), and that aspect is somewhat missing here.
The alternative that I saw was that the plan (conformance test) could just as well have been attached to the testing activity (i.e., one could have expressed that a:conformanceTest guides a:testingActivity directly, instead of saying that it guides the agent that did the test). This (together with directly attaching the agent <http://validator.example.org/> to a:testingActivity) would have spared one level of description.
But maybe PROV says that prov:agent and prov:hadPlan can't be used in such a pattern. And anyway, I won't question what you have already done elsewhere. It's just that, as the complex pattern didn't (and still doesn't) seem necessary to me, I wanted to be sure. Now I'll stay silent for a while :-)
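The two shapes being discussed can be compared as plain triple lists. The qualified form follows the pattern as described in this thread (the activity points via prov:qualifiedAssociation to an association that carries prov:agent and prov:hadPlan); the compact form is the alternative @aisaac describes. In PROV-O, prov:hadPlan is defined on prov:Association, so in the compact form the agent can use prov:wasAssociatedWith, but the direct plan link needs a non-PROV property: ex:guidedBy below is hypothetical, as is the ex: namespace standing in for a:. This sketch only counts the extra level of description; it does not settle which pattern is preferable.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical stand-in for the a: prefix

ACTIVITY = EX + "testingActivity"
AGENT = "http://validator.example.org/"
PLAN = EX + "conformanceTest"


def qualified_pattern() -> List[Triple]:
    """Agent and plan hang off an intermediate prov:Association node."""
    assoc = "_:association"  # blank node label for the qualified association
    return [
        (ACTIVITY, PROV + "qualifiedAssociation", assoc),
        (assoc, RDF_TYPE, PROV + "Association"),
        (assoc, PROV + "agent", AGENT),
        (assoc, PROV + "hadPlan", PLAN),
        (PLAN, RDF_TYPE, PROV + "Plan"),
    ]


def compact_pattern() -> List[Triple]:
    """Agent and plan attached directly to the activity."""
    return [
        (ACTIVITY, PROV + "wasAssociatedWith", AGENT),
        (ACTIVITY, EX + "guidedBy", PLAN),  # hypothetical: not a PROV-O property
        (PLAN, RDF_TYPE, PROV + "Plan"),
    ]


def hops(triples: List[Triple], start: str, target: str) -> Optional[int]:
    """Breadth-first search: fewest subject-to-object steps from start to target."""
    graph: Dict[str, List[str]] = {}
    for s, _, o in triples:
        graph.setdefault(s, []).append(o)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


for name, pattern in [("qualified", qualified_pattern()), ("compact", compact_pattern())]:
    print(name, len(pattern), "triples,", hops(pattern, ACTIVITY, PLAN), "hop(s) to the plan")
```

The qualified form reaches the plan in two hops instead of one; the intermediate association node is also where role information would attach, which is the use @aisaac finds missing here.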

Maybe I'll have some comments on 8.2.3, but I won't have time to read it carefully soon, and anyway I think it's a good contribution to the draft, so I'm not going to critique it now.

I am going to try to react to a couple of other issues, but it seems they're discussed elsewhere, maybe in PR #654.

andrea-perego · 2019-01-14T23:03:35Z

Thanks, @aisaac . No problem for me to look into an alternative approach.

Just to better understand your point, would you mind providing the same example revised according to your proposal?

aisaac · 2019-01-15T17:39:07Z

@andrea-perego, is it really worth it? If you've already made these choices in your own specs for quite a while, I do not want to cause trouble by going back to work on established recommendations. At the moment I have not proved that the pattern doesn't work. It's quite complex, but that's not a crucial issue, and from your answer I see you had already taken it into account. So unless you are really eager to work on this now, I would rather wait until a next WD to see if the issue deserves being raised again.

andrea-perego · 2019-01-15T18:32:04Z

@aisaac wrote:

[...] So unless you are really eager to work on this now, I would rather wait until a next WD to see if the issue deserves being raised again.

+1 from me.

davebrowning · 2019-01-22T15:56:33Z

Above discussion reflected and actioned in #654.


jpullmann added dcat distribution quality referencing requirement provenance labels Jan 18, 2018

dr-shorthair added dcat:Dataset and removed meta labels Feb 1, 2018

riccardoAlbertoni added this to the Quality Description milestone May 16, 2018

aisaac removed distribution labels May 29, 2018

riccardoAlbertoni mentioned this issue May 30, 2018

Added a section to deal with quality and started some guidance for r… #245

Merged

riccardoAlbertoni self-assigned this Jul 10, 2018

riccardoAlbertoni added a commit to riccardoAlbertoni/dxwg that referenced this issue Jul 23, 2018

moving issue w3c#58 in the quality paragraph adding reference to DQV …

f66472d

…section about dataset accuracy and precision

riccardoAlbertoni mentioned this issue Jul 23, 2018

Dcat issue #58 riccardo #309

Merged

dr-shorthair removed this from the Description of quality in DCAT milestone Aug 21, 2018

davebrowning added this to the DCAT Second Public Working Draft milestone Sep 7, 2018

riccardoAlbertoni mentioned this issue Sep 19, 2018

dcat: Moving and rephrasing the note about issue 58 #361

Merged

agbeltran removed this from the DCAT Second Public Working Draft milestone Sep 27, 2018

davebrowning modified the milestones: DCAT Third Public Working Draft, DCAT Fourth Public Working Draft Dec 13, 2018

davebrowning modified the milestones: DCAT Fourth Public Working Draft, DCAT Third Public Working Draft Dec 18, 2018

riccardoAlbertoni added a commit to riccardoAlbertoni/dxwg that referenced this issue Jan 10, 2019

DCAT - using DWBP in the degree of conformance examples w3c#58

21f1de1

andrea-perego added a commit that referenced this issue Jan 10, 2019

Editorial revision to sec "Quality Information"

17b03f7

...following-up from #58 (comment)

andrea-perego mentioned this issue Jan 10, 2019

Editorial revision to sec "Quality Information" #652

Merged

riccardoAlbertoni mentioned this issue Jan 11, 2019

Dcat issue58 changing conformance examples to dwbp riccardo #654

Merged

davebrowning added the due for closing Issue that is going to be closed if there are no objection within 6 days label Jan 22, 2019

davebrowning closed this as completed Jan 24, 2019

davebrowning removed the due for closing Issue that is going to be closed if there are no objection within 6 days label Jan 24, 2019

davebrowning added a commit that referenced this issue Jan 25, 2019

Remove formatting from closed issues

f10269e

#66, #109, #61, #67, #58, #128
