Data quality model [RDQM] #58

Closed
jpullmann opened this issue Jan 18, 2018 · 22 comments

@jpullmann

Data quality model [RDQM]

Identify common modeling patterns for different aspects of data quality based on frequently referenced data quality attributes found in existing standards and practices.

This includes the potential use and revision of DQV.

Aspects include:

  • the degree of a dataset's precision (i.e. measure of resolution or variability).
  • the degree of a dataset's accuracy (i.e. measure of correctness).
  • the degree a dataset conforms to a stated quality standard.
  • details of data quality conformance test results.

  • Related use cases:
    • Modeling data precision and accuracy [ID15]
    • Data quality modeling patterns [ID14]
    • Modeling conformance test results on data quality [ID16]
    • Machine actionable link for a mapping client [ID21]
    • Template link in metadata [ID22]
    • Data Quality Vocabulary (DQV) Wish List left by the DWBP WG [ID23]
    @riccardoAlbertoni
    Contributor

    Some remarks to start discussing this requirement.

    Remark 1:
    The Data Quality Vocabulary (DQV)[1] already offers common modelling patterns for different aspects of Data Quality.

    For those who are not familiar with DQV, it relates DCAT datasets and distributions to different types of quality statements, including:

    • dqv:QualityAnnotation, which represents feedback and quality certificates given about the dataset or its distribution.
    • dcterms:Standard, which represents a standard the dataset or its distribution conforms to.
    • dqv:QualityPolicy, which represents a policy or agreement that is chiefly governed by data quality concerns.
    • dqv:QualityMeasurement, which represents a metric value providing quantitative or qualitative information about the dataset or distribution.

    Each type of quality statement can be related to one or more quality dimensions, namely, quality characteristics relevant to the consumer. Treating quality as a multi-dimensional space is an established practice in quality management; it is a pattern that helps split quality management into addressable chunks.
    DQV does not define a normative list of quality dimensions. Starting from the use cases included in the Use Cases & Requirements document [2], DQV offers the quality dimensions proposed in ISO 25012 [4] and Zaveri et al. [3] as two possible starting points. Ultimately, implementers will need to choose for themselves the set of quality dimensions that best fits their needs.
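To make the measurement pattern concrete, here is a minimal, library-free sketch in Python (not taken from the DQV note itself). Triples are plain tuples, prefixed names are plain strings, and every ex: identifier (dataset, measurement, metric, dimension) is hypothetical. The sketch links a dataset to a dqv:QualityMeasurement, the measurement to its dqv:Metric, and the metric to a dqv:Dimension, then walks that chain back from the dataset.

```python
from typing import List, Tuple

# A triple is a plain (subject, predicate, object) tuple of prefixed names.
Triple = Tuple[str, str, str]


def measurement_triples(dataset: str, measurement: str, metric: str,
                        dimension: str, value: float) -> List[Triple]:
    """Describe one DQV quality measurement computed on a dataset."""
    return [
        (dataset, "rdf:type", "dcat:Dataset"),
        (dataset, "dqv:hasQualityMeasurement", measurement),
        (measurement, "rdf:type", "dqv:QualityMeasurement"),
        (measurement, "dqv:computedOn", dataset),
        (measurement, "dqv:isMeasurementOf", metric),
        # Literal kept as a string here; real RDF would use a typed literal.
        (measurement, "dqv:value", str(value)),
        (metric, "rdf:type", "dqv:Metric"),
        (metric, "dqv:inDimension", dimension),
        (dimension, "rdf:type", "dqv:Dimension"),
    ]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object o such that (subject, predicate, o) is in graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def dimensions_of(graph: List[Triple], dataset: str) -> List[str]:
    """Follow dataset -> measurement -> metric -> dimension."""
    found: List[str] = []
    for m in objects(graph, dataset, "dqv:hasQualityMeasurement"):
        for metric in objects(graph, m, "dqv:isMeasurementOf"):
            found.extend(objects(graph, metric, "dqv:inDimension"))
    return found


if __name__ == "__main__":
    g = measurement_triples("ex:myDataset", "ex:measurement1",
                            "ex:completenessRatio", "ex:completeness", 0.97)
    print(dimensions_of(g, "ex:myDataset"))  # prints ['ex:completeness']
```

Which dimension a metric points to is exactly the implementer's choice mentioned above; the sketch only fixes the shape of the links, not the dimension list.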

    Remark 2:
    As far as I understand, all four aspects included in this requirement can be handled in some way by building on DQV.

    The example “Express dataset precision and accuracy” [5] shows how to model the degree of a dataset’s precision and accuracy, which are the first two aspects mentioned.

    The third and fourth aspects perhaps need a little more discussion.
    For example, they could be expressed as quality measurements by defining suitable metrics (e.g., RDFUnit suggests documenting test executions by combining the "Test-Driven Data Validation Ontology" [6] with DQV).
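One way to read this suggestion, sketched in Python under stated assumptions: summarise a batch of conformance test outcomes as a single dqv:QualityMeasurement whose value is the pass ratio. The metric ex:testPassRatio and all other ex: names are hypothetical, and the sketch does not reproduce RDFUnit's own vocabulary (such as the TestExecution class referenced in [6]), which a real implementation would link to.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def conformance_measurement(dataset: str, measurement: str,
                            results: Dict[str, bool]) -> Tuple[float, List[Triple]]:
    """Summarise per-test pass/fail outcomes as one DQV measurement.

    `results` maps a (hypothetical) test identifier to True (passed)
    or False (failed).
    """
    if not results:
        raise ValueError("no test results to summarise")
    passed = sum(1 for ok in results.values() if ok)
    ratio = passed / len(results)
    triples = [
        (dataset, "dqv:hasQualityMeasurement", measurement),
        (measurement, "rdf:type", "dqv:QualityMeasurement"),
        (measurement, "dqv:computedOn", dataset),
        # Hypothetical metric: share of conformance tests that passed.
        (measurement, "dqv:isMeasurementOf", "ex:testPassRatio"),
        (measurement, "dqv:value", f"{ratio:.2f}"),
    ]
    return ratio, triples
```

For example, three passes out of four tests give a ratio of 0.75. Such a graded value is a measurement in its own right and should not be read as a categorical conformance verdict.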

    To advance the discussion, we probably need to come up with new examples, which might later be included in the DCAT document or its primer (if the group decides to have a primer for DCAT).
    Might this "example-driven" strategy work for the group?

    Am I misunderstanding the ultimate goal of this requirement?

    [1] R. Albertoni, A. Isaac, J. Debattista, M. Dekkers, C. Guéret, D. Lee, N. Mihindukulasooriya, and A. Zaveri, “Data on the Web Best Practices: Data Quality Vocabulary,” W3C Group Note, Dec. 2016, https://www.w3.org/TR/vocab-dqv/
    [2] B. F. Loscio, D. Lee, and P. Archer, “Data on the Web Best Practices Use Cases & Requirements,” W3C Group Note, Feb. 2015, http://www.w3.org/TR/2015/NOTE-dwbp-ucr-20150224/
    [3] A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. S. Auer, “Quality Assessment for Linked Data: A Survey,” Semant. Web J., vol. 7, no. 1, pp. 63–93, 2016.
    [4] http://iso25000.com/index.php/en/iso-25000-standards/iso-25012
    [5] https://www.w3.org/TR/vocab-dqv/#ExpressDatasetAccuracyPrecision
    [6] http://rdfunit.aksw.org/ns/core#TestExecution

    @riccardoAlbertoni
    Contributor

    riccardoAlbertoni commented Jul 4, 2018

    I have added some new examples on how to deal with the intertwined sub-requirements:

    • the degree a dataset conforms to a stated quality standard;
    • details of data quality conformance test results.

    Please feel free to suggest corrections or further examples based on W3C vocabularies.

    @dr-shorthair
    Contributor

    @riccardoAlbertoni - Could you roll these into the main document and create a new PR?

    @riccardoAlbertoni
    Contributor

    riccardoAlbertoni commented Jul 10, 2018

    @riccardoAlbertoni - Could you roll these into the main document and create a new PR?

    Yes, sure, I will prepare a proposal.

    @riccardoAlbertoni riccardoAlbertoni self-assigned this Jul 10, 2018
    riccardoAlbertoni added a commit to riccardoAlbertoni/dxwg that referenced this issue Jul 23, 2018
    …section about dataset accurancy and precision
    @dr-shorthair dr-shorthair removed this from the Description of quality in DCAT milestone Aug 21, 2018
    davebrowning added a commit that referenced this issue Sep 7, 2018
    Dcat issue #58 riccardo - as per discussion yesterday, (https://www.w3.org/2018/09/06-dxwgdcat-minutes) this contribution moves the story forward, and may stimulate some broader conversation in the next WD
    @davebrowning
    Contributor

    Text added to draft for 2PWD in the hope of eliciting input and comment.

    @davebrowning
    Contributor

    As per the discussion at the DCAT weekly meeting here, the examples incorporated in the draft here provide some guidance and discussion, and have been publicly available in the working draft since October. There is, however, a potential need for more examples and guidance which, while desirable, will not be achievable for the next PWD and may not even end up in the REC-track deliverable. Rather than close this issue, we agreed to move it to a later milestone where it can be reviewed against other commitments.

    @andrea-perego
    Contributor

    Thanks for the great work, @riccardoAlbertoni !

    I would, however, suggest revising the examples concerning INSPIRE. In INSPIRE there are only three levels of conformance: conformant, not conformant, and not evaluated. Some of the examples go further by showing how to express "how much" (e.g., as a percentage) a resource conforms to INSPIRE.

    This may lead to misunderstanding and confusion. I strongly recommend revising these examples before the PWD so that they refer to another specification or standard. One possible option is to refer to the FAIR principles.
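The distinction can be sketched in Python: INSPIRE's degree of conformity is modelled as a closed enumeration with exactly the three values named above, so a percentage has nowhere to go. The code-list URIs are an assumption (they are meant to follow the INSPIRE registry's DegreeOfConformity code list and should be checked against the registry before reuse), and the helper functions are hypothetical.

```python
from enum import Enum

# Assumed base URI of the INSPIRE DegreeOfConformity code list.
INSPIRE_DOC = "http://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/"


class DegreeOfConformity(Enum):
    """The only three degrees of conformity INSPIRE allows."""
    CONFORMANT = INSPIRE_DOC + "conformant"
    NOT_CONFORMANT = INSPIRE_DOC + "notConformant"
    NOT_EVALUATED = INSPIRE_DOC + "notEvaluated"


def degree_from_outcome(evaluated: bool, passed: bool) -> DegreeOfConformity:
    """Map a test outcome to a degree; there is no 'partially conformant'."""
    if not evaluated:
        return DegreeOfConformity.NOT_EVALUATED
    if passed:
        return DegreeOfConformity.CONFORMANT
    return DegreeOfConformity.NOT_CONFORMANT


def parse_degree(uri: str) -> DegreeOfConformity:
    """Accept only one of the three code-list URIs.

    Enum lookup by value raises ValueError for anything else, so a graded
    value such as "87%" is rejected instead of being forced into the list.
    """
    return DegreeOfConformity(uri)
```

A graded score, such as the share of passed tests, could still be published, but as a separate quality measurement rather than as a degree of conformity.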

    I can take care of preparing this revision.

    @riccardoAlbertoni
    Contributor

    Thanks for your comment @andrea-perego

    I like your proposal of referring to the FAIR Principles, and I am willing to help put it into practice.

    Are you planning to switch all the examples?
    Do you want to refer to the FAIRmetrics metrics, or something else?

    @davebrowning
    Contributor

    Since we have a holiday coming up, I'll move this back into the Third Public Draft milestone so we don't forget to make the changes suggested by @andrea-perego.

    @andrea-perego
    Contributor

    andrea-perego commented Jan 9, 2019

    @riccardoAlbertoni wrote:

    Thanks for your comment @andrea-perego

    I like your proposal of referring to FAIR Principles and I am willing to contribute to actuate your proposal.

    Thanks, @riccardoAlbertoni . Happy to work this out together!

    Are you planning to switch all the examples?

    I was thinking of revising just the examples that express conformance in a way that is not compliant with INSPIRE.

    Do you want to refer to the FairMetrics Metrics or what?

    Good idea to use FAIRmetrics. And it would be nice to have URIs for each of the principles / metrics.

    @agbeltran , do you have any suggestion?

    @andrea-perego
    Contributor

    @riccardoAlbertoni, what about also adding examples using the DWBPs (and maybe the SDWBPs as well)? By the way, many of the FAIR metrics can be mapped to DWBPs...

    riccardoAlbertoni added a commit to riccardoAlbertoni/dxwg that referenced this issue Jan 10, 2019
    @riccardoAlbertoni
    Contributor

    I have just uploaded a new branch, ChangingConformanceExamplesToDWBP-riccardo, in which I changed the examples about the degree of conformance.

    I used the DWBP instead of INSPIRE.
    @andrea-perego Could you take a look? It is a quick tweak: reading your previous comments, I sensed some urgency about changing the examples. I hope the changes go in the right direction. If you want to chime in and change things, be my guest ;)

    The option of having examples with FAIR metrics is still open and is extremely interesting to me. For the moment, I used the DWBP to avoid confusion and complications, as I have noticed that the FAIR community offers a technological layer for the metrics (YAML with smartAPI interface annotations). I suspect this layer might burden our examples with details that would be extremely interesting for a new paper but less so for a non-normative section of DCAT... @agbeltran, what do you think about this?

    @andrea-perego
    Contributor

    Thanks, @riccardoAlbertoni . Just checked, and a big +1 from me.

    If I may, I would propose some editorial changes via a separate PR. One of them (which applies to the whole DCAT spec) is to turn all the code snippets into numbered examples, so that they can be linked to. I'll take care of that with a separate issue / PR.

    About FAIR metrics, I also saw that. IMO, it will still be useful to include an example about FAIR. My concern is that we should possibly be aligned with what is returned by the FAIR service you refer to. I tried it yesterday, but it was not working, so I was not able to check how they specify conformance test results (in case they do).

    @andrea-perego
    Contributor

    @riccardoAlbertoni , I've just created the relevant PR: #652

    Please check if you're happy with it.

    Thanks!

    @aisaac
    Contributor

    aisaac commented Jan 11, 2019

    Hi, this is a bit of a late review, but I finally had a quick look at it. It's very good overall, a great answer to the requirements. I've made some editorial suggestions.
    I haven't reviewed everything yet, but I already have some questions about the patterns for tests, for @riccardoAlbertoni and @andrea-perego:

    • Does PROV need such a complex pattern to represent test results, i.e., with prov:qualifiedAssociation? Can't prov:agent and prov:hadPlan be attached directly to a:TestingActivity?
    • Is a:notConformant from the INSPIRE vocabulary really expected to be used as a dcterms:type of a test result?

    A final editorial comment: in general the convention for instance names is to start them with lower case.

    @andrea-perego
    Contributor

    @aisaac wrote:

    • does PROV need such a complex pattern to represent test results , i.e. with the prov:qualifiedAssociation? can't prov:agent and prov:hadPlan be attached directly to a:TestingActivity
    • is a:notConformant from the INSPIRE vocabulary really expected to be used as a dcterms:type of a test result?

    The example is taken from UC16, which reflects how this is done in GeoDCAT-AP - see:

    https://semiceu.github.io/GeoDCAT-AP/drafts/latest/#conformity-and-data-quality---not-in-iso19115-core

    This pattern is indeed complex, but this is what the PROV WG suggested - see email thread starting here:

    https://lists.w3.org/Archives/Public/public-prov-comments/2015May/0001.html
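For readers comparing the two shapes, the qualified pattern discussed here can be sketched as plain triples in Python. The identifiers a:testingActivity, a:conformanceTest, a:notConformant and <http://validator.example.org/> are the ones mentioned in this thread; a:dataset, a:testResult, a:specification and the blank-node label are hypothetical. The activity-to-dataset link uses prov:used, which is an assumption and may differ from how the DCAT draft or GeoDCAT-AP phrase it.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

VALIDATOR = "<http://validator.example.org/>"


def qualified_test_triples(dataset: str, activity: str, plan: str,
                           spec: str, result: str, degree: str) -> List[Triple]:
    """Conformance test in the qualified PROV pattern: the agent and the
    plan are attached to a prov:Association node, not to the activity."""
    assoc = "_:association"  # blank node for the qualified association
    return [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:used", dataset),
        (activity, "prov:qualifiedAssociation", assoc),
        (assoc, "rdf:type", "prov:Association"),
        (assoc, "prov:agent", VALIDATOR),
        (assoc, "prov:hadPlan", plan),
        (plan, "rdf:type", "prov:Plan"),
        (plan, "prov:wasDerivedFrom", spec),
        (activity, "prov:generated", result),
        (result, "rdf:type", "prov:Entity"),
        (result, "dcterms:type", degree),
    ]


def plan_of(graph: List[Triple], activity: str) -> Optional[str]:
    """Reach the plan via activity -> qualifiedAssociation -> hadPlan."""
    for s, p, assoc in graph:
        if s == activity and p == "prov:qualifiedAssociation":
            for s2, p2, plan in graph:
                if s2 == assoc and p2 == "prov:hadPlan":
                    return plan
    return None
```

The extra hop through the association node is the "level of description" that the complexity question above is about.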

    @riccardoAlbertoni
    Contributor

    Thanks @andrea-perego and @aisaac for the feedback and the contributions; they are now merged in #654.

    @aisaac wrote:

    A final editorial comment: in general the convention for instance names is to start them with lower case.

    I have corrected the instance names, now they should stick to the convention.

    @aisaac
    Contributor

    aisaac commented Jan 13, 2019

    @andrea-perego ok I get it. I guess I'm frustrated by the fact that QualifiedAssociation is useful especially for roles of agents (in a typical role or even n-ary relationship pattern) and this is a bit missing here.
    The alternative that I saw was that the plan (conformance test) could just as well have been attached to the testing activity (i.e., one could have expressed that a:conformanceTest guides a:testingActivity directly, instead of saying that it guides the agent that did the test). This (together with attaching the agent <http://validator.example.org/> directly to a:testingActivity) would have spared one level of description.
    But maybe PROV says that prov:agent and prov:hadPlan can't be used in such a pattern. And anyway, I won't question what you have already done elsewhere. It's just that, as the complex pattern didn't (and still doesn't) seem necessary to me, I wanted to be sure. Now I'll stay silent for a while :-)
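As a sketch of the trade-off raised here (Python, plain-triple representation, all helper names hypothetical): PROV-O offers prov:wasAssociatedWith as the unqualified shortcut from an activity to an agent, but prov:hadPlan is defined with prov:Association as its domain, so a plan cannot be stated on the shortcut. The function below collapses each qualified association into the shortcut and reports the plans that would be lost.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def unqualify(graph: List[Triple]) -> Tuple[List[Triple], List[str]]:
    """Replace qualified associations with prov:wasAssociatedWith.

    Returns the rewritten graph and the plans that the shortcut cannot carry.
    """
    # Map each association node to the activity that points at it.
    owner: Dict[str, str] = {o: s for s, p, o in graph
                             if p == "prov:qualifiedAssociation"}
    out: List[Triple] = []
    dropped_plans: List[str] = []
    for s, p, o in graph:
        if p == "prov:qualifiedAssociation":
            continue  # the link to the association node disappears
        if s in owner:
            if p == "prov:agent":
                out.append((owner[s], "prov:wasAssociatedWith", o))
            elif p == "prov:hadPlan":
                dropped_plans.append(o)
            continue  # other triples about the association node are dropped
        out.append((s, p, o))
    return out, dropped_plans


# Hypothetical fragment using the names from this thread.
QUALIFIED = [
    ("a:testingActivity", "prov:qualifiedAssociation", "_:assoc"),
    ("_:assoc", "rdf:type", "prov:Association"),
    ("_:assoc", "prov:agent", "<http://validator.example.org/>"),
    ("_:assoc", "prov:hadPlan", "a:conformanceTest"),
]

if __name__ == "__main__":
    simplified, lost = unqualify(QUALIFIED)
    print(simplified)
    print(lost)
```

On the fragment above, the agent survives as a single prov:wasAssociatedWith triple while a:conformanceTest is reported as lost; whether that loss is acceptable is the design question left open here.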

    Maybe I'll have some comments on 8.2.3, but I won't have time to read it carefully soon, and anyway I think it's a good contribution to the draft, so I'm not going to critique it now.

    I am going to try to react to a couple of other issues, but it seems they're discussed elsewhere, maybe in PR #654.

    @andrea-perego
    Contributor

    Thanks, @aisaac . No problem for me to look into an alternative approach.

    Just to better understand your point, would you mind providing the same example revised according to your proposal?

    @aisaac
    Contributor

    aisaac commented Jan 15, 2019

    @andrea-perego, is it really worth it? If you've already made these choices in your own specs for quite a while, I do not want to cause trouble by reworking established recommendations. At the moment I have not proved that the pattern doesn't work. It's quite complex, but that's not a crucial issue, and from your answer I see you had already taken it into account. So unless you are really eager to work on this now, I would rather wait until the next WD to see whether the issue deserves to be raised again.

    @andrea-perego
    Contributor

    @aisaac wrote:

    [...] So unless you are really eager to work on this now, I would rather wait until a next WD to see if the issue deserves being raised again.

    +1 from me.

    @davebrowning
    Contributor

    Above discussion reflected and actioned in #654.

    @davebrowning davebrowning added the due for closing Issue that is going to be closed if there are no objection within 6 days label Jan 22, 2019
    @davebrowning davebrowning removed the due for closing Issue that is going to be closed if there are no objection within 6 days label Jan 24, 2019