Added a section to deal with quality and started some guidance for r… #245

Merged
agbeltran merged 19 commits into gh-pages from dcatQualityRelatedInformationIssue57Riccardo on Jun 19, 2018

@riccardoAlbertoni
Collaborator

riccardoAlbertoni commented May 30, 2018

Added a "Quality information" section to the DCAT document, providing guidance on reusing DQV to describe quality.
This pull request is an attempt to address #57. Depending on the group's opinion, I would consider extending the guidance with some examples of use, which would also help with #58.

@riccardoAlbertoni riccardoAlbertoni changed the title Added as section to deal with quality and started some guidance for r… Added a section to deal with quality and started some guidance for r… Jun 5, 2018
@riccardoAlbertoni riccardoAlbertoni requested review from aisaac, makxdekkers and andrea-perego and removed request for aisaac, makxdekkers and andrea-perego Jun 5, 2018
@dr-shorthair

Contributor

dr-shorthair commented Jun 5, 2018

Thanks @riccardoAlbertoni

The examples you have developed all appear to attach quality information to DCAT resources with information that is external to the DCAT resource itself - with the URI for the dataset description as the object of an axiom. So this information would not actually be in the dcat:Catalog.

Can you provide an example with quality information encoded in a way that would allow it to be included in a dcat:Dataset? i.e. with the individual dataset description URI as the subject of a triple?

@dr-shorthair

Contributor

dr-shorthair commented Jun 5, 2018

@riccardoAlbertoni

Collaborator Author

riccardoAlbertoni commented Jun 6, 2018

> The examples you have developed all appear to attach quality information to DCAT resources with information that is external to the DCAT resource itself - with the URI for the dataset description as the object of an axiom. So this information would not actually be in the dcat:Catalog.

Dear @dr-shorthair,
I am not sure I fully understand your remark, which seems to propose an additional requirement related to DCAT self-containment that is not explicit in issue #57. Anyway, I sense that it implies more than one desideratum, which I list below:

  1. to have DCAT elements not only as objects of axioms;
  2. to have quality statements collected into a single container X;
  3. to have X expressible as a native DCAT element.

As to 1, dcat:Dataset and dcat:Distribution are connected to Measurements or Annotations through the properties dqv:hasQualityMeasurement and dqv:hasQualityAnnotation, so the DCAT element is the subject of those triples.

As to 2, it is possible to collect all kinds of quality information into :myQualityMetadata. :myQualityMetadata is an instance of dqv:QualityMetadata and collects everything in the same graph, or in the same Turtle file.
We can relate a dcat:Dataset or dcat:Distribution to :myQualityMetadata by saying

:busStopInGenoa dqv:hasQualityMetadata :myQualityMetadata .

For example, assuming :myQualityMetadata is serialized in TriG, we can write the following

:myQualityMetadata a dqv:QualityMetadata. 

GRAPH :myQualityMetadata {
    
    :busStopInGenoa a dcat:Dataset ;
        dqv:hasQualityAnnotation :qualityNote .

    :qualityNote  
        a dqv:UserQualityFeedback ;
        oa:hasTarget :busStopInGenoa ;
        oa:hasBody :textBody ;
        oa:motivatedBy dqv:qualityAssessment ;
        prov:wasAttributedTo :consumer1 ;
        prov:generatedAtTime "2018-05-27T02:52:02Z"^^xsd:dateTime ;
        dqv:inDimension ldqd:completeness
        .

    :textBody a oa:TextualBody ;
        rdf:value "Incomplete dataset: it contains only 20500 out of 30000 existing bus stops" ;
        dc:language "en" ; 
        dc:format "text/plain" 
        . 
    
    :busStopInGenoa
        dqv:hasQualityMeasurement :myMeasurement .

    :myMeasurement
        a dqv:QualityMeasurement ;
        dqv:computedOn :busStopInGenoa ;
        dqv:isMeasurementOf :completenessWRTExpectedNumberOfEntities ;
        dqv:value "0.6833333"^^xsd:decimal  ;
        prov:wasAttributedTo :myQualityChecker ;
        prov:generatedAtTime "2018-05-27T02:52:02Z"^^xsd:dateTime ;
        prov:wasGeneratedBy :myQualityChecking   
        .

    :completenessWRTExpectedNumberOfEntities
        a dqv:Metric ;
        skos:definition "it returns the degree of completeness as ratio between the actual number of entities included in the dataset and the declared expected number of entities."@en ;
        dqv:expectedDataType xsd:decimal ;
        dqv:inDimension ldqd:completeness .
}

As to 3, I do not see any reason why we cannot define a dcat:Distribution or a dcat:Dataset for cataloguing the data quality information serialized in :myQualityMetadata.

We already have dqv:hasQualityMetadata connecting dcat:Dataset/Distribution to dqv:QualityMetadata.
Another issue is whether we want more explicit ways to say that a dcat:Dataset Y contains the quality data of dcat:Dataset X, or that Y has been derived from X in a quality assessment activity.
I tend to consider this a separate issue, which might be influenced by the solutions chosen for Qualified forms [RQF] #79, Provenance information [RPIF] #76, and the restructuring of the DCAT core elements.
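
As a rough illustration of point 3, the quality graph could be catalogued as a dataset of its own. The Turtle below is only a sketch of one possible reading: the names :busStopInGenoaQuality and :busStopInGenoaQualityTriG, the example.org namespace and download URL, and the use of prov:wasDerivedFrom to relate the two datasets are all assumptions made for illustration, not something settled in this discussion.

```turtle
# Placeholder default namespace (assumption for this sketch).
@prefix :        <http://example.org/> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dqv:     <http://www.w3.org/ns/dqv#> .
@prefix prov:    <http://www.w3.org/ns/prov#> .

# Existing link from dataset X to its quality metadata graph.
:busStopInGenoa a dcat:Dataset ;
    dqv:hasQualityMetadata :myQualityMetadata .

:myQualityMetadata a dqv:QualityMetadata .

# Hypothetical dataset Y that catalogues the quality data about X.
:busStopInGenoaQuality a dcat:Dataset ;
    dcterms:title "Quality metadata for the Genoa bus stops dataset"@en ;
    prov:wasDerivedFrom :busStopInGenoa ;   # assumed way to say "Y derived from X"
    dcat:distribution :busStopInGenoaQualityTriG .

# Hypothetical distribution that serializes the quality graph as TriG.
:busStopInGenoaQualityTriG a dcat:Distribution ;
    dcat:downloadURL <http://example.org/quality/busStopInGenoa.trig> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/application/trig> .
```

The sketch deliberately leaves open how Y should point at the named graph itself, since that is the separate issue mentioned above.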

@dr-shorthair

Contributor

dr-shorthair commented Jun 7, 2018

OK - I understand now.

  1. the examples show how quality information can be associated with a dcat:Dataset using the dqv:hasQualityAnnotation and dqv:hasQualityMeasurement properties. The 'model' is complete.
  2. the illustration makes no comment on where the quality information would reside, how it is managed etc. That is out of scope for the DCAT vocabulary, as supplemented by DQV. We can just assume that the quality individuals are available using the URIs indicated.

However, implementers of catalogs probably need some commentary on possible implementation options, from both a technical and governance point of view. Technically, the quality statements might be local (in the same datastore or catalog) or they might be remote (by de-referencing the URI). From a governance point of view, statements provided by the dataset (or dataservice) custodian might be part of the resource description, in the catalogue. Quality statements from a third party would more likely be persisted in another service, maybe entered through an annotation portal ...

I suggest a few sentences - clearly labelled <informative> - be added to the narrative to provide pointers (but not recommendations) to the possible implementation options.
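
For illustration only, the local and remote options described above might look like this in Turtle. The example.org hosts and the identifiers :custodianNote and note-42 are hypothetical, and the sketch is not a recommendation of either option.

```turtle
# Placeholder catalog namespace (assumption for this sketch).
@prefix :     <http://example.org/catalog/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .

:busStopInGenoa a dcat:Dataset ;
    # Local: the annotation is described in the same catalog document.
    dqv:hasQualityAnnotation :custodianNote ;
    # Remote: only the URI is recorded; details come from dereferencing it.
    dqv:hasQualityAnnotation <http://annotations.example.org/note-42> .

# Description of the local annotation, kept alongside the dataset description.
:custodianNote a dqv:QualityAnnotation ;
    oa:hasTarget :busStopInGenoa ;
    oa:motivatedBy dqv:qualityAssessment .
```

In both cases the dataset description is the subject of the triple; only the place where the annotation's own description lives differs.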

@riccardoAlbertoni

Collaborator Author

riccardoAlbertoni commented Jun 8, 2018

@dr-shorthair I have updated my pull request to reflect the discussion we had in the last DCAT call. Please feel free to rephrase whatever you want. Would you fancy an extra issue about "DCAT self-containment"?

Contributor

aisaac left a comment

I think it’s a very good proposal. Thanks for this @riccardoAlbertoni!

Some editorial suggestions to make a couple of things clearer (though a bit longer!)

  • the dataset about bus stopS in Genoa could be called genoaBusStopsDataset
  • the quality note could be genoaBusStopsDatasetCompletenessNote
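
With these names, the link between the dataset and its note would read roughly as follows (a sketch only; the default namespace is a placeholder):

```turtle
# Placeholder default namespace (assumption for this sketch).
@prefix :    <http://example.org/> .
@prefix dqv: <http://www.w3.org/ns/dqv#> .

# The dataset, under its suggested name, points at the renamed quality note.
:genoaBusStopsDataset
    dqv:hasQualityAnnotation :genoaBusStopsDatasetCompletenessNote .
```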

I also see the point in the discussion between you and @dr-shorthair . This point was worth addressing, and I think you've addressed it well.

@riccardoAlbertoni

Collaborator Author

riccardoAlbertoni commented Jun 14, 2018

I have added a note explaining why section 7 is not normative and that DQV is a W3C Group Note. Please merge this pull request.

@agbeltran

Member

agbeltran commented Jun 14, 2018

Hi @riccardoAlbertoni, it seems you still need to push the commit(s) about the note into this branch?

In
https://github.com/w3c/dxwg/pull/245/commits
I see the latest commit from 30th May
(and don't see the update in the document either).

@riccardoAlbertoni

Collaborator Author

riccardoAlbertoni commented Jun 14, 2018

hi @agbeltran,
This is quite peculiar. Could you double-check? I have already pushed everything; in fact, if I click on https://github.com/w3c/dxwg/pull/245/commits I can see all 18 of my commits.

So I am not sure I understand why you can't see them.
If it is OK with you, I can merge my changes myself.

@agbeltran agbeltran requested a review from w3c/dxwg-editors Jun 14, 2018
@agbeltran

Member

agbeltran commented Jun 14, 2018

Sorry, I was expecting to see a green NOTE box, as discussed on the call, which I've now added.

@agbeltran agbeltran merged commit c865c9b into gh-pages Jun 19, 2018
@agbeltran agbeltran deleted the dcatQualityRelatedInformationIssue57Riccardo branch Jun 19, 2018
@dr-shorthair dr-shorthair removed this from the Description of quality in DCAT milestone Aug 21, 2018