How does DataFAQs play a role in vocabulary selection? Would DataFAQs be used as part of an iterative process?
Yes. And Yes.
The vocabulary that one chooses to model their domain is critically important. Although many vocabularies may adequately communicate the topic of our interests, some vocabularies have more practical value than others.
To take an example from our most recent conversion, consider two alternate RDF forms of the same tabular row:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov-o/> .
@prefix local_vocab: <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/vocab/> .
@prefix e1: <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/vocab/enhancement/1/> .
@prefix biographical-directory-of-the-united-states-congress: <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/> .
@prefix value_of_state: <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/value-of/state/> .
@prefix : <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04/> .

:congressperson_49
   dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   void:inDataset <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   a local_vocab:Congressperson , foaf:Person ;
   foaf:firstName "John" ;
   foaf:family_name "BULL" ;
   e1:congress biographical-directory-of-the-united-states-congress:congress_0 ;
   foaf:memberOf biographical-directory-of-the-united-states-congress:congress_0 ; # sic
   foaf:workInfoHomepage <http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047> ,
                         <http://bioguide.congress.gov/scripts/guidedisplay.pl?index=B001047> ,
                         <http://bioguide.congress.gov/scripts/bibdisplay.pl?index=B001047> ;
   con:preferredURI biographical-directory-of-the-united-states-congress:B001047 ;
   prov:specializationOf biographical-directory-of-the-united-states-congress:B001047 ;
   e1:doc "2012-01-04T02:12:01" ;
   dbpediaprop:state value_of_state:SC ;
.

value_of_state:SC
   dcterms:identifier "SC" ;
   rdfs:label "SC" ;
   owl:sameAs dbpedia:South_Carolina ,
              <http://sws.geonames.org/4597040/> ,
              govtrackusgov:SC .
Many semantic web developers would agree that some of the modeling above is slightly better than the modeling that follows:
@prefix : <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04/> .
@prefix raw: <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/vocab/raw/> .

:thing_49
   dcterms:isReferencedBy <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   void:inDataset <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   raw:first_name "John" ;
   raw:last_name "BULL" ;
   raw:congress "0" ;
   raw:p_url "http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047" ;
   raw:doc "2012-01-04T02:12:01" ;
   raw:state "SC" ;
   raw:death "1802" ;
   raw:birth "1740c" ;
   raw:party "¬†" ;
   raw:position "ContCong" ;
   raw:c_yr "" ;
   ov:csvRow "49"^^xsd:integer .
But what, exactly, is better about it? Well, lots of things. Different people care about different aspects of the difference shown above. Some claims about quality may include:
foaf:firstName is way better than raw:first_name because 400 systems recognize it and display it.
raw:p_url, both as a URI and as a label, is incomprehensible to anyone who did not build this database. And its value is a literal, which means that RDF agents will not know that it can be resolved on the web. Using foaf:workInfoHomepage is way better because it already exists to associate a person with their work homepages. And systems recognize FOAF already. And people know FOAF already.
e1:congress is way better than raw:congress because its value is a URI that can be further described. Being stuck with raw:congress's value "0" is very uninformative. What do I do with zero? At the very least, we can type biographical-directory-of-the-united-states-congress:congress_0 and start describing its temporal interval, etc.
The data uses foaf:memberOf, when that URI is not defined in the FOAF namespace! That violates Linked Data principles. On the other hand, it's pretty obvious what it is: it's the inverse of foaf:member, and we can use it and have systems recognize it even without the FOAF Elite defining it in their vocabulary. Practicality can trump principles, depending on whom you ask.
We may not know what local_vocab:Congressperson is, but at least we know it's a kind of foaf:Person. We can work with that.
dbpediaprop:state value_of_state:SC is way better than raw:state "SC" because lots of people run to DBpedia for example data, so more people will start using dbpediaprop:state. But when more people start using it without clear, established rules, they'll use it inconsistently. So the relation will have many meanings and runs the risk of becoming meaningless.
Why both dcterms:isReferencedBy and void:inDataset?! Well, some recognize one, some recognize the other. What if we want to talk to both of them? We say both.
http://www.w3.org/ns/prov-o/ 404s. What gives? The W3C working group isn't done yet.
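Two of the claims above (URL-valued literals and reuse of widely recognized vocabularies) are simple enough to check mechanically. The following is a minimal sketch in plain Python, not part of DataFAQs: the `Triple` type, the `SHARED_PREFIXES` shortlist, and both function names are assumptions made for illustration, and the triples are a hand-copied subset of the two forms shown earlier.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A simplified triple: prefixed predicate, object text, and a literal flag."""
    predicate: str
    obj: str
    obj_is_literal: bool


# Hypothetical shortlist of prefixes that one stakeholder treats as "widely recognized".
SHARED_PREFIXES = {"foaf", "dcterms", "void", "prov", "owl", "rdfs"}

# Hand-copied subset of the enhanced form.
enhanced = [
    Triple("foaf:firstName", "John", True),
    Triple("foaf:family_name", "BULL", True),
    Triple("e1:congress",
           "biographical-directory-of-the-united-states-congress:congress_0", False),
    Triple("foaf:workInfoHomepage",
           "http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047", False),
    Triple("dbpediaprop:state", "value_of_state:SC", False),
]

# Hand-copied subset of the raw form.
raw = [
    Triple("raw:first_name", "John", True),
    Triple("raw:last_name", "BULL", True),
    Triple("raw:congress", "0", True),
    Triple("raw:p_url",
           "http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047", True),
    Triple("raw:state", "SC", True),
]


def url_like_literals(triples: List[Triple]) -> List[str]:
    """Predicates whose literal value looks like a URL that should be a resource."""
    return [t.predicate for t in triples
            if t.obj_is_literal and t.obj.startswith(("http://", "https://"))]


def shared_vocab_ratio(triples: List[Triple]) -> float:
    """Fraction of triples whose predicate prefix is on the shortlist."""
    shared = sum(1 for t in triples
                 if t.predicate.split(":", 1)[0] in SHARED_PREFIXES)
    return shared / len(triples)


if __name__ == "__main__":
    for name, triples in (("enhanced", enhanced), ("raw", raw)):
        print(name, url_like_literals(triples), shared_vocab_ratio(triples))
```

Checks like these capture only one stakeholder's preferences. Someone who considers dbpediaprop: trustworthy, or who objects to foaf:memberOf, would choose a different shortlist and reach a different score.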
DataFAQs is not designed to declare authoritative quality of the datasets it comes by. Instead, it is a framework to allow interested stakeholders to express, survey, and understand the aspects of quality that they and others value. This increased community understanding -- accelerated by automated, asynchronous feedback -- provides the basis for stakeholders to make better, more informed decisions about the vocabulary that they use. Those decisions are based on concrete, qualitative information that is provided by the community, for the community. DataFAQs just connects all of the dots, accumulates perspectives on datasets, and allows you to explore what the community thinks about your dataset.
DataFAQs can and will be used to assist vocabulary selection.
It is important to remember that DataFAQs is not only a resource that provides "grades" for datasets that you point it to. More importantly, it is a framework that allows any stakeholder to reflect their needs, interests, or preferences when it comes to the quality of any dataset.
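The idea of collecting several stakeholders' perspectives, rather than issuing one grade, can be sketched as follows. This is not the DataFAQs interface: the evaluator functions, the stakeholder labels, and the `survey` helper are all hypothetical, and a dataset is reduced to the set of predicates it uses.

```python
from typing import Callable, Dict, Set

# A hypothetical evaluator: takes the predicates a dataset uses, returns a verdict.
Evaluator = Callable[[Set[str]], bool]


def uses_foaf_names(predicates: Set[str]) -> bool:
    """A display-oriented consumer wants names it already knows how to render."""
    return "foaf:firstName" in predicates


def links_to_dataset_both_ways(predicates: Set[str]) -> bool:
    """A cataloguer wants both dataset-membership predicates present."""
    return {"dcterms:isReferencedBy", "void:inDataset"} <= predicates


def avoids_undefined_foaf_terms(predicates: Set[str]) -> bool:
    """A purist rejects foaf:memberOf, which the FOAF namespace does not define."""
    return "foaf:memberOf" not in predicates


# Hypothetical stakeholder labels mapped to their evaluators.
EVALUATORS: Dict[str, Evaluator] = {
    "display-oriented consumer": uses_foaf_names,
    "dataset cataloguer": links_to_dataset_both_ways,
    "Linked Data purist": avoids_undefined_foaf_terms,
}


def survey(predicates: Set[str], evaluators: Dict[str, Evaluator]) -> Dict[str, bool]:
    """Collect every stakeholder's verdict without merging them into one grade."""
    return {name: evaluate(predicates) for name, evaluate in evaluators.items()}


# Predicates taken from the two example forms.
enhanced_predicates = {
    "dcterms:isReferencedBy", "void:inDataset", "foaf:firstName",
    "foaf:family_name", "e1:congress", "foaf:memberOf",
    "foaf:workInfoHomepage", "dbpediaprop:state",
}
raw_predicates = {
    "dcterms:isReferencedBy", "void:inDataset", "raw:first_name",
    "raw:last_name", "raw:congress", "raw:p_url", "raw:state",
}

if __name__ == "__main__":
    print(survey(enhanced_predicates, EVALUATORS))
    print(survey(raw_predicates, EVALUATORS))
```

Under these assumed evaluators, the enhanced form satisfies the consumer and the cataloguer but not the purist, while the raw form satisfies the cataloguer and the purist but not the consumer. Keeping the verdicts separate lets a publisher see whose needs a vocabulary choice serves.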
How do we help stakeholders find high-quality vocabularies for the linked data they plan to publish, and subsequently evaluate the resulting quality of their linked data?
DataFAQs connects data publishers with potential data consumers.
Last edited by timrdf,