Jose María Alvarez edited this page May 31, 2013 · 3 revisions

Intro

One of the cornerstones of boosting the use of Linked Data is ensuring the quality of data along dimensions such as timeliness, correctness, etc. The intrinsic features of this initiative provide a framework for the distributed publication of data and resources (linking data sources together on the web). Because of this open approach, mechanisms must be added to check whether data is genuinely well linked or merely an attempt to tie together some part of the web. In most cases, links between resources are discovered and created automatically (e.g. with the Silk Framework); this process is to some extent ambiguous, so human judgment is required. As for the data itself, quality may vary because information providers have different levels of knowledge, objectives, etc. Information and data are released to accomplish a specific task, and their quality should therefore be assessed against criteria appropriate to a specific domain.

For instance, suppose a data provider releases information about payments: is it possible to check which decimal separator is in use, 10,000 or 10.000? Is this convention homogeneous across all resources in the dataset? If a literal value should be “Oviedo”, what happens if the actual value is “Obiedo”? How can we detect and fix these situations?
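Both problems can be caught with simple heuristics. The sketch below (the data and function names are illustrative assumptions, not part of any of the tools discussed here) flags datasets that mix separator styles such as “10,000” and “10.000”, and detects near-duplicate literals like “Oviedo”/“Obiedo” via edit-distance similarity:

```python
import re
from difflib import SequenceMatcher

def separator_styles(values):
    """Collect the separator characters used in numeric literals.

    A result with more than one element (e.g. {",", "."}) signals an
    inconsistent convention across the dataset."""
    styles = set()
    for v in values:
        if re.fullmatch(r"\d{1,3}([.,]\d{3})+", v):
            styles.add(v[-4])  # the separator preceding the final digit group
    return styles

def near_duplicates(literals, threshold=0.8):
    """Pairs of distinct literals whose similarity exceeds the threshold,
    candidates for typos such as "Oviedo" vs "Obiedo"."""
    pairs = []
    for i, a in enumerate(literals):
        for b in literals[i + 1:]:
            if a != b and SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs
```

Such candidate pairs still need human review, in line with the observation above that automatic linking and cleaning ultimately require human decisions.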

These cases have motivated some related work:

  • The PhD thesis of Christian Bizer (2007), which proposes a template language and a framework (WIQA) to decide whether a triple fulfils the requirements to be accepted into a dataset.
  • The LODQ vocabulary (2011), an RDF model for expressing criteria over 15 kinds of metrics formulated by Glenn McDonald on a mailing list. A processor for this vocabulary is still missing.
  • The paper “Linked Data Quality Assessment through Network Analysis” by Christophe Guéret (2011), which provides metrics for checking the quality of links. This work is part of the LATC project.
  • The COLD (Consuming Linked Data) workshop, also a good starting point for the problems and approaches involved in implementing Linked Data applications that are collected in the aforementioned works.

One might think this problem is new, but in truth it is inherited from traditional databases. An open question is whether existing approaches can be applied to quality assessment in the Linked Data realm…but this will be evaluated in future posts.

Types of Quality

  • Provenance: are the source values preserved in the generated RDF?
  • Structural: do all RDF resources conform to a template?
  • Contents:
      • Do all properties in the generated RDF resources respect their declared range and domain?
      • Do all RDF resources satisfy content rules? For instance, are all values > 0?
  • Access:
      • Content negotiation
  • Design patterns
  • External tools

Tools

  • Vapour (http://idi.fundacionctic.org/vapour):
      • Dereferenceable URIs: check that all URIs employed are dereferenceable.
      • Content negotiation: use a system such as Vapour to check that content negotiation is handled correctly.
  • Data Hub LOD Validator: checks whether a dataset can be published in the LOD Cloud Diagram and on thedatahub.org; after filling in the dataset’s metadata, this checker must be passed.
  • CKAN’s QA extension: runs automatically in the background and calculates the star rating as documented on the CKAN wiki. At present it can only calculate the first 4 stars (out of 5); there’s a proposal for calculating the fifth star.
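A minimal dereferenceability and content-negotiation probe, in the spirit of what Vapour automates, can be sketched with the Python standard library. The function names are our own assumptions; a full check would also distinguish 303 redirects from direct 200 responses and inspect the returned Content-Type:

```python
import urllib.error
import urllib.request

def dereference_request(uri, accept="application/rdf+xml"):
    """Build a HEAD request asking for an RDF serialization via content negotiation."""
    return urllib.request.Request(uri, method="HEAD", headers={"Accept": accept})

def is_dereferenceable(uri, accept="application/rdf+xml", timeout=10):
    """True if the URI answers a HEAD request carrying an RDF Accept header.

    urlopen follows redirects (including the 303s used for non-information
    resources), so a 2xx final status means the URI resolved somewhere."""
    try:
        with urllib.request.urlopen(dereference_request(uri, accept),
                                    timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, ValueError):
        return False
```

Running `is_dereferenceable` over every URI in a dataset gives a rough first pass at the “Dereferenceable URIs” check above, while varying the `accept` parameter exercises content negotiation.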

References

  • Aidan Hogan, Jürgen Umbrich, Andreas Harth, Richard Cyganiak, Axel Polleres and Stefan Decker. "An empirical survey of Linked Data conformance". Journal of Web Semantics, 14, 2012. doi:10.1016/j.websem.2012.02.001
  • http://www4.wiwiss.fu-berlin.de/bizer/wiqa/
  • http://logd.tw.rpi.edu/category/keywords/linked_data
  • Christophe Guéret, Paul Groth, Claus Stadler, and Jens Lehmann. "Linked Data Quality Assessment through Network Analysis". 2011.
  • Linked Data Basic Profile, http://www.w3.org/Submission/2012/02/
  • https://github.com/pornel/hCardValidator
  • http://www.w3.org/2007/08/pyRdfa/
  • Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, Axel Polleres. "Weaving the Pedantic Web". In the Proceedings of the Linked Data on the Web WWW2010 Workshop (LDOW 2010), Raleigh, North Carolina, USA, 27 April, 2010.
  • Jürgen Umbrich, Michael Hausenblas, Aidan Hogan, Axel Polleres, Stefan Decker. "Towards Dataset Dynamics: Change Frequency of Linked Open Data Sources". In the Proceedings of the Linked Data on the Web WWW2010 Workshop (LDOW 2010), Raleigh, North Carolina, USA, 27 April, 2010.
  • Tobias Käfer, Jürgen Umbrich, Aidan Hogan, Axel Polleres. "Towards a Dynamic Linked Data Observatory". In the Proceedings of the Linked Data on the Web WWW2012 Workshop (LDOW 2012), Lyon, France, 16 April, 2012.