Publication: IPAW 2014

Tim L edited this page Mar 9, 2015 · 109 revisions

This page provides supplemental details of our IPAW 2014 paper:

  • Walking into the Future with PROV Pingback: An Application to OPeNDAP using Prizms

(paper pdf, talk slides)


@incollection{lebo2014walking,
	Author = {Lebo, Timothy and West, Patrick and McGuinness, Deborah L.},
	Booktitle = {Provenance and Annotation of Data and Processes},
	Editor = {Ludaescher, Bertram and Plale, Beth},
	Keywords = {Data Integration; Transparency; Provenance Granularity; Derived Abstractions; Provenance of Provenance; Linked Data},
	Publisher = {Springer Berlin Heidelberg},
	Series = {Lecture Notes in Computer Science},
	Title = {Walking into the Future with PROV Pingback: An Application to OPeNDAP using Prizms (in press)},
	Year = {2014}}

2 The State of the Linked PROV Cloud

  • The lodcloud project's source/openlinksw-com/lod-cloud-cache-ns-prov will reproduce the term occurrence queries.
    • The analysis itself occupies 2.9GB in a development clone of the lodcloud project, which required some manual intervention. Virtually all of the 2.9GB is the 24 million RDF subject URIs of prov:wasDerivedFrom.
    • The initial portion without materializing the subjects was replicated into version 2014-Mar-03. It contains the counts.
  • This page lists some live queries into datahub.io for lodcloud datasets.

3.1 Prizms’ “SDV” Dataset Organization

3.2 A Concrete Basis: Modeling the Structure of the Host System

Details about this section can be found on the wiki.

3.4 Prizms Publishes Host System’s prov:has_provenance Target

3.5 Prizms Accepts Pingback Pointers

A git proxy could track who accesses the repositories. For example, http://gitprov.org/clone/https/github/provbench/ProvenanceCaptureDisparaties would behave exactly like https://github.com/provbench/Wikipedia-PROV, but would record that it was cloned and by whom (e.g. by IP address).

3.6 Prizms Retrieves, Analyzes, and Rehosts Pingback Pointers

This SPARQL query is the one used to provide the downstream listing shown in the paper.

4 Discussion

Pingback on Twitter (PB)

(Thanks to the reviewer for suggesting Twitter as a medium for pingbacks.)

Twitter's retweet format is RT @<cited-user-name> <quotation>. It would be cool to invent an analogous pingback convention, something like:

PB @<cited-user-name>+ <cited-users-upstream-antecedent-URL> <downstream-derivation-URL>
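To make the proposed convention concrete, here is a minimal Python sketch that formats and parses such a tweet. It assumes the trailing "+" means "one or more cited user names", that both URLs are plain http(s) URLs, and that fields are separated by whitespace. The names Pingback, format_pingback, and parse_pingback are hypothetical and not part of Prizms or any Twitter API.

```python
import re
from typing import NamedTuple, Optional, Sequence, Tuple


class Pingback(NamedTuple):
    cited_users: Tuple[str, ...]   # user names without the leading "@"
    antecedent_url: str            # the cited user's upstream antecedent
    derivation_url: str            # the downstream derivation


# Assumed grammar: "PB", one or more @mentions, then exactly two URLs.
_PB_PATTERN = re.compile(
    r"^PB((?:\s+@\w+)+)\s+(https?://\S+)\s+(https?://\S+)\s*$"
)


def format_pingback(cited_users: Sequence[str],
                    antecedent_url: str,
                    derivation_url: str) -> str:
    """Render a PB tweet for the given users and URLs."""
    if not cited_users:
        raise ValueError("at least one cited user is required")
    mentions = " ".join("@" + user.lstrip("@") for user in cited_users)
    return f"PB {mentions} {antecedent_url} {derivation_url}"


def parse_pingback(tweet: str) -> Optional[Pingback]:
    """Return the parsed pingback, or None if the tweet is not a PB tweet."""
    match = _PB_PATTERN.match(tweet.strip())
    if match is None:
        return None
    users = tuple(re.findall(r"@(\w+)", match.group(1)))
    return Pingback(users, match.group(2), match.group(3))
```

With the placeholder users alice and bob and example.org URLs, format_pingback(["alice", "bob"], "http://example.org/data/v1", "http://example.org/derived/plot") yields a tweet that parse_pingback turns back into the same three fields, while an ordinary RT tweet parses to None.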

Examples:

Questions at Talk

(Notes for the event)

  • Q: What about looking for extensions of PROV in your search? Most applications use extensions, not PROV directly...
    • A(online): Yes, this needs to be done, and against */sparql endpoints, not just the datahub.io listings or OpenLink's LODCache.
    • A(offline): How to find all extensions of PROV? The list at http://lov.okfn.org/dataset/lov/details/vocabulary_prov.html only grows when someone manually submits a vocabulary to the curators, who manually include it. It automatically includes any vocab that a listed vocab uses, but that's the wrong direction when trying to find extensions.
  • Q: What about security issues?
    • A(online): I'm scared to death of the potential for abuse of pingback. There is some good work to do here. It centers around the fact that you're retrieving any URL that someone gives you via pingback. I don't keep the service up because of this risk.
  • Q: OPeNDAP is awesome, but what about the larger workflow that a user goes about to do their job? (OPeNDAP is just the query selection, something inevitably gets computed later).
    • A(online): I'm not aware of the abstract workflow that a typical OPeNDAP user goes through, but if I were to tackle it to support them with a more complete provenance-enabled system, I would start with my abstract workflow and data organization paradigm called "Situated Computation" (of which SDV Organization is one part). The advantage here is flexibility for the user to do what they need, but with some underlying version control, some methodological structure, and "Linked [meta]Data + PROV" "for free" from the Prizms Linked Data publishing platform.
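
On the security question above: the risk comes from dereferencing whatever URL a pingback supplies. One commonly used first line of defense is to screen the URL before fetching it. The sketch below is only an illustration under stated assumptions, not what Prizms does; the function name is_fetchable is hypothetical, and the check is deliberately incomplete (it does not resolve DNS names or follow redirects, both of which a real service would also need to vet).

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_fetchable(url: str) -> bool:
    """Cheap pre-fetch screen for a URL received via pingback."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False  # rejects file:, ftp:, and host-less URLs
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: accepted here, but the addresses it resolves to
        # should be checked the same way before any request is made.
        return True
    # Literal IPs must be publicly routable (no loopback, private, etc.).
    return address.is_global
```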

Glossary

(this page's bitly: http://bit.ly/lebo-ipaw-2014, http://bit.ly/prov-pingback-via-twitter)

http://opendap.tw.rpi.edu/sparql?default-graph-uri=&query=PREFIX+foaf%3A++++%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+prov%3A++++%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fprov%23%3E%0D%0APREFIX+dcterms%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0A%0D%0Aselect+distinct+%3Fhost_input+%3Fclient_copy+%3Fclient_derivation+%3Fformat+%3FF%0D%0Awhere+%7B%0D%0A++%3Fhost_response+++++++++++++++++++++++++++++++++++++++++++++++++++%0D%0A++++foaf%3AisPrimaryTopicOf+%3Chttp%3A%2F%2Fopendap.tw.rpi.edu%2Fsource%2Fus%2Fdataset%2Fopendap-prov%2Fversion%2F20140304-1393967542-ab12%3E%3B+++++++++++++++++++++++++++++++++++++%0D%0A++++prov%3AwasDerivedFrom+%5B+prov%3AspecializationOf+%3Fhost_input+%5D.++++++++%0D%0A+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++%0D%0A++%3Fhost_input+++++++++++++++++++++++++++++++++++++++++++++++++++++++%0D%0A+++++%5E%28prov%3AwasDerivedFrom+%7C+prov%3AwasQuotedFrom%29++%3Fclient_copy.+++++%0D%0A++%3Fclient_copy+++++++++++++++++++++++++++++++++++++++++++++++++++++%0D%0A+++++%5E%28prov%3AwasDerivedFrom+%7C+prov%3AwasQuotedFrom%29%2B+%3Fclient_derivation.+%0D%0A++optional+%7B++%3Fformat+%5Edcterms%3Aformat+%3Fclient_derivation++++++++++++++%0D%0A++++optional+%7B%3Fformat++dcterms%3Atitle++%3FF%7D+%7D%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on
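
The encoded URL above is hard to read, so the following Python sketch shows the SPARQL query it carries and how such a request URL can be assembled and decoded with the standard library. The query text, endpoint, and parameters are taken from the URL; whitespace has been tidied (LF line endings, no trailing spaces), so the rebuilt URL is equivalent but not byte-identical. The helper names build_query_url and extract_query are hypothetical, and the sketch does not send the request, since the endpoint may not be running.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://opendap.tw.rpi.edu/sparql"

# Decoded from the URL above: for each host input behind the given dataset
# version, find client copies and anything derived from those copies.
QUERY = """\
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
PREFIX prov:    <http://www.w3.org/ns/prov#>
PREFIX dcterms: <http://purl.org/dc/terms/>

select distinct ?host_input ?client_copy ?client_derivation ?format ?F
where {
  ?host_response
    foaf:isPrimaryTopicOf <http://opendap.tw.rpi.edu/source/us/dataset/opendap-prov/version/20140304-1393967542-ab12>;
    prov:wasDerivedFrom [ prov:specializationOf ?host_input ].

  ?host_input
     ^(prov:wasDerivedFrom | prov:wasQuotedFrom)  ?client_copy.
  ?client_copy
     ^(prov:wasDerivedFrom | prov:wasQuotedFrom)+ ?client_derivation.
  optional {  ?format ^dcterms:format ?client_derivation
    optional {?format  dcterms:title  ?F} }
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Assemble a GET URL with the same parameters as the URL above."""
    params = {
        "default-graph-uri": "",
        "query": query,
        "format": "text/html",
        "timeout": "0",
        "debug": "on",
    }
    return endpoint + "?" + urlencode(params)


def extract_query(url: str) -> str:
    """Recover the SPARQL text from a request URL of this shape."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)["query"][0]


url = build_query_url(ENDPOINT, QUERY)
```

Passing the original URL to extract_query is a quick way to read or edit the query before re-encoding it.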

Subsequent related work