WIP: Add rdf serialization #49

Open
wants to merge 20 commits into master from satra:enh/rdf-1.x

3 participants

satra added some commits
@satra satra fix: rename contants -> constants e5ffce8
@satra satra enh: first pass at serialization d1d1407
@satra satra remove debug print 5799160
@satra satra starting deserialization ac37255
@satra satra enh: updated to reflect latest upstream changes for attributes, added bundle support f2c263d
@satra satra compat: remove bundle statement to be compatible with Luc's output 95bde98
@satra satra fix: ensure that URIRefs remain as URIRefs c3198a3
@satra satra fix: resolved conflict 73b9e57
@satra satra fix: updated QName -> QualifiedName 1e2a8c6
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed Build status link
  Added build status from Travis-CI
  doc: fixed docstring for assertLess
  fix: remove redundant ls
  fix: remove checking for py3k
  fix: added assertLess
  fix: added support for asserts to unittest
  fix: more set fixes
  fix: support for 2.6 set
  fix: added python versions to tests
  fix: updated dateutil name
  enh: add travis testing file
4a6814f
@satra satra fix: remove diff statement 08d25d3
@satra satra fix: add rdflib to travis 481fe13
@trungdong
Owner

Hi @satra,

FYI, prov now has many more tests (https://github.com/trungdong/prov/tree/master/prov/tests), which you can use to test the RDF export. Since we don't have RDF import, you won't be able to do the round-trip tests as in the test cases, but even a one-way export test could be useful.

satra added some commits
@satra satra resolve conflicts 0ac1617
@satra satra updated requirements 32fac75
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed: Cloning the records when creating a new document from them
  Bugfix regarding a software agent record.
7f2cd65
@satra

@trungdong - i'm slowly making my way through rdf deserialization. in terms of comparing documents, how do you ensure that the order of attributes doesn't matter?

ACTUAL: u'document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:label="bonjour"@fr, prov:label="hello", prov:label="activity2", prov:label="bye"@en])
endDocument'
 DESIRED: u'document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:label="activity2", prov:label="hello", prov:label="bye"@en, prov:label="bonjour"@fr])
endDocument'

to me these are the same graph, but the round-trip fails because the order of attributes is different.

@trungdong
Owner

Hi @satra,
Glad you have some time to get on with this. Thanks.

I suggest you do prov.model --> RDF --> prov.model. Comparing two ProvDocument instances is not sensitive to ordering (of attributes or records) as it uses set instead of list.
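The distinction is easy to see in plain Python (a minimal sketch, not prov's actual implementation — assuming attributes are hashable (name, value) pairs): list equality is order-sensitive, set equality is not.

```python
# Sketch: attribute collections from the two serializations above,
# modelled as plain (name, value) tuples with no prov dependency.
attrs_a = [("prov:label", "bonjour"), ("prov:label", "hello"),
           ("prov:label", "activity2"), ("prov:label", "bye")]
attrs_b = [("prov:label", "activity2"), ("prov:label", "hello"),
           ("prov:label", "bye"), ("prov:label", "bonjour")]

# Comparing the PROV-N strings is effectively a list comparison: it fails.
assert attrs_a != attrs_b

# Comparing as sets, the way ProvDocument equality works, ignores ordering.
assert set(attrs_a) == set(attrs_b)
```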

@satra

thanks @trungdong - i was doing assert_equal(g.get_provn(), g1.get_provn()), but i changed to assert_equal(g, g1)

@satra

even sets are not quite doing their job - will have to look into this further.

document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:type="a", prov:type=1, prov:type=2014-06-23T12:28:53.843000+01:00,
 prov:type="ex:abc" %% xsd:QName, prov:type="http://example.org/hello" %% xsd:anyURI,
 prov:type="1.0" %% xsd:float, prov:type="true", prov:label="activity2"])

endDocument

vs

document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:type="a", prov:type=1, prov:type=2014-06-23T12:28:53.843000+01:00, 
prov:type="http://example.org/hello" %% xsd:anyURI, prov:type="1.0" %% xsd:float, prov:type="ex:abc"
 %% xsd:QName, prov:type="true", prov:label="activity2"])

endDocument

sets

[(<QualifiedName: prov:type>, u'a'), (<QualifiedName: prov:type>, 1), (<QualifiedName: prov:type>, 
datetime.datetime(2014, 6, 23, 12, 28, 53, 843000, tzinfo=tzoffset(None, 3600))), (<QualifiedName:
 prov:type>, <Identifier: http://example.org/hello>), (<QualifiedName: prov:type>, <Literal: "1.0" %%
 xsd:float>), (<QualifiedName: prov:type>, <Literal: "ex:abc" %% xsd:QName>), (<QualifiedName:
 prov:type>, u'true'), (<QualifiedName: prov:label>, u'activity2')]

vs

[(<QualifiedName: prov:type>, u'a'), (<QualifiedName: prov:type>, 1), (<QualifiedName: prov:type>,
 datetime.datetime(2014, 6, 23, 12, 28, 53, 843000, tzinfo=tzoffset(None, 3600))), (<QualifiedName:
 prov:type>, <XSDQName: ex:abc>), (<QualifiedName: prov:type>, <Identifier: http://example.org/hello>),
 (<QualifiedName: prov:type>, <Literal: "1.0" %% xsd:float>), (<QualifiedName: prov:type>, u'true'),
 (<QualifiedName: prov:label>, u'activity2')]
@satra

forgot to say that the graphs in the previous comment are failing the assert.

@satra

nevermind - found it - it's the QName
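The failure mode can be reproduced without prov at all (a hypothetical sketch — the class names below are illustrative, not prov's real ones): the same lexical value wrapped in two different types hashes and compares as two distinct set members, so set equality fails even though the PROV-N rendering looks identical.

```python
# Two wrapper types for the same lexical value, e.g. the
# <Literal: "ex:abc" %% xsd:QName> vs <XSDQName: ex:abc> pair above.
class TypedLiteral:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return type(other) is TypedLiteral and other.value == self.value
    def __hash__(self):
        return hash(("TypedLiteral", self.value))

class QNameWrapper:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return type(other) is QNameWrapper and other.value == self.value
    def __hash__(self):
        return hash(("QNameWrapper", self.value))

a = {("prov:type", TypedLiteral("ex:abc"))}
b = {("prov:type", QNameWrapper("ex:abc"))}
assert a != b  # same value, different types: the round-trip assert fails
```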

satra added some commits
@satra satra updated rdf serialization b201b3d
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed: PROV-N representation for  xsd:dateTime (closed #58)
  Fixed: Unintended merging of Identifier and QualifiedName values
54cfc77
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  fix: formal attributes were not being included in all attributes
089ae65
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed #60 but no need to touch ProvRecord.formal_attributes (as per #61)
68c129e
@satra satra current state 4909e4f
@cmaumet

Hi @satra. You will find below an example in which the serialization to rdf adds extra (unwanted) qualified relations:

from prov.model import ProvDocument
from exporter.objects.constants import *

if __name__ == '__main__':
        doc = ProvDocument()

        activity_id = NIIRI["activity"]
        doc.activity(activity_id)

        entity_1 = NIIRI["entity_1"]
        doc.entity(entity_1)

        entity_2 = NIIRI["entity_2"]
        doc.entity(entity_2)

        doc.used(activity_id, entity_1)
        doc.wasGeneratedBy(entity_1, activity_id)
        doc.wasDerivedFrom(entity_1, entity_1)

        ttl_file = "example.ttl"
        ttl_fid = open(ttl_file, 'w');
        ttl_fid.write(doc.serialize(format='rdf'))

Obtained turtle export:

@prefix niiri: <http://iri.nidash.org/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

niiri:entity_2 a prov:Entity .

niiri:activity a prov:Activity ;
    prov:qualifiedUsage [ a prov:Usage ;
            prov:entity niiri:entity_1 ] ;
    prov:used niiri:entity_1 .

niiri:entity_1 a prov:Entity ;
    prov:qualifiedDerivation [ a prov:Derivation ;
            prov:usedEntity niiri:entity_1 ] ;
    prov:qualifiedGeneration [ a prov:Generation ;
            prov:activity niiri:activity ] ;
    prov:wasDerivedFrom niiri:entity_1 ;
    prov:wasGeneratedBy niiri:activity .

Unfortunately, I did not find the fix...

I hope this example is useful. Let me know if I can help you to track this down!

@satra

@cmaumet - should entity_1 be derived from entity_1?

@satra

also the qualified relations aren't unwanted - that's how the representation for derivation is intended to be.

a wasDerivedFrom is a relationship, i.e. an edge between two nodes. the qualified derivation allows describing properties of that edge.

this is partly what makes the deserialization difficult.
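The two shapes can be sketched as plain triples (Python tuples, no rdflib dependency — the timestamp edge property is a made-up example): the binary `prov:used` edge cannot carry attributes, so PROV-O hangs them off an intermediate blank node, and a deserializer then has to fold that node back into a single record.

```python
# Binary form: one edge, nowhere to put properties of the usage itself.
binary = [("niiri:activity", "prov:used", "niiri:entity_1")]

# Qualified form: the usage becomes a node (_:u) that can be described.
qualified = [
    ("niiri:activity", "prov:qualifiedUsage", "_:u"),
    ("_:u", "rdf:type", "prov:Usage"),
    ("_:u", "prov:entity", "niiri:entity_1"),
    ("_:u", "prov:atTime", "2014-06-23T12:28:53"),  # hypothetical edge property
]

# Deserialization must collapse all four qualified triples (plus the
# redundant binary edge, if present) into one used(...) record.
assert ("_:u", "prov:entity", "niiri:entity_1") in qualified
```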

@cmaumet

@satra: you are right entity_1 should have been derived from entity_2...

Let me look at a smaller example:

doc = ProvDocument()
activity_id = NIIRI["activity"]
doc.activity(activity_id)
entity_1 = NIIRI["entity_1"]
doc.entity(entity_1)   
doc.used(activity_id, entity_1)

Here is the turtle export:

@prefix niiri: <http://iri.nidash.org/> .
...
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

niiri:activity a prov:Activity ;
    prov:qualifiedUsage [ a prov:Usage ;
            prov:entity niiri:entity_1 ] ;
    prov:used niiri:entity_1 .

niiri:entity_1 a prov:Entity .

Do we want a qualifiedUsage even if there are no properties to attach to the used edge? Could we have a simpler serialisation:

@prefix niiri: <http://iri.nidash.org/> .
...
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

niiri:activity a prov:Activity ;
    prov:used niiri:entity_1 .

niiri:entity_1 a prov:Entity .

instead?

@satra

we wanted to match the prov translator. have you tried converting the provn output through the prov translator?

i think we would want to send some emails to the prov-o authors to see how this simpler scenario should play out. while i agree that in the simple case that set of triples is redundant, it would be good to hear from the folks who originally designed the qualified relations.

@cmaumet

Yes, I actually noticed this difference when trying to use the python toolbox instead of the java one.

provn output (from python prov toolbox):

document
          prefix niiri <http://iri.nidash.org/>

          activity(niiri:activity, -, -)
          entity(niiri:entity_1)
          used(niiri:activity, niiri:entity_1, -)
endDocument

Turtle serialisation (from ProvToolbox provconvert)

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix niiri: <http://iri.nidash.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


niiri:activity a prov:Activity .

niiri:entity_1 a prov:Entity .

niiri:activity prov:used niiri:entity_1 .
@satra

@cmaumet - thanks for that. i'll push some changes shortly

@cmaumet

thank you

2  .travis.yml
@@ -18,4 +18,4 @@ script:
- coverage run setup.py test
after_success:
- - coveralls
+ - coveralls
2  prov/serializers/__init__.py
@@ -21,9 +21,11 @@ class Registry:
def load_serializers():
from prov.serializers.provjson import ProvJSONSerializer
from prov.serializers.provxml import ProvXMLSerializer
+ from prov.serializers.provrdf import ProvRDFSerializer
Registry.serializers = {
'json': ProvJSONSerializer,
+ 'rdf': ProvRDFSerializer,
'xml': ProvXMLSerializer
}
4 prov/serializers/provjson.py
@@ -14,8 +14,8 @@
import StringIO
from prov import Serializer, Error
from prov.constants import *
-from prov.model import Literal, Identifier, QualifiedName, XSDQName, Namespace, ProvDocument, ProvBundle, \
- first, parse_xsd_datetime
+from prov.model import (Literal, Identifier, QualifiedName, XSDQName, Namespace,
+ ProvDocument, ProvBundle, first, parse_xsd_datetime)
class ProvJSONException(Error):
526 prov/serializers/provrdf.py
@@ -0,0 +1,526 @@
+"""PROV-RDF serializers for ProvDocument
+
+@author: Satrajit Ghosh <satra@mit.edu>
+@copyright: University of Southampton 2014
+"""
+import logging
+logger = logging.getLogger(__name__)
+
+import base64
+import datetime
+import dateutil.parser
+from prov import Serializer, Error
+from prov.constants import *
+from prov.model import (Literal, Identifier, QualifiedName, Namespace,
+ ProvRecord, ProvDocument, XSDQName)
+import prov.model as pm
+
+attr2rdf = lambda attr: URIRef(PROV[PROV_ID_ATTRIBUTES_MAP[attr].split('prov:')[1]].uri)
+
+from rdflib.term import URIRef, BNode
+from rdflib.term import Literal as RDFLiteral
+from rdflib.graph import ConjunctiveGraph
+from rdflib.namespace import RDF, RDFS, XSD
+
+class ProvRDFException(Error):
+ pass
+
+
+class AnonymousIDGenerator():
+ def __init__(self):
+ self._cache = {}
+ self._count = 0
+
+ def get_anon_id(self, obj, local_prefix="id"):
+ if obj not in self._cache:
+ self._count += 1
+ self._cache[obj] = Identifier('_:%s%d' % (local_prefix,
+ self._count)).uri
+ return self._cache[obj]
+
+
+# Reverse map for prov.model.XSD_DATATYPE_PARSERS
+LITERAL_XSDTYPE_MAP = {
+ float: XSD['double'],
+ long: XSD['long'],
+ int: XSD['int'],
+ # boolean, string values are supported natively by PROV-JSON
+ # datetime values are converted separately
+}
+
+def valid_qualified_name(bundle, value, xsd_qname=False):
+ if value is None:
+ return None
+ qualified_name = bundle.valid_qualified_name(value)
+ return qualified_name if not xsd_qname else XSDQName(qualified_name)
+
+
+class ProvRDFSerializer(Serializer):
+ def serialize(self, stream=None, **kwargs):
+ container = self.encode_document(self.document)
+ newargs = kwargs.copy()
+ if newargs and 'rdf_format' in newargs:
+ newargs['format'] = newargs['rdf_format']
+ del newargs['rdf_format']
+ container.serialize(stream, **newargs)
+
+ def deserialize(self, stream, **kwargs):
+ newargs = kwargs.copy()
+ if newargs and 'rdf_format' in newargs:
+ newargs['format'] = newargs['rdf_format']
+ del newargs['rdf_format']
+ container = ConjunctiveGraph().parse(stream, **newargs)
+ document = ProvDocument()
+ self.document = document
+ self.decode_document(container, document)
+ return document
+
+ def valid_identifier(self, value):
+ return self.document.valid_qualified_name(value)
+
+ def encode_rdf_representation(self, value):
+ #print value, type(value) #dbg
+ if isinstance(value, URIRef):
+ return value
+ elif isinstance(value, Literal):
+ return literal_rdf_representation(value)
+ elif isinstance(value, datetime.datetime):
+ return RDFLiteral(value.isoformat(), datatype=XSD['dateTime'])
+ elif isinstance(value, QualifiedName):
+ #if value.namespace == PROV:
+ return URIRef(value.uri) #, datatype=XSD['QName'])
+ #else:
+ # return RDFLiteral(value, datatype=XSD['QName'])
+ elif isinstance(value, XSDQName):
+ return RDFLiteral(value, datatype=XSD['QName'])
+ elif isinstance(value, Identifier):
+ return URIRef(value.uri)
+ elif type(value) in LITERAL_XSDTYPE_MAP:
+ return RDFLiteral(value, datatype=LITERAL_XSDTYPE_MAP[type(value)])
+ else:
+ return RDFLiteral(value)
+
+ """
+ def decode_rdf_representation(self, literal):
+ if isinstance(literal, RDFLiteral):
+ # complex type
+ value = literal.value if literal.value is not None else literal
+ datatype = literal.datatype if hasattr(literal, 'datatype') else None
+ langtag = literal.language if hasattr(literal, 'language') else None
+ datatype = valid_qualified_name(self.document, datatype)
+ if datatype == XSD_ANYURI:
+ return Identifier(value)
+ elif datatype == XSD_QNAME:
+ return valid_qualified_name(self.document, value, xsd_qname=True)
+ elif datatype == PROV_QUALIFIEDNAME:
+ return valid_qualified_name(self.document, value)
+ else:
+ # The literal of standard Python types is not converted here
+ # It will be automatically converted when added to a record by _auto_literal_conversion()
+ return Literal(value, datatype, langtag)
+ elif isinstance(literal, URIRef):
+ val = unicode(literal)
+ return Identifier(val)
+ else:
+ # simple type, just return it
+ return literal
+
+ """
+ def decode_rdf_representation(self, literal):
+ #print(('Decode', literal))
+ if isinstance(literal, RDFLiteral):
+ value = literal.value if literal.value is not None else literal
+ datatype = literal.datatype if hasattr(literal, 'datatype') else None
+ langtag = literal.language if hasattr(literal, 'language') else None
+ if datatype and 'base64Binary' in datatype:
+ value = base64.standard_b64encode(value)
+ #print((value, datatype, langtag)) #dbg
+ '''
+ if datatype == XSD['anyURI']:
+ return Identifier(value)
+ elif datatype == PROV['QualifiedName']:
+ return self.valid_identifier(value)
+ '''
+ if datatype == XSD['QName']:
+ for ns in self.document.namespaces:
+ if literal.startswith(ns.prefix):
+ return pm.XSDQName(QualifiedName(ns,
+ literal.replace(ns.prefix + ':',
+ '')))
+ raise Exception('No namespace found for: %s' % literal)
+ if datatype == XSD['dateTime']:
+ return dateutil.parser.parse(literal)
+ else:
+ # The literal of standard Python types is not converted here
+ # It will be automatically converted when added to a record by _auto_literal_conversion()
+ return Literal(value, self.valid_identifier(datatype), langtag)
+ elif isinstance(literal, URIRef):
+ val = unicode(literal)
+ return Identifier(val)
+ else:
+ # simple type, just return it
+ return literal
+
+ def encode_document(self, document):
+ container = self.encode_container(document)
+ for b_id, b in document._bundles.items():
+ # encoding the sub-bundle
+ bundle = self.encode_container(b, identifier=b_id.uri)
+ container.addN(bundle.quads())
+ return container
+
+ def encode_container(self, bundle, container=None, identifier=None):
+ if container is None:
+ container = ConjunctiveGraph(identifier=identifier)
+ nm = container.namespace_manager
+ nm.bind('prov', PROV.uri)
+ prefixes = {}
+ for namespace in bundle._namespaces.get_registered_namespaces():
+ container.bind(namespace.prefix, namespace.uri)
+ if bundle._namespaces._default:
+ prefixes['default'] = bundle._namespaces._default.uri
+
+ id_generator = AnonymousIDGenerator()
+ real_or_anon_id = lambda record: record._identifier.uri if \
+ record._identifier else id_generator.get_anon_id(record)
+
+ for record in bundle._records:
+ rec_type = record.get_type()
+ rec_label = PROV[PROV_N_MAP[rec_type]].uri
+ if hasattr(record, 'identifier') and record.identifier: #record.is_relation():
+ identifier = URIRef(unicode(real_or_anon_id(record)))
+ container.add((identifier, RDF.type, URIRef(rec_type.uri)))
+ else:
+ identifier = None
+ if record.attributes:
+ bnode = None
+ formal_objects = []
+ used_objects = []
+ all_attributes = list(record.formal_attributes) + list(record.attributes)
+ #print all_attributes
+ #all_attributes = set(record.formal_attributes).union(set(record.attributes))
+ for idx, (attr, value) in enumerate(all_attributes):
+ #print identifier, idx, attr, value
+ #print record, rec_type.uri
+ if record.is_relation():
+ pred = URIRef(PROV[PROV_N_MAP[rec_type]].uri)
+ # create bnode relation
+ if bnode is None:
+ for key, val in record.formal_attributes:
+ formal_objects.append(key)
+ used_objects = [record.formal_attributes[0][0]]
+ subj = None
+ if record.formal_attributes[0][1]:
+ subj = URIRef(record.formal_attributes[0][1].uri)
+ if identifier is None and subj is not None:
+ try:
+ obj_val = record.formal_attributes[1][1]
+ obj_attr = URIRef(record.formal_attributes[1][0].uri)
+ except IndexError:
+ obj_val = None
+ if obj_val:
+ used_objects.append(record.formal_attributes[1][0])
+ obj_val = self.encode_rdf_representation(obj_val)
+ container.add((subj, pred, obj_val))
+ #print identifier, pred, obj_val
+ if rec_type in [PROV_ALTERNATE]: #, PROV_ASSOCIATION]:
+ continue
+ if subj:
+ QRole = URIRef(PROV['qualified' +
+ rec_type._localpart].uri)
+ if identifier is not None:
+ container.add((subj, QRole, identifier))
+ else:
+ identifier = BNode()
+ container.add((subj, QRole, identifier))
+ container.add((identifier, RDF.type,
+ URIRef(rec_type.uri)))
+ # reset identifier to BNode
+ '''
+ for key, val in record.formal_attributes:
+ formal_objects.append(key)
+ used_objects = [record.formal_attributes[0][0]]
+ if record.formal_attributes[0][1]:
+ identifier = URIRef(record.formal_attributes[0][1].uri)
+ try:
+ obj_val = record.formal_attributes[1][1]
+ obj_attr = URIRef(record.formal_attributes[1][0].uri)
+ except IndexError:
+ obj_val = None
+ if obj_val:
+ used_objects.append(record.formal_attributes[1][0])
+ obj_val = self.encode_rdf_representation(obj_val)
+ container.add((identifier, pred, obj_val))
+ print identifier, pred, obj_val
+ if rec_type in [PROV_ALTERNATE]: #, PROV_ASSOCIATION]:
+ continue
+ QRole = URIRef(PROV['qualified' +
+ rec_type._localpart].uri)
+ if hasattr(record, 'identifier') and record.identifier:
+ bnode = URIRef(record.identifier.uri)
+ else:
+ bnode = BNode()
+ container.add((identifier, QRole, bnode))
+ container.add((bnode, RDF.type,
+ URIRef(rec_type.uri)))
+ # reset identifier to BNode
+ identifier = bnode
+ print identifier, obj_attr, obj_val #dbg
+ if obj_val:
+ container.add((identifier, obj_attr, obj_val))
+ '''
+ if value is not None and attr not in used_objects:
+ #print 'attr', attr #dbg
+ if attr in formal_objects:
+ pred = attr2rdf(attr)
+ elif attr == PROV['role']:
+ pred = URIRef(PROV['hadRole'].uri)
+ elif attr == PROV['plan']:
+ pred = URIRef(PROV['hadPlan'].uri)
+ elif attr == PROV['type']:
+ pred = RDF.type
+ elif attr == PROV['label']:
+ pred = RDFS.label
+ elif isinstance(attr, QualifiedName):
+ pred = URIRef(attr.uri)
+ else:
+ pred = self.encode_rdf_representation(attr)
+ if PROV['plan'].uri in pred:
+ pred = URIRef(PROV['hadPlan'].uri)
+ #print identifier, pred, value #dbg
+ container.add((identifier, pred,
+ self.encode_rdf_representation(value)))
+ continue
+ if value is None:
+ continue
+ if isinstance(value, ProvRecord):
+ obj = URIRef(unicode(real_or_anon_id(value)))
+ else:
+ # Assuming this is a datetime value
+ obj = self.encode_rdf_representation(value)
+ #print type(value), type(obj)
+ if attr == PROV['location']:
+ pred = URIRef(PROV['atLocation'].uri)
+ if False and isinstance(value, (URIRef, QualifiedName)):
+ if isinstance(value, QualifiedName):
+ #value = RDFLiteral(unicode(value), datatype=XSD['QName'])
+ value = URIRef(value.uri)
+ container.add((identifier, pred, value))
+ #container.add((value, RDF.type,
+ # URIRef(PROV['Location'].uri)))
+ else:
+ container.add((identifier, pred,
+ self.encode_rdf_representation(obj)))
+ continue
+ #pred = attr2rdf(attr)
+ if attr == PROV['type']:
+ pred = RDF.type
+ elif attr == PROV['label']:
+ pred = RDFS.label
+ else:
+ pred = self.encode_rdf_representation(attr)
+ container.add((identifier, pred, obj))
+ return container
+
+ def decode_document(self, content, document):
+ for prefix, url in content.namespaces():
+ #if prefix in ['rdf', 'rdfs', 'xml']:
+ # continue
+ document.add_namespace(prefix, unicode(url))
+ for bundle_stmt in content.triples((None, RDF.type,
+ URIRef(pm.PROV['bundle'].uri))):
+ bundle_id = unicode(bundle_stmt[0])
+ if hasattr(content, 'contexts'):
+ for graph in content.contexts():
+ bundle_id = unicode(graph.identifier)
+ bundle = document.bundle(bundle_id)
+ self.decode_container(graph, bundle)
+ else:
+ self.decode_container(content, document)
+
+ def decode_container(self, graph, bundle):
+ ids = {}
+ PROV_CLS_MAP = {}
+ for key, val in PROV_N_MAP.items():
+ PROV_CLS_MAP[key.uri] = val
+ for key, val in ADDITIONAL_N_MAP.items():
+ PROV_CLS_MAP[key.uri] = val
+ for stmt in graph.triples((None, RDF.type, None)):
+ id = unicode(stmt[0])
+ obj = unicode(stmt[2])
+ #print obj, type(obj), obj in PROV_CLS_MAP #dbg
+ if obj in PROV_CLS_MAP:
+ #print 'obj_found' #dbg
+ try:
+ prov_obj = getattr(bundle, PROV_CLS_MAP[obj])(identifier=id)
+ except TypeError, e:
+ #print e
+ prov_obj = getattr(bundle, PROV_CLS_MAP[obj])
+ if id not in ids:
+ ids[id] = prov_obj
+ else:
+ raise ValueError(('An object cannot be of two different '
+ 'PROV types'))
+ other_attributes = {}
+ for stmt in graph.triples((None, RDF.type, None)):
+ id = unicode(stmt[0])
+ if id not in other_attributes:
+ other_attributes[id] = []
+ obj = unicode(stmt[2]) #unicode(stmt[2]).replace('http://www.w3.org/ns/prov#', '').lower()
+ if obj in PROV_CLS_MAP:
+ continue
+ elif id in ids:
+ obj = self.decode_rdf_representation(stmt[2])
+ if hasattr(ids[id], '__call__'):
+ other_attributes[id].append((pm.PROV['type'], obj))
+ else:
+ ids[id].add_attributes([(pm.PROV['type'], obj)])
+ for id, pred, obj in graph:
+ #print((id, pred, obj)) #dbg
+ id = unicode(id)
+ if id not in other_attributes:
+ other_attributes[id] = []
+ if pred == RDF.type:
+ continue
+ elif pred == URIRef(PROV['alternateOf'].uri):
+ bundle.alternate(id, unicode(obj))
+ elif pred == URIRef(PROV['wasAssociatedWith'].uri):
+ bundle.association(id, unicode(obj))
+ elif id in ids:
+ #print((id, pred, obj)) #dbg
+ obj1 = self.decode_rdf_representation(obj)
+ if pred == RDFS.label:
+ if hasattr(ids[id], '__call__'):
+ other_attributes[id].append((pm.PROV['label'], obj1))
+ else:
+ ids[id].add_attributes([(pm.PROV['label'], obj1)])
+ elif pred == URIRef(PROV['atLocation'].uri):
+ ids[id].add_attributes([(pm.PROV['location'], obj1)])
+ else:
+ if hasattr(ids[id], '__call__'):
+ if ids[id].__name__ == 'association':
+ if 'agent' in unicode(pred):
+ aid = ids[id](None, agent=obj1,
+ identifier=unicode(id))
+ ids[id] = aid
+ if other_attributes[id]:
+ aid.add_attributes(other_attributes[id])
+ other_attributes[id] = []
+ else:
+ if 'hadPlan' in pred:
+ pred = pm.PROV_ATTR_PLAN
+ elif 'hadRole' in pred:
+ pred = PROV_ROLE
+ other_attributes[id].append((pred, obj1))
+ else:
+ if 'hadPlan' in pred:
+ ids[id].add_attributes([(pm.PROV_ATTR_PLAN, obj1)])
+ elif 'hadRole' in pred:
+ ids[id].add_attributes([(PROV_ROLE,
+ obj1)])
+ else:
+ ids[id].add_attributes([(unicode(pred), obj1)])
+ if unicode(obj) in ids:
+ #print obj #dbg
+ if pred == URIRef(PROV['qualifiedAssociation'].uri):
+ if hasattr(ids[unicode(obj)], '__call__'):
+ aid = ids[unicode(obj)](id, identifier=unicode(obj))
+ if other_attributes[id]:
+ aid.add_attributes(other_attributes[id])
+ other_attributes[id] = []
+ ids[unicode(obj)] = aid
+ else:
+ ids[unicode(obj)].add_attributes([(pm.PROV_ATTR_ACTIVITY,
+ id)])
+ #print other_attributes #dbg
+ for key, val in other_attributes.items():
+ if val:
+ ids[key].add_attributes(val)
+
+ '''
+ if u'prefix' in jc:
+ prefixes = jc[u'prefix']
+ for prefix, uri in prefixes.items():
+ if prefix != 'default':
+ bundle.add_namespace(Namespace(prefix, uri))
+ else:
+ bundle.set_default_namespace(uri)
+ del jc[u'prefix']
+
+ for rec_type_str in jc:
+ rec_type = PROV_RECORD_IDS_MAP[rec_type_str]
+ for rec_id, content in jc[rec_type_str].items():
+ if rec_type == PROV_BUNDLE:
+ raise ProvRDFException('A bundle cannot have nested bundles')
+ else:
+ if hasattr(content, 'items'): # it is a dict
+ # There is only one element, create a singleton list
+ elements = [content]
+ else:
+ # expect it to be a list of dictionaries
+ elements = content
+
+ for element in elements:
+ prov_attributes = {}
+ extra_attributes = []
+ # Splitting PROV attributes and the others
+ membership_extra_members = None # this is for the multiple-entity membership hack to come
+ for attr, value in element.items():
+ if attr in PROV_ATTRIBUTES_ID_MAP:
+ attr_id = PROV_ATTRIBUTES_ID_MAP[attr]
+ if isinstance(value, list):
+ # Multiple values
+ if len(value) == 1:
+ # Only a single value in the list, unpack it
+ value = value[0]
+ else:
+ if rec_type == PROV_MEMBERSHIP and attr_id == PROV_ATTR_ENTITY:
+ # This is a membership relation with multiple entities
+ # HACK: create multiple membership relations, one for each entity
+ membership_extra_members = value[1:] # Store all the extra entities
+ value = value[0] # Create the first membership relation as normal for the first entity
+ else:
+ error_msg = 'The prov package does not support PROV attributes having multiple values.'
+ logger.error(error_msg)
+ raise ProvRDFException(error_msg)
+ prov_attributes[attr_id] =\
+ self.valid_identifier(value) if attr_id not in PROV_ATTRIBUTE_LITERALS else \
+ self.decode_rdf_representation(value)
+ else:
+ attr_id = self.valid_identifier(attr)
+ if isinstance(value, list):
+ # Parsing multi-value attribute
+ extra_attributes.extend(
+ (attr_id, self.decode_rdf_representation(value_single))
+ for value_single in value
+ )
+ else:
+ # add the single-value attribute
+ extra_attributes.append((attr_id, self.decode_rdf_representation(value)))
+ bundle.add_record(rec_type, rec_id, prov_attributes, extra_attributes)
+ # HACK: creating extra (unidentified) membership relations
+ if membership_extra_members:
+ collection = prov_attributes[PROV_ATTR_COLLECTION]
+ for member in membership_extra_members:
+ bundle.membership(collection, self.valid_identifier(member))
+ '''
+
+def literal_rdf_representation(literal):
+ value = unicode(literal.value) if literal.value else literal
+ if literal.langtag:
+ # a language tag can only go with prov:InternationalizedString
+ return RDFLiteral(value, lang=str(literal.langtag))
+ else:
+ datatype = literal.datatype
+ '''
+ if isinstance(datatype, QualifiedName):
+ print 'QName', datatype, datatype.uri
+ return RDFLiteral(unicode(literal.value),
+ datatype=unicode(datatype))
+ else:
+ # Assuming it is a valid identifier
+ print 'URI', datatype
+ '''
+ if 'base64Binary' in datatype.uri:
+ value = base64.standard_b64encode(value)
+ return RDFLiteral(value, datatype=datatype.uri)
3  requirements.txt
@@ -2,4 +2,5 @@ lxml==3.3.5
pydot==1.0.2
pyparsing==1.5.7
python-dateutil==2.2
-wheel==0.24.0
+wheel==0.24.0
+rdflib>=4.1.2