[MRG]: Add rdf serialization #49

Merged
merged 35 commits into trungdong:master from satra:enh/rdf-1.x on Oct 9, 2016

4 participants
Contributor

satra commented Jun 23, 2014 edited

This is ready for merge!

satra added some commits Apr 28, 2014

@satra satra fix: rename contants -> constants e5ffce8
@satra satra enh: first pass at serialization d1d1407
@satra satra remove debug print 5799160
@satra satra starting deserialization ac37255
@satra satra enh: updated to reflect latest upstream changes for attributes, added bundle support f2c263d
@satra satra compat: remove bundle statement to be compatible with Luc's output 95bde98
@satra satra fix: ensure that URIRefs remain as URIRefs c3198a3
@satra satra fix: resolved conflict 73b9e57
@satra satra fix: updated QName -> QualifiedName 1e2a8c6
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed Build status link
  Added build status from Travis-CI
  doc: fixed docstring for assertLess
  fix: remove redundant ls
  fix: remove checking for py3k
  fix: added assertLess
  fix: added support for asserts to unittest
  fix: more set fixes
  fix: support for 2.6 set
  fix: added python versions to tests
  fix: updated dateutil name
  enh: add travis testing file
4a6814f
@satra satra fix: remove diff statement 08d25d3
@satra satra fix: add rdflib to travis 481fe13
Owner

trungdong commented Jul 14, 2014

Hi @satra,

FYI, prov now has many more tests (https://github.com/trungdong/prov/tree/master/prov/tests), which you can use to test the RDF export. Since we don't have RDF import, you won't be able to do the round-trip tests as in the test cases, but even a one-way export test could be useful.

satra added some commits Sep 4, 2014

@satra satra resolve conflicts 0ac1617
@satra satra updated requirements 32fac75
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed: Cloning the records when creating a new document from them
  Bugfix regarding a software agent record.
7f2cd65
Contributor

satra commented Oct 11, 2014

@trungdong - i'm slowly making my way through rdf deserialization. in terms of comparing documents, how do you ensure that the order of attributes doesn't matter?

ACTUAL: u'document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:label="bonjour"@fr, prov:label="hello", prov:label="activity2", prov:label="bye"@en])
endDocument'
 DESIRED: u'document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:label="activity2", prov:label="hello", prov:label="bye"@en, prov:label="bonjour"@fr])
endDocument'

to me these are the same graphs, but the roundtrip fails because the orders of attributes are different.

Owner

trungdong commented Oct 11, 2014

Hi @satra,
Glad you have some time to get on with this. Thanks.

I suggest you do prov.model --> RDF --> prov.model. Comparing two ProvDocument instances is not sensitive to ordering (of attributes or records) as it uses set instead of list.
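
To make the ordering point concrete, here is a minimal sketch in plain Python. It is not the prov library's implementation; the tuples and the names `attrs_actual` and `attrs_desired` are made up for illustration and simply mirror the four labels in the ACTUAL/DESIRED output above.

```python
# Attributes from the two PROV-N outputs, as (name, value, language) tuples.
attrs_actual = [
    ("prov:label", "bonjour", "fr"),
    ("prov:label", "hello", None),
    ("prov:label", "activity2", None),
    ("prov:label", "bye", "en"),
]
attrs_desired = [
    ("prov:label", "activity2", None),
    ("prov:label", "hello", None),
    ("prov:label", "bye", "en"),
    ("prov:label", "bonjour", "fr"),
]

# Comparing lists (or serialized strings) is order-sensitive.
same_as_lists = attrs_actual == attrs_desired

# Comparing sets ignores order, so the same attributes compare equal.
same_as_sets = set(attrs_actual) == set(attrs_desired)

print(same_as_lists, same_as_sets)
```

Comparing the serialized PROV-N strings behaves like the list comparison, which is why the string-based round-trip check reports a difference.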

Contributor

satra commented Oct 11, 2014

thanks @trungdong - i was doing assert_equal(g.get_provn(), g1.get_provn()), but i changed to assert_equal(g, g1)

Contributor

satra commented Oct 11, 2014

even sets are not quite doing their job - will have to look into this further.

document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:type="a", prov:type=1, prov:type=2014-06-23T12:28:53.843000+01:00,
 prov:type="ex:abc" %% xsd:QName, prov:type="http://example.org/hello" %% xsd:anyURI,
 prov:type="1.0" %% xsd:float, prov:type="true", prov:label="activity2"])

endDocument

vs

document
  prefix ex <http://example.org/>

  activity(ex:a2, -, -, [prov:type="a", prov:type=1, prov:type=2014-06-23T12:28:53.843000+01:00, 
prov:type="http://example.org/hello" %% xsd:anyURI, prov:type="1.0" %% xsd:float, prov:type="ex:abc"
 %% xsd:QName, prov:type="true", prov:label="activity2"])

endDocument

sets

[(<QualifiedName: prov:type>, u'a'), (<QualifiedName: prov:type>, 1), (<QualifiedName: prov:type>, 
datetime.datetime(2014, 6, 23, 12, 28, 53, 843000, tzinfo=tzoffset(None, 3600))), (<QualifiedName:
 prov:type>, <Identifier: http://example.org/hello>), (<QualifiedName: prov:type>, <Literal: "1.0" %%
 xsd:float>), (<QualifiedName: prov:type>, <Literal: "ex:abc" %% xsd:QName>), (<QualifiedName:
 prov:type>, u'true'), (<QualifiedName: prov:label>, u'activity2')]

vs

[(<QualifiedName: prov:type>, u'a'), (<QualifiedName: prov:type>, 1), (<QualifiedName: prov:type>,
 datetime.datetime(2014, 6, 23, 12, 28, 53, 843000, tzinfo=tzoffset(None, 3600))), (<QualifiedName:
 prov:type>, <XSDQName: ex:abc>), (<QualifiedName: prov:type>, <Identifier: http://example.org/hello>),
 (<QualifiedName: prov:type>, <Literal: "1.0" %% xsd:float>), (<QualifiedName: prov:type>, u'true'),
 (<QualifiedName: prov:label>, u'activity2')]
Contributor

satra commented Oct 11, 2014

forgot to say that the graphs in the previous comment are failing the assert.

Contributor

satra commented Oct 11, 2014

nevermind - found it - it's the QName
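
A minimal sketch of why the sets above differ, assuming the same value is held in two different Python types. The `Literal` and `XSDQName` classes here are hypothetical stand-ins written for this example, not the library's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Stand-in for a typed literal: a string plus a datatype name."""
    value: str
    datatype: str


@dataclass(frozen=True)
class XSDQName:
    """Stand-in for a dedicated QName value type."""
    value: str


# The same attributes, except that the QName is represented two ways.
first = {("prov:type", "a"), ("prov:type", Literal("ex:abc", "xsd:QName"))}
second = {("prov:type", "a"), ("prov:type", XSDQName("ex:abc"))}

# Instances of different classes never compare equal, so the sets differ.
sets_equal = first == second

# The symmetric difference isolates the offending attribute on each side.
mismatch = first ^ second
print(sets_equal, len(mismatch))
```

Taking the symmetric difference of the two attribute sets is a quick way to spot which value changed type during a round trip.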

satra added some commits Oct 14, 2014

@satra satra updated rdf serialization b201b3d
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed: PROV-N representation for  xsd:dateTime (closed #58)
  Fixed: Unintended merging of Identifier and QualifiedName values
54cfc77
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  fix: formal attributes were not being included in all attributes
089ae65
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Fixed #60 but no need to touch ProvRecord.formal_attributes (as per #61)
68c129e
@satra satra current state 4909e4f
Contributor

cmaumet commented Dec 4, 2014

Hi @satra. You will find below an example in which the serialization to rdf adds extra (unwanted) qualified relations:

from prov.model import ProvDocument
# NIIRI is presumably defined in the exporter project's constants module.
from exporter.objects.constants import *

if __name__ == '__main__':
    doc = ProvDocument()

    activity_id = NIIRI["activity"]
    doc.activity(activity_id)

    entity_1 = NIIRI["entity_1"]
    doc.entity(entity_1)

    entity_2 = NIIRI["entity_2"]
    doc.entity(entity_2)

    doc.used(activity_id, entity_1)
    doc.wasGeneratedBy(entity_1, activity_id)
    doc.wasDerivedFrom(entity_1, entity_1)

    # Write the RDF serialization; the with-block closes the file.
    ttl_file = "example.ttl"
    with open(ttl_file, 'w') as ttl_fid:
        ttl_fid.write(doc.serialize(format='rdf'))

Obtained turtle export:

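@prefix niiri: <http://iri.nidash.org/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .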
niiri:entity_2 a prov:Entity .
niiri:activity a prov:Activity ;
    prov:qualifiedUsage [ a prov:Usage ;
            prov:entity niiri:entity_1 ] ;
    prov:used niiri:entity_1 .
niiri:entity_1 a prov:Entity ;
    prov:qualifiedDerivation [ a prov:Derivation ;
            prov:usedEntity niiri:entity_1 ] ;
    prov:qualifiedGeneration [ a prov:Generation ;
            prov:activity niiri:activity ] ;
    prov:wasDerivedFrom niiri:entity_1 ;
    prov:wasGeneratedBy niiri:activity .

Unfortunately, I did not find the fix...

I hope this example is useful. Let me know if I can help you to track this down!

Contributor

satra commented Dec 4, 2014

@cmaumet - should entity_1 be derived from entity_1?

Contributor

satra commented Dec 4, 2014

also the qualified relations aren't unwanted - that's how the representation for derivation is intended to be.

a wasDerivedFrom is a relationship, i.e. an edge between two nodes. the qualified derivation allows describing properties of that edge.

this is partly what makes the deserialization difficult.
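
A rough sketch of the deserialization problem, using plain tuples instead of an RDF library. The direct edge and the qualified node describe the same usage, so a reader has to collapse them into one record. The function name `read_usages`, the blank-node label `_:u1`, and the `prov:atTime` timestamp are made up for illustration.

```python
# Triples as (subject, predicate, object) strings with prefixed names.
triples = [
    ("niiri:activity", "prov:used", "niiri:entity_1"),
    ("niiri:activity", "prov:qualifiedUsage", "_:u1"),
    ("_:u1", "rdf:type", "prov:Usage"),
    ("_:u1", "prov:entity", "niiri:entity_1"),
    ("_:u1", "prov:atTime", "2014-12-04T12:00:00"),  # made-up timestamp
]


def read_usages(triples):
    """Collapse direct and qualified usage triples into one record per edge."""
    usages = {}
    # Direct edges: each (activity, entity) pair starts with no attributes.
    for subj, pred, obj in triples:
        if pred == "prov:used":
            usages[(subj, obj)] = {}
    # Qualified nodes: find the entity, then attach the remaining properties.
    for subj, pred, node in triples:
        if pred != "prov:qualifiedUsage":
            continue
        props = {p: o for s, p, o in triples if s == node}
        entity = props.pop("prov:entity")
        props.pop("rdf:type", None)
        usages.setdefault((subj, entity), {}).update(props)
    return usages


usages = read_usages(triples)
print(usages)
```

Without the merge step, the same usage would be read back twice: once from the direct triple and once from the qualified node.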

Contributor

cmaumet commented Dec 4, 2014

@satra: you are right entity_1 should have been derived from entity_2...

Let me look at a smaller example:

doc = ProvDocument()
activity_id = NIIRI["activity"]
doc.activity(activity_id)
entity_1 = NIIRI["entity_1"]
doc.entity(entity_1)   
doc.used(activity_id, entity_1)

Here is the turtle export:

@prefix niiri: <http://iri.nidash.org/> .
...
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

niiri:activity a prov:Activity ;
    prov:qualifiedUsage [ a prov:Usage ;
            prov:entity niiri:entity_1 ] ;
    prov:used niiri:entity_1 .

niiri:entity_1 a prov:Entity .

Do we want a qualifiedUsage even if there is no property to attach to the used edge? Could we have a simpler serialisation:

@prefix niiri: <http://iri.nidash.org/> .
...
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

niiri:activity a prov:Activity ;
    prov:used niiri:entity_1 .

niiri:entity_1 a prov:Entity .

instead?
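
One way the simpler serialisation could be produced is to emit the qualified form only when there is something to attach to the edge. The sketch below uses plain tuples rather than rdflib; `usage_triples`, the blank-node labels, and the example timestamp are assumptions made for illustration, not the library's code.

```python
from itertools import count

_bnode_ids = count(1)  # source of unique blank-node labels


def usage_triples(activity, entity, attributes=None):
    """Return triples for used(activity, entity), qualified only if needed."""
    triples = [(activity, "prov:used", entity)]
    if attributes:
        node = "_:usage%d" % next(_bnode_ids)
        triples.append((activity, "prov:qualifiedUsage", node))
        triples.append((node, "rdf:type", "prov:Usage"))
        triples.append((node, "prov:entity", entity))
        for predicate, value in attributes:
            triples.append((node, predicate, value))
    return triples


# No attributes: only the direct edge is emitted.
simple = usage_triples("niiri:activity", "niiri:entity_1")

# With an attribute: the direct edge plus the qualified pattern.
detailed = usage_triples(
    "niiri:activity",
    "niiri:entity_1",
    [("prov:atTime", "2014-12-04T12:00:00")],  # made-up timestamp
)
print(simple)
print(detailed)
```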

Contributor

satra commented Dec 4, 2014

we wanted to match the prov translator. have you tried converting the provn output through the prov translator?

i think we would want to send some emails to the prov-o authors to see how this simpler scenario should play out. while i agree that in the simple case that set of triples is redundant, it would be good to hear from the folks who originally designed the qualified relations.

Contributor

cmaumet commented Dec 4, 2014

Yes, I actually noticed this difference when trying to use the python toolbox instead of the java one.

provn output (from python prov toolbox):

document
          prefix niiri <http://iri.nidash.org/>

          activity(niiri:activity, -, -)
          entity(niiri:entity_1)
          used(niiri:activity, niiri:entity_1, -)
endDocument

Turtle serialisation (from ProvToolbox provconvert)

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix niiri: <http://iri.nidash.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


niiri:activity a prov:Activity .

niiri:entity_1 a prov:Entity .

niiri:activity prov:used niiri:entity_1 .
Contributor

satra commented Dec 4, 2014

@cmaumet - thanks for that. i'll push some changes shortly

Contributor

cmaumet commented Dec 4, 2014

thank you

Contributor

satra commented Mar 19, 2016 edited

a few more things to finalize:

  • support bundles (via trig)
  • make the tests run on travis
  • check py3 support
  • settle an issue with Decimal representation (see #77)
  • skip scruffy round trip tests for now - just ensure they can be read without error for the moment.
Contributor

satra commented Mar 20, 2016

these tests fail round trip - need to figure out a way to skip.

FAIL: test_scruffy_end_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_end_3 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_end_4 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_generation_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_invalidation_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_start_2 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_start_3 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_start_4 (prov.tests.test_rdf.RoundTripRDFTests)
FAIL: test_scruffy_usage_2 (prov.tests.test_rdf.RoundTripRDFTests)
Contributor

satra commented Mar 20, 2016

@trungdong - some of the issues here are unfortunately due to rdflib interactions, but this is good for review. i still need to figure out the python 3 errors (again an interaction with rdflib!) and how to suppress the failing scruffy tests.
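
One possible way to suppress known failures when the test methods are inherited rather than defined in the test class is to skip by name in `setUp`. This is only a sketch with the standard `unittest` module; the class `RoundTripSketch` and its test bodies are hypothetical, and only three names from the failure list above are included.

```python
import unittest

# A few names copied from the failure list; extend as needed.
KNOWN_ROUNDTRIP_FAILURES = {
    "test_scruffy_end_2",
    "test_scruffy_generation_2",
    "test_scruffy_usage_2",
}


class RoundTripSketch(unittest.TestCase):
    def setUp(self):
        # self.id() looks like "module.Class.method"; keep the method name.
        name = self.id().rsplit(".", 1)[-1]
        if name in KNOWN_ROUNDTRIP_FAILURES:
            self.skipTest("known RDF round-trip failure")

    def test_scruffy_end_2(self):
        self.fail("would fail if it were not skipped")

    def test_something_that_passes(self):
        self.assertEqual(1, 1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoundTripSketch)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), len(result.failures))
```

Skipped tests still show up in the report, so the list of known failures stays visible until the underlying issues are fixed.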

satra added some commits Apr 20, 2016

@satra satra Merge branch 'fix/literal' into enh/rdf-1.x
* fix/literal:
  fix: extra curly bracket
  fix: test setup
  fix: only escape triple quotes in the triple quote case
  fix: string representation containing double quotes or triple quotes - closes #79
bc822d4
@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
56e7815
@satra satra fix: skipping known roundtrip and literal representation failures 1debd4b

coveralls commented Jun 27, 2016 edited

Coverage Status

Coverage decreased (-1.3%) to 89.442% when pulling 1debd4b on satra:enh/rdf-1.x into a556fa7 on trungdong:master.

satra added some commits Oct 8, 2016

@satra satra Merge remote-tracking branch 'upstream/master' into enh/rdf-1.x
* upstream/master:
  Remove networkx versioning also in setup.py
  Relaxed networkx requirement. Closed #84.
  Fix deprecated usage of cgi.escape since Python 3.3
949ed48
@satra satra fixed tests and skipping round trip tests that don't pass 1f8955e

coveralls commented Oct 8, 2016 edited

Coverage Status

Coverage decreased (-1.5%) to 89.207% when pulling 1f8955e on satra:enh/rdf-1.x into 5ddeade on trungdong:master.

Coverage Status

Coverage decreased (-0.4%) to 90.311% when pulling 5a29a1b on satra:enh/rdf-1.x into 5ddeade on trungdong:master.

satra changed the title from WIP: Add rdf serialization to [MRG]: Add rdf serialization Oct 8, 2016

Contributor

satra commented Oct 8, 2016

@trungdong - finally got some time to fix and this is ready for merge :)

Contributor

satra commented Oct 8, 2016

closes #1

@trungdong trungdong merged commit 4d2c236 into trungdong:master Oct 9, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Owner

trungdong commented Oct 9, 2016

Excellent! Thank you very much @satra!!!
I'll try to get some time this week to clean up the current master branch and will make a new release with RDF support soon.

@cmaumet cmaumet pushed a commit to cmaumet/nidmresults that referenced this pull request Oct 13, 2016

cmaumet Export as RDF using prov library
new feature from: trungdong/prov#49
0b78abf