including other dictionaries/lists as values of the additional attributes #50

Closed
aspinuso opened this Issue Jul 3, 2014 · 5 comments


aspinuso commented Jul 3, 2014

I need to add an attribute like
{ve:parameters:[{"val": "../test-resources/testfiles/stations", "key": "stations_file"}]}
to an activity, in order to support activity specific parameters, which I don't want to treat as entities.

The same applies if I want to add an "annotations" attribute to an entity, whose content is typically user-defined. For example:
{ve:annotations:[{"val": "0.4", "key": "contribute"}]}

That is, annotations can be produced at run time when specific properties of the produced data are recognized.

I have noticed that adding such structured attributes to the 'other_attributes' parameter makes the serialisation fail with

File "/prov/model.py", line 321, in add_attributes
self._attributes[attr].add(value)
TypeError: unhashable type: 'list'

Such structured attributes do make sense to us, because of the characteristics of the provenance data we are producing.

In general, I think the API does not support a structure like the one shown in EXAMPLE3 of
http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/
which should be expressed with something like

g.entity("e1",other_attributes={"ex:values": [{ "$":"1034","type":"xsd:positiveInteger"},2]})
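The TypeError in the traceback above can be reproduced without the library. This is a minimal sketch, assuming (as the traceback line `self._attributes[attr].add(value)` suggests) that the values of each attribute are collected in a set. The names `attributes` and `add_attribute` are hypothetical and are not part of the prov API:

```python
from collections import defaultdict

# Hypothetical stand-in for a record's attribute store:
# one set of values per attribute name.
attributes = defaultdict(set)


def add_attribute(name, value):
    """Add a value to the set kept for `name`; the value must be hashable."""
    attributes[name].add(value)


# Strings are hashable, so this works.
add_attribute("ex:role", "ex:aRole")

# A list of dicts is not hashable, so set.add() raises TypeError.
try:
    add_attribute(
        "ve:parameters",
        [{"key": "stations_file", "val": "../test-resources/testfiles/stations"}],
    )
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Any value stored this way has to be hashable, which rules out lists and dicts regardless of what the PROV specifications allow.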

aspinuso changed the title from including other dictionaries as values of the additional attributes to including other dictionaries/lists as values of the additional attributes Jul 3, 2014

Owner

trungdong commented Jul 4, 2014

Unfortunately, such structures are not defined in the PROV standards.
Values of an attribute can only be of certain types.

aspinuso commented Jul 4, 2014

Sure but it also says:

"This specification does not provide any interpretation for any attribute declared in any other namespace"
And the framework does support the declaration of additional namespaces already.

I consider PROV to be a data model that promotes profiling, and many community-specific profiles are already out there, defining additional attributes and semantics.

It would be enough to relax the constraints imposed by the implementation of the API, which already does not accept the example provided by the PROV-JSON specification itself (as mentioned in my previous post when referring to EXAMPLE3).

Thanks!

Owner

trungdong commented Jul 4, 2014

I believe the quote you mentioned above is about attribute names. For example, if you have prov:role='ex:aRole' then it would mean something in the PROV data model. However, if you have ex:role='ex:aRole' then this is ignored by PROV.

Even so, converting a PROV statement from one representation to another requires that attribute values be of certain types, as I mentioned above. You are free to use whatever attribute names you like to extend the PROV data model, but if a value is of an unsupported type, the statement will no longer comply with PROV and cannot be converted to another representation. ProvStore, for example, won't be able to save it because its database schema only supports values of the types allowed by PROV.

Having said that, it's still possible to save a custom datatype with PROV. In that case you will need to encode the value, say a map/dict structure, to a string representation and decode it when you read the value back.

The PROV-JSON Example 3 you mentioned above is JSON-specific. The PROV-JSON serializer and deserializer are responsible for encoding and decoding that structure to and from valid PROV values.

If you want to add multiple values to an entity in Python as in the example 3 above, you can do as follows:

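from prov.model import Literal

# e1 is assumed to be an entity that already exists, e.g. e1 = g.entity("e1")
e1.add_attributes([
    ('ex:values', Literal(1034, 'xsd:positiveInteger')),
    ('ex:values', Literal(82.5, 'xsd:decimal')),
    ('ex:values', Literal('Y29udGBudCBoZXJl', 'xsd:base64Binary'))
])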

The PROV-JSON serializer will generate the JSON structure that you saw in Example 3.
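For reference, the PROV-JSON shape that those three values map to can be sketched with the standard library alone. This illustrates the structure discussed for Example 3 of the PROV-JSON submission and is not the library's own serializer; the helper `typed_value` is hypothetical:

```python
import json


def typed_value(lexical, datatype):
    """Return a PROV-JSON typed literal: {"$": lexical form, "type": datatype}."""
    return {"$": str(lexical), "type": datatype}


values = [
    typed_value(1034, "xsd:positiveInteger"),
    typed_value(82.5, "xsd:decimal"),
    typed_value("Y29udGBudCBoZXJl", "xsd:base64Binary"),
]

# Several values of one attribute are written as a JSON array
# under that attribute name.
document = {"entity": {"e1": {"ex:values": values}}}
prov_json = json.dumps(document, indent=2)
```

Each element of the array is still a single typed value, which is why this shape is acceptable while an arbitrary nested dict is not.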

aspinuso commented Jul 5, 2014

Thanks trungdong

Then, let's have a look together at a real life scenario.
My workflow produces an entity whose content (say, "ve:content") is composed of a collection of domain-specific "things" rather than just a single value. Now I want to describe this entity with the related metadata, so that I can search on it.
I was expecting to be able to do something like:

{
  "entity": {
    "e1": {
      "ve:content": [
        {
          "sm:sampling_rate": 1000.0,
          "sm:station": "MOMA",
          "sm:delta": 0.001,
          "sm:calib": 1.0,
          "sm:type": "velocity",
          "sm:channel": "FXE",
          "sm:network": "IV"
        },
        {
          "sm:sampling_rate": 1000.0,
          "sm:station": "MOMA",
          "sm:delta": 0.041,
          "sm:calib": 1.0,
          "sm:type": "velocity",
          "sm:channel": "FXZ",
          "sm:network": "IV"
        }
      ]
    }
  }
}

Any document store (for instance MongoDB) would allow you to run powerful queries over such data structures (we currently do so using our own provenance serialisation and database), injecting domain-specific concepts and search patterns into generic provenance queries.

Now, as far as I understand, in order to use PROV and ProvStore, I should split this "single" entity in two, in order to be able to express and search on the metadata that characterize the "single" output of my computation. Is that so? Is PROV enforcing in this case a higher fragmentation/granularity that goes beyond the semantics of the actual computation? How would you use PROV to describe such scenarios?

Owner

trungdong commented Jul 9, 2014

If you don't want to 'split' the entity, you can encode the value in JSON as a string literal and that will work. Please note that PROV is a standard for interchanging provenance; it might not be suitable for storing application-native data. You can choose the storage mechanism and encoding that's best for your application and only need to convert the data to PROV when you need to expose/interchange it.
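The string-encoding approach can be sketched with the standard library. This is a minimal illustration using an abbreviated version of the seismic records from earlier in the thread. The plain dict `entity_attributes` is a hypothetical stand-in for whatever attribute mapping is eventually passed to the PROV API:

```python
import json

# Domain-specific content of the entity (abbreviated from the example above).
content = [
    {"sm:station": "MOMA", "sm:channel": "FXE", "sm:network": "IV", "sm:delta": 0.001},
    {"sm:station": "MOMA", "sm:channel": "FXZ", "sm:network": "IV", "sm:delta": 0.041},
]

# Encode: the nested structure becomes one string, which is an allowed value type.
encoded = json.dumps(content, sort_keys=True)
entity_attributes = {"ve:content": encoded}

# Decode: a consumer that knows this convention restores the structure.
decoded = json.loads(entity_attributes["ve:content"])
channels = [item["sm:channel"] for item in decoded]
```

The trade-off is that the encoded string is opaque to generic PROV tools: a store can keep and exchange it, but searching inside it requires decoding first, so domain-specific queries would still run against the application's own storage.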

trungdong closed this Jul 29, 2014
