Project context [RPCX] #71

Open
jpullmann opened this issue Jan 18, 2018 · 27 comments

Comments

@jpullmann
Contributor

commented Jan 18, 2018

Project context [RPCX]

Provide a means to define a 'project' as a research, funding or work organization context of a dataset.


Related use cases: Dataset business context [ID49], Modeling funding sources [ID31]

@dr-shorthair

Contributor

commented Jan 18, 2018

I have a draft proposal for a small vocabulary for Project, as a subclass of prov:Activity.
See
https://dr-shorthair.github.io/ont/project/

This would support linking a dataset to a project using the prov:wasGeneratedBy predicate, as mentioned in #77

@andrea-perego

Contributor

commented Jan 19, 2018

Thanks, @dr-shorthair . I would also like to contribute some work we did to map DataCite to DCAT-AP, which includes the mapping of what in DataCite is called "Funding Reference". The mapping tables are here: https://ec-jrc.github.io/datacite-to-dcat-ap/

I think this is also one of the possible use cases for the use of qualified and non-qualified forms.
The non-qualified form is the basic case where you want to say that a given dataset has been created by a given project. However, if you also need to say that this is done in a given timeframe of the activity of a project you need to add a node in the graph, between "project" and "dataset", where to attach this information.

@dr-shorthair, in my understanding the vocabulary you contributed supports both cases, right?
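
The two forms described above can be sketched in Turtle as follows. This is a minimal illustration under stated assumptions, not the mapping from the DataCite work: the `ex:` namespace, the resource names and the timestamp are hypothetical. It uses the standard PROV-O qualification pattern (`prov:qualifiedGeneration` with a `prov:Generation` node) as the intermediate node.

```turtle
# Hypothetical example namespace and resources; not taken from the thread.
@prefix ex:   <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:dataset-1
  a dcat:Dataset ;
  # Non-qualified form: the dataset points directly at the project.
  prov:wasGeneratedBy ex:project-1 ;
  # Qualified form: an intermediate node sits between dataset and project
  # and carries the additional information (here an illustrative timestamp).
  prov:qualifiedGeneration [
    a prov:Generation ;
    prov:activity ex:project-1 ;
    prov:atTime "2018-01-19T00:00:00Z"^^xsd:dateTime ;
  ] ;
.

ex:project-1
  a prov:Activity ;
.
```

Note that `prov:atTime` records a single instant. Describing a timeframe within the project would need a different modelling choice, for example a sub-activity with its own start and end times.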

@dr-shorthair

Contributor

commented Jan 21, 2018

Linking through to funding details is part of the proposed Project ontology. Not sure if I got it all right yet, so would be interested in working through other examples.

@dr-shorthair

Contributor

commented Jan 30, 2018

Suggest removing the following labels:
profile, semantics, service, usage_control, version
??

@dr-shorthair

Contributor

commented Jul 11, 2018

Picking up the second example on #253 which describes a dataset from CSIRO's DAP, the following uses PROV to document the project context for the dataset. The PROV-O property prov:wasGeneratedBy points to dap:P366 which is a prov:Activity, which in turn is associated with dap:ATNF and used the dap:Parkes-radio-telescope.

dap:atnf-P366-2003SEPT
  rdf:type dcat:Dataset ;
# other properties omitted here
  dcterms:identifier "https://doi.org/10.4225/08/598dc08d07bb7"^^xsd:anyURI ;
  dcterms:relation [      dcterms:identifier "PH0090_0011.sf" ;    ] ;
  dcterms:relation [      dcterms:identifier "PH0090_0021.sf" ;    ] ;
  dcterms:relation [      dcterms:identifier "PH0090_0031.sf" ;    ] ;
  dcterms:title "Parkes observations for project P366 semester 2003SEPT" ;
  dcat:contactPoint dap:MartaBurgay-vcard ;
  dcat:keyword "pulsar" ;
  dcat:landingPage <https://data.csiro.au/dap/landingpage?pid=csiro:P366-2003SEPT> ;
  prov:wasGeneratedBy dap:P366 ;
.
dap:P366
  rdf:type prov:Activity ;
  dcterms:contributor dap:A_Lyne , dap:Andrea_Possenti , dap:B_Joshi , dap:F_Camilo , dap:G_Pearce , dap:M_Kramer , dap:M_McLaughlin , dap:Nichi_D\'Amico , dap:R_Manchester ;
  dcterms:type "Observation" ;
  rdfs:comment "Parkes multibeam high-latitude pulsar survey" ;
  rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey" ;
  prov:used dap:Parkes-radio-telescope ;
  prov:wasAssociatedWith dap:Marta_Burgay ;
  prov:wasInformedBy dap:ATNF ;
.
dap:ATNF
  rdf:type prov:Activity ;
  rdfs:label "Australia Telescope National Facility" ;
  prov:informed dap:P366 ;
.

Note that prov:wasGeneratedBy is axiomatized as follows:

prov:wasGeneratedBy
  rdf:type owl:ObjectProperty ;
  rdfs:domain prov:Entity ;
  rdfs:range prov:Activity ;
.

so this entails that dap:atnf-P366-2003SEPT is a prov:Entity.
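
Spelled out, the triples that an RDFS reasoner would derive from the domain and range axioms are sketched below. The `dap:` namespace IRI is a placeholder, since the thread does not give it.

```turtle
# Placeholder namespace IRI for dap: (an assumption, not given in the thread).
@prefix dap:  <http://example.org/dap/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# From rdfs:domain prov:Entity: the subject of prov:wasGeneratedBy is an Entity.
dap:atnf-P366-2003SEPT a prov:Entity .

# From rdfs:range prov:Activity: the object is an Activity (already asserted above).
dap:P366 a prov:Activity .
```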

@dr-shorthair

Contributor

commented Jul 11, 2018

... and not being entirely happy with the limitations of PROV and DC, here is the same activity described using my proposed Project Ontology, which specializes prov:Activity for planned and budgeted activities commonly known as Projects :-)

dap:P366-1
  rdf:type prov:Activity ;
  rdfs:comment "Parkes multibeam high-latitude pulsar survey" ;
  rdfs:label "P366 - Parkes multibeam high-latitude pulsar survey" ;
  proj:hasParticipant dap:A_Lyne , dap:Andrea_Possenti , dap:B_Joshi , dap:F_Camilo , dap:G_Pearce , dap:M_Kramer , dap:M_McLaughlin , dap:Nichi_D\'Amico , dap:R_Manchester ;
  proj:hasPrincipalInvestigator dap:Marta_Burgay ;
  proj:isSubActivityOf dap:ATNF-1 ;
  proj:objective "Observation" ;
  prov:used dap:Parkes-radio-telescope ;
.
dap:ATNF-1
  rdf:type proj:Project ;
  rdf:type prov:Activity ;
  rdfs:label "Australia Telescope National Facility" ;
  proj:hasSubActivity dap:P366-1 ;
.

Also see #77 #76 #128

@dr-shorthair

Contributor

commented Jul 26, 2018

See https://ec-jrc.github.io/datacite-to-dcat-ap/#alignment-issues-agent-roles as evidence of independent discovery of this pattern (thanks @andrea-perego).

@dr-shorthair

Contributor

commented Aug 9, 2018

Mostly resolved by #312

The remaining issue is whether to pursue the proposed Project Vocabulary as an option for project descriptions, specializing prov:Activity.

The paragraph at the end of https://w3c.github.io/dxwg/dcat/#Property:dataset_wasgeneratedby mentions several existing project vocabularies, though only DOAP (for software projects) is cleanly documented, and none is linked to prov:Activity.

@pwin

Contributor

commented Sep 6, 2018

Are we going to provide a distinction between 'project' [defined by time, topic, budget, etc.] and 'business as usual' [BAU] as the context of data-related activity? Both deliver data that will need cataloguing. BAU shares some of the characteristics described in the Project Vocabulary (name, objective, leader, sponsor, etc.) and also has activity structures, dependency structures, stakeholders, funding, etc. It just doesn't operate in a 'project' manner.

@davebrowning

Contributor

commented Sep 6, 2018

Mmm. The requirements (particularly ID49) talk about business context. Projects are one kind of context; it seems that examples which show other contexts (such as business as usual) would be useful and helpful to adopters. Other kinds of project could also be useful, I guess, but we do need to be a bit careful about too many examples...

@dr-shorthair

Contributor

commented Sep 6, 2018

I think we can tweak the wording, adding 'business as usual' or 'ongoing activity' to the list of kinds of activities.

And I think an ongoing-activity can be described as a prov:Activity with no prov:endedAtTime.

See #338
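
A minimal sketch of this suggestion, with hypothetical `ex:` resource names and an illustrative start date: the ongoing activity is given a `prov:startedAtTime` but no `prov:endedAtTime`.

```turtle
# Hypothetical example namespace and resources; not taken from the thread.
@prefix ex:   <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:catalogue-maintenance
  a prov:Activity ;
  rdfs:label "Ongoing catalogue maintenance (business as usual)" ;
  prov:startedAtTime "2015-01-01T00:00:00Z"^^xsd:dateTime ;
  # No prov:endedAtTime is stated for this activity.
.

ex:dataset-2
  a dcat:Dataset ;
  prov:wasGeneratedBy ex:catalogue-maintenance ;
.
```

One caveat on this design: under RDF's open-world reading, a missing `prov:endedAtTime` only means that no end time is recorded. It does not by itself assert that the activity is still running.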

@larsgsvensson

Contributor

commented Sep 7, 2018

@dr-shorthair scripsit:

> And I think an ongoing-activity can be described as a prov:Activity with no prov:endedAtTime.

This is an interesting look at the nature of activities. I always thought that activities need to be completed in order to be recorded in prov... Is there any previous work on this?

@davebrowning

Contributor

commented Sep 7, 2018

Mmm. I haven't had a chance to look for something public - not aware of anything off the top of my head. I would have thought others would have come across this.

If I recall correctly, where my colleagues are using this internally, prov:Activity is always bounded (i.e. has a prov:endedAtTime). We took the view that our feeds operate continually rather than continuously - a record at a time, so to speak - so the publication activity was quite granular, but finite. (There are weaknesses in this approach, in my view, but it suited our specific use case.) The business context, on the other hand, isn't finite - rather, it's indefinite, with no known end time.

@larsgsvensson

Contributor

commented Sep 14, 2018

Our use case (might be slightly OT for this thread) is when a human assigns a language code to a document. Is every assignment its own prov:Activity or can we see the continuous assignment of language codes as one prov:Activity going on since we started doing that?

@dr-shorthair

Contributor

commented Sep 14, 2018

In the real world we have 'projects' (or similar ongoing activities), within which there are more specific or atomic activities. The PROV model focuses on the latter - the atomic events associated with each specific output. Nevertheless there appears to be a quite common requirement to describe the bigger (project) context instead of (or in addition to) the atomic events. It's not that the latter don't exist conceptually of course, just that the appropriate level of detail for the application may not necessarily match the viewpoint that guided the development of the PROV model and OWL implementation.

IMO however the prov model still applies at the coarser level - all the properties of a prov:Activity are relevant to projects or ongoing activities. The key addition needed is an activity-nesting or -composition predicate.

@dr-shorthair

Contributor

commented Sep 15, 2018

So the specific answer to @larsgsvensson is 'both', conceptually at least. But practically it might be the case that you only want to describe the overall process and not each individual sub-activity.
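
For the language-code case, the 'both' answer might look like the sketch below: one long-running activity plus an optional atomic sub-activity per assignment. All `ex:` names and timestamps are hypothetical. The nesting predicate `proj:isSubActivityOf` is the one used in the draft Project Ontology example earlier in this thread, but the `proj:` namespace IRI shown here is a placeholder, since the thread does not state it.

```turtle
# Hypothetical example namespace and resources; not taken from the thread.
@prefix ex:   <http://example.org/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
# Placeholder IRI: the real namespace of the draft Project Ontology is not given here.
@prefix proj: <http://example.org/proj#> .

# Coarse level: the continuous assignment of language codes as one activity.
ex:language-coding
  a prov:Activity ;
  rdfs:label "Assignment of language codes to documents" ;
  prov:startedAtTime "2010-01-01T00:00:00Z"^^xsd:dateTime ;
.

# Fine level: one atomic assignment, nested in the overall activity.
ex:language-coding-doc42
  a prov:Activity ;
  proj:isSubActivityOf ex:language-coding ;
  prov:used ex:doc42 ;
  prov:wasAssociatedWith ex:cataloguer-1 ;
  prov:startedAtTime "2018-09-14T09:00:00Z"^^xsd:dateTime ;
  prov:endedAtTime "2018-09-14T09:01:00Z"^^xsd:dateTime ;
.
```

An application that only needs the overall process can omit the second resource entirely and link its outputs to `ex:language-coding`.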

DCAT revision automation moved this from To do to Done Sep 25, 2018

@dr-shorthair dr-shorthair reopened this Sep 25, 2018

DCAT revision automation moved this from Done to In progress Sep 25, 2018

@dr-shorthair

Contributor

commented Sep 25, 2018

Should we close this issue?
Example here https://w3c.github.io/dxwg/dcat/#examples-dataset-provenance
Normative statements here https://w3c.github.io/dxwg/dcat/#Property:dataset_wasgeneratedby
(from #312, #338).

Or do we still intend to look at the proposed Project Ontology as a potential "Note"?

@davebrowning

Contributor

commented Sep 26, 2018

When we last looked at this I had the impression we could/wanted to do more (for example the continuous publication stuff), but it's also true that we can always do more... I do think we've addressed the requirement, and the other scenarios are tracked by actions here and here.

re: a potential "Note" - I have no strong opinion (though I do think the ontology is useful)

@dr-shorthair

Contributor

commented Sep 27, 2018

So I put the note on the table more or less from the beginning of the DXWG.

Everyone who has looked at it seems to agree that it is useful, though it has attracted no formal reaction, positive or negative. I have not pushed it since we were clearly quite busy enough with the other things on our plate. But it could probably be finished up with 2-3 days work. So the question becomes: is there an appetite in the DXWG for another non-Rec-track deliverable? Should I raise a specific issue for this question, so we can decide one way or the other?

@agbeltran

Member

commented Sep 27, 2018

@andrea-perego - I cannot access the link to the mapping that you referred to here. Is it available somewhere else?

> Thanks, @dr-shorthair . I would also like to contribute some work we did to map DataCite to DCAT-AP, which includes the mapping of what in DataCite is called "Funding Reference". The mapping tables are here: https://webgate.ec.europa.eu/CITnet/stash/projects/ODCKAN/repos/datacite-to-dcat-ap/browse/documentation/Mappings.md
>
> I think this is also one of the possible use cases for the use of qualified and non-qualified forms.
> The non-qualified form is the basic case where you want to say that a given dataset has been created by a given project. However, if you also need to say that this is done in a given timeframe of the activity of a project you need to add a node in the graph, between "project" and "dataset", where to attach this information.
>
> @dr-shorthair, in my understanding the vocabulary you contributed supports both cases, right?

@agbeltran

Member

commented Sep 27, 2018

> So I put the note on the table more or less from the beginning of the DXWG.
>
> Everyone who has looked at it seems to agree that it is useful, though it has attracted no formal reaction, positive or negative. I have not pushed it since we were clearly quite busy enough with the other things on our plate. But it could probably be finished up with 2-3 days work. So the question becomes: is there an appetite in the DXWG for another non-Rec-track deliverable? Should I raise a specific issue for this question, so we can decide one way or the other?

Given that projects (and related funding, see #66) are generic topics that go beyond DCAT, I think it is worth considering the Project ontology as another output of the WG, as it can easily be used within DCAT to cover the requirements (this one and #66). Not sure about the process, though, as strictly our purpose is the DCAT revision.

@agbeltran

Member

commented Sep 27, 2018

As this issue needs more discussion, I'm moving it to the next milestone.

@andrea-perego

Contributor

commented Nov 14, 2018

> @andrea-perego - I cannot access the link to the mapping that you referred to here. Is it available somewhere else?

Sorry for the late reaction, @agbeltran . The work is now on GH (I just updated the link in the original comment):

https://ec-jrc.github.io/datacite-to-dcat-ap/

@davebrowning

Contributor

commented Dec 13, 2018

This issue remains active, since there are a number of things we could do, either with the project ontology or by providing additional examples that perhaps go beyond the common meaning of the word project. This would all be valuable but requires resource which won't be available in 3PWD timescales, so I am removing it from this milestone and 'parking' it in 4PWD for now.
