Action by software: identification of the software #38

mtrekels · 2020-06-24T14:12:41Z

If the property:type has value http://schema.org/SoftwareApplication

Is there a way to specify:

- software used
- version of the software (release number, revision number...)

Use-case: action is performed in an automated way (identified by artificial intelligence, recorded with camera trap...)
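For illustration, here is a minimal sketch of how both pieces of information could be recorded, assuming a JSON-LD-style agent record. `name`, `softwareVersion` and `identifier` are existing schema.org properties; the helper `software_agent` and the example software name and version are hypothetical.

```python
import json


def software_agent(name, version, identifier=None):
    """Build a JSON-LD-style description of a software agent (hypothetical helper)."""
    agent = {
        "@context": "http://schema.org/",
        "@type": "SoftwareApplication",
        "name": name,
        "softwareVersion": version,
    }
    # Only add an identifier when one is actually available.
    if identifier is not None:
        agent["identifier"] = identifier
    return agent


# Made-up software name and version, for illustration only.
example_agent = software_agent("ExampleSpeciesClassifier", "3.7")
print(json.dumps(example_agent, indent=2))
```

Whether a plain version string like this is enough, or an external identifier is needed, is exactly the open question.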

mtrekels · 2020-06-24T14:15:43Z

Similarly: do we need a property to identify the hardware used?

matdillen · 2020-06-24T15:34:52Z

Wouldn't we "just" need to point to an external ID for the observing hardware/software? That's what we would do if the subject was a person, when we want to include other subject metadata.

Of course, this begs the question where we would keep our persistent IDs for our camera traps.

mtrekels · 2020-06-25T08:07:26Z

This is mainly an issue for the use case where no persistent identifier is available. The property 'name'/'verbatimName' is provided, but this mainly covers 'human' agents.

Extra question: a persistent identifier is not really 'human readable'. Do we want to include the possibility of a more readable representation of the agent?

matdillen · 2020-06-25T09:13:25Z

Well, in the absence of an identifier, the name/verbatimName combination can just as well be insufficient for a person as it can be for hardware or software.

The key problem in both cases is disambiguation. Common names like James Smith or Zhang Wei will hardly be sufficient without additional information or a unique ID. Internal identifiers could be appended to the name field, e.g. James Smith (3) or Zhang Wei_13, but that is only helpful as an opaque internal identifier.

I think we're running into the problem we're already trying to solve by strongly recommending the use of unique identifiers. Is it important that we come up with a solution in the absence of identifiers as well?

mtrekels · 2020-09-29T09:56:57Z

@dshorthouse we mentioned the issue of attribution of software at the TDWG workshop. One take-away message from the meeting is that by attributing the software, you actually attribute the 'developers' of the software/algorithm.

deepreef · 2020-09-29T17:29:59Z

@mtrekels : A similar issue was raised during the Machine Observations session. When a robot (or some other automated machine) records an observation, who should get the attribution? The robot/machine as an "agent", or the engineers who designed the robot and its software? Does it/will it make a difference if AI algorithms are used to make decisions about when to record an observation? At what point does attribution pass from the developers of the software logic to the machine itself?

dshorthouse · 2020-09-29T17:33:38Z

Value judgments re: attribution aside, what I think we'll need in this context is a way to uniquely identify the software, just as we (mostly) have mechanisms to uniquely identify people. A GitHub repo? A Zenodo DOI? Other?
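As a rough sketch of what handling these identifier types could look like, the function below sorts an identifier string into "doi", "github_repo" or "unrecognized". The function name and both regular expressions are assumptions made for illustration: they are deliberately loose patterns, not validators, and they do not check that the identifier resolves.

```python
import re

# Loose pattern for a DOI name such as "10.5281/zenodo.0000000" (illustrative only).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Loose pattern for a GitHub repository URL: https://github.com/<owner>/<repo>
GITHUB_REPO_PATTERN = re.compile(r"^https://github\.com/[^/\s]+/[^/\s]+/?$")


def classify_software_identifier(value):
    """Return 'doi', 'github_repo' or 'unrecognized' for an identifier string."""
    value = value.strip()
    # Accept DOIs written as resolver URLs or with a "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if DOI_PATTERN.match(value):
        return "doi"
    if GITHUB_REPO_PATTERN.match(value):
        return "github_repo"
    return "unrecognized"


# The DOI and repository below are made-up placeholders.
print(classify_software_identifier("https://doi.org/10.5281/zenodo.0000000"))
print(classify_software_identifier("https://github.com/example-org/example-repo"))
print(classify_software_identifier("Some commercial software 2.1"))
```

The "unrecognized" branch is where software without an obvious identifier ends up.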

mtrekels · 2020-09-29T18:01:53Z

Thanks @deepreef and @dshorthouse for the remarks.

Maybe indeed it's not really very clear if software can be considered an agent. However, my personal feeling is that it's 'something' that performs an action. As such I think it fits in the definition.

With regard to which identifier to use, in many cases this is probably of similar complexity as for people. Some software will have a well-defined identifier (GitHub, Zenodo...). Other software unfortunately will not (some commercial software might be more difficult, but we/I need to think/search a bit deeper). However, I think that nowadays traceability of software is quite good.

wouteraddink · 2020-09-29T18:36:25Z

I also see an agent as something that performs an action; I think that is also how Prov-O defines it. So the agent is the software itself and not the makers of the software. That would get fuzzy quickly the more autonomous a system is. Of course the software doesn't mind whether it is attributed for this, but the attribution can be used for e.g. assertions about how trustworthy the result of the action is. And if we could uniquely identify the software and its makers, the makers could in turn receive attribution for the software being used. As Maarten wrote, there is currently not a single globally endorsed system in place for identifying software, so it will be a challenge similar to other types of agents.

Kind regards,
Wouter

deepreef · 2020-09-29T23:17:39Z

I agree with @mtrekels and @wouteraddink on this -- there really needs to be a way to cite a non-human (non-living?) entity as an agent for attribution purposes. But there is a subtle and potentially important issue related to this, especially if identifiers are to be minted: establishing some sort of parity between an instance of a human agent and an instance of a non-human/non-living agent.

"Human" refers to a class of thing (≈Homo sapiens Linnaeus 1758), and individual persons are instances of that class. "Human" itself is a subclass of "living things" (and so on). So on the software (and hardware) side of Agents playing roles, what are the equivalencies? At first blush, I would think that "Electronic Equipment" and "Computer Software" are more or less congruent to "Living Thing". But how do we define an "instance" of one of these to be comparable to an instance of a human (e.g., Richard Lawrence Pyle)? Would it be sufficient to document it as something like "Whizbang Organism Identifier Software version 3.7", or "Canon EOS R5 camera"? Or would it be better to somehow identify a specific instance/installation of those "subclasses" of software/hardware (e.g., via individual serial numbers)?

Most of this is philosophical, so may not be worth spending any time on. But there are some practical implications, and I would assume whatever system is proposed or adopted would be clear enough and well-defined enough to promote consistent implementation (and, maybe more importantly, consistent understanding of what an identifier minted for a non-human agent actually refers to).

matdillen · 2020-09-30T09:05:48Z

Persistent identifiers for software have been designed. Not sure how frequently these are used.

Do we need to indicate the nature of the agent (human, other animal, software, hardware...)? I think this falls under the authority of the resource from which we use the identifier. We may want to define what can possibly constitute an agent and what cannot, to avoid interoperability issues in the future.

danstowell · 2021-02-18T10:25:44Z

I agree with others here that it's reasonable to treat an algorithmic agent as the entity to be attributed. (As well as the algorithm "designers", there is increasingly also the "training dataset" to consider, and indeed it gets muddy if we try to ignore the existence of the algorithm as a stable entity issuing assertions.)

I also have found Prov-O a good source, and I hope Prov-O will help us with the "indirect" attribution via a software agent to its creators.
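To make the "indirect" attribution concrete, here is a small sketch using plain Python tuples as triples. The PROV-O terms used (`prov:SoftwareAgent`, `prov:Person`, `prov:Activity`, `prov:Entity`, `prov:wasAssociatedWith`, `prov:wasAttributedTo`) exist in the W3C vocabulary; treating the software as both an agent and an entity is one possible modelling rather than an agreed one, and the example.org identifiers and the helper `agents_for` are hypothetical.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers, for illustration only.
IDENTIFICATION = "https://example.org/activity/identification-1"
CLASSIFIER = "https://example.org/software/classifier-3.7"
DEVELOPER = "https://example.org/person/developer-1"

triples = [
    (IDENTIFICATION, RDF_TYPE, PROV + "Activity"),
    (CLASSIFIER, RDF_TYPE, PROV + "SoftwareAgent"),
    (CLASSIFIER, RDF_TYPE, PROV + "Entity"),
    (DEVELOPER, RDF_TYPE, PROV + "Person"),
    # The software is the agent that performed the identification.
    (IDENTIFICATION, PROV + "wasAssociatedWith", CLASSIFIER),
    # The software, seen as a product, is attributed to its developer.
    (CLASSIFIER, PROV + "wasAttributedTo", DEVELOPER),
]


def agents_for(activity, triples):
    """Return (direct, indirect) agents for an activity (hypothetical helper)."""
    direct = [o for s, p, o in triples
              if s == activity and p == PROV + "wasAssociatedWith"]
    indirect = [o for s, p, o in triples
                if s in direct and p == PROV + "wasAttributedTo"]
    return direct, indirect


direct, indirect = agents_for(IDENTIFICATION, triples)
print("performed by:", direct)
print("indirectly credited:", indirect)
```

The record itself only needs to point at the software; the link to the creators can live with the resource that describes the software.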

I don't think TDWG should try to enumerate all the ways that an algorithm could be described. Name and version of the software would be fine, but presumably simply as string values. Beyond that, we should allow for some opaque/external way(s) to refer to algorithms, such as the "persistent identifiers" mentioned by Mat. I've not seen those identifiers before, but what I have often seen is the use of DOIs to refer to a specific software edition, in particular via GitHub-Zenodo's DOI service. Would DOI be one useful option? (I expect it wouldn't cover every case.)

Edit: re-reading the thread, I realise that DOIs are fine, and the more difficult issue is how to refer to software agents that don't have any such obvious identifier.

matdillen · 2021-02-18T12:36:26Z

> Edit: re-reading the thread, I realise that DOIs are fine, and the more difficult issue is how to refer to software agents that don't have any such obvious identifier.

Yes, and I think the problem is similar for people without obvious identifiers. The solution is to strongly recommend unique identifiers and list good examples. For software, it is a bit more complex than for people, because you can have different versions of the same software around at the same time, but only one version of a person.

The other question remains unanswered, I think: "Do we need to indicate the nature of the agent (human, other animal, software, hardware...)?" This is particularly relevant when we just have a string, and not a URI to a resource which can address this for us.

wouteraddink · 2021-02-18T12:48:15Z

For software as agent I think what you want to identify is the instance of the software, not (only) the software project and version. The instance is very comparable to a human, there is only one and it may have a unique configuration/set of settings. The lifespan of a software instance is much shorter than that of a human though.
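A small sketch of the difference between a software version and a software instance, assuming the instance is characterised by its name, version and configuration. The function `instance_fingerprint`, the software name and the `threshold` setting are all hypothetical, and a hash like this is not a persistent identifier; it only shows that two instances of the same version can be told apart.

```python
import hashlib
import json


def instance_fingerprint(name, version, configuration):
    """Hash name, version and configuration into one hex string (hypothetical)."""
    payload = json.dumps(
        {"name": name, "version": version, "configuration": configuration},
        sort_keys=True,  # make the result independent of key order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same software and version, two different configurations (made-up values).
fp_a = instance_fingerprint("ExampleSpeciesClassifier", "3.7", {"threshold": 0.8})
fp_b = instance_fingerprint("ExampleSpeciesClassifier", "3.7", {"threshold": 0.5})
print(fp_a == fp_b)
```

Identifying only the project and version would treat both of these as the same agent.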

pmergen · 2021-02-18T12:53:56Z

Hi

I am currently involved in another discussion group, on how to convey scientific knowledge to different audiences in the digital transformation world. They strongly advocate for having the best, most transparent and most complete information on the source of the information. In this context it is probably important to have the software or the algorithm shown as the "agent". As for the "creator" of the algorithm, that seems important to have too, although there are many branches (spin-offs) created by the community.

Crediting the contributors who make the algorithm evolve is also quite tricky, as many can be involved, including the users themselves by using it, so I would not go down that road.

matdillen · 2021-02-18T13:11:21Z

I think crediting the authors of software or builders of hardware is not within the scope of Darwin Core. That is up to the resources we link to identifying the software/hardware.

Identifying a software version is much more straightforward than identifying a software instance, including configuration details, system specs, training datasets... While this is obviously interesting for the sake of repeatability, it also constitutes considerable overhead and goes further than where we are currently at with human agents. We don't qualify their state of mind at the time of their action, nor the physical context in which they performed it. When you make the analogy to human agents, you can also notice how there may be some privacy concerns.

danstowell · 2021-02-18T16:11:17Z

Over in the "DWC for biologging" group there's a closely related discussion tdwg/dwc-for-biologging#29 which includes one suggestion that basisOfRecord is the right way to distinguish between human and machine identifications. @matdillen would this answer the "Do we need to indicate the nature of the agent" question?

Perhaps if basisOfRecord:HumanObservation, the agent should be "people, groups, or organizations" (as in the current identifiedBy definition), whereas if basisOfRecord:MachineObservation, the agent should be an instance of some class/subclass representing a machine or an algorithm instance.
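Sketched as a check, the rule could look like the following. `HumanObservation` and `MachineObservation` are existing basisOfRecord values; the agent-type labels ("person", "machine", ...) and the function name are hypothetical and not Darwin Core terms.

```python
# Hypothetical agent-type labels per basisOfRecord value.
ALLOWED_AGENT_TYPES = {
    "HumanObservation": {"person", "group", "organization"},
    "MachineObservation": {"machine", "algorithm"},
}


def agent_type_is_consistent(basis_of_record, agent_type):
    """Return True/False if the rule applies, or None if it says nothing."""
    allowed = ALLOWED_AGENT_TYPES.get(basis_of_record)
    if allowed is None:
        # The rule only covers the two observation types listed above.
        return None
    return agent_type in allowed


print(agent_type_is_consistent("HumanObservation", "person"))
print(agent_type_is_consistent("MachineObservation", "person"))
print(agent_type_is_consistent("PreservedSpecimen", "person"))
```

Returning `None` for other basisOfRecord values is a design choice in this sketch: the rule as stated only covers the two observation cases.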

matdillen · 2021-02-18T17:37:01Z

> Over in the "DWC for biologging" group there's a closely related discussion tdwg/dwc-for-biologging#29 which includes one suggestion that basisOfRecord is the right way to distinguish between human and machine identifications. @matdillen would this answer the "Do we need to indicate the nature of the agent" question?
>
> Perhaps if basisOfRecord:HumanObservation, the agent should be "people, groups, or organizations" (as in the current identifiedBy definition), whereas if basisOfRecord:MachineObservation, the agent should be an instance of some class/subclass representing a machine or an algorithm instance.

When dealing with HumanObservation and MachineObservation this works, but not with specimens such as PreservedSpecimen. But that is a more fundamental problem with basisOfRecord when it comes to specimens (tdwg/dwc/issues/302). We also have no way to distinguish between human and nonhuman (or further, such as hardware vs software) for other actions than the observing/recording. We may need terms such as basisOfIdentification for that.

matdillen mentioned this issue Feb 18, 2021

Are software agents legitimate values for xBy terms? tdwg/dwc#318

Open

danstowell mentioned this issue Feb 18, 2021

What or who is the agent? tdwg/dwc-for-biologging#29

Open

ymgan mentioned this issue Feb 19, 2021

How to credit contributors in a sequencing project tdwg/gbwg#2

Open

ymgan mentioned this issue Mar 3, 2021

update binning software definition and expected values GenomicsStandardsConsortium/mixs#107

Closed
