Change term - occurrenceStatus #339

Open
tucotuco opened this issue Apr 18, 2021 · 51 comments

@tucotuco
Member

tucotuco commented Apr 18, 2021

Change term

  • Submitter: John Wieczorek (following discussion initiated by Steve Baskauf @baskaufs - see below)
  • Justification (why is this change necessary?): Clarity
  • Proponents (who needs this change): Everyone

Current Term definition: https://dwc.tdwg.org/list/#dwc_occurrenceStatus

Proposed new attributes of the term:

  • Term name (in lowerCamelCase): occurrenceStatus
  • Organized in Class (e.g. Location, Taxon): Occurrence
  • Current definition of the term: A statement about the presence or absence of a Taxon at a Location.
  • Proposed new definition of the term: A statement about the presence or absence of an Organism within a bounded place and time.
  • Usage comments (recommendations regarding content, etc.): Recommended best practice is to use a controlled vocabulary consisting of the two distinct concepts "present" and "absent". This term is not apt for breeding status, for which the term reproductiveCondition should be used. This term is not apt for threat status, for which one might consider using the Species Distribution Extension (http://rs.gbif.org/extension/gbif/1.0/distribution.xml - not part of the Darwin Core standard).
  • Examples: present, absent
  • Refines (identifier of the broader term this term refines, if applicable): None
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable): http://rs.tdwg.org/dwc/terms/version/occurrenceStatus-2017-10-06
  • ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG, if applicable): not in ABCD
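
A minimal sketch of how a data publisher might check values against the two-concept vocabulary recommended in the usage comments above. The function name `normalize_occurrence_status` and the decision to ignore letter case and surrounding whitespace are assumptions made for illustration; neither is part of Darwin Core.

```python
from typing import Optional

# The two controlled value strings recommended in the usage comments.
CONTROLLED_VALUES = {"present", "absent"}


def normalize_occurrence_status(raw: Optional[str]) -> Optional[str]:
    """Return the controlled value string for `raw`, or None if it does not match.

    Assumption: matching ignores surrounding whitespace and letter case.
    """
    if raw is None:
        return None
    candidate = raw.strip().lower()
    return candidate if candidate in CONTROLLED_VALUES else None


if __name__ == "__main__":
    # Values such as "breeding" or "doubtful" are rejected rather than mapped.
    for value in ["present", " Absent ", "breeding", "doubtful", None]:
        print(repr(value), "->", normalize_occurrence_status(value))
```

Values that fail the check are returned as `None` rather than guessed at, so breeding status or threat status can be routed to a more suitable term by whoever reviews the data.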

Discussion leading up to this change proposal can be found in Issue #238.

@baskaufs

To facilitate use with the controlled vocabulary proposed in #342, Change Usage comments to "Recommended best practice is to use controlled value strings from the controlled vocabulary designated for use with this term, listed at (insert vocabulary page here). This term is not apt..." to be consistent with other terms having controlled vocabularies.

Due to the simplicity of the CV, I would leave the examples so as to not require people who know English to follow the link to the CV page. However, note that this term exists in both the dwc: and dwciri: namespaces. For the dwciri: variant of the term, the examples will need to be changed to the term IRIs once they are assigned.

@baskaufs

For those who wonder why defining a controlled vocabulary is necessary for a term as simple as this, it is only simple for those whose native language is English. The establishment of a formal controlled vocabulary provides the framework for linking labels and definitions in many languages to the controlled value terms. Establishing the controlled vocabulary has no effect on the recommended controlled value strings, which are identical to those already in common use.
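
As a rough illustration of the point about languages, the sketch below links labels in more than one language to the same controlled value strings. Only the strings `present` and `absent` come from the proposal; the Spanish labels, the `LABELS` structure, and the function name `controlled_value_for` are assumptions and are not taken from any ratified vocabulary.

```python
from typing import Optional

# Controlled value string -> {language tag: human-readable label}.
# Only the controlled value strings come from the proposal; the labels
# are illustrative assumptions.
LABELS = {
    "present": {"en": "present", "es": "presente"},
    "absent": {"en": "absent", "es": "ausente"},
}


def controlled_value_for(label: str, language: str) -> Optional[str]:
    """Return the controlled value string whose label in `language` matches."""
    wanted = label.strip().lower()
    for value, labels in LABELS.items():
        if labels.get(language, "").lower() == wanted:
            return value
    return None


if __name__ == "__main__":
    print(controlled_value_for("Presente", "es"))
    print(controlled_value_for("ausente", "es"))
```

The data still carry the same controlled value strings regardless of the language in which a label is shown to a user.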

@baskaufs

Has an existing dwciri: analog, so use the same changed definition and usage comments, but substitute IRI values in the examples.

@nielsklazenga
Member

I am sort of okay with the change in definition, given that the Darwin Core definition of Organism can mean all members of a Taxon in a given area (and 'absent' only makes sense in that way), but I do not agree with the usage comments or the extremely restricted vocabulary (I will address that in #342). The comment that threat status is better dealt with in the Species Distribution Extension, while I agree with it, is neither here nor there, as the Species Distribution Extension goes with a Taxon Core and, when using a Taxon Core, occurrenceStatus is in the Species Distribution Extension too.

@tucotuco
Member Author

tucotuco commented May 1, 2021

@nielsklazenga The vocabulary proposal here is only a formalization of what the precise vocabulary restriction was meant to be from inception. The term was only ever meant to distinguish presence from absence. I substituted Organism for Occurrence, because it isn't an Occurrence that was present (or absent), but more precisely an Organism was present, or any instance of an organism of a given Taxon was absent. The bottom line is that the vocabulary isn't really open for dispute unless you propose to change the semantics of the term. The usage comments were to nail down how to deal with things that people might confuse as occurrenceStatus.

@nielsklazenga
Member

nielsklazenga commented May 1, 2021

@tucotuco, does not the definition of Organism in Darwin Core make any entry in a species checklist or flora etc. more or less an Occurrence? I agree that the terms related to breeding status and threat status, which were explicitly excluded in the usage comments should indeed not be in occurrenceStatus. However, terms like doubtful and excluded, which were in the previous GBIF Occurrence Status Vocabulary, are statements about the presence or absence of an Organism in an area and therefore fall within the definition of occurrenceStatus. excluded can be mapped to absent, although I think there is still value in having it in the vocabulary, but where does one go with doubtful?

@deepreef

deepreef commented May 2, 2021

I commented on this here.

@tucotuco
Member Author

tucotuco commented May 7, 2021

Sorry, but there is so much to catch up on here.

Discussion relevant to this proposal has developed in two distinct issues, this one and the proposal for the controlled vocabulary for occurrenceStatus. Though the two proposals are intimately related, they can be ratified, or not, independently. Even so, I would like to summarize the state of the proposals together here.

Is the proposed definition of occurrenceStatus satisfactory? Controversial
Current Definition: A statement about the presence or absence of a Taxon at a Location.
Proposed Definition: A statement about the presence or absence of an Organism within a bounded place and time.
Comments: This is the real crux of all issues that have been raised. The DwC Maintenance Group originally labelled this as a non-normative change, feeling that the change was to clarify the definition based on our understanding of the term. That understanding was based on the placement of the term within the Occurrence class and the thinking that went into the creation of the term to begin with. That thinking can be elaborated (beyond the actual normative definition) as
to make a definitive statement about whether any Organism given an Identification that posits it as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection using a samplingProtocol.
Only what is known need be included in the Occurrence record (just because you don't have an explicit samplingProtocol doesn't mean you can't say something about occurrenceStatus). The current definition lacks any such rigor. The proposed definition isn't actually much better, but at least it tries to disambiguate that a Taxon is not something that can be present while an Organism is.

To me (also beyond the normative definition), a Taxon is an organizational hypothesis supported by discriminative evidence using organisms or derivatives of organisms as evidence for its validity. We use those hypotheses to label and group Organisms based on acceptable similarities of characters. A hypothesis isn't something that can be present in the occurrenceStatus sense. What can be present (or absent) is evidence of an Organism that would be classified as a given Taxon.  Damned semantics!

The actual definition of a Darwin Core Taxon doesn't reflect my personal opinion, however. The current definition of Taxon looks like a closeMatch synonym of Organism:
A group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit.

I think this definition is an underlying source of a real problem, because it reflects the superficial way in which we talk about what a Taxon is. 

The rest of the controversy seems to be around the limits on the scope of an Organism. The definition of the Organism class includes no limit to scope,
A particular organism or defined group of organisms considered to be taxonomically homogeneous. 

The definition of the term organismScope, meant to say explicitly what level of aggregation we are talking about, also holds no further constraint,
A description of the kind of Organism instance. Can be used to indicate whether the Organism instance represents a discrete organism or if it represents a particular type of aggregation. 
Based on just the current definitions, it should be perfectly fine to have an Organism that includes every member of a Taxon at any scale, including the whole planet, but recommended practice would be to provide the scope explicitly.

An aside: To accept the entire continuum of scope as valid has a side benefit that it alleviates the arbitrary distinction of an Occurrence record from a record in a Checklist (of which there is no defined concept in Darwin Core, on purpose). The Checklist record indicates the accepted presence of at least one member of a Taxon within a bounded place and time, where the place and time, if provided, are usually captured in data set metadata or in a Species Distribution Extension. Because proponents of checklists treat them as a type of Taxon dataset for the purpose of sharing data, the illusion that occurrenceStatus can apply as easily to a Taxon as it does to any Organism that is a member of a Taxon arises.

Concretely, I suggest to amend the proposed definition of occurrenceStatus to:
A statement about whether any Organism identified as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection.

If this is an acceptable definition, the concept extirpated could be captured in an Occurrence record with a geographic scope defined by the values in the terms of the Location class in that record, with the temporal scope defined in the values in the terms of the Event class along with the detection methods employed, with the taxonomic scope of the detection effort in the values of the terms in the Identification and Taxon classes, and with absent in occurrenceStatus.
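
To make that concrete, here is a sketch of what such a record might look like as a flat set of Darwin Core terms. The term names (`scientificName`, `country`, `locality`, `eventDate`, `samplingProtocol`, `occurrenceStatus`) are existing Darwin Core terms, but every value is a made-up placeholder, and the helper `summarize` is hypothetical.

```python
# Darwin Core term names are real; every value below is a made-up placeholder.
absence_record = {
    # Taxonomic scope of the detection effort (Taxon class).
    "scientificName": "Examplea hypothetica",
    # Geographic scope (Location class).
    "country": "Exampleland",
    "locality": "Example Island",
    # Temporal scope and detection method (Event class).
    "eventDate": "2010-01-01/2020-12-31",
    "samplingProtocol": "repeated transect surveys",
    # The target was not detected within those bounds.
    "occurrenceStatus": "absent",
}


def summarize(record: dict) -> str:
    """Render the record as a one-line statement about detection."""
    return (
        f"{record['scientificName']}: {record['occurrenceStatus']} at "
        f"{record['locality']}, {record['country']} during {record['eventDate']} "
        f"(protocol: {record['samplingProtocol']})"
    )


if __name__ == "__main__":
    print(summarize(absence_record))
```

The statement of absence is only as broad as the place, time, and protocol recorded alongside it; nothing in `occurrenceStatus` itself carries that scope.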

The concept of doubt as another vocabulary term is controversial in discussions. This concept (along with all others that are not present or absent) arose from the GBIF vocabulary for occurrenceStatus, which is a GBIF artifact, not generated from community activity through TDWG. You may be interested to know that there are a total of 19980 Occurrence records with the value 'doubtful' in all of GBIF (as of 2021-01-12), and all of these are from one dataset, the identity of which is being withheld to protect the innocent. You may also be interested to know that there are 3073 case-insensitive distinct values of occurrenceStatus in that same data snapshot.
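
For anyone wanting to reproduce a count of case-insensitive distinct values on their own data, a sketch along these lines would do. The function name and the sample list are invented for illustration, and trimming whitespace before comparing is an assumption; the counts quoted above may have been computed differently.

```python
from collections import Counter
from typing import Iterable, Optional


def distinct_case_insensitive(values: Iterable[Optional[str]]) -> Counter:
    """Count occurrenceStatus values after trimming and case-folding.

    Empty and missing values are skipped.
    """
    return Counter(v.strip().casefold() for v in values if v and v.strip())


# Invented sample data, not drawn from any GBIF snapshot.
sample = ["present", "Present", "PRESENT ", "absent", "doubtful", "", None]
counts = distinct_case_insensitive(sample)

if __name__ == "__main__":
    print(len(counts), "distinct values")
    print(counts.most_common())
```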

There are many ways in which the specific nature of doubt can be expressed independent of a statement about presence or absence. The nature or source of the doubt would be lost if all that was left of it was the value doubtful in occurrenceStatus. For that reason, the label doubtful is probably a very poor choice. Here are some examples of ways to capture explicit sources of uncertainty about an Occurrence:

  • Occurrences that are doubtful because of the Identification should capture that uncertainty in some combination of identificationQualifier, identificationVerificationStatus, and identificationRemarks. The Organism was present, but the Identification is the thing that is doubtful, not its presence. 

  • Occurrences that are doubtful because of uncertainty about the Location should capture that uncertainty in some combination of the Location terms to reflect the geographic scope within which the presence or absence was asserted.

  • Occurrences that are doubtful because of uncertainty about the timing should capture the temporal uncertainty with beginning and ending bounds in the Event terms, as should parameters that affect detectability (such as samplingEffort, sampleSizeValue, sampleSizeUnit, samplingProtocol).
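
The three cases above can be sketched as a lookup from the source of doubt to the terms that should carry it. The identification and event terms are the ones named in the list; the two Location terms (`coordinateUncertaintyInMeters`, `georeferenceRemarks`) are existing Darwin Core terms chosen here as examples, and the category keys and function name are assumptions.

```python
# Source of doubt -> Darwin Core terms that can express it.
DOUBT_TERMS = {
    "identification": [
        "identificationQualifier",
        "identificationVerificationStatus",
        "identificationRemarks",
    ],
    # Assumption: these two Location terms are picked as examples only.
    "location": [
        "coordinateUncertaintyInMeters",
        "georeferenceRemarks",
    ],
    "timing": [
        "eventDate",
        "samplingEffort",
        "sampleSizeValue",
        "sampleSizeUnit",
        "samplingProtocol",
    ],
}


def terms_for_doubt(source: str) -> list:
    """Return the terms suited to a given source of doubt."""
    try:
        return DOUBT_TERMS[source]
    except KeyError:
        raise ValueError(f"unknown source of doubt: {source!r}") from None


if __name__ == "__main__":
    for source in DOUBT_TERMS:
        print(source, "->", ", ".join(terms_for_doubt(source)))
```

In none of the three cases does `occurrenceStatus` appear in the result, which is the point: the doubt is recorded where its source lies.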

Independent doubt about presence is a strange concept. If I don't know if I detected something or not, why am I bothering to say so? What knowledge am I imparting? I think the concept that is missing is not doubt at all, but rather a measure of certainty (likelihood). I can see that it might be useful to know that it is unlikely that something was at a given place and time, especially for large scales of place and time, because it can impart broadly useful expert knowledge. We don't have another way to capture that in Darwin Core that isn't essentially an occurrenceRemark or an interpretation of the results of a samplingProtocol and its measurement results.

I don't see categories of likelihoods as good candidates for values in a controlled vocabulary. No one would really know what they mean. It seems more the stuff of summary analysis and would be a confounding value for occurrenceStatus. I urge that this concept be captured in a separate term, if needed, just as other kinds of uncertainties are captured in Darwin Core.

On to the rest of the summary...
Does the proposal to have a controlled vocabulary for occurrenceStatus hold merit? 'Yes'

Should the values 'present' and 'absent' be in the controlled vocabulary? 'Yes'

Is the proposed definition of present satisfactory? 'No' 
Comment: The definition will have to be in accord with the final accepted definition of occurrenceStatus. A generic definition that will almost certainly work regardless of other concerns is something like the following:
the target was detected within the given bounds of place and time

Is the proposed definition of absent satisfactory? 'No' 
Comment: The definition will have to be in accord with the final accepted definition of occurrenceStatus. A generic definition that will almost certainly work regardless of other concerns is something like the following:
the target was not detected within the given bounds of place and time

Should there be additional values in the controlled vocabulary? Controversial
Comment: So far there is no consensus on the inclusion of any terms other than present and absent.

@deepreef

deepreef commented May 7, 2021

Many thanks, @tucotuco. One note of protocol for epic posts: always best to warn @timrobertson100 to go get a cup of tea up at the top.

I like how you addressed the relationship between Taxon and Organism as effectively representing alternate ends of the same spectrum.

The definition of the Organism class includes no limit to scope

I agree this is true in principle, but maybe not so much in practice. We wrestled with this when Organism was first proposed (specifically whether it extended to "population"), and I think it does require more wrestling. To me, the distinction ought to be: when an aggregate Organism (i.e., more than one individual) is broad enough that a taxonomist has taken the trouble to assign a formal name to it, then it crosses into the realm of Taxon.

Based on just the current definitions, it should be perfectly fine to have an Organism that includes every member of a Taxon at any scale, including the whole planet, but recommended practice would be to provide the scope explicitly.

Hence my earlier comment that we should treat the scope of an Organism as extending from a single individual through various degrees of aggregates up to but not exceeding (or even including?) something that can be referenced by a formal taxonomic name.

That aside...

...an Organism within a bounded place and time

You addressed the Taxon-->Organism change (which I wholeheartedly agree with), but I wonder if you could comment whether the Location-->"bounded place and time" change might be better solved with a change of Location-->Event. The crux of this (discussed on other issues here) is whether an Event can be defined as a "bounded place and time"; or perhaps more accurately, "an action that is bounded in place and time"?

@tucotuco
Member Author

tucotuco commented May 7, 2021

To clarify, the proposal when the public review opened was:

A statement about the presence or absence of an Organism within a bounded place and time.

which, based on discussion so far, I proposed to amend to the following:

A statement about whether any Organism identified as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection.

I maintain that a Taxon is a theoretical construct and an Organism is not, so that there is no way to have a continuum from one to another and no issue about where to draw the line in that respect. With respect to the scope of Organism, however, I don't feel there is a defensible line that can be drawn, nor can I imagine what is to be gained by making one. I guess I am not very creative. Clearly if I was a taxonomist I would be a lumper.

@nielsklazenga
Member

nielsklazenga commented May 7, 2021

Thanks @tucotuco , I am a taxonomist and I have no problem at all with your characterisation of a Taxon. Our disagreement is in whether or not there can be an occurrence (of an Organism/Taxon, does not matter) independent of an (dwc:) Event.

occurrenceStatus has been successfully used as long as Darwin Core has existed in statements about the distribution of organisms, most notably in Darwin Core Archives with the GBIF Species Distribution Extension in combination with a Taxon Core. Even though we might want to replace the Taxon Core with something else, the distribution statements and hence this particular usage of occurrenceStatus, will always be an important part of Species Description data.

The last proposed definition:

A statement about whether any Organism identified as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection.

..., which I agree reflects one side of the discussion, mainly in issue #342, feels to me less like clarification and more like hijacking of a term that works perfectly well elsewhere (and does not need extra clarity) for a different purpose.

@tucotuco
Member Author

tucotuco commented May 7, 2021

Thanks @tucotuco , I am a taxonomist and I have no problem at all with your characterisation of a Taxon. Our disagreement is in whether or not there can be an occurrence (of an Organism/Taxon, does not matter) independent of an (dwc:) Event.

occurrenceStatus has been successfully used as long as Darwin Core has existed in statements about the distribution of organisms, most notably in Darwin Core Archives with the GBIF Species Distribution Extension in combination with a Taxon Core. Even though we might want to replace the Taxon Core with something else, the distribution statements and hence this particular usage of occurrenceStatus, will always be an important part of Species Description data.

The last proposed definition:

A statement about whether any Organism identified as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection.

..., which I agree reflects one side of the discussion, mainly in issue #342, feels to me less like clarification and more like hijacking of a term that works perfectly well elsewhere (and does not need extra clarity) for a different purpose.

What worries me (deeply) is precisely that "for a different purpose". If it is really a different purpose, then the semantics must necessarily be different, and as soon as we go into a semantic framework the term will break down. The different purpose has different semantics and should have a distinct term.

@deepreef

deepreef commented May 8, 2021

@tucotuco :
I'm trying to reconcile this statement:

I maintain that a Taxon is a theoretical construct and an Organism is not, so that there is no way to have a continuum from one to another and no issue about where to draw the line in that respect.

With this statement:

Based on just the current definitions, it should be perfectly fine to have an Organism that includes every member of a Taxon at any scale, including the whole planet, but recommended practice would be to provide the scope explicitly.

I certainly agree with the latter, but I guess the problem I have with the former statement is that ultimately "wolf pack" and "taxon" are at two different ends of a continuum of "organisms implicitly or explicitly included within a circumscribed set". The main difference is that an Organism becomes a Taxon as soon as the circumscribed set has been labelled with a scientificName.

Perhaps your thinking is that Organism is understood to be limited to "explicitly included" organisms, and Taxon essentially always extends to "implicitly included" organisms. If that's the distinction you mean between "theoretical construct" (Taxon) and "not" theoretical construct (Organism), then I guess I'm tracking your meaning. However, that would effectively eliminate "population" as representing an instance of Organism (let alone every member of a Taxon across the whole planet -- which would be extremely hard to explicitly enumerate except in the most endangered of species).

I have no problem using instances of Occurrence to effectively represent assertions of Taxon-at-Location within some range of time (which to me means Taxon-at-Event), provided that such Occurrence instances are implied to mean "one or more Organisms identified as this Taxon occurred at this Location within this window of time". I see that as an easy use-case to handle with Occurrence instances (although I agree with you that another term besides occurrenceStatus should be proposed to accommodate some statement of certainty about the existence of the asserted Occurrence).

The problem I have is in representing a statement like "this taxon is introduced in Hawaii". Such a statement is bounded in space ("Hawaii"). It's also bounded in time (i.e., the time range starting when the first organism of that taxon occurred in Hawaii, and ending at the time when the assertion is made). But the trouble I have is how to represent the Organism. If, as I believe you are suggesting, an instance of Organism can be defined as "every individual of a specific Taxon that has ever occurred in Hawaii", then it fits nicely into my understanding of Occurrence as Organism-at-Event.

I guess there are several questions in play here; namely how, using dwc terms, could one represent statements such as:

  • "Reports of organisms identifiable to this Taxon in Hawaii are doubtful."
  • "Organisms identifiable to this Taxon in Hawaii are non-native."
  • Any other statement that hopes to represent Occurrence instances in the form of statements of Taxon-at-Location[within a specified timeframe].

@nielsklazenga
Member

Hi @deepreef , I think this goes with something I have seen you write in another issue, that taxa cannot be observed, and with what I said, that an Organism will never become a Taxon. A taxon is a human construct that you superimpose over the organisms that are observed. The same organism can, to different observers, belong to different taxa and the same taxon name can be used for different Organisms with an organism scope of 'taxon'.

With populations you do not have this problem, even if people will have different opinions on what populations are, as they do not have the taxonomic baggage.

@nielsklazenga
Member

nielsklazenga commented May 8, 2021

@tucotuco:

What worries me (deeply) is precisely that "for a different purpose". If it is really a different purpose, then the semantics must necessarily be different, and as soon as we go into a semantic framework the term will break down. The different purpose has different semantics and should have a distinct term.

That is actually exactly the point I was trying to make. I think both use cases fit the semantics of occurrenceStatus as it is now. That last proposed definition that makes an occurrence dependent on an Event, the vocabulary in #342 that does not allow more than half of the terms that are currently in use, and the proposal/suggestion to change the meaning of 'present' and 'absent' amount to a significant change to the semantics of occurrenceStatus. Semantics aside, the proposed "clarifications" break current usage of the term.

I have never said that occurrenceStatus cannot be used for presence/absence data, as I do not really understand the nature of the absence data people are talking about. All I am saying is that the term is already used for other things, so you should be careful with fitting the term to a particular use case, as you'll leave other use cases out in the cold. If the semantics of a term already fits your use case, you should not need to change the definition. On the other hand, other people have said that occurrenceStatus should not be used for species distributions.

I do not mind the change from 'Taxon' to 'Organism' in the original proposal. I do not think it is necessary, but I can see where it is coming from. I would also like a controlled vocabulary on occurrenceStatus, but it does matter what is in the vocabulary and having no vocabulary is better than the vocabulary in #342, which only allows 'present' and 'absent'. So, if people cannot agree to add the extra terms I suggested, I think we should stick a pin in it. It is probably not such a good idea to write a vocabulary during public review anyway.

I also think we can improve the definitions of 'present' and 'absent', but I suggest we go with the dictionary definitions rather than something that skews the meaning toward some use cases at the expense of others.

@deepreef

deepreef commented May 8, 2021

With populations you do not have this problem, even if people will have different opinions on what populations are, as they do not have the taxonomic baggage.

Perhaps, but it's still a continuum along a scale of "sets of circumscribed organisms", with the only difference being whether or not a scientificName has been used to label the implied circumscription. Every piece of taxonomic baggage (except formal Linnean nomenclature) applies to populations the same as to taxa. We just don't think of it that way because we don't have formal labels for populations that are used as widely as scientific names are used.

@nielsklazenga
Member

nielsklazenga commented May 8, 2021

@deepreef , yes exactly. It is never going to be easy.

@deepreef

Now that I've had several showers, traffic jams, and ceiling-staring sessions to contemplate some of the points raised by @tucotuco, I'd like to explore this statement a bit more:

Based on just the current definitions, it should be perfectly fine to have an Organism that includes every member of a Taxon at any scale, including the whole planet, but recommended practice would be to provide the scope explicitly.

As noted here and elsewhere, I'm on board with the "every member of a Taxon at any scale" part. What I've been thinking more about is the "recommended practice would be to provide the scope explicitly" part. Specifically, I'm wondering whether we might want to clarify the definition, comments and examples for organismScope.

I actually think the definition is fine as is, so perhaps all that would be needed is (non-normative?) alterations to the Comments and examples. In the Comments, I think the statement "This term is not intended to be used to specify a type of taxon." is correct, but potentially could be misinterpreted. Perhaps something more like this would be better:

  • The scope of an Organism may include every member of a Taxon at any scale (including the whole planet), but this term is not intended to be used to specify a type of taxon.

Also, perhaps another example or two could help clarify what sorts of terms might be included on a controlled vocabulary. At the very least I would add "population".

I'm also a little uneasy with the inclusion of "multicellular organism", "virus" and "clone" in the list of Examples. I'm sure there was some rationale/use-case for including those terms (it's entirely possible I either proposed or actively supported their inclusion -- I can't remember), but they seem a little out of place in the context of the definition, and aren't mutually exclusive (e.g., could not an Organism be both a multicellular organism and a clone at the same time?)

I'm not sure if this warrants a new Change Term issue.

@tucotuco added the "Controversial" label (the solution for the issue has not reached a consensus) May 20, 2021
@tucotuco
Member Author

tucotuco commented May 20, 2021

This proposal has been labelled as controversial. If no evidence of consensus can be reached by the end of the 30-day minimum review period, the proposal will be deferred for later consideration. If there is evidence that a consensus can be reached, the review period will be extended for an additional 30 days from the time apparent consensus is established (everyone participating in the discussion expresses their satisfaction with the proposed solution).

@tucotuco
Member Author

tucotuco commented May 20, 2021

I would like to try to summarize the controversy. I maintain that an Organism and a Taxon are conceptually distinct. One is the manifestation of a theory and the other a manifestation of biological entities. There is no continuum leading from one into the other and one is not semantically a subtype of the other, because theories are not living beings. Stated in another way, the attributes of a Taxon do not apply to an Organism, nor vice versa.

It isn't clear that there is agreement on this much. But there is more. Even if everyone accepts that the two classes are semantically distinct, I am of the opinion that the distinctness means that the term should not be a property of both classes. Granted, in Darwin Core we do not make the formal assignment of properties to Classes, we merely annotate the properties to be "organized in" a class. This was done on purpose in the absence of a rigorous community-wide conceptual schema for the Classes we manifest in various contexts. For Darwin Core, the conceptual schema was expected to be a separate exercise, after which formal assignments of properties to Classes could be made in the future. That future still hasn't arrived.

Thus, whereas nothing in Darwin Core prevents a term from being used as if it were a property of any Class, nor of it being assigned to no class at all (see Record-level terms), it is not a good way to prepare for a semantically rigorous future. Herein lies my principal objection. It isn't clear if there is any agreement about my position on this. If there is, it suggests that two distinct terms are necessary - one for Occurrence and one for Taxon.

Even if there is agreement on that last point we have contention built off the legacy built around the term to date. The name of the term and its organizational placement in Occurrence suggest that the term should apply to Occurrences only. That was indeed the (only) original purpose of the term, and the reason for the two original examples only. But the original definition is at odds with all the foregoing, being guilty of laziness in the use of the word Taxon and therefore opening the door for it to be used in a different way. Since the door was open, it was used in a different way. It was incorporated into the Species Distribution Extension, but it was given an entirely different definition (Term describing the status of the organism in the given area based on how frequently the species occurs.) and even a controlled vocabulary, none of which was done following the standards process in which Darwin Core is developed. This was a regrettable development brought on by a poor definition on the part of the term, and an interpretation of the term in a different context without reconciliation in the context of the standard. In this proposal, we are trying to correct the problem with the definition of the term in the standard.

For my part, I would be perfectly happy to help with the correction of downstream problems that arose with the Species Distribution Extension re-defining the term for its own purposes, but I think the only reasonable way forward with that is to use a different term in that extension, whether or not that term (or its recommended controlled vocabulary) is also incorporated into Darwin Core.

Though I feel strongly about all of this, I recognize my part in creating the problem to begin with. I also recognize that I have to put my role as convenor and mediator before my role as a stakeholder in the community, so if I am the only one that has the viewpoints I have expressed, I am happy to sequester my objections from consideration of achieving consensus and let the proposal move forward without them. Even so, I am not sure we have achieved a clear consensus. Thoughts?

@albenson-usgs

I would like to put forward my agreement with what @tucotuco has laid out and voice my support for occurrenceStatus being strictly related to occurrences, Organism and Taxon being conceptually distinct, and the definition of occurrenceStatus being amended to A statement about whether any Organism identified as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection. Given this, I agree that the best path forward for the Species Distribution Extension would be adoption of a different term in that extension.

@deepreef

I would like to try to summarize the controversy. I maintain that an Organism and a Taxon are conceptually distinct. One is the manifestation of a theory and the other a manifestation of biological entities. There is no continuum leading from one into the other and one is not semantically a subtype of the other, because theories are not living beings. Stated in another way, the attributes of a Taxon do not apply to an Organism, nor vice versa.

OK, I could quibble with this, but only in an effort to play Devil's advocate (to wit, both Organism and Taxon classes could be construed as conceptually identical in the sense of "circumscribed set of one or more living things"; the only real difference being the formality of the label). HOWEVER, in the context of DwC (not to mention "common sense"), I am in full support of what @tucotuco asserts above. My only concern is that a single instance of Organism can be scoped up to (and including) "every organism of a particular Taxon that ever has or ever will exist". In other words, the only upper bound to the scope of an instance of Organism is that all members remain "taxonomically homogeneous".

It isn't clear that there is agreement on this much.

I'm pretty sure I'm the only one who floated the idea that Organism and Taxon could be interpreted as existing on the same continuum, and if so, then my comments above should dispense with any perceived disagreement.

But there is more. Even if everyone accepts that the two classes are semantically distinct, I am of the opinion that the distinctness means that the term should not be a property of both classes.

I completely agree with this opinion.

If there is, it suggests that two distinct terms are necessary - one for Occurrence and one for Taxon.

I'm still unclear on why we need something like this to be a property of a Taxon instance. As long as we can define an Organism to include up to (and including) every member of a Taxon (as pointed out above), then we shouldn't have any problem capturing what most people think of as "Taxon-at-Location" instances in the form of Occurrence instances (which I view as Organism-at-Event instances; but could also be represented as Organism-at-Location instances). So, unless I've misunderstood previous comments (@nielsklazenga ?), I think there is consensus that occurrenceStatus can remain as a property within the Occurrence class (and not within the Taxon class).

Thoughts?

Like @albenson-usgs, I'm in agreement with your main points, and am likewise OK with A statement about whether any Organism identified as being a member of a given Taxon was detected during an Event that had any members of that Taxon as a target of detection. as the definition for dwc:occurrenceStatus. I'm a little queasy about the that had any members of that Taxon as a target of detection part. I understand what its intention is (i.e., to clarify the meaning/context of an "absence"), but read literally it implies that you can't have a value of "present" for any Organisms identified to a Taxon that were not among the targets of detection. Many are the times when I record an Organism as present at an Event, when I had no expectation of it occurring there/then (and, hence, it could not have been the "target" of detection). Such is the fodder for new records...

@timrobertson100
Member

timrobertson100 commented May 22, 2021

I think we need to be pragmatic at this point. Using GBIF as an example:

Occurrence record use

  1. 6,348 datasets covering 1,150,021,961 records populate this field with a value
  2. All records are interpreted to present/absent only (inferred from other metadata when necessary) following the latest GBIF vocabulary

Species distribution extension use

  1. 352 datasets use distribution extensions with occurrence_status populated (covering 3,859,858 name_usage)
  2. 38 datasets use distributions with occurrence_status populated with values that are not PRESENT/ABSENT (covering 89,036 name_usage).

The values are according to the GBIF vocabulary from 2009 and populated as:

 occurrence_status | name_usages  
-------------------+------------
 PRESENT           |  3757777
 COMMON            |     3399
 RARE              |     3019
 IRREGULAR         |     7619
 DOUBTFUL          |    69735
 EXCLUDED          |     5264
 ABSENT            |    13045
 NULL              |  1022671
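
As a quick cross-check, the populated rows of the table can be tallied against the two totals quoted above. This is only a sketch: the counts are copied from the table, while the names `populated_counts` and `tally` are made up for illustration. The NULL row is left out because it is not a populated value.

```python
# Counts copied from the table above; NULL is excluded because it is not populated.
populated_counts = {
    "PRESENT": 3757777,
    "COMMON": 3399,
    "RARE": 3019,
    "IRREGULAR": 7619,
    "DOUBTFUL": 69735,
    "EXCLUDED": 5264,
    "ABSENT": 13045,
}


def tally(counts, binary=("PRESENT", "ABSENT")):
    """Return (total populated, total outside the present/absent vocabulary)."""
    total = sum(counts.values())
    non_binary = sum(n for status, n in counts.items() if status not in binary)
    return total, non_binary


total, non_binary = tally(populated_counts)
print(total, non_binary)
```

Both sums agree with the figures quoted in the list: 3,859,858 populated name usages, of which 89,036 (roughly 2.3%) carry a value other than PRESENT or ABSENT.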

For pragmatic reasons (it's been used for a decade), GBIF continues to use occurrenceStatus as the field name in the species distribution extension, but calls the vocabulary distributionStatus. When a new edition is created, distributionStatus will be considered as the term name. This decision was taken 1) to avoid breaking existing datasets, and 2) because the GBIF checklist indexing is maintained but not significantly developed at the moment, as we focus on the Catalogue of Life infrastructure (i.e. resource constraints).

Suggestion for consideration

Maintain the existing phrasing of "taxon at a location" as it covers both Occurrence and is equally suitable for a species distribution use.

A pragmatic solution, allowing Darwin Core to be as reusable as possible could be to refine the usage comments to say:

Recommended best practice is to use a controlled vocabulary. When using this with an Occurrence the default vocabulary is recommended to be "present/absent" but can be extended by implementers with good justification.

@tucotuco
Member Author

Though the pragmatic solution proposed by @timrobertson100 takes occurrenceStatus in the exact opposite direction to the proposed changes in this issue, it would be a non-normative change with no bearing on existing implementations, and could be adopted instead of the original proposal without need for public review.

@deepreef

If all are fine with the non-normative change proposed by @timrobertson100, I can live with that solution as well.

However, I am very strongly in favor of the original direction @tucotuco had been pushing this. I desperately hope we (TDWG community) are moving in the direction of @tucotuco's "bigger dream", and that following the proposal of @timrobertson100 now is understood to be only a pragmatic solution to simplify the path immediately in front of us, and that this issue will need to be resolved more robustly in the (not-too-distant) future. Moreover, I hope that this pragmatic solution does not preclude @tucotuco's suggestion for (and my endorsement of) forming a task group to come up with a more appropriate and robust solution.

That's my short response.

Much of my long response was already written, but lost when I closed the browser before clicking the "Comment" button yesterday. As I indicated in my previous post, a lot of it relates to the idea of evidence-based checklists, and occurrences being the inversion of existing taxonomic checklists.

At the beginning of DwC, the predecessor of what is now called Occurrence was essentially the concept of specimen-in-collection (i.e., the original point was to be able to share and aggregate data about specimens in museum collections in a standardized and consistent way). At some stage early on in the formation of the standard, the community decided the lowest-hanging fruit of scientific interest represented by museum specimens was as a source of information for Taxon-at-Location. Once that bridge had been crossed (and perhaps also the main motivation for crossing it), additional records of unvouchered instances of Taxon-at-Location in the form of Observations came into scope, and the notion of Occurrence was formalized.

That transformation from specimen-in-collection to Taxon-at-Location was a fundamental and important step forward for DwC. However, it was only one step on a longer journey. As a community, I'd like to think that we're ready to make progress on the next step, which is an absolute prerequisite to realizing @tucotuco 's (and my, and many others') "bigger dream" (DwC as a true information model for biodiversity that can serve a much broader array of scientific uses). That next step is to formalize the transition of Occurrence from Taxon-at-Location to Organism-at-Event.

This discussion had really given me hope that we had achieved critical mass to take that step. But perhaps we're not quite there yet. If we can follow through with a Task group (perhaps integrated with the proposed task group for #314), then I have at least some hope for keeping the dream alive.

@tucotuco : This issue might not have broken the camel's back (yet), but I hope at least the camel is in serious need of the services of a chiropractor.

@nielsklazenga
Member

Just to make myself really clear, I just want a property that I can use on taxon-area statements; it does not have to be occurrenceStatus. I totally accept that, if there are going to be two terms, occurrenceStatus goes with the presence/absence observations, which is the way it has been used much more frequently, including by myself. I would be happy with the distributionStatus mentioned by @timrobertson100.

There is an analogy with the "establishment means" terms here as well, as establishmentMeans as it is used now might better be called 'originStatus', whereas 'establishmentMeans' would better go with vector (this was indicated in the proposal at the time, so not some other problem I "discovered").

I think this is still a simple choice between one or two terms. I fail to see how the data model comes into it, or how anything @timrobertson100 suggested goes against the "bigger dream" of "Darwin core as a true information model for biodiversity", although I am not sure what 'true' means here.

@deepreef

I fail to see how the data model comes into it, or how anything @timrobertson100 suggested goes against the "bigger dream" of "Darwin core as a true information model for biodiversity"

Not to belabor, but what I meant was that we come to a clear agreement on what actual entity is meant for an instance of the Occurrence class. At the moment, we do not have a clean working definition of what Occurrence instances represent (the definition per se is not so much the problem, but rather how records are actually presented by content providers). Some content providers treat each unique instance of an Occurrence as a "specimen" (=instance of MaterialSample). Some treat it as an instance of a Taxon-at-Location (e.g., "taxon-area"). Some treat it as an instance of Organism-at-Location. Some treat it as an instance of Organism-at-Event. While this might seem like semantic technicalities, semantics matter (especially for the semantic web).

This sort of imprecision is tolerable so long as DwC is a "bag of terms" that are merely "organized" in classes. What I meant by the "bigger dream" of "Darwin core as a true information model for biodiversity" is that each DwC class can achieve a precise definition in the context of an information model, and each term can be clearly mapped as a property of instances in exactly one of those precisely-defined classes.

For example, right now, a unique occurrenceID value might identify an instance representing a unique combination of taxonID+locationID, or it might identify an instance representing a unique combination of organismID+locationID, or it might identify an instance representing a unique combination of organismID+eventID, or it might represent some sort of "evidence" supporting any one of those things.

That degree of ambiguity in the conceptual entity represented by an instance uniquely identified by occurrenceID is one of the things that currently limits the power of information shared through the DwC standard.
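
To make the ambiguity concrete, here is a minimal sketch that lists which of the identifier combinations mentioned above a given record could be read as. The term names (taxonID, organismID, locationID, eventID, occurrenceID) are Darwin Core terms; the function name `candidate_interpretations`, the interpretation labels, and the sample identifier values are assumptions made up for illustration, not part of the standard.

```python
# Hypothetical mapping from an interpretation label to the pair of
# identifiers that would make an Occurrence instance unique under it.
INTERPRETATIONS = {
    "Taxon-at-Location": ("taxonID", "locationID"),
    "Organism-at-Location": ("organismID", "locationID"),
    "Organism-at-Event": ("organismID", "eventID"),
}


def candidate_interpretations(record):
    """List every interpretation whose identifiers are all populated in the record."""
    return [
        label
        for label, keys in INTERPRETATIONS.items()
        if all(record.get(key) for key in keys)
    ]


# Illustrative records; the identifier values are invented.
full_record = {
    "occurrenceID": "occ-1",
    "taxonID": "tax-9",
    "organismID": "org-4",
    "locationID": "loc-2",
    "eventID": "evt-7",
}
checklist_like = {"occurrenceID": "occ-2", "taxonID": "tax-9", "locationID": "loc-2"}

print(candidate_interpretations(full_record))
print(candidate_interpretations(checklist_like))
```

A record that populates all four identifiers is compatible with every reading, so the occurrenceID alone does not tell a consumer which entity it identifies.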

@nielsklazenga
Member

I get that we need to distinguish between the primary occurrence data and the distribution data that is inferred from it, but what has it got to do with occurrenceStatus?

@deepreef

I get that we need to distinguish between the primary occurrence data and the distribution data that is inferred from it, but what has it got to do with occurrenceStatus?

From my perspective: it's not helpful to have a single term serve to represent a property of more than one class of "thing". As @tucotuco outlined, this particular term has a mixed interpretation (both through its definition history and its actual usage history) for application to both "primary" Occurrence data and taxon distribution data. I think the goal would be to clarify its definition and usage to be more consistent to a single purpose.

@nielsklazenga
Member

I have already said that I am fine with two terms. My understanding of the nature of absence data is not enough to argue one way or another. It seems to me though that some people who want to make the change are uncomfortable with it. You cannot bring the data model into it though, as then you would have to do the same for several other terms.

Some content providers treat each unique instance of an Occurrence as a "specimen" (=instance of MaterialSample)

A "specimen" is an instance of PreservedSpecimen and/or Occurrence, not MaterialSample (see ABCD equivalent in https://dwc.tdwg.org/list/#dwc_Occurrence). It is not occurrenceStatus that is going to break the camel's back, but MaterialSample.

From my perspective: it's not helpful to have a single term serve to represent a property of more than one class of "thing".

That is diametrically opposite to the Standard Maintenance Specification, as well as circular, as you can always create a superclass that a property uniquely applies to.

@deepreef

I guess we'll just have to agree to disagree on some of these things. But I definitely agree with this:

It is not occurrenceStatus that is going to break the camel's back, but MaterialSample.

Keeping in mind, of course, that the metaphorical broken camel's back is a good thing (i.e., progress...), in this context.

@mdoering
Contributor

Hoping not to cause yet more trouble, may I ask how an absence occurrence should be interpreted if it contains further properties than just a location, time and "subject"?

If the Occurrence specifies a chicken with sex=female and basisOfRecord=PreservedSpecimen, does that mean there can be male chicken specimens around? It would make absence data very hard to use. Should that be something to add to comments?

@nielsklazenga
Member

PreservedSpecimens are always present. A female chicken might be absent if it has an organismID. There might even be other female chickens around.

@deepreef

Just to elaborate a bit, the underlying assumption is that when an instance of PreservedSpecimen is represented (inappropriately, in my view, but we'll break that camel's back another time) as an instance of Occurrence, the implication is that the Occurrence is tied to the Event at which the associated Organism (or derivative of an Organism) was extracted from nature. As such, I completely agree with @nielsklazenga that a PreservedSpecimen always implies present in the context of an Occurrence, and any records that combine basisOfRecord=PreservedSpecimen with occurrenceStatus=absent are logically inconsistent (and should be flagged as such).

I also fully agree that a female chicken could be absent, either as a specific Organism, or in the general sense ("we only saw male chickens"). One slight twist to this, though: I always associate sex as a property of an Occurrence, not of an Organism, because so many organisms change sex throughout their lifetime. Thus, even though birds do not change sex, I still wouldn't associate sex=female with any specific organismID. But that's a subtle issue outside the context of the question that @mdoering posed.
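
The flagging idea above could be sketched roughly as follows. This assumes records are plain dictionaries keyed by Darwin Core term names; the function name `flag_inconsistent` and the sample records are hypothetical, and the rule encodes only the view expressed in this comment, not an agreed validation rule of the standard.

```python
def flag_inconsistent(records):
    """Return occurrenceIDs of records that pair PreservedSpecimen with absent."""
    flagged = []
    for rec in records:
        # Compare case-insensitively, since publishers vary in capitalization.
        basis = (rec.get("basisOfRecord") or "").strip().lower()
        status = (rec.get("occurrenceStatus") or "").strip().lower()
        if basis == "preservedspecimen" and status == "absent":
            flagged.append(rec.get("occurrenceID"))
    return flagged


# Invented sample records for illustration only.
records = [
    {"occurrenceID": "occ-1", "basisOfRecord": "PreservedSpecimen", "occurrenceStatus": "present"},
    {"occurrenceID": "occ-2", "basisOfRecord": "PreservedSpecimen", "occurrenceStatus": "absent"},
    {"occurrenceID": "occ-3", "basisOfRecord": "HumanObservation", "occurrenceStatus": "absent"},
]

flagged = flag_inconsistent(records)
print(flagged)
```

Only the second record is flagged; an absence reported from an observation is left alone.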

@deepreef

deepreef commented May 25, 2021

@timrobertson100 removed duplicate post

@tucotuco
Member Author

...and one more minor elaboration. A preserved specimen that is no longer in a collection can be indicated using the term disposition. That is a different kind of absence, not in the Occurrence, but in the material in a physical collection.

@deepreef

That is a different kind of absence, not in the Occurrence, but in the material in a physical collection.

So... before I had a better idea of the distinction between MaterialSample and Organism, we actually went through an experimental phase where we tracked the physical/temporal location of specimens using Occurrence records. We start with the idea of an Organism in nature, encountered by another Organism in nature (two Occurrences at the same Event). In the vast majority of cases, one of those Organism instances is reliably identifiable to Homo sapiens sec. Wilson & Reeder 2005 (our go-to accordingTo for that particular species). That Organism performs some sort of action that extracts (and usually leads to the demise of) the other Organism, at that particular Occurrence instance. This extracted Organism then participates in another Occurrence instance, at the time and place where the human Organism prepares and photographs the extracted Organism (e.g., back at the field station). As the extracted Organism continues to move through space and time (arriving at a Museum, being unpacked and curated, stored in a jar on a shelf in a room of a building somewhere, later taken off the shelf and moved to a laboratory for examination or subsequent imaging or whatever, then perhaps transported to a different Museum on loan, or being subsampled, etc., etc., etc.) ... each notable moment in the space-time trajectory of that extracted Organism is documented in the form of an Occurrence instance.

The concept actually worked extremely elegantly as a way of tracking a specimen through space and time -- no different than, say, a satellite tag affixed to a cougar or whale or shark or something. Each Organism accumulates potentially dozens of Occurrence instances, only one of which represents the point of extraction from nature (which most DwC content providers are focused on as the only Occurrence of note for the Organism). In this context, one could indeed score the "absence" of said Organism within a collection (e.g., when an inventory is done and the specimen is missing or disintegrated).

Both of these weird ways of looking at things:

  1. the "collector" and the "collected" representing two Occurrence instances at the same Event, the former performing the action of collecting, and the latter performing the action of being collected; and
  2. tracking a PreservedSpecimen through space and time using Occurrence instances

dramatically simplified the data model and provided some very cool ways of parsing and representing patterns in the data.

However, of the two, I think the first still has merit, but the second breaks down when we clarify the distinction between Organism and MaterialSample.

Again, I digress...

@timrobertson100
Member

timrobertson100 commented May 26, 2021

We see two cases where basisOfRecord = preservedSpecimen and occurrenceStatus = absent are combined in the GBIF publishing community.

  1. Where the remains of a species are collected (e.g. bones)
  2. Where environmental material (e.g. water, soil) is preserved as the evidence supporting the claim that the species does not exist there (e.g. some sequencing and clustering)

materialSample is likely a better option for 2. but due to the loose nature of DwC documentation it's understandable that someone assumes specimen may refer to environmental material.

Should that be something to add to comments?

I agree with @mdoering that expanding the comments to guide users when combining other terms is needed. We've identified disposition, basisOfRecord, and trait-related terms (sex, reproductiveCondition and we need to add lifeStage). I'd suggest we also cover those relating to measurements (e.g. individualCount), behaviour and occurrenceRemarks, which could lead to confusing records.

(Aside - I feel much of the discussion on this thread belongs elsewhere to attract the visibility it deserves. There are good remarks that will help DwC but this thread should focus on comments, and ideally proposals, that help refine occurrenceStatus only)

@albenson-usgs

For case 1, the "How did it die?" Task Group and a possible new term, vitality, may help: tdwg/how-did-it-die#1.

this thread belongs elsewhere to attract the visibility it deserves

As a relative newcomer to TDWG, where is that exactly? I'm not trying to be difficult and I don't disagree that this has veered far from the original purpose. I've just had a hard time figuring out where the core discussions happen around the standard. Some discussions seem to happen in the GBIF GitHub, some discussions in the OBIS GitHub, some (but fewer) discussions in the TDWG Q & A repo, very few on the listserv (or maybe I'm not on the right one). Actually, the most discussion about the standard I've seen is on issues where changes are proposed. Where should this discussion happen to get the most visibility?

@baskaufs

@albenson-usgs This sort of discussion used to happen on the tdwg-content email list, and that list was once considered the "official" place to discuss issues related to proposed changes to standards. However, there were complaints that long threads like this one overwhelmed people's inboxes and it was suggested that they would be better documented in issue tracker comments like this, which also allow people to opt in or out of following them.

I think that using GitHub issue comments like this is a significant improvement over using the email list. If I am too busy on a given day to read everything that comes in as notifications, at a later time I can just go to the issue tracker page and scroll through the thread.

However, I agree that there are problems with the system once discussions veer away from direct comments about the proposal at hand. To some extent, that should probably be handled by opening a new issue more directly related to the other issues being discussed. But that still has the problem that discussions only include people who are paying attention to that repo or who are tagged in the thread. For example, I don't watch either the OBIS or GBIF repos, so I am unaware of discussions taking place there.

I would like to see some more attention be paid to how people can opt in to a more general system for discussion. That used to be tdwg-content, but with Slack, Twitter, the TDWG newsletter, and GitHub issues, the platforms that are used for communication are much more diffuse, with many people preferentially following some but not all of the venues. Despite the negative aspects of tdwg-content that I mentioned, it did have the advantage that everyone knew about it, everyone had email, and anyone could sign up for it. That isn't really the case for GitHub, Twitter, and Slack, which all require signing up for an account. I'm actually not sure how one gets subscribed to the TDWG newsletter. I get it, but don't know how I got subscribed, and one can't just post to it as one can in the other platforms, so it isn't really a discussion venue.

@jdpye
Member

jdpye commented May 26, 2021

I really appreciate the rigor with which you have all been hammering out this change. I am very strongly supportive of how this change season has been put on here in the GitHub Issues for all to engage with and I thank you all for putting this crucial work in. I've been reading this thread specifically, to see where consensus might emerge, and whether I can still see a fit for my use case in the proposed solutions. That use case is basically: An animal acoustically detected and identified to the individual level by an electronic code recorded at a series of listening stations around the world. (MOTUS will have a similar model on the bird side of things)

Since these listening stations can sometimes decode random noise into a numeric code, and that code can match to a tag deployed on another animal somewhere in the world, there are filters that researchers apply to assess the likelihood that any given ping is truly of their tagged individual (who has for the sake of my use case, been identified taxonomically and issued an organismID at the point of their being tagged).

Since those filters are often not conclusive, I've got a range of possibilities when reporting (or not reporting) these detections that are flagged by filters. The occurrenceStatus field, if not one of the ways I should accomplish this reporting of imprecision (though not temporal, spatial, or taxonomic imprecision!), is at least a place where I have to make absolutely sure I don't mislead anyone. I wasn't sure how to apply the originally proposed changes to my niche example, and I figure there are possibly other ways to make my records speak clearly about false detections, but if there's a task team coming together and looking for edge cases to explore, ✋ .

@tucotuco
Member Author

@jdpye Fantastic eye-opening use case for Organisms (individuals even) that may or may not be there, and why you would care to say something if you weren't sure they were there. This suggests that the controlled vocabulary proposal (#342) is insufficient even without the issues about distinctions between Organism and Taxon, and the limits on the scope of Organism. This also supports the need for a Task Group.

@tucotuco
Member Author

this thread belongs elsewhere to attract the visibility it deserves

As a relative newcomer to TDWG, where is that exactly? I'm not trying to be difficult and I don't disagree that this has veered far from the original purpose. I've just had a hard time figuring out where the core discussions happen around the standard. Some discussions seem to happen in the GBIF GitHub, some discussions in the OBIS GitHub, some (but fewer) discussions in the TDWG Q & A repo, very few on the listserv (or maybe I'm not on the right one). Actually, the most discussion about the standard I've seen is on issues where changes are proposed. Where should this discussion happen to get the most visibility?

The TDWG Darwin Core Q&A repository (and associated Darwin Core Hour of live seminars) was developed exactly for the purpose you described - for uncertainties about the standard to generate discussions that can be summarized in answers, documented, and inform changes (if necessary, and embodied in change requests here in this repository) to the standard.

@tucotuco
Member Author

At this point in the review process, my assessment is that we will not achieve consensus on this issue as proposed. A more comprehensive solution (potentially involving two terms and their respective vocabularies) will be required and that is a perfect job for a task group.
Even so, I believe we can achieve consensus on an alternative, non-normative change proposed by @timrobertson100. I would appreciate it if we could assess that fully in the hopes of including it in this round of changes. To that end I have created a new term change issue. Please comment there on that specific proposal.

@Jegelewicz

I am very strongly in favor of the original direction @tucotuco had been pushing this. I desperately hope we (TDWG community) are moving in the direction of @tucotuco's "bigger dream", and that following the proposal of @timrobertson100 now is understood to be only a pragmatic solution to simplify the path immediately in front of us, and that this issue will need to be resolved more robustly in the (not-too-distant) future.

I am tired of kicking cans on everything and allowing issues to pile up. I don't really have massive amounts of free time to devote to all of this, but I think it needs to be addressed sooner rather than later.

@tucotuco
Member Author

@Jegelewicz I understand the frustration, but we are following a community-defined and accepted practice and consensus is a part of that process. We can seek consensus, but we can't force it. The reason for deferring the proposal and recommending the task group for it is so it doesn't hold up the 39 other proposals in this massive cleanup effort for which there is consensus.

@tucotuco
Member Author

tucotuco commented Jun 2, 2021

This proposal has been labeled as 'Controversial' and in need of a task group for resolution. It is no longer part of an active public review.

@tucotuco tucotuco removed this from the Public Review 2021-05-01 milestone Aug 25, 2021