diversity(): dealing with NAs #187

Closed
baxter-jeremy opened this issue Aug 2, 2016 · 4 comments

baxter-jeremy commented Aug 2, 2016

Hi,

I am very new to diversity/ecological statistical analysis (day 1, in fact!). Thank you for a very useful package and its documentation. A quick comment/observation (given my lack of ecological experience, I am not sure whether this is intended behaviour): consider a data frame named thedf with counts of various species (as columns), where each row is a particular site. Then
diversity(thedf)
returns the Shannon diversity index for each row. However, a particular site on a particular day (i.e. a row of the data frame) might be lost or missing for some reason, so the original data would correctly code that entire row as NA. For such a row, diversity() effectively computes
apply(-x * log(x, exp(1)), MARGIN = 1, sum, na.rm = TRUE)
where x <- sweep(thedf, 1, apply(thedf, 1, sum), "/")
and because na.rm = TRUE drops every NA term (and the sum of nothing is 0), the result is 0 (zero).
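A minimal base-R reproduction of the behaviour described above. The matrix, site names and species names are made up for illustration, and the calculation mirrors the one quoted in this issue rather than vegan's exact source:

```r
# Hypothetical data: 2 sites (rows) x 3 species (columns).
# site2 was not sampled, so its entire row is NA.
thedf <- rbind(site1 = c(sp1 = 10, sp2 = 5, sp3 = 1),
               site2 = c(sp1 = NA, sp2 = NA, sp3 = NA))

# Row-wise proportions; the row total for site2 is NA, so its proportions are NA
x <- sweep(thedf, 1, apply(thedf, 1, sum), "/")

# Shannon terms -p * log(p). na.rm = TRUE discards every NA term,
# and summing an empty set gives 0, so site2 ends up with diversity 0.
H <- apply(-x * log(x, exp(1)), MARGIN = 1, sum, na.rm = TRUE)
print(H)
```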
Shouldn't diversity() return NA instead? Or should there at least be a warning, e.g.:

theNAindicies <- which(is.na(rowSums(thedf)))
if (length(theNAindicies) > 0) { warning("rows of missing data") }

Thank you.
Jeremy

Contributor

jarioksa commented Aug 4, 2016

diversity() is a pretty simple function that does not check its input; that is left as the user's responsibility. For instance, it does not even check that the input is non-negative. Giving zero diversity for NA abundances does not look too bad. It would not be too complicated to fix both of these features (reject negative input, give NA if an observation has any NA), but this can make diversity() slower, and the function can be called millions of times in simulations. Got to see this.
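A sketch of the two checks mentioned above (reject negative input, return NA for any site containing an NA). shannon_checked() is a hypothetical standalone base-R function, not vegan's code, and the fix that was actually committed may differ:

```r
# Hypothetical helper (not part of vegan): Shannon index for each row (site)
# that rejects negative counts and returns NA for any row containing an NA.
shannon_checked <- function(x, base = exp(1)) {
  x <- as.matrix(x)
  # Check 1: abundances must be non-negative (NAs are ignored in this test)
  if (any(x < 0, na.rm = TRUE))
    stop("input data must be non-negative")
  # rowSums() without na.rm returns NA for any row containing an NA
  p <- sweep(x, 1, rowSums(x), "/")
  terms <- -p * log(p, base)
  # Absent species give 0 * log(0) = NaN; define that term as 0.
  # which() skips NA comparisons, so NA rows are left untouched.
  terms[which(p == 0)] <- 0
  # Check 2: no na.rm here, so an NA row stays NA instead of becoming 0.
  # (An all-zero row gives 0/0 = NaN; this sketch does not special-case it.)
  rowSums(terms)
}

m <- rbind(c(10, 5, 1), c(NA, 2, 2))
print(shannon_checked(m))   # second element is NA
```

Dropping na.rm from the final row sum is what lets a missing value propagate to the result instead of silently collapsing to 0.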

jarioksa pushed a commit that referenced this issue Aug 4, 2016
used to give diversity=0 if any observations were NA. Reported as
issue #187 in github
gavinsimpson commented
Contributor

@jarioksa Both changes sound useful from a user's point of view. If such checks slow this down to the extent that they negatively impact simulations, it sounds like we need a diversity.fit() function that does the actual diversity calculations on known-good data, and that we make diversity() more of a user-facing function (all in the sense of lm() and lm.fit()).
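One way the proposed lm()/lm.fit()-style split could look, as a hypothetical sketch: diversity_fit() and diversity_checked() are illustrative names, not vegan functions, and only the Shannon index is covered:

```r
# Hypothetical sketch (not vegan's API) of an lm()/lm.fit()-style split.

# Fast core: assumes a numeric matrix with no NAs and no negative values,
# so it does no checking at all. Shannon index only.
diversity_fit <- function(x, base = exp(1)) {
  p <- x / rowSums(x)              # recycling divides row i by its total
  terms <- -p * log(p, base)
  terms[which(p == 0)] <- 0        # absent species: 0 * log(0) taken as 0
  rowSums(terms)
}

# User-facing wrapper: validates the input once, gives NA to incomplete
# rows, and passes only the complete rows to the core.
diversity_checked <- function(x, base = exp(1)) {
  x <- as.matrix(x)
  if (any(x < 0, na.rm = TRUE))
    stop("input data must be non-negative")
  ok <- stats::complete.cases(x)
  H <- rep(NA_real_, nrow(x))
  names(H) <- rownames(x)
  H[ok] <- diversity_fit(x[ok, , drop = FALSE], base)
  H
}

m <- rbind(a = c(10, 5, 1), b = c(NA, 2, 2), c = c(3, 0, 3))
print(diversity_checked(m))   # b is NA
```

Simulation code that already guarantees clean input could call the core directly and skip the checks entirely.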

Contributor

jarioksa commented Aug 4, 2016

microbenchmark showed no consistent difference on moderately sized data sets (BCI, Oribatid mites). I haven't merged this yet, but it seemed to work for both single-site and multi-site data sets.

jarioksa self-assigned this Aug 4, 2016
jarioksa pushed a commit that referenced this issue Aug 5, 2016
Contributor

jarioksa commented Aug 5, 2016

Solved with commit 014b250. This commit also checks that data are non-negative.

jarioksa closed this as completed Aug 5, 2016
jarioksa added this to the 2.4-1 milestone Aug 22, 2016
jarioksa pushed a commit that referenced this issue Aug 30, 2016
used to give diversity=0 if any observations were NA. Reported as
issue #187 in github

(cherry picked from commit 5859e3a)