behaviour of rrarefy() #144

FabianRoger · 2015-10-05T12:50:08Z

Hello,

I use vegane::rarefy to rarefy my community datasets. Because I have a lot of samples and and the sampling depth (it's sequencing data) is quite uneven, I choose to rarefy at depth X and exclude samples with N < X.

however, the function attempts to rarefy each community and throws the error cannot take a sample larger than the population when replace = FALSE

Therefore I always have to manually exclude those samples first. I was wondering if it wouldn't be possible to handle this internally? i.e exclude all samples with N < X and issue a warning along the lines of WARNING: the following samples had less counts than the chosen rarefaction depth and where therefore excluded from the dataset: sample[1], sample[6] (...)

If you agree that this behaviour would be desirable, I could also rewrite the function and submit it as a pull request.

thanks for providing this great resource!

best

Fabian

The text was updated successfully, but these errors were encountered:

jarioksa · 2015-10-05T13:22:31Z

The sample argument can be a vector, and this allows handling the problem:

data(mite)
rrarefy(mite, pmin(rowSums(mite), 100))

I would consider adding alternative tricks, but they should be well thought and designed. For instance, how would you align results with other variables after clandestine removal of some unspecified rows?

A simple thing, and an improvement to the current policy, is to return non-rarefied community (with a warning) if sample is larger than the row sum.

FabianRoger · 2015-10-06T09:41:33Z

I agree that silently removing samples is inadmissible and I am probably not familiar enough with the vegans universe to judge all the ramifications. If I get it right, rrarefy(mite, pmin(rowSums(mite), 100)) does what you suggest next,

to return non-rarefied community (with a warning) if sample is larger than the row sum.

although without the warning. I maintain though that it could be a legit choice to exclude samples that have (way) fewer individuals (as opposed to keep them non-rarefied).

Would you consider including an option as excludeSamples = FALSE ? So that if excludeSamples=FALSE rrarefy would return the unrarefied samples with a warning and if excludeSamples=TRUE it would exclude the samples with a waring that also points out which samples where excluded ?

jarioksa · 2015-10-06T11:21:35Z

I'd rather have an option to turn these rows into NA. I don't like the idea of deleting rows of data. It will cause trouble and mystic errors to innocent users. So will NA rows, but then they'll know there is a problem and they'll know where the problem is. Warnings will be temporary: they are issued when you generate the data, but lost afterwards with their information. Adding a vector of omitted rows as a new attribute to the data does not sound as a good user interface.

FabianRoger · 2015-10-06T11:39:24Z

Good point about the warnings. I was thinking of setting the option to FALSE by default. So any user who wants the rows to be gone has to make a conscious choice (and hopefully remembers his choice). If the rows are set to NA the user has to remove them manually anyway so little is gained. But it's of cause up to you, I just wanted to let you know how I think about it!

thanks for taking the feedback seriously and thanks again for providing this resource to the community!

jarioksa · 2015-10-06T12:29:44Z

This would call for four alternative strategies:

"fail": the current behaviour of stopping with error.
"asis": return non-rarefied community.
"NA": return NA row (or NA in place of >0 values)
"remove": remove lines (your suggestion).

jarioksa · 2015-10-06T14:07:51Z

Commit f5ff44b in branch rrarefy suggests an implementation of all four alternatives I outlined above (not yet documented). The design must be discussed. One side effect is that the behaviour is now different from other functions: rarefy and drarefy use always "asis" strategy: rarefy gives a warning, though, but drarefy just returns sampling probability = 1 for every species present in sites for which total < sample.

jarioksa · 2015-10-06T17:06:02Z

I really don't like the change in f5ff44b: it adds unnecessary complexity to the function, produces confusing output and makes it inconsistent with other rarefy functions. If you only want to remove sites which cannot be rarefied, you can achieve this very easily with

rrarefy(mite[rowSums(mite) >= 100,], 100)

without need of building complicated branches in functions.

However, the current behaviour of rrarefy is inconsistent: both rarefy and drarefy work when the number of individuals is lower than sample, and they will return non-rarefied community. I suggest another fix 328fc83 that removes this inconsistency. This will leave as user responsibility to detect and handle those cases if so desired.

jarioksa · 2015-10-08T05:56:03Z

PR #145 addresses this issue. It is based on the minimal change where rrarefy will work when sample > population and return the non-rarefied community with a warning. Along with this, drarefy will also issue a warning in this case. After this rarefy, drarefy and rrarefy behave consistently.

I won't implement the fix of f5ff44b: no vegan function should work so that random rows are removed in x <- foo(x). If you want to have this behaviour, you can use your own wrapper function:

rrarewithdrop <- 
    function(x, sample) 
{
    rrarefy(x[rowSums(x) >= sample, , drop = FALSE], sample)
}

jarioksa mentioned this issue Oct 8, 2015

rrarefy behaviour #145

Merged

jarioksa closed this as completed Oct 8, 2015

jarioksa pushed a commit that referenced this issue Nov 13, 2015

add ref to issue #144 in NEWS

05da200

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

behaviour of rrarefy() #144

behaviour of rrarefy() #144

FabianRoger commented Oct 5, 2015

jarioksa commented Oct 5, 2015

FabianRoger commented Oct 6, 2015

jarioksa commented Oct 6, 2015

FabianRoger commented Oct 6, 2015

jarioksa commented Oct 6, 2015

jarioksa commented Oct 6, 2015

jarioksa commented Oct 6, 2015

jarioksa commented Oct 8, 2015

behaviour of rrarefy() #144

behaviour of rrarefy() #144

Comments

FabianRoger commented Oct 5, 2015

jarioksa commented Oct 5, 2015

FabianRoger commented Oct 6, 2015

jarioksa commented Oct 6, 2015

FabianRoger commented Oct 6, 2015

jarioksa commented Oct 6, 2015

jarioksa commented Oct 6, 2015

jarioksa commented Oct 6, 2015

jarioksa commented Oct 8, 2015