New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

behaviour of rrarefy() #144

Closed
FabianRoger opened this Issue Oct 5, 2015 · 8 comments

Comments

Projects
None yet
2 participants
@FabianRoger

FabianRoger commented Oct 5, 2015

Hello,

I use vegane::rarefy to rarefy my community datasets. Because I have a lot of samples and and the sampling depth (it's sequencing data) is quite uneven, I choose to rarefy at depth X and exclude samples with N < X.

however, the function attempts to rarefy each community and throws the error cannot take a sample larger than the population when replace = FALSE

Therefore I always have to manually exclude those samples first. I was wondering if it wouldn't be possible to handle this internally? i.e exclude all samples with N < X and issue a warning along the lines of WARNING: the following samples had less counts than the chosen rarefaction depth and where therefore excluded from the dataset: sample[1], sample[6] (...)

If you agree that this behaviour would be desirable, I could also rewrite the function and submit it as a pull request.

thanks for providing this great resource!

best

Fabian

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Oct 5, 2015

Contributor

The sample argument can be a vector, and this allows handling the problem:

data(mite)
rrarefy(mite, pmin(rowSums(mite), 100))

I would consider adding alternative tricks, but they should be well thought and designed. For instance, how would you align results with other variables after clandestine removal of some unspecified rows?

A simple thing, and an improvement to the current policy, is to return non-rarefied community (with a warning) if sample is larger than the row sum.

Contributor

jarioksa commented Oct 5, 2015

The sample argument can be a vector, and this allows handling the problem:

data(mite)
rrarefy(mite, pmin(rowSums(mite), 100))

I would consider adding alternative tricks, but they should be well thought and designed. For instance, how would you align results with other variables after clandestine removal of some unspecified rows?

A simple thing, and an improvement to the current policy, is to return non-rarefied community (with a warning) if sample is larger than the row sum.

@FabianRoger

This comment has been minimized.

Show comment
Hide comment
@FabianRoger

FabianRoger Oct 6, 2015

I agree that silently removing samples is inadmissible and I am probably not familiar enough with the vegans universe to judge all the ramifications. If I get it right, rrarefy(mite, pmin(rowSums(mite), 100)) does what you suggest next,

to return non-rarefied community (with a warning) if sample is larger than the row sum.

although without the warning. I maintain though that it could be a legit choice to exclude samples that have (way) fewer individuals (as opposed to keep them non-rarefied).

Would you consider including an option as excludeSamples = FALSE ? So that if excludeSamples=FALSE rrarefy would return the unrarefied samples with a warning and if excludeSamples=TRUE it would exclude the samples with a waring that also points out which samples where excluded ?

FabianRoger commented Oct 6, 2015

I agree that silently removing samples is inadmissible and I am probably not familiar enough with the vegans universe to judge all the ramifications. If I get it right, rrarefy(mite, pmin(rowSums(mite), 100)) does what you suggest next,

to return non-rarefied community (with a warning) if sample is larger than the row sum.

although without the warning. I maintain though that it could be a legit choice to exclude samples that have (way) fewer individuals (as opposed to keep them non-rarefied).

Would you consider including an option as excludeSamples = FALSE ? So that if excludeSamples=FALSE rrarefy would return the unrarefied samples with a warning and if excludeSamples=TRUE it would exclude the samples with a waring that also points out which samples where excluded ?

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Oct 6, 2015

Contributor

I'd rather have an option to turn these rows into NA. I don't like the idea of deleting rows of data. It will cause trouble and mystic errors to innocent users. So will NA rows, but then they'll know there is a problem and they'll know where the problem is. Warnings will be temporary: they are issued when you generate the data, but lost afterwards with their information. Adding a vector of omitted rows as a new attribute to the data does not sound as a good user interface.

Contributor

jarioksa commented Oct 6, 2015

I'd rather have an option to turn these rows into NA. I don't like the idea of deleting rows of data. It will cause trouble and mystic errors to innocent users. So will NA rows, but then they'll know there is a problem and they'll know where the problem is. Warnings will be temporary: they are issued when you generate the data, but lost afterwards with their information. Adding a vector of omitted rows as a new attribute to the data does not sound as a good user interface.

@FabianRoger

This comment has been minimized.

Show comment
Hide comment
@FabianRoger

FabianRoger Oct 6, 2015

Good point about the warnings. I was thinking of setting the option to FALSE by default. So any user who wants the rows to be gone has to make a conscious choice (and hopefully remembers his choice). If the rows are set to NA the user has to remove them manually anyway so little is gained. But it's of cause up to you, I just wanted to let you know how I think about it!

thanks for taking the feedback seriously and thanks again for providing this resource to the community!

FabianRoger commented Oct 6, 2015

Good point about the warnings. I was thinking of setting the option to FALSE by default. So any user who wants the rows to be gone has to make a conscious choice (and hopefully remembers his choice). If the rows are set to NA the user has to remove them manually anyway so little is gained. But it's of cause up to you, I just wanted to let you know how I think about it!

thanks for taking the feedback seriously and thanks again for providing this resource to the community!

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Oct 6, 2015

Contributor

This would call for four alternative strategies:

  1. "fail": the current behaviour of stopping with error.
  2. "asis": return non-rarefied community.
  3. "NA": return NA row (or NA in place of >0 values)
  4. "remove": remove lines (your suggestion).
Contributor

jarioksa commented Oct 6, 2015

This would call for four alternative strategies:

  1. "fail": the current behaviour of stopping with error.
  2. "asis": return non-rarefied community.
  3. "NA": return NA row (or NA in place of >0 values)
  4. "remove": remove lines (your suggestion).
@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Oct 6, 2015

Contributor

Commit f5ff44b in branch rrarefy suggests an implementation of all four alternatives I outlined above (not yet documented). The design must be discussed. One side effect is that the behaviour is now different from other functions: rarefy and drarefy use always "asis" strategy: rarefy gives a warning, though, but drarefy just returns sampling probability = 1 for every species present in sites for which total < sample.

Contributor

jarioksa commented Oct 6, 2015

Commit f5ff44b in branch rrarefy suggests an implementation of all four alternatives I outlined above (not yet documented). The design must be discussed. One side effect is that the behaviour is now different from other functions: rarefy and drarefy use always "asis" strategy: rarefy gives a warning, though, but drarefy just returns sampling probability = 1 for every species present in sites for which total < sample.

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Oct 6, 2015

Contributor

I really don't like the change in f5ff44b: it adds unnecessary complexity to the function, produces confusing output and makes it inconsistent with other rarefy functions. If you only want to remove sites which cannot be rarefied, you can achieve this very easily with

rrarefy(mite[rowSums(mite) >= 100,], 100)

without need of building complicated branches in functions.

However, the current behaviour of rrarefy is inconsistent: both rarefy and drarefy work when the number of individuals is lower than sample, and they will return non-rarefied community. I suggest another fix 328fc83 that removes this inconsistency. This will leave as user responsibility to detect and handle those cases if so desired.

Contributor

jarioksa commented Oct 6, 2015

I really don't like the change in f5ff44b: it adds unnecessary complexity to the function, produces confusing output and makes it inconsistent with other rarefy functions. If you only want to remove sites which cannot be rarefied, you can achieve this very easily with

rrarefy(mite[rowSums(mite) >= 100,], 100)

without need of building complicated branches in functions.

However, the current behaviour of rrarefy is inconsistent: both rarefy and drarefy work when the number of individuals is lower than sample, and they will return non-rarefied community. I suggest another fix 328fc83 that removes this inconsistency. This will leave as user responsibility to detect and handle those cases if so desired.

@jarioksa

This comment has been minimized.

Show comment
Hide comment
@jarioksa

jarioksa Oct 8, 2015

Contributor

PR #145 addresses this issue. It is based on the minimal change where rrarefy will work when sample > population and return the non-rarefied community with a warning. Along with this, drarefy will also issue a warning in this case. After this rarefy, drarefy and rrarefy behave consistently.

I won't implement the fix of f5ff44b: no vegan function should work so that random rows are removed in x <- foo(x). If you want to have this behaviour, you can use your own wrapper function:

rrarewithdrop <- 
    function(x, sample) 
{
    rrarefy(x[rowSums(x) >= sample, , drop = FALSE], sample)
}
Contributor

jarioksa commented Oct 8, 2015

PR #145 addresses this issue. It is based on the minimal change where rrarefy will work when sample > population and return the non-rarefied community with a warning. Along with this, drarefy will also issue a warning in this case. After this rarefy, drarefy and rrarefy behave consistently.

I won't implement the fix of f5ff44b: no vegan function should work so that random rows are removed in x <- foo(x). If you want to have this behaviour, you can use your own wrapper function:

rrarewithdrop <- 
    function(x, sample) 
{
    rrarefy(x[rowSums(x) >= sample, , drop = FALSE], sample)
}

@jarioksa jarioksa closed this Oct 8, 2015

jarioksa added a commit that referenced this issue Nov 13, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment