how many people have computed jaccard distances incorrectly using vegdist? #153
Thank you for the response. I use the package often and appreciate your contribution.
vegdist(x, jaccard) vegdist(x, raup-crick) the warning message says: Warning: if you wish to calculate distances on a presence/absence matrix, please specify binary = T
|
I don't understand how this could be a problem. As long as your input is a presence/absence matrix, that is, a matrix filled only with 0/1, it doesn't matter if you specify Here is an example code to produce the two scenarios
|
Maybe the |
The
Even some printed papers claimed that you cannot have certain indices (such as Sørensen or binary Jaccard) in vegan, and I don't want to go back to that situation. |
I like EDiLD's comment; perhaps an error would be better if the input is NOT in binary format. |
@micromania2 Why should a legal, well and extensively documented call trigger an error? If you want a binary index, call a binary index. That's all that's needed. (The extensive documentation includes this thread that I have not closed.) |
I beg to disagree with @Edild : removing argument |
I still hold that a simple warning message would prevent any misunderstandings. This was my comment back in Jan 2016. ...it is easy to incorrectly calculate jaccard distances (most are taught it is on presence/absence data) when using vegdist. Since Jaccard and only a few other indices (Raup-Crick, others?) are the only indices that are generally thought of as presence/absence metrics/measures, would it be possible to return a warning message? For instance, if you run: vegdist(x, raup-crick) the warning message says: Warning: if you wish to calculate distances on a presence/absence matrix, please specify binary = T I often use the package phyloseq that depends on vegan and they have since updated their help files to indicate the need to specify binary = T. It is now just a standard part of my code and I have since forgotten about this post. I still think for new users of vegan it may be helpful to clarify and clear up some of the confusion, but Jari, you are in charge, so I support your decision! |
Since I realized that you must specify binary = T in vegdist to compute distance matrices on presence/absence data as opposed to abundance data, I keep seeing more and more places where others are incorrectly using jaccard in vegdist (by not specifying binary = T).
I assumed for at least three years that when you specify distance = jaccard the command calculates it on presence/absence data. I think most people expect that. Granted, the help file states that this is not the case, but it is still tricky.
Vegan is used in multiple other packages for computing distances, and it is not only jaccard, but a few other metrics that one would assume are on presence/absence data, but without specifying binary = T, you would not get the expected results. Again I am noticing more and more places where people are missing this critical point.
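To make the difference concrete, here is a minimal sketch in plain Python of the formulas involved. It is not vegan's code: the function names and the two example sites are made up for illustration, and the quantitative Jaccard is computed from Bray-Curtis as 2B/(1+B), following the sentence from the help file quoted in this issue. In R the corresponding calls would be `vegdist(x, method = "jaccard")` and `vegdist(x, method = "jaccard", binary = TRUE)`.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Raises ZeroDivisionError if both sites are empty (all zeros).
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def jaccard_quantitative(x, y):
    """Abundance-based Jaccard, computed as 2B/(1+B) from Bray-Curtis B."""
    b = bray_curtis(x, y)
    return 2 * b / (1 + b)


def to_presence_absence(x):
    """Convert abundances to 0/1 (what binary = TRUE is assumed to do first)."""
    return [1 if a > 0 else 0 for a in x]


def jaccard_binary(x, y):
    """Presence/absence Jaccard: same formula applied to 0/1 data."""
    return jaccard_quantitative(to_presence_absence(x), to_presence_absence(y))


# Hypothetical abundance data for two sites and four species.
site_a = [10, 0, 3, 1]
site_b = [1, 2, 3, 0]

# Abundance data: the two versions disagree.
print(round(jaccard_quantitative(site_a, site_b), 3))  # 0.75
print(round(jaccard_binary(site_a, site_b), 3))        # 0.5

# 0/1 data: the two versions agree, so the flag makes no difference.
pa_a, pa_b = to_presence_absence(site_a), to_presence_absence(site_b)
print(round(jaccard_quantitative(pa_a, pa_b), 3))      # 0.5
```

With abundance input the two results differ (0.75 versus 0.5 here), which is the mistake described above. With input that is already 0/1 they coincide.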
TWO questions:
Is there any way to adjust the code so people (not me anymore at least!) would not make the mistake of omitting binary = T when computing distances for measures that are known to be computed only on presence/absence data?
May I also ask how it is possible that "Jaccard index is computed as 2B/(1+B), where B is Bray–Curtis dissimilarity." I can't seem to find much information about the mathematical relationship between the two measures and am no math whiz. I'm just curious how that is possible as I have been taught that those are two separate measures and not that Bray-Curtis is a transformation of Jaccard + abundance information.
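A short derivation may help with the second question. It is a sketch that assumes the abundance-based Jaccard is defined as one minus the sum of minima over the sum of maxima (the form usually called Ružička), and that Bray-Curtis is the sum of absolute differences over the sum of totals. The symbols A and S are introduced here for convenience and are not from the vegan documentation.

```latex
% x_i, y_i: abundances of species i at the two sites
A = \sum_i \min(x_i, y_i), \qquad S = \sum_i (x_i + y_i)

% Since |x_i - y_i| = x_i + y_i - 2\min(x_i, y_i)
% and \max(x_i, y_i) = x_i + y_i - \min(x_i, y_i):
B = \frac{\sum_i |x_i - y_i|}{\sum_i (x_i + y_i)} = \frac{S - 2A}{S},
\qquad
J = 1 - \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)}
  = 1 - \frac{A}{S - A} = \frac{S - 2A}{S - A}

% Substituting B into 2B/(1+B):
\frac{2B}{1 + B}
  = \frac{2(S - 2A)/S}{(2S - 2A)/S}
  = \frac{S - 2A}{S - A}
  = J
```

So the two indices are built from the same quantities and differ only in the denominator. On 0/1 data, A is the number of shared species and S - A is the number of species found at either site, so J reduces to the classic presence/absence Jaccard and B to the Sørensen index.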
Thank you!