how many people have computed jaccard distances incorrectly using vegdist? #153
CarlyRae commented Jan 21, 2016
Thank you for the response. I use the package often and appreciate your contribution.
vegdist(x, "jaccard"), vegdist(x, "raup-crick"): the warning message says: "Warning: if you wish to calculate distances on a presence/absence matrix, please specify binary = T"
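A check that emits a warning like the one quoted above could look roughly like the Python sketch below. It is hypothetical: check_binary_request and PA_EXPECTED_METHODS are invented names, the method list is only an assumption taken from the comment, and vegdist itself is R code that does not necessarily behave this way. The sketch warns only when the data contain values other than 0 and 1, because on a pure 0/1 matrix a presence/absence transformation changes nothing.

```python
import warnings

import numpy as np

# Hypothetical set of method names for which users likely expect
# presence/absence behaviour; mirrors the comment, not vegan's method list.
PA_EXPECTED_METHODS = {"jaccard", "raup-crick"}


def check_binary_request(x, method, binary=False):
    """Warn if a presence/absence-type method is requested on abundance data.

    x      : 2-D array, sites in rows and species in columns
    method : requested dissimilarity name
    binary : whether the caller already asked for a 0/1 transformation
    """
    x = np.asarray(x)
    is_pa = np.isin(x, (0, 1)).all()  # True when every entry is 0 or 1
    if method in PA_EXPECTED_METHODS and not binary and not is_pa:
        warnings.warn(
            "if you wish to calculate distances on a presence/absence "
            "matrix, please specify binary = T",
            UserWarning,
        )
```

Warning only on non-0/1 input keeps the message from firing on data where it would be pure noise.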
georgeblck commented Jun 15, 2016
I don't understand how this could be a problem. As long as your input is a presence/absence matrix, that is, a matrix filled only with 0/1, it doesn't matter if you specify … Here is example code to produce the two scenarios.
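The two scenarios can be sketched roughly as follows. This is not the code from the original comment; it is a minimal Python reimplementation, assuming the formula quoted elsewhere in this thread (Jaccard computed as 2B/(1+B), with B the Bray–Curtis dissimilarity) and assuming that binary = T simply turns every positive value into 1. The function names are hypothetical.

```python
import numpy as np


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.abs(x - y).sum() / (x + y).sum()


def jaccard_from_bray(x, y, binary=False):
    """Jaccard computed as 2B/(1+B) from Bray-Curtis B.

    binary=True reduces the data to presence/absence first; this is an
    assumed stand-in for binary = T, not vegan's own code.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if binary:
        x, y = (x > 0).astype(float), (y > 0).astype(float)
    b = bray_curtis(x, y)
    return 2 * b / (1 + b)


# Scenario 1: 0/1 data, so the binary transformation changes nothing.
pa1, pa2 = [1, 1, 0, 1], [1, 0, 1, 1]
print(jaccard_from_bray(pa1, pa2), jaccard_from_bray(pa1, pa2, binary=True))

# Scenario 2: abundance data, so the two results differ.
ab1, ab2 = [5, 1, 0, 2], [1, 0, 3, 2]
print(jaccard_from_bray(ab1, ab2), jaccard_from_bray(ab1, ab2, binary=True))
```

On the 0/1 vectors both calls give 0.5; on the abundance vectors the quantitative value is 8/11 while the binary one stays 0.5, which is the situation the original post worries about.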
Maybe the
The
Even some printed papers claimed that you cannot have certain indices (such as Sørensen or binary Jaccard) in vegan, and I don't want to go back to that situation.
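As a small illustration of how a binary index such as Sørensen relates to a quantitative one, the Python sketch below (hypothetical function names, not vegan code) checks that Bray–Curtis computed on 0/1 data equals the binary Sørensen dissimilarity (b + c)/(2a + b + c), where a counts species shared by both samples and b, c count species found in only one.

```python
import numpy as np


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.abs(x - y).sum() / (x + y).sum()


def sorensen(x, y):
    """Binary Sørensen dissimilarity (b + c) / (2a + b + c)."""
    px, py = np.asarray(x) > 0, np.asarray(y) > 0
    a = np.sum(px & py)   # species present in both samples
    b = np.sum(px & ~py)  # species present only in the first sample
    c = np.sum(~px & py)  # species present only in the second sample
    return (b + c) / (2 * a + b + c)


ab1, ab2 = [5, 1, 0, 2], [1, 0, 3, 2]
pa1, pa2 = [int(v > 0) for v in ab1], [int(v > 0) for v in ab2]

print(bray_curtis(ab1, ab2))  # quantitative Bray-Curtis
print(bray_curtis(pa1, pa2))  # Bray-Curtis on 0/1 data
print(sorensen(ab1, ab2))     # binary Sørensen, same as the line above
```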
CarlyRae commented Jan 20, 2016
Since I realized that you must specify binary = T in vegdist to compute distance matrices on presence/absence data as opposed to abundance data, I keep seeing more and more places where others are incorrectly using jaccard in vegdist (by not specifying binary = T).
For at least three years I assumed that specifying method = "jaccard" would calculate the distance on presence/absence data, and I think most people expect that. Granted, the help file says that this is not the case, but it is still tricky.
Vegan is used in multiple other packages for computing distances, and it is not only Jaccard but a few other metrics that one would assume are computed on presence/absence data; without specifying binary = T, you would not get the expected results. Again, I am noticing more and more places where people are missing this critical point.
Two questions:

1. Is there any way to adjust the code so that people (at least not me anymore!) would not forget to specify binary = T when computing distances with measures that are known to be computed only from presence/absence data?
2. May I also ask how it is possible that "Jaccard index is computed as 2B/(1+B), where B is Bray–Curtis dissimilarity"? I can't seem to find much information about the mathematical relationship between the two measures, and I am no math whiz. I'm just curious how that is possible, as I was taught that they are two separate measures, not that Bray–Curtis is a transformation of Jaccard plus abundance information.
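A short derivation may help with the second question. It uses standard notation rather than anything taken from the vegan help page: x_ij is the value of species i in sample j, and for presence/absence data a is the number of species present in both samples while b and c are the numbers present in only one of them. Substituting Bray–Curtis into 2B/(1+B) gives the familiar binary Jaccard dissimilarity; on abundance data the same transformation gives 1 minus the ratio of summed minima to summed maxima (often called the Ruzicka dissimilarity).

```latex
% Bray–Curtis between samples j and k (sum over species i)
B = \frac{\sum_i |x_{ij} - x_{ik}|}{\sum_i (x_{ij} + x_{ik})}

% Presence/absence data: each |x_{ij} - x_{ik}| is 0 or 1
\sum_i |x_{ij} - x_{ik}| = b + c, \qquad
\sum_i (x_{ij} + x_{ik}) = 2a + b + c, \qquad
B = \frac{b + c}{2a + b + c}

\frac{2B}{1 + B}
  = \frac{2(b + c)/(2a + b + c)}{2(a + b + c)/(2a + b + c)}
  = \frac{b + c}{a + b + c}
  = 1 - \frac{a}{a + b + c}

% Abundance data: use |x - y| = x + y - 2\min(x, y) and \max(x, y) = x + y - \min(x, y)
\frac{2B}{1 + B} = 1 - \frac{\sum_i \min(x_{ij}, x_{ik})}{\sum_i \max(x_{ij}, x_{ik})}
```

Because 2B/(1+B) increases monotonically with B, Jaccard computed this way and Bray–Curtis rank pairs of samples in the same order; they are transformations of each other rather than two unrelated measures.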
Thank you!