Clarification of distance correlation - dcor vs scipy #21

Closed
amjass12 opened this issue Jan 24, 2021 · 4 comments

Comments

@amjass12

Hi!

I have started using dcor because I need to compute the correlation between every pair of variables (columns) in a dataframe. I am using distance correlation because I need to detect non-linear relationships, not just linear ones.

Having read the documentation, I know this is the correct implementation for this purpose. However, as I understand it, SciPy also provides a distance correlation function. I get different results from dcor and SciPy and was wondering if you could explain why. I am unsure whether SciPy is actually computing the same distance correlation, or whether I have missed something obvious in its implementation that leads to the different results:

from scipy.spatial import distance
distance.correlation(data['column1'], data['column2'])
# = 0.57

import dcor
dcor.distance_correlation(data['column1'], data['column2'])
# = 0.41

There is a large discrepancy here, and I would appreciate clarification!

Thank you!

@vnmabus
Owner

vnmabus commented Jan 25, 2021

This is because scipy is not computing distance correlation, but transforming the usual (Pearson) correlation R into a (semi)metric, as 1 - R, so that highly correlated variables (correlation near 1) are close using this metric (distance near 0). The naming of that functionality is unfortunate, and I am afraid that it has confused some people before (see https://stackoverflow.com/questions/35988933/scipy-distance-correlation-is-higher-than-1 and https://stackoverflow.com/questions/60392972/scipy-distance-correlation-scale, for example).
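To make the 1 - R relationship concrete, here is a minimal pure-Python sketch. The helper names `pearson_r` and `correlation_distance` are hypothetical and are not part of SciPy or dcor; the sketch only mirrors the formula that `scipy.spatial.distance.correlation` is documented to compute, so treat it as an illustration rather than SciPy's actual implementation.

```python
import math


def pearson_r(u, v):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]  # centered u
    dv = [b - mean_v for b in v]  # centered v
    dot = sum(a * b for a, b in zip(du, dv))
    norm_u = math.sqrt(sum(a * a for a in du))
    norm_v = math.sqrt(sum(b * b for b in dv))
    return dot / (norm_u * norm_v)


def correlation_distance(u, v):
    """Hypothetical helper: the 'correlation distance' 1 - R."""
    return 1.0 - pearson_r(u, v)


u = [1.0, 2.0, 3.0, 4.0]
increasing = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated with u (R = 1)
decreasing = [8.0, 6.0, 4.0, 2.0]  # perfectly anti-correlated with u (R = -1)

d_up = correlation_distance(u, increasing)    # close to 0
d_down = correlation_distance(u, decreasing)  # close to 2
print(d_up, d_down)
```

Because R lies in [-1, 1], this quantity lies in [0, 2], which is why values above 1 can appear, as in the first Stack Overflow question linked above.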

@amjass12
Author

Thank you @vnmabus for clarifying, this makes perfect sense!

So, just to confirm: dcor is the right package for computing distance correlation, which by definition can detect both linear and non-linear dependence between pairs of variables? (Sorry, I just want to be absolutely sure I am using the intended analysis!)

Thanks again.

@vnmabus
Owner

vnmabus commented Jan 25, 2021

Yes, this package can find nonlinear correlations, as it implements Székely's distance correlation (https://en.wikipedia.org/wiki/Distance_correlation).
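As a rough illustration of what "finding nonlinear correlations" means, the sketch below implements the usual (biased, V-statistic) sample distance correlation for one-dimensional data in pure Python and compares it with Pearson's R on y = x². All function names here are hypothetical and written only for this example; it is an O(n²) teaching sketch under those assumptions, not the code dcor uses. For real work, call `dcor.distance_correlation` as in the original question.

```python
import math


def pearson_r(u, v):
    """Sample Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    dot = sum(a * b for a, b in zip(du, dv))
    return dot / math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))


def _centered_distances(x):
    """Pairwise |x_i - x_j| matrix, double-centered (row, column, grand mean)."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so the column means equal the row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _dcov_sq(a, b):
    """Squared sample distance covariance from two centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Biased sample distance correlation of two 1-D samples (sketch)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dvar_product = _dcov_sq(a, a) * _dcov_sq(b, b)
    if dvar_product <= 0.0:
        return 0.0  # at least one sample is constant
    dcor_sq = max(_dcov_sq(a, b), 0.0) / math.sqrt(dvar_product)
    return math.sqrt(dcor_sq)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]  # y depends on x, but not linearly

r = pearson_r(x, y)                 # zero: no linear relationship
dc = distance_correlation(x, y)     # clearly positive: dependence detected
print(r, dc)
```

On this symmetric sample Pearson's R vanishes even though y is a deterministic function of x, while the distance correlation stays well above zero.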

@amjass12
Author

Perfect, thank you for confirming, and thanks for your time.
