
Process killed due to very large array #38

Closed
IslamAAli opened this issue Oct 12, 2022 · 2 comments

Comments

@IslamAAli

Hello,
I am trying to compute the distance correlation between two very large vectors (25k elements each), and the dcor function gets killed due to an out-of-memory error. How can we fix that?

dcor.distance_correlation(np.array(x, dtype=np.float32), np.array(y, dtype=np.float32), exponent=0.5)

@vnmabus
Owner

vnmabus commented Oct 12, 2022

The naive algorithm for computing distance covariance has $O(N^2)$ time and memory cost, which is problematic for large inputs. However, by default a fast $O(N \log N)$ algorithm is used whenever possible. The conditions for using it are that the input random vectors are unidimensional and that the exponent used is one.

In your case you have exponent=0.5, so the naive algorithm is used. Is there a particular reason for that choice? Otherwise, I recommend sticking to the default value of exponent=1.
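
A minimal sketch of that default usage (the array sizes and random data below are illustrative assumptions, not taken from the original report):

import numpy as np
import dcor

# Two large one-dimensional samples. With the default exponent=1 and
# 1-D inputs, dcor can use the O(N log N) algorithm instead of building
# the O(N^2) pairwise distance matrices that exhaust memory.
rng = np.random.default_rng(0)
x = rng.normal(size=25_000).astype(np.float32)
y = x + rng.normal(size=25_000).astype(np.float32)

corr = dcor.distance_correlation(x, y)  # exponent defaults to 1
print(corr)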

@IslamAAli
Author

Thanks for your reply. I don't have a specific reason for using exponent=0.5 at the moment. I changed it to 1, and the process is running smoothly.
