
Process killed due to very large array #38

Closed
IslamAAli opened this issue Oct 12, 2022 · 2 comments

Comments

@IslamAAli

Hello,
I am trying to compute the distance correlation between two very large vectors (25k elements each), and the dcor function gets killed due to an out-of-memory error. How can we fix that?

dcor.distance_correlation(np.array(x, dtype=np.float32), np.array(y, dtype=np.float32), exponent=0.5)

@vnmabus
Owner

vnmabus commented Oct 12, 2022

The naive algorithm for computing distance covariance has $O(N^2)$ time and memory cost, which is problematic for large inputs. However, by default a fast $O(N \log N)$ algorithm is used whenever possible. The conditions for using it are that the input random vectors are unidimensional and that the exponent used is one.

In your case you have exponent=0.5, so the naive algorithm is used. Is there a particular reason for that choice? Otherwise, I recommend sticking to the default value of exponent=1.
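
A minimal sketch of that default usage (the array sizes and random data below are illustrative assumptions, not taken from the original report):

import numpy as np
import dcor

# Two large one-dimensional samples. With the default exponent=1 and
# 1-D inputs, dcor can use the O(N log N) algorithm instead of building
# the O(N^2) pairwise distance matrices that exhaust memory.
rng = np.random.default_rng(0)
x = rng.normal(size=25_000).astype(np.float32)
y = x + rng.normal(size=25_000).astype(np.float32)

corr = dcor.distance_correlation(x, y)  # exponent defaults to 1
print(corr)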

@IslamAAli
Author

Thanks for your reply. I don't have a specific reason for using exponent=0.5 at the moment. I changed it to 1, and the process is running smoothly.
