-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Multivariate Distance Covariance metric and the corresponding test of independence #62
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your effort making this PR! I did not check the math yet (I assume you implemented it well), but I have several code suggestions.
dcor/_dcor.py
Outdated
np.bool = np.bool_ | ||
|
||
|
||
def gamma_ratio(p): return np.exp(gammaln((p+1)/2) - gammaln(p/2)) # For Calculating C_p and C_q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The function body needs to be in its own line.
- Functions should be type-annotated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the type-annotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I was not very clear: I meant to use Python type hints, not to add the types in the docstring. Types in a docstring cannot be easily verified, while type hints can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the changes
based distance covariance
I found some issues regarding the cyclic recalling of rowwise module in _dcor.py. Maybe you can help me to solve it. |
dcor/_dcor.py
Outdated
return X_new | ||
|
||
|
||
def u_dist_cov_sqr_mv(X, Y, n_projs = 500, method ='mergesort'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def u_dist_cov_sqr_mv(X, Y, n_projs = 500, method ='mergesort'): | |
def u_dist_cov_sqr_mv(X, Y, n_projs = 500, method ="mergesort"): |
Please, add double quotes everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the changes
dcor/_dcor.py
Outdated
|
||
|
||
""" | ||
A Statistically and Numerically Efficient Independence Test Based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring needs to be inside the corresponding function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completed the task
dcor/_dcor.py
Outdated
|
||
References | ||
---------- | ||
.. bibliography:: ../refs.bib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use :footcite:
and .. footbibliography
instead, as it is easier to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really appreciate it if you kindly guide me how to do that, I did search this names on the internet, but I am not quite certain how to do it. It would be really nice if you guide me step-by-step process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use :footcite:
instead of :cite:
inside the text to place the footnote, and .. footbibliography::
instead of .. biblioggraphy::
. You do not need to specify the .bib file, as it is done in the global config.
You can use this docstring as an example:
https://github.com/GAA-UAM/scikit-fda/blob/3eb8ace81220011559df83be0ffb9154a11fb001/skfda/exploratory/stats/_fisher_rao.py#L66-L99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the exact format that you have mentioned
dcor/_dcor.py
Outdated
np.bool = np.bool_ | ||
|
||
|
||
def gamma_ratio(p): return np.exp(gammaln((p+1)/2) - gammaln(p/2)) # For Calculating C_p and C_q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I was not very clear: I meant to use Python type hints, not to add the types in the docstring. Types in a docstring cannot be easily verified, while type hints can.
@njit(fastmath=True, parallel=True, cache=True) | ||
def dist_sum(X): # It is nothing but the parallelized version of pdist function----------------------- | ||
''' | ||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You closed the conversation without changing it, please don't do that.
dcor/_dcor.py
Outdated
''' | ||
|
||
# X_std = multivariate_normal.rvs( np.zeros(dim_X), np.identity(dim_X), size = 1) | ||
X_std = np.random.standard_normal(dim_X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what random projections are. What I meant, is that it should have a random_state
argument, similar to how distance_covariance_test
works. This way, it is still possible to generate random data, but you can also pass a fixed seed to obtain reproducible results.
dcor/_dcor.py
Outdated
''' | ||
|
||
# X_std = multivariate_normal.rvs( np.zeros(dim_X), np.identity(dim_X), size = 1) | ||
X_std = np.random.standard_normal(dim_X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, do not resolve conversations until the comments have been addressed.
dcor/_dcor.py
Outdated
X_std = np.random.standard_normal(dim_X) | ||
X_norm = np.sum([i**2 for i in X_std]) | ||
U_sphere = np.array(X_std / np.sqrt(X_norm)) | ||
if dim_X > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but does the general code not work in that case?
Note that the last commit shows a lot of whitespace changes. Please, make sure that your editor is not changing the Unix newlines (LF, or "\n") to Windows ones (CRLF, or "\r\n"). |
I use actually "spyder", not sure why it did happen. But I can use "autopipe" for better indentation if it's necessary. |
Removed White space
I have removed the white space. |
As we discussed, I have added the code corresponding to the random projection based multivariate distance covariance, as discussed in https://www.frontiersin.org/articles/10.3389/fams.2021.779841/full.
I spent some time regarding how to use the gamma function in the best possible way so that the correct p_value can be calculated. (One need to install the "mpmath" python module for gamma_cdf function, for details: https://mpmath.org/doc/current/functions/expintegrals.html)
Please take a look and let me know whether it is working fine or not.