
Cholesky-based precision calculation #3067

Open
david-cortes-intel opened this issue Feb 10, 2025 · 0 comments

Note: this is a transcription from the docs section on ideas for contributors.

In line with scikit-learn's EmpiricalCovariance estimator, the Covariance algorithm from scikit-learn-intelex also calculates and stores the precision by default - i.e. the inverse of the covariance matrix. This inverse is obtained through eigendecomposition, and it may be used within the scikit-learn-intelex interface to calculate Mahalanobis distances.

However, for full-rank matrices, it's likely faster to obtain the precision matrix from the covariance through a Cholesky-based inversion, at the expense of slightly reduced numerical accuracy. This could be implemented on the oneDAL side by handling the option to calculate the precision in the C++ interface, storing the precision in the C++ object, and calculating it with Cholesky when possible, falling back to eigendecomposition if Cholesky fails or is too inexact. Note that implementing the idea about partial eigendecompositions would also be of use here: Cholesky-based inversion is not applicable to rank-deficient matrices, in which case the calculation should go directly to eigendecomposition.
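The try-Cholesky-then-fall-back logic described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not oneDAL code: the function name `precision_from_covariance`, the residual check used to decide whether the result is "too inexact", and the tolerance values are all assumptions made for this example.

```python
import numpy as np


def precision_from_covariance(cov, tol=1e-8):
    """Hypothetical helper: invert a covariance matrix with Cholesky,
    falling back to eigendecomposition when Cholesky fails or is inexact.

    Returns the precision matrix and the name of the method that produced it.
    """
    cov = np.asarray(cov, dtype=float)
    identity = np.eye(cov.shape[0])
    try:
        # cov = L @ L.T; raises LinAlgError if cov is not positive definite.
        L = np.linalg.cholesky(cov)
        L_inv = np.linalg.solve(L, identity)
        precision = L_inv.T @ L_inv
        # Assumed accuracy check: accept only if precision @ cov is close to I.
        if np.abs(precision @ cov - identity).max() <= tol:
            return precision, "cholesky"
    except np.linalg.LinAlgError:
        pass

    # Fallback: eigendecomposition-based pseudo-inverse, which also
    # handles rank-deficient matrices by dropping near-zero eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    cutoff = eigvals.max() * cov.shape[0] * np.finfo(float).eps
    keep = eigvals > cutoff
    inv_vals = np.zeros_like(eigvals)
    inv_vals[keep] = 1.0 / eigvals[keep]
    precision = (eigvecs * inv_vals) @ eigvecs.T
    return precision, "eigh"
```

A real implementation would presumably call the corresponding LAPACK routines directly and use a triangular inversion instead of a general solve; the general solve is used here only to keep the sketch dependent on NumPy alone.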

Having a triangular factorization of the precision would also open the possibility of speeding up Mahalanobis distance calculations, which would be faster with triangular matrices than with the dense square-root matrices produced by eigendecomposition. While Mahalanobis distance is typically calculated with the Cholesky of the precision, a different Cholesky-like factorization would also suffice. For example, it would be faster to obtain a factorization of the precision from the Cholesky of the covariance, such as suggested in this StackExchange answer. That factorization could then be stored on the C++ object and used for Mahalanobis distance calculations by adding a new method.
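As a rough illustration of why a triangular factor is enough: if the covariance factors as `cov = L @ L.T` with `L` lower triangular, then the precision equals `inv(L).T @ inv(L)`, so the squared Mahalanobis distance of `x` is the squared norm of the solution `z` of `L @ z = x - mean`. The sketch below assumes NumPy and SciPy are available; the function name is hypothetical, and this is one possible reading of the approach, not necessarily the exact method from the linked answer.

```python
import numpy as np
from scipy.linalg import solve_triangular


def mahalanobis_sq_from_cov_cholesky(X, mean, cov):
    """Hypothetical helper: squared Mahalanobis distances for the rows of X,
    computed from the Cholesky factor of the covariance without ever forming
    the precision matrix explicitly.
    """
    X = np.asarray(X, dtype=float)
    L = np.linalg.cholesky(cov)  # cov = L @ L.T, L lower triangular
    # Solve L @ Z = (X - mean).T by forward substitution; Z has shape (d, n).
    Z = solve_triangular(L, (X - mean).T, lower=True)
    # Squared distance of each row is the squared norm of its column in Z.
    return np.sum(Z**2, axis=0)
```

In a stored-factor design, the factor `L` (or its inverse) would be computed once at fit time and reused for every distance query, which is where the savings over a dense square-root matrix would come from.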
