Note: this is a transcription from the docs section on ideas for contributors.
In line with scikit-learn's EmpiricalCovariance estimator, the Covariance algorithm from scikit-learn-intelex by default also calculates and stores the precision - i.e. the inverse of the covariance. This inverse is obtained by eigendecomposition, and it may be used within the scikit-learn-intelex interface to calculate Mahalanobis distances.
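For reference, this is how the precision and the Mahalanobis distances are exposed through the scikit-learn interface that the scikit-learn-intelex estimator mirrors (a minimal sketch using scikit-learn's EmpiricalCovariance with synthetic data):

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))

est = EmpiricalCovariance(store_precision=True).fit(X)
precision = est.precision_    # inverse of the covariance matrix
dist_sq = est.mahalanobis(X)  # squared Mahalanobis distances to the fitted mean
```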
However, for full-rank matrices, it's likely faster to obtain the precision matrix from the covariance by a Cholesky-based inversion, at the expense of slightly reduced numerical accuracy. This could be implemented on the oneDAL side by handling the option to calculate the precision in the C++ interface, storing the precision in the C++ object, and computing it with Cholesky when possible, falling back to eigendecomposition if Cholesky fails or is too inexact. Note that the idea about partial eigendecompositions would also be useful here, since Cholesky-based inversion is not applicable to rank-deficient matrices, in which case the computation should go directly to eigendecomposition.
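A minimal NumPy/SciPy sketch of this Cholesky-first strategy (illustrative only - the actual change would live in oneDAL's C++ code, and the function name and tolerance here are hypothetical):

```python
import numpy as np
from scipy import linalg

def precision_from_covariance(cov, rcond=1e-10):
    """Invert a covariance matrix, preferring Cholesky, falling back to eigendecomposition."""
    try:
        # Cholesky-based inversion: cov = L L^T, so the precision solves cov @ P = I.
        c, lower = linalg.cho_factor(cov, lower=True)
        return linalg.cho_solve((c, lower), np.eye(cov.shape[0]))
    except linalg.LinAlgError:
        # Rank-deficient (or numerically indefinite) case: eigendecompose and
        # pseudo-invert by dropping near-zero eigenvalues.
        w, v = np.linalg.eigh(cov)
        keep = w > rcond * w.max()
        return (v[:, keep] / w[keep]) @ v[:, keep].T
```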
Having a triangular factorization of the precision would also open the possibility of speeding up Mahalanobis distance calculations, which would be faster with triangular matrices than with the full square root matrices produced by eigendecomposition. While the Mahalanobis distance is typically calculated with the Cholesky of the precision, a different Cholesky-like factorization would also suffice - for example, it would be faster to obtain a factorization of the precision from the Cholesky of the covariance, as suggested in this StackExchange answer, which could then be stored on the C++ object and used for Mahalanobis distance calculations by adding a new method; see the sketch below.
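The underlying identity: if cov = L L^T, then (x - mu)^T cov^{-1} (x - mu) = ||L^{-1}(x - mu)||^2, so the triangular factor L (equivalently, L^{-T} as a Cholesky-like factor of the precision) lets a triangular solve replace the explicit precision. A minimal NumPy/SciPy sketch, with a hypothetical function name:

```python
import numpy as np
from scipy import linalg

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distances via the Cholesky factor of the covariance.

    With cov = L L^T, (x - mu)^T cov^{-1} (x - mu) = ||L^{-1} (x - mu)||^2,
    so a triangular solve avoids forming the precision matrix at all.
    """
    L = np.linalg.cholesky(cov)  # lower-triangular factor of the covariance
    z = linalg.solve_triangular(L, (X - mean).T, lower=True)
    return np.square(z).sum(axis=0)
```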