Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding correlation distance metric in oneDAL primitives #3059

Merged

Conversation

richardnorth3
Copy link
Contributor

@richardnorth3 richardnorth3 commented Feb 3, 2025

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

@icfaust
Copy link
Contributor

icfaust commented Feb 3, 2025

inline sycl::event reduce_by_columns(sycl::queue& q,

https://github.com/uxlfoundation/oneDAL/blob/main/cpp/oneapi/dal/backend/primitives/stat/cov.hpp#L34
should help you in calculating the means for each row (column_wise reduction to generate the sum, then calculate the mean using the second function)

this will leverage the GPUs capabilities

ndview<Float, 2>& out,
const event_vector& deps = {}) const {
const std::int64_t n = u.get_dimension(0);
auto u_sum = ndarray<Float, 1>::empty({ 1 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @richardnorth3 ! https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.correlation.html should be a good reference for the means necessary. The mean doesn't need to be a single value, but a vector the size of a single sample with each being the mean value for each feature (i.e. for each row). This will then be a vector subtraction. Let me know if I am misunderstanding things about your implementation. Good progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey sorry about that, the definition is the mean of the 1d array, so my recommendations have been a little bit wrong.

@richardnorth3
Copy link
Contributor Author

richardnorth3 commented Feb 24, 2025

https://github.com/richardnorth3/oneDAL/blob/9b09822dd84b8eb2ee555e08b4576c8ce7d9c0e2/cpp/oneapi/dal/backend/primitives/distance/correlation_distance_dpc.cpp#L50-L95

After heavily mediating over numerous drafts and codebase reviews, I ultimately came back to your original advice @icfaust after an epiphany. This version of the correlation distance reuses most of the cosine distance code but calculates the deviations first, passes those values to get_inversed_norms(), and proceeds with the same functionality as the cosine distance computation.

@richardnorth3 richardnorth3 added dpc++ Issue/PR related to DPC++ functionality oneAPI Issue/PR related to oneAPI part labels Mar 3, 2025
@Vika-F
Copy link
Contributor

Vika-F commented Mar 5, 2025

@Mergifyio rebase

Copy link
Contributor

mergify bot commented Mar 5, 2025

rebase

☑️ Nothing to do

  • -conflict [📌 rebase requirement]
  • -closed [📌 rebase requirement]
  • queue-position = -1 [📌 rebase requirement]
  • any of:
    • #commits > 1 [📌 rebase requirement]
    • #commits-behind > 0 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]

richard.north.iii and others added 13 commits March 5, 2025 08:05
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
Signed-off-by: North Iii <richard.north.iii@intel.com>
@richardnorth3 richardnorth3 force-pushed the dev/correlation-metric branch from 4658d8f to 7c877d4 Compare March 5, 2025 16:16
Copy link
Contributor

@Alexandr-Solovev Alexandr-Solovev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good to me! Well done! I have a few minor questions and suggestions for now

@Alexandr-Solovev
Copy link
Contributor

/intelci: run

Signed-off-by: richard.north.iii <rnorthii@smtp.jf.intel.com>
Signed-off-by: richard.north.iii <rnorthii@smtp.jf.intel.com>
@richardnorth3
Copy link
Contributor Author

/intelci: run

Signed-off-by: richard.north.iii <rnorthii@smtp.jf.intel.com>
@Vika-F
Copy link
Contributor

Vika-F commented Mar 6, 2025

/intelci: run

Copy link
Contributor

@Alexandr-Solovev Alexandr-Solovev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The private ci in the commit before apply clang format is green! Well done!

Copy link
Contributor

@Vika-F Vika-F left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure in latest CI is not related to the changes. LGTM.

@richardnorth3 richardnorth3 merged commit 2c316ac into uxlfoundation:main Mar 6, 2025
17 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dpc++ Issue/PR related to DPC++ functionality oneAPI Issue/PR related to oneAPI part
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants