[enhancement] accelerate array_api inputs for sklearnex's validate_data and _check_sample_weight (#2296)
Conversation
```python
def is_contiguous(X):
    if hasattr(X, "flags"):
        return X.flags["C_CONTIGUOUS"] or X.flags["F_CONTIGUOUS"]
```
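For context, a quick demonstration (using NumPy, which does expose flags) of how the check above behaves on contiguous and non-contiguous inputs; the helper is reproduced so the snippet is self-contained:

```python
import numpy as np

# Reproduction of the helper under review: reads NumPy-style layout flags.
def is_contiguous(X):
    if hasattr(X, "flags"):
        return X.flags["C_CONTIGUOUS"] or X.flags["F_CONTIGUOUS"]

a = np.arange(6).reshape(2, 3)            # row-major, C-contiguous
b = np.arange(12).reshape(3, 4)[:, ::2]   # strided column view, neither layout
print(is_contiguous(a))   # True
print(is_contiguous(b))   # False
```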
Just making sure: both keys C_CONTIGUOUS and F_CONTIGUOUS will always be defined, with the appropriate True/False values?
Good question! This works for numpy, dpctl, and dpnp; flags is a numpy convention, but it is not available in the array_api standard.
Description
Continues dlpack work from #2275.
validate_data and _check_sample_weight do not follow standard sklearnex offloading practice: they always compute wherever the data already resides (the algorithm is extraordinarily simple, so data movement could ruin any speedup provided by oneDAL), and they do not patch out sklearn functions. Therefore, they must be enabled separately for array_api support. Since they are included in every zero-copy array_api-supported algorithm, this is a prerequisite for enabling every other estimator.

Previously this aspect was controlled by looking for the flags attribute, which is not in the array_api standard. The array_api standard does not include Python-facing attributes or methods that can show whether an array is C-contiguous or F-contiguous. However, the standard does require DLPack support, so the memory layout can be checked from the attributes of a DLPack tensor instead. This PR introduces a special onedal backend function which extracts and checks the necessary memory layout (without taking ownership of the tensor). A Python function is created which first checks and queries the flags or __dlpack__ attributes. If neither is available, it returns False, triggering sklearn's _assert_all_finite. This is done because to_table will attempt to convert to a contiguous memory layout, which again would ruin the performance gain.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance