-
Notifications
You must be signed in to change notification settings - Fork 182
[docs] Add a prototype estimator for array api enabling #2534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 46 files with indirect coverage changes 🚀 New features to boost your workflow:
|
# | ||
# 1) All sklearnex estimators must inherit oneDALestimator and the sklearn | ||
# estimator that it is replicating (i.e. before in the mro). If there is | ||
# not an equivalent sklearn estimator, then sklearn's BaseEstimator must be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also inherit from the corresponding type for what the estimator does, like RegressorMixin.
# inherited. | ||
# | ||
# 2) ``check_is_fitted`` is required for any method in an estimator which | ||
# requires first calling ``fit`` or ``partial_fit``. This is a sklearn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually not a requirement to call this specific sklearn function within .fit, only to make the estimator work correctly when that function is called on it:
https://scikit-learn.org/stable/developers/develop.html#developer-api-for-check-is-fitted
sklearnex/tests/prototypes.py
Outdated
# | ||
# Sklearnex estimator methods can be thought of in 3 major tiers. | ||
# | ||
# Tier 1: Those methods which offload to oneDAL using ``dispatch``. Typical |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Tier 1: Those methods which offload to oneDAL using ``dispatch``. Typical | |
# Tier 1: Methods which offload to oneDAL using ``dispatch``. Typical |
# examples are ``fit`` and ``predict``. They use a direct equivalent oneDAL | ||
# function for evaluation. These methods are of highest priority and have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# examples are ``fit`` and ``predict``. They use a direct equivalent oneDAL | |
# function for evaluation. These methods are of highest priority and have | |
# examples are ``fit`` and ``predict``. They use a direct equivalent function | |
# from oneDAL. These methods are of highest priority and have |
sklearnex/tests/prototypes.py
Outdated
# sklearn, giving the ability to use raw data directly and avoiding | ||
# sklearn pre-processing checks as necessary. | ||
# | ||
# 3) Pybind11 interfaces should be not be made public to the user unless |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# 3) Pybind11 interfaces should be not be made public to the user unless | |
# 3) Pybind11 interfaces should not be be made public to the user unless |
sklearnex/tests/prototypes.py
Outdated
# 4) Either sklearn is called, or a object from onedal is created and called | ||
# using the input data. This process is handled in a function which has the | ||
# prefix "_onedal_" followed by the method name. When fitting data, the | ||
# returned onedal estimator object is stored as the ``_onedal_estimator`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should serialization work for such objects? Should it all happen in PyBind11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up on that. It is yet to be addressed in the equivalent onedal/tests/prototype.py file, and will be guaranteed to exist when PR is moved from draft.
sklearnex/tests/prototypes.py
Outdated
# 3) ``_onedal_gpu_supported`` or ``_onedal_cpu_supported`` creates a | ||
# PatchingConditionsChain object, takes the input data and estimator | ||
# parameters, and evaluates whether the estimator and data can be run using | ||
# oneDAL. This information is logged to the `sklearnex` logger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logger is not imported in this example.
|
||
|
||
################ | ||
# IMPORT NOTES # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the requirement that estimators based off existing sklearn classes should add entries to the support table in the docs when added.
sklearnex/tests/prototypes.py
Outdated
# are ``score`` and ``predict_log_proba``. Oftentimes the additional | ||
# calculations are trivial, meaning benchmarking is not required. | ||
# | ||
# Tier 3: Those methods which directly use sklearn functionality. Typically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the mechanism that we're using to inherit docstrings.
sklearnex/tests/prototypes.py
Outdated
|
||
def fit(self, X, y): | ||
# | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should always return 'self'.
Description
First start by generating code which represents the basic requirements of a sklearnex and onedal estimator.
This PR serves two purposes: to ease understanding of the codebase for external development and standardize development
occurring for array API support.
Next will be to make the necessary doc page links to various aspects to act as a guide for array API development. Which will help in external user contribution.
My goal will be to see if I can get an LLM with this information to generate StandardScaler using BasicStatistics. If it can, that means an LLM can help guide a user with this starting prompt in more difficult scenarios.
PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance