[maintenance] lazy load dpnp.tensor/dpnp and prepare for array_api lazy importing #2509

icfaust · 2025-06-05T12:40:15Z

Description

dpctl and dpnp are quasi-dependencies whose imports fail silently when they are not installed. These imports happen at import time throughout the codebase, so the optional-dependency handling is tangled into the code and hard to follow. This strategy is unsustainable as the number of supported data frameworks increases. The necessary packages must be lazily loaded, because the load time of follow-on frameworks like PyTorch is non-negligible (>1s). If we followed the same strategy, sklearnex load times would grow even when PyTorch is available but unused, and this would compound as framework support is added. Their use needs to be cleanly separated and isolated.

Therefore we need to first move dpnp and dpctl.tensor support to a lazy loading approach which will then be extended by follow-on frameworks. The next step will be pytorch queue extraction, which will require this infrastructure.

The strategy follows that of array_api_compat, which can check for namespaces without importing the actual modules. For direct use of the frameworks, a dependency injection + monkeypatching scheme is used via the lazy_import decorator.
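As a rough illustration of the dependency injection + monkeypatching idea, here is a minimal sketch. It is not the implementation in this PR: the decorator body, the injected-argument convention, and the `to_fraction` example are all assumptions made for illustration, and a standard-library module stands in for a heavy framework.

```python
import functools
import importlib


def lazy_import(*modnames):
    """Hypothetical sketch: import modules on first call and inject them."""

    def decorator(func):
        @functools.wraps(func)
        def first_call(*args, **kwargs):
            # Import only now, when the function is actually used.
            mods = [importlib.import_module(name) for name in modnames]

            @functools.wraps(func)
            def bound(*a, **k):
                # Dependency injection: modules are passed as leading arguments.
                return func(*mods, *a, **k)

            # Monkeypatch: rebind the name in the defining module's globals so
            # later lookups of that name reach `bound` directly.
            func.__globals__[func.__name__] = bound
            return bound(*args, **kwargs)

        return first_call

    return decorator


@lazy_import("fractions")
def to_fraction(fractions, value):
    # `fractions` is the injected module, not a module-level import.
    return fractions.Fraction(value)
```

Calling `to_fraction("1/2")` triggers the import and rebinds the global name. Callers that captured the original wrapper before the first call keep working, they just go through the import step again, which is cheap once the module is loaded.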

NOTE TO REVIEWERS: Let me know if I should run performance benchmarks for this.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

david-cortes-intel · 2025-06-05T14:45:59Z

sklearnex/utils/validation.py

+    try:
+        too_small = X.size < 32768
+    except TypeError:
+        too_small = math.prod(X.shape) < 32768


Could also use np.prod, since numpy is already imported throughout the codebase.

https://github.com/scikit-learn/scikit-learn/blob/73a8a656b8df6d02cf88ef8f9cf98373a3f42051/sklearn/utils/_array_api.py#L215 Not entirely sure how numpy would interact with pytorch in that case. I could check that if you want, but it's following the precedent set by sklearn itself.
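To make the fallback in the diff above concrete, here is a small sketch of why `X.size < 32768` can raise `TypeError`. In PyTorch, `Tensor.size` is a method rather than an integer attribute, so the comparison fails and the element count has to come from the shape. `FakeTensor`, `is_too_small`, and the `threshold` parameter are hypothetical names used only for this illustration; `math.prod` is used so that no array library is needed.

```python
import math


class FakeTensor:
    """Hypothetical stand-in for an array type whose `size` is a method."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def size(self):
        return self.shape


def is_too_small(X, threshold=32768):
    try:
        # Works when `size` is an integer attribute (NumPy-style arrays).
        return X.size < threshold
    except TypeError:
        # `size` was not a number (for example a bound method), so compute
        # the element count from the shape instead.
        return math.prod(X.shape) < threshold
```

Because `math.prod` only touches the plain `shape` tuple, it sidesteps the question of how NumPy functions would treat a foreign array type.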

david-cortes-intel · 2025-06-05T14:47:35Z

onedal/utils/_third_party.py

+
+
+@functools.lru_cache(100)
+def _is_subclass_fast(cls: type, modname: str, clsname: str) -> bool:


Would this work if one of those array classes is subclassed by the user?

Nope, but neither would array_api_compat, meaning that steps before in sklearnex are likely to have thrown an error: https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/common/_helpers.py#L63

actually let me check this, i may be wrong

Here is an example:

    import functools, sys
    import numpy as np

    @functools.lru_cache(100)
    def _is_subclass_fast(cls: type, modname: str, clsname: str) -> bool:
        try:
            mod = sys.modules[modname]
        except KeyError:
            return False
        parent_cls = getattr(mod, clsname)
        return issubclass(cls, parent_cls)

    class test(np.ndarray):
        pass

    testobj = test((3,5))
    print(type(testobj))
    print(issubclass(type(testobj), np.ndarray))
    print(_is_subclass_fast(type(testobj), "numpy", "ndarray"))

Will print:
<class '__main__.test'>
True
True

david-cortes-intel · 2025-06-05T14:52:49Z

onedal/datatypes/_sycl_usm.py

+        return array
+
+
+@lazy_import("dpctl.memory")


Wouldn't importing the module inside the function have the same effect?

Trying to avoid adding an unnecessary slowdown via the dictionary search of sys.modules. I don't think it impacts the readability as it is, and follows precedent set by other codebases like sqlite3: https://stackoverflow.com/a/61647085

I don't follow. Their idea is to use the module multiple times, but here it gets only used inside a single function. Why would that lazy loader decorator be more efficient than importing the module inside of the function?

This is in line with this discussion: #2509 (comment). The monkeypatch attempts to remove the slowdown of the additional sys.modules checks that a lazy load would otherwise add (if the import were just placed in the function). Just trying to have my cake and eat it too.
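For anyone who wants to measure the trade-off discussed in this thread, a minimal timing sketch follows. It is an assumption-laden stand-in, not code from the PR: `math` replaces the real framework module, and a default argument imitates a module that was injected once. The numbers depend on the interpreter and machine, so none are quoted here.

```python
import math as _math
import timeit


def with_local_import(x):
    # The import statement executes on every call; after the first load it
    # resolves through the import system's module cache (sys.modules).
    import math

    return math.sqrt(x)


def with_bound_module(x, math=_math):
    # The module object was bound once, so no import machinery runs per call.
    return math.sqrt(x)


local_time = timeit.timeit(lambda: with_local_import(9.0), number=200_000)
bound_time = timeit.timeit(lambda: with_bound_module(9.0), number=200_000)
print(f"local import: {local_time:.4f}s, bound module: {bound_time:.4f}s")
```

If the difference is negligible next to the work the function does, a plain in-function import may be the simpler choice; the decorator mainly pays off on hot paths.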

icfaust · 2025-06-18T14:11:15Z

/intelci: run

Alexsandruss · 2025-06-18T15:14:08Z

onedal/utils/_third_party.py

I'm not sure if third_party is the most correct term for these frameworks. Is frameworks_support or frameworks_compat better?

I'd agree with that except for the fact that we centralize the import of SyclQueue for use in a number of locations there (which isn't part of a framework) and that we already have an equivalent 'datatypes' onedal module.

Alexsandruss · 2025-06-18T15:15:29Z

sklearnex/ensemble/_forest.py

+                self.classes_ = xp.unique(y)
+            except AttributeError:


A comment explaining why this error type might be expected is needed.

Added for every use of this (in onedal logistic_regression, onedal forest, and sklearnex forest). It looks like get_unique_values_with_dpep was not extended to all of our classifiers, so there is definitely some gap here. I will make a follow-up ticket to investigate.
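One plausible reason for the `AttributeError`, stated here as an assumption rather than as what the PR's comments say: the array API standard defines `unique_values` (alongside `unique_all`, `unique_counts`, and `unique_inverse`) but no `unique`, so a strict namespace has no `xp.unique` attribute. The sketch below uses made-up namespaces (`strict_xp`, `numpy_like_xp`) and a hypothetical helper `get_classes`; the fallback branch is a guess at what such a handler might do.

```python
from types import SimpleNamespace


def _sorted_unique(values):
    # Toy implementation shared by both fake namespaces.
    return sorted(set(values))


# Hypothetical namespace exposing only the array API standard spelling.
strict_xp = SimpleNamespace(unique_values=_sorted_unique)
# Hypothetical namespace exposing a NumPy-style `unique`.
numpy_like_xp = SimpleNamespace(unique=_sorted_unique)


def get_classes(xp, y):
    try:
        return xp.unique(y)
    except AttributeError:
        # Assumed fallback: the namespace has no `unique`, so use the
        # array API standard function instead.
        return xp.unique_values(y)
```

Both `get_classes(strict_xp, [2, 1, 2])` and `get_classes(numpy_like_xp, [2, 1, 2])` take different branches and return the same sorted labels.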

icfaust · 2025-06-20T09:01:55Z

/intelci: run

icfaust added 9 commits June 3, 2025 09:44

starting point

7d14b79

Merge branch 'dev/lazy_load' of https://github.com/icfaust/scikit-lea…

4a83297

…rn-intelex into dev/lazy_load

first cut

523e84b

rename

6f4775f

fix various testing imports

219e26f

don't get ahead of my skis

54af074

attempt to further move things apart

f3c5d5b

remove get_unique_values_with_dpep

bfdd3e0

remove actually

436405c

david-cortes-intel reviewed Jun 5, 2025

icfaust and others added 20 commits June 5, 2025 23:33

Update _array_api.py

55eab86

try to fix

a7c8fb0

Update _device_offload.py

982c7c4

Update _device_offload.py

e975a4f

Update _device_offload.py

d46d175

Update _device_offload.py

8e8b6d9

Update _sycl_usm.py

125e727

Update _third_party.py

fc6fa24

Update _device_offload.py

c9244b8

Update _device_offload.py

c171175

Update _device_offload.py

18308b2

Update _device_offload.py

603e7d3

Update _sycl_usm.py

bc1c0e3

Update _sycl_usm.py

51a6b06

Update _third_party.py

1f1648c

Update _third_party.py

0ec3ed8

Update _sycl_usm.py

5688076

Update _third_party.py

39d300e

Update _third_party.py

62611c0

Update _array_api.py

3688b1b

icfaust added 4 commits June 8, 2025 11:48

Update _third_party.py

474ab8f

Update _third_party.py

f744a6e

Update _third_party.py

c1ce7af

Merge branch 'uxlfoundation:main' into dev/lazy_load

fc41abc

This was referenced Jun 12, 2025

[CI, enhancement] add pytorch+gpu testing ci #2494

Merged

[WIP, CI] remove dpctl dependency from onedal/tests/utils/_device_selection.py #2549

Draft

icfaust added 7 commits June 18, 2025 12:48

Merge branch 'main' into dev/lazy_load

e697ca3

Update _data_conversion.py

273f4a7

Update __init__.py

a6013e1

Update __init__.py

c1176b4

Update _data_conversion.py

49ba4e6

Update _device_offload.py

d4317b4

Update _third_party.py

70d7557

icfaust marked this pull request as ready for review June 18, 2025 14:12

icfaust requested review from Alexsandruss, yuejiaointel, ahuber21, ethanglaser, razdoburdin, avolkov-intel and Vika-F as code owners June 18, 2025 14:12

Alexsandruss reviewed Jun 18, 2025

icfaust and others added 4 commits June 20, 2025 10:14

add requested comments to code

1a2e04c

add requested comments to code

f401c89

fix codespell hits

1231735

Merge branch 'uxlfoundation:main' into dev/lazy_load

d4e3e4d

icfaust requested review from Alexsandruss and david-cortes-intel June 20, 2025 09:18


