KNN on max series seems slower than cuda-based implementation on comparable devices ? #1441

Open
@fcharras

Description

The initial report contained an error; please see the first comment for a better explanation.

import numpy as np
from sklearn.neighbors import NearestNeighbors
import sklearn

device = "cpu"
# device = "gpu:0"
from sklearnex import patch_sklearn
patch_sklearn()
sklearn.set_config(target_offload=f"{device}")

seed = 123
rng = np.random.default_rng(seed)

n_samples = 10_000_000
dim = 100
n_queries = 10_000
k = 100

data = rng.random((n_samples, dim), dtype=np.float32)
query = rng.random((n_queries, dim), dtype=np.float32)

knn = NearestNeighbors(n_neighbors=k, algorithm="brute")
knn.fit(data)
%time knn.kneighbors(X=query)

This shows the following results:

  • with device = "cpu":
CPU times: user 25min 40s, sys: 18 s, total: 25min 58s
Wall time: 14.1 s
  • with device = "gpu:0" (Max Series on the Intel beta cloud):
CPU times: user 25min 42s, sys: 21.7 s, total: 26min 4s
Wall time: 14.1 s

However, one would expect a significant speedup on the GPU.
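To put the 14.1 s wall time in perspective, here is a rough back-of-the-envelope estimate of the arithmetic involved. It is only a sketch: it assumes the brute-force search is dominated by the pairwise distance computation, counted as one multiply and one add per (query, sample, feature) triple, and it ignores the top-k selection and memory traffic. The helper name `brute_knn_flops` is made up for illustration.

```python
def brute_knn_flops(n_samples, n_queries, dim):
    """Approximate FLOP count of the pairwise distance step (assumption:
    one multiply and one add per (query, sample, feature) triple)."""
    return 2 * n_samples * n_queries * dim


# Problem size from the snippet above.
flops = brute_knn_flops(n_samples=10_000_000, n_queries=10_000, dim=100)

# Wall time reported for both the CPU and the GPU run.
wall_s = 14.1
tflops = flops / wall_s / 1e12

print(f"{flops:.1e} FLOPs, {tflops:.2f} TFLOP/s sustained")
```

Under these assumptions, both runs sustain roughly 1.4 TFLOP/s in float32, and the fact that the figure is identical for the two device settings is what looks suspicious.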

Comparing on an A100 with the cuML implementation (which is in fact inherited from the open-source FAISS implementation):

import numpy as np
from cuml.neighbors import NearestNeighbors
import cupy

seed = 123
rng = np.random.default_rng(seed)

n_samples = 10_000_000
dim = 100
n_queries = 10_000
k = 100

data = rng.random((n_samples, dim), dtype=np.float32)
query = rng.random((n_queries, dim), dtype=np.float32)

data = cupy.asarray(data)
query = cupy.asarray(query)

knn = NearestNeighbors(n_neighbors=k, algorithm="brute")
knn.fit(data)
%time knn.kneighbors(X=query)

This takes about 3 s:

CPU times: user 2.71 s, sys: 8.49 ms, total: 2.72 s
Wall time: 2.73 s
Also, looking at the total CPU times with scikit-learn-intelex, it is unexpected to see 25+ minutes for both the CPU and the GPU runs despite the wall time being under 15 s. This suggests that the CPU is also under heavy load during the GPU run. Has this possibility really been ruled out by https://github.com//issues/1416 ?
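The ratio of total CPU time to wall time gives the average number of CPU cores kept busy during the call. A minimal sketch using the figures reported by `%time` above (the helper name `busy_cores` is made up for illustration):

```python
def busy_cores(total_cpu_s, wall_s):
    """Average number of CPU cores kept busy: total CPU time / wall time."""
    return total_cpu_s / wall_s


# Totals reported by %time above, converted to seconds.
runs = {
    "sklearnex, device=cpu": (25 * 60 + 58, 14.1),   # total: 25min 58s
    "sklearnex, device=gpu": (26 * 60 + 4, 14.1),    # total: 26min 4s
    "cuml on A100": (2.72, 2.73),                    # total: 2.72 s
}

for name, (cpu_s, wall_s) in runs.items():
    print(f"{name}: ~{busy_cores(cpu_s, wall_s):.1f} cores busy")
```

Both scikit-learn-intelex runs come out at about 110 busy cores, whereas the cuML run keeps about one core busy, which is what one would expect when a single host thread waits on the GPU.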

Environment:

scikit-learn-intelex and dpcpp_cpp_rt installed with conda, running on a Max Series GPU on the Intel beta cloud.

Labels: enhancement (New feature or request)