
Multicore processing of query points #4

Closed

juldaani opened this issue Dec 18, 2020 · 6 comments

juldaani commented Dec 18, 2020

Hey,

Another feature I have in mind is parallelized multicore processing for queries. For instance, in SciPy's cKDTree (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query) you can define the number of cores with the n_jobs parameter.

What I mean is that we could set an 'n_cores' parameter when executing the query:

import pynanoflann
import numpy as np

target = np.random.rand(10000, 3)
query = np.random.rand(2000, 3)

kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_tree.fit(target)
d, nn_idx = kd_tree.kneighbors(query, n_cores=10)

This would build a single kd-tree for the data in 'target' and then split the data in 'query' across multiple cores to do the queries.

A Python-side implementation of this would introduce quite a lot of overhead when working with relatively small data sets. I assume it would be much faster when implemented on the C++ side.
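
For reference, a Python-side version of that splitting might look roughly like the sketch below. It is only an illustration of the idea (kneighbors_chunked is a hypothetical helper, and the thread-based split only helps if the underlying C++ query releases the GIL); the chunking and result reassembly are exactly the kind of overhead I mean:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

import pynanoflann

def kneighbors_chunked(kd_tree, query, n_cores=4):
    # Split the query points into one chunk per worker.
    chunks = np.array_split(query, n_cores)
    # Query each chunk in its own thread against the same kd-tree.
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        results = list(pool.map(kd_tree.kneighbors, chunks))
    # Reassemble distances and indices in the original query order.
    d = np.concatenate([r[0] for r in results])
    nn_idx = np.concatenate([r[1] for r in results])
    return d, nn_idx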

Best regards
Juho

u1234x1234 (Owner)
Hi Juho,

I've added an option for multicore querying: kd_tree.kneighbors(pts_query, n_jobs=4).

pip install git+https://github.com/u1234x1234/pynanoflann.git@0.0.4

Full example:
https://github.com/u1234x1234/pynanoflann/blob/master/tests/test_multithreaded.py#L19

The speed-up mostly depends on the number of queries. For a small query set (2000 points, as in your example) it is only slightly faster than single-core querying, but for large query sets (I tried a few million points) the speedup is almost linear.
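
Roughly, the effect can be checked with something like the following (illustrative only; the sizes are arbitrary and the timings will vary by machine):

import time
import numpy as np
import pynanoflann

target = np.random.rand(100_000, 3)
query = np.random.rand(2_000_000, 3)

kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_tree.fit(target)

t0 = time.perf_counter()
kd_tree.kneighbors(query)            # single core
t1 = time.perf_counter()
kd_tree.kneighbors(query, n_jobs=4)  # four cores
t2 = time.perf_counter()

print(f'single core: {t1 - t0:.2f}s, n_jobs=4: {t2 - t1:.2f}s')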

juldaani (Author)

Thanks for the fast response.

I will test this today after dinner.

I think you are right that for small queries the performance doesn't differ much between multicore and single-core. However, I'm also using kd-trees for very large point clouds, so this feature is more than welcome. Thanks a lot.

juldaani (Author)

I tested the feature and it seems there is a minor bug in the indices.

If I run the following code, the indices do not match:

import pynanoflann
import numpy as np


a = np.array([[-1.65822375,  8.29091549,  1.69109333],
       [-2.55392027,  9.14873791,  0.57508534],
       [-0.48824483, 12.70754147, -0.77094924],
       [ 2.60444188,  1.768641  ,  1.20891106],
       [ 3.2674644 , -4.37506008, -0.87698722],
       [-5.2069087 ,  7.67119074, -1.28311527],
       [11.18597317, -5.59029245,  0.74802291],
       [ 1.45142472,  5.49197388,  2.08605385],
       [ 4.44315147, 12.89360046,  1.21553278],
       [ 1.40066028,  6.81189775,  1.32086086],
       [-0.96798885,  8.48570728, -1.57162476],
       [ 7.39387703,  2.37927413,  1.0226326 ],
       [ 0.9753269 ,  8.57738113,  1.75330448],
       [ 4.06463003, 11.8688879 ,  0.10702024],
       [-2.6403656 ,  1.09355235, -1.06388557],
       [-4.55271149,  9.36354256,  2.45670676],
       [-3.24715614, -1.84484696, -0.88164014],
       [ 0.67729777, -1.50539744,  0.43235415],
       [-5.5375247 ,  7.23835421, -0.67688894],
       [-1.07082331, -3.00007129, -1.66152179],
       [ 3.6330297 , -4.45702457, -0.62901032],
       [ 1.88146007, 15.80526638,  1.91470706],
       [ 6.26283598,  5.25627804,  0.94044268],
       [ 7.60514402, -5.0185051 ,  0.18425676],
       [-0.50298601, 13.87367153, -1.1920346 ],
       [-2.86218667,  5.47483587, -1.24373996],
       [ 0.54232329, 15.88754368,  0.27608338],
       [-3.91043758,  7.08590221,  2.5814743 ],
       [-3.41587186,  8.19709778,  0.76717484],
       [-0.99566239,  5.38674688,  1.80337858]])

b = np.array([[-0.96798885,  8.48570728, -1.57162476],
       [ 7.39387703,  2.37927413,  1.0226326 ],
       [ 0.9753269 ,  8.57738113,  1.75330448],
       [ 4.06463003, 11.8688879 ,  0.10702024],
       [-2.6403656 ,  1.09355235, -1.06388557],
       [-4.55271149,  9.36354256,  2.45670676],
       [-3.24715614, -1.84484696, -0.88164014],
       [ 0.67729777, -1.50539744,  0.43235415],
       [-5.5375247 ,  7.23835421, -0.67688894],
       [-1.07082331, -3.00007129, -1.66152179],
       [ 3.6330297 , -4.45702457, -0.62901032],
       [ 1.88146007, 15.80526638,  1.91470706],
       [ 6.26283598,  5.25627804,  0.94044268],
       [ 7.60514402, -5.0185051 ,  0.18425676],
       [-0.50298601, 13.87367153, -1.1920346 ],
       [-2.86218667,  5.47483587, -1.24373996],
       [ 0.54232329, 15.88754368,  0.27608338],
       [-3.91043758,  7.08590221,  2.5814743 ],
       [-3.41587186,  8.19709778,  0.76717484],
       [-0.99566239,  5.38674688,  1.80337858]])


kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_tree.fit(b)

d, nn_idx = kd_tree.kneighbors(a)
d2, nn_idx2 = kd_tree.kneighbors(a, n_jobs=4)

assert np.allclose(d, d2)
assert (nn_idx == nn_idx2).all()

u1234x1234 (Owner)

Thank you for the feedback. Fixed:

pip install git+https://github.com/u1234x1234/pynanoflann.git@0.0.5

juldaani (Author)

I tested the fix and it seems to work.

Thank you very much.

u1234x1234 (Owner)

Feel free to reopen if you find any problems.
