Parallelized batch processing #3
Comments
Hi Juho, I get your point, but this looks like a custom solution that would require additional effort, since the original nanoflann library has no such option. I have no plans for this right now, but contributions are welcome.
Do you need
import pynanoflann
import numpy as np
n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)
distances, nn_indexes = batched_search(target, query, n_neighbors=1, metric='L2', leaf_size=20)
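To make the proposed call concrete, here is a minimal brute-force sketch of the semantics it seems to imply: batch `b` of `query` is searched only against batch `b` of `target`. The name `batched_search_reference` is hypothetical and not part of pynanoflann. It uses plain NumPy instead of kd-trees, ignores `leaf_size`, and returns plain (not squared) Euclidean distances, which is an assumption about the convention.

```python
import numpy as np

def batched_search_reference(target, query, n_neighbors=1):
    """Brute-force stand-in (hypothetical helper, not the library API).

    For each batch b, find the n_neighbors points of target[b] that are
    closest in Euclidean distance to every point of query[b].
    """
    target = np.asarray(target, dtype=float)
    query = np.asarray(query, dtype=float)
    # Pairwise differences, shape (n_batches, n_query, n_target, dim).
    diff = query[:, :, None, :] - target[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the n_neighbors smallest distances for each query point.
    nn_indexes = np.argsort(dist, axis=-1)[..., :n_neighbors]
    distances = np.take_along_axis(dist, nn_indexes, axis=-1)
    return distances, nn_indexes

rng = np.random.default_rng(0)
target = rng.random((4, 200, 3))   # small sizes keep the brute force cheap
query = rng.random((4, 50, 3))
distances, nn_indexes = batched_search_reference(target, query, n_neighbors=1)
print(distances.shape, nn_indexes.shape)
```

Both outputs have shape `(n_batches, n_query, n_neighbors)` in this sketch; the shapes returned by the real implementation may differ.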
Hey Dmitrii, thanks for the very fast response. In my current application I don't need the trees, so they can be thrown away.
I added parallelized batch processing.
Full example:
Internally it works for a list of numpy arrays of arbitrary shapes, i.e.:
target = [
np.random.rand(10000, 3),
np.random.rand(5000, 3),
np.random.rand(40000, 3),
]
I've implemented a more general interface at the cost of a small overhead.
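As an illustration of why a list-based interface is more general, the sketch below loops over (target, query) pairs whose point counts differ, which a single stacked array could not represent. `nearest_per_pair` is a hypothetical helper using brute force in NumPy; it is not the library's implementation and says nothing about how the work is parallelized internally.

```python
import numpy as np

def nearest_per_pair(targets, queries):
    """For each (target, query) pair, return (distances, indexes) of the
    single nearest target point for every query point (brute force)."""
    results = []
    for t, q in zip(targets, queries):
        t = np.asarray(t, dtype=float)
        q = np.asarray(q, dtype=float)
        # Distance matrix of shape (len(q), len(t)).
        dist = np.linalg.norm(q[:, None, :] - t[None, :, :], axis=-1)
        idx = dist.argmin(axis=1)
        results.append((dist[np.arange(len(q)), idx], idx))
    return results

rng = np.random.default_rng(0)
# Point clouds of different sizes: only expressible as a list.
targets = [rng.random((n, 3)) for n in (100, 50, 400)]
queries = [rng.random((n, 3)) for n in (20, 10, 30)]
results = nearest_per_pair(targets, queries)
print([idx.shape for _, idx in results])
```

Because every pair can have its own size, the result is a list of per-pair arrays rather than one stacked array.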
Hey, I tested the implementation and it works very well; I found no bugs. I have to say the library is now very fast with these multicore capabilities. As far as I know, this is the fastest kd-tree implementation for Python. This helped me a lot and made my algorithm many times faster. Thank you very much.
I'm glad it helped you.
Hey,
I'm wondering whether it would be possible to add multi-core support for processing multiple batches of data simultaneously.
My current approach is:
Instead, I'd like to do something like this:
This would create a kd-tree for each batch in 'target'. Corresponding batches in 'query' then would be used to make nearest neighbor searches with corresponding kd-trees.
Currently, if I want to implement this kind of parallelized processing, I have to do it in Python. For large data sets this is not a problem, but for smaller data sets it causes too much overhead. I assume it would be much faster to implement this on the C++ side of the code.
Best regards
Juho
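For reference, the Python-side parallelization described in this post could look roughly like the sketch below, which submits one task per batch to a thread pool. The names `nearest_neighbor` and `search_batches_in_python` are hypothetical, and a brute-force NumPy search stands in for building and querying a kd-tree per batch. Each task pays Python-level dispatch cost, which is the overhead that becomes noticeable for small batches.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def nearest_neighbor(pair):
    """Brute-force 1-NN for one (target, query) batch; a stand-in for
    building a kd-tree on the target and querying it."""
    t, q = pair
    dist = np.linalg.norm(q[:, None, :] - t[None, :, :], axis=-1)
    idx = dist.argmin(axis=1)
    return dist[np.arange(len(q)), idx], idx

def search_batches_in_python(target, query, n_jobs=4):
    """Dispatch one task per batch from Python and stack the results."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map preserves the order of the input batches.
        results = list(pool.map(nearest_neighbor, zip(target, query)))
    distances = np.stack([d for d, _ in results])
    nn_indexes = np.stack([i for _, i in results])
    return distances, nn_indexes

rng = np.random.default_rng(0)
target = rng.random((10, 300, 3))
query = rng.random((10, 60, 3))
distances, nn_indexes = search_batches_in_python(target, query)
print(distances.shape, nn_indexes.shape)
```

Threads only give a real speedup here to the extent that the per-batch work releases the GIL; moving the loop over batches into native code avoids both that limitation and the per-task dispatch cost.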