Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is NGT::Index::search thread-safe? #117

Closed
wbthomason opened this issue May 13, 2022 · 4 comments
Closed

Is NGT::Index::search thread-safe? #117

wbthomason opened this issue May 13, 2022 · 4 comments

Comments

@wbthomason
Copy link

First off, thank you for NGT! It's quite impressive.

I'm trying to parallelize a large number of queries with OpenMP. My parallelized loop calls search() on a read-only index, then further processes the results. However, I seem to get nondeterministic results from this, and I'm wondering if the problem is that this method is not thread-safe.

I should also note that I've tried both with and without the shared memory allocator - I did not observe a difference in behavior.

@masajiro
Copy link
Member

NGT::Index::search is thread-safe. There are cases where search returns different results for the same query, because search uses randomly select nodes from the leaf of the tree as seeds to explore the graph. However, the possibility is quite low. Also the possibility depends on the indexed dataset.

I would like to confirm that you don't find the issue when you don't use OpenMP. If so, could you provide your simplified search source code and your NGT index? I will check them.

@wbthomason
Copy link
Author

Thanks! I have not been able to reproduce the issue without OpenMP, but it's possible that I'm missing something.

My simplified search code looks like:

struct Edge {
  std::size_t id;
  std::size_t neighborId;
  float distance;
};

#pragma omp declare reduction (merge: std::vector<Edge> : omp_out.insert(omp_out.end(), std::make_move_iterator(omp_in.begin()), std::make_move_iterator(omp_in.end())))

#pragma omp parallel default(none)                                             \
    shared(edges, points, numPoints, radius, indexPath)
{
  NGT::Index localIndex(indexPath, true);
#pragma omp for nowait reduction(merge : edges)
  for (std::size_t i = 0; i < points.size(); ++i) {
    const auto &point = points[i];
    NGT::SearchQuery q(point);
    NGT::ObjectDistances neighbors;
    q.setResults(&neighbors);
    // To ensure that we get everything within range
    q.setSize(numPoints);
    q.setRadius(radius);
// NOTE: Including this critical section removes the bug behavior in my testing - it just also removes the parallelism
// #pragma omp critical 
// {
    localIndex.search(q);
// }
    for (const auto &neighbor : neighbors) {
      // NGT indices start at 1; ours start at 0
      const auto neighborId = neighbor.id - 1;
      if (idx != neighborId) {
        edges.emplace_back(Edge{idx, neighborId, neighbor.distance});
      }
    }
  }
}

points is the entire set of nodes added to the NGT index; numPoints = points.size(), and radius is any number in [0.0, 1.0) - 0.33 typically works for exhibiting this behavior.

I can also provide generation code to produce points and the index, if this would be helpful.

The index: ngt_index.tar.gz

@masajiro
Copy link
Member

Thank you for providing your code and index. However, I could not reproduce differences between parallelism and non-parallelism even by using points extracted from your index. It might depend on the environment. In addition, I am wondering if emplace_back is thread-safe, even though there is no problem on my environment. Apart from this issue, the opened NGT index placed outside of the parallel section can be shared.

@wbthomason
Copy link
Author

Interesting! Thank you for trying to reproduce the issue. The emplace_back should be thread-safe since edges is a reduction variable (meaning that each thread should have a private copy), but perhaps something strange is happening there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants