make sure we rethrow exceptions in async tasks #355

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

zhengbuqian
Collaborator

In many places where folly::Future is used, we only wait on the future and never try to get its value. If the async task throws an exception, wait() unblocks but does not rethrow it, causing the async thread to crash.

Now we use folly::collect().get() to make sure we always wait for all futures and rethrow exceptions, so callers can catch and handle them.
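
The difference between waiting and getting can be illustrated with a small standard-library analogue. This is only a sketch: it uses std::future rather than folly::Future, and the function name observe is made up for the illustration. The assumption it relies on is that std::future behaves like the description above, in that wait() merely blocks while get() rethrows the stored exception.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports what the caller sees when an async task throws.
// call_get == false: only wait() is used, so the exception stays inside the future.
// call_get == true:  get() is called after wait(), so the exception is rethrown here.
std::string observe(bool call_get) {
    std::future<void> f = std::async(std::launch::async, [] {
        throw std::runtime_error("foo");
    });
    assert(f.valid());
    try {
        f.wait();              // blocks until the task is done, never rethrows
        if (call_get) {
            f.get();           // rethrows the exception stored in the shared state
        }
    } catch (const std::exception& e) {
        return std::string("caught ") + e.what();
    }
    return "nothing observed";
}
```

Calling observe(false) goes through wait() only and reports that nothing was observed, while observe(true) surfaces the exception to the caller.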

/kind improvement

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhengbuqian


@alexanderguzhva
Collaborator

No.

Basically, we want to wait for all the futures to finish and only then to process the exception. Alternatively, we could have something that stops further futures from getting spawned if one of them experienced a problem, but this is not really needed.

I've tried the following snippet:

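#include <chrono>
#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <vector>

#include <folly/futures/Future.h>
// ThreadPool is declared in the knowhere thread pool header, which must also be included.

int main() {
  auto pool = ThreadPool::GetGlobalSearchThreadPool(); // say, it can run 4 threads concurrently
  std::vector<folly::Future<folly::Unit>> futures;

  for (size_t i = 0; i < 10; ++i) {
    futures.emplace_back(pool->push([&, id = i]() {
      // task 0 fails immediately; every other task sleeps, then prints its id
      if (id == 0)
        throw std::runtime_error("foo");

      using namespace std::chrono_literals;
      std::this_thread::sleep_for(2000ms);

      printf("%zu ", id);
    }));
  }

  try {
    // collect() completes as soon as one future fails
    folly::collect(futures).get();
  } catch (const std::exception &e) {
    std::cout << " caught " << e.what() << std::endl;
  }
  printf(" done\n");
}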

And this code may produce the following:

1 2 3  caught foo
 done
4 5 6 7 8 9

or

1 2 3 4 5  caught foo
6  done
7 8 9 

This is wrong: folly::collect completes as soon as the first exception occurs, so some futures keep running after the synchronization call folly::collect(futures).get() has returned.

Ideally, we would like to have

1 2 3 4 5 6 7 8 9 
caught foo
done

The original snippet

  try {
    for (auto &f : futures)
      f.wait();
    for (auto &f : futures)
      f.result().value();
  } catch (std::exception &e) {
    std::cout << " caught " << e.what() << std::endl;
  }

produces

1 2 3 4 5 6 7 8 9  caught foo
 done

which is the desired behavior.
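
The "wait for everything, then rethrow" pattern from the original snippet can be sketched with the standard library alone. Everything here is an assumption made for illustration: std::future stands in for folly::Future, std::async stands in for the thread pool, and the names wait_all_then_rethrow and run_demo are hypothetical. It is not the implementation used in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: block until every future is ready, and only then
// surface the first stored exception (if any) to the caller.
void wait_all_then_rethrow(std::vector<std::future<void>>& futures) {
    for (auto& f : futures) f.wait();   // never throws the task's exception
    for (auto& f : futures) f.get();    // rethrows the first failure
}

// Runs n tasks where task 0 throws. Returns how many of the other tasks had
// finished by the time the exception reached the caller, or -1 if nothing was thrown.
int run_demo(int n) {
    assert(n > 0);
    std::atomic<int> finished{0};
    std::vector<std::future<void>> futures;
    for (int i = 0; i < n; ++i) {
        futures.push_back(std::async(std::launch::async, [&finished, i] {
            if (i == 0) throw std::runtime_error("foo");
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            ++finished;
        }));
    }
    try {
        wait_all_then_rethrow(futures);
    } catch (const std::exception&) {
        return finished.load();
    }
    return -1;
}
```

With this ordering, run_demo(10) reports that all nine non-failing tasks completed before the exception was handled, which matches the desired output above.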

@alexanderguzhva
Collaborator

/hold

…olly::Future::wait but not trying to get the values; use folly::collect to simplify code

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@zhengbuqian
Collaborator Author

Thanks @alexanderguzhva for the clarification! I should have used collectAll instead of collect; I thought collect waited for all futures.

@mergify mergify bot added the ci-passed label Jan 20, 2024
Collaborator

@alexanderguzhva alexanderguzhva left a comment

collectAll works, but possible memory leaks need to be addressed. Please change T* x = new T[N]; into std::vector<T> x(N); or auto x = std::make_unique<T[]>(N); whenever possible.
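
As a rough illustration of why the suggestion above helps, the sketch below lets an exception escape while a buffer is alive and checks that nothing is left behind. The Tracked type, may_throw, and the other function names are hypothetical and exist only to make the cleanup observable; they do not come from the codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical element type that counts constructions and live objects,
// so a leak would show up as live != 0.
struct Tracked {
    static int constructed;
    static int live;
    Tracked() { ++constructed; ++live; }
    Tracked(const Tracked&) { ++constructed; ++live; }
    ~Tracked() { --live; }
};
int Tracked::constructed = 0;
int Tracked::live = 0;

// Stand-in for a step, such as waiting on futures, that may rethrow.
void may_throw(bool fail) {
    if (fail) throw std::runtime_error("task failed");
}

// std::vector owns the buffer; its destructor runs during stack unwinding.
void vector_version(std::size_t n, bool fail) {
    assert(n > 0);
    std::vector<Tracked> x(n);
    may_throw(fail);
}

// std::unique_ptr<T[]> gives the same guarantee for a plain array.
void unique_ptr_version(std::size_t n, bool fail) {
    assert(n > 0);
    auto x = std::make_unique<Tracked[]>(n);
    may_throw(fail);
}

// Runs f, swallows the exception, and returns how many objects are still alive.
template <class F>
int live_after(F f) {
    Tracked::constructed = 0;
    Tracked::live = 0;
    try { f(); } catch (const std::exception&) {}
    return Tracked::live;
}
```

With a raw new T[N], the same exception would skip the matching delete[] unless every call site added its own try/catch.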

}

bool failed = TryDiskANNCall([&]() { WaitAllSuccess(futures); }) != Status::success;

if (warmup != nullptr) {
diskann::aligned_free(warmup);
Collaborator

a memory leak in case of exception

Collaborator Author

TryDiskANNCall catches all exceptions and returns a non-success Status, so a memory leak won't happen here.
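
The argument in this reply can be sketched as follows. This is an assumption-laden illustration, not the actual TryDiskANNCall: the Status enum, try_call, Outcome, and run_with_cleanup are hypothetical names, and the real wrapper may differ in signature and in the status values it returns.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical status type; the real Status in the codebase may differ.
enum class Status { success, failure };

// Hypothetical stand-in for a wrapper in the spirit of TryDiskANNCall:
// run fn and turn any exception into a failure status.
Status try_call(const std::function<void()>& fn) {
    try {
        fn();
        return Status::success;
    } catch (...) {
        return Status::failure;
    }
}

struct Outcome {
    bool failed;  // the wrapped call reported a failure
    bool freed;   // the cleanup code after the call was reached
};

Outcome run_with_cleanup(bool task_fails) {
    int* warmup = new int[16];
    assert(warmup != nullptr);
    bool failed = try_call([&] {
        if (task_fails) throw std::runtime_error("foo");
    }) != Status::success;
    // Reached on both paths, because try_call never lets an exception escape.
    delete[] warmup;
    return Outcome{failed, true};
}
```

Because the exception is converted into a status inside the wrapper, control always reaches the cleanup that follows the call, which is the point made in the reply.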

-    for (auto &future : futures) {
-        future.wait();
-    }
+    knowhere::WaitAllSuccess(futures);
Collaborator

memory leaks in case of exception

    if (samples != nullptr)
      delete[] samples;
    if (pq_code != nullptr)
      delete[] pq_code;

-    for (auto &future : futures) {
-        future.wait();
-    }
+    knowhere::WaitAllSuccess(futures);
Collaborator

a memory leak in case of exception

delete[] stats;

-    for (auto &future : futures) {
-        future.wait();
-    }
+    knowhere::WaitAllSuccess(futures);
Collaborator

memory leaks in case of exception

    delete[] distances;
    delete[] center;

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
@mergify mergify bot removed the ci-passed label Jan 23, 2024
Collaborator Author

@zhengbuqian zhengbuqian left a comment

thanks Alex!

@mergify mergify bot added the ci-passed label Jan 23, 2024
@alexanderguzhva
Collaborator

/lgtm

@alexanderguzhva
Collaborator

/unhold

@sre-ci-robot sre-ci-robot merged commit 042d20d into zilliztech:main Jan 23, 2024
9 checks passed
@zhengbuqian zhengbuqian deleted the folly-improvement branch January 24, 2024 02:21