
non_max_suppression is very slow and doesn't appear to have a cuda or multi-threaded implementation #7511


Closed
andrei-pokrovsky opened this issue Feb 15, 2017 · 13 comments
Labels
stale: marks the issue/PR stale, to be closed automatically if no activity
stat:contribution welcome: contributions welcome
type:feature: feature requests

Comments

@andrei-pokrovsky

It appears that tf.image.non_max_suppression currently takes about 200ms for about 8000 boxes, runs on a single CPU thread and doesn't have a GPU implementation.
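For readers unfamiliar with the operation, here is a minimal pure-Python sketch of greedy NMS. It is not TensorFlow's kernel; the names `iou` and `greedy_nms` are made up for illustration, and the `(y1, x1, y2, x2)` box layout is assumed to match what `tf.image.non_max_suppression` documents. The point is the data dependency: each candidate is checked against the boxes already kept, which is why a straightforward implementation ends up as a sequential loop on one thread.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (y1, x1, y2, x2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               max_output_size: int, iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first (illustrative sketch)."""
    # Visit candidates from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        # A candidate survives only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, max_output_size=10))
```

In the worst case the inner check compares every candidate with every kept box, so the cost grows roughly quadratically with the number of boxes.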

Environment info

Operating System:
Ubuntu 16.04

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
8.0, 5.1.5

If installed from binary pip package, provide:

  1. A link to the pip package you installed:
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".
    0.12.0-rc1
@andrei-pokrovsky changed the title from "non_max_suppression is very slow. no cuda or multi-threaded implementation for non_max_suppression" to "non_max_suppression is very slow and doesn't appear to have a cuda or multi-threaded implementation" on Feb 15, 2017
@aselle added the type:feature and stat:contribution welcome labels on Feb 15, 2017
@RobSalzwedel

Is there any progress on this feature?

@aluo-x

aluo-x commented Aug 18, 2017

I'm also interested in this.

@harishannavajjala

Are there any good examples online of how to use tf.image.non_max_suppression? And does using this built-in NMS function instead of our own function improve the performance of the model?

@ppwwyyxx
Contributor

ppwwyyxx commented Nov 16, 2017

@harishannavajjala The documentation is pretty clear IMHO. Which part do you have questions about?
For speed the answer is always the same: it depends on the speed of "your own function" (which only you know) plus the overhead to transfer data between TF and your own function.

@machanic

machanic commented Jan 9, 2018

@ppwwyyxx I would also like a GPU version of NMS. Is there any progress here?

@MaskVulcan

@ppwwyyxx When I use NMS and profile it, I find that NMS runs on the CPU. Is there any way to run NMS on the GPU?

@zhenglaizhang

This feature would be fantastic.

@jola6897

In "Work-efficient parallel non-maximum suppression for embedded GPU architectures", the authors describe how to bring NMS to the GPU.
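To illustrate where the parallelism in NMS can come from, here is a rough two-phase sketch in plain Python. It is not the paper's algorithm and not TensorFlow's implementation; the function names (`pairwise_mask`, `sweep`, `two_phase_nms`) and the `(y1, x1, y2, x2)` box layout are assumptions made for the example. Phase 1 fills an overlap mask in which every entry is independent of the others, so on a GPU each pair could be handled by its own thread. Phase 2 is a short sequential sweep over that mask.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (y1, x1, y2, x2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pairwise_mask(boxes: Sequence[Box], order: Sequence[int],
                  iou_threshold: float) -> List[List[bool]]:
    """Phase 1: mask[r][c] is True when the box ranked r overlaps the box ranked c.

    No entry depends on any other entry, so this is the part that parallelizes.
    """
    n = len(order)
    mask = [[False] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1, n):
            mask[r][c] = iou(boxes[order[r]], boxes[order[c]]) > iou_threshold
    return mask


def sweep(mask: List[List[bool]], order: Sequence[int]) -> List[int]:
    """Phase 2: sequential pass; only boxes that are still alive may suppress others."""
    n = len(order)
    removed = [False] * n
    keep: List[int] = []
    for r in range(n):
        if removed[r]:
            continue
        keep.append(order[r])
        for c in range(r + 1, n):
            if mask[r][c]:
                removed[c] = True
    return keep


def two_phase_nms(boxes: Sequence[Box], scores: Sequence[float],
                  iou_threshold: float = 0.5) -> List[int]:
    """Rank by score, build the mask, then sweep (illustrative sketch)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return sweep(pairwise_mask(boxes, order, iou_threshold), order)


# Chain case: A overlaps B and B overlaps C, but A does not overlap C.
chain = [(0, 0, 10, 10), (0, 3, 10, 13), (0, 6, 10, 16)]
print(two_phase_nms(chain, [0.9, 0.8, 0.7]))
```

The chain case shows why the sweep cannot simply be dropped: B is suppressed by A, so B must not go on to suppress C. The expensive IoU work is all in phase 1, which is what a GPU kernel would take over.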

Also, does anyone know what the difference is between the implementations of NonMaxSuppressionV2 and NonMaxSuppression in TensorFlow?

@keineahnung2345

keineahnung2345 commented Sep 15, 2018

@jonla1 Someone has implemented a CUDA version based on the mentioned paper here:
https://github.com/jeetkanjani7/Parallel_NMS
And the following is my PyCUDA version:
https://github.com/keineahnung2345/Parallel_NMS/tree/pycuda/PyCUDA

@zjjott
Contributor

zjjott commented Jan 25, 2019

+1

@joeyearsley
Contributor

https://github.com/zengarden/light_head_rcnn/tree/master/lib/lib_kernel/lib_nms_dev

Here is a GPU version which can be built for TF.

@SpaceInvader61

There seems to be a related commit #28745

@github-actions

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions bot added the stale label on Mar 28, 2023