
cudnnFindConvolutionForwardAlgorithmEx vs cudnnGetConvolutionForwardAlgorithm #8928

Closed
cancan101 opened this issue Apr 3, 2017 · 5 comments

Comments


cancan101 commented Apr 3, 2017

Following up on #7187 (comment), why does TensorFlow use cudnnGetConvolutionForwardAlgorithm rather than cudnnFindConvolutionForwardAlgorithmEx? It looks like TensorFlow tries to do the more complete profiling itself.

For reference, cudnnGetConvolutionForwardAlgorithm serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward for the given layer specifications. Based on the input preference, this function will either return the fastest algorithm or the fastest algorithm within a given memory limit. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionForwardAlgorithm.

Whereas:
The cudnnFindConvolutionForwardAlgorithmEx function attempts all available cuDNN algorithms for cudnnConvolutionForward, using user-allocated GPU memory, and outputs performance metrics to a user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in sorted fashion, where the first element has the lowest compute time.
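To make the contrast concrete, here is a minimal sketch of the two selection strategies in plain Python. This is illustrative only, not actual cuDNN bindings: the algorithm table, workspace sizes, and sleep-based "kernels" are hypothetical stand-ins for cudnnConvolutionFwdAlgo_t and real GPU execution.

```python
import time

# Hypothetical stand-ins for cuDNN's forward-convolution algorithms.
# "run" simulates kernel execution time; "workspace_bytes" simulates
# each algorithm's scratch-memory requirement.
ALGORITHMS = {
    "IMPLICIT_GEMM":         {"workspace_bytes": 0,        "run": lambda: time.sleep(0.03)},
    "IMPLICIT_PRECOMP_GEMM": {"workspace_bytes": 1 << 20,  "run": lambda: time.sleep(0.02)},
    "FFT":                   {"workspace_bytes": 64 << 20, "run": lambda: time.sleep(0.005)},
}

def get_algorithm(memory_limit_bytes):
    """Heuristic choice, in the spirit of cudnnGetConvolutionForwardAlgorithm:
    pick an algorithm from static knowledge (here: the memory limit) without
    executing anything."""
    feasible = [name for name, a in ALGORITHMS.items()
                if a["workspace_bytes"] <= memory_limit_bytes]
    # A real heuristic also considers layer shapes; this sketch just takes
    # the first algorithm that fits the memory budget.
    return feasible[0]

def find_algorithm_ex():
    """Exhaustive search, in the spirit of cudnnFindConvolutionForwardAlgorithmEx:
    actually execute every algorithm and return names sorted by measured time."""
    results = []
    for name, a in ALGORITHMS.items():
        start = time.perf_counter()
        a["run"]()  # in cuDNN this would launch the real convolution kernel
        results.append((time.perf_counter() - start, name))
    results.sort()  # first element has the lowest compute time
    return [name for _, name in results]
```

The trade-off the thread is about: the Get-style heuristic is instant but may be wrong for an unusual layer, while the Find-style search always burns real GPU time and memory up front in exchange for a measured answer.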

Looking at a number of other DNN frameworks, they seem to use cudnnFindConvolutionForwardAlgorithmEx / cudnnFindConvolutionForwardAlgorithm:

  • PyTorch (when benchmark is on)
  • Theano (if time_once or time_on_shape_change)
  • CNTK (non-static finder)

/CC @Yangqing @zheng-xq

@asimshankar (Contributor)

@zheng-xq @vrv : Might one of you have some historical background on this choice, or general comments?

asimshankar added the stat:awaiting tensorflower label Apr 3, 2017

zheng-xq commented Apr 3, 2017

cudnnGetConvolutionForwardAlgorithm is the fallback path. By default, TensorFlow does the autotuning by itself, and did so before cudnnFindConvolutionForwardAlgorithmEx was available. The custom implementation also lets us filter out noise by timing across multiple run steps. At this point, cudnnFindConvolutionForwardAlgorithmEx doesn't seem to offer enough additional functionality to justify a change.

In the future, the plan is to autotune both cuDNN algorithms and other custom kernels together, so we can pick the fastest of both worlds.
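The noise-filtering idea described above can be sketched as best-of-N timing: run each candidate several times and keep only its best measurement, so a one-off scheduling hiccup doesn't disqualify a genuinely fast kernel. This is an illustrative sketch only; TensorFlow's actual autotuner lives in its C++ stream executor, and the candidate dictionary and timer here are hypothetical.

```python
import time

def time_once(kernel):
    """Measure a single execution of a candidate kernel."""
    start = time.perf_counter()
    kernel()
    return time.perf_counter() - start

def autotune(candidates, repeats=3):
    """Time each candidate `repeats` times, keep the minimum measurement per
    candidate (filtering out one-off timing noise), and return the name of
    the overall fastest candidate."""
    best = None
    for name, kernel in candidates.items():
        t = min(time_once(kernel) for _ in range(repeats))
        if best is None or t < best[0]:
            best = (t, name)
    return best[1]
```

Taking the minimum rather than the mean is the usual choice here: timing noise on a busy machine is almost entirely one-sided (measurements only get slower, not faster), so the minimum is the least-contaminated estimate of a kernel's true cost.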

asimshankar added stat:awaiting response and removed stat:awaiting tensorflower labels Apr 3, 2017
@asimshankar (Contributor)

@cancan101 : Does that answer your question? (Will wait a while before closing this out as intended behavior)

@cancan101 (Contributor, Author)

Yeah, it does make sense. As an aside, it might be nice to log the results of the profiling runs. I think PyTorch/Torch7 has an option to do this.

aselle removed the stat:awaiting response label Apr 3, 2017
@asimshankar (Contributor)

Thanks. Closing this out.

It might make sense for the selected algorithm to be logged either to the logging system or maybe in the RunMetadata protocol buffer. If you'd like to make a contribution towards that, we'll be glad to take a look!


4 participants