cudnnFindConvolutionForwardAlgorithmEx vs cudnnGetConvolutionForwardAlgorithm #8928
Comments
cudnnGetConvolutionForwardAlgorithm is the fallback path. By default, TensorFlow does the autotuning itself, an approach that predates the availability of cudnnFindConvolutionForwardAlgorithmEx. The custom implementation also lets us filter out noise by timing multiple run steps. At this point, cudnnFindConvolutionForwardAlgorithmEx doesn't seem to offer enough additional functionality to justify a change. In the future, the plan is to autotune cuDNN algorithms and other custom kernels together, so we can pick the fastest from both worlds.
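To illustrate the "multiple run steps" idea from the comment above, here is a minimal Python sketch of timing-based autotuning. It is not TensorFlow's actual implementation: the function name `autotune`, the `candidates` mapping, and the injectable `timer` are all hypothetical, and taking the minimum of several timings is just one assumed way to filter out noise.

```python
import time
from typing import Callable, Dict


def autotune(
    candidates: Dict[str, Callable[[], None]],
    runs: int = 5,
    timer: Callable[[], float] = time.perf_counter,
) -> str:
    """Return the name of the candidate with the lowest best-of-`runs` time.

    `candidates` maps a (hypothetical) algorithm name to a zero-argument
    callable that executes the convolution once with that algorithm.
    """
    if not candidates:
        raise ValueError("no candidates to autotune")
    if runs < 1:
        raise ValueError("runs must be at least 1")

    best_name, best_time = "", float("inf")
    for name, run in candidates.items():
        samples = []
        for _ in range(runs):
            start = timer()
            run()
            samples.append(timer() - start)
        # The minimum over several runs discards slow outliers such as
        # warm-up costs or interference from other work on the device.
        elapsed = min(samples)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

With a single run, a one-off slow first execution could cause an otherwise faster algorithm to lose; repeating the measurement makes the choice less sensitive to that kind of noise.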
@cancan101: Does that answer your question? (Will wait a while before closing this out as intended behavior.)
Yeah, it does make sense. As an aside, it might be nice to log the results of the profiling runs. I think pytorch / torch7 has an option to do this.
Thanks. Closing this out. It might make sense for the selected algorithm to be logged either to the logging system or maybe in the
Following up on #7187 (comment), why does TensorFlow use cudnnGetConvolutionForwardAlgorithm rather than cudnnFindConvolutionForwardAlgorithmEx? It looks like TensorFlow tries to do the more complete profiling itself.

For reference, cudnnGetConvolutionForwardAlgorithm serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward for the given layer specifications. Based on the input preference, this function will either return the fastest algorithm or the fastest algorithm within a given memory limit. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionForwardAlgorithm.

Whereas the cudnnFindConvolutionForwardAlgorithmEx function attempts all available cuDNN algorithms for cudnnConvolutionForward, using user-allocated GPU memory, and outputs performance metrics to a user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in sorted fashion where the first element has the lowest compute time.

Looking at a number of other DNN frameworks, they seem to use cudnnFindConvolutionForwardAlgorithmEx / cudnnFindConvolutionForwardAlgorithm (time_once or time_on_shape_change).

/CC @Yangqing @zheng-xq
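As a rough illustration of how a caller might consume the kind of per-algorithm metrics described in the quoted documentation, the Python sketch below picks the fastest successful entry, optionally within a memory limit. It does not call cuDNN: the `AlgoPerf` record and its field names (`algo`, `time_ms`, `memory_bytes`, `ok`) are hypothetical stand-ins for cudnnConvolutionFwdAlgoPerf_t, and `pick_algorithm` is an assumed helper, not a cuDNN or TensorFlow function.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlgoPerf:
    """Hypothetical stand-in for one per-algorithm performance record."""
    algo: str
    time_ms: float
    memory_bytes: int
    ok: bool = True  # False if the algorithm failed to run


def pick_algorithm(
    results: List[AlgoPerf], memory_limit: Optional[int] = None
) -> Optional[AlgoPerf]:
    """Return the fastest successful entry, optionally within a memory limit."""
    # Sort defensively by time, so the helper does not depend on the
    # caller having passed the results in sorted order.
    for perf in sorted(results, key=lambda p: p.time_ms):
        if not perf.ok:
            continue
        if memory_limit is not None and perf.memory_bytes > memory_limit:
            continue
        return perf
    return None  # nothing usable; a caller would use its fallback path
```

Passing `memory_limit=None` corresponds to "return the fastest algorithm", while a finite limit corresponds to "the fastest algorithm within a given memory limit".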