cudnnFindConvolutionForwardAlgorithmEx vs cudnnGetConvolutionForwardAlgorithm #8928

cancan101 · 2017-04-03T16:38:37Z

Following up on #7187 (comment), why does TensorFlow use cudnnGetConvolutionForwardAlgorithm rather than cudnnFindConvolutionForwardAlgorithmEx? It looks like TensorFlow tries to do the more complete profiling itself.

For reference, cudnnGetConvolutionForwardAlgorithm serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward for the given layer specifications. Based on the input preference, this function will either return the fastest algorithm or the fastest algorithm within a given memory limit. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionForwardAlgorithm.

Whereas:
cudnnFindConvolutionForwardAlgorithmEx function attempts all available cuDNN algorithms for cudnnConvolutionForward, using user-allocated GPU memory, and outputs performance metrics to a user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in sorted fashion where the first element has the lowest compute time.
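To make the difference concrete, here is a rough Python sketch of what a caller might do with the sorted results. It does not call cuDNN: `AlgoPerf` is a hypothetical stand-in for one cudnnConvolutionFwdAlgoPerf_t entry, `pick_algorithm` is an illustrative helper, and the algorithm labels and numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlgoPerf:
    """Hypothetical stand-in for one cudnnConvolutionFwdAlgoPerf_t entry."""
    algo: str           # illustrative algorithm label
    ok: bool            # whether the trial run succeeded
    time_ms: float      # measured compute time
    memory_bytes: int   # workspace the algorithm needs


def pick_algorithm(results: List[AlgoPerf],
                   workspace_limit: Optional[int] = None) -> Optional[AlgoPerf]:
    """Return the fastest successful entry that fits the workspace limit."""
    # Sort defensively; the Find* results are described as already sorted by time.
    for perf in sorted(results, key=lambda p: p.time_ms):
        if not perf.ok:
            continue  # skip algorithms whose trial run failed
        if workspace_limit is not None and perf.memory_bytes > workspace_limit:
            continue  # skip algorithms that need too much workspace
        return perf
    return None


# Made-up measurements for illustration only.
results = [
    AlgoPerf("WINOGRAD", True, 0.8, 64 << 20),
    AlgoPerf("FFT", False, 1.2, 256 << 20),
    AlgoPerf("IMPLICIT_GEMM", True, 1.9, 0),
]

fastest = pick_algorithm(results)
fastest_within_32mb = pick_algorithm(results, workspace_limit=32 << 20)
```

With no limit the helper returns the fastest successful entry; with a limit it falls back to the fastest entry whose workspace fits, which mirrors the two preferences described for cudnnGetConvolutionForwardAlgorithm.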

A number of other DNN frameworks seem to use cudnnFindConvolutionForwardAlgorithmEx / cudnnFindConvolutionForwardAlgorithm:

PyTorch (when benchmark is on)
Theano (if time_once or time_on_shape_change)
CNTK (non-static finder)

/CC @Yangqing @zheng-xq


asimshankar · 2017-04-03T17:39:47Z

@zheng-xq @vrv : Might one of you have some historical background on this choice, or general comments?

zheng-xq · 2017-04-03T17:50:23Z

cudnnGetConvolutionForwardAlgorithm is the fallback path. By default, TensorFlow does the autotuning itself; this was implemented before cudnnFindConvolutionForwardAlgorithmEx was available. The custom implementation also lets us filter out noise by running multiple steps. At this point, cudnnFindConvolutionForwardAlgorithmEx doesn't seem to offer enough additional functionality to justify a change.

In the future, the plan is to autotune both cuDNN algorithms and other custom kernels together, so we can pick the fastest from both worlds.
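For readers unfamiliar with the approach, timing each candidate several times to filter out noise could look roughly like the Python sketch below. It is not TensorFlow's actual autotuner: the names `time_once` and `autotune`, the choice of the median as the noise filter, and the CPU wall-clock timer are all assumptions made for illustration. The candidates are plain callables, so cuDNN algorithms and custom kernels could in principle be compared in the same loop.

```python
import statistics
import time
from typing import Callable, Dict, Tuple


def time_once(fn: Callable[[], object],
              clock: Callable[[], float] = time.perf_counter) -> float:
    """Time a single run of fn using the given clock."""
    start = clock()
    fn()
    return clock() - start


def autotune(candidates: Dict[str, Callable[[], object]],
             repeats: int = 5,
             clock: Callable[[], float] = time.perf_counter) -> Tuple[str, float]:
    """Run every candidate `repeats` times and pick the lowest median time."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        samples = [time_once(fn, clock) for _ in range(repeats)]
        # The median discards occasional slow outliers (assumed noise filter).
        typical = statistics.median(samples)
        if typical < best_time:
            best_name, best_time = name, typical
    return best_name, best_time


# Hypothetical workloads standing in for convolution algorithms.
candidates = {
    "algo_a": lambda: sum(range(20_000)),
    "algo_b": lambda: sum(range(5_000)),
}
best, seconds = autotune(candidates, repeats=7)
print(f"selected {best} ({seconds * 1e3:.3f} ms median)")
```

A single timed run can be skewed by a one-off stall, so a candidate that is usually fastest might lose; taking the median (or the minimum) over several runs makes the choice more stable.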

asimshankar · 2017-04-03T18:02:44Z

@cancan101 : Does that answer your question? (Will wait a while before closing this out as intended behavior)

cancan101 · 2017-04-03T18:06:17Z

Yeah, that makes sense. As an aside, it might be nice to log the results of the profiling runs. I think PyTorch / Torch7 has an option to do this.

asimshankar · 2017-04-04T00:03:39Z

Thanks. Closing this out.

It might make sense for the selected algorithm to be logged either to the logging system or maybe in the RunMetadata protocol buffer. If you'd like to make a contribution towards that, we'll be glad to take a look!
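As a rough idea of what such logging could look like, here is a minimal Python sketch. It is purely illustrative and not TensorFlow code: `record_selection`, the `conv_autotune` logger name, the key format, and the numbers are all hypothetical, and the RunMetadata option is not shown.

```python
import logging
from typing import Dict, Tuple

logger = logging.getLogger("conv_autotune")  # hypothetical logger name


def record_selection(selections: Dict[str, Tuple[str, float]],
                     conv_key: str, algo: str, time_ms: float) -> str:
    """Remember the chosen algorithm for a convolution and log it."""
    selections[conv_key] = (algo, time_ms)
    message = "conv %s: selected %s (%.3f ms)" % (conv_key, algo, time_ms)
    logger.info(message)
    return message


logging.basicConfig(level=logging.INFO)
selections: Dict[str, Tuple[str, float]] = {}
# Made-up convolution key and timing, for illustration only.
record_selection(selections, "N32_C64_H56_W56_K3x3", "WINOGRAD", 0.8)
```

Keeping the selections in a dictionary keyed by the convolution parameters means the same record could later be exported somewhere other than the log.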

asimshankar added the stat:awaiting tensorflower label Apr 3, 2017

asimshankar added the stat:awaiting response label and removed the stat:awaiting tensorflower label Apr 3, 2017

aselle removed the stat:awaiting response label Apr 3, 2017

asimshankar closed this as completed Apr 4, 2017

cancan101 mentioned this issue Apr 4, 2017

Log Selected Convolution Algorithm #8941

Closed

Yangqing mentioned this issue Aug 31, 2017

enable CuDNN's autotuner flag in PyTorch ilkarman/DeepLearningFrameworks#3

Closed
