Feature Request: Support for configuring deterministic options of cudNN Conv routines #18096

Open
yoavz opened this issue Mar 29, 2018 · 17 comments

@yoavz yoavz commented Mar 29, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.6
  • Python version: 3.6
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 7.1
  • GPU model and memory: GPU
  • Exact command to reproduce: N/A

Describe the problem

http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#reproducibility
The cuDNN documentation indicates that the cudnnConvolutionBackwardFilter, cudnnConvolutionBackwardData, and cudnnPoolingBackward operations have several routine options. They default to non-deterministic atomic operations but can optionally run in a deterministic mode. To achieve determinism on TensorFlow GPU, I would like to be able to make this performance trade-off, but I currently cannot find a way to enable these options in TensorFlow.
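
For readers wondering why atomic operations cost determinism: floating-point addition is not associative, so when the order in which partial results are accumulated is not fixed (as with atomic adds issued by many GPU threads), the rounded total can differ from run to run. The following is a minimal CPU-only sketch of that rounding effect in plain Python. It is an illustration only and does not call cuDNN.

```python
# Floating-point addition is not associative: the grouping of the
# additions changes how intermediate results are rounded.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible accumulation order
right_to_left = a + (b + c)   # another possible accumulation order

# The two orders give slightly different totals.
print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

A deterministic algorithm fixes the accumulation order, which is typically where the performance cost comes from.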

Can a user-facing option be added, perhaps in tf.ConfigProto, to configure these cuDNN routines? It could work much like setting inter_op_parallelism_threads and intra_op_parallelism_threads to 1 to achieve determinism on CPU (https://stackoverflow.com/questions/41233635/meaning-of-inter-op-parallelism-threads-and-intra-op-parallelism-threads).
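
As a point of comparison, the CPU-side setting mentioned above can be sketched as follows. This is a minimal sketch, assuming a TensorFlow build where ConfigProto is reachable as tf.ConfigProto (1.x) or tf.compat.v1.ConfigProto (later releases). The helper name single_thread_options is hypothetical, and the TensorFlow part is skipped if the package is not installed.

```python
def single_thread_options():
    """Hypothetical helper: ConfigProto fields that pin both CPU thread pools to one thread."""
    return {
        "inter_op_parallelism_threads": 1,
        "intra_op_parallelism_threads": 1,
    }

options = single_thread_options()

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow not installed; only the option dict is built

if tf is not None:
    # Newer releases expose the 1.x API under tf.compat.v1; TF 1.6 uses tf.ConfigProto.
    compat_v1 = getattr(getattr(tf, "compat", None), "v1", None)
    config_cls = compat_v1.ConfigProto if compat_v1 is not None else tf.ConfigProto
    config = config_cls(**options)
    # The config would then be passed to a session, e.g. Session(config=config).
```

The feature request asks for an analogous switch for the GPU convolution and pooling routines.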

Source code / logs

N/A

@yoavz yoavz changed the title from "Support for configuring deterministic options of cudNN routines" to "Feature Request: Support for configuring deterministic options of cudNN routines" Mar 29, 2018

Contributor

@jart jart commented Mar 31, 2018

@vrv wrote inter_op_parallelism_threads. This is a request for additional tf.ConfigProto fields that would let users run deterministically on NVIDIA hardware.

It's possible #16889 can be rolled into this issue. @ekelsen did work relating to determinism in #12871. @drpngx closed a "we trade determinism for speed" doc contribution in #10636, saying we're working on the problem.

Member

@tensorflowbutler tensorflowbutler commented Apr 15, 2018

Nagging Assignee @jart: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

Member

@tensorflowbutler tensorflowbutler commented Apr 30, 2018

Nagging Assignee @jart: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

Member

@tensorflowbutler tensorflowbutler commented May 15, 2018

Nagging Assignee @jart: It has been 44 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

Contributor

@jart jart commented May 15, 2018

I'm going to close this out as a duplicate of #16889. Please follow that issue for updates.

@jart jart closed this May 15, 2018

Author

@yoavz yoavz commented May 16, 2018

I don't believe that this issue is a duplicate of #16889. #16889 refers to achieving determinism on CPU, whereas this feature request refers to enabling determinism on GPU by surfacing cuDNN routine options. @jart, can you take another look at this issue?

Contributor

@jart jart commented May 16, 2018

There were GPU Non-determinism Docs contributed in #10636. @drpngx closed the PR last year, mentioning it'd be obsolete soon. I'll reopen and assign to him.

@jart jart reopened this May 16, 2018
@jart jart assigned drpngx and unassigned jart May 16, 2018
@drpngx drpngx assigned ekelsen and unassigned drpngx May 16, 2018

Contributor

@drpngx drpngx commented May 16, 2018

/CC @protoget

@protoget protoget changed the title from "Feature Request: Support for configuring deterministic options of cudNN routines" to "Feature Request: Support for configuring deterministic options of cudNN Conv routines" May 16, 2018

Member

@protoget protoget commented May 16, 2018

If you're referring to conv ops, TF currently does autotuning underneath to pick the best algorithm for the input shape, by first running a few trial steps. I imagine disabling autotune would give you determinism in conv.

You can set TF_CUDNN_USE_AUTOTUNE env var to '0'.

@yzhwang to confirm.
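
A minimal sketch of that suggestion from Python, assuming the variable is read when TensorFlow sets up its cuDNN kernels, so it is set before TensorFlow is imported. Whether disabling autotuning is enough for deterministic convolutions is the open question in this thread, so treat this as an experiment rather than a guarantee.

```python
import os

# Disable cuDNN autotuning, as suggested above. Set this before
# TensorFlow is imported so the setting is visible when it is read.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["TF_CUDNN_USE_AUTOTUNE"])
```

The same effect can be had by exporting the variable in the shell that launches the training script.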

Author

@yoavz yoavz commented May 16, 2018

In my opinion, it would be helpful to have a fine-grained option to enable or disable the specific cuDNN operations that are non-deterministic. For example, Theano's config surfaces the config.dnn.conv.algo_bwd_filter and config.dnn.conv.algo_bwd_data options [1] for the aforementioned conv ops.

This is in contrast to using a big hammer that enables or disables autotuning for all cuDNN routines, which couples determinism to how the autotuning logic behaves.

[1] http://deeplearning.net/software/theano/library/config.html
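
For context, here is a sketch of how those two Theano options might be set through the THEANO_FLAGS environment variable. The flag names come from the comment above; the value "deterministic" is an assumption based on the linked config page and should be checked there for the Theano version in use.

```python
import os

# Per-operation algorithm choices for the two backward conv routines.
# The "deterministic" value is an assumption; see the Theano config page.
flags = {
    "dnn.conv.algo_bwd_filter": "deterministic",
    "dnn.conv.algo_bwd_data": "deterministic",
}

# THEANO_FLAGS is a comma-separated list of key=value pairs.
# Note: this overwrites any THEANO_FLAGS value already present.
os.environ["THEANO_FLAGS"] = ",".join(f"{k}={v}" for k, v in flags.items())

print(os.environ["THEANO_FLAGS"])
```

The point of the comparison is granularity: each routine gets its own knob, independent of any autotuning behavior.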

Contributor

@yzhwang yzhwang commented May 16, 2018

I completely agree that determinism should have a separate knob. Since we currently have no immediate plans to add one, I added the "contributions welcome" label to this issue.

Member

@tensorflowbutler tensorflowbutler commented Jun 2, 2018

Please remove the assignee, as this issue is inviting external contributions. Otherwise, remove the contributions welcome label. Thank you.

@AdrienDeliege AdrienDeliege commented Oct 5, 2018

Hi. Is there any update on this feature request? It would indeed be a valuable tool to be able to switch on/off determinism.

@Kaju-Bubanja Kaju-Bubanja commented Apr 10, 2019

Hi. Is there any update on this feature request? It would indeed be a valuable tool to be able to switch on/off determinism.

@Freyb Freyb commented Jul 12, 2019

Hi. Is there any update on this feature request? It would indeed be a valuable tool to be able to switch on/off determinism.

@bersbersbers bersbersbers commented Sep 30, 2019

From TF2 release notes:

Add environment variable TF_CUDNN_DETERMINISTIC. Setting to TRUE or "1" forces the selection of deterministic cuDNN convolution and max-pooling algorithms. When this is enabled, the algorithm selection procedure itself is also deterministic.
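
A minimal sketch of using the variable described in the release note. The helper name enable_cudnn_determinism is hypothetical, and the sketch assumes the variable should be in place before TensorFlow is imported.

```python
import os

def enable_cudnn_determinism(env=os.environ):
    """Hypothetical helper: request deterministic cuDNN conv and max-pooling algorithms."""
    # The release note accepts TRUE or "1"; "1" is used here.
    env["TF_CUDNN_DETERMINISTIC"] = "1"
    return env

enable_cudnn_determinism()

# import tensorflow as tf  # import only after the variable is set
print(os.environ["TF_CUDNN_DETERMINISTIC"])
```

Unlike disabling autotuning, this targets algorithm selection for determinism directly, which is the kind of separate knob requested earlier in the thread.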

Contributor

@duncanriach duncanriach commented Oct 8, 2019

To add more information to @bersbersbers' report above, this feature has been added to TensorFlow with the following PRs: 24747, 25796, 29667, 31389, and 32979. Related PRs: 25269 and 31465.

For more information about GPU determinism in TensorFlow, please see: https://github.com/NVIDIA/tensorflow-determinism

This feature request can now be closed.
