RFC: TensorFloat-32 support in TensorFlow #247
Conversation
rfcs/20200520-tensor-float-32.md (outdated)

> ## Motivation
>
> NVIDIA Ampere, an upcoming generation of NVidia GPUs announced at GTC 2020, introduces a new numeric format called TensorFloat-32, or TF32 for short.
Link to announcement?
Done
rfcs/20200520-tensor-float-32.md (outdated)

> NVIDIA Ampere, an upcoming generation of NVidia GPUs announced at GTC 2020, introduces a new numeric format called TensorFloat-32, or TF32 for short.
> TF32 has the range of float32/bfloat16 (i.e. 8 bits of exponent) and the precision of fp16 (i.e. 10 bits of mantissa).
> For the most part, it is not an in-memory format, but tensor cores natively support it as a computation format.
Why "For the most part"?
Removed this part. There is an intrinsic to convert from float32 to tf32, so technically it can be stored in memory, but I don't think "for the most part" clarified anything.
rfcs/20200520-tensor-float-32.md (outdated)

> TF32 has the range of float32/bfloat16 (i.e. 8 bits of exponent) and the precision of fp16 (i.e. 10 bits of mantissa).
> For the most part, it is not an in-memory format, but tensor cores natively support it as a computation format.
> TF32 should not be thought of as an in-memory dtype but instead a computation mode that increases performance and decreases numeric precision for certain float32 operations.
> Nvidia has not found any cases where TF32 reduces the convergence of deep learning models.
I believe this should be NVIDIA as per the NVIDIA Brand Guidelines.
Done.
rfcs/20200520-tensor-float-32.md (outdated)

> Since TF32 only affects Ampere GPUs, moving an op to a GPU can affect numerics. Grappler and other graph optimizations will not consider this, and will freely move ops between devices without regard to numeric stability. As a result, explicitly putting an op on the CPU does not ensure it will use the full float32 precision instead of TF32.
>
> Since TensorFlow 2.3 will not support CUDA 11, which is required for TF32, this API will first be exposed in TensorFlow 2.4. However, Google Cloud will likely cherrypick CUDA 11 and this API into their version of 2.3, so they can offer TF32 support to their customers who use TensorFlow 2.3.
The "However, Google Cloud will likely cherrypick CUDA 11 and this API into their version of 2.3..." sentence should not be part of the RFC I believe.
Why not? It provides information on when the API will be available, and also motivation for why we need to have the RFC so early despite TF 2.4 not coming out for months.
Replacing "google cloud will likely" with "downstream repackagers of tensorflow (such as google cloud) are encouraged to" will make this read better
Done.
> 3. Do not turn it on by default.
>
> The advantage of (1) is that all Ampere float32 users get the performance benefit unless they opt out. Additionally, Ampere numerics will not be loosened in a new release: TensorFlow 2.4 will be the first release with Ampere support, and it will immediately default to TF32 being enabled. The disadvantage is that we cannot collect as much feedback from users before defaulting to TF32, because no stable version of TensorFlow will support TF32 but not have it enabled by default.
> because no stable version of TensorFlow will support TF32 but not have it enabled by default.
I don't buy this: the models that TF32 targets use FP32 today, so I'd expect users to notice a regression even if 2.4 enables it by default, which they can corroborate further by comparing the accuracy with disabling it explicitly.
I don't fully understand your argument. We'd like to have a release where users can try TF32 and give us feedback before we decide whether to turn it on by default. If we immediately turn it on by default in 2.4, users can still give feedback, but it will be too late: we will have already made our decision.
> but it will be too late: we will have already made our decision.
Is the assumption that disabling tf32 by default (if users report problems after we enable it by default in 2.4) is more of a breaking change than enabling it by default (if users try it with 2.4 and don't report problems)?
No, enabling tf32 is probably more of a breaking change. However, we only want to make such a change at most once. After enabling tf32, I don't think we should subsequently disable it.
rfcs/20200520-tensor-float-32.md (outdated)

> ### Remote devices
>
> Enabling TF32 will affect remote Ampere GPUs in addition to local Ampere GPUs. In particular, it will affect devices on hosts connected to via [`tf.config.experimental_connect_to_host`](https://www.tensorflow.org/api_docs/python/tf/config/experimental_connect_to_host) or [`tf.config.experimental_connect_to_cluster`](https://www.tensorflow.org/api_docs/python/tf/config/experimental_connect_to_cluster). The initial, unexposed version of the function in TensorFlow 2.3 may only support local devices, not remote devices, if we do not have time to implement remote device support.
Any specific additional efforts needed here to support remote devices?
I haven't worked this out yet, which is why I state this might not be done for TensorFlow 2.3. This will likely be done by adding a field to the CreateContextRequest proto to indicate whether TF32 is used.
So it will not be part of the cluster_device_attributes, but a new field? (Also see my comment below on updating remote context)
Yes, we will add a new field. See my reply to your other comment.
> In TensorFlow, TF32 can be enabled for supported ops on Ampere GPUs with the following call:
>
> ```python
> tf.config.allow_tensor_float_32_execution(True)
> ```
Why not use the Keras mixed precision policy API?
This affects ops outside Keras, so it shouldn't be under tf.keras. In a sense, TF32 is a form of mixed precision, as some ops use TF32 and others use float32. We could put it under tf.mixed_precision, but I think tf.config is better since tf32 should be thought of as a mode, not a dtype.
Also, the mixed precision API mostly changes the dtype of tensors, while tf32 doesn't affect tensor dtype (afaict), just the dtype of accumulators inside ops.
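To illustrate the distinction drawn above, here is a minimal sketch. It assumes the TF 2.4-era Keras mixed precision API and uses the `allow_tensor_float_32_execution` name proposed in this RFC: mixed precision changes the dtypes that layers compute in, while TF32 is a global mode that leaves all tensor dtypes as float32.

```python
import tensorflow as tf

# Keras mixed precision changes the dtype that layers compute in:
tf.keras.mixed_precision.set_global_policy("mixed_float16")
dense = tf.keras.layers.Dense(8)
print(dense.compute_dtype)  # float16 -- activations are cast to float16
print(dense.dtype)          # float32 -- variables stay float32
tf.keras.mixed_precision.set_global_policy("float32")  # restore the default

# TF32, by contrast, is an execution mode. Tensors, variables, and op output
# dtypes all remain float32; only the internal math of supported ops
# (matmuls, convolutions) on Ampere GPUs changes:
tf.config.allow_tensor_float_32_execution(True)  # API proposed in this RFC
x = tf.random.normal([4, 4])
print(tf.matmul(x, x).dtype)  # float32, even when TF32 is used internally
```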
> Another advantage of turning on TF32 by default is that it makes TensorFlow’s behavior with GPUs more consistent with TPUs. TPUs internally use lower precision for float32 matmuls and convolutions, similar to how Ampere GPUs will use lower precision for float32 matmuls and convolutions if TF32 is enabled.
>
> **If you know of any models whose accuracy may be impacted by TF32, please comment on this RFC.** Note that TF32 is equivalent to float32 except it has 10 bits of mantissa instead of 23 bits. It will initially be used only for matmuls and convolutions, but may be used for other ops in the future if they are implemented in terms of a matmul. Once TensorFlow 2.4 is released, you will be able to test the impact of TF32 on your models if you have Ampere GPUs. You will be able to test earlier if you use Tensorflow nightly packages, and even earlier if you build from source with CUDA 11 support.
You might want to indicate a way to receive private feedback about this too.
Do you think it's likely someone would be willing to share with us but not publicly? I could recommend emailing me for private feedback, but I would rather people post feedback publicly since I want to be transparent about why we make whatever decision we make.
+1 for the transparency.
We don't have to recommend it, but not everyone may be at liberty to talk about what they're working on in a public forum. So mentioning a private channel seems like a good idea.
Ok, I'll mention this but state that we much prefer feedback be posted publicly, even if that requires being vague about the use case. We should list at least two emails in case one of us is sick. @sanjoy, should I list my email and yours?
rfcs/20200520-tensor-float-32.md (outdated)

> The word "allow" emphasizes only certain devices (Ampere GPUs) and ops (such as matmuls and convolutions) will be affected. Once enabled, all local and remote Ampere GPUs use TF32 for supported float32 ops.
>
> Passing `False` to `allow_tensor_float_32_execution` will disable TF32 if already enabled.
What are the use-cases for toggling between these? Are there potential issues with moving between the two in a single program?
Hmmm, there isn't a strong use case. I added the sentences:

> This is useful if multiple models are run sequentially in the same process, where only some should use TF32. It is also useful for tests, as it allows a test class to test both TF32 being enabled and disabled.

Admittedly, this is a fairly weak use case, but I think it's still probably worth having. If others disagree, I'd be happy to remove this.

Also of note, implementing this will require an RPC to enable/disable TF32 even after the eager context has been created, in order to support remote devices.
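A minimal sketch of the sequential-models / testing use case described above, using the API name proposed in this RFC (`run_model` is just a hypothetical stand-in for running a real model):

```python
import tensorflow as tf

def run_model(inputs, use_tf32):
    """Hypothetical helper: runs one 'model' with TF32 either allowed or not."""
    tf.config.allow_tensor_float_32_execution(use_tf32)
    try:
        return tf.matmul(inputs, inputs)  # stand-in for a real model
    finally:
        # Restore full float32 precision before the next model/test runs.
        tf.config.allow_tensor_float_32_execution(False)

x = tf.random.normal([1024, 1024])
fast = run_model(x, use_tf32=True)    # uses TF32 on supported (Ampere) GPUs
exact = run_model(x, use_tf32=False)  # full float32 precision everywhere
print(tf.reduce_max(tf.abs(fast - exact)).numpy())
```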
Does the UpdateContext RPC support this particular use case? I imagine one could use the cluster_device_attributes field to pass on the TF32 mode toggle, but it looks like it will update the remote session when cluster_device_attributes is not empty in the current codebase. Not sure if setting TF32 is worth carrying out such a heavy(?) operation.
Right now UpdateContext is only used to add/remove machines/devices, but we can add a field to allow it to also update whether TF32 is used. We could alternatively add an option to QueueItem in case there are ordering concerns between enabling/disabling TF32 and executing an op.

I asked internally, and cluster_device_attributes is only useful for propagating device information to other machines in the cluster. It is not intended for communicating information about how ops should run, only fundamental information about the devices themselves.

See this post for some more context.

> Not sure if setting TF32 is worth carrying out such a heavy(?) operation.

Yeah, this will be a heavy operation. But users should only set/unset TF32 very rarely: only at the beginning of the model and between running one model and the next (or one test and the next). So I think it's OK.

I updated the RFC based on this discussion. I added a paragraph to the "Remote Devices" section and a new paragraph to "Alternatives considered".
> Another advantage of turning on TF32 by default is that it makes TensorFlow’s behavior with GPUs more consistent with TPUs. TPUs internally use lower precision for float32 matmuls and convolutions, similar to how Ampere GPUs will use lower precision for float32 matmuls and convolutions if TF32 is enabled.
>
> **If you know of any models whose accuracy may be impacted by TF32, please comment on this RFC.** Note that TF32 is equivalent to float32 except it has 10 bits of mantissa instead of 23 bits. It will initially be used only for matmuls and convolutions, but may be used for other ops in the future if they are implemented in terms of a matmul. Once TensorFlow 2.4 is released, you will be able to test the impact of TF32 on your models if you have Ampere GPUs. You will be able to test earlier if you use Tensorflow nightly packages, and even earlier if you build from source with CUDA 11 support.
Are there users that NVIDIA can help us find directly?
Nvidia will probably collect feedback directly and tell us. They have already tested many models themselves.
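As an aside on the quoted claim that TF32 keeps 10 of float32's 23 mantissa bits: the precision loss can be roughly emulated on any machine by masking the low mantissa bits of a float32 value. This is only a sketch; it truncates rather than using the hardware's actual rounding behavior.

```python
import numpy as np

def emulate_tf32(x):
    """Keeps only the top 10 of float32's 23 mantissa bits (by truncation)."""
    x = np.atleast_1d(np.asarray(x, dtype=np.float32))
    bits = x.view(np.uint32)
    # Zero out the 13 low mantissa bits, leaving 10 (TF32's precision).
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

a = np.float32(1.0) + np.float32(2.0) ** -12  # representable in float32
print(a, emulate_tf32(a))  # 1.0002441 [1.] -- the 2**-12 term is lost
```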
> ```python
> tf.config.allow_tensor_float_32_execution(True)
> ```
>
> The word "allow" emphasizes only certain devices (Ampere GPUs) and ops (such as matmuls and convolutions) will be affected. Once enabled, all local and remote Ampere GPUs use TF32 for supported float32 ops.
Should an error be raised (or a warning) if allow=True and the device does not support TF32? One could imagine users being surprised that no complaint is raised when this mode is requested. I guess in that case the flag would be "use_tensor_float_32_execution" instead of allow... but maybe explicit is preferable here?
I considered this, and the original draft did warn. But I think we should encourage users to put the allow_tensor_float_32_execution(True) line at the top of their program unconditionally, and not warn in that case. Otherwise, to avoid the warning, model code would have to check whether the GPUs support TF32 and only allow TF32 if the GPU supports it.
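For reference, the conditional check that model code would otherwise need might look roughly like the sketch below. The `tf.config.experimental.get_device_details` call and the compute-capability 8.0 threshold for Ampere are assumptions on my part, not part of this RFC.

```python
import tensorflow as tf

def ampere_gpu_present():
    """Best-effort check for a GPU that supports TF32 (assumed: compute capability >= 8.0)."""
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        cc = details.get("compute_capability")
        if cc is not None and cc >= (8, 0):
            return True
    return False

# The burden a warning would impose: only allow TF32 after checking the hardware.
if ampere_gpu_present():
    tf.config.allow_tensor_float_32_execution(True)

# The recommended pattern instead: call allow_tensor_float_32_execution(True)
# unconditionally at program start; it has no effect on devices without TF32 support.
```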
Design review notes: This has been accepted.
This RFC will be open for comment until Wednesday, June 3rd, 2020.
Objective
Allow TensorFloat-32 to be used in TensorFlow to improve performance.