
Inconsistent results from tf.raw_ops.LRNGrad between CPU and GPU #56849

Open
enderdzz opened this issue Jul 21, 2022 · 6 comments
Assignees: reedwm
Labels: comp:gpu (GPU related issues), comp:ops (OPs related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.9 (Issues found in the TF 2.9 release or RCs), type:bug (Bug)

Comments

enderdzz commented Jul 21, 2022


Issue Type

Bug

Source

source

Tensorflow Version

2.10.0

Custom Code

Yes

OS Platform and Distribution

Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

3.8.10

Bazel version

5.1.1

GCC/Compiler version

9.4.0

CUDA/cuDNN version

11.2

GPU model and memory

2× NVIDIA GeForce RTX 3090, 24 GB each

Current Behaviour?

The results of the `LRNGrad` operator are inconsistent between CPU and GPU.

I tried both `tf.raw_ops.LRNGrad` and `nn.lrn_grad`, and I also swapped the order of the device calls; the results are still inconsistent.

Standalone code to reproduce the issue

import tensorflow as tf
from tensorflow.python.ops import nn
from tensorflow.python.ops import random_ops

# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/lrn_op.cc

input_grads = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
input_img   = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
output_img  = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)


with tf.device('/GPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

with tf.device('/CPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)
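
For a check that does not rely on eyeballing the printed tensors, the two results can also be compared numerically. This is a small sketch layered on top of the tensors defined above; the names `gpu_out` and `cpu_out` are illustrative:

import numpy as np

with tf.device('/GPU:0'):
    gpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

with tf.device('/CPU:0'):
    cpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

# Raises an AssertionError when the CPU and GPU gradients diverge.
np.testing.assert_allclose(gpu_out.numpy(), cpu_out.numpy(), rtol=1e-5, atol=1e-5)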

Relevant log output

# python LRNgrad-test.py
2022-07-21 12:54:57.023583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:54:57.132843: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 12:54:57.161994: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-07-21 12:55:00.101305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:55:01.526010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22298 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6
2022-07-21 12:55:01.527383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22298 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:c8:00.0, compute capability: 8.6
2022-07-21 12:55:02.763201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100

tf.Tensor([[[[-0.29212222  0.97755533 -0.28474247]]]], shape=(1, 1, 1, 3), dtype=float32)

tf.Tensor([[[[2362.0498 1360.1172 2242.2402]]]], shape=(1, 1, 1, 3), dtype=float32)
@tilakrayal (Contributor)

@gadagashwini,
I was able to reproduce the issue on TensorFlow v2.8, v2.9, and nightly. Kindly find the gist of it here.

@sachinprasadhs (Contributor)

@enderdzz, I tried to reproduce your issue with TensorFlow 2.9.1 and got a different error. Could you please take a look at the gist here and make the necessary changes? Thanks!

@enderdzz (Author)

Hi, thank you for your reply :)

I installed version 2.9.1 locally (CUDA 11.2) and ran the same code without any problems; it still produced inconsistent results. I suspect that the GPU environment of the remote Colab is AMD ROCm, which would explain the error you encountered; see ROCm/ROCm#684.

So it would be better to test this code in a CUDA environment.
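
As an aside, a quick way to confirm whether a given TensorFlow build targets CUDA or ROCm (and which CUDA/cuDNN versions it was built against) is to inspect its build info. A minimal sketch, assuming a TF 2.x install where `tf.sysconfig.get_build_info()` is available:

import tensorflow as tf

# Dictionary with keys such as 'is_cuda_build', 'is_rocm_build',
# 'cuda_version' and 'cudnn_version'.
build_info = tf.sysconfig.get_build_info()
print("CUDA build  :", build_info.get("is_cuda_build"))
print("ROCm build  :", build_info.get("is_rocm_build"))
print("CUDA version:", build_info.get("cuda_version"))
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))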

@sachinprasadhs (Contributor)

The Colab GPU uses NVIDIA and CUDA.
Below is the output you will get when you run `!nvidia-smi`:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    29W /  70W |    464MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

@enderdzz (Author) commented Aug 3, 2022

OK, I have no idea about this problem now.
I hope to get help from the relevant TF developers.

@reedwm (Member) commented Aug 3, 2022

The CPU and GPU gradients are the same if I change output_img to be

output_img = tf.nn.local_response_normalization(input_img)

I think it's reasonable for the CPU and GPU gradients to return different results if you pass invalid values for output_image. output_image must be the correct forward-pass output for the given input_image; if you pass an invalid value for output_image, the op has no well-defined semantics.

@rohan100jain, do you agree that it's OK for the gradient op to return different results on the CPU and GPU if given an invalid output from the forward pass?
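
For reference, a minimal sketch of the repro with this change applied (same random tensors as in the original script; the default parameters of `tf.nn.local_response_normalization` and `tf.raw_ops.LRNGrad` are assumed to match):

import tensorflow as tf
from tensorflow.python.ops import random_ops

input_grads = random_ops.random_uniform(
        shape=[1, 1, 1, 3], minval=-10000, maxval=10000,
        dtype=tf.float32, seed=2022)
input_img = random_ops.random_uniform(
        shape=[1, 1, 1, 3], minval=-10000, maxval=10000,
        dtype=tf.float32, seed=2022)

# output_image is now the actual forward-pass output for input_image,
# rather than an unrelated random tensor.
output_img = tf.nn.local_response_normalization(input_img)

with tf.device('/GPU:0'):
    gpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

with tf.device('/CPU:0'):
    cpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

# With a valid output_image the two devices are reported to produce the
# same gradients.
print(gpu_out)
print(cpu_out)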
