
Inconsistent results from tf.raw_ops.LRNGrad between CPU and GPU #56849

Open
enderdzz opened this issue Jul 21, 2022 · 6 comments
Assignees: reedwm
Labels: comp:gpu (GPU related issues), comp:ops (OPs related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.9 (Issues found in the TF 2.9 release or RCs), type:bug (Bug)

Comments

enderdzz commented Jul 21, 2022


Issue Type

Bug

Source

source

Tensorflow Version

2.10.0

Custom Code

Yes

OS Platform and Distribution

Ubuntu 20.04.4 LTS

Mobile device

No response

Python version

3.8.10

Bazel version

5.1.1

GCC/Compiler version

9.4.0

CUDA/cuDNN version

11.2

GPU model and memory

2× NVIDIA GeForce RTX 3090, 24 GB each

Current Behaviour?

The results of the `LRNGrad` operator are inconsistent between CPU and GPU.

I tried both `tf.raw_ops.LRNGrad` and `nn.lrn_grad`, and I also swapped the order of the device calls; the results are still inconsistent.

Standalone code to reproduce the issue

import tensorflow as tf
from tensorflow.python.ops import nn
from tensorflow.python.ops import random_ops

# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/lrn_op.cc

input_grads = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
input_img   = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)
output_img  = random_ops.random_uniform(
        shape=[1, 1, 1, 3],
        minval=-10000,
        maxval=10000,
        dtype=tf.float32,
        seed=2022)


with tf.device('/GPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)

with tf.device('/CPU:0'):
    out = tf.raw_ops.LRNGrad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    #out = nn.lrn_grad(input_grads=input_grads, input_image=input_img, output_image=output_img)
    print(out)
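
For a check that does not rely on eyeballing the printed tensors, the two results can also be compared numerically. This is a small sketch layered on top of the tensors defined above; the names `gpu_out` and `cpu_out` are illustrative:

import numpy as np

with tf.device('/GPU:0'):
    gpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

with tf.device('/CPU:0'):
    cpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

# Raises an AssertionError when the CPU and GPU gradients diverge.
np.testing.assert_allclose(gpu_out.numpy(), cpu_out.numpy(), rtol=1e-5, atol=1e-5)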

Relevant log output

# python LRNgrad-test.py
2022-07-21 12:54:57.023583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:54:57.132843: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-07-21 12:54:57.161994: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-07-21 12:55:00.101305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-21 12:55:01.526010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22298 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:9b:00.0, compute capability: 8.6
2022-07-21 12:55:01.527383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22298 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:c8:00.0, compute capability: 8.6
2022-07-21 12:55:02.763201: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100

tf.Tensor([[[[-0.29212222  0.97755533 -0.28474247]]]], shape=(1, 1, 1, 3), dtype=float32)

tf.Tensor([[[[2362.0498 1360.1172 2242.2402]]]], shape=(1, 1, 1, 3), dtype=float32)
@tilakrayal (Contributor)

@gadagashwini,
I was able to reproduce the issue on TensorFlow v2.8, v2.9, and nightly. Kindly find the gist of it here.

@sachinprasadhs (Contributor)

@enderdzz, I tried to reproduce your issue with TensorFlow 2.9.1 and got a different error. Could you please take a look at the gist here and make the necessary changes? Thanks!

@enderdzz (Author)

Hi, thank you for your reply :)

I installed version 2.9.1 locally (CUDA 11.2) and ran the same code without any problems; it still produced inconsistent results. I suspect that the GPU environment of the remote Colab is AMD ROCm, which would explain the error you encountered; see ROCm/ROCm#684.

So it would be better to test this code in a CUDA environment.
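
As an aside, a quick way to confirm whether a given TensorFlow build targets CUDA or ROCm (and which CUDA/cuDNN versions it was built against) is to inspect its build info. A minimal sketch, assuming a TF 2.x install where `tf.sysconfig.get_build_info()` is available:

import tensorflow as tf

# Dictionary with keys such as 'is_cuda_build', 'is_rocm_build',
# 'cuda_version' and 'cudnn_version'.
build_info = tf.sysconfig.get_build_info()
print("CUDA build  :", build_info.get("is_cuda_build"))
print("ROCm build  :", build_info.get("is_rocm_build"))
print("CUDA version:", build_info.get("cuda_version"))
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))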

@sachinprasadhs (Contributor)

The Colab GPU uses NVIDIA and CUDA.
Below is the output you will get when you run `!nvidia-smi`:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    29W /  70W |    464MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

@enderdzz (Author) commented Aug 3, 2022

OK, I have no idea about this problem now.
I hope to get help from the relevant TF developers.

@reedwm (Member) commented Aug 3, 2022

The CPU and GPU gradients are the same if I change output_img to be

output_img = tf.nn.local_response_normalization(input_img)

I think it's reasonable for the CPU and GPU gradients to return different results if you pass invalid values for output_image. output_image must be the correct forward-pass output for the given input_image; if you pass an invalid value for output_image, the op has no well-defined semantics.

@rohan100jain, do you agree that it's OK for the gradient op to return different results on the CPU and GPU if given an invalid output from the forward pass?
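
For reference, a minimal sketch of the repro with this change applied (same random tensors as in the original script; the default parameters of `tf.nn.local_response_normalization` and `tf.raw_ops.LRNGrad` are assumed to match):

import tensorflow as tf
from tensorflow.python.ops import random_ops

input_grads = random_ops.random_uniform(
        shape=[1, 1, 1, 3], minval=-10000, maxval=10000,
        dtype=tf.float32, seed=2022)
input_img = random_ops.random_uniform(
        shape=[1, 1, 1, 3], minval=-10000, maxval=10000,
        dtype=tf.float32, seed=2022)

# output_image is now the actual forward-pass output for input_image,
# rather than an unrelated random tensor.
output_img = tf.nn.local_response_normalization(input_img)

with tf.device('/GPU:0'):
    gpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

with tf.device('/CPU:0'):
    cpu_out = tf.raw_ops.LRNGrad(input_grads=input_grads,
                                 input_image=input_img,
                                 output_image=output_img)

# With a valid output_image the two devices are reported to produce the
# same gradients.
print(gpu_out)
print(cpu_out)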
