Conv2D computes wrongly in Windows OS #64396

Open
Shuo-Sun20 opened this issue Mar 25, 2024 · 7 comments
Assignees
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower subtype:cpu-intel To track windows cpu issues subtype:windows Windows Build/Installation Issues TF 2.16 type:bug Bug

Comments

@Shuo-Sun20

Shuo-Sun20 commented Mar 25, 2024

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

tf 2.16

Custom code

Yes

OS platform and distribution

Windows 10

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

On Windows, Conv2D produces a wrong output in some cases, while it behaves correctly in others.
This error does not occur on Linux, even with the same code.

An example of incorrect execution:
[Image]

As you can see, the result l(x) has the wrong shape.

I noticed that an existing issue, #63860, reports a similar error in Conv3D. I suspect Conv2D and Conv3D share the problem, since both inherit from the same parent class, BaseConv.

Standalone code to reproduce the issue

# This test case works correctly on Linux but produces a wrong result on Windows.

from keras.layers import Conv2D
import numpy as np

x=np.random.rand(1,2,2,1)
l=Conv2D(1,3,(1,1),'valid','channels_last', [1,1],1, 'linear', True)
print(l(x).shape)
print(l.compute_output_shape(x.shape))

Relevant log output

TensorShape([1, 2, 2, 1])
(1, 0, 0, 1)
@google-ml-butler google-ml-butler bot added the type:bug Bug label Mar 25, 2024
@Venkat6871 Venkat6871 added TF 2.16 subtype:windows Windows Build/Installation Issues labels Mar 26, 2024
@Venkat6871

Hi @Shuo-Sun20 ,
I ran your code on Colab using TF v2.16.1 and did not encounter any issue. Please find the gist here for reference.

Thank you!

@Venkat6871 Venkat6871 added the stat:awaiting response Status - Awaiting response from author label Mar 26, 2024
@Shuo-Sun20
Author

This issue only occurs on Windows, so it will not show up on Colab (Linux).
Please try it on Windows.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 26, 2024
@SuryanarayanaY SuryanarayanaY added the subtype:cpu-intel To track windows cpu issues label Mar 27, 2024
@NeoZhangJianyu

@Shuo-Sun20

Using the following code, I got the same result on both Linux and Windows:

from keras.layers import Conv2D
import numpy as np

x=np.random.rand(1,2,2,1)
l=Conv2D(1,3,(1,1),'valid','channels_last', [1,1],1, 'linear', True)
print(l(x).shape)
print(l.compute_output_shape(x.shape))

(1, 2, 2, 1)
(1, 0, 0, 1)

tensorflow 2.16.1
keras 3.1.1

  1. The kernel size is 3, but the input size is smaller than 3. That is an unusual case.
    Could you confirm that these input parameters are correct?
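
To make the mismatch concrete, here is a minimal sketch of the textbook output-length formula for a 'valid' (unpadded) convolution along one axis. The function name `conv_output_length` is hypothetical, and this is not necessarily how Keras computes the shape internally; it only shows why a 3-wide kernel on a 2-wide input is expected to give a spatial size of 0, matching the `(1, 0, 0, 1)` printed by `compute_output_shape`.

```python
def conv_output_length(input_length, kernel_size, stride=1, dilation=1):
    """Textbook output length of a 'valid' convolution along one axis.

    Hypothetical helper for illustration; not a Keras API.
    """
    # A dilated kernel spans (kernel_size - 1) * dilation + 1 input positions.
    effective_kernel = (kernel_size - 1) * dilation + 1
    return (input_length - effective_kernel) // stride + 1


# The case from this issue: input size 2, kernel size 3, stride 1.
print(conv_output_length(2, 3))  # 0
# A more typical case: input size 5, kernel size 3, stride 1.
print(conv_output_length(5, 3))  # 3
```

Under this formula, an output of shape `(1, 2, 2, 1)` cannot come from a 'valid' convolution of these sizes, so the `(1, 0, 0, 1)` result is the consistent one.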

@Shuo-Sun20
Author

  1. You are right: kernel_size > input_size should be an invalid parameter combination, yet Conv2D currently produces a result without any warning. Maybe a check should be added here?
  2. Conv2D behaves differently on Linux and Windows (I could not reproduce the issue on Colab). This inconsistency may need deeper inspection.
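
A check of the kind suggested in point 1 could look roughly like the sketch below. It is a user-side guard, not an existing Keras feature: the function name `check_valid_conv_input` and its parameters are hypothetical, and it assumes a `channels_last` input shape with 'valid' padding.

```python
def check_valid_conv_input(input_shape, kernel_size, dilation_rate=(1, 1)):
    """Raise ValueError if a 'valid' convolution would have an empty output.

    Hypothetical helper; assumes channels_last: (batch, height, width, channels).
    """
    spatial_shape = input_shape[1:-1]
    for axis, (size, k, d) in enumerate(zip(spatial_shape, kernel_size, dilation_rate)):
        # A dilated kernel spans (k - 1) * d + 1 input positions.
        effective_kernel = (k - 1) * d + 1
        # Unknown (None) dimensions cannot be checked ahead of time.
        if size is not None and size < effective_kernel:
            raise ValueError(
                f"Spatial axis {axis} has size {size}, which is smaller than "
                f"the effective kernel size {effective_kernel}."
            )


# Passes silently: a 5x5 input is large enough for a 3x3 kernel.
check_valid_conv_input((1, 5, 5, 1), (3, 3))

# The shapes from this issue are rejected.
try:
    check_valid_conv_input((1, 2, 2, 1), (3, 3))
except ValueError as err:
    print(err)
```

Calling such a guard before building the layer would surface the invalid combination regardless of which backend code path runs the convolution.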

@SuryanarayanaY SuryanarayanaY added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 3, 2024
@NeoZhangJianyu

@Shuo-Sun20

  1. For the check on abnormal input, could you report it as a feature request in a separate issue?
  2. I ran the case and got the same result on both Windows and Linux.
    You said the results were different in your case.
    Could you share the full logs from both?
    Please also share the output of 'pip list' on Windows and Linux.

@Shuo-Sun20
Author

  1. I'll report it in another issue.
  2. I ran the following code on both Windows and Linux:
from keras.layers import Conv2D
import numpy as np

x=np.random.rand(1,2,2,1)
l=Conv2D(1,3,(1,1),'valid','channels_last', [1,1],1, 'linear', True)
print(l(x).shape)
print(l.compute_output_shape(x.shape))

On Windows the result is:

2024-04-07 16:35:31.739978: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-07 16:35:32.406322: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-07 16:35:33.580266: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(1, 2, 2, 1)
(1, 0, 0, 1)

The pip list on Windows:

absl-py==2.1.0
astunparse==1.6.3
certifi==2024.2.2
charset-normalizer==3.3.2
flatbuffers==24.3.25
gast==0.5.4
google-pasta==0.2.0
grpcio==1.62.1
h5py==3.10.0
idna==3.6
importlib_metadata==7.1.0
keras==3.1.1
libclang==18.1.1
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
ml-dtypes==0.3.2
namex==0.0.7
numpy==1.26.4
opt-einsum==3.3.0
optree==0.11.0
packaging==24.0
protobuf==4.25.3
Pygments==2.17.2
requests==2.31.0
rich==13.7.1
six==1.16.0
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow==2.16.1
tensorflow-intel==2.16.1
tensorflow-io-gcs-filesystem==0.31.0
termcolor==2.4.0
typing_extensions==4.11.0
urllib3==2.2.1
Werkzeug==3.0.2
wrapt==1.16.0
zipp==3.18.1

In this Linux Colab, the result is:

(1, 0, 0, 1)
(1, 0, 0, 1)

I failed to install tensorflow-intel 2.16.1 in Colab (Linux), so I installed tensorflow and keras with the regular pip command.
You can freely edit the code in the Colab and install packages.

The pip list is a little different, since Colab preinstalls many packages used in deep learning. The list is too long to show in this comment; you can find it through the shared link.

@NeoZhangJianyu

@Shuo-Sun20
On Windows, my result is the same as yours.
On Linux, when the oneDNN path in TF is enabled with TF_ENABLE_ONEDNN_OPTS=1, the result is the same as on Windows.
With TF_ENABLE_ONEDNN_OPTS=0, the result is the same as yours (different from Windows).

I think the cause is a difference between the oneDNN code path and the Eigen code path in TF.

However, the following code does not work in the Colab environment you provided.
I guess the TF build in Colab doesn't support the oneDNN code path.
Maybe you could try it on a local or another Linux machine.

import os
os.environ['TF_ENABLE_ONEDNN_OPTS']='1'

from keras.layers import Conv2D
import numpy as np

x=np.random.rand(1,2,2,1)
l=Conv2D(1,3,(1,1),'valid','channels_last', [1,1],1, 'linear', True)
print(l(x).shape)
print(l.compute_output_shape(x.shape))
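
To compare the two code paths side by side on one machine, one option is to run the same snippet in two fresh Python processes, one per value of TF_ENABLE_ONEDNN_OPTS. The sketch below is only an illustration: the helper name `run_with_onednn` and the `REPRO` string are hypothetical, and it assumes the variable should be in place before TensorFlow is imported, which a new process guarantees.

```python
import os
import subprocess
import sys

# The reproduction snippet from this thread, run in a child process.
REPRO = """
from keras.layers import Conv2D
import numpy as np

x = np.random.rand(1, 2, 2, 1)
l = Conv2D(1, 3, (1, 1), 'valid', 'channels_last', [1, 1], 1, 'linear', True)
print(tuple(l(x).shape))
"""


def run_with_onednn(flag, code=REPRO):
    """Run `code` in a fresh interpreter with TF_ENABLE_ONEDNN_OPTS set to `flag`.

    Hypothetical helper; returns whatever the child process printed to stdout.
    """
    env = dict(os.environ, TF_ENABLE_ONEDNN_OPTS=str(flag))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With TensorFlow and Keras installed, calling `run_with_onednn(0)` and `run_with_onednn(1)` and printing both return values should show whether the shape differs between the Eigen and oneDNN paths on that machine. TensorFlow's log messages go to stderr, so they do not end up in the returned string.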

4 participants