
tf.constant with tf.float16 results in incorrect outputs with Mac M1 chip #57010

Open
slyforce opened this issue Aug 4, 2022 · 7 comments
Assignees
Labels
comp:ops (OPs related issues), stat:awaiting response (Status - Awaiting response from author), subtype:macOS (macOS Build/Installation issues), TF 2.9 (Issues found in the TF 2.9 release (or RCs)), type:bug (Bug)

Comments

@slyforce

slyforce commented Aug 4, 2022


Issue Type

Bug

Source

binary

Tensorflow Version

2.9.2

Custom Code

No

OS Platform and Distribution

MacOS 12.3 arm64

Mobile device

No response

Python version

3.9.12

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

With the pip wheel tensorflow_macos==2.9.2 (tensorflow_macos-2.9.2-cp39-cp39-macosx_11_0_arm64.whl), tf.constant in graph mode, or equivalently inside tf.function()-decorated functions, produces incorrect tensor contents when the meta-optimizer is enabled. Disabling the meta-optimizer yields correct behavior, as does using tf.float32.

Examination of the output generated with TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_MAX_VLOG_LEVEL=3 shows that the binary content of the tensor is simply the value passed in, rather than the actual float16 bit pattern.

Issue #53260 appears to suffer from the same problem.
This also seems to be specific to the Mac M1 arm64 wheel: the behaviour is correct on Linux-based systems and wheels.
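As a side note (my own sketch, not confirmed from the TensorFlow sources): a correctly encoded float16 1.0 has the bit pattern 0x3C00. One plausible reading of the observed 0.0 result is that the low 16 bits of the float32 bit pattern of 1.0 (0x3F800000) get stored verbatim as the float16 payload, which decode to exactly 0.0. A minimal numpy illustration:

```python
import numpy as np

# A correctly encoded float16 1.0 has bit pattern 0x3C00.
assert np.float16(1.0).view(np.uint16) == 0x3C00

# float32 1.0 has bit pattern 0x3F800000.
bits32 = np.float32(1.0).view(np.uint32)
assert bits32 == 0x3F800000

# Hypothesis: if the low 16 bits of that float32 pattern were stored
# verbatim as a float16 payload, they would decode to 0.0 -- matching
# the "Fail! Contents of fetched tensor: 0.0" output above.
low16 = np.uint16(bits32 & 0xFFFF)
assert low16.view(np.float16) == np.float16(0.0)
```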

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf


def f(dtype, disable_meta_optimizer: bool):
  tf.config.optimizer.set_experimental_options(
    options={
      "disable_meta_optimizer": disable_meta_optimizer,
      # The graph will only consist of a single node; the default threshold is 4.
      "min_graph_nodes": 1,
    }
  )
  print(f"dtype={dtype} disable_meta_optimizer={disable_meta_optimizer}")
  with tf.Graph().as_default() as g, tf.compat.v1.Session(graph=g) as s:
    t = tf.constant(1.0, dtype=dtype)
    try:
      fetch = s.run(t)
      assert fetch.astype(np.float32) == np.full([], 1.0, np.float32)
    except AssertionError:
      print(f"Fail! Contents of fetched tensor: {fetch}")
    else:
      print("Success!")


for dtype in [tf.float32, tf.float16]:
  for disable_meta_optimizer in [True, False]:
    f(dtype, disable_meta_optimizer)

Relevant log output

dtype=<dtype: 'float32'> disable_meta_optimizer=True
Success!
dtype=<dtype: 'float32'> disable_meta_optimizer=False
Success!
dtype=<dtype: 'float16'> disable_meta_optimizer=True
Success!
dtype=<dtype: 'float16'> disable_meta_optimizer=False
Fail! Contents of fetched tensor: 0.0
@google-ml-butler google-ml-butler bot added the type:bug Bug label Aug 4, 2022
@slyforce slyforce changed the title from "tf.constant with tf.float16 results in incorrect outputs" to "tf.constant with tf.float16 results in incorrect outputs with Mac M1 chip" Aug 4, 2022
@mohantym mohantym added comp:ops OPs related issues TF 2.9 Issues found in the TF 2.9 release (or RCs) subtype:macOS macOS Build/Installation issues labels Aug 5, 2022
@mohantym mohantym assigned gadagashwini and unassigned mohantym Aug 5, 2022
@gadagashwini
Contributor

Hi @slyforce, I executed your code with TensorFlow 2.9.2 on a Mac.
It looks like it's working as expected.

dtype=<dtype: 'float32'> disable_meta_optimizer=True
Metal device set to: AMD Radeon Pro 555X

systemMemory: 32.00 GB
maxCacheSize: 2.00 GB

2022-08-08 14:18:07.615126: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-08 14:18:07.615370: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-08-08 14:18:08.291252: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
Success!
dtype=<dtype: 'float32'> disable_meta_optimizer=False
2022-08-08 14:18:08.298983: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-08 14:18:08.299012: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-08-08 14:18:08.300201: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
Success!
dtype=<dtype: 'float16'> disable_meta_optimizer=True
2022-08-08 14:18:08.303720: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-08 14:18:08.303745: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Success!
dtype=<dtype: 'float16'> disable_meta_optimizer=False
2022-08-08 14:18:08.307425: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-08 14:18:08.307450: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-08-08 14:18:08.308204: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
Success!

@gadagashwini gadagashwini added the stat:awaiting response Status - Awaiting response from author label Aug 8, 2022
@slyforce
Author

slyforce commented Aug 8, 2022

Thanks for looking into this!
My script output was generated on the Mac M1 Pro CPU, whereas you used the GPU device of an older Mac. Using the GPU via tensorflow-metal also results in the same incorrect behaviour with tf.float16.

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 8, 2022
@gadagashwini
Contributor

Hi @slyforce, I tested it on macOS Monterey Version 12.5 with the CPU version of TensorFlow. I followed the instructions mentioned here to install TensorFlow. Thank you!

@gadagashwini gadagashwini added the stat:awaiting response Status - Awaiting response from author label Aug 10, 2022
@maxhgerlach
Contributor

@gadagashwini, I see the same issue as OP, MacBook Pro 2021 with M1 Pro (aarch64), macOS 12.5.

TensorFlow is installed via the package tensorflow-macos==2.9.2, no tensorflow-metal needed.

The issue may very well be specific to arm64 (Apple Silicon). In that case you won't see it with an Intel CPU.

@slyforce
Author

slyforce commented Aug 11, 2022

What @maxhgerlach described reflects the current situation. (replying to remove the awaiting response tag)

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Aug 11, 2022
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 16, 2022
@sachinprasadhs
Contributor

Thanks for reporting the issue. I was able to reproduce the behavior with TensorFlow 2.9.2 on a macOS M1 machine.

@tilakrayal
Contributor

@slyforce,
I tried to execute the code with the latest TensorFlow version on macOS and observed that the output was as expected.
Kindly find the screenshot for the reference.

[screenshot of the output]

Thank you!

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Apr 26, 2024
@sachinprasadhs sachinprasadhs removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 26, 2024