
No optimized fast float implementation for Logistic #64981

Closed
dieterd-sentea opened this issue Apr 3, 2024 · 9 comments
Labels
comp:lite (TF Lite related issues), comp:ops (OPs related issues), TF 2.15 (For issues related to 2.15.x), type:bug (Bug)

Comments

dieterd-sentea commented Apr 3, 2024

Issue type: Bug
Have you reproduced the bug with TensorFlow Nightly? Yes
Source: source
TensorFlow version: tf 2.15
Custom code: No
OS platform and distribution: No response
Mobile device: No response
Python version: No response
Bazel version: No response
GCC/compiler version: No response
CUDA/cuDNN version: No response
GPU model and memory: No response

Current behavior?

Commit https://gitlab.com/libeigen/eigen/-/commit/a30ecb7221a46824b85cad5f9016efe6e5871d69 of the third-party Eigen library disabled the fast float implementation of the Logistic (or sigmoid, if you prefer) operator for most (all?) TensorFlow Lite backends.

The fallout is more severe than just the speed impact: the generic fallback implementation relies on the overflow behavior of exp(x) for large x, which triggers a bug on our embedded platform and causes glitches where logistic(x) in some cases becomes 0 instead of 1 for large x (around 88.9).
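To make the failure mode concrete, here is a minimal C++ illustration (my own sketch, not the actual Eigen fallback code) of why values around x = 88.9 are problematic in float32: exp(x) overflows once x exceeds log(FLT_MAX) ≈ 88.72, so any formulation that evaluates exp(x) directly depends on the platform's overflow behavior.

// Illustration only -- not the Eigen code. A logistic that evaluates
// exp(x) directly depends on exp() overflow behavior for large x,
// while the exp(-x) formulation does not.
#include <cmath>
#include <cstdio>
#include <limits>

// Relies on exp(x) overflowing "gracefully": with IEEE-754 semantics
// exp(88.9f) is +inf and the result becomes inf/inf = NaN; an exp()
// that returns 0 or garbage on overflow can turn the result into 0.
float logistic_via_exp_x(float x) {
  float e = std::exp(x);
  return e / (1.0f + e);
}

// Safer formulation: exp(-x) underflows to 0 for large x, giving 1.
float logistic_via_exp_neg_x(float x) {
  return 1.0f / (1.0f + std::exp(-x));
}

int main() {
  float x = 88.9f;
  std::printf("log(FLT_MAX)  = %f\n", std::log(std::numeric_limits<float>::max()));
  std::printf("exp(x) based  = %f\n", logistic_via_exp_x(x));
  std::printf("exp(-x) based = %f\n", logistic_via_exp_neg_x(x));
  return 0;
}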

I would propose bumping the Eigen dependency once this is fixed upstream (see also https://gitlab.com/libeigen/eigen/-/merge_requests/1576), as that effectively fixes the issue.

Standalone code to reproduce the issue: N/A
Relevant log output: No response

sushreebarsa (Contributor) commented:

@dieterd-sentea Could you please update your build process to use the new Eigen version and let us know whether that resolves the issue?
Thank you!

sushreebarsa added the type:build/install, TF 2.15, type:bug, comp:ops, and stat:awaiting response labels and removed the type:bug and type:build/install labels on Apr 5, 2024
dieterd-sentea commented Apr 5, 2024

Hi @sushreebarsa,

I assume that TFLite built against the newest Eigen, now that https://gitlab.com/libeigen/eigen/-/merge_requests/1576 has been merged, does not exhibit the issue.

The (arguably confusing) workaround that we use at the moment, until the Eigen dependency in this repository gets bumped, is the following patch:

diff --git a/tensorflow/lite/tools/cmake/modules/eigen.cmake b/tensorflow/lite/tools/cmake/modules/eigen.cmake
index 93a63d280d8..f6b14b562b7 100644
--- a/tensorflow/lite/tools/cmake/modules/eigen.cmake
+++ b/tensorflow/lite/tools/cmake/modules/eigen.cmake
@@ -99,4 +99,5 @@ set(EIGEN_TEST_SYCL OFF CACHE BOOL "Disable Sycl test")
 set(EIGEN_SYCL_TRISYCL OFF CACHE BOOL "Disable triSYCL test")
 # Make sure only MPL2.0 or more permissively licensed code is included.
 add_compile_definitions(EIGEN_MPL2_ONLY)
+add_compile_definitions(EIGEN_GPU_CC)
 add_subdirectory("${eigen_SOURCE_DIR}" "${eigen_BINARY_DIR}")

Of course, this abuses the fact that EIGEN_GPU_CC is only used to enable the fast float logistic implementation again in the buggy Eigen versions (and is used nowhere else). In the newest Eigen version, this workaround is no longer needed and is in fact undesirable (if, for example, Eigen were to refactor EIGEN_CPUCC to EIGEN_CPU_CC at some point).
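To make the mechanism explicit: on the affected Eigen revisions the fast float logistic path is compiled in only when that macro is defined, roughly along these lines (a simplified sketch of the kind of guard involved, not the exact Eigen source):

// Simplified sketch, not the exact Eigen source: force-defining
// EIGEN_GPU_CC at build time re-enables the fast branch.
#if defined(EIGEN_GPU_CC)
  // fast, vectorized float logistic implementation
#else
  // generic fallback via exp(), relying on its overflow behavior
#endif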

google-ml-butler bot removed the stat:awaiting response label on Apr 5, 2024
dieterd-sentea commented Apr 5, 2024

Correction to my previous message: I have not explicitly tested whether the latest Eigen master (which might contain other unrelated changes) works for us. I'll find out and let you know.

dieterd-sentea commented Apr 5, 2024

I can confirm that we experience no issues on tf-nightly (a700cea) with the following patch, which updates Eigen to the latest master (https://gitlab.com/libeigen/eigen/-/commit/b2c9ba2beef4b5fd61513d73911c678e93c8dd9d):

diff --git a/tensorflow/lite/tools/cmake/modules/eigen.cmake b/tensorflow/lite/tools/cmake/modules/eigen.cmake
index 1bb203388a0..f6cc23e578f 100644
--- a/tensorflow/lite/tools/cmake/modules/eigen.cmake
+++ b/tensorflow/lite/tools/cmake/modules/eigen.cmake
@@ -23,7 +23,7 @@ OverridableFetchContent_Declare(
   eigen
   GIT_REPOSITORY https://gitlab.com/libeigen/eigen.git
   # Sync with tensorflow/third_party/eigen3/workspace.bzl
-  GIT_TAG aa6964bf3a34fd607837dd8123bc42465185c4f8
+  GIT_TAG b2c9ba2beef4b5fd61513d73911c678e93c8dd9d
   # It's not currently (cmake 3.17) possible to shallow clone with a GIT TAG
   # as cmake attempts to git checkout the commit hash after the clone
   # which doesn't work as it's a shallow clone hence a different commit hash.
diff --git a/third_party/eigen3/workspace.bzl b/third_party/eigen3/workspace.bzl
index 027454e46dd..7065b289e04 100644
--- a/third_party/eigen3/workspace.bzl
+++ b/third_party/eigen3/workspace.bzl
@@ -7,8 +7,8 @@ def repo():
 
     # Attention: tools parse and update these lines.
     # LINT.IfChange
-    EIGEN_COMMIT = "aa6964bf3a34fd607837dd8123bc42465185c4f8"
-    EIGEN_SHA256 = "35ba771e30c735a4215ed784d7e032086cf89fe6622dce4d793c45dd74373362"
+    EIGEN_COMMIT = "b2c9ba2beef4b5fd61513d73911c678e93c8dd9d"
+    EIGEN_SHA256 = "9e0dbce1464f8e05f9e4ea66c0a92850c4aade71c330130f8960f3ab3592d5c4"
     # LINT.ThenChange(//tensorflow/lite/tools/cmake/modules/eigen.cmake)
 
     tf_http_archive(
diff --git a/third_party/xla/third_party/tsl/third_party/eigen3/workspace.bzl b/third_party/xla/third_party/tsl/third_party/eigen3/workspace.bzl
index 027454e46dd..7065b289e04 100644
--- a/third_party/xla/third_party/tsl/third_party/eigen3/workspace.bzl
+++ b/third_party/xla/third_party/tsl/third_party/eigen3/workspace.bzl
@@ -7,8 +7,8 @@ def repo():
 
     # Attention: tools parse and update these lines.
     # LINT.IfChange
-    EIGEN_COMMIT = "aa6964bf3a34fd607837dd8123bc42465185c4f8"
-    EIGEN_SHA256 = "35ba771e30c735a4215ed784d7e032086cf89fe6622dce4d793c45dd74373362"
+    EIGEN_COMMIT = "b2c9ba2beef4b5fd61513d73911c678e93c8dd9d"
+    EIGEN_SHA256 = "9e0dbce1464f8e05f9e4ea66c0a92850c4aade71c330130f8960f3ab3592d5c4"
     # LINT.ThenChange(//tensorflow/lite/tools/cmake/modules/eigen.cmake)
 
     tf_http_archive(

Funnily enough, the overflow issue that we observed does not appear on tf-nightly (but does occur on v2.14.0 and v2.15.0); that is orthogonal to this issue, though, which is about the fast float implementation for Logistic.

dieterd-sentea commented Apr 8, 2024

So I've investigated this a bit further and we observe a nice speedup with the above patch (up to 6% for a certain model that contains a sigmoid layer).

Note, by the way, that we do not use the XNNPACK delegate, i.e., we configure the project with cmake -DTFLITE_ENABLE_XNNPACK=OFF. This is important, because otherwise float32 Logistic goes through XNNPACK instead of Eigen, as can be seen from this guard in the kernel source:

#ifdef TFLITE_KERNEL_USE_XNNPACK
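For context, here is a schematic sketch of the dispatch that this guard performs. The helper names are hypothetical stand-ins for illustration only; they are not the real TFLite kernel entry points.

// Schematic sketch only; EvalLogisticXnnpack / EvalLogisticEigen are
// hypothetical names, not the actual TFLite kernel functions.
void EvalLogisticXnnpack(const float* input, float* output, int size);  // XNNPACK-backed sigmoid
void EvalLogisticEigen(const float* input, float* output, int size);    // Eigen-backed reference path

void EvalLogisticFloat(const float* input, float* output, int size) {
#ifdef TFLITE_KERNEL_USE_XNNPACK
  // Default cmake build (XNNPACK enabled): float32 Logistic runs through
  // XNNPACK, so the Eigen path -- and this regression -- is never hit.
  EvalLogisticXnnpack(input, output, size);
#else
  // Built with -DTFLITE_ENABLE_XNNPACK=OFF (our configuration): float32
  // Logistic falls through to the Eigen-backed path affected by this issue.
  EvalLogisticEigen(input, output, size);
#endif
}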

pkgoogle commented:

Hi @dieterd-sentea, can we say that this is resolved now on the master branch at least? Let me know if you disagree or if there is any other action needed from our side. Thanks.

pkgoogle added the stat:awaiting response label and removed the WIP label on May 13, 2024
dieterd-sentea (Author) commented:

@pkgoogle LGTM

google-ml-butler bot removed the stat:awaiting response label on May 14, 2024
dieterd-sentea commented May 14, 2024

So indeed 476aaac resolves this. Thanks!
