-
Notifications
You must be signed in to change notification settings - Fork 74.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows heap corruption caused by upgrading protobuf #53234
Labels
Comments
Hi @sanatmpa1 ! Could you please look at this Issue? |
copybara-service bot
pushed a commit
that referenced
this issue
Feb 10, 2022
… Python eager context to enable access to the functions defined via tf.function. There are some (hopefully temporary) shortcuts: * protobufs crossing pybind boundaries are serialized, until the Windows builds are fixed, see #53234 * declarations dependent on MLIR types will use opaque pointers to hide the type information, until we've created a header-only library for them, or until we get rid of header-only libraries * when we add MLIR interfaces to pybind, MLIR objects will be serialized similarly with protos, until we've added pybind conversions PiperOrigin-RevId: 427737336 Change-Id: Ie14b6d85211fd19fee846f0fd5e34f5e48a2a374
This was referenced May 17, 2022
Closed
copybara-service bot
pushed a commit
that referenced
this issue
May 20, 2022
See also: #53234 See also: protocolbuffers/protobuf#9954 See also: #56077 PiperOrigin-RevId: 450054200
tensorflow-jenkins
pushed a commit
that referenced
this issue
May 20, 2022
See also: #53234 See also: protocolbuffers/protobuf#9954 See also: #56077 PiperOrigin-RevId: 450054200
This was referenced May 20, 2022
mihaimaruseac
added a commit
that referenced
this issue
May 20, 2022
See also: #53234 See also: protocolbuffers/protobuf#9954 See also: #56077 PiperOrigin-RevId: 450054200
pranve
pushed a commit
to pranve/tensorflow
that referenced
this issue
May 21, 2022
See also: tensorflow#53234 See also: protocolbuffers/protobuf#9954 See also: tensorflow#56077 PiperOrigin-RevId: 450054200
pranve
pushed a commit
to pranve/tensorflow
that referenced
this issue
May 21, 2022
See also: tensorflow#53234 See also: protocolbuffers/protobuf#9954 See also: tensorflow#56077 PiperOrigin-RevId: 450054200
pranve
pushed a commit
to pranve/tensorflow
that referenced
this issue
May 21, 2022
See also: tensorflow#53234 See also: protocolbuffers/protobuf#9954 See also: tensorflow#56077 PiperOrigin-RevId: 450054200
georgepaw
pushed a commit
to graphcore/tensorflow
that referenced
this issue
Jul 25, 2022
Summary: See also: tensorflow/tensorflow#53234 See also: protocolbuffers/protobuf#9954 See also: tensorflow/tensorflow#56077 TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Maniphest Tasks: T63128 Differential Revision: https://phabricator.sourcevertex.net/D67981
chxin66
pushed a commit
to chxin66/tensorflow
that referenced
this issue
Sep 5, 2022
See also: tensorflow#53234 See also: protocolbuffers/protobuf#9954 See also: tensorflow#56077 PiperOrigin-RevId: 450054200
1 task
Adding @vam-google who's looking at protobuf upgrade. |
This was referenced Dec 6, 2022
Closing due to 84f4092 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Windows
TensorFlow version:
At Upgrade grpc to 1.41.1 and protobuf to 3.19.0 #52853
Bazel version (if compiling from source):
3.7.2
GCC/Compiler version (if compiling from source):
MSVC
Describe the problem
This is a summary of the Windows heap corruption bug discovered when upgrading protobuf in #52853 (comment).
In the PR, we tried to upgrade protobuf from 3.9.2 to 3.19.0, but some tf optimizer related tests are failing on Windows with
The error code indicates a heap corruption. After some investigation, we discovered the problem is caused by:
On Windows, most of TF C++ code is linked into
_pywrap_tensorflow_internal.pyd
and some extension code is linked into individual pyd files, like_pywrap_tf_optimizer.pyd
. The later is also linked to the former. However, the protobuf library is statically linked into both dynamic libraries.Basically, protobuf deletes some global string in a proto desctructor after comparing the address of some global default string. When there are two protobuf runtimes, there are two default strings with different addresses, which caused some memory to be accidentally deleted when mixed together.
@acozzette also explained why protobuf doesn't work well when linked in multiple places at #52853 (comment)
Provide the exact sequence of commands / steps that you executed before running into the problem
To reproduce the original error in a full build:
However, the full TF build isn't debuggable, I have constructed some smaller targets to imitate the situation.
To build the minimal reproduce case in a debug build:
When debugging in Visual Studio, you should be able to see the following:
Possible Solution
To properly solve this problem, we probably need to migrate TF to cc_shared_library, which makes dynamic linking more controllable.
The text was updated successfully, but these errors were encountered: