_EagerConst: Dst tensor is not initialized #52086

Open
negf opened this issue Sep 22, 2021 · 9 comments
Assignees
Labels
2.6.0 comp:core issues related to core part of tensorflow comp:gpu GPU related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug

Comments

@negf
negf commented Sep 22, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 21.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.7.0-dev20210921 / 2.6
  • Python version: 3.9.5
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11.4
  • GPU model and memory: NVIDIA TITAN RTX 24220MiB

You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

Running TF 2.7 or 2.6 with eager execution gives the following error: "in order to run _EagerConst: Dst tensor is not initialized."

It works with tf.compat.v1.disable_eager_execution() for TF 2.7/2.6 and TF 2.5.

Describe the expected behavior

LearningRate of 1.000000e-04
Epoch 1/100
2021-09-22 12:01:03.300192: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8202
2021-09-22 12:01:04.275126: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
225/225 [==============================] - ETA: 0s - loss: 114.7353 - mae: 0.3077 - mse: 0.1493 - r2: 0.9997 - lr: 1.0000e-04
2021-09-22 12:04:35.644414: W tensorflow/core/common_runtime/bfc_allocator.cc:462] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.90GiB (rounded to 5259396608)requested by op EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2021-09-22 12:04:35.644512: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] BFCAllocator dump for GPU_0_bfc
2021-09-22 12:04:35.644553: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (256): Total Chunks: 82, Chunks in use: 73. 20.5KiB allocated for chunks. 18.2KiB in use in bin. 3.8KiB client-requested in use in bin.
2021-09-22 12:04:35.644571: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (512): Total Chunks: 2, Chunks in use: 0. 1.2KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644587: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1024): Total Chunks: 2, Chunks in use: 1. 2.8KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-09-22 12:04:35.644602: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644620: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4096): Total Chunks: 3, Chunks in use: 2. 16.2KiB allocated for chunks. 8.5KiB in use in bin. 8.0KiB client-requested in use in bin.
2021-09-22 12:04:35.644637: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8192): Total Chunks: 4, Chunks in use: 2. 47.0KiB allocated for chunks. 19.8KiB in use in bin. 16.0KiB client-requested in use in bin.
2021-09-22 12:04:35.644653: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16384): Total Chunks: 1, Chunks in use: 1. 28.2KiB allocated for chunks. 28.2KiB in use in bin. 28.1KiB client-requested in use in bin.
2021-09-22 12:04:35.644667: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644679: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644694: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (131072): Total Chunks: 7, Chunks in use: 6. 1.42MiB allocated for chunks. 1.24MiB in use in bin. 1.15MiB client-requested in use in bin.
2021-09-22 12:04:35.644717: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (262144): Total Chunks: 3, Chunks in use: 2. 1.07MiB allocated for chunks. 641.0KiB in use in bin. 463.5KiB client-requested in use in bin.
2021-09-22 12:04:35.644732: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (524288): Total Chunks: 1, Chunks in use: 0. 840.5KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644747: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1048576): Total Chunks: 2, Chunks in use: 2. 2.85MiB allocated for chunks. 2.85MiB in use in bin. 2.85MiB client-requested in use in bin.
2021-09-22 12:04:35.644762: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2097152): Total Chunks: 3, Chunks in use: 2. 7.27MiB allocated for chunks. 4.69MiB in use in bin. 4.69MiB client-requested in use in bin.
2021-09-22 12:04:35.644788: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644795: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644801: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16777216): Total Chunks: 2, Chunks in use: 2. 53.47MiB allocated for chunks. 53.47MiB in use in bin. 53.47MiB client-requested in use in bin.
2021-09-22 12:04:35.644807: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644813: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (67108864): Total Chunks: 1, Chunks in use: 0. 80.21MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644821: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-22 12:04:35.644828: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (268435456): Total Chunks: 3, Chunks in use: 1. 22.13GiB allocated for chunks. 17.62GiB in use in bin. 17.62GiB client-requested in use in bin.
2021-09-22 12:04:35.644837: I tensorflow/core/common_runtime/bfc_allocator.cc:1033] Bin for 4.90GiB was 256.00MiB, Chunk State:
2021-09-22 12:04:35.644863: I tensorflow/core/common_runtime/bfc_allocator.cc:1039] Size: 748.21MiB | Requested Size: 80.21MiB | in_use: 0 | bin_num: 20, prev: Size: 1.42MiB | Requested Size: 1.42MiB | in_use: 1 | bin_num: -1, next: Size: 1.42MiB | Requested Size: 1.42MiB | in_use: 1 | bin_num: -1
2021-09-22 12:04:35.644874: I tensorflow/core/common_runtime/bfc_allocator.cc:1039] Size: 3.78GiB | Requested Size: 121.76MiB | in_use: 0 | bin_num: 20, prev: Size: 1.42MiB | Requested Size: 1.42MiB | in_use: 1 | bin_num: -1
2021-09-22 12:04:35.644879: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 23913299968
2021-09-22 12:04:35.644892: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f8016000000 of size 18924364800 next 41
2021-09-22 12:04:35.644897: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f847dfae400 of size 28036096 next 79
2021-09-22 12:04:35.644902: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f847fa6b000 of size 28036096 next 80
2021-09-22 12:04:35.644908: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f8481527c00 of size 2461184 next 81
2021-09-22 12:04:35.644913: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f8481780a00 of size 2461184 next 82
2021-09-22 12:04:35.644919: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f84819d9800 of size 256 next 86
2021-09-22 12:04:35.644924: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f84819d9900 of size 256 next 87
2021-09-22 12:04:35.644929: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f84819d9a00 of size 84108288 next 60
2021-09-22 12:04:35.644943: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f8486a0fe00 of size 1492992 next 149
2021-09-22 12:04:35.644950: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f8486b7c600 of size 784551936 next 127
2021-09-22 12:04:35.644955: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f84b57b1600 of size 1492992 next 144
2021-09-22 12:04:35.644960: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f84b591de00 of size 4056293888 next 18446744073709551615
2021-09-22 12:04:35.644964: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 2097152
2021-09-22 12:04:35.644969: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00000 of size 256 next 1
2021-09-22 12:04:35.644974: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00100 of size 1280 next 2
2021-09-22 12:04:35.644978: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00600 of size 256 next 3
2021-09-22 12:04:35.644982: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446a00700 of size 256 next 4
2021-09-22 12:04:35.644987: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00800 of size 256 next 5
2021-09-22 12:04:35.644991: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00900 of size 256 next 6
2021-09-22 12:04:35.644995: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00a00 of size 256 next 7
2021-09-22 12:04:35.645000: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00b00 of size 256 next 8
2021-09-22 12:04:35.645008: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00c00 of size 256 next 9
2021-09-22 12:04:35.645013: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00d00 of size 256 next 140
2021-09-22 12:04:35.645018: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446a00e00 of size 256 next 154
2021-09-22 12:04:35.645024: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a00f00 of size 256 next 14
2021-09-22 12:04:35.645029: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a01000 of size 256 next 15
2021-09-22 12:04:35.645033: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446a01100 of size 256 next 24
2021-09-22 12:04:35.645038: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a01200 of size 256 next 161
2021-09-22 12:04:35.645042: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a01300 of size 256 next 20
2021-09-22 12:04:35.645047: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a01400 of size 256 next 21
2021-09-22 12:04:35.645052: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a01500 of size 221184 next 115
2021-09-22 12:04:35.645056: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446a37500 of size 256 next 135
2021-09-22 12:04:35.645061: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446a37600 of size 860672 next 162
2021-09-22 12:04:35.645067: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446b09800 of size 4608 next 150
2021-09-22 12:04:35.645073: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446b0aa00 of size 464896 next 110
2021-09-22 12:04:35.645078: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446b7c200 of size 8192 next 157
2021-09-22 12:04:35.645083: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446b7e200 of size 256 next 12
2021-09-22 12:04:35.645087: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446b7e300 of size 12032 next 102
2021-09-22 12:04:35.645092: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446b81200 of size 330240 next 143
2021-09-22 12:04:35.645096: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446bd1c00 of size 256 next 145
2021-09-22 12:04:35.645101: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446bd1d00 of size 256 next 95
2021-09-22 12:04:35.645105: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446bd1e00 of size 256 next 105
2021-09-22 12:04:35.645110: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446bd1f00 of size 188672 next 18446744073709551615
2021-09-22 12:04:35.645123: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 4194304
2021-09-22 12:04:35.645129: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c00000 of size 256 next 23
2021-09-22 12:04:35.645133: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00100 of size 256 next 131
2021-09-22 12:04:35.645138: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00200 of size 256 next 26
2021-09-22 12:04:35.645144: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00300 of size 256 next 27
2021-09-22 12:04:35.645149: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00400 of size 256 next 28
2021-09-22 12:04:35.645153: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00500 of size 256 next 31
2021-09-22 12:04:35.645158: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c00600 of size 256 next 32
2021-09-22 12:04:35.645162: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00700 of size 256 next 34
2021-09-22 12:04:35.645166: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00800 of size 256 next 35
2021-09-22 12:04:35.645172: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00900 of size 256 next 36
2021-09-22 12:04:35.645178: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00a00 of size 256 next 37
2021-09-22 12:04:35.645183: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00b00 of size 256 next 38
2021-09-22 12:04:35.645187: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00c00 of size 256 next 39
2021-09-22 12:04:35.645192: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00d00 of size 256 next 43
2021-09-22 12:04:35.645196: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00e00 of size 256 next 44
2021-09-22 12:04:35.645201: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c00f00 of size 256 next 45
2021-09-22 12:04:35.645205: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01000 of size 256 next 46
2021-09-22 12:04:35.645209: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01100 of size 256 next 47
2021-09-22 12:04:35.645214: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01200 of size 256 next 48
2021-09-22 12:04:35.645218: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01300 of size 256 next 49
2021-09-22 12:04:35.645223: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01400 of size 256 next 148
2021-09-22 12:04:35.645230: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01500 of size 256 next 13
2021-09-22 12:04:35.645234: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01600 of size 256 next 57
2021-09-22 12:04:35.645239: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01700 of size 256 next 51
2021-09-22 12:04:35.645243: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01800 of size 256 next 58
2021-09-22 12:04:35.645247: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01900 of size 256 next 98
2021-09-22 12:04:35.645252: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01a00 of size 256 next 106
2021-09-22 12:04:35.645256: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c01b00 of size 256 next 50
2021-09-22 12:04:35.645260: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01c00 of size 256 next 107
2021-09-22 12:04:35.645265: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01d00 of size 256 next 124
2021-09-22 12:04:35.645270: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01e00 of size 256 next 138
2021-09-22 12:04:35.645274: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c01f00 of size 256 next 158
2021-09-22 12:04:35.645278: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02000 of size 256 next 112
2021-09-22 12:04:35.645282: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02100 of size 256 next 141
2021-09-22 12:04:35.645288: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02200 of size 256 next 66
2021-09-22 12:04:35.645293: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02300 of size 256 next 67
2021-09-22 12:04:35.645297: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02400 of size 256 next 68
2021-09-22 12:04:35.645301: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02500 of size 256 next 33
2021-09-22 12:04:35.645306: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c02600 of size 768 next 92
2021-09-22 12:04:35.645310: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02900 of size 256 next 101
2021-09-22 12:04:35.645314: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02a00 of size 256 next 62
2021-09-22 12:04:35.645321: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02b00 of size 256 next 103
2021-09-22 12:04:35.645326: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02c00 of size 256 next 54
2021-09-22 12:04:35.645330: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c02d00 of size 256 next 137
2021-09-22 12:04:35.645337: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c02e00 of size 256 next 97
2021-09-22 12:04:35.645341: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c02f00 of size 13568 next 30
2021-09-22 12:04:35.645347: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c06400 of size 28928 next 42
2021-09-22 12:04:35.645351: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c0d500 of size 14336 next 120
2021-09-22 12:04:35.645356: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c10d00 of size 256 next 117
2021-09-22 12:04:35.645361: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c10e00 of size 1536 next 146
2021-09-22 12:04:35.645366: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c11400 of size 256 next 10
2021-09-22 12:04:35.645370: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c11500 of size 256 next 152
2021-09-22 12:04:35.645375: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c11600 of size 140288 next 91
2021-09-22 12:04:35.645380: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c33a00 of size 512 next 25
2021-09-22 12:04:35.645384: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c33c00 of size 256 next 53
2021-09-22 12:04:35.645389: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c33d00 of size 256 next 130
2021-09-22 12:04:35.645394: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446c33e00 of size 7936 next 22
2021-09-22 12:04:35.645398: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c35d00 of size 4096 next 56
2021-09-22 12:04:35.645403: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c36d00 of size 230144 next 63
2021-09-22 12:04:35.645408: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f000 of size 256 next 69
2021-09-22 12:04:35.645415: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f100 of size 256 next 70
2021-09-22 12:04:35.645420: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f200 of size 256 next 71
2021-09-22 12:04:35.645427: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f300 of size 256 next 72
2021-09-22 12:04:35.645432: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f400 of size 256 next 73
2021-09-22 12:04:35.645437: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f500 of size 256 next 74
2021-09-22 12:04:35.645441: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f600 of size 256 next 75
2021-09-22 12:04:35.645446: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f700 of size 256 next 76
2021-09-22 12:04:35.645451: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f800 of size 256 next 77
2021-09-22 12:04:35.645455: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6f900 of size 256 next 78
2021-09-22 12:04:35.645460: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446c6fa00 of size 228096 next 83
2021-09-22 12:04:35.645465: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446ca7500 of size 228096 next 84
2021-09-22 12:04:35.645469: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446cdf000 of size 253440 next 85
2021-09-22 12:04:35.645474: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 7f9446d1ce00 of size 326144 next 17
2021-09-22 12:04:35.645481: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 7f9446d6c800 of size 2701312 next 18446744073709551615
2021-09-22 12:04:35.645487: I tensorflow/core/common_runtime/bfc_allocator.cc:1071] Summary of in-use Chunks by size:
2021-09-22 12:04:35.645495: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 73 Chunks of size 256 totalling 18.2KiB
2021-09-22 12:04:35.645500: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 1280 totalling 1.2KiB
2021-09-22 12:04:35.645505: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 4096 totalling 4.0KiB
2021-09-22 12:04:35.645510: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 4608 totalling 4.5KiB
2021-09-22 12:04:35.645515: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 8192 totalling 8.0KiB
2021-09-22 12:04:35.645520: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 12032 totalling 11.8KiB
2021-09-22 12:04:35.645524: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 28928 totalling 28.2KiB
2021-09-22 12:04:35.645532: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 140288 totalling 137.0KiB
2021-09-22 12:04:35.645536: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 221184 totalling 216.0KiB
2021-09-22 12:04:35.645541: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 228096 totalling 445.5KiB
2021-09-22 12:04:35.645546: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 230144 totalling 224.8KiB
2021-09-22 12:04:35.645551: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 253440 totalling 247.5KiB
2021-09-22 12:04:35.645555: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 326144 totalling 318.5KiB
2021-09-22 12:04:35.645560: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 330240 totalling 322.5KiB
2021-09-22 12:04:35.645564: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 1492992 totalling 2.85MiB
2021-09-22 12:04:35.645569: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 2461184 totalling 4.69MiB
2021-09-22 12:04:35.645574: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 28036096 totalling 53.47MiB
2021-09-22 12:04:35.645579: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 18924364800 totalling 17.62GiB
2021-09-22 12:04:35.645584: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 17.69GiB
2021-09-22 12:04:35.645588: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 23919591424 memory_limit_: 23919591424 available bytes: 0 curr_region_allocation_bytes_: 34359738368
2021-09-22 12:04:35.645598: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats:
Limit: 23919591424
InUse: 18990380800
MaxInUse: 21956931328
NumAllocs: 82937
MaxAllocSize: 18924364800
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0

2021-09-22 12:04:35.645614: W tensorflow/core/common_runtime/bfc_allocator.cc:474] *********************************************************************************____________*

InternalError Traceback (most recent call last)
/tmp/ipykernel_24845/1230787370.py in
---> 16 early_history = model.fit(X_train,y_train,validation_data=(X_test,y_test),
17 epochs=EPOCHS,initial_epoch=start_step, verbose=1,batch_size=32,

~/data/python_tensorflow_env/lib/python3.9/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb

~/data/python_tensorflow_env/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
104 dtype = dtypes.as_dtype(dtype).as_datatype_enum
105 ctx.ensure_initialized()
--> 106 return ops.EagerTensor(value, ctx.device_name, dtype)
107
108

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
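
The numbers in the allocator dump above can be checked directly. The following is a small arithmetic sketch, not TensorFlow code: every value is copied from the log, and the variable names are only illustrative. It suggests that the 4.90GiB request exceeds the total free space in the pool, not just the largest contiguous free chunk, so fragmentation alone may not explain the failure.

```python
# Values copied from the BFC allocator dump above (all in bytes).
limit = 23919591424              # "Limit" in the Stats block
in_use = 18990380800             # "InUse" in the Stats block
requested = 5259396608           # allocation requested by op EagerConst (4.90GiB)
largest_free_chunk = 4056293888  # largest "Free at ..." chunk in the dump (3.78GiB)

GIB = 2**30
MIB = 2**20

free_total = limit - in_use         # everything not currently in use
shortfall = requested - free_total  # how far the request exceeds that

print(f"requested:          {requested / GIB:.2f} GiB")
print(f"free in pool:       {free_total / GIB:.2f} GiB")
print(f"largest free chunk: {largest_free_chunk / GIB:.2f} GiB")
print(f"shortfall:          {shortfall / MIB:.2f} MiB")
```

Because the shortfall is positive even before fragmentation is considered, the `TF_GPU_ALLOCATOR=cuda_malloc_async` hint printed in the log might not be sufficient on its own for this particular request.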

@Saduf2019
Contributor

@negf
Please refer to these similar issues and let us know whether they help: link, link1.
Can you share a Colab gist of the code and the error reported?

@Saduf2019 Saduf2019 added 2.6.0 stat:awaiting response Status - Awaiting response from author labels Sep 22, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Sep 29, 2021
@negf
Author

negf commented Sep 30, 2021

I don't know if the 2 links are related. As I said, it works with TF 2.5 but fails with TF >= 2.6.

I cannot run an example in a Colab gist due to limited resources, but here is an example which fails on my system (with the error message from above):
https://colab.research.google.com/drive/1y2YHfcZIv5lTJscgCKopZMHXGdxU1AND?usp=sharing

@google-ml-butler google-ml-butler bot removed stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author labels Sep 30, 2021
@Saduf2019
Contributor

@negf
Are you still facing the issue?

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Oct 27, 2021
@negf
Author

negf commented Oct 28, 2021

Yes, on 2.8.0-dev20211027 the problem persists.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 30, 2021
@Saduf2019 Saduf2019 assigned jvishnuvardhan and unassigned Saduf2019 Nov 2, 2021
@jvishnuvardhan jvishnuvardhan added comp:gpu GPU related issues comp:core issues related to core part of tensorflow stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Dec 15, 2021
@rohan100jain
Member

Looks like the root cause is that in eager mode, we're running out of memory on the GPU device. As a result, it's unable to copy the input to the device.

Can you try out smaller values of "n" in your program to see what succeeds?
Also, can you check nvidia-smi after model creation / compilation to see how much memory is consumed, and compare that between enabling and disabling eager execution?
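
One way to take the nvidia-smi readings requested above from inside the same Python session is sketched below. It assumes `nvidia-smi` is on the PATH; the helper names `parse_memory` and `gpu_memory_mib` are made up for this example and are not part of TensorFlow.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Parse 'used, total' rows (MiB), one row per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Run nvidia-smi and return a list of (used_mib, total_mib) per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)
```

Calling `gpu_memory_mib()` right after `model.compile(...)` and again after each `model.fit(...)`, once with eager execution enabled and once with it disabled, would give the comparison asked for here. Note that nvidia-smi reports memory reserved by the process, which can be larger than what TensorFlow's allocator currently has in use.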

@negf
Author

negf commented Jan 18, 2022

With eager execution, memory usage is much higher (see below), and it also increases from the first to the second epoch.
I don't know if it is related, but with n=9000 the first run succeeds with eager execution, while a subsequent run of model.fit gives the same error message, as if it had run out of memory. Similarly, with n=4500 I can make 2 runs before running out of memory on the third execution of model.fit. In the second run, the memory usage reported by nvidia-smi is double that of the first run. It looks like the memory is not properly deallocated. This does not happen when eager execution is turned off.

mode         n      MEM(e=1)    MEM(e>1)
with eager   9000   19413MiB    23765MiB
w/o eager    9000   1365MiB     1365MiB
with eager   4500   10389MiB    12629MiB
w/o eager    4500   1365MiB     1365MiB

@adriendoerig

Hi, I am facing the same issue. My code works with tf 2.4 but fails with tf>=2.6. Has this been resolved?

For me, the error occurs when trying to create a tf.data.Dataset from large numpy arrays.

Is it because tf>=2.6 handles GPU memory differently? Is it possible that it tries to load the dataset onto the GPU, while in earlier versions the dataset was stored on the CPU?
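
For the tf.data case described above, one commonly suggested direction is to keep the large arrays in host memory and hand TensorFlow one batch at a time, so that no single huge constant has to be copied to the GPU. The sketch below is an assumption-laden illustration, not a confirmed fix for this issue: `iter_batches` and `make_dataset` are hypothetical helper names, and it has not been tested against the failing setup reported here.

```python
def iter_batches(features, labels, batch_size):
    """Yield (features, labels) slices of at most batch_size rows."""
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        yield features[start:stop], labels[start:stop]


def make_dataset(features, labels, batch_size=32):
    """Build a tf.data.Dataset that pulls batches from NumPy arrays on demand."""
    # Imported lazily so iter_batches can be used without TensorFlow installed.
    import tensorflow as tf

    # Batch dimension is left as None because the last batch may be smaller.
    signature = (
        tf.TensorSpec(shape=(None,) + tuple(features.shape[1:]), dtype=features.dtype),
        tf.TensorSpec(shape=(None,) + tuple(labels.shape[1:]), dtype=labels.dtype),
    )
    return tf.data.Dataset.from_generator(
        lambda: iter_batches(features, labels, batch_size),
        output_signature=signature,
    )
```

With this, `model.fit(make_dataset(X_train, y_train), epochs=EPOCHS)` would replace passing the arrays and `batch_size` directly. Another variant that is sometimes reported to help is creating the dataset inside a `with tf.device("/CPU:0"):` block; whether either approach avoids the error on TF >= 2.6 would need to be verified.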

@hugo-ricateau

@jvishnuvardhan, any update on this issue?


8 participants