Debugger V2 not working. Invalid argument: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26 #43608

jaimeff · 2020-09-27T18:53:07Z

System information

I have used the test example from here
OS: Windows 10
Tensorflow 2.3.1 (installed with pip):
Python 3.6
CUDA 10.1
nVidia GeForce GTX 1050

I cannot make the example work with Debugger V2.

By executing the example from the link above I get the following output:

D:\src\ai\visualthing\venv\Scripts\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 50790 --file D:/src/ai/visualthing/debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

pydev debugger: process 8484 is connecting

Connected to pydev debugger (build 192.5728.105)
2020-09-27 20:31:08.451881: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
INFO:tensorflow:Enabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir, tensor debug mode: FULL_HEALTH)
I0927 20:31:11.284601  1260 dumping_callback.py:871] Enabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir, tensor debug mode: FULL_HEALTH)
2020-09-27 20:31:11.557685: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-27 20:31:11.584474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-27 20:31:11.584652: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-27 20:31:11.588047: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-27 20:31:11.591169: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-27 20:31:11.592204: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-27 20:31:11.595773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-27 20:31:11.597733: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-27 20:31:11.605092: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-27 20:31:11.605244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-27 20:31:11.605644: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-27 20:31:11.614513: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f4c545b410 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-27 20:31:11.614778: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-27 20:31:11.615119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-27 20:31:11.615425: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-27 20:31:11.615585: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-27 20:31:11.615691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-27 20:31:11.615830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-27 20:31:11.615921: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-27 20:31:11.616011: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-27 20:31:11.616099: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-27 20:31:11.616214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-27 20:31:12.188255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-27 20:31:12.188425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-27 20:31:12.188484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-27 20:31:12.188686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2987 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-09-27 20:31:12.191306: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f4e366a9f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-27 20:31:12.191431: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1
2020-09-27 20:31:13.537229: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 2060, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 2054, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 1405, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 1412, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/src/ai/visualthing/debug_mnist_v2.py", line 238, in <module>
    absl.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "D:\src\ai\visualthing\venv\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "D:\src\ai\visualthing\venv\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "D:/src/ai/visualthing/debug_mnist_v2.py", line 223, in main
    y = model(x_train)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1933, in _call_flat
    cancellation_manager=cancellation_manager)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 550, in call
    ctx=ctx)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 138, in execute_with_callbacks
    tensors = quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26
	 [[{{node StatefulPartitionedCall/MatMul/ReadVariableOp/DebugNumericSummaryV2}}]]
	 [[x/_1]]
  (1) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26
	 [[{{node StatefulPartitionedCall/MatMul/ReadVariableOp/DebugNumericSummaryV2}}]]
0 successful operations.
0 derived errors ignored. [Op:__forward_model_324]

Function call stack:
model -> model

INFO:tensorflow:Disabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir)
I0927 20:31:55.200698  1260 dumping_callback.py:895] Disabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir)

Process finished with exit code 1

I have also tried to build my own example with no success, same error:
DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53)

Saduf2019 · 2020-09-28T15:58:42Z

@jaimeff
Please share stand alone code or if possible share a colab gist with error reported.

jaimeff · 2020-09-28T19:05:04Z

@Saduf2019
I have used the exact same example as in this repo https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/debug/examples/v2/debug_mnist_v2.py

you can run it by executing:

python debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

In my specific case, I have run it with the following command:

D:\src\ai\visualthing\venv\Scripts\python.exe D:/src/ai/visualthing/debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

chrisacc · 2020-09-30T14:02:42Z

Same problem here. Searched all over for a solution and can't find one. Any help would be appreciated.

caisq · 2020-10-12T18:34:49Z

@Saduf2019 Does the approach suggested by @jaimeff solve your problem? I'm not able to reproduce your issue with either the latest tf-nightly (2.4.0-dev20201007) or tf 2.3.1. I'm using the command

python -m tensorflow.python.debug.examples.v2.debug_mnist_v2 \
    --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

jaimeff · 2020-10-13T13:12:21Z

I get the following using that command:

(venv) D:\src\ai\visualthing>python -m tensorflow.python.debug.examples.v2.debug_mnist_v2 --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH
2020-10-13 15:02:57.350397: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
D:\src\ai\visualthing\venv\Scripts\python.exe: Error while finding module specification for 'tensorflow.python.debug.examples.v2.debug_mnist_v2' (ModuleNotFoundError: No module named 'tensorflow.python.debug.examples')

This is the output of 'pip list' that is related to tensorflow:

tensorboard              2.3.0
tensorboard-plugin-wit   1.7.0
tensorflow               2.3.1
tensorflow-addons        0.11.2
tensorflow-estimator     2.3.0

So apparently, tensorflow package 2.3.1 installed with pip doesn't have DebugV2 support?
Is there any other package that I'm missing?

caisq · 2020-10-13T13:20:46Z

@jaimeff This may be an operating system-specific issue. I see you are using Windows. I failed to reproduce the issue on Linux.

Can you try running this Python script directly (instead of using python -m ...) and see what happens? https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/debug/examples/v2/debug_mnist_v2.py

jaimeff · 2020-10-13T13:39:26Z

Yes, that's exactly what I did when I saw I didn't have the file. So I copied it and ran it.

The full output is in the first message (the one that I posted to open this issue).

To summarize, I'm getting the error message: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53)

MareSeestern · 2020-12-11T16:35:49Z

I got the same error. Have you found a fix?

jaimeff · 2020-12-16T10:54:24Z

I solved my problem by removing this line:
tf.debugging.experimental.enable_dump_debug_info(path, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)

AND restarting my Kernel after removing this line.

System information
OS: Windows 10
Tensorflow 2.3.0 (installed with pip):
Python 3.8
CUDA 10.1
nVidia GeForce GTX 1050

Yes, but now you don't have debugging information. Am I right?

The problem is that we cannot use Debugger V2 on Windows 10. The whole purpose of this ticket is to figure out how to make it work. Of course, if you disable it the problem is gone :-D

…indows
- A test in debug_v2_ops_test previously called `np.power(2, 53)` without specifying dtype. As a result, the output had the int32 dtype on Windows and caused overflowing. This apparently does not happen on Linux or Mac.
- This CL fixes that by explicitly specifying `dtype=np.int64` in the call.
- In debug_events_write.cc, check for whether the DebugEventsWriter instance is initialized and return early if not so.
- This resolves a directory-not-empty test failure in debug_events_writer_test on Windows

This is a step towards fixing #43608
PiperOrigin-RevId: 349569528
Change-Id: I8112f8faebe662542e80c03d5d95e8e089446fe8
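
To illustrate the overflow this commit message describes: 2^53 does not fit in a 32-bit integer, so any code that stores that limit in a 32-bit type ends up comparing against a truncated value. The sketch below uses only the standard-library `ctypes` module to show the arithmetic. It is not TensorFlow code, `id_within_limit` is a hypothetical helper, and whether this same truncation is what triggers the runtime error on Windows is an assumption that this thread does not confirm.

```python
import ctypes

LIMIT = 2 ** 53  # the bound quoted in the error message

# ctypes integer types do no overflow checking: the value is
# truncated to the width of the type.
limit_as_int32 = ctypes.c_int32(LIMIT).value
limit_as_int64 = ctypes.c_int64(LIMIT).value


def id_within_limit(tensor_id, limit):
    """Hypothetical stand-in for a 'tensor_id <= limit' check."""
    return tensor_id <= limit


print(limit_as_int32)                       # 0
print(limit_as_int64)                       # 9007199254740992
print(id_within_limit(26, limit_as_int32))  # False
print(id_within_limit(26, limit_as_int64))  # True
```

With the 32-bit value, even a small ID such as 26 fails the comparison, which matches the shape of the reported message ("Given tensor_id:26").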

Nozziel · 2021-02-17T19:28:35Z

I can still reproduce this bug on tf 2.4.1 on windows.
tf.debugging.experimental.enable_dump_debug_info still results in the mentioned exception

mjohenneken · 2021-03-25T14:46:36Z

I ran into the same issue on Windows 10 with tf 2.3.0

mjohenneken · 2021-03-25T15:19:31Z

I played around with the parameters. It seems that the debugger runs with the defaults, i.e.
tf.debugging.experimental.enable_dump_debug_info("tfdbg_logs", tensor_debug_mode="NO_TENSOR"). Other values of the tensor_debug_mode parameter fail:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:4156
	 [[node functional_1/batch_normalization_8/FusedBatchNormV3/ReadVariableOp_1/DebugNumericSummaryV2 (defined at C:\ProgramData\Anaconda3\envs\ml\lib\site-packages\wandb\integration\keras\keras.py:119) ]]
	 [[broadcast_weights_1/assert_broadcastable/is_valid_shape/else/_486/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/then/_1492/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat/_2860]]
  (1) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:4156
	 [[node functional_1/batch_normalization_8/FusedBatchNormV3/ReadVariableOp_1/DebugNumericSummaryV2 (defined at C:\ProgramData\Anaconda3\envs\ml\lib\site-packages\wandb\integration\keras\keras.py:119) ]]
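
The observation above (only NO_TENSOR works on Windows) could be wrapped in a small helper that falls back to NO_TENSOR on Windows and keeps the requested mode elsewhere. This is a minimal sketch, not an official fix: `pick_tensor_debug_mode` and `enable_debugger_v2` are hypothetical names, and the fallback rule is an assumption based only on the reports in this thread. The TensorFlow import is deferred so the mode selection can be tried without TensorFlow installed.

```python
import platform


def pick_tensor_debug_mode(requested="FULL_HEALTH", system=None):
    """Return NO_TENSOR on Windows, otherwise the requested mode.

    Hypothetical helper: the Windows fallback reflects reports in this
    thread, not documented TensorFlow behavior.
    """
    system = system or platform.system()
    if system == "Windows" and requested != "NO_TENSOR":
        return "NO_TENSOR"
    return requested


def enable_debugger_v2(dump_dir, requested="FULL_HEALTH"):
    """Enable Debugger V2 dumping with a mode chosen for this platform."""
    import tensorflow as tf  # deferred so the helper above is usable alone

    mode = pick_tensor_debug_mode(requested)
    tf.debugging.experimental.enable_dump_debug_info(
        dump_dir, tensor_debug_mode=mode, circular_buffer_size=-1
    )
    return mode
```

Calling `enable_debugger_v2("tfdbg_logs")` before building the model would then select the mode automatically. NO_TENSOR records no tensor values, so this only avoids the exception.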

b3326023 · 2021-04-26T01:46:21Z

Same problem here. Has anyone found the solution?

dantarion · 2021-05-01T06:47:57Z

on Win10 TF 2.4.1 with the same error message

jainmilan · 2021-06-25T23:04:24Z

Getting same issue in TF 2.5.0 in Windows 10, any work around?

mjohenneken · 2021-06-26T11:36:29Z

As I mentioned, only the NO_TENSOR tensor_debug_mode works. However, it is not really a workaround if you want to get details on your tensors. Other than that, you could try to set up WSL or use a Linux distro and enjoy the journey of setting up another TensorFlow environment.

Scot-Survivor · 2021-08-30T11:30:15Z

Same issues TF 2.5.1 on Windows 10.

fabien-corso · 2022-07-20T11:51:52Z

Hi, I have just run into this issue with Tensorflow 2.9.1 and windows 10.
A workaround was to run functions eagerly with tf.config.run_functions_eagerly(True). I have no idea whether it is a good workaround, but at least it runs with tensor_debug_mode='FULL_HEALTH'.
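
The workaround described here might look like the sketch below. It assumes that run_functions_eagerly is switched on before dumping is enabled and before the model is called, an ordering the comment does not specify. `enable_full_health_eagerly` is a hypothetical name, and its optional `tf` parameter is only a convenience so the call sequence can be exercised with a stub when TensorFlow is not installed.

```python
def enable_full_health_eagerly(dump_dir, tf=None):
    """Run tf.function bodies eagerly, then enable FULL_HEALTH dumping.

    Hypothetical helper. Pass a stub as `tf` to try it without TensorFlow.
    """
    if tf is None:
        import tensorflow as tf  # deferred import of the real library

    # Make every tf.function invocation run eagerly instead of as a graph.
    tf.config.run_functions_eagerly(True)

    # Same call used elsewhere in this thread, with the mode that fails
    # in graph mode on Windows.
    tf.debugging.experimental.enable_dump_debug_info(
        dump_dir, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1
    )
```

Because everything then runs eagerly, graph-level debugging data is not expected to be recorded.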

sushreebarsa · 2022-12-08T09:44:17Z

@jainmilan Could you try calling tf.config.run_functions_eagerly(True)? It makes all invocations of tf.function run eagerly instead of running as a traced graph function, which can be useful for debugging. Please refer to this doc and let us know if it helps.
Thank you!

google-ml-butler · 2022-12-15T10:07:00Z

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler · 2022-12-22T10:12:19Z

Closing as stale. Please reopen if you'd like to work on this further.

JakeTheSnake3p0 · 2022-12-30T18:54:20Z

This issue is still happening. If I run functions eagerly, no graph data is available (as well as no graph executions to select). All I have is a massive list of python execution blocks on the timeline that are impossible to reasonably navigate and don't yield any sort of useful information aside from the stack trace and barebones output.

EricWu23 · 2023-02-21T20:37:05Z

Encountered the same issue (tensorflow 2.10.1 on Windows 10).
Has a solution been found?

jerodway · 2023-02-25T03:04:41Z

same here with tensorflow 2.10.1 on Windows 10
