Debugger V2 not working. Invalid argument: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26 #43608

Closed
jaimeff opened this issue Sep 27, 2020 · 25 comments
Labels
comp:keras Keras related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.5 Issues related to TF 2.5 type:bug Bug


@jaimeff

jaimeff commented Sep 27, 2020

System information

  • I have used the debug_mnist_v2.py test example from the TensorFlow repository
  • OS: Windows 10
  • Tensorflow 2.3.1 (installed with pip):
  • Python 3.6
  • CUDA 10.1
  • nVidia GeForce GTX 1050

I cannot make the example work with Debugger V2.

Running that example, I get the following output:

D:\src\ai\visualthing\venv\Scripts\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 50790 --file D:/src/ai/visualthing/debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

pydev debugger: process 8484 is connecting

Connected to pydev debugger (build 192.5728.105)
2020-09-27 20:31:08.451881: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
INFO:tensorflow:Enabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir, tensor debug mode: FULL_HEALTH)
I0927 20:31:11.284601  1260 dumping_callback.py:871] Enabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir, tensor debug mode: FULL_HEALTH)
2020-09-27 20:31:11.557685: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-27 20:31:11.584474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-27 20:31:11.584652: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-27 20:31:11.588047: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-27 20:31:11.591169: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-27 20:31:11.592204: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-27 20:31:11.595773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-27 20:31:11.597733: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-27 20:31:11.605092: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-27 20:31:11.605244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-27 20:31:11.605644: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-27 20:31:11.614513: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f4c545b410 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-27 20:31:11.614778: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-09-27 20:31:11.615119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-09-27 20:31:11.615425: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-09-27 20:31:11.615585: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-09-27 20:31:11.615691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-09-27 20:31:11.615830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-09-27 20:31:11.615921: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-09-27 20:31:11.616011: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-09-27 20:31:11.616099: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-09-27 20:31:11.616214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-09-27 20:31:12.188255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-27 20:31:12.188425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-27 20:31:12.188484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-27 20:31:12.188686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2987 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-09-27 20:31:12.191306: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f4e366a9f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-09-27 20:31:12.191431: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1
2020-09-27 20:31:13.537229: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 2060, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 2054, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 1405, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\pydevd.py", line 1412, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/src/ai/visualthing/debug_mnist_v2.py", line 238, in <module>
    absl.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "D:\src\ai\visualthing\venv\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "D:\src\ai\visualthing\venv\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "D:/src/ai/visualthing/debug_mnist_v2.py", line 223, in main
    y = model(x_train)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 846, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1933, in _call_flat
    cancellation_manager=cancellation_manager)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\function.py", line 550, in call
    ctx=ctx)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 138, in execute_with_callbacks
    tensors = quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
  File "D:\src\ai\visualthing\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26
	 [[{{node StatefulPartitionedCall/MatMul/ReadVariableOp/DebugNumericSummaryV2}}]]
	 [[x/_1]]
  (1) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26
	 [[{{node StatefulPartitionedCall/MatMul/ReadVariableOp/DebugNumericSummaryV2}}]]
0 successful operations.
0 derived errors ignored. [Op:__forward_model_324]

Function call stack:
model -> model

INFO:tensorflow:Disabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir)
I0927 20:31:55.200698  1260 dumping_callback.py:895] Disabled dumping callback in thread MainThread (dump root: /tmp/tfdbg2_logdir)

Process finished with exit code 1

I have also tried to build my own example with no success, same error:
DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53)
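For context on the number in the message: 2^53 is the point up to which a float64 (IEEE 754 double) can represent every integer exactly. A plausible reading, assumed here and not confirmed anywhere in this thread, is that the op caps tensor_id there because the ID is carried inside a floating-point summary tensor. Either way, a tensor_id of 26 sits far below the bound, so a correctly computed check should never reject it. This plain-Python sketch shows both facts:

```python
# Bound quoted in the DebugNumericSummaryV2Op error message.
LIMIT = 2 ** 53

# Every integer up to 2**53 survives a round trip through a float64 exactly.
assert int(float(LIMIT)) == LIMIT
assert int(float(LIMIT - 1)) == LIMIT - 1

# Just past the bound, neighboring integers collapse onto the same double.
assert float(LIMIT + 1) == float(LIMIT)

# The tensor_id from the traceback above.
tensor_id = 26
print(tensor_id <= LIMIT)  # True
```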

@jaimeff jaimeff changed the title debug_mnist_v2.py Error: Invalid argument: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26 Debugger V2 not working. Invalid argument: DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:26 Sep 28, 2020
@Saduf2019 Saduf2019 added the TF 2.3 Issues related to TF 2.3 label Sep 28, 2020
@Saduf2019
Contributor

@jaimeff
Please share standalone code or, if possible, a Colab gist that reproduces the reported error.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Sep 28, 2020
@jaimeff
Author

jaimeff commented Sep 28, 2020

@Saduf2019
I have used the exact same example as in this repo https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/debug/examples/v2/debug_mnist_v2.py

you can run it by executing:

python debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

In my specific case, I have run it with the following command:

D:\src\ai\visualthing\venv\Scripts\python.exe D:/src/ai/visualthing/debug_mnist_v2.py --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Sep 30, 2020
@chrisacc

Same problem here. Searched all over for a solution and can't find one. Any help would be appreciated.

@Saduf2019 Saduf2019 added comp:keras Keras related issues type:bug Bug labels Oct 1, 2020
@Saduf2019 Saduf2019 assigned gowthamkpr and unassigned Saduf2019 Oct 1, 2020
@caisq
Contributor

caisq commented Oct 12, 2020

@Saduf2019 Does the approach suggested by @jaimeff solve your problem? I'm not able to reproduce your issue with either the latest tf-nightly (2.4.0-dev20201007) or tf 2.3.1. I'm using the command

python -m tensorflow.python.debug.examples.v2.debug_mnist_v2 \
    --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH

@caisq caisq self-assigned this Oct 12, 2020
@gowthamkpr gowthamkpr removed their assignment Oct 12, 2020
@gowthamkpr gowthamkpr added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Oct 12, 2020
@jaimeff
Author

jaimeff commented Oct 13, 2020

I get the following using that command:

(venv) D:\src\ai\visualthing>python -m tensorflow.python.debug.examples.v2.debug_mnist_v2 --dump_dir /tmp/tfdbg2_logdir --dump_tensor_debug_mode FULL_HEALTH
2020-10-13 15:02:57.350397: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
D:\src\ai\visualthing\venv\Scripts\python.exe: Error while finding module specification for 'tensorflow.python.debug.examples.v2.debug_mnist_v2' (ModuleNotFoundError: No module named 'tensorflow.python.debug.examples')

This is the output of 'pip list' that is related to tensorflow:

tensorboard              2.3.0
tensorboard-plugin-wit   1.7.0
tensorflow               2.3.1
tensorflow-addons        0.11.2
tensorflow-estimator     2.3.0

So apparently, tensorflow package 2.3.1 installed with pip doesn't have DebugV2 support?
Is there any other package that I'm missing?

@caisq
Contributor

caisq commented Oct 13, 2020

@jaimeff This may be an operating system-specific issue. I see you are using Windows. I failed to reproduce the issue on Linux.

Can you try running this Python script directly (instead of using python -m ...) and see what happens? https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/debug/examples/v2/debug_mnist_v2.py

@jaimeff
Author

jaimeff commented Oct 13, 2020

Yes, that's exactly what I did when I saw I didn't have the file. So I copied it and ran it.

The full output is in the first message (the one I posted to open this issue).

To summarize, I'm getting the error message DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53).

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Oct 15, 2020
@MareSeestern

I got the same error. Have you found a fix?

@jaimeff
Author

jaimeff commented Dec 16, 2020

I solved my problem by removing this line:
tf.debugging.experimental.enable_dump_debug_info(path, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)

AND restarting my Kernel after removing this line.

System information

OS: Windows 10
Tensorflow 2.3.0 (installed with pip):
Python 3.8
CUDA 10.1
nVidia GeForce GTX 1050

Yes, but now you don't have debugging information. Am I right?

The problem is that we cannot use Debugger V2 on Windows 10. The whole purpose of this ticket is to figure out how to make it work. Of course, if you disable it, the problem is gone :-D

copybara-service bot pushed a commit that referenced this issue Dec 30, 2020
…indows

- A test in debug_v2_ops_test previously called `np.power(2, 53)` without specifying
  dtype. As a result, the output had the int32 dtype on Windows and caused
  overflowing. This apparently does not happen on Linux or Mac.
  - This CL fixes that by explicitly specifying `dtype=np.int64` in the call.
- In debug_events_write.cc, check for whether the DebugEventsWriter instance
  is initialized and return early if not so.
  - This resolves a directory-not-empty test failure in debug_events_writer_test
    on Windows

This is a step towards fixing #43608

PiperOrigin-RevId: 349569528
Change-Id: I8112f8faebe662542e80c03d5d95e8e089446fe8
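The first bullet of the commit above describes a classic dtype pitfall. In NumPy releases of that era (before 2.0), the default integer on Windows was 32 bits wide, so np.power(2, 53) without a dtype overflowed there while producing the right value on Linux and macOS. The sketch below forces each dtype explicitly so it behaves the same on every platform. The helper name max_tensor_id is hypothetical, and the final bound check only illustrates how a wrapped limit would reject even a small ID; it is not TensorFlow's C++ code, and this thread does not establish that the op itself overflows the same way.

```python
import numpy as np


def max_tensor_id(dtype):
    """Hypothetical helper: compute 2**53 in an explicit integer dtype."""
    return np.power(2, 53, dtype=dtype)


limit_64 = max_tensor_id(np.int64)  # the commit's fix: 2**53 fits easily
limit_32 = max_tensor_id(np.int32)  # 2**53 cannot fit in 32 bits, so it wraps

print(int(limit_64))  # 9007199254740992
print(int(limit_32))  # a wrapped value, not 2**53

# Illustration only: a bound computed in 32 bits stops admitting small IDs.
tensor_id = 26
print(tensor_id <= limit_64)  # True
print(tensor_id <= limit_32)
```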
@Nozziel

Nozziel commented Feb 17, 2021

I can still reproduce this bug with TF 2.4.1 on Windows: tf.debugging.experimental.enable_dump_debug_info still results in the exception mentioned above.

@mjohenneken

I ran into the same issue on Windows 10 with tf 2.3.0

@mjohenneken

I played around with the parameters. It seems that the debugger runs with the default, i.e. tf.debugging.experimental.enable_dump_debug_info("tfdbg_logs", tensor_debug_mode="NO_TENSOR"), but any other value of tensor_debug_mode fails:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:4156
	 [[node functional_1/batch_normalization_8/FusedBatchNormV3/ReadVariableOp_1/DebugNumericSummaryV2 (defined at C:\ProgramData\Anaconda3\envs\ml\lib\site-packages\wandb\integration\keras\keras.py:119) ]]
	 
[[broadcast_weights_1/assert_broadcastable/is_valid_shape/else/_486/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/then/_1492/broadcast_weights_1/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat/_2860]]
  (1) Invalid argument:  DebugNumericSummaryV2Op requires tensor_id to be less than or equal to (2^53). Given tensor_id:4156
	 [[node functional_1/batch_normalization_8/FusedBatchNormV3/ReadVariableOp_1/DebugNumericSummaryV2 (defined at C:\ProgramData\Anaconda3\envs\ml\lib\site-packages\wandb\integration\keras\keras.py:119) ]]
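Given the observation above that only the default NO_TENSOR mode works on Windows, a script shared across machines could degrade gracefully instead of crashing. A minimal sketch, assuming you accept losing per-tensor health data on Windows: pick_tensor_debug_mode is a hypothetical helper, not part of TensorFlow, and its set of working modes simply encodes the reports in this thread.

```python
import platform
import warnings

# Per the reports in this issue, only NO_TENSOR works on Windows.
MODES_REPORTED_WORKING_ON_WINDOWS = {"NO_TENSOR"}


def pick_tensor_debug_mode(requested, system=None):
    """Hypothetical helper: return a tensor_debug_mode reported to work
    on this platform, falling back to NO_TENSOR on Windows."""
    system = system or platform.system()
    if system == "Windows" and requested not in MODES_REPORTED_WORKING_ON_WINDOWS:
        warnings.warn(
            f"tensor_debug_mode={requested!r} reportedly fails on Windows "
            "(see tensorflow/tensorflow#43608); falling back to 'NO_TENSOR'."
        )
        return "NO_TENSOR"
    return requested


# Usage with TensorFlow installed:
#   import tensorflow as tf
#   tf.debugging.experimental.enable_dump_debug_info(
#       "tfdbg_logs", tensor_debug_mode=pick_tensor_debug_mode("FULL_HEALTH"))
```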

@b3326023

Same problem here. Has anyone found the solution?

@dantarion

Same error message on Win10 with TF 2.4.1.

@jainmilan

Getting the same issue with TF 2.5.0 on Windows 10. Any workaround?

@mjohenneken

As I mentioned, only the NO_TENSOR tensor_debug_mode works. However, that is not really a workaround if you want details on your tensors. Other than that, you could try setting up WSL or a Linux distro and enjoy the journey of setting up another TensorFlow environment.

@sushreebarsa sushreebarsa added TF 2.5 Issues related to TF 2.5 and removed TF 2.3 Issues related to TF 2.3 labels Aug 11, 2021
@Scot-Survivor

Same issue with TF 2.5.1 on Windows 10.

Scot-Survivor pushed a commit to Gavin-Development/GavinBackend that referenced this issue Sep 16, 2021
@fabien-corso

Hi, I have just run into this issue with TensorFlow 2.9.1 on Windows 10.
A workaround was to make functions run eagerly with tf.config.run_functions_eagerly(True). I have no idea whether it is a good workaround, but at least it runs with tensor_debug_mode='FULL_HEALTH'.

@sushreebarsa
Contributor

@jainmilan Could you try calling tf.config.run_functions_eagerly(True)? It makes all invocations of tf.function run eagerly instead of as a traced graph function, which can be useful for debugging. Please refer to this doc and let us know if it helps.
Thank you!
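The eager-execution workaround from the last two comments amounts to two calls made before the model is built. The sketch below wraps them in a hypothetical helper that takes the imported tensorflow module as an argument (only so the helper does not import TensorFlow itself); the argument values mirror the ones used earlier in this thread. Note the trade-off reported further down: with functions running eagerly, the Debugger V2 graph views have nothing to show.

```python
def enable_full_health_dumping(tf, dump_root, run_eagerly=True):
    """Hypothetical helper: optionally force eager execution of tf.function
    bodies, then turn on Debugger V2 dumping in FULL_HEALTH mode.

    `tf` is the imported tensorflow module, passed in rather than imported here.
    """
    if run_eagerly:
        # Every tf.function call now runs op by op instead of as a traced graph,
        # which is what reportedly avoids the DebugNumericSummaryV2Op error.
        tf.config.run_functions_eagerly(True)
    tf.debugging.experimental.enable_dump_debug_info(
        dump_root,
        tensor_debug_mode="FULL_HEALTH",
        circular_buffer_size=-1,  # -1 disables the circular buffer, as in the call quoted earlier
    )


# Usage with TensorFlow installed, before constructing the model:
#   import tensorflow as tf
#   enable_full_health_dumping(tf, "/tmp/tfdbg2_logdir")
```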

@sushreebarsa sushreebarsa self-assigned this Dec 8, 2022
@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Dec 8, 2022
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Dec 15, 2022
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.


@JakeTheSnake3p0

This issue is still happening. If I run functions eagerly, no graph data is available (as well as no graph executions to select). All I have is a massive list of python execution blocks on the timeline that are impossible to reasonably navigate and don't yield any sort of useful information aside from the stack trace and barebones output.

@EricWu23

EricWu23 commented Feb 21, 2023

Encountered the same issue (tensorflow 2.10.1 on Windows 10).
Has a solution been found?

@jerodway

Same here with TensorFlow 2.10.1 on Windows 10.
