Training starts
Traceback (most recent call last):
  File "FFP_/train_w_pruning.py", line 76, in <module>
    train_step(*data)
  File "/home/cvip/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 787, in __call__
    result = self._call(*args, **kwds)
  File "/home/cvip/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 854, in _call
    filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
  File "/home/cvip/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1920, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/cvip/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 561, in call
    ctx=ctx)
  File "/home/cvip/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Trying to access resource ResNet/conv/kernel/replica_1_879 located in device /job:localhost/replica:0/task:0/device:GPU:0 [Op:__inference_train_step_dist_88943]
System information
Describe the current behavior
When I train my model on multiple GPUs with XLA compilation enabled, training fails with the InvalidArgumentError shown in the traceback above.
Describe the expected behavior
I expect to be able to compile my multi-GPU training code with XLA, but this does not seem to be supported.
Standalone code to reproduce the issue
https://github.com/sseung0703/TF2-multi-gpu-training