Can't find libdevice directory when training object detection

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x] I checked to make sure that this issue has not already been filed.

## 1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

## 2. Describe the bug

While attempting to train a model with `model_main_tf2.py`, I get repeated libdevice errors followed by a JIT compilation failure:
```text
Instructions for updating:
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
2022-06-16 08:42:21.581826: W tensorflow/core/framework/op_kernel.cc:1733] UNKNOWN: JIT compilation failed.
Traceback (most recent call last):
  File "E:\Git\models\research\object_detection\model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\platform\app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
  File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\absl\app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "E:\Git\models\research\object_detection\model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 685, in train_loop
    losses_dict = _dist_train_step(train_input_iter)
  File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'train_input_images/write_summary/mod' defined at (most recent call last):
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\threading.py", line 930, in _bootstrap
      self._bootstrap_inner()
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\threading.py", line 973, in _bootstrap_inner
      self.run()
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 629, in train_step_fn
      if record_summaries:
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 630, in train_step_fn
      tf.compat.v2.summary.image(
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorboard\plugins\image\summary_v2.py", line 141, in image
      tag=tag, tensor=lazy_tensor, step=step, metadata=summary_metadata
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 599, in <lambda>
      lambda: global_step % num_steps_per_iteration == 0):
Node: 'train_input_images/write_summary/mod'
Detected at node 'train_input_images/write_summary/mod' defined at (most recent call last):
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\threading.py", line 930, in _bootstrap
      self._bootstrap_inner()
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\threading.py", line 973, in _bootstrap_inner
      self.run()
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 629, in train_step_fn
      if record_summaries:
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 630, in train_step_fn
      tf.compat.v2.summary.image(
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorboard\plugins\image\summary_v2.py", line 141, in image
      tag=tag, tensor=lazy_tensor, step=step, metadata=summary_metadata
    File "C:\Users\mergt\AppData\Local\Programs\Python\Python39\lib\site-packages\object_detection\model_lib_v2.py", line 599, in <lambda>
      lambda: global_step % num_steps_per_iteration == 0):
Node: 'train_input_images/write_summary/mod'
2 root error(s) found.
  (0) UNKNOWN:  JIT compilation failed.
         [[{{node train_input_images/write_summary/mod}}]]
         [[Identity_5/_494]]
         [[{{node train_input_images/write_summary/mod}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dist_train_step_67016]
```
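
For anyone trying to narrow this down, the following is a minimal sketch of a check for the directory named in the error. It assumes the CUDA toolkit root is exposed through one of the environment variables `CUDA_PATH`, `CUDA_HOME`, or `CUDA_DIR` (which of these is set, if any, depends on the installation), and the helper names `find_libdevice` and `candidate_cuda_roots` are hypothetical. If a directory is found, it prints a candidate value for `XLA_FLAGS=--xla_gpu_cuda_data_dir=...`; whether setting it resolves the error above is not confirmed here.

```python
import os
from pathlib import Path


def find_libdevice(cuda_roots):
    """Return the first <root>/nvvm/libdevice directory holding a libdevice*.bc file."""
    for root in cuda_roots:
        candidate = Path(root) / "nvvm" / "libdevice"
        if candidate.is_dir() and any(candidate.glob("libdevice*.bc")):
            return candidate
    return None


def candidate_cuda_roots(env=None):
    """Collect CUDA roots from environment variables (the variable names are assumptions)."""
    env = os.environ if env is None else env
    return [env[name] for name in ("CUDA_PATH", "CUDA_HOME", "CUDA_DIR") if env.get(name)]


if __name__ == "__main__":
    found = find_libdevice(candidate_cuda_roots())
    if found is None:
        print("No nvvm/libdevice directory found under the CUDA roots checked.")
    else:
        # The CUDA root is two levels above nvvm/libdevice.
        print(f"Found {found}")
        print(f"Candidate setting: XLA_FLAGS=--xla_gpu_cuda_data_dir={found.parent.parent}")
```

The environment variable would need to be set before TensorFlow is started, since it is read when the process initializes.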

## 3. Steps to reproduce

I am following [this tutorial](https://blog.tensorflow.org/2021/01/custom-object-detection-in-browser.html) and
copying the code from [this Colab notebook](https://colab.research.google.com/drive/1MdzgmdYJk947sXyls45V7auMPttHelBZ?usp=sharing).

## 4. Expected behavior

The model should train with no errors.

## 5. Additional context

I deviated from the tutorial by installing protobuf 3.20.1, because otherwise I get the following error:
```text
ImportError: cannot import name 'builder' from 'google.protobuf.internal' 
```
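
A quick way to check whether the installed protobuf package ships the module named in that message is sketched below. It uses only the standard library, and the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if the module can be located.

    Parent packages are imported during the lookup; the module itself is not.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. google.protobuf) is missing.
        return False


if __name__ == "__main__":
    target = "google.protobuf.internal.builder"
    state = "present" if module_available(target) else "missing"
    print(f"{target}: {state}")
```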

The libdevice error does not occur if I uninstall CUDA.

## 6. System information

- Windows 10 Pro 21H1
- TensorFlow installed via pip
- TensorFlow version: 2.9.1
- Python 3.9.7
- CUDA: V11.7.64, cuDNN: 8.4.1
- GPU model and memory: RTX 2070s 8 GB
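
The Python-side versions listed above can be collected with a short script such as the one below. It is a sketch that relies only on the standard library; the function name `collect_versions` and the choice of packages are illustrative assumptions, and CUDA and cuDNN versions are not covered.

```python
import platform
from importlib import metadata


def collect_versions(packages=("tensorflow", "protobuf", "tensorboard")):
    """Return OS, Python, and installed package versions as a dict."""
    info = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"- {key}: {value}")
```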


