
OSError: [Errno 9] Bad file descriptor raised on program exit #50487

Closed
charliermarsh opened this issue Jun 28, 2021 · 32 comments
Assignees
Labels
comp:dist-strat (Distribution Strategy related issues), stale (to be closed automatically if no activity), stat:awaiting response (awaiting response from author), TF 2.5 (issues related to TF 2.5), type:bug

Comments

@charliermarsh

charliermarsh commented Jun 28, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v2.5.0-rc3-213-ga4dfb8d1a71 2.5.0
  • Python version: Python 3.8.5
  • CUDA/cuDNN version: 11.2 / 8.1.0.77-1
  • GPU model and memory: P100

Describe the current behavior

When using MirroredStrategy as a context manager, Python raises an ignored exception on program exit:

Exception ignored in: <function Pool.__del__ at 0x7f21f942e4c0>
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/root/miniconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/root/miniconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/miniconda3/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/root/miniconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

Describe the expected behavior

Python exits without the aforementioned exception. (In my testing, there is no such exception raised on TensorFlow 2.4.0, so this seems new in TensorFlow 2.5.0.)

Contributing

  • Do you want to contribute a PR? (yes/no): No

Standalone code to reproduce the issue

import tensorflow


def f():
    strategy = tensorflow.distribute.MirroredStrategy()
    with strategy.scope():
        tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(
            tensorflow.keras.layers.Input(shape=(88, 88, 3))
        )


f()

Removing the strategy.scope() causes the program to exit without the ignored exception, as does removing the function definition (i.e., getting rid of def f() and f(), and invoking at the top level).

@tilakrayal added the TF 2.5, comp:dist-strat, and comp:keras labels on Jun 29, 2021
@SysuJayce

Can confirm this in TF 2.5.0 from PyPI.

@tilakrayal added the TF 2.4 label and removed the TF 2.5 label on Jun 29, 2021
@tilakrayal
Contributor

@crm416,

Can you please try to execute the code in tf v2.5 and let us know if you are facing the same issue? Thanks!

@tilakrayal added the stat:awaiting response label on Jun 29, 2021
@charliermarsh
Author

@tilakrayal - Yes, this only occurs for me in tf v2.5 (and not in tf v2.3 or tf v2.4).

@tilakrayal added the TF 2.5 label and removed the TF 2.4 and stat:awaiting response labels on Jun 29, 2021
@tilakrayal assigned ymodak and unassigned tilakrayal on Jun 29, 2021
@ymodak added the stat:awaiting tensorflower label and removed the comp:keras label on Jun 29, 2021
@bryanlimy

Same issue in tf v2.6. OSError on program exit if strategy.scope() is called within a function.

The following code causes OSError on exit.

import tensorflow as tf

def main():
  strategy = tf.distribute.MirroredStrategy()
  print(f'\nNumber of devices: {strategy.num_replicas_in_sync}\n')
  with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(
      loss=tf.keras.losses.MSE,
      optimizer=tf.keras.optimizers.Adam(),
      metrics=['accuracy']
    )

  print('\nDONE\n')

if __name__ == '__main__':
  main()

with the following output:

2021-08-27 12:00:25.516889: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-27 12:00:32.832857: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-08-27 12:00:32.832944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9659 MB memory:  -> device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:1c:00.0, compute capability: 7.5
2021-08-27 12:00:32.834864: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-08-27 12:00:32.834898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9659 MB memory:  -> device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:1d:00.0, compute capability: 7.5

Number of devices: 2

DONE

Exception ignored in: <function Pool.__del__ at 0x7fbecd304040>
Traceback (most recent call last):
  File "/miniconda3/envs/test/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/miniconda3/envs/test/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/miniconda3/envs/test/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/miniconda3/envs/test/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/miniconda3/envs/test/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

Whereas the one below is fine:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print(f'\nNumber of devices: {strategy.num_replicas_in_sync}\n')
with strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
  model.compile(
    loss=tf.keras.losses.MSE,
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
  )

print('\nDONE\n')

Also tested the same code snippet with tf v2.4 and it ran fine in both cases.

@jayfurmanek
Contributor

jayfurmanek commented Aug 30, 2021

I see a similar error when running the recommendation model from the models repo on TF 2.5.0 and later, with Python 3.8.
https://github.com/tensorflow/models/tree/v2.5.1/official/recommendation

python ncf_keras_main.py --data_dir=./data --dataset=ml-1m
I0830 13:25:38.313174 140736018925024 ncf_keras_main.py:331] Keras evaluation is done.
I0830 13:25:38.313945 140736018925024 ncf_keras_main.py:555] Result is {'loss': 0.3801446557044983, 'eval_loss': 0.0, 'eval_hit_rate': 0.09089403396520465, 'step_timestamp_log': ['BatchTimestamp<batch_index: 0, timestamp: 1630344333.780229>', 'BatchTimestamp<batch_index: 100, timestamp: 1630344337.5320396>'], 'train_finish_time': 1630344337.9485285, 'avg_exp_per_second': 2638725.9874249455}
Exception ignored in: <function Pool.__del__ at 0x7fffa3c7cf70>
Traceback (most recent call last):
  File "/tmp/furmanek/miniconda3/envs/opence-conda-env-py3.8-cuda-openmpi-11.2/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/tmp/furmanek/miniconda3/envs/opence-conda-env-py3.8-cuda-openmpi-11.2/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/tmp/furmanek/miniconda3/envs/opence-conda-env-py3.8-cuda-openmpi-11.2/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/tmp/furmanek/miniconda3/envs/opence-conda-env-py3.8-cuda-openmpi-11.2/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/tmp/furmanek/miniconda3/envs/opence-conda-env-py3.8-cuda-openmpi-11.2/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

@jayfurmanek
Contributor

The other interesting thing is that this only happens (for me at least) on Python 3.8 and 3.9. It runs just fine on Python 3.7, so maybe this is a Python bug. Perhaps this one?
https://bugs.python.org/issue39995

@npanpaliya
Contributor

I tried changing MirroredStrategy to OneDeviceStrategy and the exception went away, so I'm not sure whether the issue is caused by a combination of Python and TF problems.
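
For reference, a minimal sketch of that substitution (the device string "/gpu:0" is an assumption; adjust it to your hardware):

import tensorflow as tf

# Workaround sketch, not an official fix: OneDeviceStrategy avoids the
# teardown error reported above, at the cost of running on a single device.
strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(loss=tf.keras.losses.MSE,
                  optimizer=tf.keras.optimizers.Adam())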

@tekumara

This happens in TF 2.7 too, with Python 3.9.

I think it's because MirroredStrategy creates a multiprocessing ThreadPool, but doesn't close it before the program ends, so its resources aren't properly cleaned up and it errors on shutdown.

You can explicitly close the pool on exit using:

import atexit

....

strategy = tf.distribute.MirroredStrategy()

atexit.register(strategy._extended._collective_ops._pool.close) # type: ignore

This should prevent the error for now (until there is a fix).
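
Put together, a minimal end-to-end sketch of the workaround (strategy._extended._collective_ops._pool is private TF internals, so treat it as an assumption that may break on other versions):

import atexit

import tensorflow as tf


def main():
    strategy = tf.distribute.MirroredStrategy()
    # Close the strategy's internal multiprocessing ThreadPool at interpreter
    # exit, before its pipe file descriptors are torn down (private API).
    atexit.register(strategy._extended._collective_ops._pool.close)  # type: ignore
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(loss=tf.keras.losses.MSE,
                      optimizer=tf.keras.optimizers.Adam())


if __name__ == "__main__":
    main()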

@Tingbopku
Contributor

This happens in TF 2.7 too, with Python 3.9.

I think it's because MirroredStrategy creates a multiprocessing ThreadPool, but doesn't close it before the program ends, so its resources aren't properly cleaned up and it errors on shutdown.

You can explicitly close the pool on exit using:

import atexit

....

strategy = tf.distribute.MirroredStrategy()

atexit.register(strategy._extended._collective_ops._pool.close) # type: ignore

This should prevent the error for now (until there is a fix).

This works for me, thank you!

@negvet

negvet commented Feb 4, 2022

For me, in TF 2.5.0 the problem is hardware-dependent.
It is present with V100, but not with 2080 Ti.

JanuszL added a commit to JanuszL/DALI that referenced this issue Mar 16, 2022
- fixes lack of multiprocess thread pool teardown in
  TF Mirrored strategy as stated in tensorflow/tensorflow#50487

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@suchunxie

suchunxie commented Jul 6, 2022 via email

@npanpaliya
Contributor

@suchunxie

Hi @npanpaliya, it works! I tried this approach before and it didn't work, but after you pointed it out I checked again and found that a backslash was missing before I passed --distribution_strategy. Stupid me > <.
Thanks a lot for your help!

@npanpaliya
Contributor

Hi @suchunxie, that's great! Glad to hear it. :)

@QuantHao

It seems that a fix has been submitted (#56279 (comment)) and users need to wait for the TF 2.10 release.
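
Until that release is out, one option is to keep the atexit workaround above but guard it by TF version and by the presence of the private attribute, so the same script keeps working once the fix ships. A sketch under those assumptions (not an official API):

import atexit

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Only register the manual pool cleanup on releases before the fix (reported
# above as landing in TF 2.10), and only if the private attribute chain
# actually exists in this build.
if tuple(int(v) for v in tf.__version__.split(".")[:2]) < (2, 10):
    collective_ops = getattr(strategy._extended, "_collective_ops", None)
    if hasattr(collective_ops, "_pool"):
        atexit.register(collective_ops._pool.close)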

@gadagashwini removed the stat:awaiting tensorflower label on Jul 29, 2022
@gadagashwini
Contributor

Hi @charliermarsh, it looks like the issue is resolved with the stable TensorFlow 2.9 release:

>>> import tensorflow
>>> def f():
...    strategy = tensorflow.distribute.MirroredStrategy()
...    with strategy.scope():
...       tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(
...             tensorflow.keras.layers.Input(shape=(88, 88, 3))
...         )
... 
>>> f()
2022-07-29 10:47:18.249928: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.250980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.377905: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.378958: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.379854: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.380701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.384793: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-29 10:47:18.841990: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.842974: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.843762: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.844505: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.845247: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:18.846005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.846699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.847686: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.848526: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.849336: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.850099: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.850823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13725 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5
2022-07-29 10:47:20.854556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-29 10:47:20.855362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13791 MB memory:  -> device: 1, name: Tesla T4, pci bus id: 0000:00:05.0, compute capability: 7.5
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')

@gadagashwini added the stat:awaiting response label on Jul 29, 2022
@gadagashwini self-assigned this on Jul 29, 2022
@ZJaume

ZJaume commented Jul 29, 2022

Not for me. With TensorFlow 2.9.1, the exception still shows up when exiting the interpreter:

In [1]: import tensorflow
   ...: def f():
   ...:    strategy = tensorflow.distribute.MirroredStrategy()
   ...:    with strategy.scope():
   ...:       tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(
   ...:             tensorflow.keras.layers.Input(shape=(88, 88, 3))
   ...:         )
   ...: f()
2022-07-29 12:54:45.169943: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-29 12:54:47.305006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 429 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:17:00.0, compute capability: 7.5
2022-07-29 12:54:47.305948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9651 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5
2022-07-29 12:54:47.306459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 427 MB memory:  -> device: 2, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:65:00.0, compute capability: 7.5
2022-07-29 12:54:47.306939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 429 MB memory:  -> device: 3, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:b4:00.0, compute capability: 7.5
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')

In [2]:
Do you really want to exit ([y]/n)?
Exception ignored in: <function Pool.__del__ at 0x7ff160d75c10>
Traceback (most recent call last):
  File "/home/user/miniconda3/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
    self._change_notifier.put(None)
  File "/home/user/miniconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/home/user/miniconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/user/miniconda3/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/user/miniconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor

@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler bot added the stale label on Aug 5, 2022
@gadagashwini
Contributor

Hi @ZJaume, could you share your system configuration? I am not able to replicate the issue. Thank you!

@ZJaume

ZJaume commented Aug 9, 2022

Hi, sorry for the inconvenience, but I've now tried with a fresh virtual environment and the error has disappeared, so I think the issue can be closed. The virtual environment that throws the exception has had many different TensorFlow versions installed, from 2.3 to 2.9, so maybe some outdated dependency is causing the error.

In case you want to reproduce it, my versions are:
TensorFlow version: 2.9.1
Python version: 3.8.13
OS: Ubuntu 18.04

And the output of pip freeze:

absl-py==1.1.0
aiohttp==3.8.1
aiosignal==1.2.0
antlr4-python3-runtime==4.8
astunparse==1.6.3
async-timeout==4.0.1
atomicwrites==1.4.0
attrs==21.2.0
backcall==0.2.0
bitarray==2.3.7
blessed==1.19.0
cachetools==4.2.4
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.7
clang==5.0
click==8.0.3
colorama==0.4.4
Cython==0.29.24
dataclasses==0.6
datasets==1.16.1
decorator==5.1.0
dill==0.3.4
enlighten==1.10.1
fairseq==0.10.2
fastspell==0.1.5
fasttext==0.9.2
filelock==3.3.2
flatbuffers==1.12
frozenlist==1.2.0
fsspec==2021.11.1
ftfy==6.1.1
fuzzywuzzy==0.18.0
gast==0.4.0
gensim==4.1.2
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.41.1
h5py==3.1.0
hanzidentifier==1.0.2
huggingface-hub==0.1.0
hunspell==0.5.5
hydra-core==1.1.1
idna==3.3
importlib-resources==5.4.0
ipython==7.29.0
jedi==0.18.0
joblib==0.14.1
keras==2.9.0
Keras-Preprocessing==1.1.2
latexcodec==2.0.1
libclang==13.0.0
Markdown==3.3.4
matplotlib-inline==0.1.3
monocleaner==1.0
more-itertools==8.10.0
mtdata==0.3.1
multidict==5.2.0
multiprocess==0.70.12.2
nltk==3.6.5
numpy==1.23.0
oauthlib==3.1.1
omegaconf==2.1.1
opt-einsum==3.3.0
packaging==21.2
pandas==1.3.5
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.4.0
pluggy==0.13.1
portalocker==2.3.0
prefixed==0.3.2
prompt-toolkit==3.0.22
protobuf==3.19.1
psutil==5.8.0
ptyprocess==0.7.0
py==1.10.0
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.8.1
pybtex==0.24.0
pycld2==0.31
pycparser==2.21
Pygments==2.10.0
pyparsing==2.4.7
pypinyin==0.46.0
pytest==5.1.2
python-dateutil==2.8.2
python-Levenshtein==0.12.2
pytz==2021.3
PyYAML==5.4.1
regex==2022.3.2
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
ruamel.yaml==0.17.17
ruamel.yaml.clib==0.2.6
sacrebleu==2.1.0
sacremoses==0.0.43
scikit-learn==0.22.1
scipy==1.4.1
sentence-transformers==2.1.0
sentencepiece==0.1.94
six==1.15.0
smart-open==5.2.1
tabulate==0.8.9
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.9.1
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.24.0
termcolor==1.1.0
tf-estimator-nightly==2.8.0.dev2021122109
threadpoolctl==3.0.0
tokenizers==0.12.1
toolwrapper==0.4.1
torch==1.10.1
torch-train==0.0.3
torchsummary==1.5.1
torchvision==0.11.2
tqdm==4.62.3
traitlets==5.1.1
transformers==4.20.1
typing-extensions==3.7.4.3
Unidecode==1.2.0
urllib3==1.26.7
wcwidth==0.2.5
Werkzeug==2.0.2
wrapt==1.12.1
xxhash==2.0.2
yarl==1.7.2
zhon==1.1.5
zipp==3.7.0

@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.


JanuszL added a commit to JanuszL/DALI that referenced this issue Dec 16, 2022
…ersion

- fixes the issue with the latest TensorFlow version and YOLO
  example that results in
  `AttributeError: 'CollectiveAllReduce' object has no attribute '_pool'`.
  The issue comes from the workaround for
  tensorflow/tensorflow#50487

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
JanuszL added a commit to NVIDIA/DALI that referenced this issue Dec 16, 2022
…ersion (#4522)

- fixes the issue with the latest TensorFlow version and YOLO
  example that results in
  `AttributeError: 'CollectiveAllReduce' object has no attribute '_pool'`.
  The issue comes from the workaround for
  tensorflow/tensorflow#50487

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Bruno-Messias added a commit to Bruno-Messias/yoloret that referenced this issue Jan 4, 2023
MirroredStrategy creates a multiprocessing ThreadPool, but doesn't close it before the program ends, so its resources aren't properly cleaned up and it errors on shutdown
(tensorflow/tensorflow#50487 (comment))
krfricke added a commit to ray-project/ray that referenced this issue Feb 17, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
cadedaniel pushed a commit to cadedaniel/ray that referenced this issue Mar 22, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (ray-project#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
edoakes pushed a commit to edoakes/ray that referenced this issue Mar 22, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (ray-project#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
peytondmurray pushed a commit to peytondmurray/ray that referenced this issue Mar 22, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (ray-project#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
scottsun94 pushed a commit to scottsun94/ray that referenced this issue Mar 28, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (ray-project#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
cassidylaidlaw pushed a commit to cassidylaidlaw/ray that referenced this issue Mar 28, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (ray-project#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
elliottower pushed a commit to elliottower/ray that referenced this issue Apr 22, 2023
We are currently at 2.9.1 which runs into some cosmetic errors (ray-project#25142, tensorflow/tensorflow#50487 (comment)). This PR upgrades to the latest Tensorflow release.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Avnish <avnishnarayan@gmail.com>
Co-authored-by: Avnish <avnishnarayan@gmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>