h5py==3.0.0 causes issues with keras model loads in tensorflow 2.1.0 #44467

Closed
danzafar opened this issue Oct 30, 2020 · 55 comments · Fixed by #45487

@danzafar

h5py released version 3.0.0 today and it causes this code to fail:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/saving/hdf5_format.py#L182
with error:

  File "/databricks/python/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/databricks/python/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 166, in load_model_from_hdf5
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

It looks like in version 2.1.0 the h5py version is not pinned (it is pinned in master), which is causing the issue.

@bhack
Contributor

bhack commented Oct 30, 2020

I think a backport to 2.1.0 is unlikely.

@danzafar
Author

danzafar commented Oct 30, 2020

The new h5py version just came out today: https://pypi.org/project/h5py/#history

Pinning it to a version <3.0.0 (or at least throwing a warning) is a very lightweight change with no risk.
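A minimal sketch of the "throw a warning" option, assuming the check runs at import time and compares plain version strings. The function name `check_h5py_compat` and the 2.4 cutoff (taken from the maintainers' comments in this thread) are illustrative assumptions, not TensorFlow's actual code:

```python
import warnings


def _major_minor(version_string):
    """Parse 'X.Y...' into an (X, Y) tuple of ints, e.g. '2.1.0' -> (2, 1)."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def check_h5py_compat(h5py_version, tf_version):
    """Hypothetical guard: warn when h5py>=3 is paired with TF older than 2.4.

    Returns True if the combination is assumed safe, False otherwise.
    """
    if _major_minor(h5py_version)[0] >= 3 and _major_minor(tf_version) < (2, 4):
        warnings.warn(
            f"h5py {h5py_version} may return HDF5 string attributes as str, "
            f"which breaks Keras HDF5 model loading in TensorFlow {tf_version}; "
            "consider installing h5py<3.0.0.",
            RuntimeWarning,
        )
        return False
    return True
```

In practice it would be called as `check_h5py_compat(h5py.__version__, tf.__version__)` after importing both packages.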

@bhack
Contributor

bhack commented Oct 30, 2020

Yes, but as you know, we have generally had very few patch releases (e.g. 2.1.x), so it is quite hard to get wheels with these fixes for old versions.

@danzafar
Author

@bhack I'm not going to add any more messages after this, but based on these other issues I think you can see that the Keras API is essentially broken by the new h5py release. I'm not sure why the version was not pinned as it is in master, but I strongly advise pinning h5py==2.10.0 for all TF >= 2.1.

@bhack
Contributor

bhack commented Oct 31, 2020

/cc @mihaimaruseac @angerson there are many Dockerfiles and other files where h5py is not constrained.

@yan12125

yan12125 commented Nov 1, 2020

FWIW, I got a similar error message with h5py 3.0, and removing .decode('utf-8') from tensorflow/python/keras/saving/hdf5_format.py allows me to load Keras models as before.
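Deleting `.decode('utf-8')` outright works with h5py 3.x but would fail under h5py 2.x, where the same attribute comes back as bytes rather than str. A minimal sketch of a version-tolerant alternative, using a hypothetical helper `as_text` (an illustration only, not the patch that eventually landed in TensorFlow):

```python
import json


def as_text(value, encoding="utf-8"):
    """Return an HDF5 string attribute as str.

    Older h5py returned bytes (or numpy.bytes_, a bytes subclass);
    h5py 3.x may return str, as the traceback in this issue shows.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both forms of the attribute yield the same parsed config.
config_from_bytes = json.loads(as_text(b'{"class_name": "Sequential"}'))
config_from_str = json.loads(as_text('{"class_name": "Sequential"}'))
```

This is the kind of type check that works regardless of which h5py major version is installed.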

@mihaimaruseac
Collaborator

We cannot pin the versions without doing a patch release. We only do patch releases for security issues.

This issue has a quick workaround: pip install tensorflow 'h5py<3.0.0'.

We are in the process of releasing TF 2.4 which should not be affected by this issue.
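Without quotes, a POSIX shell parses the `<` in `h5py<3.0.0` as input redirection, the same class of bug the xgboost commit further down describes for `>=`. A minimal Python sketch that sidesteps the shell entirely, assuming the install is driven from a script; the helper name `pip_install_cmd` is hypothetical:

```python
import shlex
import sys


def pip_install_cmd(*requirements):
    """Build an argv list for 'python -m pip install ...'.

    Passing a list to subprocess.run (without shell=True) means no shell
    ever sees the '<' in 'h5py<3.0.0', so no quoting is needed.
    """
    return [sys.executable, "-m", "pip", "install", *requirements]


cmd = pip_install_cmd("tensorflow", "h5py<3.0.0")

# If a shell string is required (e.g. for a Dockerfile RUN line),
# shlex.quote wraps any argument containing shell metacharacters.
shell_line = " ".join(
    shlex.quote(arg) for arg in ["pip", "install", "tensorflow", "h5py<3.0.0"]
)
# To actually install: subprocess.run(cmd, check=True)
```

Here `shell_line` comes out as `pip install tensorflow 'h5py<3.0.0'`.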

mihaimaruseac reopened this Nov 2, 2020
Saduf2019 added the stat:awaiting response label Nov 3, 2020
@convoliution

convoliution commented Nov 4, 2020

Thought it might be pertinent to link to the h5py repo:
This has been labeled as a bug in their 3.0.0 release (rather than a backwards incompatibility),
and is slated to be fixed in an upcoming release: h5py/h5py#1732

edit:
Actually, there might be two separate but related issues here. The bug in the h5py thread occurs during the model-save codepath, but the bug in this TensorFlow thread is in the model-load codepath, which, if I had to wildly guess, might require a typecheck/cast on TensorFlow's side.

andreasbotsikas pushed a commit to andreasbotsikas/ignite-learning-paths-training-aiml that referenced this issue Nov 5, 2020
Addresses training and inference issues  tensorflow/tensorflow#44467
@yan12125

yan12125 commented Nov 7, 2020

there might be two separate but related issues here

That seems the case. I tried h5py 3.1.0, and the error AttributeError: 'str' object has no attribute 'decode' still happens.

@MarkSCQ

MarkSCQ commented Nov 10, 2020

😆 Thanks!!

@bhack
Contributor

bhack commented Apr 8, 2021

Is this still a problem in TF 2.4.0? I still get the error in this case. Need to downgrade h5py.

https://pypi.org/project/tensorflow/2.5.0rc0/

riederd added a commit to icbi-lab/nextNEOpi that referenced this issue Jun 15, 2021
Lunderberg added a commit to Lunderberg/tvm that referenced this issue Jun 22, 2021
Tensorflow update required following update to cuda 11.0.  Based on
https://www.tensorflow.org/install/source#gpu, the 2.4 branch of
tensorflow should be used with cuda 11.0.

- Removed pinned version of keras/h5py, no longer needed.

  tensorflow/tensorflow#44467 (comment)

- Updated tflite version to 2.4.2.  Also, tflite install script now
  reads the installed version of tensorflow, to keep the version
  matched in the future.
mbrookhart pushed a commit to apache/tvm that referenced this issue Jun 22, 2021
* [Docker] Updated tensorflow/tflite version to 2.4.2

Tensorflow update required following update to cuda 11.0.  Based on
https://www.tensorflow.org/install/source#gpu, the 2.4 branch of
tensorflow should be used with cuda 11.0.

- Removed pinned version of keras/h5py, no longer needed.

  tensorflow/tensorflow#44467 (comment)

- Updated tflite version to 2.4.2.  Also, tflite install script now
  reads the installed version of tensorflow, to keep the version
  matched in the future.

* [Docker] Corrected version pinning of xgboost

Previously, due to missing quotes, installed most recent version of
xgboost, piping the results to a file named '=1.1.0'.  Now, installs
xgboost at least at version 1.1.0.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
@Remheob

Remheob commented Jul 1, 2021

Hey guys, when I try to downgrade h5py, I get the following error:

ERROR: Failed building wheel for h5py
Running setup.py clean for h5py
Failed to build h5py
WARNING: Ignoring invalid distribution -ip (d:\tensorflow object detection\tfodcourse\tfod\lib\site-packages)
Installing collected packages: h5py
Attempting uninstall: h5py
WARNING: Ignoring invalid distribution -ip (d:\tensorflow object detection\tfodcourse\tfod\lib\site-packages)
Found existing installation: h5py 3.1.0
Uninstalling h5py-3.1.0:
Successfully uninstalled h5py-3.1.0
Running setup.py install for h5py ... error
ERROR: Command errored out with exit status 1:
command: 'd:\tensorflow object detection\tfodcourse\tfod\scripts\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = "C:\Users\Arne'"'"'s PC\AppData\Local\Temp\pip-install-8sioo065\h5py_f1f64cd207fd4e3fbbc73e1ddc4b40d2\setup.py"; file="C:\Users\Arne'"'"'s PC\AppData\Local\Temp\pip-install-8sioo065\h5py_f1f64cd207fd4e3fbbc73e1ddc4b40d2\setup.py";f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Arne'"'"'s PC\AppData\Local\Temp\pip-record-prai98me\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\tensorflow object detection\tfodcourse\tfod\include\site\python3.9\h5py'
cwd: C:\Users\Arne's PC\AppData\Local\Temp\pip-install-8sioo065\h5py_f1f64cd207fd4e3fbbc73e1ddc4b40d2\

Rolling back uninstall of h5py
Moving to d:\tensorflow object detection\tfodcourse\tfod\lib\site-packages\h5py-3.1.0.dist-info
from D:\Tensorflow Object Detection\TFODCourse\tfod\Lib\site-packages~5py-3.1.0.dist-info
Moving to d:\tensorflow object detection\tfodcourse\tfod\lib\site-packages\h5py
from D:\Tensorflow Object Detection\TFODCourse\tfod\Lib\site-packages~5py
ERROR: Command errored out with exit status 1: 'd:\tensorflow object detection\tfodcourse\tfod\scripts\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = "C:\Users\Arne'"'"'s PC\AppData\Local\Temp\pip-install-8sioo065\h5py_f1f64cd207fd4e3fbbc73e1ddc4b40d2\setup.py"; file="C:\Users\Arne'"'"'s PC\AppData\Local\Temp\pip-install-8sioo065\h5py_f1f64cd207fd4e3fbbc73e1ddc4b40d2\setup.py";f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Arne'"'"'s PC\AppData\Local\Temp\pip-record-prai98me\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\tensorflow object detection\tfodcourse\tfod\include\site\python3.9\h5py' Check the logs for full command output.
WARNING: Ignoring invalid distribution -ip (d:\tensorflow object detection\tfodcourse\tfod\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (d:\tensorflow object detection\tfodcourse\tfod\lib\site-packages)

Would be awesome if someone can help me! :)

felker added a commit to PPPLDeepLearning/plasma-python that referenced this issue Jul 7, 2021
ylc pushed a commit to ylc/tvm that referenced this issue Sep 29, 2021
@Nafees-060

Nafees-060 commented Oct 27, 2021

I am facing the same issue. Has a fix for this bug been released? Here is my error:

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 1424, in load_weights
    saving.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 711, in load_weights_from_hdf5_group
    original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
python-BaseException

pgunn added a commit to regro-cf-autotick-bot/caiman-feedstock that referenced this issue Dec 2, 2021
Apparently tensorflow/tensorflow#44467 will take a bit longer to really get sorted out and distributed. Sigh.
pgunn added a commit to flatironinstitute/CaImAn that referenced this issue Dec 2, 2021
@60south

60south commented Jan 5, 2022

Wait, why is this closed? I'm certainly still having the problem. Reverting to an earlier version of h5py is not a solution, it's a workaround.

@mihaimaruseac
Collaborator

mihaimaruseac commented Jan 5, 2022

TF 2.1.x is no longer supported. TF 2.4.x was supposed to have this fixed (but it is also no longer updated). Any version of TF above 2.4.0 should not have this issue. If there is a similar issue, please open a new issue.

Edit: locking to prevent notifications being sent to everyone that followed this issue over its lifetime and are no longer affected

tensorflow locked as resolved and limited conversation to collaborators Jan 5, 2022