Incompatibility with cloudpickle==1.5.0 #991

Closed
ltetrel opened this issue Jul 1, 2020 · 22 comments

@ltetrel

ltetrel commented Jul 1, 2020

Hi all,

Since the new cloudpickle release (1.5.0), it is no longer possible to import tensorflow_probability.
Downgrading to cloudpickle <= 1.4.1 fixes the issue:

>>> import tensorflow_probability as tfp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/__init__.py", line 76, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/__init__.py", line 23, in <module>
    from tensorflow_probability.python import distributions
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/distributions/__init__.py", line 88, in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/distributions/pixel_cnn.py", line 37, in <module>
    from tensorflow_probability.python.layers import weight_norm
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/layers/__init__.py", line 31, in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_probability/python/layers/distribution_layer.py", line 28, in <module>
    from cloudpickle.cloudpickle import CloudPickler
ImportError: cannot import name 'CloudPickler'
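
To check whether an environment is affected without importing all of TFP, a minimal sketch is to attempt, in isolation, the import that fails on line 28 of distribution_layer.py in the traceback above. The helper `can_import` is hypothetical and uses only `importlib` from the standard library.

```python
import importlib


def can_import(module_name, attr_name):
    """Return True if `module_name` imports cleanly and defines `attr_name`.

    Hypothetical diagnostic helper: returns False instead of raising ImportError.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself (or one of its parent packages) is not available.
        return False
    # `from m import x` also fails when the module exists but lacks `x`.
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # The import performed by tensorflow_probability/python/layers/distribution_layer.py
    # in the traceback above; per this thread it fails with cloudpickle 1.5.0.
    ok = can_import("cloudpickle.cloudpickle", "CloudPickler")
    print("CloudPickler importable from cloudpickle.cloudpickle:", ok)
```

If this prints `False` while cloudpickle is installed, the installed cloudpickle no longer provides the name TFP expects at that location.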
@mli

mli commented Jul 1, 2020

+1, got the same issue.

@terrytangyuan
Member

@jburnim Is 5cc832b a temporary workaround for this issue or for some other issue? Is there any particular reason to pin this specific version of cloudpickle?

@matthewfeickert
Contributor

Hi. I opened an issue on cloudpickle for this; we're also seeing this in pyhf for the same reason.

ogrisel added a commit to ogrisel/probability that referenced this issue Jul 2, 2020
matthewfeickert added a commit to scikit-hep/pyhf that referenced this issue Jul 2, 2020
* Explicitly disallow cloudpickle v1.5.0 to avoid breaking TensorFlow Probability 
   - This is a temporary solution to unblock development, but should be removed once TensorFlow Probability v0.11 has been released
   - tensorflow/probability#991
   - cloudpipe/cloudpickle#390
st-- added a commit to GPflow/GPflow that referenced this issue Jul 3, 2020
kaczmarj added a commit to neuronets/nobrainer that referenced this issue Jul 4, 2020
kaczmarj added a commit to neuronets/nobrainer that referenced this issue Jul 4, 2020
st-- added a commit to GPflow/GPflow that referenced this issue Jul 6, 2020
* pin cloudpickle==1.3.0 as temporary workaround for tensorflow/probability#991 to unblock our build (to be reverted once fixed upstream)
@jburnim
Member

jburnim commented Jul 6, 2020

Thank you for the report, @ltetrel !

It looks like we pinned the CloudPickle dependency to 1.3 in 5cc832b because of compatibility issues between CloudPickle and some versions of Python 3.5. (It also looks like CloudPickle has since fixed these issues in cloudpipe/cloudpickle#359 and cloudpipe/cloudpickle#361.)

As a quick fix, we are considering a TFP 0.10.1 release that is just TFP 0.10.0 but requiring CloudPickle 1.3. (If this would cause problems for you -- e.g., you're using TFP 0.10 and a higher version of CloudPickle -- please comment on this issue to let us know.)

We are also investigating further and taking a look at the fix #993 from @matthewfeickert.

@terrytangyuan
Member

@jburnim Sounds great. We have no issue requiring that specific version. Thanks for the prompt response!

@matthewfeickert
Contributor

matthewfeickert commented Jul 6, 2020

> As a quick fix, we are considering a TFP 0.10.1 release that is just TFP 0.10.0 but requiring CloudPickle 1.3

For the library I work on (pyhf) this would work for the near term. cc @lukasheinrich @kratsg

Though if it is possible to have releases that don't explicitly pin dependencies to a single version number, I think that's nicer.

> We are also investigating further and taking a look at the fix #993

Cool. Let me know if there is anything you need me to iterate on. I haven't yet taken the time to debug what is causing the one test that fails in CI (given that all of CI fails at the moment).
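
The difference between an exact pin (like the `cloudpickle==1.3` requirement proposed for TFP 0.10.1) and an exclusion (like pyhf's commit that disallows only cloudpickle v1.5.0) can be illustrated with a small, stdlib-only sketch. It is a simplification, not pip's resolver: it handles only plain numeric versions and the `==`, `!=`, `>=`, `<=` operators, and it imitates the PEP 440 rule that release segments are zero-padded (so `==1.3` matches `1.3.0`). The helper names are hypothetical.

```python
from itertools import zip_longest


def parse(version):
    """Turn a plain release string like '1.4.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compare(a, b):
    """Compare two release tuples, padding the shorter one with zeros."""
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=1.3,!=1.5.0'."""
    ops = {
        "==": lambda c: c == 0,
        "!=": lambda c: c != 0,
        ">=": lambda c: c >= 0,
        "<=": lambda c: c <= 0,
    }
    for clause in spec.split(","):
        clause = clause.strip()
        op, target = clause[:2], clause[2:]
        if op not in ops:
            raise ValueError("unsupported operator in %r" % clause)
        if not ops[op](compare(parse(version), parse(target))):
            return False
    return True


if __name__ == "__main__":
    for v in ["1.3.0", "1.4.1", "1.5.0"]:
        print(v, "==1.3:", satisfies(v, "==1.3"), "!=1.5.0:", satisfies(v, "!=1.5.0"))
```

An exact pin admits a single release, while the exclusion still admits `1.4.1` and any later release, which leaves more room for other packages' requirements to be satisfied at the same time.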

@emilyfertig
Contributor

Update: We've now released TFP 0.10.1, which pins the CloudPickle version to 1.3, and are still looking into #993.

@matthewfeickert
Contributor

Thank you for fixing this @jburnim! 🙇

vdutor added a commit to GPflow/GPflow that referenced this issue Aug 27, 2020
@StevenSong

This is an issue again with the latest TensorFlow release, 2.3.1 (a security patch from 4 days ago), which requires cloudpickle >= 1.5.0, while tensorflow-probability 0.11.0 still pins cloudpickle == 1.3.0.

@matthewfeickert
Contributor

> this is an issue again with the latest version of tensorflow 2.3.1 (security patch from 4 days ago) which has cloudpickle dependency >= 1.5.0 - meanwhile tensorflow-probability 0.11.0 still has cloudpickle == 1.3.0

The good news is that this was already resolved in TFP commit 7601ef6, so the next TFP release should take care of all of this. I'm not sure what the TFP release schedule is, though.
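
Why no cloudpickle version can satisfy both requirements quoted above (`>=1.5.0` for TF 2.3.1, `==1.3.0` for TFP 0.11.0) can be seen by treating each requirement as an inclusive interval of release tuples and intersecting them. This is a minimal sketch that assumes only `==` and `>=` constraints on plain numeric versions; the function names are hypothetical and this is not how pip's resolver is implemented.

```python
def to_tuple(version, width=3):
    """'1.3' -> (1, 3, 0): pad so versions of different lengths compare correctly."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def to_interval(spec):
    """Map '==X' or '>=X' to an inclusive (low, high) interval; None means unbounded."""
    if spec.startswith("=="):
        v = to_tuple(spec[2:])
        return (v, v)
    if spec.startswith(">="):
        return (to_tuple(spec[2:]), None)
    raise ValueError("only '==' and '>=' are handled in this sketch: %r" % spec)


def intersect(*specs):
    """Return the (low, high) intersection of all specs, or None if it is empty."""
    lows, highs = [], []
    for spec in specs:
        lo, hi = to_interval(spec)
        lows.append(lo)
        if hi is not None:
            highs.append(hi)
    low = max(lows) if lows else None
    high = min(highs) if highs else None
    if low is not None and high is not None and low > high:
        return None  # empty: no version satisfies every requirement
    return (low, high)


# Requirements quoted in this thread for TF 2.3.1 and TFP 0.11.0.
print(intersect(">=1.5.0", "==1.3.0"))  # -> None
```

An empty intersection is exactly the situation where the only fix is a new release of one of the two packages, which is what the TFP 0.11.1 patch below provides.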

@csuter
Member

csuter commented Sep 28, 2020

@matthewfeickert we generally build a new stable release whenever TF does, since in general we end up depending on new TF features in between their (and hence our) stable releases. We could increase our (TFP's) release cadence, so long as we either a) don't have such deps on not-yet-in-stable-TF features, or b) can easily hack around such issues on the release branch.

@matthewfeickert
Contributor

Thanks for that info, @csuter. 👍 I didn't mean to complain about not knowing (it isn't very important to me, and I know full well that questions about release schedules can be a tiresome discussion point), but I appreciate you sharing this here, as it probably helps others (though I'm sure that, had I searched harder, I would have found this information in another issue).

@csuter
Member

csuter commented Sep 28, 2020

Definitely didn't detect any complaint! And I could talk release schedules all day! 😁

I think we (TFP) could probably do a bit better at communicating these processes, so folks don't have to go digging in Issues to find the info. Then again Google is a really good search engine, so maybe it's fine to have these bits buried in here 😅

@csuter
Member

csuter commented Sep 28, 2020

Quick update (h/t to @brianwa84 for pointing out to me the actual context here, which I overlooked) -- TFP should actually release a patch to go with the TF 2.3.1 patch here. We'll look into it ASAP.

csuter pushed a commit to csuter/probability that referenced this issue Sep 28, 2020
…1.3.

Checked that distribution_layer_test passes with:
 - CloudPickle 1.3.0, 1.4.1, and 1.5.0.
 - Python 3.5 and 3.8.

Fixes tensorflow#991.

Thanks to https://github.com/matthewfeickert and https://github.com/ogrisel for helping with this issue!

PiperOrigin-RevId: 323834575
@csuter
Member

csuter commented Sep 28, 2020

TFP 0.11.1 is up on pypi now, and should work fine with TF 2.3.1 and newer cloudpickles. Please let us know if you run into further issues!
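
For downstream code that has to survive a dependency moving a name between modules, one common defensive pattern is to try several import locations in order. This is a generic sketch, not the change that shipped in TFP 0.11.1 (this thread does not show that diff). The helper `import_first` and the candidate locations in the usage comment are hypothetical, so check which modules actually export the name in the versions you support.

```python
import importlib


def import_first(candidates):
    """Return the first attribute found among (module_name, attr_name) pairs.

    Raises ImportError listing every location tried if none of them works.
    """
    tried = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            tried.append(module_name)  # module not installed / not importable
            continue
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
        tried.append("%s.%s" % (module_name, attr_name))  # module found, name missing
    raise ImportError("could not import from any of: " + ", ".join(tried))


# Hypothetical usage: prefer the old location, then fall back to the package root.
# CloudPickler = import_first([
#     ("cloudpickle.cloudpickle", "CloudPickler"),
#     ("cloudpickle", "CloudPickler"),
# ])
```

The error message lists every location that was tried, which makes a future relocation easier to diagnose than a bare `cannot import name`.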

@Edvard-D

Edvard-D commented Apr 11, 2021

Sorry, I'm not sure if this is the right place to post this, but I'm trying to run a training job on Google Cloud AI Platform, and this error is thrown when I specify Python 3.7 and TensorFlow 2.3.1 while setting up the training job:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/trader_trainer/training/trader.py", line 2, in <module>
    from trader_trainer.training import trainers
  File "/root/.local/lib/python3.7/site-packages/trader_trainer/training/trainers.py", line 7, in <module>
    from trader_trainer.shared.predictors import ActorCriticTimeSeriesPredictor
  File "/root/.local/lib/python3.7/site-packages/trader_trainer/shared/predictors.py", line 3, in <module>
    import tensorflow_probability as tfp
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/__init__.py", line 77, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py", line 23, in <module>
    from tensorflow_probability.python import distributions
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/__init__.py", line 94, in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/pixel_cnn.py", line 37, in <module>
    from tensorflow_probability.python.layers import weight_norm
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/__init__.py", line 31, in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/distribution_layer.py", line 28, in <module>
    from cloudpickle.cloudpickle import CloudPickler
ImportError: cannot import name 'CloudPickler' from 'cloudpickle.cloudpickle' (/opt/conda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py)

@jimzer

jimzer commented Apr 22, 2021

We hit the same problem on Google Cloud AI Platform while deploying a TFX pipeline with a model that uses TensorFlow Probability:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 364, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/kubeflow/container_entrypoint.py", line 357, in main
    execution_info = launcher.launch()
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py", line 209, in launch
    copy.deepcopy(execution_decision.exec_properties))
  File "/opt/conda/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py", line 72, in _run_executor
    copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))
  File "/opt/conda/lib/python3.7/site-packages/tfx/components/trainer/executor.py", line 182, in Do
    run_fn = udf_utils.get_fn(exec_properties, 'run_fn')
  File "/opt/conda/lib/python3.7/site-packages/tfx/components/util/udf_utils.py", line 49, in get_fn
    exec_properties[_MODULE_FILE_KEY], fn_name)
  File "/opt/conda/lib/python3.7/site-packages/tfx/utils/import_utils.py", line 127, in import_func_from_source
    loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./censeo/models/vae_trainer.py", line 10, in <module>
    import tensorflow_probability as tfp
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/__init__.py", line 77, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py", line 23, in <module>
    from tensorflow_probability.python import distributions
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/__init__.py", line 94, in <module>
    from tensorflow_probability.python.distributions.pixel_cnn import PixelCNN
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/distributions/pixel_cnn.py", line 37, in <module>
    from tensorflow_probability.python.layers import weight_norm
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/__init__.py", line 31, in <module>
    from tensorflow_probability.python.layers.distribution_layer import CategoricalMixtureOfOneHotCategorical
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_probability/python/layers/distribution_layer.py", line 28, in <module>
    from cloudpickle.cloudpickle import CloudPickler
ImportError: cannot import name 'CloudPickler' from 'cloudpickle.cloudpickle' (/opt/conda/lib/python3.7/site-packages/cloudpickle/cloudpickle.py)

@Edvard-D

Edvard-D commented Apr 22, 2021

> We encounter the same problem on the Google Cloud AI Platform, deploying a TFX pipeline with a model using TensorFlow probability

I resolved this by forcing AI Platform to upgrade tensorflow_probability: I added 'tensorflow_probability>=0.11.1' to the required packages in the setup file. It seems that AI Platform installs an out-of-date version of tensorflow_probability. This really needs to be fixed on Google's end, but at least there's a workaround.

@entrpn

entrpn commented Jun 18, 2021

@jimzer @Edvard-D could you please share your configuration?

I am having the same issue with a TFX pipeline whose model uses TensorFlow Probability on GCP AI Platform, but I can't figure out how to set the runtime version correctly. If I just add the runtime version, I get:

'description': "The specified runtime version '2.4' with the Python version '' is not supported or is deprecated. Please specify a different runtime version.

args:

_ai_platform_training_args = {
    'project': PROJECT_ID,
    'region': GCP_REGION,
    'runtime-version': 2.4
}

Then I tried adding the Python version and got:

'description': 'Only one of runtime version or the master Docker image URI should be provided.'}]}]">

args:

_ai_platform_training_args = {
    'project': PROJECT_ID,
    'region': GCP_REGION,
    'runtimeVersion': '2.4',
    'pythonVersion': '3.7'
}

Thank you.

@Edvard-D

Edvard-D commented Jun 18, 2021

@entrpn runtimeVersion refers to the TensorFlow version. I'm not sure about the error you're getting, but I'm submitting the job with the following code, which essentially runs the gcloud command line:

import subprocess

command_arguments = [
    'gcloud', 'ai-platform', 'jobs', 'submit', 'training', TRAINING_NAME,
    '--scale-tier', 'custom',
    '--master-machine-type', CPU_TYPE,
    '--job-dir', JOB_DIRECTORY,
    '--package-path', PACKAGE_PATH,
    '--module-name', MODULE_NAME,
    '--region', 'us-central1',
    '--runtime-version', '2.4',
    '--python-version', '3.7',
]
subprocess.Popen(command_arguments, shell=True)

(If you're on Linux, remove shell=True: on POSIX systems, passing a list together with shell=True runs only the first element, 'gcloud', and the remaining items are passed as arguments to the shell itself.)

@entrpn

entrpn commented Jun 18, 2021

@Edvard-D thank you for the quick reply. I'm using TFX to launch the training job, so unfortunately I can't pass parameters the way you do. Hopefully @jimzer has a working example with TFX.

@entrpn

entrpn commented Jun 25, 2021

I finally solved my issue. I had to dig through the TFX code to figure out how the Trainer component works. If anyone comes across this and is struggling, hopefully this helps.

The issue with TFX and AI Platform is that you can't specify the runtime version or the Python version, because TFX runs the trainer in a container. The way around this is to first build a container that uses a TFX image as its base. In my case, I needed TensorFlow Probability, so:

FROM gcr.io/tfx-oss-public/tfx:0.30.0
RUN pip install tensorflow-probability==0.12.2

Build it and push it to your project's container registry, then add it to the trainer args:

_ai_platform_training_args = {
    'project': PROJECT_ID,
    'region': GCP_REGION,
    'masterConfig' : {'imageUri': 'gcr.io/my_project/tfp_trainer:latest'}
}

Also copy the trainer Python file used by the Trainer component into a GCS bucket so that the container can access it. Then run the job.
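
The error quoted earlier ("Only one of runtime version or the master Docker image URI should be provided") suggests a simple local check before submitting a job. This is a hypothetical helper: the keys (`runtimeVersion`, `masterConfig`, `imageUri`) are the ones used in this thread, the `image_from_tfx` flag is an assumption meaning "the launcher supplies its own image", as described for TFX in the comment above, and the only rule encoded is the one stated in that error message, not a full validation of the AI Platform API.

```python
def check_training_args(args, image_from_tfx=False):
    """Raise ValueError if args combine a runtime version with a custom image.

    image_from_tfx: hypothetical flag; True when the launcher (e.g. TFX)
    provides the master container image itself, even if it is not in `args`.
    """
    image_uri = (args.get("masterConfig") or {}).get("imageUri")
    has_image = bool(image_uri) or image_from_tfx
    if has_image and args.get("runtimeVersion"):
        raise ValueError(
            "runtimeVersion %r cannot be combined with a custom master image "
            "(imageUri=%r); set only one of them" % (args["runtimeVersion"], image_uri)
        )
    return args


# Container-based setup from the comment above, with placeholder values.
training_args = check_training_args({
    "project": "my-project",  # hypothetical project ID
    "region": "us-central1",
    "masterConfig": {"imageUri": "gcr.io/my_project/tfp_trainer:latest"},
})
```

Under this rule, the configuration that failed earlier (`runtimeVersion` plus `pythonVersion` under TFX) is rejected locally when `image_from_tfx=True`, while the `masterConfig`-only configuration passes.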
