Skip to content

[DataflowRuntimeException] ImportError: No module named tfdv.statistics.stats_impl #38

@yonromai

Description

@yonromai

Context

When running tfdv.generate_statistics_from_tfrecord on Dataflow, the job gets submitted successfully to the cluster but I get a:
ImportError: No module named tensorflow_data_validation.statistics.stats_impl during the job unpickling phase in the Dataflow worker

Error trace

---------------------------------------------------------------------------
DataflowRuntimeException                  Traceback (most recent call last)
<ipython-input-23-8f1147effd88> in <module>()
     16 # for more options about stats, run `?tfdv.generate_statistics_from_tfrecord`
     17 tfdv.generate_statistics_from_tfrecord(TFRECORDS_PATH, 
---> 18                                        pipeline_options=pipeline_options)

/Users/romain/dev/venv/lib/python2.7/site-packages/tensorflow_data_validation/utils/stats_gen_lib.pyc in generate_statistics_from_tfrecord(data_location, output_path, stats_options, pipeline_options)
     86             shard_name_template='',
     87             coder=beam.coders.ProtoCoder(
---> 88                 statistics_pb2.DatasetFeatureStatisticsList)))
     89   return load_statistics(output_path)
     90 

/Users/romain/dev/venv/lib/python2.7/site-packages/apache_beam/pipeline.pyc in __exit__(self, exc_type, exc_val, exc_tb)
    421   def __exit__(self, exc_type, exc_val, exc_tb):
    422     if not exc_type:
--> 423       self.run().wait_until_finish()
    424 
    425   def visit(self, visitor):

/Users/romain/dev/venv/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.pyc in wait_until_finish(self, duration)
   1164         raise DataflowRuntimeException(
   1165             'Dataflow pipeline failed. State: %s, Error:\n%s' %
-> 1166             (self.state, getattr(self._runner, 'last_error_msg', None)), self)
   1167     return self.state
   1168 

DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 130, in execute
    test_shuffle_sink=self._test_shuffle_sink)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 104, in create_operation
    is_streaming=False)
  File "apache_beam/runners/worker/operations.py", line 636, in apache_beam.runners.worker.operations.create_operation
    op = create_pgbk_op(name_context, spec, counter_factory, state_sampler)
  File "apache_beam/runners/worker/operations.py", line 482, in apache_beam.runners.worker.operations.create_pgbk_op
    return PGBKCVOperation(step_name, spec, counter_factory, state_sampler)
  File "apache_beam/runners/worker/operations.py", line 538, in apache_beam.runners.worker.operations.PGBKCVOperation.__init__
    fn, args, kwargs = pickler.loads(self.spec.combine_fn)[:3]
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 246, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 316, in loads
    return load(file, ignore)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 304, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 465, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named tensorflow_data_validation.statistics.stats_impl

What code did I run?

!pip install -U tensorflow \
                tensorflow-data-validation \
                apache-beam[gcp]
import tensorflow_data_validation as tfdv
from apache_beam.options.pipeline_options import PipelineOptions, GoogleCloudOptions, StandardOptions, SetupOptions

# Create and set your PipelineOptions.
pipeline_options = PipelineOptions()

# For Cloud execution, set the Cloud Platform project, job_name,
# staging location, temp_location and specify DataflowRunner.
google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
google_cloud_options.project = PROJECT_ID
google_cloud_options.job_name = JOB_NAME
google_cloud_options.staging_location = GCS_STAGING_LOCATION
google_cloud_options.temp_location = GCS_TMP_LOCATION
pipeline_options.view_as(StandardOptions).runner = 'DataflowRunner'
    
tfdv.generate_statistics_from_tfrecord(TFRECORDS_PATH, 
                                       pipeline_options=pipeline_options)

Pip trace

Requirement already up-to-date: tensorflow in /Users/romain/dev/venv/lib/python2.7/site-packages (1.12.0)
Requirement already up-to-date: tensorflow-data-validation in /Users/romain/dev/venv/lib/python2.7/site-packages (0.11.0)
Requirement already up-to-date: apache-beam[gcp] in /Users/romain/dev/venv/lib/python2.7/site-packages (2.8.0)
Requirement already satisfied, skipping upgrade: enum34>=1.1.6 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.1.6)
Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.0.5)
Requirement already satisfied, skipping upgrade: wheel in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (0.31.1)
Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (0.7.1)
Requirement already satisfied, skipping upgrade: backports.weakref>=1.0rc1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.0.post1)
Requirement already satisfied, skipping upgrade: mock>=2.0.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (2.0.0)
Requirement already satisfied, skipping upgrade: tensorboard<1.13.0,>=1.12.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.12.0)
Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.1.0)
Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (3.6.1)
Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (0.2.0)
Requirement already satisfied, skipping upgrade: absl-py>=0.1.6 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (0.3.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.13.0)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.10.0)
Requirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.0.6)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow) (1.14.0)
Requirement already satisfied, skipping upgrade: IPython<6,>=5.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow-data-validation) (5.7.0)
Requirement already satisfied, skipping upgrade: tensorflow-metadata<0.10,>=0.9 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow-data-validation) (0.9.0)
Requirement already satisfied, skipping upgrade: tensorflow-transform<0.12,>=0.11 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow-data-validation) (0.11.0)
Requirement already satisfied, skipping upgrade: pandas<1,>=0.18 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow-data-validation) (0.22.0)
Requirement already satisfied, skipping upgrade: oauth2client<5,>=2.0.1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (4.1.3)
Requirement already satisfied, skipping upgrade: dill<=0.2.8.2,>=0.2.6 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.2.8.2)
Requirement already satisfied, skipping upgrade: pydot<1.3,>=1.2.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (1.2.4)
Requirement already satisfied, skipping upgrade: pyyaml<4.0.0,>=3.12 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (3.12)
Requirement already satisfied, skipping upgrade: pyvcf<0.7.0,>=0.6.8 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.6.8)
Requirement already satisfied, skipping upgrade: typing<3.7.0,>=3.6.0; python_version < "3.5.0" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (3.6.4)
Requirement already satisfied, skipping upgrade: avro<2.0.0,>=1.8.1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (1.8.2)
Requirement already satisfied, skipping upgrade: future<1.0.0,>=0.16.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.16.0)
Requirement already satisfied, skipping upgrade: fastavro<0.22,>=0.21.4 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.21.13)
Requirement already satisfied, skipping upgrade: crcmod<2.0,>=1.7 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (1.7)
Requirement already satisfied, skipping upgrade: httplib2<=0.11.3,>=0.8 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.11.3)
Requirement already satisfied, skipping upgrade: futures<4.0.0,>=3.1.1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (3.2.0)
Requirement already satisfied, skipping upgrade: hdfs<3.0.0,>=2.1.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (2.1.0)
Requirement already satisfied, skipping upgrade: pytz<=2018.4,>=2018.3 in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (2018.4)
Requirement already satisfied, skipping upgrade: google-apitools<=0.5.20,>=0.5.18; extra == "gcp" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.5.20)
Requirement already satisfied, skipping upgrade: proto-google-cloud-pubsub-v1==0.15.4; extra == "gcp" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.15.4)
Requirement already satisfied, skipping upgrade: googledatastore==7.0.1; python_version < "3.0" and extra == "gcp" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (7.0.1)
Requirement already satisfied, skipping upgrade: google-cloud-bigquery==0.25.0; extra == "gcp" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.25.0)
Requirement already satisfied, skipping upgrade: google-cloud-pubsub==0.26.0; extra == "gcp" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.26.0)
Requirement already satisfied, skipping upgrade: proto-google-cloud-datastore-v1<=0.90.4,>=0.90.0; extra == "gcp" in /Users/romain/dev/venv/lib/python2.7/site-packages (from apache-beam[gcp]) (0.90.4)
Requirement already satisfied, skipping upgrade: funcsigs>=1; python_version < "3.3" in /Users/romain/dev/venv/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow) (1.0.2)
Requirement already satisfied, skipping upgrade: pbr>=0.11 in /Users/romain/dev/venv/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow) (1.10.0)
Requirement already satisfied, skipping upgrade: werkzeug>=0.11.10 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow) (0.14.1)
Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow) (2.6.11)
Requirement already satisfied, skipping upgrade: setuptools in /Users/romain/dev/venv/lib/python2.7/site-packages (from protobuf>=3.6.1->tensorflow) (39.1.0)
Requirement already satisfied, skipping upgrade: h5py in /Users/romain/dev/venv/lib/python2.7/site-packages (from keras-applications>=1.0.6->tensorflow) (2.8.0)
Requirement already satisfied, skipping upgrade: simplegeneric>0.8 in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (0.8.1)
Requirement already satisfied, skipping upgrade: pygments in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (2.2.0)
Requirement already satisfied, skipping upgrade: backports.shutil-get-terminal-size; python_version == "2.7" in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (1.0.0)
Requirement already satisfied, skipping upgrade: pexpect; sys_platform != "win32" in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (4.6.0)
Requirement already satisfied, skipping upgrade: prompt-toolkit<2.0.0,>=1.0.4 in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (1.0.15)
Requirement already satisfied, skipping upgrade: decorator in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (4.3.0)
Requirement already satisfied, skipping upgrade: pickleshare in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (0.7.4)
Requirement already satisfied, skipping upgrade: appnope; sys_platform == "darwin" in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (0.1.0)
Requirement already satisfied, skipping upgrade: traitlets>=4.2 in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (4.3.2)
Requirement already satisfied, skipping upgrade: pathlib2; python_version == "2.7" or python_version == "3.3" in /Users/romain/dev/venv/lib/python2.7/site-packages (from IPython<6,>=5.0->tensorflow-data-validation) (2.3.2)
Requirement already satisfied, skipping upgrade: googleapis-common-protos in /Users/romain/dev/venv/lib/python2.7/site-packages (from tensorflow-metadata<0.10,>=0.9->tensorflow-data-validation) (1.5.3)
Requirement already satisfied, skipping upgrade: python-dateutil in /Users/romain/dev/venv/lib/python2.7/site-packages (from pandas<1,>=0.18->tensorflow-data-validation) (2.7.3)
Requirement already satisfied, skipping upgrade: rsa>=3.1.4 in /Users/romain/dev/venv/lib/python2.7/site-packages (from oauth2client<5,>=2.0.1->apache-beam[gcp]) (3.4.2)
Requirement already satisfied, skipping upgrade: pyasn1>=0.1.7 in /Users/romain/dev/venv/lib/python2.7/site-packages (from oauth2client<5,>=2.0.1->apache-beam[gcp]) (0.1.9)
Requirement already satisfied, skipping upgrade: pyasn1-modules>=0.0.5 in /Users/romain/dev/venv/lib/python2.7/site-packages (from oauth2client<5,>=2.0.1->apache-beam[gcp]) (0.0.8)
Requirement already satisfied, skipping upgrade: pyparsing>=2.1.4 in /Users/romain/dev/venv/lib/python2.7/site-packages (from pydot<1.3,>=1.2.0->apache-beam[gcp]) (2.1.10)
Requirement already satisfied, skipping upgrade: docopt in /Users/romain/dev/venv/lib/python2.7/site-packages (from hdfs<3.0.0,>=2.1.0->apache-beam[gcp]) (0.6.2)
Requirement already satisfied, skipping upgrade: requests>=2.7.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from hdfs<3.0.0,>=2.1.0->apache-beam[gcp]) (2.11.1)
Requirement already satisfied, skipping upgrade: fasteners>=0.14 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-apitools<=0.5.20,>=0.5.18; extra == "gcp"->apache-beam[gcp]) (0.14.1)
Requirement already satisfied, skipping upgrade: google-cloud-core<0.26dev,>=0.25.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-cloud-bigquery==0.25.0; extra == "gcp"->apache-beam[gcp]) (0.25.0)
Requirement already satisfied, skipping upgrade: gapic-google-cloud-pubsub-v1<0.16dev,>=0.15.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-cloud-pubsub==0.26.0; extra == "gcp"->apache-beam[gcp]) (0.15.4)
Requirement already satisfied, skipping upgrade: ptyprocess>=0.5 in /Users/romain/dev/venv/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->IPython<6,>=5.0->tensorflow-data-validation) (0.5.2)
Requirement already satisfied, skipping upgrade: wcwidth in /Users/romain/dev/venv/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->IPython<6,>=5.0->tensorflow-data-validation) (0.1.7)
Requirement already satisfied, skipping upgrade: ipython-genutils in /Users/romain/dev/venv/lib/python2.7/site-packages (from traitlets>=4.2->IPython<6,>=5.0->tensorflow-data-validation) (0.2.0)
Requirement already satisfied, skipping upgrade: scandir; python_version < "3.5" in /Users/romain/dev/venv/lib/python2.7/site-packages (from pathlib2; python_version == "2.7" or python_version == "3.3"->IPython<6,>=5.0->tensorflow-data-validation) (1.7)
Requirement already satisfied, skipping upgrade: monotonic>=0.1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from fasteners>=0.14->google-apitools<=0.5.20,>=0.5.18; extra == "gcp"->apache-beam[gcp]) (1.5)
Requirement already satisfied, skipping upgrade: google-auth-httplib2 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-cloud-core<0.26dev,>=0.25.0->google-cloud-bigquery==0.25.0; extra == "gcp"->apache-beam[gcp]) (0.0.3)
Requirement already satisfied, skipping upgrade: google-auth<2.0.0dev,>=0.4.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-cloud-core<0.26dev,>=0.25.0->google-cloud-bigquery==0.25.0; extra == "gcp"->apache-beam[gcp]) (1.1.1)
Requirement already satisfied, skipping upgrade: grpc-google-iam-v1<0.12dev,>=0.11.1 in /Users/romain/dev/venv/lib/python2.7/site-packages (from gapic-google-cloud-pubsub-v1<0.16dev,>=0.15.0->google-cloud-pubsub==0.26.0; extra == "gcp"->apache-beam[gcp]) (0.11.4)
Requirement already satisfied, skipping upgrade: google-gax<0.16dev,>=0.15.7 in /Users/romain/dev/venv/lib/python2.7/site-packages (from gapic-google-cloud-pubsub-v1<0.16dev,>=0.15.0->google-cloud-pubsub==0.26.0; extra == "gcp"->apache-beam[gcp]) (0.15.16)
Requirement already satisfied, skipping upgrade: cachetools>=2.0.0 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-auth<2.0.0dev,>=0.4.0->google-cloud-core<0.26dev,>=0.25.0->google-cloud-bigquery==0.25.0; extra == "gcp"->apache-beam[gcp]) (2.0.1)
Requirement already satisfied, skipping upgrade: ply==3.8 in /Users/romain/dev/venv/lib/python2.7/site-packages (from google-gax<0.16dev,>=0.15.7->gapic-google-cloud-pubsub-v1<0.16dev,>=0.15.0->google-cloud-pubsub==0.26.0; extra == "gcp"->apache-beam[gcp]) (3.8)

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions